WO2020105423A1 - Information processing device and method, and program - Google Patents
Information processing device and method, and program - Info
- Publication number
- WO2020105423A1 (PCT/JP2019/043360)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- objects
- pass
- data
- information processing
- priority
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used in stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/03—Application of parametric coding in stereophonic audio systems
Definitions
- the present technology relates to an information processing device and method, and a program, and particularly to an information processing device and method, and a program that can reduce the total number of objects while suppressing the influence on sound quality.
- the MPEG (Moving Picture Experts Group)-H 3D Audio standard is known (for example, refer to Non-Patent Document 1 and Non-Patent Document 2).
- with 3D Audio, as handled by the MPEG-H 3D Audio standard and the like, the three-dimensional direction, distance, spread, and so on of sound can be reproduced, enabling more realistic audio playback than conventional stereo playback.
- the present technology has been made in view of such a situation, and makes it possible to reduce the total number of objects while suppressing the influence on sound quality.
- An information processing device according to one aspect of the present technology includes a pass-through object selection unit that acquires data of L objects and selects, from the L objects, M pass-through objects whose data is output as is,
- and an object generation unit that generates data of N new objects, N being less than (L - M), based on the data of a plurality of non-pass-through objects, that is, objects among the L objects that are not pass-through objects.
- An information processing method or program according to one aspect of the present technology acquires data of L objects, selects, from the L objects, M pass-through objects whose data is output as is, and generates data of N new objects, N being less than (L - M), based on the data of a plurality of non-pass-through objects among the L objects that are not pass-through objects.
- In one aspect of the present technology, data of L objects is acquired, M pass-through objects whose data is output directly are selected from the L objects, and data of N new objects, N being less than (L - M), is generated based on the data of a plurality of non-pass-through objects among the L objects that are not pass-through objects.
- FIG. 19 is a diagram illustrating a configuration example of a computer.
- the object may be any kind of object that has object data, such as an audio object or an image object.
- the object data referred to here is, for example, the object signal and the metadata of the object.
- when the object is an audio object, the audio signal serving as the object signal and the metadata are the data of the audio object,
- and when the object is an image object, the image signal serving as the object signal and the metadata are the data of the image object.
- in the following, the description continues assuming that the object is an audio object,
- and the audio signal and metadata of the object are handled as the object data.
- the metadata includes, for example, position information indicating the position of the object in three-dimensional space, priority information indicating the priority of the object, gain information of the audio signal of the object, and spread information indicating the spread of the sound image of the object's sound.
- the position information of the object includes, for example, a radius indicating the distance from the reference position to the object, a horizontal angle indicating the horizontal position of the object, and a vertical angle indicating the vertical position of the object.
- the present technology can be applied, for example, to a pre-rendering processing apparatus that takes as input a plurality of objects (more specifically, their object data) that make up content and outputs an appropriate number of objects (again, object data) according to the input.
- the number of objects at the time of input is nobj_in
- the number of objects at the time of output is nobj_out.
- nobj_out < nobj_in. That is, the number of output objects is smaller than the number of input objects.
- some of the input nobj_in objects are output as they are, without any change; that is, they are passed through.
- hereinafter, such an object that is passed through will be referred to as a pass-through object.
- objects that are not pass-through objects are referred to as non-pass-through objects.
- the data of the non-pass-through objects is used to generate the data of new objects.
- as a result, nobj_out objects, fewer than the input nobj_in objects, are output, and the total number of objects is reduced.
- the number of pass-through objects will be nobj_dynamic.
- the number of pass-through objects nobj_dynamic can be set by the user or the like within a range satisfying the condition shown in the following expression (1).
- the number of pass-through objects nobj_dynamic is 0 or more and less than nobj_out.
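Expression (1) itself is not reproduced in this excerpt; judging from the surrounding description ("0 or more and less than nobj_out"), it is presumably:

```latex
0 \leq \mathrm{nobj\_dynamic} < \mathrm{nobj\_out} \tag{1}
```

where, as stated above, nobj_out is in turn smaller than the number of input objects nobj_in.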
- the number of pass-through objects nobj_dynamic can be a predetermined number or a number specified by a user's input operation or the like.
- the number nobj_dynamic of pass-through objects may be dynamically determined so as to be equal to or less than the predetermined maximum number based on the data amount (data size) of the entire content and the calculation amount of the process at the time of decoding.
- the predetermined maximum number is less than nobj_out.
- the data volume of the entire content is the total data volume (data size) of the metadata and audio signal of the pass-through object and the metadata and audio signal of the newly created object.
- the calculation amount of the decoding process considered when determining the number nobj_dynamic may be the calculation amount of only the process of decoding the encoded data (metadata and audio signal) of the objects, or the sum of that calculation amount and the calculation amount of the rendering process.
- similarly, the number of finally output objects nobj_out may be determined based on the data amount of the entire content or the calculation amount of the decoding process.
- for example, the number nobj_out may be specified by a user's input operation or the like, or it may be predetermined.
- the index indicating the time frame of the audio signal is set to ifrm
- the index indicating the object is set to iobj.
- a time frame whose index is ifrm is also referred to as a time frame ifrm
- an object whose index is iobj is also referred to as an object iobj.
- priority information is included in the metadata of each object, and the priority information included in the metadata of the time frame ifrm of the object iobj is described as priority_raw [ifrm] [iobj]. That is, it is assumed that the metadata given in advance to the object includes the priority information priority_raw [ifrm] [iobj].
- the value of the priority information priority [ifrm] [iobj] shown in the following equation (2) is obtained for each object for each time frame.
- priority_gen [ifrm] [iobj] is priority information of the time frame ifrm of the object iobj, which is obtained based on information other than priority_raw [ifrm] [iobj].
- for the priority information priority_gen[ifrm][iobj], the gain information, position information, and spread information included in the metadata, the audio signals of the objects, and so on can be used alone or in any combination. Furthermore, not only the gain information, position information, spread information, and audio signal of the current time frame, but also those of past time frames, such as the time frame immediately before the current one, may be used to calculate the priority information priority_gen[ifrm][iobj] of the current time frame.
- as a specific method of calculating the priority information priority_gen[ifrm][iobj], for example, the method described in International Publication No. 2018/198789 may be used.
- the reciprocal of the radius forming the position information included in the metadata can be used as priority information priority_gen [ifrm] [iobj] so that objects closer to the user have higher priorities.
- also, the reciprocal of the absolute value of the horizontal angle forming the position information included in the metadata can be used as the priority information priority_gen[ifrm][iobj], so that objects in front of the user have higher priority.
- the moving speed of the object may be used as the priority information priority_gen[ifrm][iobj], or the gain information itself included in the metadata may be used as the priority information priority_gen[ifrm][iobj].
- the square value of the spread information included in the metadata may be used as the priority information priority_gen[ifrm][iobj], or the priority information priority_gen[ifrm][iobj] may be calculated based on the attribute information of the object.
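The heuristics above can be sketched as follows. This is illustrative only: the document lists these quantities as alternative sources for priority_gen[ifrm][iobj], so the function name, the additive combination, and the epsilon guards are assumptions, not the patent's formula.

```python
def priority_gen(radius, azimuth_deg, spread_deg):
    """Illustrative heuristics for priority_gen[ifrm][iobj].

    radius:      distance from the reference position to the object
    azimuth_deg: horizontal angle of the object
    spread_deg:  spread of the object's sound image
    """
    eps = 1e-6  # guard against division by zero (an assumption)
    # Closer objects get higher priority: reciprocal of the radius.
    p_distance = 1.0 / max(radius, eps)
    # Objects in front of the user get higher priority:
    # reciprocal of the absolute horizontal angle.
    p_front = 1.0 / max(abs(azimuth_deg), eps)
    # Objects with a wider sound image get higher priority:
    # square of the spread information.
    p_spread = spread_deg ** 2
    # Adding the terms is a hypothetical choice; the text allows these
    # quantities to be used alone or in any combination.
    return p_distance + p_front + p_spread
```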
- weight is a parameter that determines the ratio of the priority information priority_raw[ifrm][iobj] to the priority information priority_gen[ifrm][iobj] in the calculation of the priority information priority[ifrm][iobj], and is set to 0.5, for example.
- the priority information priority_raw[ifrm][iobj] may not be given to an object; in such a case, the value of priority_raw[ifrm][iobj] may be set to 0 and the calculation of equation (2) may be performed.
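Equation (2) is not reproduced in this excerpt, but from the description of weight and of the fallback to 0, it is presumably a weighted blend along these lines (a sketch, assuming a simple linear combination):

```python
def combined_priority(priority_raw, priority_gen_value, weight=0.5):
    """Presumed form of equation (2): blend the pre-assigned priority
    priority_raw[ifrm][iobj] with the computed priority_gen[ifrm][iobj]
    according to the parameter weight (0.5 in the text's example)."""
    if priority_raw is None:
        # Objects without pre-assigned priority information are
        # treated as priority_raw = 0, as the text describes.
        priority_raw = 0.0
    return weight * priority_raw + (1.0 - weight) * priority_gen_value
```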
- the priority information priority [ifrm] [iobj] is obtained for each object by the expression (2)
- the priority information priority[ifrm][iobj] of the objects is sorted in descending order of value for each time frame ifrm. Then, the top nobj_dynamic objects with the largest values of priority[ifrm][iobj] are selected as the pass-through objects of the time frame ifrm, and the remaining objects become the non-pass-through objects.
- in this way, the nobj_in objects are sorted into nobj_dynamic pass-through objects and (nobj_in - nobj_dynamic) non-pass-through objects.
- rendering processing, that is, pre-rendering processing, is then performed on these non-pass-through objects. This generates the metadata and audio signals of (nobj_out - nobj_dynamic) new objects.
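The selection step above can be sketched as follows (a minimal sketch; the function name and the list-based representation are assumptions):

```python
def select_pass_through(priorities, nobj_dynamic):
    """Sort object indices by descending priority[ifrm][iobj] for one
    time frame and split them into pass-through and non-pass-through
    objects, as described in the text."""
    order = sorted(range(len(priorities)),
                   key=lambda iobj: priorities[iobj], reverse=True)
    pass_through = order[:nobj_dynamic]       # top nobj_dynamic objects
    non_pass_through = order[nobj_dynamic:]   # the remaining objects
    return pass_through, non_pass_through
```

For example, with priorities [0.1, 0.9, 0.5, 0.3] and nobj_dynamic = 2, objects 1 and 2 become pass-through objects and objects 3 and 0 become non-pass-through objects.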
- in the rendering processing by VBAP (Vector Base Amplitude Panning),
- the non-passthrough object is rendered to (nobj_out-nobj_dynamic) virtual speakers.
- the virtual speaker corresponds to a new object, and the arrangement positions of these virtual speakers in the three-dimensional space are different from each other.
- let the index indicating a virtual speaker be spk,
- let the virtual speaker indicated by the index spk be described as virtual speaker spk,
- and let the audio signal in the time frame ifrm of the non-pass-through object whose index is iobj be described as sig[ifrm][iobj].
- VBAP is performed for each non-pass-through object iobj based on the position information included in the metadata and the position of the virtual speaker in the three-dimensional space.
- by the VBAP, the gains gain[ifrm][iobj][spk] of the (nobj_out - nobj_dynamic) virtual speakers spk are obtained for each non-pass-through object iobj.
- then, for each virtual speaker spk, the sum of the audio signals sig[ifrm][iobj], each multiplied by the gain gain[ifrm][iobj][spk] of that virtual speaker, is obtained over the non-pass-through objects iobj,
- and the resulting audio signal is taken as the audio signal of the new object corresponding to that virtual speaker spk.
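The VBAP gain computation itself is standard and is not shown here; the mixdown step just described can be sketched as follows, assuming plain nested lists for one time frame ifrm:

```python
def mix_to_virtual_speakers(signals, gains):
    """For each virtual speaker spk, sum sig[ifrm][iobj] multiplied by
    gain[ifrm][iobj][spk] over the non-pass-through objects iobj; the
    result is the audio signal of the new object for that speaker.

    signals: per-object sample lists, signals[iobj][t]
    gains:   per-object speaker gains, gains[iobj][spk] (from VBAP)
    returns: per-speaker sample lists, out[spk][t]
    """
    n_speakers = len(gains[0])
    n_samples = len(signals[0])
    out = [[0.0] * n_samples for _ in range(n_speakers)]
    for sig, g in zip(signals, gains):
        for spk in range(n_speakers):
            for t in range(n_samples):
                out[spk][t] += g[spk] * sig[t]
    return out
```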
- the position of the virtual speaker corresponding to a new object is determined by the k-means method. That is, the position information included in the metadata of the non-pass-through objects is divided into (nobj_out - nobj_dynamic) clusters for each time frame by the k-means method, and the position of the center of gravity of each cluster is taken as the position of a virtual speaker.
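A minimal k-means over object positions, as a sketch of this clustering step. For simplicity the positions are treated here as Cartesian coordinates and the initialisation is deterministic (the first k points); a production implementation would normally use a library routine and handle the radius/azimuth/elevation representation properly.

```python
def kmeans_positions(positions, k, iters=20):
    """Cluster non-pass-through object positions into k clusters and
    return the k centres of gravity, used as virtual-speaker positions."""
    # Deterministic initialisation (an assumption): first k positions.
    centroids = [list(p) for p in positions[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in positions:
            # Assign each position to its nearest centroid.
            dists = [sum((a - b) ** 2 for a, b in zip(p, c))
                     for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        for i, members in enumerate(clusters):
            if members:
                # Move the centroid to the cluster's centre of gravity.
                centroids[i] = [sum(coord) / len(members)
                                for coord in zip(*members)]
    return centroids
```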
- circles without hatching represent non-pass-through objects, and these non-pass-through objects are arranged at the positions indicated by the position information included in the metadata in the three-dimensional space.
- the virtual speakers SP11-1 to SP11-5 are shown.
- the virtual speakers SP11-1 to SP11-5 are arranged at the positions of the centers of gravity of the clusters corresponding to those virtual speakers. Note that, hereinafter, the virtual speakers SP11-1 to SP11-5 will be simply referred to as virtual speakers SP11 unless it is necessary to distinguish them.
- the audio signal of a new object corresponding to a virtual speaker SP11 is obtained by the rendering process, and the position information included in the metadata of the new object is information indicating the position of the virtual speaker SP11 corresponding to that new object.
- information other than the position information included in the metadata of the new object is taken as the average value or maximum value over the metadata of the non-pass-through objects included in the cluster corresponding to the new object. That is, for example, the average value or the maximum value of the gain information of the non-pass-through objects belonging to the cluster is set as the gain information included in the metadata of the new object corresponding to the cluster.
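The non-positional metadata derivation can be sketched as follows (the function name is assumed; the text allows either the average or the maximum):

```python
def aggregate_metadata(values, mode="mean"):
    """Derive a metadata field (e.g. gain information) of a new object
    from the corresponding values of the non-pass-through objects in
    its cluster, as the average or the maximum."""
    if mode == "max":
        return max(values)
    return sum(values) / len(values)
```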
- nobj_out objects that are less than the input nobj_in objects will be output, and the total number of objects can be reduced.
- moreover, the number of output objects can be a number determined by a user operation or the like, so that subsequent equipment and the like can handle the content consisting of the data of the output objects.
- an object with high priority information priority[ifrm][iobj] is treated as a pass-through object, and its audio signal and metadata are output as they are. Therefore, for pass-through objects, the sound quality of the content audio does not deteriorate.
- for the non-pass-through objects, new objects are generated based on those non-pass-through objects, so the influence on the sound quality of the content audio can be minimized.
- the sound of the content will include the sound components of all the objects.
- the non-pass-through objects may be grouped (clustered) by a method other than the k-means method according to the degree of concentration of the non-pass-through objects in the three-dimensional space, and the barycentric position of each group, or the average of the positions of the non-pass-through objects belonging to the group, may be used as the position of a virtual speaker.
- the degree of concentration of objects in the three-dimensional space indicates how concentrated (dense) the objects are arranged in the three-dimensional space.
- the number of groups at the time of grouping may be set according to the degree of concentration of non-pass-through objects so that the number becomes a predetermined number smaller than (nobj_in-nobj_dynamic).
- instead of the k-means method, the number of newly created objects may be determined so as to be equal to or less than a predetermined maximum number, depending on the degree of concentration of the positions of the non-pass-through objects, a number designation operation by the user, the data amount (data size) of the entire content, or the calculation amount of the decoding process. In such a case, the number of newly generated objects may be smaller than (nobj_in - nobj_dynamic), as long as the condition of the above-mentioned formula (1) is satisfied.
- the position of the virtual speaker may be a fixed position determined in advance. In this case, for example, if the position of each virtual speaker is set to the position where each speaker is arranged in the speaker arrangement of 22 channels, a new object can be easily handled in the subsequent stage.
- the positions of some virtual speakers of the plurality of virtual speakers may be fixed positions that are determined in advance, and the positions of the remaining virtual speakers may be determined by the k-means method or the like.
- depending on the content, the sound quality of the finally obtained content may be affected only slightly. In such a case, the sound quality is hardly affected even if only some of the objects that were not selected as pass-through objects are made non-pass-through objects.
- pass-through objects may be selected based on the degree of concentration (density) of objects in the three-dimensional space.
- the objects are grouped based on the position information included in the metadata of each object. Then, the objects are sorted based on the result of the grouping.
- an object whose distance from any other object is a predetermined value or more can be a pass-through object, and an object whose distance from another object is less than a predetermined value can be a non-pass-through object.
- for example, clustering may be performed by the k-means method or the like based on the position information included in the metadata of each object, and when only one object belongs to a cluster, the object belonging to that cluster may be made a pass-through object.
- when a plurality of objects belong to a cluster, all of the objects belonging to that cluster may be made non-pass-through objects, or the object with the highest priority indicated by the priority information among them may be made a pass-through object and the remaining objects made non-pass-through objects.
- in addition, the number of pass-through objects nobj_dynamic may be dynamically determined depending on the result of the grouping or clustering, the data amount (data size) of the entire content, the calculation amount of the decoding process, and so on.
- the average value or linear combination value of audio signals of non-pass-through objects may be used as the audio signal of the new object.
- the method of generating a new object based on the average value or the like is particularly useful when the number of newly generated objects is one.
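The average / linear-combination alternative can be sketched as follows (uniform weights by default; the weights parameter is an assumption covering the "linear combination value" case):

```python
def combine_signals(signals, weights=None):
    """Produce the single new object's audio signal as the average, or
    a weighted linear combination, of the non-pass-through objects'
    audio signals."""
    n = len(signals)
    if weights is None:
        weights = [1.0 / n] * n  # plain average
    n_samples = len(signals[0])
    return [sum(w * sig[t] for w, sig in zip(weights, signals))
            for t in range(n_samples)]
```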
- a pre-rendering processing device to which the present technology described above is applied will be described.
- Such a pre-rendering processing device is configured, for example, as shown in FIG.
- the pre-rendering processing apparatus 11 shown in FIG. 2 is an information processing apparatus that takes data of a plurality of objects as input and outputs data of fewer objects than the input, and it has a priority calculation unit 21, a pass-through object selection unit 22, and an object generation unit 23.
- data of nobj_in objects, that is, the metadata and audio signals of the objects, is supplied to the priority calculation unit 21.
- the pass-through object selection unit 22 and the object generation unit 23 are supplied with the number information indicating the number of input objects nobj_in, the number of output objects nobj_out, and the number of pass-through objects nobj_dynamic.
- the priority calculation unit 21 calculates the priority information priority[ifrm][iobj] of each object based on the supplied metadata and audio signal of the object, and supplies the priority information priority[ifrm][iobj] of each object, the metadata, and the audio signal to the pass-through object selection unit 22.
- the pass-through object selection unit 22 is supplied with the object metadata, the audio signal, and the priority information priority [ifrm] [iobj] from the priority calculation unit 21 and the number information from the outside. In other words, the pass-through object selection unit 22 acquires the object data and the priority information priority [ifrm] [iobj] from the priority calculation unit 21 and also acquires the number information from the outside.
- the pass-through object selection unit 22 selects a pass-through object based on the supplied number information and the priority information priority [ifrm] [iobj] supplied from the priority calculation unit 21.
- the pass-through object selection unit 22 outputs the metadata and audio signals of the pass-through objects supplied from the priority calculation unit 21 to the subsequent stage as they are, and supplies the metadata and audio signals of the non-pass-through objects supplied from the priority calculation unit 21 to the object generation unit 23.
- the object generation unit 23 generates the metadata and audio signals of the new objects based on the supplied number information and the metadata and audio signals of the non-pass-through objects supplied from the pass-through object selection unit 22, and outputs them to the subsequent stage.
- step S11 the priority calculation unit 21 calculates priority information priority [ifrm] [iobj] of each object based on the supplied metadata and audio signal of each object in a predetermined time frame.
- that is, the priority calculation unit 21 calculates the priority information priority_gen[ifrm][iobj] for each object based on the metadata and the audio signal, and then calculates equation (2) from the priority information priority_raw[ifrm][iobj] included in the metadata and the calculated priority information priority_gen[ifrm][iobj] to obtain the priority information priority[ifrm][iobj].
- the priority calculation unit 21 supplies the priority information priority [ifrm] [iobj] of each object, the metadata, and the audio signal to the pass-through object selection unit 22.
- in step S12, the pass-through object selection unit 22 selects nobj_dynamic pass-through objects from the nobj_in objects based on the supplied number information and the priority information priority[ifrm][iobj] supplied from the priority calculation unit 21. That is, the objects are sorted.
- specifically, the pass-through object selection unit 22 sorts the priority information priority[ifrm][iobj] of the objects and selects, as the pass-through objects, the top nobj_dynamic objects with the largest priority information priority[ifrm][iobj]. In this case, of the input nobj_in objects, all the objects that are not pass-through objects become non-pass-through objects, but only some of the objects that are not pass-through objects may be made non-pass-through objects.
- in step S13, the pass-through object selection unit 22 outputs, to the subsequent stage, the metadata and audio signals of the pass-through objects selected in the process of step S12, out of the metadata and audio signals of the objects supplied from the priority calculation unit 21.
- the pass-through object selection unit 22 supplies the metadata and audio signals of the (nobj_in-nobj_dynamic) non-pass-through objects obtained by classifying the objects to the object generation unit 23.
- pass-through objects may be selected based on the degree of concentration of object positions.
- in step S14, the object generation unit 23 determines the positions of the (nobj_out - nobj_dynamic) virtual speakers based on the metadata and audio signals of the non-pass-through objects supplied from the pass-through object selection unit 22 and the supplied number information.
- for example, the object generation unit 23 clusters the position information of the non-pass-through objects by the k-means method, and the obtained barycentric positions of the (nobj_out - nobj_dynamic) clusters are used as the positions of the virtual speakers corresponding to those clusters.
- the method of determining the position of the virtual speaker is not limited to the k-means method, and may be determined by another method, or a predetermined fixed position may be the position of the virtual speaker.
- step S15 the object generation unit 23 performs a rendering process based on the metadata and audio signal of the non-passthrough object supplied from the passthrough object selection unit 22 and the position of the virtual speaker obtained in step S14.
- for example, the object generation unit 23 obtains the gain gain[ifrm][iobj][spk] of each virtual speaker by performing VBAP as the rendering process. Further, for each virtual speaker, the object generation unit 23 obtains the sum of the audio signals sig[ifrm][iobj] of the non-pass-through objects, each multiplied by the gain gain[ifrm][iobj][spk], and takes the obtained audio signal as the audio signal of the new object corresponding to that virtual speaker.
- the object generation unit 23 generates the metadata of the new object based on the clustering result obtained when the position of the virtual speaker is determined and the metadata of the non-pass-through object.
- Metadata and audio signals can be obtained for (nobj_out-nobj_dynamic) new objects.
- the method of generating the audio signal of the new object may be rendering processing other than VBAP.
- in step S16, the object generation unit 23 outputs the metadata and audio signals of the (nobj_out - nobj_dynamic) new objects obtained in the process of step S15 to the subsequent stage.
- the metadata and audio signals of a total of nobj_out objects are output as the metadata and audio signals of the object after the pre-rendering process.
- step S17 the pre-rendering processing apparatus 11 determines whether or not processing has been performed for all time frames.
- if it is determined in step S17 that the process has not been performed for all time frames, the process returns to step S11, and the above-described process is repeated. That is, the process is performed for the next time frame.
- on the other hand, if it is determined in step S17 that the process has been performed for all time frames, each unit of the pre-rendering processing device 11 stops the processing being performed, and the object output processing ends.
- as described above, the pre-rendering processing apparatus 11 sorts objects based on the priority information, outputs the metadata and audio signals as they are for the high-priority pass-through objects, and performs the rendering process on the non-pass-through objects to generate and output the metadata and audio signals of the new objects.
- by outputting the metadata and audio signals as they are for objects with high priority information, which have a large effect on the sound quality of the content audio, and generating new objects by the rendering process for the other objects, the total number of objects is reduced while the influence on the sound quality is suppressed.
- for example, the priority calculation unit 21 may obtain the priority information priority[ifrm][iobj] of all time frames for each object, and take the sum of the priority information priority[ifrm][iobj] obtained for all the time frames as the priority information priority[iobj] of the object. Then, the priority calculation unit 21 sorts the priority information priority[iobj] of the objects and selects the top nobj_dynamic objects with large values of priority[iobj] as the pass-through objects.
- objects may be sorted for each section composed of a plurality of continuous time frames. Even in such a case, the priority information of each object for each section may be obtained similarly to the priority information priority [iobj].
- the encoding device 51 shown in FIG. 4 has a pre-rendering processing unit 61 and a 3D audio encoding unit 62.
- the pre-rendering processing unit 61 corresponds to the pre-rendering processing apparatus 11 shown in FIG. 2 and has the same configuration as the pre-rendering processing apparatus 11. That is, the pre-rendering processing unit 61 has the above-mentioned priority calculation unit 21, pass-through object selection unit 22, and object generation unit 23.
- the metadata and audio signals of a plurality of objects are supplied to the pre-rendering processing unit 61.
- the pre-rendering processing unit 61 performs pre-rendering processing to reduce the total number of objects, and supplies the reduced metadata and audio signal of each object to the 3D audio encoding unit 62.
- the 3D Audio encoding unit 62 encodes the object metadata and audio signal supplied from the pre-rendering processing unit 61, and outputs the 3D Audio code string obtained as a result.
- the pre-rendering processing unit 61 is supplied with metadata and audio signals of nobj_in objects.
- the pre-rendering processing unit 61 performs the same processing as the object output processing described with reference to FIG. 3, and supplies the metadata and audio signals of the objects to the 3D Audio encoding unit 62.
- the total number of objects is reduced in the encoding device 51, and each reduced object is encoded. Therefore, the size (code amount) of the output 3D Audio code string can be reduced, and the calculation amount and memory amount of the encoding process can be reduced. Further, also on the decoding side of the 3D Audio code string, it is possible to reduce the calculation amount and memory amount in the 3D Audio decoding unit that decodes the 3D Audio code string and the rendering processing unit that follows it.
- note that the pre-rendering processing unit 61 may be arranged outside the encoding device 51, that is, in the stage preceding the encoding device 51, or may be arranged in the frontmost stage inside the 3D Audio encoding unit 62.
- <Application example 2 of the present technology to an encoding device> <Configuration example of the encoding device> Further, when the present technology is applied to an encoding device, a pre-rendering processing flag indicating whether an object is a pass-through object or a newly generated object may be included in the 3D Audio code string.
- the encoding device is configured as shown in FIG. 5, for example.
- FIG. 5 portions corresponding to those in FIG. 4 are denoted by the same reference numerals, and description thereof will be omitted as appropriate.
- the encoding device 91 shown in FIG. 5 has a pre-rendering processing unit 101 and a 3D audio encoding unit 62.
- the pre-rendering processing unit 101 corresponds to the pre-rendering processing apparatus 11 shown in FIG. 2 and has the same configuration as the pre-rendering processing apparatus 11. That is, the pre-rendering processing unit 101 has the above-described priority calculation unit 21, pass-through object selection unit 22, and object generation unit 23.
- the pass-through object selecting unit 22 and the object generating unit 23 generate a pre-rendering processing flag for each object, and output the metadata, the audio signal, and the pre-rendering processing flag for each object.
- the pre-rendering processing flag is flag information indicating whether an object is a pass-through object or a newly generated object, that is, whether it is an object that has undergone pre-rendering processing.
- when an object is a pass-through object, the value of the pre-rendering processing flag of that object is set to 0.
- when an object is a newly generated object, the value of the pre-rendering processing flag of that object is set to 1.
- the pre-rendering processing unit 101 performs the same processing as the object output processing described with reference to FIG. 3 to reduce the total number of objects, and also generates a pre-rendering processing flag for each object after the reduction.
- the pre-rendering processing unit 101 supplies metadata, audio signals, and a pre-rendering processing flag having a value of 0 to the 3D Audio encoding unit 62 for the nobj_dynamic pass-through objects.
- for the newly generated objects, the pre-rendering processing unit 101 supplies the metadata, the audio signals, and a pre-rendering processing flag whose value is 1 to the 3D Audio encoding unit 62.
- the 3D Audio encoding unit 62 encodes the metadata, audio signals, and pre-rendering processing flags of the total of nobj_out objects supplied from the pre-rendering processing unit 101, and outputs the resulting 3D Audio code string.
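The encoder-side flow just described, in which each of the nobj_out objects is tagged with a pre-rendering processing flag before encoding, might look like the following sketch. The class and function names (`AudioObject`, `package_for_encoding`) are hypothetical, not from the patent, and the actual 3D Audio encoding step is omitted:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AudioObject:
    metadata: dict
    signal: list              # placeholder for PCM samples
    prerender_flag: int = 0   # 0 = pass-through, 1 = newly generated

def package_for_encoding(pass_through: List[AudioObject],
                         generated: List[AudioObject]) -> List[AudioObject]:
    """Attach the pre-rendering processing flag and return all nobj_out objects."""
    for obj in pass_through:
        obj.prerender_flag = 0
    for obj in generated:
        obj.prerender_flag = 1
    return pass_through + generated

# Example: nobj_dynamic = 2 pass-through objects plus 1 newly generated object
pt = [AudioObject({"azimuth": 30}, [0.0]), AudioObject({"azimuth": -30}, [0.0])]
gen = [AudioObject({"azimuth": 0}, [0.0])]
packed = package_for_encoding(pt, gen)
```

The flagged list would then be handed to the 3D Audio encoding stage together with the metadata and signals.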
- a decoding device that performs decoding using the 3D Audio code string including the pre-rendering processing flag output from the encoding device 91 as an input is configured as illustrated in FIG. 6, for example.
- the decoding device 131 shown in FIG. 6 has a 3D audio decoding unit 141 and a rendering processing unit 142.
- the 3D Audio decoding unit 141 acquires the 3D Audio code string output from the encoding device 91 by reception or the like, decodes the acquired 3D Audio code string, and supplies the obtained object metadata, audio signals, and pre-rendering processing flags to the rendering processing unit 142.
- the rendering processing unit 142 performs rendering processing based on the metadata, the audio signals, and the pre-rendering processing flags supplied from the 3D Audio decoding unit 141 to generate and output a speaker drive signal for each speaker used to play back the content.
- the speaker drive signal is a signal for reproducing the sound of each object constituting the content by the speaker.
- in the decoding device 131 having such a configuration, the calculation amount and memory amount of the processing in the 3D Audio decoding unit 141 and the rendering processing unit 142 can be reduced by using the pre-rendering processing flag.
- the calculation amount and memory amount at the time of decoding can be further reduced compared with the case of the encoding device 51 shown in FIG. 4.
- 3D Audio code string includes object metadata, audio signal, and pre-rendering processing flag.
- the metadata includes priority information and the like, but in some cases, the metadata may not include priority information.
- the priority information mentioned here is the priority information priority_raw [ifrm] [iobj] described above.
- the value of the pre-rendering processing flag is set based on the priority information priority[ifrm][iobj] calculated by the pre-rendering processing unit 101 in the stage preceding the 3D Audio encoding unit 62. Therefore, a pass-through object whose pre-rendering processing flag value is 0 can be said to be a high-priority object, and a newly generated object whose pre-rendering processing flag value is 1 can be said to be a low-priority object.
- the pre-rendering processing flag can be used instead of the priority information.
- the 3D Audio decoding unit 141 decodes only objects with high priority.
- for example, for an object whose pre-rendering processing flag value is 1, the 3D Audio decoding unit 141 treats the value of the priority information of that object as 0, and does not decode the audio signal and the like of that object included in the 3D Audio code string.
- conversely, for an object whose pre-rendering processing flag value is 0, the 3D Audio decoding unit 141 treats the value of the priority information of that object as 1, and decodes the metadata and audio signal of that object included in the 3D Audio code string.
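Treating the pre-rendering processing flag as a stand-in for priority information, the decoder-side selection step could be sketched as follows. The function names are illustrative only, not an actual 3D Audio decoder API:

```python
def effective_priority(prerender_flag: int) -> int:
    # flag 0 (pass-through) -> priority 1 (decode); flag 1 (generated) -> 0 (skip)
    return 0 if prerender_flag == 1 else 1

def select_objects_to_decode(objects):
    """objects: list of dicts with at least a 'prerender_flag' key."""
    return [obj for obj in objects
            if effective_priority(obj["prerender_flag"]) == 1]

bitstream_objects = [
    {"name": "vocal", "prerender_flag": 0},
    {"name": "mixdown", "prerender_flag": 1},
    {"name": "guitar", "prerender_flag": 0},
]
decoded = select_objects_to_decode(bitstream_objects)
```

Objects filtered out this way are simply skipped by the decoder, which is where the calculation and memory savings come from.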
- the pre-rendering processing unit 101 of the encoding device 91 may generate the priority information of the metadata based on the pre-rendering processing flag, that is, the selection result of the pass-through object.
- the rendering processing unit 142 may perform spread processing based on the spread information included in the metadata.
- the spread process is a process of expanding the sound image of the sound of the object based on the value of the spread information included in the metadata of each object, and is used to enhance the realism.
- an object whose pre-rendering processing flag value is 1 is an object newly generated by the pre-rendering processing unit 101 of the encoding device 91, that is, an object into which a plurality of non-pass-through objects have been mixed. The value of the spread information of such a newly generated object is a single value obtained, for example, as the average of the spread information of the plurality of non-pass-through objects.
- if the spread process is performed on an object whose pre-rendering processing flag value is 1, it is performed on the basis of a single piece of spread information that is not necessarily appropriate for the plurality of original objects, and the sense of presence may be diminished.
- therefore, the rendering processing unit 142 can perform the spread processing based on the spread information for objects whose pre-rendering processing flag value is 0, and skip the spread processing for objects whose pre-rendering processing flag value is 1. By doing so, unnecessary spread processing is avoided, which prevents the sense of presence from deteriorating and reduces the calculation amount and memory amount accordingly.
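The spread-processing decision described above can be sketched as below. `apply_spread` is a placeholder stub, since the actual sound-image widening algorithm is outside the scope of this sketch:

```python
def apply_spread(signal, spread):
    # Placeholder for the actual sound-image widening; returns the signal unchanged.
    return signal

def render_object(obj):
    """Apply spread only to pass-through objects (pre-rendering flag 0)."""
    signal = obj["signal"]
    if obj["prerender_flag"] == 0:
        signal = apply_spread(signal, obj["metadata"].get("spread", 0.0))
        obj["spread_applied"] = True
    else:
        # Newly generated object: its single averaged spread value may be
        # inappropriate for the original objects, so skip spread and save work.
        obj["spread_applied"] = False
    return signal

objs = [
    {"signal": [0.1], "prerender_flag": 0, "metadata": {"spread": 0.5}},
    {"signal": [0.2], "prerender_flag": 1, "metadata": {}},
]
for o in objs:
    render_object(o)
```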
- the pre-rendering processing device to which the present technology is applied may also be provided in a device that reproduces or edits content including a plurality of objects, in a device on the decoding side, or the like.
- the series of processes described above can be executed by hardware or software.
- when the series of processes is executed by software, a program forming the software is installed in a computer.
- the computer includes a computer incorporated in dedicated hardware and, for example, a general-purpose personal computer capable of executing various functions by installing various programs.
- FIG. 7 is a block diagram showing a configuration example of hardware of a computer that executes the series of processes described above by a program.
- in the computer, a CPU (Central Processing Unit) 501, a ROM (Read Only Memory) 502, and a RAM (Random Access Memory) 503 are mutually connected by a bus 504.
- An input / output interface 505 is further connected to the bus 504.
- An input unit 506, an output unit 507, a recording unit 508, a communication unit 509, and a drive 510 are connected to the input / output interface 505.
- the input unit 506 includes a keyboard, a mouse, a microphone, an image sensor, and the like.
- the output unit 507 includes a display, a speaker and the like.
- the recording unit 508 includes a hard disk, a non-volatile memory, or the like.
- the communication unit 509 includes a network interface or the like.
- the drive 510 drives a removable recording medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
- in the computer configured as described above, the CPU 501 loads the program recorded in the recording unit 508 into the RAM 503 via the input / output interface 505 and the bus 504 and executes it, whereby the series of processes described above is performed.
- the program executed by the computer (CPU 501) can be provided by being recorded in a removable recording medium 511 such as a package medium, for example.
- the program can be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.
- the program can be installed in the recording unit 508 via the input / output interface 505 by mounting the removable recording medium 511 in the drive 510. Further, the program can be received by the communication unit 509 via a wired or wireless transmission medium and installed in the recording unit 508. In addition, the program can be installed in the ROM 502 or the recording unit 508 in advance.
- the program executed by the computer may be a program in which the processes are performed in time series in the order described in this specification, or a program in which the processes are performed in parallel or at necessary timing, such as when a call is made.
- the present technology can also be configured as cloud computing in which one function is shared by a plurality of devices via a network and processed jointly.
- each step described in the above-mentioned flowchart can be executed by one device or shared by a plurality of devices.
- when one step includes a plurality of processes, the plurality of processes included in that one step can be executed by one device or shared among a plurality of devices.
- the present technology can also be configured as follows.
- (1) An information processing device including: a pass-through object selection unit that acquires data of L objects and selects, from among the L objects, M pass-through objects whose data is output as-is; and an object generation unit that generates data of N new objects, fewer than (L-M), based on the data of a plurality of non-pass-through objects, among the L objects, that are not the pass-through objects.
- (2) The information processing device according to (1), in which the object generation unit generates the data of the new objects based on the data of the (L-M) non-pass-through objects.
- (3) The information processing device according to (1) or (2), in which the object generation unit generates, by rendering processing, the data of the N new objects arranged at mutually different positions based on the data of the plurality of non-pass-through objects.
- (4) The information processing device according to (3), in which the object generation unit determines positions of the N new objects based on position information included in the data of the plurality of non-pass-through objects.
- (5) The information processing device according to (4), in which the object generation unit determines the positions of the N new objects by a k-means method based on the position information.
- (6) The information processing device according to (3), in which the positions of the N new objects are predetermined positions.
- (7) The information processing device according to any one of (3) to (6), in which the data is an object signal and metadata of the object.
- (8) The information processing device according to (7), in which the object is an audio object.
- (9) The information processing device according to (8), in which the object generation unit performs VBAP as the rendering processing.
- (10) The information processing device according to any one of (1) to (9), in which the pass-through object selection unit selects the M pass-through objects based on priority information of the L objects.
- (11) The information processing device according to any one of (1) to (9), in which the pass-through object selection unit selects the M pass-through objects based on a degree of concentration of the L objects in a space.
- (12) The information processing device according to any one of (1) to (11), in which the number M of the pass-through objects is a specified number.
- (13) The information processing device according to any one of (1) to (11), in which the pass-through object selection unit determines the number M of the pass-through objects based on a total data size of the data of the pass-through objects and the data of the new objects.
- (14) The information processing device according to any one of (1) to (11), in which the pass-through object selection unit determines the number M of the pass-through objects based on a calculation amount of processing at the time of decoding the data of the pass-through objects and the data of the new objects.
- (15) An information processing method in which an information processing device acquires data of L objects; selects, from among the L objects, M pass-through objects whose data is output as-is; and generates data of N new objects, fewer than (L-M), based on the data of a plurality of non-pass-through objects, among the L objects, that are not the pass-through objects.
- (16) A program that causes a computer to execute processing including the steps of: acquiring data of L objects; selecting, from among the L objects, M pass-through objects whose data is output as-is; and generating data of N new objects, fewer than (L-M), based on the data of a plurality of non-pass-through objects, among the L objects, that are not the pass-through objects.
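Configuration (5) above determines the positions of the N new objects by a k-means method over the non-pass-through objects' position information. A minimal pure-Python sketch, simplified to clustering azimuth angles only (the function name and the data are illustrative, not from the patent):

```python
import random

def kmeans_1d(positions, n_clusters, iters=20, seed=0):
    """Plain k-means on scalar positions; returns sorted cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(positions, n_clusters)
    for _ in range(iters):
        # Assign each position to its nearest center
        clusters = [[] for _ in range(n_clusters)]
        for p in positions:
            nearest = min(range(n_clusters), key=lambda k: abs(p - centers[k]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep old center if empty)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Six non-pass-through object azimuths clustered into N = 2 new object positions
azimuths = [-32.0, -30.0, -28.0, 28.0, 30.0, 32.0]
new_positions = kmeans_1d(azimuths, 2)
```

In an actual implementation the clustering would run over full 3D position information (azimuth, elevation, radius) rather than a single coordinate.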
- 11 pre-rendering processing device, 21 priority calculation unit, 22 pass-through object selection unit, 23 object generation unit
Abstract
Description
<About the Present Technology>
The present technology classifies a plurality of objects into pass-through objects and non-pass-through objects, and generates new objects based on the non-pass-through objects, so that the total number of objects can be reduced while suppressing the effect on sound quality.
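The split into pass-through and non-pass-through objects can be sketched in a few lines. The sketch below is illustrative only (`split_objects` and the priority values are hypothetical, not from the patent): keep the M highest-priority objects as pass-through and hand the rest to the object generation step.

```python
def split_objects(priorities, m):
    """priorities: per-object priority values; returns (pass_idx, non_pass_idx)."""
    order = sorted(range(len(priorities)),
                   key=lambda i: priorities[i], reverse=True)
    pass_idx = sorted(order[:m])        # M pass-through objects, output as-is
    non_pass_idx = sorted(order[m:])    # candidates for mixing into new objects
    return pass_idx, non_pass_idx

# L = 5 objects, M = 2 pass-through objects
pass_idx, non_pass_idx = split_objects([0.9, 0.2, 0.7, 0.1, 0.4], 2)
```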
Next, a pre-rendering processing device to which the present technology described above is applied will be described. Such a pre-rendering processing device is configured, for example, as shown in FIG. 2.
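Configuration (9) names VBAP as the rendering process used by the object generation unit when mixing non-pass-through objects toward new positions. A simplified, hypothetical two-speaker 2D VBAP gain computation follows; the patent does not give this code, and real implementations pan over 3D speaker triplets rather than a single 2D pair:

```python
import math

def vbap_2d_pair(source_az_deg, spk1_az_deg, spk2_az_deg):
    """Solve the 2x2 system L g = p for one speaker pair, then power-normalize."""
    def unit(az_deg):
        a = math.radians(az_deg)
        return math.cos(a), math.sin(a)
    px, py = unit(source_az_deg)
    x1, y1 = unit(spk1_az_deg)
    x2, y2 = unit(spk2_az_deg)
    det = x1 * y2 - x2 * y1            # invertible when speakers are not collinear
    g1 = (px * y2 - py * x2) / det     # Cramer's rule for the two gains
    g2 = (py * x1 - px * y1) / det
    norm = math.hypot(g1, g2)          # enforce g1**2 + g2**2 == 1
    return g1 / norm, g2 / norm

# A source midway between speakers at +30 and -30 degrees gets equal gains
g1, g2 = vbap_2d_pair(0.0, 30.0, -30.0)
```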
Next, the operation of the pre-rendering processing device 11 will be described. That is, the object output processing by the pre-rendering processing device 11 will be described below with reference to the flowchart of FIG. 3.
<Configuration Example of Encoding Device>
The present technology described above can be applied to an encoding device having a 3D Audio encoding unit that encodes 3D Audio. Such an encoding device is configured, for example, as shown in FIG. 4.
<Configuration Example of Encoding Device>
When the present technology is applied to an encoding device, a pre-rendering processing flag indicating whether an object is a pass-through object or a newly generated object may also be included in the 3D Audio code string.
A decoding device that receives, as its input, the 3D Audio code string including the pre-rendering processing flag output from the encoding device 91 and decodes it is configured, for example, as shown in FIG. 6.
The series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, a program forming the software is installed in a computer. Here, the computer includes a computer incorporated in dedicated hardware and, for example, a general-purpose personal computer capable of executing various functions when various programs are installed.
A pass-through object selection unit that acquires data of L objects and selects, from among the L objects, M pass-through objects whose data is output as-is; and
an object generation unit that generates data of N new objects, fewer than (L-M), based on the data of a plurality of non-pass-through objects, among the L objects, that are not the pass-through objects;
an information processing device including the above.
(2)
The information processing device according to (1), in which the object generation unit generates the data of the new objects based on the data of the (L-M) non-pass-through objects.
(3)
The information processing device according to (1) or (2), in which the object generation unit generates, by rendering processing, the data of the N new objects arranged at mutually different positions based on the data of the plurality of non-pass-through objects.
(4)
The information processing device according to (3), in which the object generation unit determines positions of the N new objects based on position information included in the data of the plurality of non-pass-through objects.
(5)
The information processing device according to (4), in which the object generation unit determines the positions of the N new objects by a k-means method based on the position information.
(6)
The information processing device according to (3), in which the positions of the N new objects are predetermined positions.
(7)
The information processing device according to any one of (3) to (6), in which the data is an object signal and metadata of the object.
(8)
The information processing device according to (7), in which the object is an audio object.
(9)
The information processing device according to (8), in which the object generation unit performs VBAP as the rendering processing.
(10)
The information processing device according to any one of (1) to (9), in which the pass-through object selection unit selects the M pass-through objects based on priority information of the L objects.
(11)
The information processing device according to any one of (1) to (9), in which the pass-through object selection unit selects the M pass-through objects based on a degree of concentration of the L objects in a space.
(12)
The information processing device according to any one of (1) to (11), in which the number M of the pass-through objects is a specified number.
(13)
The information processing device according to any one of (1) to (11), in which the pass-through object selection unit determines the number M of the pass-through objects based on a total data size of the data of the pass-through objects and the data of the new objects.
(14)
The information processing device according to any one of (1) to (11), in which the pass-through object selection unit determines the number M of the pass-through objects based on a calculation amount of processing at the time of decoding the data of the pass-through objects and the data of the new objects.
(15)
An information processing method in which an information processing device:
acquires data of L objects;
selects, from among the L objects, M pass-through objects whose data is output as-is; and
generates data of N new objects, fewer than (L-M), based on the data of a plurality of non-pass-through objects, among the L objects, that are not the pass-through objects.
(16)
A program that causes a computer to execute processing including the steps of:
acquiring data of L objects;
selecting, from among the L objects, M pass-through objects whose data is output as-is; and
generating data of N new objects, fewer than (L-M), based on the data of a plurality of non-pass-through objects, among the L objects, that are not the pass-through objects.
Claims (16)
- An information processing device including: a pass-through object selection unit that acquires data of L objects and selects, from among the L objects, M pass-through objects whose data is output as-is; and an object generation unit that generates data of N new objects, fewer than (L-M), based on the data of a plurality of non-pass-through objects, among the L objects, that are not the pass-through objects.
- The information processing device according to claim 1, in which the object generation unit generates the data of the new objects based on the data of the (L-M) non-pass-through objects.
- The information processing device according to claim 1, in which the object generation unit generates, by rendering processing, the data of the N new objects arranged at mutually different positions based on the data of the plurality of non-pass-through objects.
- The information processing device according to claim 3, in which the object generation unit determines positions of the N new objects based on position information included in the data of the plurality of non-pass-through objects.
- The information processing device according to claim 4, in which the object generation unit determines the positions of the N new objects by a k-means method based on the position information.
- The information processing device according to claim 3, in which the positions of the N new objects are predetermined positions.
- The information processing device according to claim 3, in which the data is an object signal and metadata of the object.
- The information processing device according to claim 7, in which the object is an audio object.
- The information processing device according to claim 8, in which the object generation unit performs VBAP as the rendering processing.
- The information processing device according to claim 1, in which the pass-through object selection unit selects the M pass-through objects based on priority information of the L objects.
- The information processing device according to claim 1, in which the pass-through object selection unit selects the M pass-through objects based on a degree of concentration of the L objects in a space.
- The information processing device according to claim 1, in which the number M of the pass-through objects is a specified number.
- The information processing device according to claim 1, in which the pass-through object selection unit determines the number M of the pass-through objects based on a total data size of the data of the pass-through objects and the data of the new objects.
- The information processing device according to claim 1, in which the pass-through object selection unit determines the number M of the pass-through objects based on a calculation amount of processing at the time of decoding the data of the pass-through objects and the data of the new objects.
- An information processing method in which an information processing device acquires data of L objects; selects, from among the L objects, M pass-through objects whose data is output as-is; and generates data of N new objects, fewer than (L-M), based on the data of a plurality of non-pass-through objects, among the L objects, that are not the pass-through objects.
- A program that causes a computer to execute processing including the steps of: acquiring data of L objects; selecting, from among the L objects, M pass-through objects whose data is output as-is; and generating data of N new objects, fewer than (L-M), based on the data of a plurality of non-pass-through objects, among the L objects, that are not the pass-through objects.
Priority Applications (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/293,904 US20220020381A1 (en) | 2018-11-20 | 2019-11-06 | Information processing device and method, and program |
BR112021009306-0A BR112021009306A2 (pt) | 2018-11-20 | 2019-11-06 | dispositivo e método de processamento de informações, e, programa. |
EP19886482.9A EP3886089A4 (en) | 2018-11-20 | 2019-11-06 | Information processing device and method, and program |
CN201980075019.3A CN113016032A (zh) | 2018-11-20 | 2019-11-06 | 信息处理装置和方法以及程序 |
KR1020217013161A KR20210092728A (ko) | 2018-11-20 | 2019-11-06 | 정보 처리 장치 및 방법, 그리고 프로그램 |
JP2020558243A JP7468359B2 (ja) | 2018-11-20 | 2019-11-06 | 情報処理装置および方法、並びにプログラム |
JP2024047716A JP2024079768A (ja) | 2018-11-20 | 2024-03-25 | 情報処理装置および方法、プログラム、並びに情報処理システム |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2018-217180 | 2018-11-20 | ||
JP2018217180 | 2018-11-20 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020105423A1 true WO2020105423A1 (ja) | 2020-05-28 |
Family
ID=70773982
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2019/043360 WO2020105423A1 (ja) | 2018-11-20 | 2019-11-06 | 情報処理装置および方法、並びにプログラム |
Country Status (6)
Country | Link |
---|---|
US (1) | US20220020381A1 (ja) |
JP (2) | JP7468359B2 (ja) |
KR (1) | KR20210092728A (ja) |
CN (1) | CN113016032A (ja) |
BR (1) | BR112021009306A2 (ja) |
WO (1) | WO2020105423A1 (ja) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20240042125A (ko) * | 2017-04-26 | 2024-04-01 | 소니그룹주식회사 | 신호 처리 장치 및 방법, 및 프로그램 |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120230497A1 (en) * | 2011-03-09 | 2012-09-13 | Srs Labs, Inc. | System for dynamically creating and rendering audio objects |
WO2015056383A1 (ja) * | 2013-10-17 | 2015-04-23 | パナソニック株式会社 | オーディオエンコード装置及びオーディオデコード装置 |
JP2016522911A (ja) * | 2013-05-24 | 2016-08-04 | ドルビー・インターナショナル・アーベー | オーディオ・オブジェクトを含むオーディオ・シーンの効率的な符号化 |
JP2016525699A (ja) * | 2013-05-24 | 2016-08-25 | ドルビー・インターナショナル・アーベー | オーディオ・オブジェクトを含むオーディオ・シーンの効率的な符号化 |
WO2018047667A1 (ja) * | 2016-09-12 | 2018-03-15 | ソニー株式会社 | 音声処理装置および方法 |
JP2018510532A (ja) * | 2015-02-06 | 2018-04-12 | ドルビー ラボラトリーズ ライセンシング コーポレイション | 適応オーディオ・コンテンツのためのハイブリッドの優先度に基づくレンダリング・システムおよび方法 |
WO2018198789A1 (ja) | 2017-04-26 | 2018-11-01 | ソニー株式会社 | 信号処理装置および方法、並びにプログラム |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5883976A (en) * | 1994-12-28 | 1999-03-16 | Canon Kabushiki Kaisha | Selectively utilizing multiple encoding methods |
JP2004093771A (ja) * | 2002-08-30 | 2004-03-25 | Sony Corp | 情報処理方法および情報処理装置、記録媒体、並びにプログラム |
CN101542597B (zh) * | 2007-02-14 | 2013-02-27 | Lg电子株式会社 | 用于编码和解码基于对象的音频信号的方法和装置 |
US9479886B2 (en) * | 2012-07-20 | 2016-10-25 | Qualcomm Incorporated | Scalable downmix design with feedback for object-based surround codec |
EP2830045A1 (en) * | 2013-07-22 | 2015-01-28 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Concept for audio encoding and decoding for audio channels and audio objects |
CN106162500B (zh) * | 2015-04-08 | 2020-06-16 | 杜比实验室特许公司 | 音频内容的呈现 |
EP3345409B1 (en) * | 2015-08-31 | 2021-11-17 | Dolby International AB | Method for frame-wise combined decoding and rendering of a compressed hoa signal and apparatus for frame-wise combined decoding and rendering of a compressed hoa signal |
US9913061B1 (en) * | 2016-08-29 | 2018-03-06 | The Directv Group, Inc. | Methods and systems for rendering binaural audio content |
-
2019
- 2019-11-06 JP JP2020558243A patent/JP7468359B2/ja active Active
- 2019-11-06 KR KR1020217013161A patent/KR20210092728A/ko unknown
- 2019-11-06 WO PCT/JP2019/043360 patent/WO2020105423A1/ja unknown
- 2019-11-06 CN CN201980075019.3A patent/CN113016032A/zh active Pending
- 2019-11-06 BR BR112021009306-0A patent/BR112021009306A2/pt unknown
- 2019-11-06 US US17/293,904 patent/US20220020381A1/en active Pending
-
2024
- 2024-03-25 JP JP2024047716A patent/JP2024079768A/ja active Pending
Also Published As
Publication number | Publication date |
---|---|
JP7468359B2 (ja) | 2024-04-16 |
US20220020381A1 (en) | 2022-01-20 |
KR20210092728A (ko) | 2021-07-26 |
EP3886089A1 (en) | 2021-09-29 |
CN113016032A (zh) | 2021-06-22 |
BR112021009306A2 (pt) | 2021-08-10 |
JP2024079768A (ja) | 2024-06-11 |
JPWO2020105423A1 (ja) | 2021-10-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6012884B2 (ja) | 知覚的基準に基づいてオブジェクト・ベースのオーディオ・コンテンツをレンダリングするためのオブジェクト・クラスタリング | |
US9712939B2 (en) | Panning of audio objects to arbitrary speaker layouts | |
EP3145220A1 (en) | Rendering virtual audio sources using loudspeaker map deformation | |
US20200252739A1 (en) | Apparatus, method or computer program for rendering sound scenes defined by spatial audio content to a user | |
CN110537220B (zh) | 信号处理设备和方法及程序 | |
JP2024079768A (ja) | 情報処理装置および方法、プログラム、並びに情報処理システム | |
EP3622730B1 (en) | Spatializing audio data based on analysis of incoming audio data | |
JP2016534411A (ja) | マルチチャネル・オーディオのチャネルの選択的透かし入れ | |
WO2022014326A1 (ja) | 信号処理装置および方法、並びにプログラム | |
JP2004144912A (ja) | 音声情報変換方法、音声情報変換プログラム、および音声情報変換装置 | |
US20060012831A1 (en) | Electronic watermarking method and storage medium for storing electronic watermarking program | |
US20230360665A1 (en) | Method and apparatus for processing audio for scene classification | |
KR102677399B1 (ko) | 신호 처리 장치 및 방법, 그리고 프로그램 | |
US11386913B2 (en) | Audio object classification based on location metadata | |
US11962989B2 (en) | Multi-stage processing of audio signals to facilitate rendering of 3D audio via a plurality of playback devices | |
WO2021014933A1 (ja) | 信号処理装置および方法、並びにプログラム | |
KR20210066807A (ko) | 정보 처리 장치 및 방법, 그리고 프로그램 | |
CN118140492A (zh) | 信息处理装置、方法和程序 | |
KR20230157225A (ko) | 장면 분류를 위한 오디오 처리 방법 및 장치 | |
KR20230153226A (ko) | 다채널 오디오 신호 처리 장치 및 방법 | |
JP2023514121A (ja) | ビデオ情報に基づく空間オーディオ拡張 | |
WO2019027812A1 (en) | CLASSIFICATION OF AUDIO OBJECT BASED ON LOCATION METADATA |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19886482 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2020558243 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
REG | Reference to national code |
Ref country code: BR Ref legal event code: B01A Ref document number: 112021009306 Country of ref document: BR |
|
ENP | Entry into the national phase |
Ref document number: 2019886482 Country of ref document: EP Effective date: 20210621 |
|
ENP | Entry into the national phase |
Ref document number: 112021009306 Country of ref document: BR Kind code of ref document: A2 Effective date: 20210513 |