WO2023074800A1 - Information processing device and method, and program - Google Patents
Information processing device and method, and program
- Publication number
- WO2023074800A1 (PCT/JP2022/040170)
- Authority
- WO
- WIPO (PCT)
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
Definitions
- the present technology relates to an information processing device, method, and program, and more particularly to an information processing device, method, and program capable of reducing the transmission amount of directional data.
- There is known a technique in which directivity data representing the directivity of sound from an object is transmitted together with the audio data of the object, a directivity direction is arbitrarily selected during recording, and during playback the sound is reproduced using a desired directivity direction other than the one used at the time of recording (see, for example, Patent Document 1).
- Since the directional characteristics differ for each sound source, when providing audio data of an object and directivity data of the object as content, the directivity data needs to be prepared for each type of sound source, that is, for each type of object.
- If information about directivity is provided for more directions and more frequencies, the amount of directivity data increases.
- As a result, the amount of directivity data transmitted to the content distribution destination increases, which may cause transmission delays or require a higher transmission rate.
- This technology has been developed in view of such circumstances, and is capable of reducing the amount of directional data transmitted.
- An information processing device includes an acquisition unit that acquires model data obtained by modeling directivity data representing the directivity of a sound source, and a calculation unit that calculates the directivity data based on the model data.
- An information processing method or program includes the steps of acquiring model data obtained by modeling directivity data representing the directivity of a sound source, and calculating the directivity data based on the model data.
- That is, model data obtained by modeling directivity data representing the directivity of a sound source is acquired, and the directivity data is calculated based on the model data.
- An information processing apparatus includes a modeling unit that models directivity data representing the directivity of a sound source using a mixture model composed of a plurality of distributions, and a model data generation unit that generates model data including the model parameters that constitute the mixture model obtained by the modeling.
- An information processing method or program includes the steps of modeling directivity data representing the directivity of a sound source using a mixture model composed of a plurality of distributions, and generating model data including the model parameters that constitute the mixture model obtained by the modeling.
- That is, directivity data representing the directivity of a sound source is modeled by a mixture model consisting of a plurality of distributions, and model data containing the model parameters that constitute the mixture model obtained by the modeling is generated.
- A diagram showing an example of directivity.
- A diagram explaining data points.
- A diagram showing an example of model data.
- A diagram explaining the relationship between bands and bins.
- A diagram showing an example of reducing the data amount of directivity data.
- A diagram explaining the residual of directivity data.
- A diagram showing a configuration example of a server.
- A flowchart explaining encoding processing.
- A diagram showing a configuration example of an information processing apparatus.
- A flowchart explaining directivity data generation processing.
- A flowchart explaining output audio data generation processing.
- A diagram explaining the appearance probability of difference information.
- A diagram showing an example of model data.
- A diagram showing an example of model data.
- A diagram explaining transmission of a Huffman coding table.
- A diagram showing an example of a Huffman coding table.
- A diagram showing a configuration example of a server.
- A flowchart explaining directivity data generation processing.
- A diagram showing a configuration example of a directivity data encoding unit.
- A diagram showing a configuration example of a differential encoding unit.
- A flowchart explaining model data generation processing.
- A diagram showing a configuration example of a distribution model decoding unit.
- A diagram showing an example of model data.
- A diagram showing an example arrangement of data points.
- A diagram showing a description example of data points.
- A diagram showing an example of scale factors for each bin.
- A diagram showing an example of the minimum value of each bin.
- A diagram showing an example of model data.
- A diagram showing an example of the syntax of SymmetricDir().
- A diagram explaining a rotation operation.
- A diagram explaining a symmetry operation.
- A diagram showing an example of the syntax of NonSymmetricDir().
- A diagram showing an example of model data.
- A diagram showing an example of weights used to calculate the output value of the mixture model for each bin.
- A diagram showing an example of model data.
- A diagram showing an example of the syntax of NonSymmetricDir().
- A diagram showing an example of the syntax of LeftRightLineSymmetricDir().
- A diagram explaining distribution according to weight.
- A diagram showing a configuration example of a computer.
- the present technology is intended to reduce the transmission amount of directional data by modeling the directional data.
- 3D sound source audio data and directivity data are provided as content.
- the sound of one or more audio objects is picked up (recorded) as a 3D sound source, and audio data of each object is generated.
- Directivity data representing the directivity (directional characteristics) of an object (sound source) is prepared for each type of object, that is, for each sound source type.
- audio data for each object and directivity data for each sound source type are provided as content data. That is, the directivity data is transmitted to the reproduction side device together with the audio data of the object. Then, on the reproduction side, audio reproduction is performed in consideration of the directivity data based on the audio data and the directivity data forming the content.
- Directivity data can be obtained, for example, by recording the sound of an object with multiple microphones.
- the recording of directivity data may be performed at the same time as the recording of the audio data of the object, or may be performed at a timing different from the recording of the audio data of the object.
- Directivity data is prepared for each sound source type, such as voice, musical instrument, and speaker.
- Directivity data is data containing information on the amplitude and phase of the sound from the sound source, for each target frequency in the entire frequency band from DC to the Nyquist frequency, for example, and for each position in each direction viewed from the sound source.
- the direction seen from the sound source is represented by the horizontal angle seen from the sound source position, that is, the azimuth angle, and the vertical angle seen from the sound source position, that is, the elevation angle.
- the range of azimuth angles is set to 0 degrees to 360 degrees
- the range of elevation angles is set to -90 degrees to +90 degrees.
- the directivity data to be modeled is obtained by appropriately discretizing and normalizing the directivity data obtained by recording or the like.
- The directivity data to be modeled consists of gains (hereinafter referred to as directivity gains) that indicate the directivity characteristics of the sound source at each of a plurality of discrete frequencies, at each of a plurality of data points.
- the position of a data point is represented by coordinates (polar coordinates) in a polar coordinate system with the sound source position as the origin.
- the distance (radius) from the sound source position may be used to represent the position of the data point.
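As an illustrative sketch (not part of the patent text; the function name is hypothetical), a data-point position given in this polar coordinate system can be converted to Cartesian coordinates as follows:

```python
import math

def data_point_to_cartesian(azimuth_deg, elevation_deg, distance=1.0):
    """Hypothetical helper: convert a data-point position given as
    (azimuth, elevation, radius) in a polar coordinate system centred
    on the sound source into Cartesian (x, y, z) coordinates.
    Azimuth 0-360 degrees, elevation -90 to +90 degrees, as in the text."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (distance * math.cos(el) * math.cos(az),
            distance * math.cos(el) * math.sin(az),
            distance * math.sin(el))
```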
- the directional gain can also be obtained by normalizing the sound amplitude (sound pressure) from the sound source at the data points.
- In the present technology, the directivity data is modeled using the vMF (von Mises-Fisher) distribution on the sphere, the Kent distribution, or both, which are the spherical counterparts of the univariate/multivariate Gaussian distributions defined on the plane.
- the part indicated by arrow Q11 in Fig. 1 shows a two-dimensional Gaussian distribution.
- Curve L13 indicates a mixed Gaussian distribution obtained by mixing the Gaussian distribution indicated by curve L11 and the Gaussian distribution indicated by curve L12.
- the portion indicated by the arrow Q12 in FIG. 1 shows three distributions on the plane. Multiple distributions on such a plane can also be mixed.
- For example, a mixed Gaussian distribution can be used to express a probability density function (pdf) on a plane. The amount of information can be reduced by expressing the desired pdf with a small number of model parameters and with as few mixture components as possible.
- a mixed model of the vMF distribution and Kent distribution which correspond to the Gaussian distribution defined on the spherical surface, is used to model the directivity data on the spherical surface, that is, the shape (distribution) of the directivity gain.
- a mixture model may be composed of one or more vMF distributions, may be composed of one or more Kent distributions, or may be composed of one or more vMF distributions and one or more Kent distributions. That is, the mixture model is composed of one or more distributions including at least one of the vMF distribution and the Kent distribution.
- Let x be a position vector indicating a position on the spherical surface, i.e., coordinates in a Cartesian coordinate system. For the Kent distribution, the distribution value f(x) can be expressed by the following equation (1):

f(x) = c(κ, β)⁻¹ exp{κ γ1·x + β[(γ2·x)² − (γ3·x)²]} … (1)

- Here, κ indicates the concentration parameter and β indicates the ellipticity. Also, γ1 indicates a vector that defines the center (mean direction) of the distribution, γ2 indicates the major axis vector, and γ3 indicates the minor axis vector.
- c(κ, β) is the normalization constant shown in the following equation (2):

c(κ, β) = 2π Σ_{j=0}^{∞} [Γ(j + 1/2) / Γ(j + 1)] β^{2j} (κ/2)^{−2j−1/2} I_{2j+1/2}(κ) … (2)

- Here, Γ denotes the gamma function and I denotes the modified Bessel function of the first kind.
- The value of the vMF distribution at the position indicated by the position vector x can also be expressed by a formula similar to equation (1), obtained by setting the ellipticity β in equation (1) to zero.
- Fig. 2 shows examples of vMF distribution and Kent distribution.
- In Fig. 2, an example of the vMF distribution is shown in the portion indicated by arrow Q21.
- Vector V11 represents the vector γ1 shown in equation (1).
- The vMF distribution does not have the ellipticity β, major axis vector γ2, or minor axis vector γ3 as parameters; it is an isotropic circular distribution spreading out around the position indicated by vector V11 (vector γ1) on the spherical surface. That is, a circular distribution can be reproduced by using the vMF distribution (vMF distribution model).
- Vectors V21 to V23 represent the vector γ1, major axis vector γ2, and minor axis vector γ3 shown in equation (1).
- The Kent distribution is an elliptical distribution on the spherical surface, centered at the position indicated by vector V21 (vector γ1), with its major and minor axes given by γ2 and γ3. That is, by using the Kent distribution (Kent distribution model), it is possible to reproduce an elliptical distribution determined by the ellipticity β, the major axis vector γ2, and the minor axis vector γ3.
- the Kent distribution has a high degree of freedom because the shape of the ellipse can be changed by parameters such as the ellipticity ⁇ , but the number of parameters is greater than the vMF distribution.
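To make the circular special case concrete, the following sketch (our own illustration, not code from the patent; the function name is hypothetical) evaluates the vMF density on the unit sphere. For three dimensions the normalization constant reduces to κ / (4π sinh κ):

```python
import math

def vmf_pdf(x, gamma1, kappa):
    """von Mises-Fisher density on the unit sphere S^2, the isotropic
    (beta = 0) special case of the Kent distribution described above.
    x and gamma1 are unit 3-vectors; kappa > 0 is the concentration."""
    dot = sum(a * b for a, b in zip(x, gamma1))
    c = kappa / (4.0 * math.pi * math.sinh(kappa))  # normalisation for p = 3
    return c * math.exp(kappa * dot)
```

The density peaks at x = γ1 and decays isotropically with the angle from γ1, which is the circular spread described above.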
- The output value F(x; Θ) of the mixture model can be expressed by the following equation (3). That is, the mixture model F(x; Θ) can be represented by the weighted sum of N Kent distributions f(x; θi):

F(x; Θ) = Σ_{i=1}^{N} πi f(x; θi) … (3)

- The Kent distribution f(x; θi) in equation (3) is the same as that shown in equation (1) above, and represents the i-th of the N Kent distributions to be mixed.
- θi is a parameter constituting the Kent distribution f(x; θi), more specifically a set of parameters consisting of the concentration parameter κ, the ellipticity β, the vector γ1, the major axis vector γ2, and the minor axis vector γ3.
- The parameter Θ of the mixture model F(x; Θ) represents the set of parameters θi of the N Kent distributions f(x; θi).
- πi represents the weight (weighting coefficient) of the i-th Kent distribution f(x; θi) when mixing the N Kent distributions, and satisfies the following equation (4):

Σ_{i=1}^{N} πi = 1 … (4)

- That is, the sum of the weights πi of the N Kent distributions f(x; θi) is one.
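A minimal sketch of the mixture evaluation (hypothetical function names, not from the patent): the mixture output is the weighted sum of the component densities, with weights that sum to one:

```python
def mixture_output(x, weights, pdfs):
    """Evaluate F(x; Theta) = sum_i pi_i * f(x; theta_i) (equation (3)).
    weights are the mixing coefficients pi_i, assumed to sum to one
    (equation (4)); pdfs are callables for the component distributions."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to one"
    return sum(w * f(x) for w, f in zip(weights, pdfs))
```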
- the directional data used in this technology can be obtained by recording (sound pickup) with a microphone array consisting of multiple microphones placed around the object.
- the directivity shown in Fig. 3 is observed.
- the left side of the drawing shows the directivity of each frequency on the horizontal plane, that is, the plane where the elevation angle is 0 degrees
- the right side of the drawing shows the directivity of each frequency on the median plane.
- It can be seen that the overall shape of the directivity changes depending on the frequency: the directivity is weak at low frequencies and becomes stronger (sharper) as the frequency increases. For example, on the horizontal plane at 8000 Hz, there is a maximum sound pressure difference of about 25 dB depending on the direction.
- a plurality of data points are provided on a spherical surface centered on the sound source position.
- one dot represents one data point, and it can be seen that there are many data points over the spherical surface.
- The amount of directivity data transmitted increases, and an increase in the amount of transmitted directivity data causes transmission delays and raises the transmission rate. Therefore, in some cases it may not be possible to reproduce the directivity according to the sound source type, frequency, object-to-listener orientation, and the like.
- the amount of transmission of directional data can be reduced by modeling directional data using a mixed model as described above.
- In the present technology, when transmitting directivity data, the directivity data is modeled by a mixture model consisting of vMF distributions and Kent distributions, and model data representing the result is generated. The model data is then transmitted to the apparatus on the content reproduction side. This eliminates the need to transmit the original directivity data, which has a large data size. In other words, it is possible to reduce the data amount (transmission amount) when transmitting directivity data.
- Fig. 5 shows an example of model data for one sound source type specified by num_sound_types_id.
- model data for one sound source type is described as directivityConfig.
- The model data contains, for the number of data points indicated by "num_point_indices", the azimuth "azimuth_table[i]", the elevation "elevation_table[i]", and the radius "distance[i]" of each data point in the original directivity data before modeling.
- That is, the position of a data point is expressed, in a polar coordinate system with the sound source position as the origin, by the azimuth "azimuth_table[i]" (the horizontal angle of the data point seen from the sound source position), the elevation "elevation_table[i]" (the vertical angle of the data point seen from the sound source position), and the radius "distance[i]" (the distance from the sound source position to the data point).
- the model data includes the number of frequency points "bin_count” and the frequency "freq[i_bin]".
- The entire target frequency band is divided into as many frequency bins (bins) as indicated by the number of frequency points "bin_count", and the center frequency (Hz) of the i-th bin is set as the frequency "freq[i_bin]".
- the original directional data before modeling contains directional gains for each of one or more bins (frequency bins) at each of the plurality of data points.
- The model data also includes parameters related to the Kent and vMF distributions: the number of bands to be modeled "band_count", the number of mixtures in each band "mix_count[i_band]", and the bin information "bin_range_per_band[i_band]" of the original directivity data.
- The entire target frequency band is divided into as many bands as indicated by the number of bands "band_count", and the distribution of the directivity gain of each band is represented by a mixture model.
- model parameters are estimated that constitute a mixture model representing the distribution of directional gain in each band.
- the frequency band indicated by each band always includes (belongs to) the frequency indicated by one or more bins, that is, the center frequency "freq[i_bin]" of the bins.
- The number of mixtures "mix_count[i_band]" indicates the number of distributions (Kent distributions and vMF distributions) constituting the mixture model representing the distribution of the directivity gain of the i-th band, and corresponds to N in equation (3).
- the bin information "bin_range_per_band[i_band]" of the directivity data is information indicating the bin of the original directivity data before modeling, which is included in the i-th band.
- the bin information is index information indicating the highest frequency bin belonging to the i-th band.
- The model data also includes the above-mentioned weight πi, concentration parameter κ, and vector γ1.
- "weight[i_band][i_mix]" and "kappa[i_band][i_mix]" are the weight πi and the concentration parameter κ of the distribution indicated by "i_mix" for the i-th band indicated by "i_band".
- "gamma1[i_band][i_mix][x]" and "gamma1[i_band][i_mix][y]" are the X component (X coordinate) and the Y component (Y coordinate) of the vector γ1 of the distribution indicated by "i_mix" for the i-th band "i_band".
- the model data includes a selection flag "dist_flag” indicating whether the distribution indicated by "i_mix” for the i-th band "i_band” that constitutes the mixture model is the Kent distribution or the vMF distribution.
- the value "1" of the selection flag "dist_flag” indicates that the distribution is the Kent distribution, and the value “0" of the selection flag “dist_flag” indicates that the distribution is the vMF distribution.
- the model data includes the above-described ellipticity ⁇ , major axis vector ⁇ 2 , and minor axis vector ⁇ 3 .
- "beta[i_band][i_mix]" indicates the ellipticity β of the distribution (Kent distribution) indicated by "i_mix" for the i-th band indicated by "i_band". "gamma2[i_band][i_mix][x]" and "gamma2[i_band][i_mix][y]" are the X component (X coordinate) and the Y component (Y coordinate) of the major axis vector γ2 of that distribution, and "gamma3[i_band][i_mix][x]" and "gamma3[i_band][i_mix][y]" are the X component (X coordinate) and the Y component (Y coordinate) of the minor axis vector γ3.
- The model data also contains, for each bin, the scale factor "scale_factor[i_bin]", which indicates the dynamic range of the directivity gain, and the offset value of the directivity data (directivity gain), that is, the minimum value "offset[i_bin]".
- a set of parameters is also called a model parameter.
- The model data also includes difference information "diff_data[i_point]" indicating the difference between the original directivity data value (directivity gain) at each data point and the directivity data value (directivity gain) indicated by the mixture model obtained by the modeling.
- the difference information is information indicating the difference between the directivity data before modeling and the directivity data after modeling at the data point.
- "diff_data[i_point]" stored in the model data may be Huffman-encoded difference information.
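As a rough illustration of how such a code table could be built (our own sketch using Python's standard library; the patent does not specify the construction), residual values that occur often, typically those near zero, receive the shortest codes:

```python
import heapq
from collections import Counter

def huffman_table(symbols):
    """Build a Huffman code table (symbol -> bit string) from a sample
    of residual values. Requires at least two distinct symbols."""
    heap = [[count, i, {s: ""}] for i, (s, count) in
            enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)   # least frequent subtree
        hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], next_id, merged])
        next_id += 1
    return heap[0][2]
```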
- The output value F(x; Θ) of the mixture model at each data point is calculated based on the model data having the configuration (format) described above.
- Each bin of the original directivity data before modeling belongs to one of the bands, whose number "band_count" is determined in the modeling by considering the similarity of the shape of the directivity data.
- The bin information "bin_range_per_band[i_band]" is the maximum index, that is, index information indicating the highest frequency bin belonging to the band.
- the number of bins belonging to each band may be different for each band.
- For example, two bins, bin 0 (bin0) and bin 1, belong to the first band 0 (band0), which has the lowest frequency; one bin, bin 2, belongs to the next band 1; and two bins, bin 3 and bin 4, belong to the following band 2.
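Under the assumption that "bin_range_per_band[i_band]" holds the highest bin index of each band in ascending order, the bin-to-band mapping can be recovered as follows (an illustrative sketch, not patent text):

```python
def bin_to_band(bin_range_per_band, bin_count):
    """Map each bin index to the index of the band it belongs to.
    bin_range_per_band[i_band] is assumed to be the highest bin index
    (maximum index) belonging to band i_band, in ascending order."""
    band_of_bin = []
    i_band = 0
    for i_bin in range(bin_count):
        while i_bin > bin_range_per_band[i_band]:
            i_band += 1
        band_of_bin.append(i_band)
    return band_of_bin
```

For the example just described (bins 0 and 1 in band 0, bin 2 in band 1, bins 3 and 4 in band 2), bin_range_per_band would be [1, 2, 4].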
- the mixture model F'(x; ⁇ ) for each band can be obtained from the model parameters.
- The mixture model F'(x; Θ) corresponds to the bin-wise mixture model F(x; Θ) shown in equation (3).
- The directivity data before modeling has a directivity gain value for each bin at each data point. Therefore, the band-wise mixture model F'(x; Θ) obtained from the model parameters, more specifically its output value, must be converted back into the original bin-wise output value F(x; Θ).
- From the band-wise mixture model F'(x; Θ), the scale factor "scale_factor[i_bin]" of each bin, and the minimum value "offset[i_bin]" of each bin, the output value F(x; Θ) of the mixture model for each bin at a data point is calculated as
- F(x; Θ) = F'(x; Θ) × scale_factor[i_bin] + offset[i_bin].
- In other words, the band-wise mixture model output value F'(x; Θ) is corrected for the dynamic range of each bin.
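A small sketch of this per-bin correction (hypothetical names; the formula itself is the one given above):

```python
def restore_bin_gains(f_band, band_of_bin, scale_factor, offset):
    """Restore the per-bin outputs F(x; Theta) at one data point from the
    band-wise mixture outputs F'(x; Theta), using each bin's scale factor
    (dynamic range) and offset (minimum value):
    F = F' * scale_factor[i_bin] + offset[i_bin]."""
    return [f_band[band_of_bin[i_bin]] * scale_factor[i_bin] + offset[i_bin]
            for i_bin in range(len(scale_factor))]
```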
- the original directivity data before modeling is restored from the model data.
- The position of each data point and the frequency of each bin are obtained from the azimuth "azimuth_table[i]", elevation "elevation_table[i]", radius "distance[i]", and frequency "freq[i_bin]" stored in the model data.
- Fig. 7 shows the amount of model data when the directivity data is actually modeled so that the model data has the structure described above.
- the original directional data before modeling has 2522 data points and 29 bins.
- the number of bands "band_count” is set to "3"
- modeling is performed with a mixture model consisting of vMF distributions (which do not use the ellipticity β, major axis vector γ2, or minor axis vector γ3).
- the model data includes difference information as necessary, and the difference information is used to restore the directivity data as appropriate.
- each of the plurality of straight lines drawn on the surface of the sphere represents vector ⁇ 1 described above.
- vector V51 represents one vector ⁇ 1 .
- the value (residual error) at each data point of the residual data indicated by the arrow Q43 is stored in the model data as difference information "diff_data[i_point]".
- Directivity could also be represented using HOA (Higher Order Ambisonics). However, directivity generally has a more complex shape and sharper convexity at high frequencies, and the usefulness of phase information is relatively low at high frequencies. Therefore, when reducing the amount of directivity data, it is more advantageous to adopt modeling with a mixture distribution model as in the present technology rather than using HOA.
- On the other hand, since the shape of the directivity is relatively gentle in the low frequency range, a method that records the phase there, so that physical phenomena such as diffraction and interference can be reproduced, may be used.
- The directivity data (amplitude data) generated (restored) based on the model data has directivity gains only at specific discrete frequency points, that is, at specific bins. In other words, since there are frequencies at which no directivity gain exists, rendering processing may not be possible if the directivity data generated from the model data is used as is.
- Since the data points are also arranged discretely, if the user's viewpoint position (listening position) or the object moves and the positional relationship between the user and the object changes, the data points used for rendering processing also change. In such cases, if the spacing between adjacent data points is large, glitches (waveform discontinuities) will occur.
- directivity gains may be obtained for more frequencies (bins) and directions (data points) by performing interpolation processing in the frequency direction and the time direction on the directivity data.
- As interpolation processing in the frequency direction, for example, first-order or second-order interpolation using the directivity gains of bins at a plurality of frequencies near the target frequency is conceivable.
- Also, bilinear interpolation processing in the azimuth direction and elevation direction using the directivity gain of each bin at a plurality of data points near the target direction (position) is conceivable.
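The two interpolation steps can be sketched as follows (illustrative only; the function names and the choice of first-order interpolation are assumptions):

```python
def lerp(a, b, t):
    """Linear interpolation between a and b with 0 <= t <= 1."""
    return a + (b - a) * t

def interp_frequency(freqs, gains, f):
    """First-order interpolation of the directivity gain between the two
    bin centre frequencies that bracket the target frequency f."""
    for i in range(len(freqs) - 1):
        if freqs[i] <= f <= freqs[i + 1]:
            t = (f - freqs[i]) / (freqs[i + 1] - freqs[i])
            return lerp(gains[i], gains[i + 1], t)
    raise ValueError("f lies outside the modelled frequency range")

def interp_direction(g00, g10, g01, g11, t_az, t_el):
    """Bilinear interpolation between the gains at four neighbouring data
    points: first along azimuth (t_az), then along elevation (t_el)."""
    return lerp(lerp(g00, g10, t_az), lerp(g01, g11, t_az), t_el)
```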
- The amount of computation when modeling directivity data depends on various parameters such as the frame length of the audio data (number of samples per frame), the number of mixtures in the mixture model, the selected model (distribution), and the number of data points, and there is a trade-off between these parameters and their effect on sound quality.
- In addition, interpolation processing in the time direction suppresses the occurrence of waveform discontinuities, so that higher quality audio reproduction can be achieved.
- Content creators can also decide whether to increase the number of data points of the directivity data according to, for example, the shape of the directivity of the sound source (object), or to compensate for a small number of data points by interpolation processing during playback.
- The difference information indicating the error (difference) between the original directivity data to be modeled (encoded) and the mixture model, that is, the directivity data after modeling, may be encoded by an arbitrary encoding method such as Huffman coding and transmitted.
- A flag or the like may be used to switch whether or not to use various types of information such as the difference information, and to switch the method of using the directivity data (rendering method), such as interpolation processing in the frequency direction or in the time direction.
- a flag may be used to switch between low-precision parameters for low-resource reproduction devices and high-precision parameters for high-resource reproduction devices, that is, to switch parameter precision.
- the parameters are switched according to, for example, the resources of the playback device and the network environment at the time of content distribution.
- this technology can also be applied to texture data in video, such as color and transparency information for volumetric point cloud data.
- FIG. 9 is a diagram illustrating a configuration example of a server to which the present technology is applied.
- the server 11 shown in FIG. 9 is an information processing device such as a computer, and distributes content.
- the content consists of audio data of one or more objects (object audio data), and directivity data prepared for each sound source type, representing the directivity of the sound source (object), that is, the directional characteristics.
- Such content can be obtained, for example, by recording directivity data with the sound of a 3D sound source using a microphone array or the like. Also, the content may include video data corresponding to the audio data.
- the server 11 has a modeling unit 21, a model data generation unit 22, an audio data encoding unit 23, and an output unit 24.
- the modeling unit 21 models the input directivity data of each sound source type, and supplies the model parameters and difference information obtained as a result to the model data generation unit 22 .
- the model data generation unit 22 generates model data based on the model parameters and difference information supplied from the modeling unit 21 and supplies the model data to the output unit 24 .
- the audio data encoding unit 23 encodes the input audio data of each object and supplies the resulting encoded audio data to the output unit 24 .
- the output unit 24 multiplexes the model data supplied from the model data generation unit 22 and the encoded audio data supplied from the audio data encoding unit 23 to generate and output an encoded bitstream.
- the model data and the encoded audio data do not necessarily have to be output at the same time; they may be generated separately and output at different timings.
- the model data and the encoded audio data may be generated by different devices.
- in step S11, the modeling unit 21 models the input directivity data of each sound source type, and supplies the model parameters and difference information obtained as a result to the model data generation unit 22.
- for example, the modeling unit 21 models the directivity data by representing it with a mixture model consisting of a plurality of distributions, as shown in equation (3) above.
- the concentration κ, ellipticity β, weight π_i, vector γ_1, major axis vector γ_2, minor axis vector γ_3, scale factor, and minimum value are obtained as model parameters.
- the modeling unit 21 generates information indicating the number of data points, the positions of the data points, the number of frequency points, the center frequency of the bin, etc. as information about the original directivity data before modeling.
- the modeling unit 21 generates, as difference information, the residual (difference) between the modeled directivity data, that is, the directivity data represented by the mixture model, and the original directivity data before modeling.
- the difference information may be generated when a specific condition is satisfied, such as when the residual between the directivity data represented by the mixture model and the original directivity data is greater than or equal to a predetermined value, or when generation of difference information is instructed by the content creator or the like.
- the modeling unit 21 supplies the model parameters obtained in this way, information on the original directivity data before modeling, and difference information to the model data generating unit 22 .
- in step S12, the model data generation unit 22 generates model data by packing the model parameters supplied from the modeling unit 21, the information on the original directivity data before modeling, and the difference information, and supplies the model data to the output unit 24.
- for example, the model data generation unit 22 Huffman encodes the difference information, and packs the resulting encoded difference information (hereinafter also referred to as differential encoded data), the model parameters, and so on, to generate model data in the format shown in FIG. Note that the model parameters and model data may also be encoded.
- in step S13, the audio data encoding unit 23 encodes the input audio data of each object, and supplies the resulting encoded audio data to the output unit 24.
- when there is metadata for the audio data of each object, the audio data encoding unit 23 also encodes the metadata of each object (audio data), and supplies the resulting encoded metadata to the output unit 24.
- the metadata includes object position information indicating the absolute position of the object in the three-dimensional space, object direction information indicating the orientation of the object in the three-dimensional space, sound source type information indicating the type of the object (sound source), and so on.
- in step S14, the output unit 24 multiplexes the model data supplied from the model data generation unit 22 and the encoded audio data supplied from the audio data encoding unit 23 to generate and output an encoded bitstream.
- the output unit 24 generates an encoded bitstream including model data, encoded audio data, and encoded metadata.
- the output unit 24 transmits the encoded bitstream to an information processing device functioning as a client (not shown). Once the encoded bitstream has been transmitted, the encoding process ends.
- in this way, the server 11 models the directivity data and outputs an encoded bitstream containing the model parameters and difference information obtained as a result. By doing so, it is possible to reduce the amount of directivity data transmitted to the client, that is, the transmission data amount. As a result, the occurrence of transmission delays and increases in the transmission rate can be suppressed.
- an information processing apparatus that acquires the encoded bitstream output from the server 11 and generates output audio data for reproducing the sound of the content is configured as shown in FIG. 11, for example.
- the information processing device 51 shown in FIG. 11 is composed of, for example, a personal computer, a smart phone, a tablet, a game device, and the like.
- the information processing device 51 has an acquisition unit 61 , a distribution model decoding unit 62 , an audio data decoding unit 63 , and a rendering processing unit 64 .
- the acquisition unit 61 acquires the encoded bitstream output from the server 11 and extracts model data and encoded audio data from the encoded bitstream.
- the acquisition unit 61 supplies the model data to the distribution model decoding unit 62 and supplies the encoded audio data to the audio data decoding unit 63 .
- the distribution model decoding unit 62 calculates directivity data from the model data.
- the distribution model decoding unit 62 has an unpacking unit 81 , a directivity data calculation unit 82 , a difference information decoding unit 83 , an addition unit 84 and a frequency interpolation processing unit 85 .
- the unpacking unit 81 unpacks the model data supplied from the acquiring unit 61 to extract model parameters, information on original directivity data before modeling, and differential code data from the model data.
- the unpacking unit 81 also supplies the model parameters and information about the original directivity data before modeling to the directivity data calculating unit 82 , and supplies the differential encoded data to the differential information decoding unit 83 .
- the directivity data calculator 82 calculates (restores) the directivity data based on the model parameters supplied from the unpacking unit 81 and the information on the original directivity data before modeling, and supplies the directivity data to the adder 84 .
- the directivity data calculated (restored) by the directivity data calculator 82 based on the model parameters will also be referred to as approximate directivity data.
- the difference information decoding unit 83 decodes the differential encoded data supplied from the unpacking unit 81 using a method corresponding to Huffman coding, and supplies the resulting difference information to the addition unit 84 as a directivity data residual.
- the addition unit 84 adds the approximate directivity data supplied from the directivity data calculation unit 82 and the directivity data residual (difference information) supplied from the difference information decoding unit 83 to obtain restored directivity data, and supplies it to the frequency interpolation processing unit 85.
- the frequency interpolation processing unit 85 performs frequency direction interpolation processing on the directivity data supplied from the addition unit 84 and supplies the resulting directivity data to the rendering processing unit 64 .
- the audio data decoding unit 63 decodes the encoded audio data supplied from the acquisition unit 61 and supplies the resulting audio data of each object to the rendering processing unit 64 .
- the audio data decoding unit 63 also decodes the encoded metadata supplied from the acquisition unit 61 and supplies the resulting metadata to the rendering processing unit 64.
- the rendering processing unit 64 generates output audio data based on the directivity data supplied from the frequency interpolation processing unit 85 and the audio data supplied from the audio data decoding unit 63 .
- the rendering processing unit 64 has a directivity data holding unit 86, an HRTF (Head Related Transfer Function) data holding unit 87, a temporal interpolation processing unit 88, a directivity convolution unit 89, and an HRTF convolution unit 90.
- viewpoint position information, listener direction information, object position information, and object direction information are supplied to the directivity data holding unit 86 and the HRTF data holding unit 87 in accordance with user designation, sensor measurement, and the like.
- the viewpoint position information is information indicating the viewpoint position (listening position) of the user (listener) viewing the content in the three-dimensional space
- the listener direction information is information indicating the orientation of the face of the user (listener) viewing the content in the three-dimensional space.
- the object position information and the object direction information are extracted from the metadata obtained by decoding the encoded metadata and supplied to the directivity data holding unit 86 and the HRTF data holding unit 87.
- the sound source type information extracted from the metadata is also supplied to the directivity data holding unit 86, and a user ID indicating the user viewing the content is supplied to the HRTF data holding unit 87 as appropriate.
- the directivity data holding unit 86 holds the directivity data supplied from the frequency interpolation processing unit 85 .
- the directivity data holding unit 86 reads out, from the held directivity data, the directivity data corresponding to the supplied viewpoint position information, listener direction information, object position information, object direction information, and sound source type information, and supplies it to the temporal interpolation processing unit 88.
- the HRTF data holding unit 87 holds HRTFs for each user indicated by the user ID for each of multiple directions viewed from the user (listener).
- the HRTF data holding unit 87 reads out, from the held HRTFs, the HRTF corresponding to the supplied viewpoint position information, listener direction information, object position information, object direction information, and user ID, and supplies it to the HRTF convolution unit 90.
- the temporal interpolation processing unit 88 performs temporal interpolation processing on the directivity data supplied from the directivity data holding unit 86 and supplies the resultant directivity data to the directivity convolution unit 89 .
- the directivity convolution unit 89 convolves the audio data supplied from the audio data decoding unit 63 with the directivity data supplied from the temporal interpolation processing unit 88, and supplies the resulting audio data to the HRTF convolution unit 90. Convolution of the directivity data adds the directional characteristics of the object (sound source) to the audio data.
- the HRTF convolution unit 90 convolves the audio data supplied from the directivity convolution unit 89, that is, the audio data in which the directivity data has been convolved, with the HRTF supplied from the HRTF data holding unit 87, and outputs the resulting audio data as output audio data. By convolving the HRTF, it is possible to obtain output audio data in which the sound of the object is localized at the position of the object as seen by the user (listener).
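As an illustrative sketch only: when the audio data is held in the frequency domain, the two convolutions above reduce to per-bin multiplications. The function and array names below are assumptions, not part of the embodiment:

```python
import numpy as np

def render_object(audio_bins, directivity_gain, hrtf_left, hrtf_right):
    """Apply the per-bin directivity gain, then a binaural HRTF, in the
    frequency domain, where convolution becomes per-bin multiplication.
    Each input is one value per frequency bin; all names are illustrative.
    """
    directed = audio_bins * directivity_gain            # directivity convolution
    return directed * hrtf_left, directed * hrtf_right  # HRTF convolution
```

A time-domain implementation would instead convolve the corresponding impulse responses; the frequency-domain form is shown only because the audio data in this description is frequency-domain data.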
- this directivity data generation process is started when the acquisition unit 61 receives the encoded bitstream transmitted from the server 11 and supplies the model data extracted from the encoded bitstream to the unpacking unit 81.
- in step S51, the unpacking unit 81 unpacks the model data supplied from the acquisition unit 61, and supplies the model parameters extracted from the model data and the information on the original directivity data before modeling to the directivity data calculation unit 82.
- in step S52, the directivity data calculation unit 82 calculates (generates) approximate directivity data based on the model parameters supplied from the unpacking unit 81 and the information on the original directivity data before modeling, and supplies it to the addition unit 84.
- specifically, the directivity data calculation unit 82 substitutes the position of each data point into the mixture model F'(x; Θ) determined by the model parameters, and applies the scale factor and minimum value to calculate the bin-wise mixture model output value F(x; Θ) at that data point. This results in approximate directivity data consisting of the directivity gain (amplitude data) for each bin at each data point.
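The mixture-model evaluation can be sketched as follows. This is an assumption-laden illustration: it uses only vMF components, while the mixture model in this description may also contain Kent distributions, and the way the scale factor and minimum value enter is a guess consistent with the parameter list above:

```python
import numpy as np

def vmf(x, mu, kappa):
    """von Mises-Fisher density on the unit sphere S^2 at unit vector x."""
    c = kappa / (4.0 * np.pi * np.sinh(kappa))
    return c * np.exp(kappa * np.dot(mu, x))

def mixture_gain(x, weights, mus, kappas, scale, minimum):
    """Approximate directivity gain at unit direction x: a weighted sum of
    vMF components, with the scale factor and minimum value then applied.
    (Sketch only: the embodiment's mixture may also include Kent terms.)"""
    f = sum(w * vmf(x, mu, k) for w, mu, k in zip(weights, mus, kappas))
    return scale * f + minimum
```

Evaluating `mixture_gain` once per bin at every data point position yields the approximate directivity data described in step S52.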
- in step S53, the unpacking unit 81 determines whether or not the model data supplied from the acquisition unit 61 contains differential encoded data, that is, whether or not there is differential encoded data.
- if it is determined in step S53 that differential encoded data is included, the unpacking unit 81 extracts the differential encoded data from the model data and supplies it to the difference information decoding unit 83, after which the process proceeds to step S54.
- in step S54, the difference information decoding unit 83 decodes the differential encoded data supplied from the unpacking unit 81 and supplies the resulting directivity data residual (difference information) to the addition unit 84.
- in step S55, the addition unit 84 adds the directivity data residual supplied from the difference information decoding unit 83 to the approximate directivity data supplied from the directivity data calculation unit 82.
- the addition unit 84 supplies the directivity data obtained by the addition to the frequency interpolation processing unit 85, after which the process proceeds to step S56.
- on the other hand, if it is determined in step S53 that differential encoded data is not included, the processing of steps S54 and S55 is skipped, and the process proceeds to step S56.
- in this case, the addition unit 84 supplies the approximate directivity data supplied from the directivity data calculation unit 82 to the frequency interpolation processing unit 85 as the restored directivity data.
- when it is determined in step S53 that differential encoded data is not included, or when the process of step S55 has been performed, the process of step S56 is performed.
- in step S56, the frequency interpolation processing unit 85 performs interpolation processing in the frequency direction on the directivity data supplied from the addition unit 84, and supplies the directivity data obtained by the interpolation processing to the directivity data holding unit 86 to be held.
- for example, the audio data of an object is data in the frequency domain and has a frequency component value for each of multiple frequency bins.
- therefore, an interpolation process is performed to calculate the directivity gains of the necessary bins so that the directivity data has a directivity gain for every frequency bin in which the audio data has a frequency component value.
- that is, the frequency interpolation processing unit 85 performs interpolation processing based on the directivity gains of a plurality of bins (frequencies) at predetermined data points in the directivity data, and calculates directivity gains at those same data points for new frequencies (bins) that were not originally included. Through such interpolation processing in the frequency direction, it is possible to obtain directivity data containing directivity gains at more frequencies.
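A minimal sketch of this frequency-direction interpolation, assuming linear interpolation between transmitted bin center frequencies (the description does not fix the interpolation method; all names are illustrative):

```python
import numpy as np

def interpolate_gain_over_frequency(bin_freqs, bin_gains, target_freqs):
    """Interpolate directivity gains, known only at the transmitted bin
    center frequencies, onto every frequency bin of the audio data.
    Linear interpolation is just one possible choice."""
    return np.interp(target_freqs, bin_freqs, bin_gains)
```

`np.interp` clamps outside the known range, so gains below the lowest and above the highest transmitted bin are held constant; a real implementation might extrapolate differently.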
- the directivity data generating process ends.
- as described above, the information processing device 51 calculates the directivity data based on the model data. By doing so, it is possible to reduce the amount of directivity data to be transmitted, that is, the transmission data amount. As a result, the occurrence of transmission delays and increases in the transmission rate can be suppressed.
- in step S81, the audio data decoding unit 63 decodes the encoded audio data supplied from the acquisition unit 61, and supplies the resulting audio data to the directivity convolution unit 89. For example, the decoding yields audio data in the frequency domain.
- the audio data decoding unit 63 decodes the encoded metadata, and extracts the object position information and the object direction information included in the resulting metadata.
- the sound source type information is supplied to the directivity data holding unit 86 and the HRTF data holding unit 87 as appropriate.
- the directivity data holding unit 86 supplies the time interpolation processing unit 88 with directivity data corresponding to the supplied viewpoint position information, listener direction information, object position information, object direction information, and sound source type information.
- specifically, the directivity data holding unit 86 identifies the relationship between the object in the three-dimensional space and the user's viewpoint position (listening position) from the viewpoint position information, listener direction information, object position information, and object direction information, and identifies data points according to the result.
- the position on the spherical surface of the mixture model in the viewpoint position direction when viewed from the center of the mixture model is specified as the target data point position. Note that there may not be an actual data point at the data point location of interest.
- the directivity data holding unit 86 extracts, from the directivity data of the sound source type indicated by the sound source type information, the directivity gain of each bin at a plurality of data points near the specified target data point position.
- the directivity data holding unit 86 supplies the data consisting of the directivity gains of each bin at the plurality of extracted data points, as the directivity data corresponding to the positional and directional relationship between the object and the user (listener), to the temporal interpolation processing unit 88.
- the HRTF data holding unit 87 supplies the HRTF convolution unit 90 with HRTF corresponding to the supplied viewpoint position information, listener direction information, object position information, object direction information, and user ID.
- specifically, the HRTF data holding unit 87 identifies the relative direction of the object as viewed from the listener (user) as the object direction, based on the viewpoint position information, listener direction information, object position information, and object direction information. Then, the HRTF data holding unit 87 supplies, among the HRTFs in each direction corresponding to the user ID, the HRTF in the direction corresponding to the object direction to the HRTF convolution unit 90.
- in step S82, the temporal interpolation processing unit 88 performs interpolation processing in the time direction on the directivity data supplied from the directivity data holding unit 86, and supplies the resulting directivity data to the directivity convolution unit 89.
- the time interpolation processing unit 88 calculates the directivity gain of each bin at the target data point position by interpolation processing, based on the directivity gain of each bin at a plurality of data points included in the directivity data. That is, the directivity gain at a new data point (target data point position) different from the original data point is calculated by interpolation processing.
- the temporal interpolation processing unit 88 supplies the data of the directivity gain of each bin at the target data point position to the directivity convolution unit 89 as the directivity data obtained by interpolation processing in the time direction.
- in step S83, the directivity convolution unit 89 convolves the audio data supplied from the audio data decoding unit 63 with the directivity data supplied from the temporal interpolation processing unit 88, and supplies the resulting audio data to the HRTF convolution unit 90.
- in step S84, the HRTF convolution unit 90 convolves the audio data supplied from the directivity convolution unit 89 with the HRTF supplied from the HRTF data holding unit 87, and outputs the resulting output audio data.
- in step S85, the information processing device 51 determines whether or not to end the process.
- for example, when encoded audio data of a new frame is supplied from the acquisition unit 61 to the audio data decoding unit 63, it is determined in step S85 that the process is not to end. On the other hand, when encoded audio data of a new frame is not supplied from the acquisition unit 61 to the audio data decoding unit 63 and the output audio data of all frames of the content has been generated, it is determined in step S85 that the process is to end.
- if it is determined in step S85 that the process is not to end, the process returns to step S81 and the above-described processes are repeated.
- on the other hand, if it is determined in step S85 that the process is to end, the information processing device 51 stops the operation of each unit and ends the output audio data generation processing.
- the information processing device 51 selects appropriate directivity data and HRTF, and convolves the directivity data and HRTF with audio data to produce output audio data. By doing so, it is possible to realize high-quality audio reproduction with a more realistic feeling by considering the directional characteristics of the object (sound source) and the relationship between the position and orientation of the object and the listener.
- the server 11 appropriately generates difference information indicating the difference between the directivity data before modeling and the directivity data after modeling.
- differential information is encoded by an encoding method such as Huffman encoding to obtain differential encoded data.
- on the server 11 side, that is, on the encoder side, an appropriate encoding method for encoding the difference information may be made selectable according to the sound source type, the frequency band, and so on.
- for example, a distribution of appearance probabilities is generated based on the difference information of each of a plurality of bins obtained from one piece of directivity data to be encoded.
- the horizontal axis indicates the value (dB value) of the difference information
- the vertical axis indicates the appearance probability of each value of the difference information.
- the appearance probability of each value of the difference information is obtained by generating a histogram from the difference information of each bin.
- the appearance probability distribution may be obtained for each bin, for bins included in a specific frequency band, or for all bins, or it may be selectable which of these is used.
- the server 11 selects an appropriate Huffman coding table from a plurality of prepared Huffman coding tables, or generates one new Huffman coding table, based on such appearance probabilities of the difference information.
- all bins (frequencies) at all data points of the directivity data may be considered and one Huffman coding table selected or generated for all of those bins, or one Huffman coding table may be selected or generated for each bin or frequency band.
- the Huffman coding table selected or generated in this manner is used to Huffman code the difference information.
- the Huffman coding table is a table for converting unencoded data into Huffman codes, showing the correspondence between the unencoded data, that is, the difference information, and the Huffman codes (encoded data) obtained by encoding it.
- a reverse lookup table corresponding to the Huffman coding table is used when decoding the difference encoded data obtained by Huffman coding the difference information.
- the reverse lookup table is a table for converting the Huffman code into the data after decoding, showing the correspondence between the Huffman code (encoded data) and the data after decoding.
- This reverse lookup table can be generated from a Huffman coding table.
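Generating the reverse lookup table from a Huffman coding table is a straightforward inversion, since Huffman codes are prefix-free and therefore distinct. A sketch with assumed names (value-to-codeword dictionaries standing in for the tables in this description):

```python
def build_reverse_table(huffman_table):
    """Derive the decoder's reverse lookup table (code -> value) from the
    encoder's Huffman coding table (value -> code). Distinct codewords
    guarantee the mapping inverts without conflict."""
    reverse = {}
    for value, code in huffman_table.items():
        if code in reverse:
            raise ValueError("duplicate codeword; not a valid Huffman table")
        reverse[code] = value
    return reverse
```

This is why, as stated above, it suffices to transmit only the Huffman coding table: the decoder can regenerate the reverse lookup table itself.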
- both the server 11 (encoder) and the information processing device 51 (decoder) may hold Huffman-encoding tables in advance. In such a case, the server 11 notifies the information processing device 51 of ID information indicating the Huffman coding table used for Huffman coding the difference information.
- the server 11 may store the Huffman coding table or the reverse lookup table in the coded bitstream and transmit it to the information processing device 51 .
- alternatively, the Huffman coding table may be transmitted from the server 11 to the information processing device 51, and the information processing device 51 may generate the reverse lookup table based on the Huffman coding table at the time of decoding or the like.
- also, a narrow dynamic range including the difference information values with high appearance probability, such as a range of ±3 dB, may be selected as the target range of possible difference information values, and a Huffman coding table may be used only for that target range.
- in this case, difference information with values outside the target range, that is, difference information with irregular values of low appearance probability, is treated as-is (unencoded) as the differential encoded data.
- in this way, a highly efficient Huffman coding table is selected or generated according to the probability density distribution of the difference information, and information indicating which Huffman coding table to use is described in the encoded bitstream, so that the difference information can be encoded and transmitted efficiently.
- the dynamic range can be further reduced and the encoding efficiency can be improved.
- multistage differential encoding can be realized by combining a plurality of schemes.
- a mode indicating the presence or absence of multistage differential encoding and the method used is recorded in the model data as enc_mode or the like.
- for example, if the multistage differential encoding method is recorded in the lower 4 bits and whether the target data is a real number or a complex number is recorded in the upper 4 bits, the following information is stored in the model data:
- (target data is a real number)
- 0x00: no multistage differential encoding
- 0x01: spatial adjacent difference method
- 0x02: inter-frequency difference method
- 0x03: spatial adjacent difference method + inter-frequency difference method
- (target data is a complex number)
- 0x1*: lower 4 bits are the same as when the target data is a real number
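The bit layout above can be sketched as follows; the constant and function names are assumptions, only the 4-bit split (method in the lower nibble, real/complex flag in the upper nibble) comes from the description:

```python
# Method codes for the lower 4 bits of enc_mode, per the list above.
METHOD_NONE, METHOD_SPATIAL, METHOD_FREQ, METHOD_BOTH = 0x0, 0x1, 0x2, 0x3

def pack_enc_mode(is_complex, method):
    """Pack enc_mode: upper 4 bits = 1 for complex target data, 0 for real;
    lower 4 bits = multistage differential encoding method."""
    return ((0x1 if is_complex else 0x0) << 4) | (method & 0x0F)

def unpack_enc_mode(enc_mode):
    """Return (is_complex, method) recovered from an enc_mode byte."""
    return (enc_mode >> 4) == 0x1, enc_mode & 0x0F
```

For example, real-valued data encoded with the spatial adjacent difference method packs to 0x01, and complex data with both methods combined packs to 0x13.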
- in the spatial adjacent difference method, when encoding the difference information of a data point to be processed, the difference between the difference information at the data point to be processed and the difference information at another data point near it is obtained as spatial difference information. For example, the difference in difference information between adjacent data points is obtained as the spatial difference information. Then, the obtained spatial difference information is Huffman encoded to obtain differential encoded data.
- this takes advantage of the property that data at spatially close positions (data points) in the directivity data, that is, the directivity gains and difference information, tend to take close values.
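The neighbor-differencing step (before Huffman coding) can be sketched as follows; the 1-D ordering of data points and the function names are assumptions for illustration:

```python
import numpy as np

def spatial_delta_encode(diff_info):
    """Spatial adjacent difference method (sketch): along an ordered list of
    data points, keep the first value and the differences between neighbors.
    The result clusters near zero, which Huffman coding exploits."""
    diff_info = np.asarray(diff_info, dtype=float)
    return np.concatenate(([diff_info[0]], np.diff(diff_info)))

def spatial_delta_decode(deltas):
    """Invert the differencing exactly via a cumulative sum."""
    return np.cumsum(deltas)
```

The inter-frequency difference method described next is the same operation applied along the bin (frequency) axis instead of along data points.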
- in the inter-frequency difference method, when encoding the difference information of a bin (frequency) to be processed, the difference between the difference information in the bin to be processed and the difference information in a neighboring-frequency bin adjacent to it is obtained as inter-frequency difference information. Then, the obtained inter-frequency difference information is Huffman encoded to obtain differential encoded data.
- this takes advantage of the property that data at close frequencies, that is, the directivity gains and difference information, tend to take close values.
- the difference in spatial difference information between adjacent bins is obtained as inter-frequency difference information, and the inter-frequency difference information is Huffman encoded.
- a difference in inter-frequency difference information between adjacent data points is obtained as spatial difference information, and the spatial difference information is Huffman-encoded.
- the complex difference method is used when the directivity data has not only information about the amplitude described above but also information about the phase.
- when the directivity data has information about amplitude and phase, that information, that is, the directivity gain, is expressed by a complex number.
- the directivity data has complex number data (hereinafter also referred to as complex directivity gain) indicating the amplitude and phase for each bin for each data point, and the difference information is also complex number data.
- in the complex difference method, the real part and the imaginary part of the difference information represented by complex numbers are Huffman encoded independently (individually), or the two-dimensional data consisting of the real part and the imaginary part (the complex directivity gain) is Huffman encoded as a whole.
- in the complex difference method, it may be possible to select whether Huffman coding is performed on the real part and the imaginary part separately, or on the two-dimensional data.
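The first option, splitting the complex difference information into two independent real sequences, can be sketched as follows (names are illustrative; the Huffman coding of each sequence is omitted):

```python
import numpy as np

def split_complex_difference(complex_residual):
    """Complex difference method, option 1 (sketch): treat the real and
    imaginary parts of the complex-valued difference information as two
    independent real sequences, each to be Huffman encoded on its own."""
    arr = np.asarray(complex_residual)
    return np.real(arr), np.imag(arr)

def join_complex_difference(real_part, imag_part):
    """Recombine the decoded sequences into complex difference information."""
    return np.asarray(real_part) + 1j * np.asarray(imag_part)
```

The second option would instead Huffman encode each (real, imaginary) pair as one two-dimensional symbol, which can capture correlation between the two parts at the cost of a larger coding table.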
- hereinafter, each method of encoding by combining at least one of the spatial adjacent difference method, the inter-frequency difference method, and the complex difference method, as well as the method of Huffman encoding the difference information as it is, is referred to as a differential encoding method or a differential encoding mode.
- note that the differential encoding method of Huffman encoding the difference information as it is can be said to be encoding that does not use further differences, that is, a method that does not perform multistage differential encoding.
- the server 11 selects the most efficient one from among the plurality of differential encoding methods (differential encoding modes) based on the difference information and the like, and Huffman encodes the difference information using the selected differential encoding method.
- for example, the code amount (data amount) of the differential encoded data for each differential encoding method may be obtained by calculation based on the difference information, and the differential encoding method with the smallest code amount may be selected as the most efficient.
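This smallest-code-amount selection can be sketched as a trial encode under each mode; the dictionary of encoder callables is an assumption standing in for the modes listed above:

```python
def select_encoding_mode(diff_info, encoders):
    """Pick the differential encoding mode with the smallest code amount.
    `encoders` maps a mode id to a function that returns the encoded bytes
    for that mode (names are illustrative, not from the embodiment)."""
    sizes = {mode: len(enc(diff_info)) for mode, enc in encoders.items()}
    return min(sizes, key=sizes.get)
```

In practice the code amount could also be estimated from the Huffman table and symbol histogram without performing the full encode for every mode.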
- an appropriate differential encoding method may be selected based on, for example, the sound source type of the directional data or the environment during recording of the directional data such as an anechoic room.
- for each frequency band, that is, for each bin or band, or in common for all frequency bands, at least one of the HOA method, the mixture method, the complex mixture method, and the difference method, or a combination of these, may be used to generate the model data.
- directivity data is modeled by one or a plurality of different methods such as the HOA method and the mixed method, and model data including model parameters and the like obtained as a result is generated.
- the HOA method is a method that uses HOA to model directivity data consisting of complex directivity gains for each bin at each data point. That is, the HOA method models the directivity data by spherical harmonic expansion.
- spherical harmonic expansion is performed on the directional data, and as a result, spherical harmonic coefficients, which are coefficients for spherical harmonic functions in each dimension, are obtained as model parameters.
- directivity data consisting of the complex directivity gains after modeling by the HOA method can be obtained from these spherical harmonic coefficients.
- the mixed method is a method of modeling using a mixed model consisting of the above-mentioned Kent distribution and vMF distribution.
- the mixed method can describe the shape of a directivity gain that varies sharply in a particular orientation (direction) as seen from the sound source, that is, at particular data point positions.
- the complex mixture method is a method of modeling directivity data consisting of complex directivity gain, that is, amplitude and phase data, using a mixture distribution (mixture model) corresponding to complex numbers.
- modeling by the following two methods can be considered.
- one conceivable method is to model each of the real and imaginary parts of the complex directivity gain, or each of the amplitude and phase angle obtained from the complex directivity gain, independently using a mixture model of probability density distributions for real numbers.
- alternatively, the directivity data is modeled by a mixture model consisting of one or more complex Bingham distributions or one or more complex Watson distributions, so that model parameters similar to those in the mixed method are obtained. From the model parameters thus obtained, directivity data consisting of the complex directivity gains after modeling by the complex mixture method can be obtained.
- the description takes the form shown in the following equation (5); that is, the value f(z) of the complex Bingham distribution is expressed by equation (5).
- the complex vector z in Equation (5) corresponds to the position vector x on the spherical surface in the Kent distribution or the vMF distribution, and z* is its complex conjugate.
- the complex matrix A is a k × k matrix indicating the position, steepness, direction, and shape, and the normalization coefficient C(A) is given by the following equation (6).
- λj denotes the eigenvalues of the complex matrix A, with λ1 ≥ λ2 ≥ λ3 ≥ … ≥ λk.
- the formulation of the mixture model consisting of one or more complex Bingham distributions, that is, the number of mixture components and the weights in the complex Bingham mixture model, is common with that of the mixture model consisting of the Kent distribution and the vMF distribution described above.
- the value F(z; Θ) of a mixture model using N complex Bingham distributions f(z; θi) can be described as a weighted sum, as shown in the following equation (8).
- the sum of the weights is 1, Θ is the set of all parameters, θi is the parameter set of each complex Bingham distribution (the parameters constituting the complex Bingham distribution), and πi is the weight for each complex Bingham distribution.
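The referenced equations are not reproduced in this text; the following is a hedged reconstruction consistent with the description above, using standard complex Bingham notation. It may differ from the patent's exact equations (5), (6), and (8):

```latex
% Complex Bingham density (cf. Eq. (5)): z is a unit complex vector,
% z^{*} its conjugate transpose, A a k x k Hermitian parameter matrix
f(z) = C(A)^{-1} \exp\!\left( z^{*} A z \right)

% Normalization (cf. Eq. (6)), with eigenvalues \lambda_j of A:
C(A) = 2\pi^{\,k-1} \sum_{j=1}^{k} a_j \, e^{\lambda_j},
\qquad a_j^{-1} = \prod_{i \neq j} (\lambda_j - \lambda_i)

% Mixture of N complex Bingham components (cf. Eq. (8)):
F(z; \Theta) = \sum_{i=1}^{N} \pi_i \, f(z; \theta_i),
\qquad \sum_{i=1}^{N} \pi_i = 1
```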
- the difference method is a method that uses differences to generate model data.
- model data is generated by combining the difference method with one or more other methods, such as the HOA method and the mixed method.
- in the difference method, difference information indicating the difference between the directivity data before modeling and the directivity data after modeling by the one or more other methods is encoded by any of the differential encoding methods described above, and the resulting differential code data is stored in the model data.
- the difference in directivity data obtained by the difference method may be modeled by the HOA method or the like.
- in the difference method, for example, at least one of the difference between spatial positions (between data points) and the difference between frequencies (between bins or bands) is obtained for the difference information, the resulting difference is Huffman-encoded, and the result is used as differential code data.
- when the difference in the difference information to be Huffman-encoded is a complex number, the real part and the imaginary part of the difference may be individually Huffman-encoded, or the complex number may be Huffman-encoded directly.
- each of the amplitude component and the phase component obtained from the difference may be individually Huffman-encoded.
- At this time, at least one of the spatially adjacent differential method, the inter-frequency differential method, and the complex differential method, including at least one of the spatially adjacent differential method and the inter-frequency differential method, is used. That is, a difference in directivity gain between spatial positions (between data points) or between frequencies (between bins or bands) is obtained, and the difference is Huffman-encoded.
- when the difference is represented by a complex number, the real part and the imaginary part of the difference may be separately Huffman-encoded, or the difference (complex number) may be Huffman-encoded as is.
- each of the amplitude component and the phase component obtained from the difference may be individually Huffman-encoded.
- model data is generated that includes data composed of the Huffman codes obtained by Huffman-coding the differences obtained by the difference method (hereinafter also referred to as coded directivity data).
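A minimal sketch of the differencing that precedes the Huffman coding described above (hypothetical helper names; the same code works for real or complex gains):

```python
# Spatially adjacent differencing: keep the first gain, then store differences
# between adjacent data points; the inverse accumulates them back.
def spatial_diffs(gains):
    return [gains[0]] + [b - a for a, b in zip(gains, gains[1:])]

def undo_diffs(diffs):
    out = [diffs[0]]
    for d in diffs[1:]:
        out.append(out[-1] + d)
    return out
```

Since the values may be complex, the real and imaginary parts of each difference could then be Huffman-encoded separately, or the complex value treated as two-dimensional data.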
- in this case, since there is no directivity data residual for the coded directivity data, the model data does not include differential code data.
- it is necessary to define the order of the data when differential code data and coded directivity data are stored in the model data, and the compression ratio varies depending on the data order.
- the differential information is calculated after applying offsets and scale factors to the average directivity and matching the dynamic range.
- model data is generated by combining the HOA method, mixture method, complex mixture method, and difference method
- the methods for generating model data can be categorized into the following five methods.
- the five methods here are the band hybrid method, the additive hybrid method, the multiplicative hybrid method, the spherical harmonic coefficient modeling method, and the combination hybrid method. Each method will be described below.
- the band hybrid method is a method of switching, for each frequency band, that is, for each bin or band, which of the HOA method, the mixed method, the complex mixture method, and the difference method is used to generate the model data.
- for example, low frequencies may be recorded with complex directivity gains, and high frequencies with real directivity gains.
- in such a case, the HOA method can be used for modeling in the lower band and the mixed method for modeling in the higher band, so that the directivity data as a whole is modeled.
- the low-side band may be modeled by a complex mixed method using a complex Bingham distribution or the like, and the high-side band may be modeled by a mixed method.
- additive hybrid method: in the additive hybrid method, difference information indicating the difference from the modeled directivity data is further modeled, or is encoded by the difference method.
- additive hybrid methods include the following methods (AH1) to (AH4).
- processing is executed in order from the method described on the left.
- in method (AH1), the directivity data is first modeled by the mixed method.
- difference information indicating the difference between the directivity data before modeling and the directivity data after modeling by the mixed method is encoded by the differential method to generate difference encoded data.
- model data including model parameters obtained by modeling in the mixed method and differential code data is generated.
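The first additive-hybrid variant above, mixed-method modeling plus a coded residual, can be sketched with toy stand-ins (the "model" here is simply the mean, a hypothetical substitute for the mixed-method parameters):

```python
# Toy sketch of method (AH1): model the data, then keep the residual
# (difference information), which would be Huffman-encoded as differential
# code data. Adding the exact residual back makes the scheme lossless.
def encode_ah1(data):
    param = sum(data) / len(data)          # stand-in for mixed-method parameters
    residual = [x - param for x in data]   # difference information
    return param, residual

def decode_ah1(param, residual):
    return [param + r for r in residual]
```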
- in method (AH2), the directivity data is first modeled by the HOA method.
- modeling in the HOA method involves spherical harmonic expansion up to low-order terms.
- difference information indicating the difference between the directivity data before modeling and the directivity data after modeling by the HOA method is further modeled by the mixed method.
- model data is generated that includes model parameters obtained by modeling in the HOA method and model parameters obtained by modeling differential information in the mixed method.
- in method (AH3), as in method (AH2), the HOA method is used for modeling up to the low-order terms, and then the difference information obtained during modeling by the HOA method is encoded by the difference method to generate differential code data.
- model data including model parameters obtained by modeling in the HOA method and differential code data is generated.
- method (AH4) as in method (AH2), after modeling up to low-order terms with the HOA method, the differential information is further modeled with the mixed method.
- the difference information indicating the difference between the difference information obtained for modeling by the HOA method and the difference information after modeling by the mixed method is encoded by the difference method to generate difference encoded data.
- moreover, the difference information indicating the difference between the directivity data after modeling by the combination of the HOA method and the mixed method and the directivity data before modeling is encoded by the difference method to generate differential code data.
- model data is generated that includes the model parameters obtained by modeling with the HOA method, the model parameters obtained by modeling the difference information with the mixed method, and the differential code data.
- the difference information obtained in the middle of the processing in this way is also referred to as intermediate difference information.
- the difference information obtained by modeling in the HOA method is the intermediate difference information, and this intermediate difference information is modeled in the mixed method.
- difference information indicating the difference between the original intermediate difference information and the intermediate difference information after modeling by the mixed method is encoded by the differential method.
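The two-stage structure of method (AH4) can be sketched the same way, with toy stand-in models (hypothetical names; the real stages would be the HOA method and the mixed method):

```python
# Toy sketch of method (AH4): stage 1 models the data, stage 2 models the
# intermediate difference information, and the remaining residual is coded
# by the difference method; keeping the exact residual makes it lossless.
def encode_ah4(data):
    p1 = sum(data) / len(data)            # stage-1 stand-in (for HOA parameters)
    inter = [x - p1 for x in data]        # intermediate difference information
    p2 = min(inter)                       # stage-2 stand-in (for mixed-method params)
    residual = [d - p2 for d in inter]    # would be coded by the difference method
    return p1, p2, residual

def decode_ah4(p1, p2, residual):
    return [p1 + p2 + r for r in residual]
```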
- with method (AH2), data that completely matches the original directivity data cannot be obtained on the decoding side, but with method (AH1), method (AH3), and method (AH4), data that perfectly matches the original directivity data can be obtained.
- the directivity data may instead be modeled or encoded by a single method rather than the additive hybrid method. That is, for example, model data may be generated in which the directivity data is modeled or encoded by only one of the HOA method, the mixed method, and the difference method, and which includes the resulting model parameters or coded directivity data.
- multiplicative hybrid method: in the multiplicative hybrid method, the directivity data is modeled by a predetermined method, and the ratio (quotient) of the directivity data before modeling to the directivity data after modeling is modeled by another method different from the predetermined method.
- examples of the multiplicative hybrid method include the following method (MH1) and method (MH2).
- in method (MH1), the directivity data is first modeled by the HOA method.
- modeling in the HOA method involves spherical harmonic expansion up to low-order terms.
- the value obtained by dividing the directivity data before modeling by the directivity data after modeling in the HOA method (hereinafter also referred to as amplitude modulation information) is further modeled in a mixed method.
- the absolute value (amplitude component) of the complex number (complex directivity gain) constituting the amplitude modulation information may be modeled by the mixed method, or the quotient of the amplitude components of the directivity data before and after modeling may be used as the amplitude modulation information.
- model data including model parameters obtained by modeling in the HOA method and model parameters obtained by modeling amplitude modulation information in the mixed method is generated.
- the directivity data calculated from the model parameters for the HOA method is multiplied by the amplitude modulation information calculated from the model parameters for the mixed method to calculate the final directivity data.
- in this way, amplitude modulation information indicating fine amplitude fluctuations with respect to direction (the direction from the sound source), which cannot be expressed by modeling up to the low-order terms in the HOA method, is obtained, modeled by the mixed method, and recorded (stored) in the model data. At the time of decoding, the directivity data calculated from the model parameters for the HOA method is modulated with the amplitude modulation information, so that directivity data with less error is obtained.
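The multiplicative relation described above, division at the encoder and multiplication at the decoder, can be sketched element-wise (hypothetical helper names; gains here are a flat list per bin and data point):

```python
# Multiplicative hybrid (MH1) relation, element-wise:
# amp_mod = before / after at the encoder; final = rough * amp_mod at decode.
def amplitude_modulation(before, after):
    return [b / a for b, a in zip(before, after)]

def apply_modulation(rough, amp_mod):
    return [g * m for g, m in zip(rough, amp_mod)]
```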
- in method (MH2), as in method (MH1), the directivity data is modeled up to the low-order terms by the HOA method.
- the value obtained by dividing the directivity data before modeling by the directivity data after modeling in the HOA method (hereinafter also referred to as amplitude phase modulation information) is further modeled in a mixed method.
- each of the real part and imaginary part, or each of the amplitude component and phase component, of the complex numbers (complex directivity gains) constituting the amplitude phase modulation information is modeled by the mixed method.
- the amplitude phase modulation information may be modeled by a complex mixing scheme.
- model data including model parameters obtained by modeling in the HOA method and model parameters obtained by modeling the amplitude phase modulation information in the mixed method is generated.
- the directivity data calculated from the model parameters for the HOA method is multiplied by the amplitude phase modulation information calculated from the model parameters for the mixed method to calculate the final directivity data.
- in this way, amplitude phase modulation information indicating rotational changes of the phase with respect to direction (the direction from the sound source), which cannot be expressed by modeling up to the low-order terms in the HOA method, is modeled by the mixed method and recorded (stored) in the model data.
- the directivity data calculated from the model parameters for the HOA method is modulated by the amplitude phase modulation information to obtain the directivity data with less error.
- the real and imaginary parts of the complex numbers may be modeled independently (separately), either by different methods or by the same method.
- the real part may be modeled by a mixed method and the imaginary part may also be modeled by a mixed method.
- the amplitude component and the phase component may be modeled independently (individually) by any method, and complex number data may be modeled by the complex mixture method.
- the directivity data is modeled in two stages, the HOA method and the mixed method, in the spherical harmonic coefficient modeling method.
- spherical harmonic coefficients are calculated based on the model parameters for the mixed method, and then directivity data (approximate directivity data) are calculated based on the spherical harmonic coefficients.
- each of the real and imaginary parts of the spherical harmonic coefficients as model parameters, or each of the amplitude and phase components obtained from the model parameters can be modeled individually (independently) by any method such as a mixing method.
- the spherical harmonic coefficients may be modeled by complex mixtures, such as one or more complex Bingham distributions.
- combination hybrid method: model data is generated using a combination of at least two of the above-described band hybrid method, additive hybrid method, multiplicative hybrid method, and spherical harmonic coefficient modeling method.
- information indicating a combination of one or more methods used to generate model data may be stored in the model data.
- the server 11 side can appropriately select or switch between one or more methods used to generate model data.
- the model data is configured as shown in FIGS. 15 and 16, for example. FIG. 16 shows the portion following the portion shown in FIG. 15. Descriptions of the parts of FIGS. 15 and 16 corresponding to those shown in FIG. 5 will be omitted as appropriate.
- FIGS. 15 and 16 are examples in which the directivity information (directivity data) of one type of sound source specified by num_sound_types_id is described as directivityConfig.
- the vMF distribution, the Kent distribution, and the syntax when there is difference data (difference information) are shown as an example of realizing the hybrid method, and the number of bits of each information is just an example.
- the model data shown in FIGS. 15 and 16 is basically composed of the same data as the model data shown in FIG. 5, but the number of bits and the data structure of some of the data are different.
- the azimuth “azimuth_table[i]” and elevation “elevation_table[i]” are 16-bit unsigned shorts.
- the number of bands “band_count” and the number of mixtures “mix_count[i_band]” are 8-bit unsigned chars, and the selection flag “dist_flag” is a 1-bit bool.
- the model data includes the ID of the hybrid mode (differential encoding mode (differential encoding method)) used for encoding the differential information, that is, "mode” indicating the differential encoding mode information.
- the model data also includes an index "table_index” indicating the Huffman coding table used for coding the difference information.
- the model data includes "int db_resolution”, which indicates the quantization step size such as quantization every 1.0 dB.
- for "int db_resolution", a value of "0" indicates no quantization; otherwise the quantization step is the value × 0.1 dB, so that, for example, a value of "1" indicates 0.1 dB, "2" indicates 0.2 dB, "3" indicates 0.3 dB, and "256" indicates 25.6 dB.
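Assuming the step size scales linearly as value × 0.1 dB, which is consistent with "256" indicating 25.6 dB (a hypothetical helper, not code from the patent), the quantization could look like:

```python
# Quantize a dB value with the step implied by "int db_resolution".
# Assumption: step = db_resolution * 0.1 dB; a value of 0 means no quantization.
def quantize_db(value_db, db_resolution):
    if db_resolution == 0:
        return value_db
    step = db_resolution * 0.1
    return round(value_db / step) * step
```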
- the model data also stores, for each bin, the Huffman codes obtained by Huffman-coding the difference information for each data point, that is, "diff_data[i_bin][i_point]", which is the differential code data.
- information of the configuration shown in FIG. 17 is transmitted from the server 11 to the information processing device 51, either stored in the model data or separately from the model data.
- the information shown in FIG. 17 includes a Huffman coding table or a reverse lookup table.
- diff_mode_count is information indicating the total number of differential encoding methods, and "int_nbits_res_data” is stored for this total number "diff_mode_count”.
- This "int_nbits_res_data” is information indicating the maximum number of bits of the Huffman code, that is, the maximum word length of the Huffman code. can be done.
- "element_count" is information indicating the number of elements in the Huffman coding table or reverse lookup table, and "Huff_dec_table[i_element]" is stored for this number of elements "element_count".
- "Huff_dec_table[i_element]" is an element of the reverse lookup table.
- the Huffman coding table is as shown in FIG. 18, for example. That is, FIG. 18 shows a specific example of the Huffman coding table.
- "Huff_dec_table" is a reverse lookup table for the case where the maximum word length is 2 bits: 0: 0 dB, 1: 0 dB, 2: 1 dB, 3: 2 dB.
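A hedged sketch of how such a reverse lookup table can be used for decoding. The code lengths below are inferred from the table (0 dB occupying two entries implies a 1-bit code), not stated in the text:

```python
# Reverse lookup decode with maximum word length 2 bits:
# entries 0-1 -> 0 dB (code '0'), entry 2 -> 1 dB ('10'), entry 3 -> 2 dB ('11').
HUFF_DEC_TABLE = [(0, 1), (0, 1), (1, 2), (2, 2)]  # (symbol in dB, code length)
MAX_WORD_LEN = 2

def decode_bits(bits):
    out, pos = [], 0
    while pos < len(bits):
        # peek MAX_WORD_LEN bits (zero-padded at the end of the stream)
        window = bits[pos:pos + MAX_WORD_LEN].ljust(MAX_WORD_LEN, "0")
        symbol, length = HUFF_DEC_TABLE[int(window, 2)]
        out.append(symbol)
        pos += length                      # advance by the true code length
    return out
```

Looking up the table with the peeked window avoids bit-by-bit tree traversal, which is the point of a reverse lookup table sized to the maximum word length.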
- processing is performed in the following procedure.
- An offset value is required for restoration.
- ⁇ Server configuration example> When the server 11 generates model data by combining one or a plurality of methods and encodes difference information in the differential encoding mode, the server 11 is configured as shown in FIG. 19, for example.
- the server 11 shown in FIG. 19 is an information processing device such as a computer, and functions as an encoding device as in the case of FIG.
- the server 11 has a directional data encoding unit 201, an audio data encoding unit 23, and an output unit 24.
- the directional data encoding unit 201 generates model data based on the supplied directional data.
- Directivity data encoding section 201 has model parameter estimation section 211 , residual calculation section 212 , encoding method selection section 213 , Huffman encoding section 214 , and model data generation section 215 .
- the model parameter estimation unit 211 and the residual calculation unit 212 correspond to the modeling unit 21 in FIG.
- the model parameter estimation unit 211 models the supplied directivity data to be processed by at least one method such as the HOA method or the mixed method, and supplies the resulting model parameters for each method to the residual calculation unit 212 and the model data generation unit 215.
- the residual calculation unit 212 calculates difference information based on the supplied directivity data to be processed and the model parameters supplied from the model parameter estimation unit 211, and supplies it to the encoding method selection unit 213 and the Huffman encoding unit 214.
- the encoding method selection unit 213 selects a differential encoding mode and a Huffman coding table to be used when Huffman-encoding the difference information, and supplies encoding mode information indicating the selection result to the Huffman encoding unit 214 and the model data generation unit 215.
- the encoding mode information consists of differential encoding mode information indicating the selected differential encoding mode (differential encoding method) and table index information indicating the selected Huffman coding table. It should be noted that only the difference information may be used when the encoding mode information is generated by the encoding method selection unit 213 .
- the Huffman encoding unit 214 Huffman-encodes the difference information supplied from the residual calculation unit 212 based on the encoding mode information supplied from the encoding method selection unit 213, and supplies the resulting differential code data to the model data generation unit 215.
- the model data generation unit 215 generates model data including the model parameters for each method supplied from the model parameter estimation unit 211, the differential code data supplied from the Huffman encoding unit 214, and the encoding mode information supplied from the encoding method selection unit 213, and supplies the model data to the output unit 24.
- the difference code data is not included in the model data when the difference information is not encoded.
- the model data also stores information about the directivity data described above.
- information indicating the method used to model the directivity data may be stored in the model data.
- the server 11 performs the encoding process described with reference to FIG. However, in steps S11 and S12, more specifically, the processing described below is performed.
- in step S11, the model parameter estimation unit 211 models the supplied directivity data to be processed by at least one method, and the residual calculation unit 212 calculates difference information as necessary.
- at this time, the HOA method, the mixed method, the complex mixture method, the difference method, and the like are combined as necessary, whereby model parameters and difference information are calculated by the above-described band hybrid method, additive hybrid method, multiplicative hybrid method, spherical harmonic coefficient modeling method, combination hybrid method, or the like.
- in step S12, the encoding method selection unit 213 selects a differential encoding mode and a Huffman coding table, and the Huffman encoding unit 214 performs Huffman encoding as necessary to generate differential code data.
- for example, the model parameter estimation unit 211 first models the directivity data by the HOA method, and as a result obtains the spherical harmonic coefficients as model parameters.
- the model parameter estimating unit 211 obtains the difference between the directivity data modeled by the HOA method and the directivity data before modeling as intermediate difference information, and models the intermediate difference information by the mixing method.
- as a result, the concentration parameter κ, the ellipticity β, the weights πi, the mean direction vector γ1, the major axis vector γ2, the minor axis vector γ3, the scale factor, and the minimum value are obtained as model parameters.
- the model parameter estimation unit 211 supplies the model parameters obtained by modeling the directivity data by the HOA method and the model parameters obtained by modeling the intermediate difference information by the mixed method to the residual calculation unit 212 and the model data generation unit 215.
- the residual calculator 212 generates difference information based on the model parameters supplied from the model parameter estimator 211 and the supplied directivity data.
- This difference information is the residual difference between the directivity data after modeling, which is modeled by a combination of the HOA method and the mixed method, and the directivity data before modeling.
- the Huffman encoding unit 214 Huffman-encodes the difference information supplied from the residual calculation unit 212 according to the encoding mode information supplied from the encoding method selection unit 213 as necessary.
- this Huffman encoding is performed by the method indicated by the differential encoding mode information. That is, for example, the difference information is Huffman-encoded by one or more of the spatially adjacent difference method, the inter-frequency difference method, and the complex difference method, or Huffman encoding of the difference information is not performed.
- for example, when the spatially adjacent difference method is used, the Huffman encoding unit 214 obtains the difference of the difference information between adjacent data points as spatial difference information, and Huffman-encodes the spatial difference information to generate differential code data.
- the model data generation unit 215 generates model data including the HOA model parameters and the mixed method model parameters supplied from the model parameter estimation unit 211 and the encoding mode information supplied from the encoding method selection unit 213. Generate. In particular, when the difference information is Huffman-encoded, the model data generator 215 also stores the differential code data supplied from the Huffman encoder 214 in the model data.
- alternatively, the model parameter estimation unit 211 obtains a difference in the directivity data (hereinafter also referred to as differential directivity data) using at least one of the spatially adjacent difference method and the inter-frequency difference method, based on the supplied directivity data.
- This differential directivity data is the difference in directivity data, or directivity gain, between data points or between bins.
- the encoding method selection unit 213 generates encoding mode information based on the differential directivity data supplied from the model parameter estimation unit 211 via the residual calculation unit 212 .
- the Huffman encoding unit 214 Huffman-encodes the differential directivity data supplied from the model parameter estimation unit 211 via the residual calculation unit 212, by the differential encoding method indicated by the encoding mode information supplied from the encoding method selection unit 213, to generate coded directivity data.
- the model data generation unit 215 generates model data including the coded directivity data supplied from the Huffman encoding unit 214 and the encoding mode information supplied from the encoding method selection unit 213, and supplies the model data to the output unit 24.
- the information processing device 51 that has received the encoded bitstream supplied from the server 11 having the configuration shown in FIG. 19 performs, for example, the directivity data generation processing shown in FIG. and the output audio data generation processing described above.
- step S111 the same processing as the processing in step S51 of FIG. 12 is performed. That is, in step S111, the unpacking unit 81 unpacks the model data, and extracts model parameters, information on original directivity data before modeling, differential code data, and the like from the model data.
- step S112 the unpacking unit 81 determines whether or not there are model parameters that have not yet been supplied to the directivity data calculating unit 82 among the model parameters for each method extracted by the unpacking.
- step S112 If it is determined in step S112 that there are model parameters, the unpacking unit 81 supplies the directivity data calculating unit 82 with the model parameters that have not yet been supplied to the directivity data calculating unit 82, that is, have not yet been processed. Then, the process proceeds to step S113.
- in step S113, the directivity data calculation unit 82 calculates data based on the model parameters of one method supplied from the unpacking unit 81.
- in step S113, based on the model parameters for each method such as the HOA method and the mixed method, the directivity gains constituting the directivity data after modeling, the intermediate difference information, the amplitude modulation information, the amplitude phase modulation information, and the like are calculated as data based on the model parameters.
- step S113 After the processing of step S113 is performed, the processing returns to step S112, and the above-described processing is repeatedly performed.
- if it is determined in step S112 that there are no model parameters that have not been supplied to the directivity data calculation unit 82, the process proceeds to step S114.
- step S114 the unpacking unit 81 determines whether or not the model data supplied from the acquiring unit 61 contains differential code data, that is, whether or not there is differential code data.
- step S114 If it is determined in step S114 that differential encoded data is included, the unpacking unit 81 supplies the differential encoded data and the encoding mode information extracted from the model data to the differential information decoding unit 83. The process proceeds to step S115.
- step S115 the differential information decoding unit 83 acquires the encoding mode information and differential code data output from the unpacking unit 81.
- step S116 the difference information decoding unit 83 decodes the difference encoded data based on the obtained encoding mode information, and supplies the resulting difference information (directivity data residual) to the addition unit 84.
- suppose, for example, that the differential encoding mode information included in the encoding mode information indicates that encoding was performed using the spatially adjacent difference method.
- in this case, the difference information decoding unit 83 decodes the differential code data supplied from the unpacking unit 81 using the reverse lookup table specified by the table index information included in the encoding mode information, to obtain the spatial difference information for each data point.
- the difference information decoding unit 83 then adds the difference information of other, already decoded data points in the vicinity of the data point to the spatial difference information of the data point to be processed, thereby obtaining the difference information of the data point to be processed.
- step S116 If the process of step S116 has been performed, or if it is determined that there is no differential code data in step S114, then the process of step S117 is performed.
- step S117 the directivity data calculator 82 and the adder 84 calculate the directivity data.
- that is, the directivity data calculation unit 82 calculates rough directivity data based on the data obtained by the process of step S113 performed one or more times, and supplies it to the addition unit 84.
- suppose, for example, that the model parameters were calculated by the additive hybrid method (AH4) on the server 11 side.
- in this case, in the first process of step S113, post-modeling directivity data (rough directivity data) is calculated based on the model parameters of the HOA method. In the second process of step S113, post-modeling intermediate difference information is calculated based on the model parameters of the mixed method.
- the directivity data calculation unit 82 then adds the intermediate difference information to the rough directivity data, that is, adds the intermediate difference information for each bin at each data point to the directivity gain for the same bin at the same data point, thereby obtaining the final rough directivity data.
- the addition unit 84 adds the difference information (directivity data residual) supplied from the difference information decoding unit 83 to the final rough directivity data obtained by the directivity data calculation unit 82 in this manner, thereby calculating the directivity data, and supplies it to the frequency interpolation processing unit 85. If there is no difference information, the final rough directivity data is used as the directivity data as-is.
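The additive-hybrid reconstruction just described (rough directivity data plus intermediate difference plus residual) can be sketched as follows. The function name and the (num_points, num_bins) array shapes are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def decode_additive_hybrid(rough, intermediate_diff, residual=None):
    """Additive-hybrid (AH) reconstruction sketch: sum the rough
    directivity data from the first model stage, the modeled
    intermediate difference from the second stage, and (if present)
    the decoded residual, per bin at each data point.
    Arrays are assumed to be shaped (num_points, num_bins)."""
    data = np.asarray(rough) + np.asarray(intermediate_diff)
    if residual is not None:
        data = data + np.asarray(residual)  # addition unit 84's step
    return data
```

When no residual was transmitted, the sum of the two model stages is used directly, mirroring the "no difference information" branch above.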
- next, suppose that the model parameters were calculated by the multiplicative hybrid method (MH1) on the server 11 side.
- in this case, in the first process of step S113, post-modeling directivity data (rough directivity data) is calculated based on the model parameters of the HOA method. In the second process of step S113, post-modeling amplitude modulation information is calculated based on the model parameters of the mixed method.
- the directivity data calculation unit 82 multiplies the rough directivity data by the amplitude modulation information, that is, multiplies the directivity gain for each bin at each data point by the amplitude modulation information for the same bin at the same data point, thereby obtaining the final directivity data.
- in this case, the processing of steps S115 and S116 is not performed, and since there is no difference information, the directivity data obtained by the directivity data calculation unit 82 is supplied as-is to the frequency interpolation processing unit 85 via the addition unit 84.
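The multiplicative-hybrid reconstruction can be sketched in the same style; again the function name and array shapes are assumptions for illustration:

```python
import numpy as np

def decode_multiplicative_hybrid(rough, amplitude_modulation):
    """Multiplicative-hybrid (MH) reconstruction sketch: the
    directivity gain for each bin at each data point is multiplied
    by the decoded amplitude-modulation value for the same bin at
    the same data point. Arrays shaped (num_points, num_bins)."""
    return np.asarray(rough) * np.asarray(amplitude_modulation)
```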
- model data may also be generated by the difference method alone on the server 11 side.
- in that case, the process of step S113 is not performed, and the encoded directivity data is decoded by the difference information decoding unit 83 in steps S115 and S116.
- that is, the difference information decoding unit 83 decodes the encoded directivity data supplied from the unpacking unit 81 using the reverse lookup table specified by the table index information included in the encoding mode information, thereby obtaining differential directivity data.
- step S117 the difference information decoding unit 83 calculates directivity data based on the value (difference) for each bin of each data point forming the differential directivity data.
- specifically, for example, the difference information decoding unit 83 determines the directivity gain for each bin of the data point to be processed by adding the value (difference) for that bin to the directivity gain for the same bin of another already-restored data point in the vicinity of the data point to be processed.
- alternatively, the difference information decoding unit 83 obtains the directivity gain of the bin to be processed by adding the value (difference) of that bin to the directivity gain of another already-restored bin in the vicinity of the bin of interest at the same data point.
- thereafter, the process of step S118 is performed and the directivity data generation process ends; since this process is the same as described above, its explanation is omitted.
- the information processing device 51 calculates the directivity data based on the model data. By doing so, it is possible to reduce the transmission amount of directional data. As a result, occurrence of transmission delay and increase in transmission rate can be suppressed.
- the directivity data encoding unit 201 has a model parameter estimation unit 241, a calculation unit 242, a model parameter estimation unit 243, a calculation unit 244, a differential encoding unit 245, and a model data generation unit 215.
- the model parameter estimator 241 through the calculator 244 correspond to the model parameter estimator 211 in FIG.
- the model parameter estimation unit 241 models the supplied directivity data to be processed by the mixed method, supplies the resulting model parameters to the model data generation unit 215, and supplies the post-modeling directivity data obtained by the mixed method to the calculation unit 242.
- the calculation unit 242 calculates intermediate difference information by subtracting the post-modeling directivity data supplied from the model parameter estimation unit 241 from the supplied directivity data to be processed (obtaining the difference), and supplies it to the model parameter estimation unit 243 and the calculation unit 244.
- the model parameter estimation unit 243 models the intermediate difference information supplied from the calculation unit 242 by the HOA method, supplies the resulting model parameters to the model data generation unit 215, and supplies the post-modeling intermediate difference information obtained by the HOA method to the calculation unit 244.
- the calculation unit 244 calculates difference information by subtracting the post-modeling intermediate difference information supplied from the model parameter estimation unit 243 from the intermediate difference information supplied from the calculation unit 242 (obtaining the difference), and supplies it to the differential encoding unit 245.
- the differential encoding unit 245 generates encoding mode information and differential encoded data based on the difference information supplied from the calculation unit 244 and, as appropriate, the supplied directivity data to be processed, and supplies them to the model data generation unit 215.
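The residual chain computed by the calculation units 242 and 244 can be summarized in a short sketch. The two modeling stages are passed in as hypothetical placeholder callables standing in for the mixed-method and HOA-method modeling; their internals are not specified here:

```python
import numpy as np

def additive_hybrid_residuals(directivity, model_stage1, model_stage2):
    """Encoder-side residual chain sketch for the additive hybrid
    method. model_stage1 and model_stage2 are callables standing in
    for the mixed-method and HOA-method modeling stages (hypothetical
    placeholders); each returns the modeled approximation of its
    input. Returns the two modeled signals and the final residual
    that goes on to differential encoding."""
    modeled1 = model_stage1(directivity)           # estimation unit 241
    intermediate_diff = directivity - modeled1     # calculation unit 242
    modeled2 = model_stage2(intermediate_diff)     # estimation unit 243
    residual = intermediate_diff - modeled2        # calculation unit 244
    return modeled1, modeled2, residual
```

By construction, modeled1 + modeled2 + residual reproduces the input exactly, which is what lets the decoder's additive reconstruction be lossless up to the residual coding.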
- model parameter estimation unit 241 performs modeling by the mixed method and the model parameter estimation unit 243 performs modeling by the HOA method has been described.
- model parameter estimation unit 241 and the model parameter estimation unit 243 may use any method for modeling.
- for example, the model parameter estimation unit 241 may perform modeling by the HOA method, and the model parameter estimation unit 243 may perform modeling by the mixed method.
- the differential encoding unit 245 can have the configuration shown in FIG. 22, for example.
- portions corresponding to those in FIG. 19 are denoted by the same reference numerals, and description thereof will be omitted as appropriate.
- the differential encoding unit 245 has a residual calculating unit 212, an encoding method selecting unit 213, a multistage differential processing unit 271, and a Huffman encoding unit 214.
- the residual calculation unit 212 calculates difference information based on the supplied directivity data to be processed and on the post-modeling directivity data and intermediate difference information supplied from the model parameter estimation unit 241 and the model parameter estimation unit 243, and supplies it to the encoding method selection unit 213 and the multistage difference processing unit 271.
- the multistage difference processing unit 271 generates multistage difference information from either the difference information from the residual calculation unit 212 or the difference information from the calculation unit 244, in the differential encoding mode indicated by the encoding mode information supplied from the encoding method selection unit 213.
- for example, when Huffman coding is performed by the spatial adjacent difference method as the differential encoding mode, spatial difference information is obtained as the multistage difference information; when Huffman coding is performed by the inter-frequency difference method as the differential encoding mode, inter-frequency difference information is obtained as the multistage difference information.
- when Huffman coding is performed by both the spatial adjacent difference method and the inter-frequency difference method as the differential encoding mode, the information obtained by taking both the spatial difference information and the inter-frequency difference information becomes the multistage difference information.
- the multistage difference processing unit 271 supplies the obtained multistage difference information to the encoding method selection unit 213 and the Huffman encoding unit 214 .
- the encoding method selection unit 213 generates encoding mode information based on the supplied directivity data to be processed, the difference information supplied from the residual calculation unit 212 or the calculation unit 244, and the multistage difference information supplied from the multistage difference processing unit 271, and supplies it to the multistage difference processing unit 271, the Huffman encoding unit 214, and the model data generation unit 215.
- the Huffman encoding unit 214 Huffman-encodes the multistage difference information supplied from the multistage difference processing unit 271 based on the encoding mode information supplied from the encoding method selection unit 213, and supplies the resulting differential encoded data to the model data generation unit 215.
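The multistage differencing that precedes Huffman coding can be illustrated as below. The traversal orders (data-point axis for spatial, bin axis for inter-frequency) and the anchoring of the first value are simplifying assumptions:

```python
import numpy as np

def multistage_difference(data, spatial=True, inter_frequency=True):
    """Multistage differencing sketch for a (num_points, num_bins)
    array. Spatial differencing takes differences along the
    data-point axis, inter-frequency differencing along the bin
    axis; applying both yields the multistage difference information
    that would then be Huffman-coded. prepend=0.0 keeps the first
    value as its own anchor, so the transform is invertible with
    cumulative sums."""
    out = np.asarray(data, dtype=float)
    if spatial:
        out = np.diff(out, axis=0, prepend=0.0)
    if inter_frequency:
        out = np.diff(out, axis=1, prepend=0.0)
    return out
```

Because both stages are linear, the decoder can invert them with `np.cumsum` along the same axes in reverse order.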
- in step S151, the model parameter estimation unit 241 models the supplied directivity data to be processed using the mixed method.
- the model parameter estimation unit 241 supplies the model parameters obtained by modeling to the model data generation unit 215 and supplies the directivity data after modeling by the mixed method to the calculation unit 242 .
- in step S152, the calculation unit 242 calculates intermediate difference information based on the supplied directivity data to be processed and the post-modeling directivity data supplied from the model parameter estimation unit 241, and supplies it to the model parameter estimation unit 243 and the calculation unit 244.
- step S153 the model parameter estimation unit 243 models the intermediate difference information supplied from the calculation unit 242 by the HOA method.
- the model parameter estimation unit 243 supplies the model parameters obtained by modeling to the model data generation unit 215, and supplies the intermediate difference information after modeling by the HOA method to the calculation unit 244.
- in step S154, the calculation unit 244 calculates difference information based on the intermediate difference information supplied from the calculation unit 242 and the post-modeling intermediate difference information supplied from the model parameter estimation unit 243, and supplies it to the differential encoding unit 245.
- in step S155, the differential encoding unit 245 performs differential encoding based on the difference information supplied from the calculation unit 244.
- that is, the encoding method selection unit 213 of the differential encoding unit 245 generates encoding mode information based on the supplied directivity data to be processed, the difference information supplied from the calculation unit 244, and the multistage difference information supplied from the multistage difference processing unit 271 in previous processing, such as for the previous frame, and supplies it to the multistage difference processing unit 271, the Huffman encoding unit 214, and the model data generation unit 215.
- the encoding method selection unit 213 may use the difference information supplied from the residual calculation unit 212 to generate the encoding mode information.
- the multistage difference processing unit 271 generates multistage difference information based on, for example, the difference information supplied from the calculation unit 244 and the encoding mode information supplied from the encoding method selection unit 213, and supplies it to the encoding method selection unit 213 and the Huffman encoding unit 214.
- the Huffman encoding unit 214 Huffman-encodes the multistage difference information supplied from the multistage difference processing unit 271 based on the encoding mode information supplied from the encoding method selection unit 213, and supplies the resulting differential encoded data to the model data generation unit 215.
- in step S156, the model data generation unit 215 generates model data by packing and supplies it to the output unit 24.
- that is, the model data generation unit 215 generates model data containing the model parameters of the mixed method from the model parameter estimation unit 241, the model parameters of the HOA method from the model parameter estimation unit 243, the encoding mode information from the encoding method selection unit 213, and the differential encoded data from the Huffman encoding unit 214. When the model data has been generated in this manner, the model data generation process ends.
- in this manner, the directivity data encoding unit 201 generates model data by the additive hybrid method. By doing so, it is possible to reduce the transmission amount of directivity data and suppress the occurrence of transmission delays and an increase in the transmission rate.
- the distribution model decoding unit 62 of the information processing device 51 has the configuration shown in FIG. 24, for example.
- parts corresponding to those in FIG. 11 are denoted by the same reference numerals, and description thereof will be omitted as appropriate.
- calculation units 301 and 302 correspond to directivity data calculation unit 82 shown in FIG.
- the calculation unit 301 calculates the directivity data (approximate directivity data) after modeling by the mixed method based on the model parameters of the mixed method supplied from the unpacking unit 81 , and supplies the calculated directivity data to the calculation unit 304 .
- the calculation unit 302 calculates intermediate difference information after modeling by the HOA method based on the model parameters of the HOA method supplied from the unpacking unit 81 , and supplies it to the calculation unit 303 .
- the differential information decoding unit 83 calculates differential information (directivity data residual) based on the encoding mode information and the differential encoded data supplied from the unpacking unit 81 and supplies it to the computing unit 303 .
- the calculation unit 303 adds (combines) the difference information supplied from the difference information decoding unit 83 and the intermediate difference information supplied from the calculation unit 302, and supplies the addition result (difference information) to the calculation unit 304.
- the calculation unit 304 adds the directivity data (rough directivity data) supplied from the calculation unit 301 and the addition result (difference information) supplied from the calculation unit 303, and supplies the resulting directivity data to the frequency interpolation processing unit 85.
- in this configuration, in the directivity data generation process described above, the directivity data (rough directivity data) is calculated by the calculation unit 301 in the first step S113, and the intermediate difference information is calculated by the calculation unit 302 in the second step S113.
- the difference information decoding unit 83 performs the processing of steps S115 and S116 to generate difference information, and in step S117, addition processing is performed by the calculation units 303 and 304 to generate directivity data.
- the configuration of the model data described above is not limited to the configuration shown in FIG. 5 or those shown in FIGS. 15 and 16; it can also be, for example, the configuration shown in FIG. 25.
- "bslbf" denotes "bit string, left bit first", that is, a bit string with the left bit first.
- "uimsbf" denotes "unsigned integer, most significant bit first", that is, an unsigned integer with the most significant bit first.
- the model data shown in FIG. 25 stores the frequency point count "bin_count", which indicates the number of frequency bins, and as many center frequencies "bin_freq[i]" of the frequency bins as the frequency point count "bin_count".
- in addition, for each band, the mixture count "mix_count[j]", which indicates the number of distributions making up the mixture model in the band, and the bin information "bin_range_per_band[j]", which indicates the bins included in the band, are stored.
- further, the concentration parameter κ, the weight πi, the mean direction vector γ1, and the distribution selection flag "dist_flag" are stored as model parameters for the number of mixtures "mix_count[k]".
- "kappa[j][k]” indicates parameter concentration ⁇
- "weight[j][k]” indicates weight ⁇ i
- "gamma_x[j][k]”, “gamma_y[j][k]”, and “gamma_z[j][k]” are the X component (X coordinate) and Y component ( Y coordinate), and the Z component (Z coordinate).
- when the selection flag "dist_flag" is "1", that is, when the distribution is the Kent distribution, the ellipticity β, the major axis vector γ2, and the minor axis vector γ3 are also stored.
- "beta[j][k]" denotes the ellipticity β.
- "gamma2_x[j][k]", "gamma2_y[j][k]", and "gamma2_z[j][k]" indicate the X, Y, and Z components of the major axis vector γ2.
- "gamma3_x[j][k]", "gamma3_y[j][k]", and "gamma3_z[j][k]" indicate the X, Y, and Z components of the minor axis vector γ3.
- the model data also contains, for each bin, the scale factor "scale_factor[i]", which indicates the dynamic range of the directivity gain, and the offset value of the directivity data, that is, the minimum value "offset[i]".
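The stored parameters correspond to a mixture of von Mises-Fisher and Kent distributions on the sphere. A hedged sketch of evaluating such a mixture at a unit direction x follows; normalization constants, and the dequantization via the scale factor and offset, are omitted for brevity, so this is an unnormalized illustration rather than the patent's exact formula:

```python
import numpy as np

def mixture_gain(x, components):
    """Evaluate an (unnormalized) per-bin mixture model F(x; theta)
    at a unit direction x. Each component dict carries the stored
    parameters: weight, kappa, gamma1 (mean direction), and, for
    Kent components (dist_flag == 1), also beta, gamma2 (major
    axis), and gamma3 (minor axis)."""
    x = np.asarray(x, dtype=float)
    total = 0.0
    for c in components:
        exponent = c["kappa"] * np.dot(c["gamma1"], x)
        if c.get("dist_flag") == 1:  # Kent distribution term
            exponent += c["beta"] * (np.dot(c["gamma2"], x) ** 2
                                     - np.dot(c["gamma3"], x) ** 2)
        total += c["weight"] * np.exp(exponent)
    return total
```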
- model data also contains information for identifying the position of each data point.
- suppose the information processing device 51 uses the decoded directivity data when performing rendering processing. What is required in this case is not only the values (directivity gains) at the data points described in the original directivity data, but also the directivity gains at the positions (azimuths) used during the rendering processing.
- data: directivity gain
- grid data arrangement: a data arrangement in which data points are arranged at grid points obtained by equally dividing the latitude and longitude on the surface of a sphere
- the uniform data arrangement referred to here is a data arrangement in which a plurality of data points are uniformly arranged on the surface of a sphere centered on the sound source position, as shown in FIG. 26, for example.
- uniform data placement places the data points at a constant density over any area on the surface of the sphere.
- in FIG. 26, each point on the spherical surface represents a data point; the data points are arranged at a constant density in every direction as viewed from the sound source position, and data (directivity gains) are recorded at those data points.
- Recording directivity data with such a uniform data arrangement is particularly effective when the direction of the listener (user) seen from the sound source changes evenly over time.
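One standard construction that approximates such constant-density placement on a sphere is the Fibonacci (golden-angle) lattice. The patent does not mandate this particular construction, so the sketch below is purely illustrative:

```python
import math

def fibonacci_sphere(n):
    """Approximately uniform data-point arrangement on the unit
    sphere (Fibonacci lattice): n points with near-constant density
    in every direction from the center (the sound source position)."""
    points = []
    golden = math.pi * (3.0 - math.sqrt(5.0))  # golden angle in radians
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n          # even spacing in z
        r = math.sqrt(max(0.0, 1.0 - z * z))   # radius of the z-slice
        theta = golden * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points
```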
- non-uniform data arrangement is data arrangement in which multiple data points are non-uniformly arranged on a spherical surface centered on the sound source position.
- non-uniform data arrangement places the data points at different densities in different regions on the surface of the sphere. In that sense, the grid data arrangement can be said to be one example of non-uniform data arrangement, but in the following, the non-uniform data arrangement is taken not to include the grid data arrangement.
- in the non-uniform data arrangement, for example, it is conceivable to place data points densely in the region on the sphere centered on the sound source position that corresponds to the front direction of the sound source, which is important for hearing, or in regions corresponding to orientations where the user's viewpoint and the sound source are likely to be close to each other in positional relationship. In the non-uniform data arrangement, it is also conceivable to place data points densely in regions of high directivity gain.
- furthermore, it is conceivable to place data points (i.e., directivity gains) densely in regions where the amount of change in directivity gain is large, either over the whole or in important regions on the spherical surface centered on the sound source position.
- the priority of the directivity data may be determined based on the priority of the sound source type of the object in the content in which the directivity data is used.
- for example, the higher the priority of a sound source type, the more bits are allocated to the description of the directivity data for that sound source type. That is, it is conceivable that more data points are provided for the directivity data of a sound source type with higher priority, so that the directivity data is recorded with high definition.
- FIG. 27 shows an example of a description format (Syntax) of information for specifying the position of each data point.
- each data point is arranged on the surface of a sphere centered at the sound source position.
- the present invention is not limited to this, and the distance from the sound source position to the data point may be different for each data point.
- position_type is information indicating the arrangement format (arrangement method) of data points, that is, the coordinate recording method.
- for example, in the case of the grid data arrangement, the value of the coordinate recording method "position_type" is set to "0x000"; in the case of the uniform data arrangement, it is set to "0x001"; and in the case of the non-uniform data arrangement, it is set to "0x010".
- Priority_index is the priority of directional data, more specifically, priority information indicating the priority of directional data. For example, since the directivity data is prepared for each type of object, that is, for each sound source type, it can be said that the priority information indicates the priority of the directivity data for each type of sound source (object). This priority may change over time.
- on the decoding side, that is, in the information processing device 51, the data points of the directivity data may be restored (decoded) without reducing the spatial resolution that the data had before modeling (before encoding).
- that is, in the distribution model decoding unit 62, directivity data having the same data point positions and the same number of data points as before modeling may be calculated based on the model data.
- the density (number) of data points forming the directional data may be determined, for example, according to the priority of the directional data.
- azimuth_interval indicates the angle (difference in azimuth angle) indicating the azimuth direction interval between data points adjacent to each other in the azimuth direction on the surface of the sphere.
- the elevation angle interval "elevation_interval” indicates an angle (elevation angle difference) that indicates the elevation angle interval between data points that are adjacent to each other in the elevation direction on the surface of the sphere.
- at least one reference position, such as the position in the front direction viewed from the sound source position, is known as a data point arrangement position on the information processing device 51 side. Therefore, from these azimuth and elevation intervals and the predetermined reference position, the positions of all data points can be specified.
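Reconstructing the grid positions from the two intervals and a known reference can be sketched as follows. The reference convention assumed here (front direction at azimuth 0 and elevation 0, with elevation spanning -90 to 90 degrees and azimuth covering [0, 360)) is an illustrative assumption, not fixed by the text:

```python
def grid_positions(azimuth_interval, elevation_interval):
    """Reconstruct grid data-point positions (azimuth, elevation in
    degrees) from the stored angular intervals, assuming the front
    direction (0, 0) as the known reference position."""
    azimuths = [a * azimuth_interval
                for a in range(int(360 // azimuth_interval))]
    elevations = [-90 + e * elevation_interval
                  for e in range(int(180 // elevation_interval) + 1)]
    # one data point per (azimuth, elevation) grid crossing
    return [(az, el) for el in elevations for az in azimuths]
```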
- alternatively, in the case of the uniform data arrangement, the arrangement positions of the data points are known for each number of data points, so the positions of all data points can be specified from the number of data points.
- when the coordinate recording method "position_type" is "0x010", that is, in the case of the non-uniform data arrangement, the number of mandatory data points "num_mandatory_point", and azimuth data "azimuth_table[i]" and elevation data "elevation_table[i]" indicating the positions of the mandatory data points, for the number of mandatory data points, are described (stored).
- in addition, the data point arrangement resolution "gain_resolution", which indicates the resolution for arranging data points, in other words the arrangement density of data points, is also described (stored).
- the data point placement resolution “gain_resolution” is a decibel value that indicates the amount of variation in data (directivity gain).
- data points are set for each variation amount of the directional gain indicated by the data point arrangement resolution "gain_resolution”. That is, the number of data points in the directivity data obtained by decoding changes according to the data point arrangement resolution.
- the data points that always exist (are arranged) regardless of the data point arrangement resolution, that is, the data points that are always restored at the time of decoding are regarded as essential data points.
- a mandatory data point number “num_mandatory_point” indicating the number of mandatory data points is described.
- azimuth angle data "azimuth_table[i]” and the elevation angle data “elevation_table[i]” are the azimuth angle and elevation angle indicating the positions (coordinates) of the required data points in the azimuth direction and elevation direction, respectively.
- azimuth data "azimuth_table[i]” and elevation data “elevation_table[i]” can be used to specify the arrangement position of each essential data point.
- the azimuth data and elevation data are not limited to coordinates, that is, azimuth and elevation; they may be any other information, such as an index from which the azimuth and elevation can be obtained, as long as the information can specify the positions of the mandatory data points.
- the arrangement positions of the data points other than the mandatory data points in the directivity data are specified based on the arrangement positions of the mandatory data points and the data point arrangement resolution "gain_resolution".
- a mixture model F(x; ⁇ ) for each bin is obtained based on model data, more specifically model parameters.
- This mixture model F(x; ⁇ ) gives the value of directivity gain at any position on the spherical surface surrounding the sound source position.
- then, data points other than the mandatory data points (hereinafter also referred to as non-mandatory data points) are placed on the surface of the sphere.
- for example, the positions of the non-mandatory data points are positions on the spherical surface where the directivity gain value given by the mixture model F(x; θ) differs from the directivity gain value at a mandatory data point by the amount of variation indicated by the data point arrangement resolution "gain_resolution".
- alternatively, non-mandatory data points may be set as follows.
- the non-essential data points may be arranged at intervals corresponding to the amount of variation indicated by the data point arrangement resolution with respect to the essential data points.
- a number of non-essential data points corresponding to the data point placement resolution may be arranged at equal intervals between essential data points adjacent to each other in the azimuth and elevation directions.
- in this way, the arrangement positions of all the data points that make up the directivity data in the non-uniform arrangement, that is, the arrangement positions of all the mandatory data points and non-mandatory data points, are specified.
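Placing non-mandatory data points by gain variation can be illustrated with a deliberately simplified one-dimensional sketch, in which the mixture-model gain has already been sampled along a path between two mandatory points (the reduction to one dimension is an assumption made for brevity):

```python
def nonmandatory_points(gain_curve, gain_resolution_db):
    """gain_curve is a list of (position, gain_db) samples along a
    path on the sphere, starting at a mandatory data point. A new
    non-mandatory data point is placed each time the gain given by
    the mixture model has moved by gain_resolution_db from the gain
    at the last placed point."""
    if not gain_curve:
        return []
    placed = []
    last_gain = gain_curve[0][1]  # first sample is the mandatory point
    for pos, gain in gain_curve[1:]:
        if abs(gain - last_gain) >= gain_resolution_db:
            placed.append(pos)
            last_gain = gain
    return placed
```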
- in the information processing device 51 on the decoding side, the spatial resolution of the directivity data, that is, the number of data points, may be changed according to the value of the priority "priority_index".
- as a method of adjusting the spatial resolution of the directivity data, that is, the data amount of the directivity data obtained by decoding, for example, a method of multiplying the azimuth direction interval "azimuth_interval" or the elevation direction interval "elevation_interval" by the value of the priority "priority_index" is conceivable.
- a method of multiplying the number of data points "uniform_dist_point_count" by the reciprocal of the value of the priority "priority_index", or a method of multiplying the data point arrangement resolution "gain_resolution" by the value of the priority "priority_index", is also conceivable.
- the information processing device 51 can obtain directivity data with an appropriate spatial resolution. That is, the spatial resolution (number of data points) of the directional data can be adjusted appropriately.
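One of the adjustment schemes mentioned above, scaling the angular intervals by the priority value, can be sketched as follows. The convention that a larger "priority_index" value coarsens the spacing (fewer data points for lower priority) is an assumption for illustration:

```python
def scale_spatial_resolution(priority_index, azimuth_interval,
                             elevation_interval):
    """Sketch of adjusting spatial resolution by priority: multiply
    the stored angular intervals by the priority value so that
    lower-priority directivity data is decoded with coarser spacing.
    The exact scaling rule is an example, not fixed by the text."""
    return (azimuth_interval * priority_index,
            elevation_interval * priority_index)
```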
- the information for specifying the position of each data point (hereinafter also referred to as data point position information) may be stored in the model data with the configuration shown in FIG. 27, for example.
- when the model data includes the data point position information configured as shown in FIG. 27, model data including each piece of such information, that is, model data including the data point position information, is generated in step S12 of the encoding process described above.
- model data including data point position information may be generated by the model data generation unit 215 .
- in this case, difference information and the other pieces of information are calculated for each data point of the post-decoding directivity data, that is, for each data point specified by the data point position information.
- then, in step S52 of the directivity data generation process described above, directivity data is generated based on the model data including the data point position information.
- the directivity data calculation unit 82 identifies the data point arrangement format (coordinate recording system) based on the data point position information included in the model data, and determines the arrangement position of each data point in the directivity data. Identify. At this time, the directivity data calculator 82 also uses the priority information of the directivity data to identify the arrangement positions of the data points as necessary.
- the directivity data calculation unit 82 then calculates the output value F(x; θ) of the mixture model for each bin at each data point, based on the model parameters and on the scale factor and minimum value (offset) for each bin. This yields rough directivity data consisting of the directivity gain for each bin at each data point.
- in the subsequent processing as well, when the model data includes the data point position information, the results of specifying the data point positions from that information are used as appropriate in the directivity data generation process.
- the spatial adjacent difference method and the inter-frequency difference method have been described as the differential encoding methods.
- in the inter-frequency difference method, the differences in difference information and directivity gain are obtained between adjacent bins, that is, between adjacent frequencies.
- the directivity gain values are close between adjacent frequencies (bins), that is, the shape of the directivity data is close.
- in the spatial adjacent difference method, the differences in difference information and directivity gain are obtained between adjacent data points, that is, between adjacent positions.
- the directivity data has a characteristic that the difference in directivity gain between spatially close positions is small. That is, in the directivity data, the directivity gain on the surface of the sphere often changes continuously, and the closer the positions (azimuths), the closer the directivity gain values are.
- HRTF: Head-Related Transfer Function
- SOFA: Spatially Oriented Format for Acoustics
- data points are arranged at adjacent longitude positions along the circumference. At this time, the data points are arranged, for example, at regular intervals around the circle.
- then, while sequentially changing the latitude value, data points are arranged at each longitude position on the circumference corresponding to that latitude, so that data points are provided over the entire surface of the sphere.
- directivity data of a method such as grid data arrangement can be obtained.
- in such a grid data arrangement, the data density, that is, the density of data points, increases around the poles, such as the south and north poles.
- when actually recording directivity data (directivity gains) as described above, it is important to record with a data distribution in which the data (data points) are dense in important azimuths where changes in directivity gain need to be recorded with high precision, or with a data distribution that is uniform (a uniform distribution) as a whole.
- the important azimuth here is, for example, the frontal direction, a direction that is often used during rendering, a direction with a large directivity gain value, and the like.
- differential encoding may be performed after sorting (rearranging) the data as follows.
- Method DE1: Differential encoding with data points sorted by a given criterion
- Method DE2: Differential encoding with directivity gain decibel values sorted in ascending or descending order
- Method DE3: Differential encoding with data points sorted in order of priority, starting from the highest-priority orientation
- in method DE1, the data points, that is, the differential information and directivity gains at the data points, are rearranged (sorted) in a predetermined order for data arrangements such as the grid data arrangement, uniform data arrangement, and non-uniform data arrangement.
- the difference information and the directivity gain difference are obtained between the data points adjacent to each other after sorting.
- the order of sorting is known on the decoding side, that is, on the information processing device 51 side.
- the data points are sorted in ascending or descending order of the values (decibel values (dB values)) for which differences such as differential information and directional gain are to be calculated at those data points.
- differential information and directivity gain differences are obtained between mutually adjacent data points after sorting. By doing so, the difference information and the difference in directivity gain between data points can be made smaller.
- information indicating the sorting order of data points after sorting is stored in the model data so that the sorting order can be specified on the decoding side (information processing device 51 side).
- the data point position information shown in FIG. 27 may store information indicating the arrangement order of the data points after sorting.
- the information indicating the sort order of the data points after sorting may be anything, such as information obtained by arranging the indexes indicating each data point in sort order.
- in method DE3, the data points are sorted in order from the azimuth with the highest priority, such as the frontal azimuth or an azimuth with a large directivity gain. Differential information and directivity gain differences are obtained between data points that are adjacent to each other after sorting. As a result, the amount of data, such as difference information, that has undergone differential encoding can be kept within a predetermined number of bits.
- method DE3 as in method DE2, information indicating the order of data points after sorting is stored in the model data.
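Method DE2 can be sketched as follows, under the assumption that the decoder receives the sort order alongside the differences; all names are illustrative.

```python
# Sketch of method DE2: sort dB values in ascending order, difference
# neighbours, and keep the sort order (indexes) so the decoder can undo
# the permutation. Names are illustrative assumptions.

def encode_sorted_diff(gains_db):
    order = sorted(range(len(gains_db)), key=lambda i: gains_db[i])
    s = [gains_db[i] for i in order]
    diffs = [s[0]] + [s[i] - s[i - 1] for i in range(1, len(s))]
    return order, diffs  # 'order' would be stored in the model data

def decode_sorted_diff(order, diffs):
    # Rebuild the sorted values by cumulative sum, then undo the permutation.
    s, acc = [], 0.0
    for d in diffs:
        acc += d
        s.append(acc)
    out = [0.0] * len(order)
    for pos, idx in enumerate(order):
        out[idx] = s[pos]
    return out
```

After ascending sorting, every difference except the first is non-negative and small, which is what makes the differential coding cheaper.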
- the differential information and directivity gains are sorted in a predetermined order of data points and frequencies (bins), and the differences between adjacent differential information and directivity gains after sorting, i.e., the differences between data points and between bins, are determined. Note that after sorting is performed in a predetermined order, differences may be obtained both between data points and between bins, or only between bins.
- in that case, the difference between adjacent difference information and directivity gains after sorting, that is, between data points or between bins, is obtained.
- the differential information and directivity data in the bins of each data point are sorted according to the priority of the data points and frequencies (bins), and the difference between adjacent values after sorting, i.e., between data points or between bins, is obtained.
- the data points or bins are sorted in order of priority.
- Sorting may also be performed in groups of one or more bins or data points.
- each variable (information) in the encoded bitstream such as in model data, may be tabulated, and only the index indicating the value of the variable after tabulation may be transmitted.
- when the floating-point format is used for recording variable values, the variable can take any value representable in the float (32-bit) format.
- the syntax may be written in the following manner.
- variable values (parameters) to be described often take specific values or can be represented by specific values
- the values actually used, that is, the variable values to be described, are tabulated. Then, only the indexes obtained by tabulation are described in the coded bitstream such as the model data, that is, in the Syntax.
- the table itself is transmitted to the decoding side separately from the encoded bitstream.
- the variable value can be described with a small number of bits, and the data amount (transmission amount) of the encoded bitstream can be reduced.
- in some cases, only part of the range of the variable values is actually used, such as only the range of 0.0 to 0.1 or only the range of 0.9 to 1.0, and tabulation may then be performed on that limited set of values.
- the actual variable value is stored in the model data or the like and transmitted.
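The tabulation idea can be sketched as follows; the helper names are assumptions.

```python
# Sketch of transmitting table indexes instead of raw float values:
# build a table of the variable values actually used and send only the
# per-occurrence indexes. The table itself is transmitted separately.

def build_table(values):
    table = sorted(set(values))                 # distinct values actually used
    index = {v: i for i, v in enumerate(table)}
    return table, [index[v] for v in values]    # table + index stream

def lookup(table, indexes):
    """Decoder side: recover the variable values from the indexes."""
    return [table[i] for i in indexes]
```

When few distinct values occur, each index needs far fewer bits than a 32-bit float, reducing the transmission amount.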
- F'(x; ⁇ ) is the output value of the mixture model for each band.
- the scale factor "scale_factor[i]" is the ratio of the sum of the values at each data point of the original directivity data before modeling to the sum of the values (directivity gain) at each data point of the mixture model F'(x; θ), that is, the sum of the vMF distribution and Kent distribution (the model data sum), in the bin indicated by index i, i.e., the i-th bin.
- This scale factor is a float value that represents the dynamic range.
- model data sum is the sum of values (directivity gain) defined on the spherical surface and is ideally 1, but it is not 1 because it is actually discretized.
- original directivity data before modeling is dB scale data, and is offset in the positive direction when calculating the scale factor.
- the minimum value "offset[i]" is the minimum value of the directivity gain (dB value) of the original directivity data before modeling in the i-th bin, expressed as a float value.
- the output values of the mixture model can be corrected and restored according to the dynamic range of each bin.
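A minimal sketch of the per-bin correction, assuming the restored gain is obtained as model output × scale factor + offset (function names are illustrative):

```python
# Sketch of the per-bin dynamic-range correction: the mixture-model output
# is scaled by scale_factor[i] and shifted by the minimum value offset[i]
# to restore the dB-scale directivity gain of bin i. Names are assumptions.

def restore_bin(model_output, scale_factor, offset):
    """Decoder side: correct the mixture-model output for one bin."""
    return [v * scale_factor + offset for v in model_output]

def derive_scale_and_offset(original_db, model_output):
    """Encoder side: the offset is the minimum of the original dB data
    (offsetting it in the positive direction), and the scale factor is the
    ratio of the offset-removed data sum to the model data sum."""
    offset = min(original_db)
    shifted = [v - offset for v in original_db]
    scale = sum(shifted) / sum(model_output)
    return scale, offset
```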
- scale factors and minimum values are required for the number of bins. If the frequency resolution of the directivity data is made high-definition, the amount of information required to record the scale factors and minimum values, that is, the number of bits, grows in proportion to the number of bins.
- the amount of information (number of bits) required to record the scale factor and minimum value may be reduced by parametrically expressing the scale factor and minimum value.
- values shown in FIGS. 28 and 29 are obtained as scale factors and minimum values (offset values) for directivity data of six sound source types.
- FIG. 28 shows scale factors for each of the six sound source types.
- the vertical axis indicates the value of the scale factor, which is a dimensionless ratio
- the horizontal axis indicates the bin index i.
- depending on the sound source type, the scale factor may fluctuate greatly between adjacent bins, or it may fluctuate only slightly.
- FIG. 29 shows the minimum value (offset value) for each of the six sound source types.
- the vertical axis indicates the minimum value (offset value) in dB
- the horizontal axis indicates the bin index i.
- the minimum value may fluctuate greatly between adjacent bins, or may fluctuate less, depending on the sound source type.
- if the variation between bins is large and the encoding efficiency cannot be improved by a parametric expression of the scale factor or the minimum value, the model data generation unit 22 and the model data generation unit 215 store (describe) the scale factor or minimum value of each bin in the model data as it is.
- otherwise, the model data generation unit 22 and the model data generation unit 215 parameterize the scale factor or the minimum value and store (describe) it in the model data.
- the model data generation unit 22 and the model data generation unit 215 generate function approximation parameters for obtaining an approximation function corresponding to the graph representing the scale factor or minimum value of each bin by curve fitting or the like. Then, the model data generation unit 22 and the model data generation unit 215 store the function approximation parameter in the model data instead of the scale factor or minimum value of each bin.
- the directivity data calculation unit 82 and the calculation unit 301 obtain the scale factor or minimum value in each bin from the approximate function based on the function approximation parameter and the bin index i, and use it as a model parameter.
- with function approximation, instead of having to store the scale factors and minimum values of all bins in the model data, only the function approximation parameters need to be described, and the amount of data can be compressed.
- for the function approximation, any approximation can be used, such as approximation by a linear function, an n-th order function (n ≥ 2), or polynomial approximation.
- pre-processing for function approximation includes taking the logarithm of the scale factor and minimum value, and converting the scale factor and minimum value using a nonlinear function. By doing so, the dynamic range may be compressed.
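A sketch of the parametric expression using a simple linear least-squares fit over the bin index; the choice of a linear model and all names are assumptions, not the patent's method.

```python
# Sketch of parameterising the per-bin scale factor (or minimum value):
# instead of storing scale_factor[i] for every bin, store two function
# approximation parameters (slope, intercept) and reconstruct the value
# from the bin index i. A linear fit is assumed here for illustration.

def fit_linear(ys):
    """Least-squares line through (i, ys[i]); encoder side."""
    n = len(ys)
    xs = range(n)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def eval_linear(slope, intercept, i):
    """Decoder side: recover the approximate value for bin index i."""
    return slope * i + intercept
```

Two parameters then replace one value per bin, which is where the compression comes from when the number of bins is large.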
- as examples of methods for generating model data by combining the HOA method, the mixture method, the complex mixture method, and the difference method, the band hybrid method, additive hybrid method, multiplicative hybrid method, spherical harmonic coefficient modeling method, and combined hybrid method have been described.
- model data may be generated for each region (orientation) by any of the above-mentioned HOA method, mixture method, complex mixture method, difference method, band hybrid method, additive hybrid method, and so on.
- horizontal plane data, that is, data on the equator (directivity gain), is most likely to be used frequently, while data near the poles is likely to be used less frequently. Therefore, by switching the method for each area, it is possible to appropriately reduce the number of bits of model data.
- the horizontal plane here is a plane including a plurality of positions where the latitude, that is, the elevation angle, is 0 degrees as viewed from the sound source position.
- for example, the HOA method and the mixture method, more specifically, the method of modeling with the vMF distribution, may be combined.
- the degree of spherical harmonic expansion in the HOA method may be set to the first order, and the combination of the HOA method and the mixed method may be used or only the mixed method may be used for each region (orientation).
- it is also conceivable to record data points near the horizontal plane with high precision using a method that models the directivity data by circular harmonic function expansion instead of spherical harmonic function expansion, and to record the directivity gain sparsely for data points other than those near the horizontal plane by some other method.
- the shape of a speaker as a sound source is symmetrical, and the directivity data of the speaker is also symmetrical.
- the data are not vertically symmetrical.
- regular dodecahedral loudspeakers and the like have also been commercialized, and regular dodecahedral loudspeakers have symmetry with respect to 12 directions.
- in the case of a cubic full-range speaker, not only left-right symmetry but also up-down symmetry may be established.
- humans also have a laterally symmetrical external shape, and although lateral symmetry is established to some extent, the top and bottom are not symmetrical with the head, torso, and legs, and the directivity is not vertically symmetrical.
- the syntax of the model data will be as shown in Fig. 30, for example.
- the model data shown in FIG. 30 includes the number of frequency points "bin_count" indicating the number of bins, and the center frequencies "bin_freq[i]" of the bins are stored for the number of frequency points "bin_count".
- the values "4", "3", "2", "1", and "0" of the symmetry information "use_symmetry" indicate, respectively, exploiting vertical and horizontal symmetry, exploiting horizontal (left-right) symmetry, exploiting vertical symmetry, exploiting arbitrary symmetry and rotation, and performing no symmetry or rotation operations.
- when no symmetry or rotation operation is performed, the directivity data is described by a model such as the above-mentioned vMF distribution or Kent distribution, that is, a mixture model, for the directivity gain in all directions. Also, the values "5" to "7" of the symmetry information "use_symmetry" are reserved.
- the model data stores operation-related information for rotation or symmetry operations according to the value of the symmetry information "use_symmetry".
- when the value of the symmetry information "use_symmetry" is "4", the model data describes operation-related information "LeftRightVerticalLineSymmetricDir()" for vertically and horizontally symmetrical operations. When the value of the symmetry information "use_symmetry" is "3", the model data describes operation-related information "LeftRightLineSymmetricDir()" for left-right symmetrical operations.
- when the value of the symmetry information "use_symmetry" is "2", the model data describes operation-related information "VerticalLineSymmetricDir()" for vertically symmetrical operations.
- when the value of the symmetry information "use_symmetry" is "1", the model data describes operation-related information "SymmetricDir()" for arbitrary symmetrical or rotational operations.
- Fig. 31 shows the syntax of "SymmetricDir()".
- "SymmetricDir()" stores the number of mixtures "mix_count[j]", the bin information "bin_range_per_band[j]", the model parameters "kappa[j][k]", "weight[j][k]", "gamma_x[j][k]", "gamma_y[j][k]", and "gamma_z[j][k]", and the selection flag "dist_flag[j][k]".
- the operation count information "sym_operation_count” performs a rotation operation, which is an operation to rotate and copy, or a symmetric operation, which is an operation to copy to a symmetrical position, for one distribution (distribution model) such as vMF distribution or Kent distribution. This is information indicating the number of times.
- the operation flag "sym_operation_flag” is flag information indicating whether to perform a rotation operation or a symmetrical operation. For example, when the value of the operation flag "sym_operation_flag" is "1", it indicates that a rotation operation is to be performed, and when the value is "0", it indicates that a symmetrical operation is to be performed.
- the operation flag "sym_operation_flag" is included for the number of times indicated by the operation count information "sym_operation_count", and the information necessary for the operation is stored according to the value of the operation flag.
- the rotation axis azimuth angle "sym_azi” and the rotation axis elevation angle “sym_elev” are the azimuth angle and elevation angle that indicate the direction of the rotation axis as seen from the sound source position when performing the rotation operation. That is, the rotation axis is determined by these rotation axis azimuth angle and rotation axis elevation angle. Also, the rotation angle “sym_rotation” is the angle when rotating around the rotation axis in the rotation operation.
- FIGS. 32 and 33 are examples in which a rotation operation and a symmetrical operation are performed on the Kent distribution.
- FIG. 32 shows an example of rotating the Kent distribution.
- the directivity gain on the sphere SP11 is represented by the Kent distribution
- vectors V81 to V83 represent vector ⁇ 1 , major axis vector ⁇ 2 , and minor axis vector ⁇ 3 of the Kent distribution.
- these vectors V81 through V83 are obtained from the model parameters stored in the model data, that is, "gamma_x[j][k]" through "gamma_z[j][k]" and "gamma2_x[j][k]" through "gamma2_z[j][k]".
- the directivity data calculation unit 82 of the information processing device 51 obtains the rotation axis RS11 based on the rotation axis azimuth angle "sym_azi” and the rotation axis elevation angle "sym_elev” read from the model data.
- the directivity data calculator 82 uses the vectors V81 to V83 to find the Kent distribution f(x; ⁇ i ).
- the directivity data calculator 82 obtains the Kent distribution f(x; ⁇ i ) using the vectors V′81 to V′83.
- vectors V'81 through V'83 are the rotated vectors obtained by rotating vectors V81 through V83 about the rotation axis RS11 by the rotation angle "sym_rotation" stored in the model data.
- vectors V'81 to V'83 are used as the Kent distribution vector ⁇ 1 , major axis vector ⁇ 2 , and minor axis vector ⁇ 3 .
- in this way, the directivity data calculation unit 82 calculates rotated model parameters by performing a rotation operation, based on the rotation axis azimuth angle and the like, on the model parameters such as the vector γ1 of the Kent distribution. Then, the directivity data calculation unit 82 obtains a Kent distribution based on each of the pre-rotation model parameters and the rotated (post-rotation) model parameters; that is, the directivity data (directivity gain) is calculated. In other words, one distribution is obtained by synthesizing the Kent distribution obtained from the model parameters before the rotation operation and the Kent distribution obtained from the model parameters after the rotation operation.
- the two Kent distributions may be used as they are to calculate the mixture model, or only a partial region of each of the two Kent distributions, such as the right half or the left half, may be used to calculate the mixture model. This applies not only to rotational operations but also to symmetrical operations.
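The rotation operation on the Kent-distribution vectors can be sketched with Rodrigues' rotation formula; the angle conventions (radians, axis direction derived from azimuth and elevation) are assumptions, not the patent's normative definitions.

```python
import math

# Sketch of the rotation operation: rotate a Kent-distribution parameter
# vector (e.g. gamma1, gamma2, or gamma3) about a rotation axis given by
# 'sym_azi' and 'sym_elev' through the angle 'sym_rotation', using
# Rodrigues' rotation formula. Conventions here are assumptions.

def axis_from_angles(azi, elev):
    """Unit axis from azimuth/elevation (radians), seen from the source."""
    return (math.cos(elev) * math.cos(azi),
            math.cos(elev) * math.sin(azi),
            math.sin(elev))

def rotate(v, axis, angle):
    """Rodrigues' formula: v' = v cosθ + (k×v) sinθ + k (k·v)(1 − cosθ)."""
    c, s = math.cos(angle), math.sin(angle)
    dot = sum(a * b for a, b in zip(axis, v))
    cross = (axis[1] * v[2] - axis[2] * v[1],
             axis[2] * v[0] - axis[0] * v[2],
             axis[0] * v[1] - axis[1] * v[0])
    return tuple(v[i] * c + cross[i] * s + axis[i] * dot * (1 - c)
                 for i in range(3))
```

Applying `rotate` to each of the vectors V81 to V83 yields the rotated vectors V'81 to V'83 used for the copied distribution.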
- Fig. 33 shows an example of performing symmetrical operations on the Kent distribution.
- portions corresponding to those in FIG. 32 are denoted by the same reference numerals, and description thereof will be omitted as appropriate.
- the directivity data calculation unit 82 obtains the cross section SF11 of the sphere SP11, which is the plane of symmetry, based on the yaw angle "sym_yaw”, pitch angle “sym_pitch”, and roll angle "sym_roll” read from the model data.
- This cross section SF11 is a plane that includes the center (sound source position) of the sphere SP11.
- the directivity data calculator 82 uses the vectors V81 to V83 to find the Kent distribution f(x; ⁇ i ).
- the directivity data calculator 82 obtains the Kent distribution f(x; ⁇ i ) using the vectors V''81 to V''83.
- the vectors V''81 to V''83 are vectors obtained by folding back (symmetrically moving) the vectors V81 to V83 with the cross section SF11 as the plane of symmetry. That is, the vectors V''81 to V''83 and the vectors V81 to V83 are symmetrical (plane symmetrical) with respect to the cross section SF11.
- vectors V''81 to V''83 are used as vector ⁇ 1 , major axis vector ⁇ 2 , and minor axis vector ⁇ 3 of the Kent distribution.
- in this way, the directivity data calculation unit 82 calculates symmetrically moved model parameters by performing a symmetry operation, based on the yaw angle and the like, on the model parameters such as the vector γ1 of the Kent distribution. Then, the directivity data calculation unit 82 obtains a Kent distribution based on each of the model parameters before the symmetrical movement and the model parameters after the symmetrical movement, and the directivity data (directivity gain) is calculated.
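The symmetry operation amounts to reflecting each parameter vector across a plane through the sound source position. A minimal sketch, assuming the plane is given directly by its unit normal (deriving the normal from the yaw, pitch, and roll angles is omitted):

```python
# Sketch of the symmetry operation: a Kent-distribution parameter vector
# is reflected across a plane through the origin (the sound source), here
# specified by its unit normal n. Deriving n from 'sym_yaw', 'sym_pitch',
# and 'sym_roll' is omitted; the interface is an illustrative assumption.

def reflect(v, n):
    """Householder reflection: v' = v - 2 (v·n) n."""
    d = sum(a * b for a, b in zip(v, n))
    return tuple(v[i] - 2.0 * d * n[i] for i in range(3))
```

Reflecting twice across the same plane returns the original vector, which matches the intuition of a plane-symmetric copy.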
- FIG. 34 shows an example syntax of information "NonSymmetricDir()" for obtaining directivity data in the model data shown in FIG.
- scale factors "scale_factor[i]” and minimum values “offset[i]” are also stored for the number of frequency points "bin_count”.
- the model data since no rotation or symmetry operations are performed, the model data describes the model parameters that make up all the distributions.
- the directivity data calculator 82 performs a symmetrical operation when decoding the directivity data.
- the directivity data calculation unit 82 performs a left-right symmetry operation, with respect to the front median plane, on the distribution corresponding to the model parameters described in the model data to obtain new vMF distributions and Kent distributions.
- the left-right symmetrical operation performed in this case is a symmetrical operation in which the front median plane (median plane) seen from the sound source is the cross section SF11 shown in FIG.
- the left-right symmetrical operation is realized by performing the symmetrical operation described with reference to FIG. 33 with the median plane as the cross section SF11.
- by synthesizing the distribution obtained from the model parameters before the left-right symmetry operation and the distribution obtained from the model parameters after the left-right symmetry operation, one distribution that is left-right symmetric when viewed from the sound source can be obtained.
- similarly, the directivity data calculation unit 82 performs a vertically symmetrical operation, with respect to the front horizontal plane, on the distribution corresponding to the model parameters described in the model data to obtain new vMF distributions and Kent distributions.
- the vertical symmetrical operation performed in this case is a symmetrical operation in which the front horizontal plane (horizontal plane) seen from the sound source is the cross section SF11 shown in FIG.
- the vertical symmetrical operation is realized by performing the symmetrical operation described with reference to FIG. 33 with the horizontal plane as the cross section SF11.
- by synthesizing the distribution obtained from the model parameters before the vertical symmetry operation and the distribution obtained from the model parameters after the vertical symmetry operation, one distribution that is vertically symmetrical when viewed from the sound source can be obtained.
- likewise, the directivity data calculation unit 82 performs vertically and horizontally symmetrical operations, with respect to the front, on the distribution corresponding to the model parameters described in the model data to obtain a new distribution.
- the vertically and horizontally symmetrical operation is an operation to obtain a vertically and horizontally symmetrical distribution by performing a vertically and horizontally symmetrical operation on the distribution to be operated.
- vMF distributions and Kent distributions that have undergone symmetrical operations including left-right symmetrical operations and vertical symmetrical operations are effective over the entire spherical surface where directivity data are defined during decoding (during restoration).
- a boundary may be defined in the distribution of the operation target or the distribution obtained by the operation, and the directivity gain may be discontinuous at the boundary.
- Fig. 35 shows an example of model data syntax for cross-fading.
- a crossfade flag "fade_flag” and upper limit bin index "bin_range_per_band_fadein[j]" are stored (included).
- the crossfade flag "fade_flag" for each band is stored for the number of bands "band_count”.
- the crossfade flag "fade_flag" is flag information indicating whether or not to perform crossfading between adjacent bands when calculating the mixture model F(x; θ) for each bin, that is, the weighted addition of the mixture models F'(x; θ) for each band.
- when the value of the crossfade flag "fade_flag" is "1", crossfading between bands is performed, and when the value is "0", crossfading between bands is not performed. Note that cross-fading between bands is used in the second and higher bands.
- the upper limit bin index "bin_range_per_band_fadein[j]" is an index indicating the upper limit bin for crossfading between bands, that is, the bin with the highest frequency among the bins within the band for crossfading between bands.
- the directivity data calculation unit 82 performs weighted addition of the output value F'(x; θ) of the mixture model obtained for a predetermined band and the output value F'(x; θ) of the mixture model obtained for the adjacent band.
- the directivity data calculator 82 multiplies the output value obtained by the weighted addition by the scale factor, and further adds the minimum value (offset value) to the multiplication result to obtain the output value F(x; θ) of the mixture model at the bin of interest.
- the target of crossfading is each bin from the lowest frequency bin in the other band up to the upper limit bin indicated by the upper limit bin index "bin_range_per_band_fadein[j]" in that band; there is no crossfade in the other bins.
- the output value F(x; ⁇ ) of the mixture model is obtained from the output value F'(x; ⁇ ) of the mixture model in the band to which the bin belongs, the scale factor, and the minimum value.
- that is, the calculation of the directivity data includes an additional step of weighting the output values of the reconstructed mixture models of adjacent bands before applying the scale factor and minimum value, and taking the sum (weighted addition value) as the final output value of the mixture model for the band.
- Fig. 36 shows a conceptual diagram of crossfade between bands.
- the vertical axis indicates weights used during cross-fading, and the horizontal axis indicates frequencies. Also, a case where the number of bands is three is shown here as an example.
- the left side shows the weight during weighted addition when cross-fading between bands is not performed.
- Lines L51 to L53 represent the output of the mixture model for each band of the bands “bin_range_per_band[0]” to “bin_range_per_band[2]”, which are used to calculate the output value F(x; ⁇ ) of the mixture model for each bin. It shows the weight of the value F'(x; ⁇ ).
- the ranges of the straight lines L51 to L53 in the frequency direction do not overlap each other, and the weight of the output value F'(x; θ) of the mixture model for each band is 1 for each bin (frequency). Therefore, it can be seen that substantially no cross-fading between bands is performed.
- the right side of the figure shows the weights during weighted addition when cross-fading between bands is performed.
- Lines L61 to L63 represent the output of the mixture model for each band of the bands "bin_range_per_band[0]" to "bin_range_per_band[2]", which are used to calculate the output value F(x; ⁇ ) of the mixture model for each bin. It shows the weight of the value F'(x; ⁇ ).
- the right end of the polygonal line L61 indicating the weight of the mixture model output value F'(x; ⁇ ) for the band "bin_range_per_band[0]" corresponds to the frequency position.
- the frequency (bin) at the right end of the polygonal line L61 is a bin within the band "bin_range_per_band[1]" adjacent to the band “bin_range_per_band[0]”, and this bin is the upper limit bin “bin_range_per_band_fadein[ 1]”.
- for the bins between the lowest frequency bin and the upper limit bin "bin_range_per_band_fadein[1]", cross-fading between the bands is performed to determine the output value F(x; θ) of the mixture model for each bin. In this case, the weights are calculated so that the sum of the weights used to calculate the output value F(x; θ) of the mixture model is 1 for each bin.
- the bins with frequencies higher than the upper limit bin have a weight value of 1, indicated by the polygonal line L62, so it can be seen that no cross-fading between bands is performed there.
- the weight model_weight i_band-1 [i_bin] of the output value of the mixture model for the lower frequency side band "i_band-1" for a given bin "i_bin” is given by the following equation ( 10).
- weight model_weight i_band [i_bin] of the output value of the mixture model of the higher frequency band "i_band” for the predetermined bin “i_bin” can be obtained by the following equation (11).
- scale_factor[i_bin] and offset[i_bin] in equation (12) indicate the scale factor and minimum value (offset value) of the bin "i_bin”.
- the directivity data calculation unit 82 calculates the output value of the mixture model for each bin, that is, the directivity gain for each data point for each bin, by calculating Equation (12). By doing so, the amount of model data can be reduced.
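In the spirit of equations (10) to (12), the cross-fade can be sketched as follows; the linear weight ramp and all names are illustrative assumptions rather than the patent's exact formulas.

```python
# Sketch of the cross-fade between adjacent bands: within the fade range,
# the two band models are weighted so the weights sum to 1 per bin, the
# weighted sum is formed, and then the per-bin scale factor and minimum
# value (offset) are applied. The linear ramp is an assumption.

def fade_weights(i_bin, fade_start, fade_end):
    """(weight of lower band, weight of higher band) for bin i_bin."""
    if i_bin <= fade_start:
        return 1.0, 0.0
    if i_bin >= fade_end:
        return 0.0, 1.0
    w_hi = (i_bin - fade_start) / (fade_end - fade_start)
    return 1.0 - w_hi, w_hi

def mix_output(f_lo, f_hi, w_lo, w_hi, scale_factor, offset):
    """Weighted addition first, then scale factor and offset (per bin)."""
    return (w_lo * f_lo + w_hi * f_hi) * scale_factor + offset
```

The ordering matters: the weighting is applied to the raw band outputs, and the scale factor and minimum value are applied afterwards, as described above.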
- model data syntax is as shown in Fig. 37, for example.
- the model data shown in FIG. 37 includes the number of frequency points "bin_count" indicating the number of bins, and the center frequencies "bin_freq[i]" of the bins are stored for the number of frequency points "bin_count".
- the model data also stores the number of bands "band_count". For each band, the number of mixtures "mix_count[j]" and the bin information "bin_range_per_band[j]" are stored.
- the symmetry information "use_symmetry[j]" for each band is similar to the symmetry information "use_symmetry" described above, except that the values "5" to "7" are used without being marked as reserved, as described later.
- the number of mixtures "mix_count[j]" and the bin information "bin_range_per_band[j]" are the same as those described above; the bin information is information indicating the bins of the original directivity data.
- the number of mixtures "mix_count[j]” and bin information "bin_range_per_band[j]" are stored for each piece of operation-related information. However, since the number of mixtures and the bin information are the same, in the example of FIG. 37, the number of mixtures and the bin information are stored outside the operation-related information in the model data.
- the value of the symmetry information "use_symmetry[j]" for each band is any value from “0" to "7".
- the values "7", "6", and "5" of the symmetry information "use_symmetry[j]" indicate performing vertical and front-back symmetrical operations, performing front-back and left-right symmetrical operations, and performing front-back symmetrical operations, respectively.
- the model data stores the crossfade flag "fade_flag" for each band.
- This crossfade flag “fade_flag” is the same as described with reference to FIG. That is, when the value of the crossfade flag “fade_flag” is “1”, crossfading between bands is performed, and when the value is “0”, crossfading between bands is not performed.
- the model data stores the upper limit bin index "bin_range_per_band_fadein[j]" for the band.
- model data stores the start bin "start_bin”.
- the bins with low frequencies may not contain substantially any data. That is, the directivity gain of the low frequency bin may be zero.
- start bin "start_bin” is information indicating the lowest frequency bin containing non-zero directional gain as data among the bins indicated by the frequency "bin_freq[i]".
- model data stores operation-related information for rotation operation or symmetry operation according to the value of the symmetry information "use_symmetry[j]".
- when the value of the symmetry information "use_symmetry[j]" is "7", the model data describes operation-related information "FrontBackVerticalSymmetricDir()" for vertical and front-back symmetrical operations. When the value of the symmetry information "use_symmetry[j]" is "6", the model data describes operation-related information "FrontBackLeftRightSymmetricDir()" for front-back and left-right symmetrical operations.
- when the value of the symmetry information "use_symmetry[j]" is "5", the model data describes operation-related information "FrontBackSymmetricDir()" for front-back symmetrical operations.
- when the value of the symmetry information "use_symmetry[j]" is "4", the model data describes the operation-related information "LeftRightVerticalLineSymmetricDir()". When the value of the symmetry information "use_symmetry[j]" is "3", the model data describes operation-related information "LeftRightLineSymmetricDir()".
- when the value of the symmetry information "use_symmetry[j]" is "2", the model data describes the operation-related information "VerticalLineSymmetricDir()".
- when the value of the symmetry information "use_symmetry[j]" is "1", the model data describes the operation-related information "SymmetricDir()".
- when the value of the symmetry information "use_symmetry[j]" is "0", the information "NonSymmetricDir()" is described in the model data.
- model data contains information about the dynamic range "DynamicRangeForDir()".
- this information "DynamicRangeForDir()" stores a scale factor "scale_factor[i]" and a minimum value "offset[i]" for each bin.
- FIG. 38 shows an example syntax of information "NonSymmetricDir()" for obtaining directivity data in the model data shown in FIG.
- "gamma_azi[j][k]" and "gamma_elev[j][k]" indicate the horizontal angle (azimuth) and the vertical angle (elevation) indicating the direction of the vector γ1.
- In the earlier example the vector γ1 is represented by "gamma_x[j][k]", "gamma_y[j][k]", and "gamma_z[j][k]", whereas here the azimuth and elevation angles represent the vector γ1.
- "gamma1_azi[j][k]" is a horizontal angle (rotation angle) indicating the relative directions of the major-axis vector γ2 and the minor-axis vector γ3 as viewed from the vector γ1.
- Therefore, the major-axis vector γ2 and the minor-axis vector γ3 can be obtained from the vector γ1 and the angle "gamma1_azi[j][k]".
- FIG. 39 shows a syntax example of the operation-related information "LeftRightLineSymmetricDir()".
- the operation-related information "LeftRightLineSymmetricDir()" contains the number of mixtures "mix_count[k]” for each distribution (mixture) such as the Kent distribution or vMF distribution that constitutes the mixture model representing the distribution of the directional gain in the band. "sym_flag[k]" of is stored.
- “sym_flag[k]” is flag information indicating whether or not to perform operations such as symmetry and rotation on the target distribution. For example, the flag information “sym_flag[k]” value "00" indicates that operations such as symmetry and rotation are not performed, and the flag information “sym_flag[k]” value "01" indicates that symmetric operations are performed. is shown.
- the flag information "sym_flag[k]" in each piece of operation-related information is flag information indicating whether or not to perform an operation corresponding to that piece of operation-related information.
- the rotation axis azimuth angle “sym_azi”, the rotation axis elevation angle “sym_elev”, the rotation angle “sym_rotation”, the yaw angle “sym_yaw”, the pitch angle “sym_pitch”, and the The roll angle "sym_roll” is stored in the operation-related information as appropriate. Then, according to the value of the flag information “sym_flag[k]”, rotation operation and symmetry operation are performed for each distribution constituting the mixture model.
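As an illustration, applying the yaw, pitch, and roll angles above to a distribution's mean-direction vector might look as follows. The z-y-x application order and the example values are assumptions made for the sketch, not a normative procedure from this description.

```python
import math

# Sketch: rotating a distribution's mean-direction vector by angles named
# after sym_yaw / sym_pitch / sym_roll. The application order (roll about x,
# then pitch about y, then yaw about z) is an assumption.

def rotate(v, yaw, pitch, roll):
    """Rotate 3-D vector v by roll (x axis), pitch (y axis), yaw (z axis)."""
    x, y, z = v
    # roll about x
    y, z = (y * math.cos(roll) - z * math.sin(roll),
            y * math.sin(roll) + z * math.cos(roll))
    # pitch about y
    x, z = (x * math.cos(pitch) + z * math.sin(pitch),
            -x * math.sin(pitch) + z * math.cos(pitch))
    # yaw about z
    x, y = (x * math.cos(yaw) - y * math.sin(yaw),
            x * math.sin(yaw) + y * math.cos(yaw))
    return (x, y, z)

v = rotate((1.0, 0.0, 0.0), yaw=math.pi / 2, pitch=0.0, roll=0.0)
# v is approximately (0, 1, 0): the +x direction swung to +y by the yaw.
```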
- the operation-related information "SymmetricDir()" has the same configuration as the example shown in Fig. 31, and the operation count information “sym_operation_count” and the operation flag “sym_operation_flag” define whether or not to execute a rotation operation or a symmetrical operation. You may make it
- When the model data stores the operation-related information "FrontBackVerticalSymmetricDir()", "FrontBackLeftRightSymmetricDir()", or "FrontBackSymmetricDir()", that is, when the value of the symmetry information "use_symmetry[j]" is "7", "6", or "5", the directivity data calculation unit 82 performs a symmetric operation when decoding the directivity data.
- For example, the directivity data calculation unit 82 performs vertical and front-back symmetric operations on the target distributions to obtain new distributions.
- The directivity data calculation unit 82 then calculates the directivity data (directivity gain) from the new distributions and the like. After that, crossfading between bands is also performed as appropriate according to the value of the crossfade flag "fade_flag" for each band.
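The crossfade mentioned here amounts to a weighted addition of the outputs of adjacent bands' mixture models. A linear blend across the band boundary is one plausible weighting, sketched below with illustrative gain values; the actual weighting curve is not specified in this passage.

```python
# Sketch: cross-fading between adjacent bands' mixture-model outputs to
# obtain a per-bin gain. The linear weighting and the gain values 0.8 / 0.2
# are illustrative assumptions.

def crossfade_gain(bin_pos, band_lo_gain, band_hi_gain):
    """Linearly blend two band outputs; bin_pos in [0, 1] across the overlap."""
    return (1.0 - bin_pos) * band_lo_gain + bin_pos * band_hi_gain

# Bins spanning the boundary move smoothly from the low band's gain (0.8)
# to the high band's gain (0.2).
print([round(crossfade_gain(t, 0.8, 0.2), 2) for t in (0.0, 0.5, 1.0)])
# [0.8, 0.5, 0.2]
```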
- Here, the vertical and front-back symmetric operation is an operation that obtains a vertically and front-back symmetric distribution by performing a vertical symmetric operation and a front-back symmetric operation on the distribution to be operated on.
- The vertical symmetric operation performed in this case is a symmetric operation in which the horizontal plane seen from the sound source is used as the cross section SF11.
- That is, the vertical symmetric operation is realized by performing the symmetric operation described with reference to FIG. 33 with the horizontal plane as the cross section SF11.
- The front-back symmetric operation is a symmetric operation in which the plane obtained by rotating the front median plane seen from the sound source by 90 degrees in the horizontal direction is used as the cross section SF11.
- That is, the front-back symmetric operation is realized by performing the symmetric operation described with reference to FIG. 33 with the plane obtained by horizontally rotating the front median plane by 90 degrees as the cross section SF11.
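Each of these symmetric operations can be pictured as reflecting a distribution's parameters across a cross-section plane through the origin. A minimal sketch for a mean-direction vector follows; the plane normals and the example vector are illustrative assumptions.

```python
# Sketch: reflecting a distribution's mean-direction vector across a
# cross-section plane, in the spirit of the symmetric operations above.
# The plane is given by its unit normal; for the horizontal plane the
# normal is the z axis, for the median plane the x (left-right) axis.

def reflect(v, n):
    """Reflect vector v across the plane through the origin with unit normal n."""
    d = sum(a * b for a, b in zip(v, n))          # v . n
    return tuple(a - 2.0 * d * b for a, b in zip(v, n))

# Vertical (up-down) symmetry: mirror across the horizontal plane (normal z).
mu = (0.6, 0.0, 0.8)                   # hypothetical mean direction
print(reflect(mu, (0.0, 0.0, 1.0)))    # (0.6, 0.0, -0.8)
```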
- Similarly, the directivity data calculation unit 82 performs front-back and left-right symmetric operations on each distribution whose flag information "sym_flag[k]" has the value "01" to obtain new distributions, and calculates the directivity data using the obtained distributions.
- The front-back and left-right symmetric operation is an operation that obtains a front-back and left-right symmetric distribution by performing a front-back symmetric operation and a left-right symmetric operation on the distribution to be operated on.
- The left-right symmetric operation performed in this case is a symmetric operation in which the front median plane seen from the sound source is used as the cross section SF11.
- Further, the directivity data calculation unit 82 performs the symmetric operation on each distribution whose flag information "sym_flag[k]" has the value "01" to obtain a new distribution, and uses the obtained distribution to calculate the directivity data.
- Distributions such as the vMF distribution and the Kent distribution that have undergone symmetric operations, including the left-right, vertical, and front-back symmetric operations, become valid over the entire spherical surface on which the directivity data is defined at the time of decoding (reconstruction).
- Alternatively, a boundary may be defined in the distribution to be operated on or in the distribution obtained by the operation, and the directivity gain may be discontinuous at that boundary.
- In this example, the symmetry information "use_symmetry[j]" specifies the symmetry and rotation operations for each band, and the flag information "sym_flag[k]" defines whether or not those symmetry and rotation operations are actually performed for each distribution.
- Note that the 1-bit symmetry information "use_symmetry" may simply be flag information indicating whether or not to perform operations such as symmetry and rotation.
- In that case, the directivity data calculation unit 82 performs an operation according to the value of the flag information "sym_flag[k]" to obtain a new distribution.
- When calculating the directivity data, the directivity data calculation unit 82 performs weighted addition of the plurality of distributions constituting the mixture model, such as the Kent distribution, the vMF distribution, or the complex Bingham distribution obtained from the model parameters, using the weights πi of those distributions, that is, the above-described "pi[j][k]" and "weight[i_band][i_mix]", and thereby calculates the mixture model F'(x; Θ) (the directivity data).
- Normally, the value of the weight πi of each distribution is determined so that the sum of the weights πi of the plurality of distributions constituting the mixture model is 1.
- However, the value of each weight πi is not limited to a positive value and may be negative.
- For example, suppose that the weight πi of one distribution constituting the mixture model, such as the Kent distribution or the vMF distribution, is set to a positive value. The distribution after multiplication by the weight πi is then as indicated by arrow Q101 in the figure.
- In the figure, the horizontal direction indicates a predetermined direction on the spherical surface for a distribution, such as the Kent distribution, defined on the sphere, and the vertical direction indicates the value of the distribution at each position, that is, the directivity gain.
- By allowing the weights πi of the distributions to take negative values as well, the degree of freedom is increased, and it becomes possible to express mixture models with more diverse shapes.
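A weighted sum of this kind, with a negative component carving a region out of an otherwise positive lobe, can be sketched as follows. The component parameters and weights are illustrative assumptions, not values from this description.

```python
import math

# Sketch: evaluating a mixture of von Mises-Fisher (vMF) distributions on
# the sphere with weights that may be negative, as described above.

def vmf(x, mu, kappa):
    """3-D vMF density at unit vector x with mean direction mu."""
    c = kappa / (4.0 * math.pi * math.sinh(kappa))
    return c * math.exp(kappa * sum(a * b for a, b in zip(mu, x)))

def mixture(x, components):
    """Weighted sum of (weight, mu, kappa) components; weights sum to 1."""
    return sum(w * vmf(x, mu, kappa) for w, mu, kappa in components)

# A broad positive lobe toward +z minus a narrow lobe toward -z: the
# weights 1.3 and -0.3 still sum to 1, and the negative component pushes
# the rear of the sphere below zero - a shape a purely positive mixture
# could not express.
comps = [(1.3, (0.0, 0.0, 1.0), 2.0), (-0.3, (0.0, 0.0, -1.0), 20.0)]
front = mixture((0.0, 0.0, 1.0), comps)    # positive
back = mixture((0.0, 0.0, -1.0), comps)    # negative
```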
- The series of processes described above can be executed by hardware or by software.
- When the series of processes is executed by software, a program constituting the software is installed in a computer.
- Here, the computer includes, for example, a computer built into dedicated hardware and a general-purpose personal computer capable of executing various functions by installing various programs.
- FIG. 41 is a block diagram showing a hardware configuration example of a computer that executes the series of processes described above by a program.
- In the computer, a CPU (Central Processing Unit) 501, a ROM (Read Only Memory) 502, and a RAM (Random Access Memory) 503 are interconnected by a bus 504.
- An input/output interface 505 is further connected to the bus 504 .
- An input unit 506 , an output unit 507 , a recording unit 508 , a communication unit 509 and a drive 510 are connected to the input/output interface 505 .
- the input unit 506 consists of a keyboard, mouse, microphone, imaging device, and the like.
- the output unit 507 includes a display, a speaker, and the like.
- a recording unit 508 is composed of a hard disk, a nonvolatile memory, or the like.
- a communication unit 509 includes a network interface and the like.
- a drive 510 drives a removable recording medium 511 such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory.
- In the computer configured as described above, the CPU 501 loads the program recorded in the recording unit 508 into the RAM 503 via the input/output interface 505 and the bus 504 and executes it, whereby the series of processes described above is performed.
- the program executed by the computer (CPU 501) can be provided by being recorded on a removable recording medium 511 such as package media, for example. Also, the program can be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.
- the program can be installed in the recording unit 508 via the input/output interface 505 by loading the removable recording medium 511 into the drive 510 . Also, the program can be received by the communication unit 509 and installed in the recording unit 508 via a wired or wireless transmission medium. In addition, the program can be installed in the ROM 502 or the recording unit 508 in advance.
- The program executed by the computer may be a program in which processing is performed in chronological order following the order described in this specification, or may be a program in which processing is performed in parallel or at a necessary timing, such as when a call is made.
- this technology can take the configuration of cloud computing in which one function is shared by multiple devices via a network and processed jointly.
- each step described in the flowchart above can be executed by a single device, or can be shared by a plurality of devices.
- Furthermore, when one step includes a plurality of processes, the plurality of processes included in that one step can be executed by one device or shared and executed by a plurality of devices.
- Furthermore, the present technology can also be configured as follows.
- (1) An information processing apparatus including: an acquisition unit that acquires model data obtained by modeling directivity data representing the directivity of a sound source; and a calculation unit that calculates the directivity data based on the model data.
- (2) The information processing apparatus according to (1), in which the model data includes model parameters constituting a mixture model consisting of one or more distributions, obtained by modeling the directivity data with the mixture model.
- (3) The information processing apparatus according to (2), in which the one or more distributions include at least one of a vMF distribution and a Kent distribution.
- (4) The information processing apparatus according to (2) or (3), in which the directivity data includes a directivity gain for each of a plurality of frequency bins, and the model data includes, for each band that is a frequency band including one or more of the frequency bins, the model parameters constituting the mixture model representing the distribution of the directivity gain.
- (5) The information processing apparatus according to (4), in which the model data includes a scale factor indicating a dynamic range of the directivity gain in the frequency bin and a minimum value of the directivity gain in the frequency bin.
- (6) The information processing apparatus according to any one of (1) to (5), in which the model data includes difference information indicating a difference between the directivity data before modeling and the directivity data after modeling, the information processing apparatus further including an addition unit that adds the difference information to the directivity data calculated by the calculation unit.
- (7) The information processing apparatus according to (6), in which the difference information is Huffman-encoded.
- (8) The information processing apparatus according to any one of (1) to (7), in which the directivity data includes a directivity gain for each of a plurality of frequency bins, the information processing apparatus further including an interpolation processing unit that calculates the directivity gain of a new frequency bin by performing interpolation processing based on the directivity data calculated by the calculation unit.
- (9) The information processing apparatus according to any one of (1) to (8), in which the directivity data includes a directivity gain at each of a plurality of data points, the information processing apparatus further including an interpolation processing unit that calculates the directivity gain at a new data point by performing interpolation processing based on the directivity data calculated by the calculation unit.
- (10) The information processing apparatus according to any one of (1) to (9), further including a directivity convolution unit that convolves the directivity data with audio data.
- (11) The information processing apparatus according to (10), further including an HRTF convolution unit that convolves an HRTF with the audio data with which the directivity data has been convolved.
- (12) The information processing apparatus according to (2), in which the one or more distributions include a complex Bingham distribution or a complex Watson distribution.
- (13) The information processing apparatus according to (1), in which the model data includes, as model parameters, spherical harmonic coefficients obtained by modeling the directivity data by spherical harmonic expansion.
- (14) The information processing apparatus according to (1), in which the model data includes model parameters obtained by modeling the directivity data by one or more mutually different methods.
- (15) The information processing apparatus according to (14), in which the methods include at least one of a method of modeling with a mixture model consisting of one or more distributions and a method of modeling by spherical harmonic expansion.
- (16) The information processing apparatus according to (14) or (15), in which the model data further includes difference information indicating a difference between the directivity data after modeling by the one or more methods and the directivity data before modeling.
- (17) The information processing apparatus according to (16), in which the difference information is Huffman-encoded.
- (18) The information processing apparatus according to (17), in which the real part and the imaginary part of the difference information are each individually Huffman-encoded.
- (19) The information processing apparatus according to (14) or (15), in which the model data includes difference code data obtained by Huffman-encoding at least one of the difference between spatial positions and the difference between frequencies of difference information indicating a difference between the directivity data after modeling by the one or more methods and the directivity data before modeling.
- (20) The information processing apparatus according to (19), in which the model data includes the difference code data obtained by individually Huffman-encoding the real part and the imaginary part of the difference of the difference information.
- (21) The information processing apparatus according to (14) or (15), in which the model data includes the model parameters obtained by modeling the directivity data by a predetermined method, and other model parameters obtained by modeling, by a method different from the predetermined method, the difference between the directivity data after modeling by the predetermined method and the directivity data before modeling.
- (22) The information processing apparatus according to (14) or (15), in which the model data includes the model parameters obtained by modeling the directivity data by a predetermined method, and other model parameters obtained by modeling, by a method different from the predetermined method, the ratio between the directivity data after modeling by the predetermined method and the directivity data before modeling.
- (23) The information processing apparatus according to (14) or (15), in which the model data includes model parameters obtained by further modeling the model parameters obtained by modeling the directivity data.
- (24) The information processing apparatus according to any one of (14) to (23), in which the model data includes the model parameters obtained by modeling the directivity data by a different method for each frequency band.
- (25) The information processing apparatus according to any one of (1) to (24), in which the directivity data includes a directivity gain at each of a plurality of data points, and the model data includes information indicating an arrangement method of the data points and information for specifying arrangement positions of the data points.
- (26) The information processing apparatus according to (25), in which the model data includes priority information indicating a priority of the directivity data for each type of the sound source.
- (27) The information processing apparatus according to (26), in which the number of the data points varies according to the priority, and the calculation unit specifies the arrangement positions of the data points using the priority information.
- (28) The information processing apparatus according to (19), in which the directivity data includes a directivity gain for each frequency bin at each of a plurality of data points, and the model data includes the difference code data of at least one of the difference between the data points and the difference between the frequency bins of the difference information indicating the difference between the directivity gain of the directivity data after modeling by the one or more methods and the directivity gain of the directivity data before modeling, the difference being taken after rearrangement of the difference information.
- (29) The information processing apparatus according to (28), in which the rearrangement is rearrangement into a predetermined order, an order of priority of the data points or the frequency bins, an ascending order of the difference information, or a descending order of the difference information.
- (30) The information processing apparatus according to (4), in which the model data includes a parameter obtained by parameterizing at least one of a scale factor indicating a dynamic range of the directivity gain in each of the frequency bins and a minimum value of the directivity gain in each of the frequency bins.
- (31) The information processing apparatus according to any one of (2) to (5), in which the model data includes operation-related information for a rotation operation or a symmetry operation, and the calculation unit calculates rotated or symmetrically moved model parameters by performing the rotation operation or the symmetry operation on the model parameters based on the operation-related information, and calculates the directivity data using the distribution obtained from the rotated or symmetrically moved model parameters.
- (32) The information processing apparatus according to (4) or (5), in which the calculation unit calculates the directivity gain of a predetermined frequency bin by performing weighted addition of an output value of the mixture model of a predetermined band and an output value of the mixture model of another band adjacent to the predetermined band.
- (33) The information processing apparatus according to any one of (2) to (5), in which the calculation unit calculates the directivity data by weighted addition of the plurality of distributions obtained from the model parameters using weights including a negative value.
- (34) An information processing method including, by an information processing apparatus: acquiring model data obtained by modeling directivity data representing the directivity of a sound source; and calculating the directivity data based on the model data.
- (35) A program that causes a computer to execute processing including: acquiring model data obtained by modeling directivity data representing the directivity of a sound source; and calculating the directivity data based on the model data.
- (36) An information processing apparatus including: a modeling unit that models directivity data representing the directivity of a sound source with a mixture model consisting of one or more distributions; and a model data generation unit that generates model data including model parameters constituting the mixture model obtained by the modeling.
- (37) An information processing method including, by an information processing apparatus: modeling directivity data representing the directivity of a sound source with a mixture model consisting of one or more distributions; and generating model data including model parameters constituting the mixture model obtained by the modeling.
- (38) A program that causes a computer to execute processing including: modeling directivity data representing the directivity of a sound source with a mixture model consisting of one or more distributions; and generating model data including model parameters constituting the mixture model obtained by the modeling.
- (39) An information processing apparatus including: an acquisition unit that acquires differential directivity data obtained by determining, for directivity data representing the directivity of a sound source and consisting of directivity gains of each of a plurality of frequency bins at each of a plurality of data points, at least one of the difference between the data points and the difference between the frequency bins of the directivity gains; and a calculation unit that calculates the directivity data based on the differential directivity data.
- (40) The information processing apparatus according to (39), in which the differential directivity data is Huffman-encoded, and the calculation unit decodes the Huffman-encoded differential directivity data.
- (41) The information processing apparatus according to (40), in which the real part and the imaginary part of the differential directivity data are each individually Huffman-encoded.
- (42) The information processing apparatus according to any one of (39) to (41), in which the differential directivity data is obtained by determining at least one of the difference between the data points and the difference between the frequency bins after rearrangement of the directivity gains.
- (43) The information processing apparatus according to (42), in which the rearrangement is rearrangement into a predetermined order, an order of priority of the data points or the frequency bins, an ascending order of the directivity gains, or a descending order of the directivity gains.
- (44) An information processing method including, by an information processing apparatus: acquiring differential directivity data obtained by determining, for directivity data representing the directivity of a sound source and consisting of directivity gains of each of a plurality of frequency bins at each of a plurality of data points, at least one of the difference between the data points and the difference between the frequency bins of the directivity gains; and calculating the directivity data based on the differential directivity data.
- (45) A program that causes a computer to execute processing including: acquiring differential directivity data obtained by determining, for directivity data representing the directivity of a sound source and consisting of directivity gains of each of a plurality of frequency bins at each of a plurality of data points, at least one of the difference between the data points and the difference between the frequency bins of the directivity gains; and calculating the directivity data based on the differential directivity data.
Abstract
Description
〈About the present technology〉
The present technology makes it possible to reduce the transmission amount of directivity data by modeling the directivity data.
A recording method of directivity data for each sound source type will be described.
Here, a specific example of model data obtained by modeling directivity data will be described.
FIG. 9 is a diagram showing a configuration example of a server to which the present technology is applied.
Next, the operation of the server 11 will be described. That is, the encoding process performed by the server 11 will be described below with reference to the flowchart in FIG. 10.
An information processing apparatus that acquires the encoded bitstream output from the server 11 and generates output audio data for reproducing the sound of content is configured, for example, as shown in FIG. 11. The information processing apparatus 51 shown in FIG. 11 is, for example, a personal computer, a smartphone, a tablet, or a game device.
Next, the operation of the information processing apparatus 51 will be described.
Subsequently, the output audio data generation process performed by the information processing apparatus 51 will be described with reference to the flowchart in FIG. 13. This output audio data generation process is performed at an arbitrary timing after the directivity data generation process described with reference to FIG. 12 has been performed.
〈Encoding of difference information〉
Incidentally, directivity data has a different directivity shape for each sound source type and each frequency band.
0x00: no multi-stage differential encoding
0x01: spatially adjacent difference method
0x02: inter-frequency difference method
0x03: spatially adjacent difference method + inter-frequency difference method
(when the target data is complex)
0x1*: the lower bits are the same as when the target data is real
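The mode values above select which differencing stages are applied. A minimal sketch of the two stages and their combination, using a hypothetical data-point × frequency-bin grid of integer gains, might look as follows.

```python
# Sketch: the multi-stage differencing selected by the mode values above.
# 'gains[p][b]' is a hypothetical data-point x frequency-bin grid.

def diff_spatial(gains):
    """0x01: difference each row against the previous data point's row."""
    out = [list(gains[0])]
    for prev, cur in zip(gains, gains[1:]):
        out.append([c - p for c, p in zip(cur, prev)])
    return out

def diff_frequency(gains):
    """0x02: difference each value against the previous frequency bin."""
    return [[row[0]] + [b - a for a, b in zip(row, row[1:])] for row in gains]

def encode(gains, mode):
    """Apply the stages named by the mode bits (0x03 applies both)."""
    if mode & 0x01:
        gains = diff_spatial(gains)
    if mode & 0x02:
        gains = diff_frequency(gains)
    return gains

g = [[3, 4, 6], [5, 7, 8]]
print(encode(g, 0x03))  # [[3, 1, 2], [2, 1, -1]]
```

Small residuals like these compress well under the Huffman coding described elsewhere in this document.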
In the above, an example in which the directivity data is modeled mainly by a mixture model (mixture distribution model) consisting of Kent distributions and vMF distributions has been described.
The band hybrid method switches, for each frequency band, that is, for each bin or each band, which of the HOA method, the mixture method, the complex mixture method, and the difference method is used to generate the model data. In this case, for example, recording with complex directivity gains may be performed in the low frequency range, and recording with real directivity gains may be performed in the high frequency range.
In the additive hybrid method, the difference information indicating the difference from the directivity data after modeling is further modeled or encoded by the difference method.
Method (AH2): HOA method (low order) + mixture method
Method (AH3): HOA method (low order) + difference method
Method (AH4): HOA method (low order) + mixture method + difference method
In the multiplicative hybrid method, the directivity data is modeled by a predetermined method, and the ratio (quotient) between the directivity data after modeling and the directivity data before modeling is further modeled by another method different from the predetermined method.
Method (MH2): HOA method (low order) × amplitude-phase modulation (mixture method)
In the spherical harmonic coefficient modeling method, the directivity data is modeled by the HOA method, the resulting model parameters, that is, the spherical harmonic coefficients, are further modeled by the mixture method, and the resulting model parameters are stored in the model data.
In the combination hybrid method, the model data is generated using a combination of at least two of the band hybrid method, the additive hybrid method, the multiplicative hybrid method, and the spherical harmonic coefficient modeling method described above.
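As a rough sketch of how a decoder might recombine these hybrid methods: the additive hybrid sums the partial reconstructions, while the multiplicative hybrid scales a base reconstruction by the modeled ratio. The function names and numeric values below are hypothetical illustrations, not the specified procedure.

```python
# Sketch: recombining hybrid decoding paths. hoa_part, mixture_part,
# residual, and ratio are hypothetical per-direction values produced by
# the individual decoding paths.

def decode_additive(hoa_part, mixture_part, residual):
    """Additive hybrid: low-order HOA + mixture model + difference residual."""
    return hoa_part + mixture_part + residual

def decode_multiplicative(base, ratio):
    """Multiplicative hybrid: base model scaled by the modeled ratio."""
    return base * ratio

print(round(decode_additive(0.5, 0.3, 0.01), 2))   # 0.81
print(decode_multiplicative(0.5, 1.2))             # 0.6
```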
10:+1dB
11:+2dB
int_nbits_res_data = 2; // maximum word length of the Huffman decode table (a reverse-lookup table that obtains data from an index)
Huff_dec_table[4]={0,0,1,2};
0:0dB
1:0dB
2:1dB
3:2dB
(1) Obtain a bit string of the maximum word length from the bitstream
(2) Look up huff_dec_table with the bit string as i_element (equivalent to the Huffman code recorded at the maximum word length)
(3) Obtain the data in which the element of i_element has been restored
(4) Restore the above data based on db_resolution to obtain the dB value
Note that an offset value is required for the restoration.
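Steps (1) to (4) can be sketched as a simple table lookup. Only the table {0, 0, 1, 2}, the 2-bit maximum word length, and the code-to-dB mapping come from the text above; the per-lookahead code lengths (CODE_LEN) and the zero offset are assumptions added to make the sketch run.

```python
# Sketch of the table-lookup decode described in steps (1)-(4) above.
# Codes assumed: '0' -> 0 dB, '10' -> +1 dB, '11' -> +2 dB (max word length 2).

MAX_WLEN = 2
HUFF_DEC_TABLE = [0, 0, 1, 2]        # 2-bit lookahead -> data index
CODE_LEN = [1, 1, 2, 2]              # bits actually consumed per lookahead
DB_VALUES = [0.0, 1.0, 2.0]          # data index -> dB (offset assumed 0)

def decode(bits):
    """Decode a bit string into dB values via the reverse-lookup table."""
    out, pos = [], 0
    while pos < len(bits):
        window = bits[pos:pos + MAX_WLEN].ljust(MAX_WLEN, '0')  # (1)
        i_element = int(window, 2)                               # (2)
        data = HUFF_DEC_TABLE[i_element]                         # (3)
        out.append(DB_VALUES[data])                              # (4)
        pos += CODE_LEN[i_element]
    return out

print(decode('01011'))  # [0.0, 1.0, 2.0]
```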
When the server 11 generates model data by combining one or more methods or encodes difference information in the differential encoding mode, the server 11 is configured, for example, as shown in FIG. 19.
The information processing apparatus 51 supplied with the encoded bitstream from the server 11 configured as shown in FIG. 19 performs, for example, the directivity data generation process shown in FIG. 20, and then performs the output audio data generation process described with reference to FIG. 13 at an arbitrary timing.
〈Configuration example of the directivity data encoding unit〉
Incidentally, when the server 11 always generates model data by the additive hybrid method, the directivity data encoding unit 201 of the server 11 shown in FIG. 19 can be configured, for example, as shown in FIG. 21. In FIG. 21, parts corresponding to those in FIG. 19 are denoted by the same reference numerals, and description thereof is omitted as appropriate.
The differential encoding unit 245 can be configured, for example, as shown in FIG. 22. In FIG. 22, parts corresponding to those in FIG. 19 are denoted by the same reference numerals, and description thereof is omitted as appropriate.
When the directivity data encoding unit 201 has the configuration shown in FIG. 21, the directivity data encoding unit 201 performs the model data generation process shown in FIG. 23 as the processing corresponding to step S11 and step S12 in FIG. 10.
When the directivity data encoding unit 201 has the configuration shown in FIG. 21, the distribution model decoding unit 62 of the information processing apparatus 51 is configured, for example, as shown in FIG. 24. In FIG. 24, parts corresponding to those in FIG. 11 are denoted by the same reference numerals, and description thereof is omitted as appropriate.
Incidentally, the configuration of the model data described above is not limited to the configuration shown in FIG. 5 or the configurations shown in FIGS. 15 and 16, and may be the configuration shown in FIG. 25.
- Uniform data arrangement
- Non-uniform data arrangement
for azi in azimuth
  for elev in elevation
    data_point(azi, elev)
  end
end
(Method DE2): Sort the decibel values of the directivity gain in ascending or descending order and apply differential encoding
(Method DE3): Sort in order from the direction with the highest priority and apply differential encoding
- Assign values in an integer format such as 9 bits (expressing values between 0 and 1 in 512 steps) or 11 bits, according to the dynamic range and the required resolution
F(x;Θ)=F’(x;Θ)×scale_factor[i]+offset[i]
End
〈Utilizing the symmetry of data〉
Incidentally, directivity data may have symmetry depending on the shape of the original sound source.
〈Crossfading between bands〉
In the above, a method of reducing the amount of data by modeling the directivity data for each frequency band, that is, for each band, has been described.
〈Utilizing the symmetry of data〉
In the third embodiment, the utilization of the symmetry of data has been described.
Incidentally, when calculating the rough-shape directivity data (directivity data), such as in step S52 of FIG. 12 and step S117 of FIG. 20, the directivity data calculation unit 82 calculates the mixture model F'(x; Θ) of each band based on the model parameters.
Claims (45)
- 音源の指向性を表す指向性データをモデル化することにより得られたモデルデータを取得する取得部と、
前記モデルデータに基づいて、前記指向性データを算出する算出部と
を備える情報処理装置。 - 前記モデルデータには、前記指向性データを1または複数の分布からなる混合モデルによりモデル化することで得られた、前記混合モデルを構成するモデルパラメータが含まれている
請求項1に記載の情報処理装置。 - 前記1または複数の分布は、vMF分布とKent分布の少なくとも何れかを含む
請求項2に記載の情報処理装置。 - 前記指向性データは、複数の各周波数ビンの指向性ゲインを含み、
前記モデルデータには、1または複数の前記周波数ビンを含む周波数帯域であるバンドごとに、前記指向性ゲインの分布を表す前記混合モデルを構成する前記モデルパラメータが含まれている
請求項2に記載の情報処理装置。 - 前記モデルデータには、前記周波数ビンにおける前記指向性ゲインのダイナミックレンジを示すスケールファクタと、前記周波数ビンにおける前記指向性ゲインの最小値とが含まれている
請求項4に記載の情報処理装置。 - 前記モデルデータには、モデル化前の前記指向性データと、モデル化後の前記指向性データとの差分を示す差分情報が含まれており、
前記算出部により算出された前記指向性データに、前記差分情報を加算する加算部をさらに備える
請求項1に記載の情報処理装置。 - 前記差分情報は、ハフマン符号化されている
請求項6に記載の情報処理装置。 - 前記指向性データは、複数の各周波数ビンの指向性ゲインを含み、
前記算出部により算出された前記指向性データに基づいて補間処理を行うことで、新たな前記周波数ビンの前記指向性ゲインを算出する補間処理部をさらに備える
請求項1に記載の情報処理装置。 - 前記指向性データは、複数の各データポイントにおける指向性ゲインを含み、
前記算出部により算出された前記指向性データに基づいて補間処理を行うことで、新たな前記データポイントにおける前記指向性ゲインを算出する補間処理部をさらに備える
請求項1に記載の情報処理装置。 - 前記指向性データとオーディオデータとを畳み込む指向性畳み込み部をさらに備える
請求項1に記載の情報処理装置。 - 前記指向性データが畳み込まれた前記オーディオデータと、HRTFとを畳み込むHRTF畳み込み部をさらに備える
請求項10に記載の情報処理装置。 - 前記1または複数の分布は、複素Bingham分布または複素watson分布を含む
請求項2に記載の情報処理装置。 - 前記モデルデータには、前記指向性データを球面調和関数展開によりモデル化することで得られた球面調和係数がモデルパラメータとして含まれている
請求項1に記載の情報処理装置。 - 前記モデルデータには、互いに異なる1または複数の方式により前記指向性データをモデル化することで得られたモデルパラメータが含まれている
請求項1に記載の情報処理装置。 - 前記方式は、1または複数の分布からなる混合モデルによりモデル化する方式、および球面調和関数展開によりモデル化する方式のうちの少なくとも何れかを含む
請求項14に記載の情報処理装置。 - 前記モデルデータには、前記1または複数の方式によるモデル化後の前記指向性データと、モデル化前の前記指向性データとの差分を示す差分情報がさらに含まれている
請求項14に記載の情報処理装置。 - 前記差分情報は、ハフマン符号化されている
請求項16に記載の情報処理装置。 - 前記差分情報の実部と虚部のそれぞれが個別にハフマン符号化されている
請求項17に記載の情報処理装置。 - 前記モデルデータには、前記1または複数の方式によるモデル化後の前記指向性データと、モデル化前の前記指向性データとの差分を示す差分情報の空間上の位置間および周波数間のうちの少なくとも何れかの差分をハフマン符号化することで得られた差分符号データが含まれている
請求項14に記載の情報処理装置。 - 前記モデルデータには、前記差分情報の差分の実部と虚部のそれぞれを個別にハフマン符号化することで得られた前記差分符号データが含まれている
請求項19に記載の情報処理装置。 - 前記モデルデータには、前記指向性データを所定の方式によりモデル化することで得られた前記モデルパラメータ、および前記所定の方式によるモデル化後の前記指向性データとモデル化前の前記指向性データとの差分を、前記所定の方式とは異なる方式によりモデル化することで得られた他のモデルパラメータが含まれている
請求項14に記載の情報処理装置。 - 前記モデルデータには、前記指向性データを所定の方式によりモデル化することで得られた前記モデルパラメータ、および前記所定の方式によるモデル化後の前記指向性データとモデル化前の前記指向性データとの比を、前記所定の方式とは異なる方式によりモデル化することで得られた他のモデルパラメータが含まれている
請求項14に記載の情報処理装置。 - 前記モデルデータには、前記指向性データをモデル化することで得られた前記モデルパラメータをさらにモデル化することで得られたモデルパラメータが含まれている
請求項14に記載の情報処理装置。 - 前記モデルデータには、周波数帯域ごとに異なる方式で前記指向性データをモデル化することで得られた前記モデルパラメータが含まれている
請求項14に記載の情報処理装置。 - 前記指向性データは、複数の各データポイントにおける指向性ゲインを含み、
前記モデルデータには、前記データポイントの配置方式を示す情報、および前記データポイントの配置位置を特定するための情報が含まれている
請求項1に記載の情報処理装置。 - 前記モデルデータには、前記音源の種別ごとの前記指向性データの優先度を示す優先度情報が含まれている
請求項25に記載の情報処理装置。 - 前記データポイントの数は前記優先度に応じて変化し、
前記算出部は、前記優先度情報を用いて前記データポイントの配置位置を特定する
請求項26に記載の情報処理装置。 - 前記指向性データは、複数の各データポイントにおける周波数ビンごとの指向性ゲインを含み、
前記モデルデータには、前記差分情報の並び替え後における、前記1または複数の方式によるモデル化後の前記指向性データの前記指向性ゲインと、モデル化前の前記指向性データの前記指向性ゲインとの差分を示す前記差分情報の前記データポイント間および前記周波数ビン間のうちの少なくとも何れかの差分の前記差分符号データが含まれている
請求項19に記載の情報処理装置。 - 前記並び替えは、予め定められた順、前記データポイント若しくは前記周波数ビンの優先度の順、前記差分情報の昇順、または前記差分情報の降順への並び替えである
請求項28に記載の情報処理装置。 - 前記モデルデータには、各前記周波数ビンにおける前記指向性ゲインのダイナミックレンジを示すスケールファクタと、各前記周波数ビンにおける前記指向性ゲインの最小値との少なくとも何れかをパラメトリック化して得られたパラメータが含まれている
請求項4に記載の情報処理装置。 - 前記モデルデータには、回転操作または対称操作のための操作関連情報が含まれており、
前記算出部は、前記操作関連情報に基づいて、前記モデルパラメータに対する前記回転操作または前記対象操作を行うことで、回転または対称移動された前記モデルパラメータを算出するとともに、前記回転または対称移動された前記モデルパラメータにより得られる前記分布を用いて前記指向性データを算出する
請求項2に記載の情報処理装置。 - 前記算出部は、所定の前記バンドの前記混合モデルの出力値と、前記所定の前記バンドに隣接する他の前記バンドの前記混合モデルの出力値とを重み付き加算することで、所定の前記周波数ビンの前記指向性ゲインを算出する
請求項4に記載の情報処理装置。 - 前記算出部は、前記モデルパラメータから得られる複数の前記分布を、負の値を含む重みを用いて重み付き加算することで前記指向性データを算出する
The information processing apparatus according to claim 2. - An information processing apparatus
acquires model data obtained by modeling directivity data representing a directivity of a sound source, and
calculates the directivity data based on the model data
An information processing method. - Acquiring model data obtained by modeling directivity data representing a directivity of a sound source, and
calculating the directivity data based on the model data
A program for causing a computer to execute the above processing. - A modeling unit that models directivity data representing a directivity of a sound source with a mixture model of one or more distributions, and
a model data generation unit that generates model data including model parameters, obtained by the modeling, that constitute the mixture model
An information processing apparatus including the above. - An information processing apparatus
models directivity data representing a directivity of a sound source with a mixture model of one or more distributions, and
generates model data including model parameters, obtained by the modeling, that constitute the mixture model
An information processing method. - Modeling directivity data representing a directivity of a sound source with a mixture model of one or more distributions, and
generating model data including model parameters, obtained by the modeling, that constitute the mixture model
A program for causing a computer to execute the above processing. - An acquisition unit that acquires differential directivity data obtained by computing, for directivity data that represents a directivity of a sound source and consists of directivity gains for each of a plurality of frequency bins at each of a plurality of data points, at least one of differences between the data points and differences between the frequency bins of the directivity gains, and
a calculation unit that calculates the directivity data based on the differential directivity data
An information processing apparatus including the above. - The differential directivity data is Huffman-coded, and
the calculation unit decodes the Huffman-coded differential directivity data
The information processing apparatus according to claim 39. - A real part and an imaginary part of the differential directivity data are each Huffman-coded individually
The information processing apparatus according to claim 40. - The differential directivity data is obtained by computing at least one of the differences between the data points and the differences between the frequency bins after rearrangement of the directivity gains
The information processing apparatus according to claim 39. - The rearrangement is a rearrangement into a predetermined order, an order of priority of the data points or the frequency bins, an ascending order of the directivity gains, or a descending order of the directivity gains
The information processing apparatus according to claim 42. - An information processing apparatus
acquires differential directivity data obtained by computing, for directivity data that represents a directivity of a sound source and consists of directivity gains for each of a plurality of frequency bins at each of a plurality of data points, at least one of differences between the data points and differences between the frequency bins of the directivity gains, and
calculates the directivity data based on the differential directivity data
An information processing method. - Acquiring differential directivity data obtained by computing, for directivity data that represents a directivity of a sound source and consists of directivity gains for each of a plurality of frequency bins at each of a plurality of data points, at least one of differences between the data points and differences between the frequency bins of the directivity gains, and
calculating the directivity data based on the differential directivity data
A program for causing a computer to execute the above processing.
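The mixture-model claims above describe computing a directivity gain as a weighted sum of one or more distributions evaluated in a given direction, with complex Bingham and complex Watson distributions named and negative weights explicitly allowed. As a hedged illustration only, the sketch below substitutes a real-valued von Mises-Fisher mixture on the unit sphere; the weights, mean directions, and concentrations are invented example parameters, not values from this publication.

```python
import numpy as np

def vmf(x, mu, kappa):
    # von Mises-Fisher density on the unit sphere S^2, used here as a
    # real-valued stand-in for the complex Bingham / complex Watson
    # distributions named in the claims
    norm = kappa / (4.0 * np.pi * np.sinh(kappa))
    return norm * np.exp(kappa * float(np.dot(x, mu)))

def mixture_gain(x, weights, mus, kappas):
    # directivity gain = weighted sum of distribution values; the claims
    # allow the weights in this weighted addition to be negative
    return sum(w * vmf(x, mu, k) for w, mu, k in zip(weights, mus, kappas))

# invented example: a single lobe concentrated toward the front direction
front = np.array([0.0, 0.0, 1.0])
gain_front = mixture_gain(front, [1.0], [front], [5.0])
gain_back = mixture_gain(-front, [1.0], [front], [5.0])
```

With one lobe concentrated toward `front`, the gain in the mean direction exceeds the gain in the opposite direction, which is the qualitative behavior a parametric directivity model is meant to capture.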
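Several claims describe taking differences of the directivity gains between adjacent data points or frequency bins and Huffman-coding the result. A minimal round-trip sketch, assuming integer-quantized gains along one frequency axis (the gain values are invented for illustration):

```python
import heapq
from collections import Counter
from itertools import count

def delta_encode(gains):
    # keep the first gain, then store differences between adjacent bins
    return [gains[0]] + [b - a for a, b in zip(gains, gains[1:])]

def delta_decode(deltas):
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

def build_huffman_table(symbols):
    # standard Huffman construction over symbol frequencies
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so the heap never compares the dicts
    heap = [[f, next(tie), {s: ""}] for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, [f1 + f2, next(tie), merged])
    return heap[0][2]

def huffman_encode(symbols, table):
    return "".join(table[s] for s in symbols)

def huffman_decode(bits, table):
    inverse = {code: s for s, code in table.items()}
    decoded, cur = [], ""
    for bit in bits:
        cur += bit
        if cur in inverse:  # codes are prefix-free, so a match is final
            decoded.append(inverse[cur])
            cur = ""
    return decoded

# invented example gains: deltas are small, so Huffman codes stay short
gains = [10, 12, 12, 11, 13, 13, 13, 12]
deltas = delta_encode(gains)
table = build_huffman_table(deltas)
bits = huffman_encode(deltas, table)
restored = delta_decode(huffman_decode(bits, table))
```

The claims that cover complex-valued difference information would simply run this pipeline twice, once for the real parts and once for the imaginary parts.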
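Other claims add an interpolation processing unit that derives gains for new frequency bins (or new data points) from the decoded directivity data. A linear-interpolation sketch over frequency bins, with invented example values:

```python
import numpy as np

# gains decoded at a sparse set of frequency bins (invented example values)
known_freqs = np.array([125.0, 250.0, 500.0, 1000.0, 2000.0])
known_gains = np.array([1.0, 0.9, 0.7, 0.5, 0.3])

# interpolation yields directivity gains at new frequency bins
new_freqs = np.array([375.0, 750.0, 1500.0])
new_gains = np.interp(new_freqs, known_freqs, known_gains)  # → [0.8, 0.6, 0.4]
```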
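Finally, the rendering claims convolve the audio data with the directivity data and then with an HRTF. A schematic time-domain version; real renderers would use per-direction FIR filters, and the identity impulse responses here are placeholders:

```python
import numpy as np

def render(audio, directivity_ir, hrtf_ir):
    # convolve the source audio with the directivity filter for the
    # listener's direction, then with the HRTF for one binaural channel
    directional = np.convolve(audio, directivity_ir)
    return np.convolve(directional, hrtf_ir)

# with identity impulse responses, the signal passes through unchanged
out = render(np.array([1.0, 2.0, 3.0]), np.array([1.0]), np.array([1.0]))
```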
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2022375400A AU2022375400A1 (en) | 2021-10-29 | 2022-10-27 | Information processing device, method, and program |
TW111141214A TW202325040A (zh) | 2021-10-29 | 2022-10-28 | Information processing device and method, and program |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021-177285 | 2021-10-29 | ||
JP2021177285 | 2021-10-29 | ||
JPPCT/JP2022/000355 | 2022-01-07 | ||
PCT/JP2022/000355 WO2023074009A1 (ja) | 2021-10-29 | 2022-01-07 | Information processing device and method, and program |
JPPCT/JP2022/024014 | 2022-06-15 | ||
PCT/JP2022/024014 WO2023074039A1 (ja) | 2021-10-29 | 2022-06-15 | Information processing device and method, and program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023074800A1 (ja) | 2023-05-04 |
Family
ID=86159688
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/024014 WO2023074039A1 (ja) | 2021-10-29 | 2022-06-15 | 情報処理装置および方法、並びにプログラム |
PCT/JP2022/040170 WO2023074800A1 (ja) | 2021-10-29 | 2022-10-27 | 情報処理装置および方法、並びにプログラム |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/024014 WO2023074039A1 (ja) | 2021-10-29 | 2022-06-15 | 情報処理装置および方法、並びにプログラム |
Country Status (3)
Country | Link |
---|---|
AU (1) | AU2022375400A1 (ja) |
TW (1) | TW202325040A (ja) |
WO (2) | WO2023074039A1 (ja) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007006359A (ja) * | 2005-06-27 | 2007-01-11 | Sony Corp | Decoding device, decoding method, and digital audio communication system |
JP2008107629A (ja) * | 2006-10-26 | 2008-05-08 | Nec Corp | Audio signal encoding/decoding method, and apparatus and program for implementing the method |
WO2010109918A1 (ja) * | 2009-03-26 | 2010-09-30 | Panasonic Corp | Decoding device, encoding/decoding device, and decoding method |
WO2020255810A1 (ja) * | 2019-06-21 | 2020-12-24 | Sony Corp | Signal processing device and method, and program |
Worldwide Applications (2022)
- 2022-06-15: WO PCT/JP2022/024014, published as WO2023074039A1 (status unknown)
- 2022-10-27: WO PCT/JP2022/040170, published as WO2023074800A1 (active, Application Filing)
- 2022-10-27: AU AU2022375400A, published as AU2022375400A1 (active, Pending)
- 2022-10-28: TW TW111141214A, published as TW202325040A (status unknown)
Also Published As
Publication number | Publication date |
---|---|
WO2023074039A1 (ja) | 2023-05-04 |
AU2022375400A1 (en) | 2024-04-11 |
TW202325040A (zh) | 2023-06-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7400910B2 (ja) | Audio processing device and method, and program | |
RU2555221C2 | Complex-transform channel coding with extended-band frequency coding | |
US8190425B2 | Complex cross-correlation parameters for multi-channel audio | |
US8379868B2 | Spatial audio coding based on universal spatial cues | |
US7953604B2 | Shape and scale parameters for extended-band frequency coding | |
US8964994B2 | Encoding of multichannel digital audio signals | |
KR102659722B1 | Apparatus and method for reproducing a spatially extended sound source, or apparatus and method for generating a bitstream from a spatially extended sound source | |
US20150163615A1 | Method and device for rendering an audio soundfield representation for audio playback | |
CN105340009A | Compression of decomposed representations of a sound field | |
CN106133828A | Encoding device and encoding method, decoding device and decoding method, and program | |
WO2023074800A1 (ja) | Information processing device and method, and program | |
WO2023074009A1 (ja) | Information processing device and method, and program | |
KR20210071972A | Signal processing device and method, and program | |
WO2018190151A1 (ja) | Signal processing device and method, and program | |
CN105340008A | Compression of decomposed representations of a sound field |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22887125; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | Wipo information: entry into national phase | Ref document number: 2023556636; Country of ref document: JP |
| WWE | Wipo information: entry into national phase | Ref document number: 2022375400; Country of ref document: AU; Ref document number: AU2022375400; Country of ref document: AU |
| ENP | Entry into the national phase | Ref document number: 2022375400; Country of ref document: AU; Date of ref document: 20221027; Kind code of ref document: A |
| REG | Reference to national code | Ref country code: BR; Ref legal event code: B01A; Ref document number: 112024008012 |