WO2018198789A1 - Signal processing device and method, and program - Google Patents
Signal processing device and method, and program
- Publication number
- WO2018198789A1 (PCT/JP2018/015352)
- Authority
- WO
- WIPO (PCT)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
- G10L19/02—Speech or audio signal analysis-synthesis techniques using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/04—Speech or audio signal analysis-synthesis techniques using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/20—Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques specially adapted for comparison or discrimination
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/87—Detection of discrete points within a voice signal
Definitions
- the present technology relates to a signal processing device, method, and program, and more particularly, to a signal processing device, method, and program that can reduce the amount of decoding calculation at low cost.
- For example, in the 3D audio standard of MPEG (Moving Picture Experts Group), which is an international standard, the amount of calculation at the time of decoding is reduced by transmitting priority information indicating the priority of each audio object to the decoding device side.
- However, there are many contents to which such priority information is not given.
- Specifically, whether or not priority information is included in the encoded data can be switched by a flag in the header part; that is, encoded data to which no priority information is assigned is also permitted.
- The present technology has been made in view of such a situation, and makes it possible to reduce the amount of decoding calculation at low cost.
- the signal processing device includes a priority information generation unit that generates priority information of the audio object based on a plurality of elements representing the characteristics of the audio object.
- the element can be metadata of the audio object.
- the element can be the position of the audio object in space.
- the element can be a distance from a reference position in the space to the audio object.
- the element can be a horizontal angle indicating the horizontal position of the audio object in the space.
- the priority information generation unit can generate the priority information according to the moving speed of the audio object based on the metadata.
- the element can be gain information to be multiplied by the audio signal of the audio object.
- The priority information generation unit can generate the priority information of a unit time to be processed based on a difference between the gain information of the unit time to be processed and an average value of the gain information over a plurality of unit times.
- the priority information generation unit can generate the priority information based on the sound pressure of the audio signal multiplied by the gain information.
- the element can be spread information.
- the priority information generation unit can generate the priority information corresponding to the area of the audio object based on the spread information.
- the element can be information indicating the sound attribute of the audio object.
- the element can be an audio signal of the audio object.
- the priority information generation unit can generate the priority information based on a result of a voice section detection process for the audio signal.
- the priority information generating unit can perform smoothing in the time direction on the generated priority information to obtain final priority information.
- the signal processing method or program includes a step of generating priority information of the audio object based on a plurality of elements representing characteristics of the audio object.
- priority information of the audio object is generated based on a plurality of elements representing the characteristics of the audio object.
- the calculation amount of decoding can be reduced at low cost.
- The present technology generates priority information of an audio object based on elements representing the characteristics of the audio object, such as the metadata of the audio object, content information, and the audio signal of the audio object, thereby making it possible to reduce the amount of decoding calculation at low cost.
- the multi-channel audio signal and the audio signal of the audio object are encoded according to a predetermined standard or the like.
- the audio object is also simply referred to as an object.
- audio signals of each channel and each object are encoded and transmitted for each frame.
- In the bit stream, the encoded audio signal and the information necessary for decoding it are stored in a plurality of elements (bit stream elements), and the bit stream composed of these elements is transmitted from the encoding side to the decoding side.
- a plurality of elements are arranged in order from the top, and finally an identifier indicating the end position regarding the information of the frame is arranged.
- The element arranged at the head is an ancillary data area called a DSE (Data Stream Element); the DSE describes information about each of a plurality of channels, such as audio signal downmix information and identification information.
- each element following DSE stores an encoded audio signal.
- An element storing a single-channel audio signal is called an SCE (Single Channel Element), and an element storing the audio signals of a pair of two channels is called a CPE (Coupling Channel Element).
- the audio signal of each object is stored in the SCE.
- the priority information of the audio signal of each object is generated and stored in the DSE.
- the priority information is information indicating the priority of the object.
- Here, the higher the value indicated by the priority information, that is, the larger the numerical value indicating the priority, the higher the priority of the object.
- In the present technology, the priority information of each object is generated based on the metadata of the object and the like. Therefore, the amount of decoding calculation can be reduced even for content to which priority information has not been given; in other words, the amount of decoding calculation can be reduced at low cost, without assigning priority information manually.
- FIG. 1 is a diagram illustrating a configuration example of an encoding device to which the present technology is applied.
- The encoding device 11 of FIG. 1 includes a channel audio encoding unit 21, an object audio encoding unit 22, a metadata input unit 23, and a packing unit 24.
- The channel audio encoding unit 21 is supplied with the audio signal of each of the M channels of a multi-channel signal.
- the audio signal of each channel is supplied from a microphone corresponding to the channel.
- The characters “#0” to “#M−1” represent the channel numbers of the respective channels.
- the channel audio encoding unit 21 encodes the supplied audio signal of each channel, and supplies the encoded data obtained by the encoding to the packing unit 24.
- the audio signal of each of N objects is supplied to the object audio encoding unit 22.
- the audio signal of each object is supplied from a microphone attached to the object.
- The characters “#0” to “#N−1” represent the object numbers of the respective objects.
- The object audio encoding unit 22 encodes the supplied audio signal of each object. It also generates priority information based on the supplied audio signal and on the metadata and content information supplied from the metadata input unit 23, and supplies the encoded data obtained by the encoding and the priority information to the packing unit 24.
- the metadata input unit 23 supplies the metadata and content information of each object to the object audio encoding unit 22 and the packing unit 24.
- object metadata includes object position information indicating the position of the object in space, spread information indicating the range of the size of the sound image of the object, gain information indicating the gain of the audio signal of the object, and the like.
- the content information includes information related to the sound attribute of each object in the content.
- The packing unit 24 packs the encoded data supplied from the channel audio encoding unit 21, the encoded data and priority information supplied from the object audio encoding unit 22, and the metadata and content information supplied from the metadata input unit 23 to generate a bit stream, and outputs it.
- the bit stream thus obtained includes encoded data of each channel, encoded data of each object, priority information of each object, and metadata and content information of each object for each frame.
- the audio signals of M channels and the audio signals of N objects stored in the bit stream for one frame are the audio signals of the same frame to be reproduced simultaneously.
- Here, priority information is generated for the audio signal of each object for each frame, but one piece of priority information may instead be generated for the audio signals of a plurality of frames.
- the object audio encoding unit 22 in FIG. 1 is configured in more detail as shown in FIG. 2, for example.
- the object audio encoding unit 22 shown in FIG. 2 includes an encoding unit 51 and a priority information generation unit 52.
- the encoding unit 51 includes an MDCT (Modified Discrete Cosine Transform) unit 61, and the encoding unit 51 encodes an audio signal of each object supplied from the outside.
- the MDCT unit 61 performs MDCT (modified discrete cosine transform) on the audio signal of each object supplied from the outside.
- the encoding unit 51 encodes the MDCT coefficient of each object obtained by MDCT, and supplies the encoded data of each object obtained as a result, that is, the encoded audio signal, to the packing unit 24.
- The priority information generation unit 52 generates the priority information of the audio signal of each object based on at least one of the audio signal of each object supplied from the outside, the metadata supplied from the metadata input unit 23, and the content information supplied from the metadata input unit 23, and supplies the priority information to the packing unit 24.
- the priority information generation unit 52 generates priority information of an object based on one or more elements representing the characteristics of the object such as an audio signal, metadata, and content information.
- In other words, an audio signal is an element representing the characteristics of the sound of an object, metadata is an element representing characteristics such as the position of the object, the spread of its sound image, and its gain, and content information is an element representing the characteristics of the sound attribute of the object.
- For example, gain information is stored in the metadata of the object, and the audio signal multiplied by this gain information is used as the final audio signal of the object; therefore, the sound pressure of the final audio signal changes depending on the gain information.
- Therefore, the priority information generation unit 52 generates the priority information using at least information other than the sound pressure of the audio signal. Thereby, appropriate priority information can be obtained.
- the priority information is generated by at least one of the following methods (1) to (4).
- the object metadata includes object position information, spread information, and gain information. Therefore, it is conceivable to generate priority information using these object position information, spread information, and gain information.
- the object position information is information indicating the position of the object in the three-dimensional space.
- For example, the object position information consists of coordinates including a horizontal angle a, a vertical angle e, and a radius r that indicate the position of the object viewed from a reference position (origin).
- The horizontal angle a is the horizontal angle (azimuth) indicating the horizontal position of the object viewed from the reference position where the user is located, that is, the angle formed between the reference direction in the horizontal plane and the direction of the object viewed from the reference position.
- When the horizontal angle a is 0 degrees, the object is located directly in front of the user; when it is 90 degrees or −90 degrees, the object is located directly beside the user; and when it is 180 degrees or −180 degrees, the object is located directly behind the user.
- The vertical angle e is the vertical angle (elevation) indicating the vertical position of the object viewed from the reference position, that is, the angle formed between the reference direction in the vertical plane and the direction of the object viewed from the reference position.
- the radius r is the distance from the reference position to the object position.
- An object at a short distance from the origin (reference position), which is the position of the user, that is, an object with a small radius r close to the origin, is considered to be more important than an object far from the origin. Therefore, the priority indicated by the priority information can be made higher as the radius r becomes smaller.
- the priority information generation unit 52 generates the priority information of the object by calculating the following equation (1) based on the radius r of the object.
- priority information is also referred to as priority.
- According to Equation (1), the smaller the radius r, the larger the value of the priority information priority, that is, the higher the priority.
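- The body of Equation (1) is not reproduced in this text. The following is a minimal sketch in Python, assuming the simple reciprocal form that the surrounding description implies (the function name and the zero guard are illustrative, not from the patent):

```python
def priority_from_radius(r: float, eps: float = 1e-6) -> float:
    """Assumed form of Equation (1): priority grows as the object
    approaches the origin, e.g. priority = 1 / r.
    The exact formula in the patent may differ."""
    return 1.0 / max(r, eps)
```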
- the priority information generation unit 52 generates the priority information of the object by calculating the following equation (2) based on the horizontal direction angle a of the object. However, when the horizontal direction angle a is less than 1 degree, the value of the priority information priority of the object is 1.
- Here, abs(a) indicates the absolute value of the horizontal angle a. Therefore, in this example, the value of the priority information priority increases as the horizontal angle a becomes smaller, that is, as the object is positioned closer to the front of the user.
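- As a sketch, Equation (2) can be read as the reciprocal of the absolute azimuth with the clamp described above; a hedged Python rendering (the exact formula is an assumption):

```python
def priority_from_azimuth(a_deg: float) -> float:
    """Assumed form of Equation (2): priority = 1 / abs(a), with the
    value fixed to 1 when abs(a) is less than 1 degree, as stated."""
    abs_a = abs(a_deg)
    return 1.0 if abs_a < 1.0 else 1.0 / abs_a
```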
- the priority indicated by the priority information can be increased as the time change amount of the object position information is larger, that is, as the moving speed of the object is faster.
- For example, the priority information generation unit 52 calculates the following Equation (3) based on the horizontal angle a, the vertical angle e, and the radius r included in the object position information of the object, thereby generating priority information corresponding to the moving speed of the object.
- Here, a(i), e(i), and r(i) indicate the horizontal angle a, the vertical angle e, and the radius r of the object in the current frame to be processed, and a(i−1), e(i−1), and r(i−1) indicate those of the frame immediately before the current frame.
- The right side of Equation (3) corresponds to the speed of the object as computed from the frame-to-frame changes of these components; for example, the term in the horizontal angle a indicates the horizontal speed of the object. That is, the value of the priority information priority given by Equation (3) increases as the object moves faster.
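- The body of Equation (3) is likewise missing here; a minimal sketch that follows the description (priority grows with the frame-to-frame change of the position components; the squared-difference form is an assumption):

```python
def priority_from_speed(pos_now, pos_prev):
    """pos_now and pos_prev are (a, e, r) tuples for the current frame i
    and the previous frame i-1. Larger frame-to-frame displacement,
    i.e. faster movement, yields a larger priority."""
    a1, e1, r1 = pos_now
    a0, e0, r0 = pos_prev
    return (a1 - a0) ** 2 + (e1 - e0) ** 2 + (r1 - r0) ** 2
```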
- the object metadata includes, as gain information, a coefficient value to be multiplied with the audio signal of the object at the time of decoding.
- When the value of the gain information, that is, the coefficient value multiplied as gain, is larger, the sound pressure of the final audio signal of the object after the multiplication becomes higher, and the sound of the object is more easily perceived by humans. Therefore, an object given large gain information is considered to be an important object in the content.
- Therefore, the priority information generation unit 52 generates the priority information of the object by calculating the following Equation (4) based on the gain information of the object, that is, the coefficient value g that is the gain indicated by the gain information.
- In Equation (4), the coefficient value g itself, which is the gain information, is used as the priority information priority.
- Here, the time average of the gain information (coefficient value g) of one object over a plurality of frames is denoted as the time average value g_ave; for example, g_ave is the time average of the gain information of a plurality of consecutive past frames preceding the frame to be processed.
- In a frame where the coefficient value g is substantially larger than the time average value g_ave, that is, where the difference between the coefficient value g and the time average value g_ave is large, the importance of the object is considered higher than in a frame where the difference is small. In other words, the importance of the object is considered high in a frame in which the coefficient value g suddenly increases.
- Therefore, the priority information generation unit 52 generates the priority information of the object by calculating the following Equation (5) based on the gain information of the object, that is, the coefficient value g and the time average value g_ave. In other words, the priority information is generated based on the difference between the coefficient value g of the current frame and the time average value g_ave.
- Here, g(i) indicates the coefficient value g of the current frame. Therefore, in this example, the value of the priority information priority becomes larger the more the coefficient value g(i) of the current frame exceeds the time average value g_ave. In other words, in the example of Equation (5), the importance of the object is considered high in a frame in which the gain information increases sharply, and the priority indicated by the priority information is correspondingly higher.
- Note that the time average value g_ave may instead be an exponential average based on the gain information (coefficient value g) of a plurality of past frames of the object, or the average of the gain information of the object over the entire content.
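- A sketch of the gain-based generation of Equations (4) and (5), under the stated reading (Equation (4) uses g itself; Equation (5) uses the excess of the current g(i) over a time average g_ave; the plain-mean g_ave and the difference form are assumptions):

```python
def priority_from_gain(g_history):
    """g_history holds the coefficient values g of past frames followed
    by the current frame, g(i) = g_history[-1]."""
    g_i = g_history[-1]
    # g_ave: time average over the preceding frames (one of the options
    # the text mentions; an exponential average would also do).
    g_ave = sum(g_history[:-1]) / max(len(g_history) - 1, 1)
    priority_eq4 = g_i          # Equation (4): the gain itself
    priority_eq5 = g_i - g_ave  # Equation (5): grows when g jumps above its average
    return priority_eq4, priority_eq5
```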
- Spread information is angle information indicating the range of the size of the sound image of the object, that is, angle information indicating the degree of spread of the sound image of the object.
- the spread information can be said to be information indicating the size of the object area.
- an angle indicating the range of the size of the sound image of the object indicated by the spread information is referred to as a spread angle.
- An object with a large spread angle is an object that appears large on the screen. Therefore, it is considered that an object having a large spread angle is more likely to be an important object in the content than an object having a small spread angle. Therefore, the priority indicated by the priority information can be made higher as the spread angle indicated by the spread information is larger.
- the priority information generation unit 52 generates the priority information of the object by calculating the following equation (6) based on the spread information of the object.
- s indicates a spread angle indicated by spread information.
- In Equation (6), the square of the spread angle s is used as the value of the priority information priority in order to reflect the area of the object region, that is, the width of the sound image range, in the priority. Therefore, the calculation of Equation (6) generates priority information corresponding to the area of the region of the object, that is, the area of the sound image of the sound of the object.
- Further, horizontal and vertical spread angles, perpendicular to each other, may be given as the spread information.
- In that case, the spread information includes a horizontal spread angle s_width and a vertical spread angle s_height.
- In this way, the spread information can represent objects whose size, that is, whose degree of spread, differs between the horizontal direction and the vertical direction.
- When the spread information includes the spread angle s_width and the spread angle s_height, the priority information generation unit 52 generates the priority information of the object by calculating the following Equation (7) based on the spread information.
- In Equation (7), the product of the spread angle s_width and the spread angle s_height is used as the priority information priority.
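- Both spread-based forms are stated directly in the text and can be sketched as follows (the function shape and defaults are illustrative):

```python
def priority_from_spread(s=None, s_width=None, s_height=None):
    """Equation (6): the square of the spread angle s, reflecting the
    area of the object's sound image. Equation (7): the product of the
    horizontal and vertical spread angles when given separately."""
    if s is not None:
        return s * s           # Equation (6)
    return s_width * s_height  # Equation (7)
```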
- In this way, priority information can be generated based on object metadata such as object position information, spread information, and gain information.
- some object audio encoding schemes include content information as information about each object.
- The sound attribute of the object is specified by the content information; that is, the content information includes information indicating the sound attribute of the object.
- For example, from the content information it is possible to determine whether the sound of the object depends on a language, the type of that language, whether the sound of the object is speech, and whether the sound of the object is an environmental sound.
- For example, when the sound of an object is speech, the object is considered to be more important than objects of other sounds such as environmental sounds. This is because, in content such as movies and news, the amount of information carried by speech is larger than that carried by other sounds, and because human hearing is more sensitive to speech.
- Therefore, the priority of an object whose sound is speech can be made higher than the priority of objects having other attributes.
- the priority information generation unit 52 generates the priority information of the object by the calculation of the following equation (8) based on the content information of the object.
- In Equation (8), object_class indicates the sound attribute of the object indicated by the content information. When the sound attribute of the object is speech, the value of the priority information is set to 10; when it is not speech, for example in the case of an environmental sound, the value of the priority information is set to 1.
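- A sketch of Equation (8) as described (the class label is an illustrative placeholder; the actual content information is a coded attribute in the bit stream):

```python
def priority_from_content_info(object_class: str) -> int:
    """Priority 10 when the content information marks the object's
    sound as speech, 1 otherwise (e.g. an environmental sound)."""
    return 10 if object_class == "speech" else 1
```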
- Further, voice activity detection (VAD) processing may be performed on the audio signal of the object, and the priority information of the object may be generated based on the detection result (processing result).
- In this case as well, when the detection result of the voice activity detection indicates that the sound of the object is speech, the priority indicated by the priority information is made higher than when another detection result is obtained.
- Specifically, the priority information generation unit 52 performs voice activity detection processing on the audio signal of the object, and generates the priority information of the object by the calculation of the following Equation (9) based on the detection result.
- In Equation (9), object_class_vad represents the sound attribute of the object obtained as a result of the voice activity detection processing.
- When the sound attribute of the object is speech, that is, when the voice activity detection yields a detection result indicating that the sound of the object is speech, the value of the priority information is set to 10; otherwise, the value of the priority information is set to 1.
- Further, a likelihood of being a voice section may be obtained in the voice activity detection processing, and the priority information may be generated based on that likelihood value. In such a case, the priority becomes higher the more speech-like the current frame of the object is.
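- A sketch of Equation (9) and of the likelihood variant just mentioned (the likelihood-to-priority mapping is an illustrative assumption, not the patent's formula):

```python
from typing import Optional

def priority_from_vad(is_speech: bool,
                      speech_likelihood: Optional[float] = None) -> float:
    """Equation (9) as described: priority 10 when voice activity
    detection judges the frame to be speech, 1 otherwise. If the VAD
    yields a likelihood in [0, 1], it can be used so that more
    speech-like frames get higher priority."""
    if speech_likelihood is not None:
        return 1.0 + 9.0 * speech_likelihood  # illustrative mapping
    return 10.0 if is_speech else 1.0
```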
- priority information may be generated based on the sound pressure of a signal obtained by multiplying the audio signal of the object by gain information. That is, priority information may be generated based on gain information and an audio signal.
- the priority information generation unit 52 multiplies the audio signal of the object by the gain information, and obtains the sound pressure of the audio signal after the gain information multiplication. Then, the priority information generation unit 52 generates priority information based on the obtained sound pressure. At this time, for example, the priority information is generated such that the higher the sound pressure, the higher the priority.
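- A sketch of this pressure-based generation (the mean-square measure is an assumption; the text does not fix the exact pressure measure):

```python
import numpy as np

def priority_from_sound_pressure(audio_frame: np.ndarray, g: float) -> float:
    """Multiply the frame by the gain g first, as described, then use
    the resulting sound pressure (here: mean square) as the priority."""
    x = g * audio_frame
    return float(np.mean(x ** 2))
```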
- priority information is generated based on elements representing object characteristics, such as object metadata, content information, and audio signals.
- Note that the generation of priority information is not limited to the above-described examples; the calculated priority information, such as a value obtained by the calculation of Equation (1), may further be multiplied by a predetermined coefficient, or have a predetermined constant added to it, and the result may be used as the final priority information.
- More appropriate priority information can also be obtained by combining a plurality of pieces of priority information obtained in different ways.
- For example, priority information calculated based on object position information and priority information calculated based on spread information may be linearly combined into one final piece of priority information.
- An object in front of the user is generally considered to be an important object; however, even if the object is in front of the user, it may not be important when the size of its sound image is small.
- Therefore, the final priority information can be obtained as a linear sum of the priority information obtained from the object position information and the priority information obtained from the spread information.
- In such a case, the priority information generation unit 52 linearly combines a plurality of pieces of priority information, for example by calculating the following Equation (10), to generate one final piece of priority information for the object.
- In Equation (10), priority(position) indicates the priority information obtained based on the object position information, for example by Equation (1), Equation (2), or Equation (3) described above, and priority(spread) indicates the priority information obtained based on the spread information, for example by Equation (6) or Equation (7) described above.
- A and B indicate the coefficients of the linear sum; in other words, A and B can be said to be weighting factors used to generate the priority information.
- As methods of determining the weighting factors A and B, for example, a method of setting equal weights according to the value ranges of the generation formulas of the priority information to be linearly combined (hereinafter also referred to as setting method 1) and a method of changing the weighting factors case by case (hereinafter also referred to as setting method 2) are conceivable.
- As a specific example of setting method 1, suppose the priority information obtained by Equation (2) above is used as priority(position) and the priority information obtained by Equation (6) above is used as priority(spread). In this case, the value range of priority(position) is 1/π to 1, while the value range of priority(spread) is 0 to π². Accordingly, by setting the weighting factor A to π²/(π²+1) and the weighting factor B to 1/(π²+1), for example, priority information in which the two terms contribute comparably according to their value ranges can be generated.
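- A sketch of Equation (10) with the setting-method-1 weights as reconstructed above (treat the default A and B as an assumption recovered from garbled text, not as the patent's exact constants):

```python
import math

def combine_linear(p_position: float, p_spread: float,
                   A: float = math.pi ** 2 / (math.pi ** 2 + 1.0),
                   B: float = 1.0 / (math.pi ** 2 + 1.0)) -> float:
    """Equation (10): one final priority as a weighted linear sum of
    the position-based and spread-based priorities."""
    return A * p_position + B * p_spread
```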
- Further, for example, priority information calculated based on content information and priority information calculated based on information other than the content information may be combined nonlinearly into one final piece of priority information.
- As described above, from the content information it is possible to determine whether or not the sound of the object is speech. When the sound of the object is speech, it is desirable that the finally obtained priority information value be large regardless of the other information used for generating the priority information, because a speech object generally carries a larger amount of information than other objects and is considered to be a more important object.
- Therefore, when combining the priority information calculated based on the content information with the priority information calculated based on information other than the content information, the priority information generation unit 52, for example, calculates the following Equation (11) using weighting factors determined by setting method 2 described above, to generate one final piece of priority information.
- In Equation (11), priority(object_class) indicates the priority information obtained based on the content information, for example by Equation (8) described above, and priority(others) indicates the priority information obtained based on information other than the content information, for example the object position information, gain information, spread information, or the audio signal of the object.
- Here, A and B are the exponents of the powers in the nonlinear sum, and these A and B can be said to be weighting factors used to generate the priority information.
- With the weighting factors determined by setting method 2, when the sound of the object is speech the final value of the priority information priority becomes sufficiently large, and it never becomes smaller than the priority information of an object whose sound is not speech.
- The magnitude relationship between the priority information of two objects that are both speech is determined by the value of priority(others)^B, the second term of Equation (11).
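- A sketch of the assumed shape of Equation (11); the exponent values are placeholders, since this text does not state them:

```python
def combine_nonlinear(p_object_class: float, p_others: float,
                      A: float = 2.0, B: float = 1.0) -> float:
    """Assumed form of Equation (11): priority(object_class)**A +
    priority(others)**B, where the content-information term dominates
    for speech objects and the second term orders speech objects
    among themselves."""
    return p_object_class ** A + p_others ** B
```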
- In this way, more appropriate priority information can be obtained by combining a plurality of pieces of priority information, obtained by a plurality of different methods, through linear combination or nonlinear combination.
- The combination is not limited to these; for example, one final piece of priority information may be generated by a conditional expression over a plurality of pieces of priority information.
- In the above, examples have been described in which priority information is generated from object metadata, content information, and the like, or in which a plurality of pieces of priority information are combined into one final piece of priority information. However, it is not desirable for the magnitude relationship between the priority information of a plurality of objects to change many times within a short period; if it does, the sound of an object becomes audible and inaudible in alternation at short intervals on the decoding side, and the perceived quality deteriorates.
- Therefore, if the priority information generation unit 52 smooths the priority information in the time direction by exponential averaging, for example by performing the calculation shown in the following Equation (12), switching of the magnitude relationship between the priority information of objects within short periods can be suppressed.
- In Equation (12), i is the index of the current frame and i−1 is the index of the frame immediately before it. priority(i) indicates the priority information before smoothing obtained for the current frame, for example the priority information obtained by any of Equations (1) to (11) described above. priority_smooth(i) indicates the smoothed priority information of the current frame, that is, the final priority information, and priority_smooth(i−1) indicates the smoothed priority information of the frame immediately preceding the current frame.
- α represents the smoothing coefficient of the exponential average and takes a value between 0 and 1. That is, the value obtained by adding the priority information priority(i) multiplied by the smoothing coefficient α to the priority information priority_smooth(i−1) multiplied by (1−α) is used as the final, smoothed priority information priority_smooth(i). In other words, the final priority information priority_smooth(i) of the current frame is generated by smoothing the priority information priority(i) generated for the current frame in the time direction.
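- Equation (12) as an exponential average, per the corrected description above:

```python
def smooth_priority(priority_i: float, prev_smoothed: float,
                    alpha: float = 0.5) -> float:
    """priority_smooth(i) = alpha * priority(i)
                          + (1 - alpha) * priority_smooth(i - 1),
    with the smoothing coefficient alpha between 0 and 1.
    alpha = 0.5 is an illustrative choice."""
    return alpha * priority_i + (1.0 - alpha) * prev_smoothed
```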
- The smoothing is not limited to exponential averaging; the priority information may be smoothed by any other smoothing method.
- As described above, when the priority information of an object is generated based on its metadata and the like, the cost of assigning priority information manually can be eliminated. Moreover, appropriate priority information can be given even to encoded data in which priority information is not properly assigned for all times (frames), and as a result the amount of decoding calculation can be reduced.
- When the audio signals of a plurality of channels and of a plurality of objects to be reproduced simultaneously are supplied for one frame, the encoding device 11 performs an encoding process and outputs a bit stream containing the encoded audio signals.
- In step S11, the priority information generation unit 52 of the object audio encoding unit 22 generates the priority information of the supplied audio signal of each object and supplies it to the packing unit 24.
- For example, the metadata input unit 23 obtains the metadata and content information of each object by receiving a user input operation, by communicating with the outside, or by reading from an external recording area, and supplies them to the priority information generation unit 52 and the packing unit 24.
- The priority information generation unit 52 generates the priority information of each object based on at least one of the supplied audio signal and the metadata and content information supplied from the metadata input unit 23.
- Specifically, the priority information generation unit 52 generates the priority information of each object by any of Equations (1) to (9) described above, by the method of generating priority information based on the audio signal and gain information of the object, or by Equation (10), Equation (11), Equation (12), or the like.
- In step S12, the packing unit 24 stores the priority information of the audio signal of each object supplied from the priority information generation unit 52 in the DSE of the bit stream.
- In step S13, the packing unit 24 stores the metadata and content information of each object supplied from the metadata input unit 23 in the DSE of the bit stream.
- Through the above processing, the priority information of the audio signals of all objects and the metadata and content information of all objects are stored in the DSE of the bit stream.
- In step S14, the channel audio encoding unit 21 encodes the supplied audio signal of each channel.
- More specifically, the channel audio encoding unit 21 performs MDCT on the audio signal of each channel, encodes the MDCT coefficients of each channel obtained by the MDCT, and supplies the resulting encoded data of each channel to the packing unit 24.
- In step S15, the packing unit 24 stores the encoded data of the audio signal of each channel supplied from the channel audio encoding unit 21 in the SCE or CPE of the bit stream; that is, the encoded data is stored in the elements arranged after the DSE in the bit stream.
- In step S16, the encoding unit 51 of the object audio encoding unit 22 encodes the supplied audio signal of each object.
- More specifically, the MDCT unit 61 performs MDCT on the audio signal of each object, and the encoding unit 51 encodes the MDCT coefficients of each object obtained by the MDCT and supplies the resulting encoded data of each object to the packing unit 24.
- In step S17, the packing unit 24 stores the encoded data of the audio signal of each object supplied from the encoding unit 51 in the SCE of the bit stream; that is, the encoded data is stored in elements arranged after the DSE in the bit stream.
- Through the above processing, a bit stream is obtained in which, for the frame to be processed, the encoded data of the audio signals of all channels, the priority information and encoded data of the audio signals of all objects, and the metadata and content information of all objects are stored.
- In step S18, the packing unit 24 outputs the obtained bit stream, and the encoding process ends.
- As described above, the encoding device 11 generates the priority information of the audio signal of each object, stores it in the bit stream, and outputs it. The decoding side can therefore easily grasp which audio signals have higher priority.
- As a result, on the decoding side the encoded audio signals can be selectively decoded according to the priority information.
- Moreover, the encoding device 11 obtains the priority information of each object based on the metadata of the object, the content information, the audio signal of the object, and the like, and can thereby obtain appropriate priority information at low cost.
- priority information may be generated in the decoding device.
- a decoding device that receives the bit stream output from the encoding device and decodes the encoded data included in the bit stream is configured as shown in FIG. 4, for example.
- The decoding device 101 illustrated in FIG. 4 includes an unpacking/decoding unit 111, a rendering unit 112, and a mixing unit 113.
- The unpacking/decoding unit 111 acquires the bit stream output from the encoding device and unpacks and decodes it.
- The unpacking/decoding unit 111 supplies the audio signal of each object obtained by the unpacking and decoding, together with the metadata of each object, to the rendering unit 112. At this time, the unpacking/decoding unit 111 generates the priority information of each object based on the metadata and content information of the object, and decodes the encoded data of each object according to the obtained priority information.
- The unpacking/decoding unit 111 also supplies the audio signal of each channel obtained by the unpacking and decoding to the mixing unit 113.
- The rendering unit 112 generates M-channel audio signals based on the audio signal of each object supplied from the unpacking/decoding unit 111 and the object position information included in the metadata of each object, and supplies them to the mixing unit 113. At this time, the rendering unit 112 generates the M-channel audio signals so that the sound image of each object is localized at the position indicated by its object position information.
- The mixing unit 113 performs, for each channel, a weighted addition of the audio signal of that channel supplied from the unpacking/decoding unit 111 and the audio signal of that channel supplied from the rendering unit 112, thereby generating the final audio signal of each channel.
- The mixing unit 113 supplies the final audio signal of each channel obtained in this way to the external speaker corresponding to that channel, where the sound is reproduced.
- The unpacking/decoding unit 111 of the decoding device 101 shown in FIG. 4 is configured in more detail as shown in FIG. 5, for example.
- The unpacking/decoding unit 111 illustrated in FIG. 5 includes a channel audio signal acquisition unit 141, a channel audio signal decoding unit 142, an IMDCT (Inverse Modified Discrete Cosine Transform) unit 143, an object audio signal acquisition unit 144, an object audio signal decoding unit 145, a priority information generation unit 146, an output selection unit 147, a zero value output unit 148, and an IMDCT unit 149.
- The channel audio signal acquisition unit 141 acquires the encoded data of each channel from the supplied bit stream and supplies it to the channel audio signal decoding unit 142.
- the channel audio signal decoding unit 142 decodes the encoded data of each channel supplied from the channel audio signal acquisition unit 141, and supplies the MDCT coefficient obtained as a result to the IMDCT unit 143.
- The IMDCT unit 143 performs IMDCT (Inverse Modified Discrete Cosine Transform) on the MDCT coefficients supplied from the channel audio signal decoding unit 142 to generate an audio signal, and supplies it to the mixing unit 113.
- the object audio signal acquisition unit 144 acquires encoded data of each object from the supplied bit stream and supplies the encoded data to the object audio signal decoding unit 145.
- The object audio signal acquisition unit 144 also acquires the metadata and content information of each object from the supplied bit stream, supplies the metadata and content information to the priority information generation unit 146, and supplies the metadata to the rendering unit 112.
- The object audio signal decoding unit 145 decodes the encoded data of each object supplied from the object audio signal acquisition unit 144, and supplies the resulting MDCT coefficients to the output selection unit 147 and the priority information generation unit 146.
- The priority information generation unit 146 generates the priority information of each object based on at least one of the metadata supplied from the object audio signal acquisition unit 144, the content information supplied from the object audio signal acquisition unit 144, and the MDCT coefficients supplied from the object audio signal decoding unit 145, and supplies it to the output selection unit 147.
- the output selection unit 147 selectively switches the output destination of the MDCT coefficient of each object supplied from the object audio signal decoding unit 145 based on the priority information of each object supplied from the priority information generation unit 146.
- Specifically, when the priority information of a given object is less than a predetermined threshold Q, the output selection unit 147 sets the MDCT coefficients of that object to 0 and supplies them to the zero value output unit 148. When the priority information of the object is equal to or greater than the threshold Q, the output selection unit 147 supplies the MDCT coefficients of that object, as supplied from the object audio signal decoding unit 145, to the IMDCT unit 149.
- Here, the value of the threshold Q is determined appropriately according to, for example, the computing capability of the decoding device 101; by choosing Q appropriately, the amount of calculation for decoding the audio signals can be reduced to within the range that the decoding device 101 can decode in real time.
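- A sketch of this selection logic (a simplification of the output selection unit 147; the list-based coefficients are illustrative):

```python
def select_output(priority: float, mdct_coeffs: list, Q: float) -> list:
    """Objects whose priority is below the threshold Q get zeroed MDCT
    coefficients, so the zero value output unit can emit silence without
    a real IMDCT; the rest pass through to the IMDCT unit for full
    decoding."""
    if priority >= Q:
        return mdct_coeffs               # route to the IMDCT unit 149
    return [0.0] * len(mdct_coeffs)      # route to the zero value output unit 148
```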
- the zero value output unit 148 generates an audio signal based on the MDCT coefficient supplied from the output selection unit 147 and supplies the audio signal to the rendering unit 112. In this case, since the MDCT coefficient is 0, a silent audio signal is generated.
- the IMDCT unit 149 performs IMDCT based on the MDCT coefficient supplied from the output selection unit 147, generates an audio signal, and supplies the audio signal to the rendering unit 112.
- When the bit stream for one frame is supplied from the encoding device, the decoding device 101 performs a decoding process to generate audio signals and outputs them to the speakers.
- The decoding process performed by the decoding device 101 is described below with reference to its flowchart.
- In step S51, the unpacking/decoding unit 111 acquires, that is, receives, the bit stream transmitted from the encoding device.
- In step S52, the unpacking/decoding unit 111 performs a selective decoding process.
- In the selective decoding process, the encoded data of each channel is decoded, priority information is generated for each object, and the encoded data of each object is selectively decoded based on that priority information.
- The audio signal of each channel is supplied to the mixing unit 113, and the audio signal of each object is supplied to the rendering unit 112. The metadata of each object acquired from the bit stream is also supplied to the rendering unit 112.
- In step S53, the rendering unit 112 renders the audio signal of each object based on the audio signal supplied from the unpacking/decoding unit 111 and the object position information included in the object's metadata.
- For example, the rendering unit 112 generates the audio signal of each channel by VBAP (Vector Base Amplitude Panning) based on the object position information, so that the sound image of the object is localized at the position indicated by the object position information, and supplies the signals to the mixing unit 113.
- In step S54, the mixing unit 113 performs, for each channel, a weighted addition of the audio signal of that channel supplied from the unpacking/decoding unit 111 and the audio signal of that channel supplied from the rendering unit 112, and supplies the result to the external speakers. Since each speaker is thus supplied with the audio signal of its corresponding channel, each speaker reproduces sound based on the supplied signal.
- the decoding apparatus 101 generates priority information, and decodes encoded data of each object according to the priority information.
- In step S81, the channel audio signal acquisition unit 141 sets the channel number of the channel to be processed to 0 and holds it.
- In step S82, the channel audio signal acquisition unit 141 determines whether the held channel number is less than the number of channels M.
- If the channel number is less than M, then in step S83 the channel audio signal decoding unit 142 decodes the encoded data of the audio signal of the channel to be processed.
- That is, the channel audio signal acquisition unit 141 acquires the encoded data of the channel to be processed from the supplied bit stream and supplies it to the channel audio signal decoding unit 142. The channel audio signal decoding unit 142 then decodes the encoded data supplied from the channel audio signal acquisition unit 141 and supplies the resulting MDCT coefficients to the IMDCT unit 143.
- In step S84, the IMDCT unit 143 performs IMDCT based on the MDCT coefficients supplied from the channel audio signal decoding unit 142, generates the audio signal of the channel to be processed, and supplies it to the mixing unit 113.
- In step S85, the channel audio signal acquisition unit 141 adds 1 to the held channel number, updating the channel number of the channel to be processed.
- After the channel number is updated, the process returns to step S82 and the above processing is repeated; that is, the audio signal of the new channel to be processed is generated.
- If it is determined in step S82 that the channel number of the channel to be processed is not less than M, audio signals have been obtained for all channels, and the process proceeds to step S86.
- In step S86, the object audio signal acquisition unit 144 sets the object number of the object to be processed to 0 and holds it.
- In step S87, the object audio signal acquisition unit 144 determines whether the held object number is less than the number of objects N.
- If the object number is less than N, then in step S88 the object audio signal decoding unit 145 decodes the encoded data of the audio signal of the object to be processed.
- That is, the object audio signal acquisition unit 144 acquires the encoded data of the object to be processed from the supplied bit stream and supplies it to the object audio signal decoding unit 145. The object audio signal decoding unit 145 then decodes the encoded data supplied from the object audio signal acquisition unit 144 and supplies the resulting MDCT coefficients to the priority information generation unit 146 and the output selection unit 147.
- The object audio signal acquisition unit 144 also acquires the metadata and content information of the object to be processed from the supplied bit stream, supplies the metadata and content information to the priority information generation unit 146, and supplies the metadata to the rendering unit 112.
- In step S89, the priority information generation unit 146 generates the priority information of the audio signal of the object to be processed and supplies it to the output selection unit 147.
- That is, the priority information generation unit 146 generates the priority information based on at least one of the metadata and content information supplied from the object audio signal acquisition unit 144 and the MDCT coefficients supplied from the object audio signal decoding unit 145.
- The priority information is generated by processing equivalent to step S11 described above: for example, the priority information generation unit 146 generates the priority information of the object by any of Equations (1) to (9) described above, based on the sound pressure and gain information of the audio signal of the object, or by Equation (10), Equation (11), Equation (12), or the like.
- Here, when the sound pressure of the audio signal is used, the priority information generation unit 146 uses the sum of squares of the MDCT coefficients supplied from the object audio signal decoding unit 145 as the sound pressure of the audio signal.
- step S90 the output selection unit 147 determines whether or not the priority information of the processing target object supplied from the priority information generation unit 146 is equal to or higher than a threshold value Q specified by an upper control device (not shown) or the like. Determine.
- the threshold value Q is determined according to, for example, the calculation capability of the decoding device 101.
- step S90 If it is determined in step S90 that the priority information is greater than or equal to the threshold value Q, the output selection unit 147 supplies the MDCT coefficient of the object to be processed supplied from the object audio signal decoding unit 145 to the IMDCT unit 149. The process proceeds to step S91. In this case, decoding of the object to be processed, more specifically IMDCT, is performed.
- step S91 the IMDCT unit 149 performs IMDCT based on the MDCT coefficient supplied from the output selection unit 147, generates an audio signal of the object to be processed, and supplies the generated audio signal to the rendering unit 112. After the audio signal is generated, the process proceeds to step S92.
- the output selection unit 147 supplies the MDCT coefficient to 0 to the 0-value output unit 148.
- the zero value output unit 148 generates an audio signal of the object to be processed from the MDCT coefficient that is 0 supplied from the output selection unit 147 and supplies the generated audio signal to the rendering unit 112. Therefore, in the zero value output unit 148, substantially no processing for generating an audio signal such as IMDCT is performed. In other words, decoding of encoded data, more specifically, IMDCT for MDCT coefficients is not substantially performed.
- Note that the audio signal generated by the zero-value output unit 148 is a silent signal. After the audio signal is generated, the process proceeds to step S92.
- If it is determined in step S90 that the priority information is less than the threshold value Q, or if an audio signal is generated in step S91, then in step S92 the object audio signal acquisition unit 144 adds 1 to the held object number, thereby updating the object number of the object to be processed.
- The process then returns to step S87, and the above-described processing is repeated. That is, the audio signal of the new object to be processed is generated.
- If it is determined in step S87 that the object number of the object to be processed is not less than N, audio signals have been obtained for all the channels and the necessary objects, so the selective decoding process ends and the process proceeds to step S53 of FIG. 6.
- As described above, the decoding device 101 generates priority information for each object, compares the priority information with the threshold value to determine whether or not to decode the encoded audio signal, and decodes the audio signal accordingly.
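- The branch structure of steps S89 to S92 can be summarized by the following hedged Python sketch. The object representation and the externally supplied imdct function are assumptions; only the comparison of the priority information with the threshold value Q and the zero-value output for low-priority objects follow the description above.

```python
import numpy as np

def selective_decoding(objects, threshold_q, imdct, frame_len=1024):
    """Decode (IMDCT) only objects whose priority information is equal
    to or greater than the threshold Q; output silence for the rest."""
    audio_signals = []
    for obj in objects:  # obj: {"priority": float, "mdct": np.ndarray}
        if obj["priority"] >= threshold_q:
            # Step S91: perform the inverse MDCT to obtain the signal.
            audio_signals.append(imdct(obj["mdct"]))
        else:
            # Zero-value output unit 148: substantially no IMDCT is
            # performed, and a silent signal is produced instead.
            audio_signals.append(np.zeros(frame_len))
    return audio_signals
```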
- The above-described series of processes can be executed by hardware or by software.
- When the series of processes is executed by software, a program constituting the software is installed in a computer.
- Here, the computer includes a computer incorporated in dedicated hardware and, for example, a general-purpose personal computer capable of executing various functions by installing various programs.
- FIG. 8 is a block diagram showing an example of the hardware configuration of a computer that executes the above-described series of processing by a program.
- In the computer, a CPU (Central Processing Unit) 501, a ROM (Read Only Memory) 502, and a RAM (Random Access Memory) 503 are connected to one another by a bus 504.
- An input/output interface 505 is further connected to the bus 504.
- An input unit 506, an output unit 507, a recording unit 508, a communication unit 509, and a drive 510 are connected to the input/output interface 505.
- the input unit 506 includes a keyboard, a mouse, a microphone, an image sensor, and the like.
- the output unit 507 includes a display, a speaker, and the like.
- the recording unit 508 includes a hard disk, a nonvolatile memory, and the like.
- the communication unit 509 includes a network interface or the like.
- the drive 510 drives a removable recording medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
- In the computer configured as described above, the CPU 501, for example, loads the program recorded in the recording unit 508 into the RAM 503 via the input/output interface 505 and the bus 504 and executes it, whereby the above-described series of processes is performed.
- The program executed by the computer (CPU 501) can be provided by being recorded on a removable recording medium 511 serving as a package medium, for example.
- the program can be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.
- the program can be installed in the recording unit 508 via the input / output interface 505 by attaching the removable recording medium 511 to the drive 510. Further, the program can be received by the communication unit 509 via a wired or wireless transmission medium and installed in the recording unit 508. In addition, the program can be installed in the ROM 502 or the recording unit 508 in advance.
- The program executed by the computer may be a program in which the processes are performed in time series in the order described in this specification, or a program in which the processes are performed in parallel or at necessary timing, such as when a call is made.
- Furthermore, the present technology can take a cloud computing configuration in which one function is shared and jointly processed by a plurality of devices via a network.
- Each step described in the above flowcharts can be executed by one device or shared among a plurality of devices.
- Moreover, when one step includes a plurality of processes, those processes can be executed by one device or shared among a plurality of devices.
- the present technology can be configured as follows.
- (1) A signal processing device including: a priority information generation unit that generates priority information of an audio object based on a plurality of elements representing characteristics of the audio object.
- (2) The signal processing device according to (1), wherein the element is metadata of the audio object.
- (3) The signal processing device according to (1) or (2), wherein the element is a position of the audio object in a space.
- (4) The signal processing device according to (3), wherein the element is a distance from a reference position in the space to the audio object.
- (5) The signal processing device according to (3), wherein the element is a horizontal angle indicating a horizontal position of the audio object in the space.
- (6) The signal processing device according to any one of (2) to (5), wherein the priority information generation unit generates the priority information according to a moving speed of the audio object based on the metadata.
- (7) The signal processing device according to any one of (1) to (6), wherein the element is gain information by which an audio signal of the audio object is multiplied.
- (8) The signal processing device according to (7), wherein the priority information generation unit generates the priority information of a unit time to be processed based on a difference between the gain information of the unit time to be processed and an average value of the gain information of a plurality of unit times.
- (9) The signal processing device according to (7), wherein the priority information generation unit generates the priority information based on a sound pressure of the audio signal multiplied by the gain information.
- (10) The signal processing device according to any one of (1) to (9), wherein the element is spread information.
- (11) The signal processing device according to (10), wherein the priority information generation unit generates the priority information according to an area of a region of the audio object based on the spread information.
- (12) The signal processing device according to any one of (1) to (11), wherein the element is information indicating an attribute of a sound of the audio object.
- (13) The signal processing device according to any one of (1) to (12), wherein the element is an audio signal of the audio object.
- (14) The signal processing device according to (13), wherein the priority information generation unit generates the priority information based on a result of a voice section detection process on the audio signal.
- (15) The signal processing device according to any one of (1) to (14), wherein the priority information generation unit smooths the generated priority information in the time direction to obtain the final priority information.
- (16) A signal processing method including a step of generating priority information of an audio object based on a plurality of elements representing characteristics of the audio object.
- (17) A program that causes a computer to execute processing including a step of generating priority information of an audio object based on a plurality of elements representing characteristics of the audio object.
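- As a purely hypothetical illustration of configuration (8) above, the following sketch takes the priority of the unit time being processed to be the deviation of its gain information from the average gain over a window of recent unit times; the window length and the use of an absolute difference are assumptions, not part of the description.

```python
import numpy as np

def gain_difference_priority(gains: np.ndarray, i: int, window: int = 8) -> float:
    # Average the gain information over up to `window` unit times up to
    # and including unit time i, then use the deviation of the current
    # gain from that average as the priority information of unit time i.
    start = max(0, i - window)
    average_gain = float(np.mean(gains[start:i + 1]))
    return abs(float(gains[i]) - average_gain)
```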
- 11 encoding device, 22 object audio encoding unit, 23 metadata input unit, 51 encoding unit, 52 priority information generation unit, 101 decoding device, 111 unpacking/decoding unit, 144 object audio signal acquisition unit, 145 object audio signal decoding unit, 146 priority information generation unit, 147 output selection unit
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Signal Processing (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Mathematical Physics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Stereophonic System (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
Description
〈Example Configuration of the Encoding Device〉
The present technology generates priority information for an audio object based on elements representing characteristics of the audio object, such as the metadata of the audio object, content information, and the audio signal of the audio object, thereby making it possible to reduce the computational load of decoding at low cost.
In more detail, the object audio encoding unit 22 in FIG. 1 is configured, for example, as shown in FIG. 2.
Here, the priority information of an object generated by the priority information generation unit 52 will be described.
(1) Generate priority information based on the metadata of the object
(2) Generate priority information based on information other than metadata
(3) Combine pieces of priority information obtained by a plurality of methods into a single piece of priority information
(4) Smooth the priority information in the time direction to obtain a single final piece of priority information
First, an example of generating priority information based on object position information will be described.
Next, an example of generating priority information based on gain information will be described.
Subsequently, an example of generating priority information based on spread information will be described.
First, as an example of generating priority information based on information other than metadata, an example of generating priority information using content information will be described.
Whether or not each object is speech can be identified by using VAD (Voice Activity Detection) technology.
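As a toy illustration only, an energy-based decision in the spirit of VAD can be sketched as follows; a practical VAD is far more elaborate, and the threshold and frame representation here are assumptions.

```python
import numpy as np

def vad_priority(frame: np.ndarray, energy_threshold: float = 1e-4) -> float:
    # Treat the frame as speech when its mean energy exceeds a
    # threshold, and assign a higher priority to speech objects.
    energy = float(np.mean(frame.astype(np.float64) ** 2))
    return 1.0 if energy > energy_threshold else 0.0
```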
Furthermore, as described above, it is also conceivable to generate priority information based only on the sound pressure of the audio signal of an object. On the decoding side, however, the gain information included in the metadata of the object is multiplied onto the audio signal, so the sound pressure of the audio signal changes before and after the multiplication by the gain information.
Alternatively, pieces of priority information obtained by a plurality of mutually different methods may be combined (synthesized) by linear combination, nonlinear combination, or the like into a single final piece of priority information. In other words, priority information may be generated based on a plurality of elements representing the characteristics of an object.
Furthermore, an example will be described in which pieces of priority information obtained by a plurality of mutually different methods are nonlinearly combined into a single final piece of priority information.
The above has described examples of generating priority information from the metadata, content information, and so on of an object, and of combining a plurality of pieces of priority information into a single final piece of priority information. It is undesirable, however, for the magnitude relationship among the priority information of a plurality of objects to change many times within a short period.
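As a hedged sketch of methods (3) and (4) above, the following combines two pieces of priority information by linear combination and then smooths the result in the time direction with a one-pole (exponential) filter, so that the magnitude relationship among objects does not flip frame by frame; the weight w and the smoothing coefficient alpha are assumptions, and the description also allows nonlinear combination.

```python
def combine_and_smooth(priority_a: float, priority_b: float,
                       previous: float, w: float = 0.5,
                       alpha: float = 0.8) -> float:
    # Method (3): linear combination of two priorities.
    combined = w * priority_a + (1.0 - w) * priority_b
    # Method (4): exponential smoothing in the time direction.
    return alpha * previous + (1.0 - alpha) * combined
```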
Next, processing performed by the encoding device 11 will be described.
〈Example Configuration of the Decoding Device〉
Note that although the above has described an example in which the bitstream output from the encoding device 11 contains priority information, depending on the encoding device, the bitstream may contain no priority information.
In more detail, the unpacking/decoding unit 111 of the decoding device 101 shown in FIG. 4 is configured, for example, as shown in FIG. 5.
Next, the operation of the decoding device 101 will be described.
Subsequently, the selective decoding process corresponding to the processing of step S52 in FIG. 6 will be described with reference to the flowchart of FIG. 7.
Incidentally, the series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, a program constituting the software is installed in a computer. Here, the computer includes a computer incorporated in dedicated hardware and, for example, a general-purpose personal computer capable of executing various functions by installing various programs.
(1)
A signal processing device including:
a priority information generation unit configured to generate priority information of an audio object based on a plurality of elements representing characteristics of the audio object.
(2)
The signal processing device according to (1), wherein the element is metadata of the audio object.
(3)
The signal processing device according to (1) or (2), wherein the element is a position of the audio object in a space.
(4)
The signal processing device according to (3), wherein the element is a distance from a reference position in the space to the audio object.
(5)
The signal processing device according to (3), wherein the element is a horizontal angle indicating a horizontal position of the audio object in the space.
(6)
The signal processing device according to any one of (2) to (5), wherein the priority information generation unit generates the priority information according to a moving speed of the audio object based on the metadata.
(7)
The signal processing device according to any one of (1) to (6), wherein the element is gain information by which an audio signal of the audio object is multiplied.
(8)
The signal processing device according to (7), wherein the priority information generation unit generates the priority information of a unit time to be processed based on a difference between the gain information of the unit time to be processed and an average value of the gain information of a plurality of unit times.
(9)
The signal processing device according to (7), wherein the priority information generation unit generates the priority information based on a sound pressure of the audio signal multiplied by the gain information.
(10)
The signal processing device according to any one of (1) to (9), wherein the element is spread information.
(11)
The signal processing device according to (10), wherein the priority information generation unit generates the priority information according to an area of a region of the audio object based on the spread information.
(12)
The signal processing device according to any one of (1) to (11), wherein the element is information indicating an attribute of a sound of the audio object.
(13)
The signal processing device according to any one of (1) to (12), wherein the element is an audio signal of the audio object.
(14)
The signal processing device according to (13), wherein the priority information generation unit generates the priority information based on a result of a voice section detection process on the audio signal.
(15)
The signal processing device according to any one of (1) to (14), wherein the priority information generation unit smooths the generated priority information in the time direction to obtain the final priority information.
(16)
A signal processing method including a step of generating priority information of an audio object based on a plurality of elements representing characteristics of the audio object.
(17)
A program that causes a computer to execute processing including a step of generating priority information of an audio object based on a plurality of elements representing characteristics of the audio object.
Claims (17)
- A signal processing device comprising: a priority information generation unit configured to generate priority information of an audio object based on a plurality of elements representing characteristics of the audio object.
- The signal processing device according to claim 1, wherein the element is metadata of the audio object.
- The signal processing device according to claim 1, wherein the element is a position of the audio object in a space.
- The signal processing device according to claim 3, wherein the element is a distance from a reference position in the space to the audio object.
- The signal processing device according to claim 3, wherein the element is a horizontal angle indicating a horizontal position of the audio object in the space.
- The signal processing device according to claim 2, wherein the priority information generation unit generates the priority information according to a moving speed of the audio object based on the metadata.
- The signal processing device according to claim 1, wherein the element is gain information by which an audio signal of the audio object is multiplied.
- The signal processing device according to claim 7, wherein the priority information generation unit generates the priority information of a unit time to be processed based on a difference between the gain information of the unit time to be processed and an average value of the gain information of a plurality of unit times.
- The signal processing device according to claim 7, wherein the priority information generation unit generates the priority information based on a sound pressure of the audio signal multiplied by the gain information.
- The signal processing device according to claim 1, wherein the element is spread information.
- The signal processing device according to claim 10, wherein the priority information generation unit generates the priority information according to an area of a region of the audio object based on the spread information.
- The signal processing device according to claim 1, wherein the element is information indicating an attribute of a sound of the audio object.
- The signal processing device according to claim 1, wherein the element is an audio signal of the audio object.
- The signal processing device according to claim 13, wherein the priority information generation unit generates the priority information based on a result of a voice section detection process on the audio signal.
- The signal processing device according to claim 1, wherein the priority information generation unit smooths the generated priority information in the time direction to obtain the final priority information.
- A signal processing method comprising a step of generating priority information of an audio object based on a plurality of elements representing characteristics of the audio object.
- A program that causes a computer to execute processing comprising a step of generating priority information of an audio object based on a plurality of elements representing characteristics of the audio object.
Priority Applications (14)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410360122.5A CN118248153A (zh) | 2017-04-26 | 2018-04-12 | 信号处理设备和方法及程序 |
EP24162190.3A EP4358085A3 (en) | 2017-04-26 | 2018-04-12 | Signal processing device, method, and program |
US16/606,276 US11574644B2 (en) | 2017-04-26 | 2018-04-12 | Signal processing device and method, and program |
BR112019021904-8A BR112019021904A2 (pt) | 2017-04-26 | 2018-04-12 | Dispositivo e método de processamento de sinal, e, programa. |
KR1020197030401A KR20190141669A (ko) | 2017-04-26 | 2018-04-12 | 신호 처리 장치 및 방법, 및 프로그램 |
RU2019132898A RU2019132898A (ru) | 2017-04-26 | 2018-04-12 | Способ и устройство для обработки сигнала и программа |
EP18790825.6A EP3618067B1 (en) | 2017-04-26 | 2018-04-12 | Signal processing device, method, and program |
KR1020247008685A KR20240042125A (ko) | 2017-04-26 | 2018-04-12 | 신호 처리 장치 및 방법, 및 프로그램 |
JP2019514367A JP7160032B2 (ja) | 2017-04-26 | 2018-04-12 | 信号処理装置および方法、並びにプログラム |
CN201880025687.0A CN110537220B (zh) | 2017-04-26 | 2018-04-12 | 信号处理设备和方法及程序 |
JP2022164511A JP7459913B2 (ja) | 2017-04-26 | 2022-10-13 | 信号処理装置および方法、並びにプログラム |
US18/154,187 US11900956B2 (en) | 2017-04-26 | 2023-01-13 | Signal processing device and method, and program |
US18/416,154 US20240153516A1 (en) | 2017-04-26 | 2024-01-18 | Signal processing device and method, and program |
JP2024043562A JP2024075675A (ja) | 2017-04-26 | 2024-03-19 | 信号処理装置および方法、並びにプログラム |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2017-087208 | 2017-04-26 | ||
JP2017087208 | 2017-04-26 |
Related Child Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/606,276 A-371-Of-International US11574644B2 (en) | 2017-04-26 | 2018-04-12 | Signal processing device and method, and program |
US18/154,187 Continuation US11900956B2 (en) | 2017-04-26 | 2023-01-13 | Signal processing device and method, and program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018198789A1 true WO2018198789A1 (ja) | 2018-11-01 |
Family
ID=63918157
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2018/015352 WO2018198789A1 (ja) | 2017-04-26 | 2018-04-12 | 信号処理装置および方法、並びにプログラム |
Country Status (8)
Country | Link |
---|---|
US (3) | US11574644B2 (ja) |
EP (2) | EP4358085A3 (ja) |
JP (3) | JP7160032B2 (ja) |
KR (2) | KR20190141669A (ja) |
CN (2) | CN110537220B (ja) |
BR (1) | BR112019021904A2 (ja) |
RU (1) | RU2019132898A (ja) |
WO (1) | WO2018198789A1 (ja) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020105423A1 (ja) | 2018-11-20 | 2020-05-28 | ソニー株式会社 | 情報処理装置および方法、並びにプログラム |
JP2020167629A (ja) * | 2019-03-29 | 2020-10-08 | 日本放送協会 | 受信装置、配信サーバ及び受信プログラム |
WO2024034389A1 (ja) * | 2022-08-09 | 2024-02-15 | ソニーグループ株式会社 | 信号処理装置、信号処理方法、およびプログラム |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110537220B (zh) | 2017-04-26 | 2024-04-16 | 索尼公司 | 信号处理设备和方法及程序 |
GB2575510A (en) * | 2018-07-13 | 2020-01-15 | Nokia Technologies Oy | Spatial augmentation |
CN112740721A (zh) * | 2018-09-28 | 2021-04-30 | 索尼公司 | 信息处理装置、方法和程序 |
CN114390401A (zh) * | 2021-12-14 | 2022-04-22 | 广州市迪声音响有限公司 | 用于音响的多通道数字音频信号实时音效处理方法及系统 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015056383A1 (ja) * | 2013-10-17 | 2015-04-23 | パナソニック株式会社 | オーディオエンコード装置及びオーディオデコード装置 |
JP2016509249A (ja) * | 2012-12-21 | 2016-03-24 | ドルビー ラボラトリーズ ライセンシング コーポレイション | 知覚的基準に基づいてオブジェクト・ベースのオーディオ・コンテンツをレンダリングするためのオブジェクト・クラスタリング |
WO2016126907A1 (en) * | 2015-02-06 | 2016-08-11 | Dolby Laboratories Licensing Corporation | Hybrid, priority-based rendering system and method for adaptive audio |
WO2016208406A1 (ja) * | 2015-06-24 | 2016-12-29 | ソニー株式会社 | 音声処理装置および方法、並びにプログラム |
JP2017507365A (ja) * | 2014-03-06 | 2017-03-16 | ディーティーエス・インコーポレイテッドDTS,Inc. | 複数のオブジェクトオーディオのポスト符号化ビットレート低減 |
JP2017508175A (ja) * | 2014-01-09 | 2017-03-23 | ドルビー ラボラトリーズ ライセンシング コーポレイション | オーディオ・コンテンツの空間的誤差メトリック |
Family Cites Families (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7032236B1 (en) * | 1998-02-20 | 2006-04-18 | Thomson Licensing | Multimedia system for processing program guides and associated multimedia objects |
US7079658B2 (en) * | 2001-06-14 | 2006-07-18 | Ati Technologies, Inc. | System and method for localization of sounds in three-dimensional space |
JP5340296B2 (ja) * | 2009-03-26 | 2013-11-13 | パナソニック株式会社 | 復号化装置、符号化復号化装置および復号化方法 |
JP5036797B2 (ja) * | 2009-12-11 | 2012-09-26 | 株式会社スクウェア・エニックス | 発音処理装置、発音処理方法、及び発音処理プログラム |
WO2012122397A1 (en) * | 2011-03-09 | 2012-09-13 | Srs Labs, Inc. | System for dynamically creating and rendering audio objects |
US9344815B2 (en) * | 2013-02-11 | 2016-05-17 | Symphonic Audio Technologies Corp. | Method for augmenting hearing |
US9338420B2 (en) * | 2013-02-15 | 2016-05-10 | Qualcomm Incorporated | Video analysis assisted generation of multi-channel audio data |
CN104882145B (zh) * | 2014-02-28 | 2019-10-29 | 杜比实验室特许公司 | 使用音频对象的时间变化的音频对象聚类 |
JP6439296B2 (ja) * | 2014-03-24 | 2018-12-19 | ソニー株式会社 | 復号装置および方法、並びにプログラム |
JP6432180B2 (ja) * | 2014-06-26 | 2018-12-05 | ソニー株式会社 | 復号装置および方法、並びにプログラム |
CN106162500B (zh) * | 2015-04-08 | 2020-06-16 | 杜比实验室特许公司 | 音频内容的呈现 |
US10136240B2 (en) * | 2015-04-20 | 2018-11-20 | Dolby Laboratories Licensing Corporation | Processing audio data to compensate for partial hearing loss or an adverse hearing environment |
EP4333461A3 (en) * | 2015-11-20 | 2024-04-17 | Dolby Laboratories Licensing Corporation | Improved rendering of immersive audio content |
KR101968456B1 (ko) * | 2016-01-26 | 2019-04-11 | 돌비 레버러토리즈 라이쎈싱 코오포레이션 | 적응형 양자화 |
WO2018096599A1 (en) * | 2016-11-22 | 2018-05-31 | Sony Mobile Communications Inc. | Environment-aware monitoring systems, methods, and computer program products for immersive environments |
EP3618463A4 (en) | 2017-04-25 | 2020-04-29 | Sony Corporation | SIGNAL PROCESSING DEVICE, METHOD AND PROGRAM |
CN110537220B (zh) | 2017-04-26 | 2024-04-16 | 索尼公司 | 信号处理设备和方法及程序 |
BR112021009306A2 (pt) * | 2018-11-20 | 2021-08-10 | Sony Group Corporation | dispositivo e método de processamento de informações, e, programa. |
-
2018
- 2018-04-12 CN CN201880025687.0A patent/CN110537220B/zh active Active
- 2018-04-12 JP JP2019514367A patent/JP7160032B2/ja active Active
- 2018-04-12 KR KR1020197030401A patent/KR20190141669A/ko not_active IP Right Cessation
- 2018-04-12 RU RU2019132898A patent/RU2019132898A/ru unknown
- 2018-04-12 WO PCT/JP2018/015352 patent/WO2018198789A1/ja unknown
- 2018-04-12 EP EP24162190.3A patent/EP4358085A3/en active Pending
- 2018-04-12 EP EP18790825.6A patent/EP3618067B1/en active Active
- 2018-04-12 US US16/606,276 patent/US11574644B2/en active Active
- 2018-04-12 CN CN202410360122.5A patent/CN118248153A/zh active Pending
- 2018-04-12 KR KR1020247008685A patent/KR20240042125A/ko active Search and Examination
- 2018-04-12 BR BR112019021904-8A patent/BR112019021904A2/pt unknown
-
2022
- 2022-10-13 JP JP2022164511A patent/JP7459913B2/ja active Active
-
2023
- 2023-01-13 US US18/154,187 patent/US11900956B2/en active Active
-
2024
- 2024-01-18 US US18/416,154 patent/US20240153516A1/en active Pending
- 2024-03-19 JP JP2024043562A patent/JP2024075675A/ja active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2016509249A (ja) * | 2012-12-21 | 2016-03-24 | ドルビー ラボラトリーズ ライセンシング コーポレイション | 知覚的基準に基づいてオブジェクト・ベースのオーディオ・コンテンツをレンダリングするためのオブジェクト・クラスタリング |
WO2015056383A1 (ja) * | 2013-10-17 | 2015-04-23 | パナソニック株式会社 | オーディオエンコード装置及びオーディオデコード装置 |
JP2017508175A (ja) * | 2014-01-09 | 2017-03-23 | ドルビー ラボラトリーズ ライセンシング コーポレイション | オーディオ・コンテンツの空間的誤差メトリック |
JP2017507365A (ja) * | 2014-03-06 | 2017-03-16 | ディーティーエス・インコーポレイテッドDTS,Inc. | 複数のオブジェクトオーディオのポスト符号化ビットレート低減 |
WO2016126907A1 (en) * | 2015-02-06 | 2016-08-11 | Dolby Laboratories Licensing Corporation | Hybrid, priority-based rendering system and method for adaptive audio |
WO2016208406A1 (ja) * | 2015-06-24 | 2016-12-29 | ソニー株式会社 | 音声処理装置および方法、並びにプログラム |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020105423A1 (ja) | 2018-11-20 | 2020-05-28 | ソニー株式会社 | 情報処理装置および方法、並びにプログラム |
JP2020167629A (ja) * | 2019-03-29 | 2020-10-08 | 日本放送協会 | 受信装置、配信サーバ及び受信プログラム |
JP7236914B2 (ja) | 2019-03-29 | 2023-03-10 | 日本放送協会 | 受信装置、配信サーバ及び受信プログラム |
WO2024034389A1 (ja) * | 2022-08-09 | 2024-02-15 | ソニーグループ株式会社 | 信号処理装置、信号処理方法、およびプログラム |
Also Published As
Publication number | Publication date |
---|---|
JP7160032B2 (ja) | 2022-10-25 |
EP3618067A4 (en) | 2020-05-06 |
EP4358085A3 (en) | 2024-07-10 |
BR112019021904A2 (pt) | 2020-05-26 |
CN110537220A (zh) | 2019-12-03 |
EP4358085A2 (en) | 2024-04-24 |
RU2019132898A (ru) | 2021-04-19 |
EP3618067B1 (en) | 2024-04-10 |
US20230154477A1 (en) | 2023-05-18 |
JP2024075675A (ja) | 2024-06-04 |
KR20240042125A (ko) | 2024-04-01 |
JP7459913B2 (ja) | 2024-04-02 |
US11900956B2 (en) | 2024-02-13 |
CN110537220B (zh) | 2024-04-16 |
JP2022188258A (ja) | 2022-12-20 |
RU2019132898A3 (ja) | 2021-07-22 |
US11574644B2 (en) | 2023-02-07 |
US20210118466A1 (en) | 2021-04-22 |
JPWO2018198789A1 (ja) | 2020-03-05 |
EP3618067A1 (en) | 2020-03-04 |
KR20190141669A (ko) | 2019-12-24 |
US20240153516A1 (en) | 2024-05-09 |
CN118248153A (zh) | 2024-06-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7459913B2 (ja) | 信号処理装置および方法、並びにプログラム | |
US20210398546A1 (en) | Encoding device and encoding method, decoding device and decoding method, and program | |
US12114146B2 (en) | Determination of targeted spatial audio parameters and associated spatial audio playback | |
US11805383B2 (en) | Signal processing device, method, and program | |
US11743646B2 (en) | Signal processing apparatus and method, and program to reduce calculation amount based on mute information | |
TWI762949B (zh) | 用於丟失消隱之方法、用於解碼Dirac經編碼音訊場景之方法及對應電腦程式、丟失消隱設備及解碼器 | |
RU2807473C2 (ru) | Маскировка потерь пакетов для пространственного кодирования аудиоданных на основе dirac |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18790825 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2019514367 Country of ref document: JP Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 20197030401 Country of ref document: KR Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
REG | Reference to national code |
Ref country code: BR Ref legal event code: B01A Ref document number: 112019021904 Country of ref document: BR |
|
ENP | Entry into the national phase |
Ref document number: 2018790825 Country of ref document: EP Effective date: 20191126 |
|
ENP | Entry into the national phase |
Ref document number: 112019021904 Country of ref document: BR Kind code of ref document: A2 Effective date: 20191018 |