CN115966216A - Audio stream processing method and device - Google Patents


Publication number
CN115966216A
Authority
CN
China
Prior art keywords
audio
audio stream
processed
coding format
stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211648646.1A
Other languages
Chinese (zh)
Inventor
于雷
张皓羽
何钧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Bilibili Technology Co Ltd
Original Assignee
Shanghai Bilibili Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Bilibili Technology Co Ltd filed Critical Shanghai Bilibili Technology Co Ltd
Priority to CN202211648646.1A priority Critical patent/CN115966216A/en
Publication of CN115966216A publication Critical patent/CN115966216A/en
Pending legal-status Critical Current

Abstract

The application provides an audio stream processing method and an audio stream processing device. The audio stream processing method comprises: acquiring an audio stream to be processed, and identifying the coding format of the audio stream to be processed; acquiring a set audio loudness parameter when the coding format of the audio stream to be processed is a panoramic sound coding format; transcoding the audio stream to be processed into an audio stream in a stereo coding format according to the set audio loudness parameter; and determining, according to the audio stream in the stereo coding format, a target audio stream obtained by transcoding the audio stream to be processed. Because the audio stream to be processed is transcoded into the stereo coding format using the acquired set audio loudness parameter to obtain the target audio stream, the problem of excessively low volume when panoramic sound is converted into stereo sound can be effectively addressed, the audio quality is improved, the audio processing flow is simplified, the processing efficiency is improved, and the method remains compatible with the normal production of stereo audio.

Description

Audio stream processing method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to an audio stream processing method. The application also relates to an audio stream processing apparatus, a computing device, and a computer-readable storage medium.
Background
With the rapid development of computer technology, signal processing technology has also advanced dramatically, with audio processing technology being the most prominent. In current on-demand streaming media, the most widely used audio is stereo audio and panoramic sound audio. To be compatible with audio playback on all devices, audio in both the panoramic sound and the stereo specifications is generated when the audio is produced.
In the prior art, panoramic sound audio is usually transcoded directly to generate two audio streams, one in a panoramic sound coding format and one in a stereo coding format, which are provided to different hardware devices for playback. However, because the mixing method and mode of panoramic sound audio differ greatly from those of stereo audio, after the panoramic sound audio is transcoded into stereo audio the volume becomes lower, the audio quality is reduced, and the stereo audio can hardly be played normally. Therefore, an effective solution to the above problems is needed.
Disclosure of Invention
In view of this, an embodiment of the present application provides an audio stream processing method. The application also relates to an audio stream processing device, a computing device and a computer readable storage medium, which are used for solving the technical defect of low audio quality after transcoding in the prior art.
According to a first aspect of embodiments of the present application, there is provided an audio stream processing method, including:
acquiring an audio stream to be processed, and identifying the encoding format of the audio stream to be processed;
acquiring a set audio loudness parameter under the condition that the coding format of the audio stream to be processed is a panoramic sound coding format;
transcoding the audio stream to be processed into an audio stream in a stereo coding format according to the set audio loudness parameter;
and determining a target audio stream after transcoding the audio stream to be processed according to the audio stream in the stereo coding format.
Optionally, the transcoding the to-be-processed audio stream into an audio stream in a stereo coding format according to the set audio loudness parameter includes:
decoding the audio stream to be processed to obtain audio sampling data;
and according to the set audio loudness parameter and a set stereo coding strategy, coding the audio sampling data to obtain an audio stream in a stereo coding format.
Optionally, the encoding, according to the set audio loudness parameter and according to a set stereo coding policy, the audio sample data to obtain an audio stream in the stereo coding format includes:
and coding the audio sampling data according to a set stereo coding strategy, and adjusting the sound level of the audio sampling data within a set time length according to the set audio loudness parameter to obtain an audio stream in the stereo coding format.
Optionally, the audio stream to be processed includes a plurality of audio data packets;
the decoding processing of the audio stream to be processed to obtain audio sample data includes:
decoding each audio data packet to obtain sub-audio sampling data corresponding to each audio data packet;
the step of coding the audio sampling data according to the set audio loudness parameter and the set stereo coding strategy to obtain an audio stream in a stereo coding format includes:
for each piece of sub-audio sampling data, according to the set audio loudness parameter and a set stereo coding strategy, coding the sub-audio sampling data to obtain a sub-audio stream in a stereo coding format;
and splicing the sub audio streams of the stereo coding format to obtain the audio stream of the stereo coding format.
Optionally, the obtaining the audio stream to be processed includes:
acquiring a video stream to be processed, wherein the video stream to be processed comprises an audio stream to be processed and an image sequence to be processed;
after the obtaining of the video stream to be processed, the method further includes:
transcoding the image sequence to be processed to obtain a target image sequence;
and aligning the target audio stream and the target image sequence to determine a target video stream.
Optionally, before the acquiring of the set audio loudness parameter, the method further includes:
acquiring audio track information of the audio stream to be processed;
and identifying the coding format of the audio stream to be processed under the condition that the audio track information is a single-channel audio track.
Optionally, after the obtaining of the audio track information of the audio stream to be processed, the method further includes:
under the condition that the audio track information is two paths of audio tracks, respectively identifying the coding formats of the audio streams to be processed corresponding to the audio tracks;
and performing file format transcoding on the audio stream to be processed with the coding format being a stereo coding format to obtain a target audio stream with the stereo coding format, and/or performing file format transcoding on the audio stream to be processed with the coding format being a panoramic sound coding format to obtain a target audio stream with the panoramic sound coding format.
According to a second aspect of embodiments of the present application, there is provided an audio stream processing apparatus including:
a first identification module configured to acquire an audio stream to be processed and identify the coding format of the audio stream to be processed;
an acquisition module configured to acquire a set audio loudness parameter under the condition that the coding format of the audio stream to be processed is a panoramic sound coding format;
a first transcoding module configured to transcode the audio stream to be processed into an audio stream in a stereo coding format according to the set audio loudness parameter;
and a determining module configured to determine a target audio stream after transcoding the audio stream to be processed according to the audio stream in the stereo coding format.
According to a third aspect of embodiments herein, there is provided a computing device comprising a memory, a processor and computer instructions stored on the memory and executable on the processor, the processor implementing the steps of the audio stream processing method when executing the computer instructions.
According to a fourth aspect of embodiments herein, there is provided a computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the audio stream processing method.
The audio stream processing method provided by the application acquires an audio stream to be processed and identifies the coding format of the audio stream to be processed; acquires a set audio loudness parameter when the coding format of the audio stream to be processed is a panoramic sound coding format; transcodes the audio stream to be processed into an audio stream in a stereo coding format according to the set audio loudness parameter; and determines, according to the audio stream in the stereo coding format, a target audio stream obtained by transcoding the audio stream to be processed. Because the audio stream to be processed is transcoded into the stereo coding format using the acquired set audio loudness parameter to obtain the target audio stream, the problem of excessively low volume when panoramic sound is converted into stereo sound can be effectively addressed, the audio quality is improved, the audio processing flow is simplified, the processing efficiency is improved, and the method remains compatible with the normal production of stereo audio.
Drawings
Fig. 1 is a processing flow chart of an audio stream processing method provided by the prior art;
Fig. 2 is a schematic structural diagram of an audio stream processing system according to an embodiment of the present application;
Fig. 3 is a flowchart of an audio stream processing method according to an embodiment of the present application;
Fig. 4A is a flowchart illustrating an audio stream processing method according to an embodiment of the present application;
Fig. 4B is a flowchart illustrating an audio stream processing method applied to a movie scene according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an audio stream processing apparatus according to an embodiment of the present application;
Fig. 6 is a block diagram of a computing device according to an embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the present application can be implemented in many ways other than those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the present application; the present application is therefore not limited to the specific implementations disclosed below.
The terminology used in the one or more embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the present application. As used in one or more embodiments of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of the present application to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first can also be referred to as a second and, similarly, a second can also be referred to as a first without departing from the scope of one or more embodiments of the present application. The word "if" as used herein may be interpreted as "at the time of ...", "when ...", or "in response to a determination", depending on the context.
First, the terms used in one or more embodiments of the present application are explained.
Stereo: sound reproduced using two or more independent audio channels played through a pair of symmetrically arranged loudspeakers. Sound produced in this way appears natural and pleasant from different directions. Multi-channel stereo is also known as stereo surround sound, such as the common 5.1 and 7.1 surround sound.
Panoramic sound: that is, Dolby Atmos, a three-dimensional surround sound technology that extends stereo surround sound by adding overhead (height) channels. It can feed up to 64 independent speakers and can carry up to 128 channels or objects simultaneously, which is finer-grained than 7.1 surround channels. Dolby Atmos is one implementation of spatial audio.
LUFS (Loudness Units relative to Full Scale): a loudness unit measured relative to full scale, where full scale is the maximum level that the system can handle. It is a standardized measure of loudness that takes both human perception and electrical signal strength into account.
With the rapid development of computer technology, signal processing technology has also advanced dramatically, with audio processing technology being the most prominent. In current on-demand streaming media, the most widely used audio is stereo audio and panoramic sound audio. Conventional stereo presents sound in a horizontal plane, so sound localization has two dimensions, front-back and left-right, and such audio may be called two-dimensional (2D) audio. When audio has three dimensions, namely front-back, left-right, and up-down, it can be called three-dimensional (3D) audio, that is, spatial audio.
In current on-demand streaming media, the most widely used audio is in stereo coding format, which is adopted by many internet companies due to its simplicity and high compatibility. With the development of hardware technology, more and more user devices support playing spatial audio. Therefore, in order to be compatible with audio playback of all devices, audio of both the panoramic sound and the stereo sound specifications is generated at the time of audio production.
In the prior art, panoramic audio is usually transcoded directly to generate two audio streams of a panoramic coding format and a stereo coding format, which are provided for different hardware devices to play. Referring to fig. 1, fig. 1 shows a schematic flow chart of an audio stream processing method provided in the prior art: the video or audio, that is, the producer of the media resource, generates the panoramic sound media resource, and then directly transcodes the panoramic sound media resource to obtain the target media resource of the panoramic sound and the target media resource of the stereo sound, so that the audience can play the panoramic sound and the stereo sound by using the playing device.
However, the mixing method and mode of panoramic sound audio differ greatly from those of stereo audio. Ordinary stereo audio generally has only two fixed channels, so during playback the sound sources are only on the left and right sides. The sound sources of panoramic sound include the left, right, front, rear, upper and lower sides, forming 360-degree surround sound. Therefore, when panoramic sound audio is transcoded directly into stereo audio, the volume becomes low, the audio quality is degraded, and the stereo audio can hardly be played normally.
Therefore, the present specification provides an audio stream processing method, which acquires an audio stream to be processed and identifies the coding format of the audio stream to be processed; acquires a set audio loudness parameter when the coding format of the audio stream to be processed is a panoramic sound coding format; transcodes the audio stream to be processed into an audio stream in a stereo coding format according to the set audio loudness parameter; and determines, according to the audio stream in the stereo coding format, a target audio stream obtained by transcoding the audio stream to be processed. Because the audio stream to be processed is transcoded into the stereo coding format using the acquired set audio loudness parameter to obtain the target audio stream, the problem of excessively low volume when panoramic sound is converted into stereo sound can be effectively addressed, the audio quality is improved, the audio processing flow is simplified, the processing efficiency is improved, and the method remains compatible with the normal production of stereo audio.
In the present application, an audio stream processing method is provided, and the present application relates to an audio stream processing apparatus, a computing device, and a computer-readable storage medium, which are described in detail in the following embodiments one by one.
The execution main body of the audio stream processing method provided by the embodiment of the present application may be a terminal, a server, or a combination of the terminal and the server, which is not limited in the embodiment of the present application. The terminal may be any electronic product capable of performing human-Computer interaction with a user, such as a PC (Personal Computer), a mobile phone, a Pocket PC (PPC), a tablet PC, and the like. The server may be one server, or a server cluster composed of a plurality of servers, or a cloud computing service center, which is not limited in this embodiment of the present application.
Taking cooperation between the terminal and the server as an example, referring to fig. 2, fig. 2 is a schematic structural diagram of an audio stream processing system according to an embodiment of the present application:
The terminal uploads the audio stream to be processed to the server.
The server identifies the coding format of the audio stream to be processed; acquires a set audio loudness parameter when the coding format of the audio stream to be processed is a panoramic sound coding format; transcodes the audio stream to be processed into an audio stream in a stereo coding format according to the set audio loudness parameter; and determines, according to the audio stream in the stereo coding format, a target audio stream obtained by transcoding the audio stream to be processed.
Correspondingly, the server can also transmit the target audio stream to a playing device corresponding to the listener so as to be played by the listener conveniently.
By applying the scheme of this embodiment, the audio stream to be processed is transcoded into an audio stream in the stereo coding format using the acquired set audio loudness parameter, so that the target audio stream is obtained. The problem of excessively low volume when panoramic sound is converted into stereo sound can thus be effectively addressed, the audio quality is improved, the audio processing flow is simplified, the processing efficiency is improved, and the scheme remains compatible with the normal production of stereo audio.
Fig. 3 is a flowchart illustrating an audio stream processing method according to an embodiment of the present application, which specifically includes the following steps:
Step 302: acquire an audio stream to be processed, and identify the coding format of the audio stream to be processed.
Specifically, audio refers to a file storing sound content. An audio stream carries audio as an isochronous "data stream", which allows the output quality of the audio to be controlled. The audio stream to be processed refers to an audio stream that needs to be processed or transcoded.
In practical applications, there are various ways to acquire the audio stream to be processed, for example, a certain user may send an acquisition instruction of the audio stream to be processed to the execution main body, or send an audio stream processing instruction, and accordingly, the execution main body starts to acquire the audio stream to be processed after receiving the acquisition instruction; or, the execution main body may automatically acquire the audio stream to be processed every preset time, for example, after the preset time, the server having the audio stream processing function automatically acquires the audio stream to be processed in the specified access area. The present specification does not set any limit to the manner in which the audio stream to be processed is acquired.
After the audio stream to be processed is obtained, the encoding format of the audio stream to be processed can be identified according to the data attribute information of the audio stream, and the encoding format can also be identified by detecting the audio stream to be processed by a media file information viewing tool.
For example, the mediainfo command may be invoked to identify the encoding format of the audio stream to be processed. The MediaInfo is a very practical video parameter detection tool, and can perform coding analysis query on video and detect coding and information of audio files.
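The identification step can be illustrated with a minimal Python sketch. It assumes that the MediaInfo command-line tool is installed and that its JSON output exposes the fields `Format`, `Format_AdditionalFeatures`, `Format_Commercial_IfAny` and `Channels` for audio tracks; these field names, the function names, and the labels "panoramic", "stereo" and "other" are assumptions made for illustration and should be checked against the MediaInfo version in use. The description does not prescribe this logic.

```python
import json
import subprocess


def classify_audio_track(track: dict) -> str:
    """Classify one MediaInfo audio track dict as 'panoramic', 'stereo' or 'other'."""
    fmt = track.get("Format", "")
    extra = track.get("Format_AdditionalFeatures", "")
    commercial = track.get("Format_Commercial_IfAny", "")
    # Assumption: E-AC-3 with Joint Object Coding (JOC) is reported through
    # these two fields; either one is treated as the panoramic sound format.
    if "JOC" in extra or "Atmos" in commercial:
        return "panoramic"
    # Assumption: two-channel AAC stands in for the stereo coding format.
    if fmt == "AAC" and str(track.get("Channels", "")) == "2":
        return "stereo"
    return "other"


def probe_audio_tracks(path: str) -> list:
    """Run `mediainfo --Output=JSON` on a file and return its audio track dicts."""
    out = subprocess.run(
        ["mediainfo", "--Output=JSON", path],
        check=True, capture_output=True, text=True,
    ).stdout
    tracks = json.loads(out)["media"]["track"]
    return [t for t in tracks if t.get("@type") == "Audio"]
```

Keeping the classification separate from the subprocess call lets the decision rule be checked without a media file; `probe_audio_tracks` is only needed when a real file is inspected.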
Step 304: acquire a set audio loudness parameter when the coding format of the audio stream to be processed is a panoramic sound coding format.
Specifically, encoding means converting characters, numbers, or other objects into digital codes by a predetermined method, or converting information and data into predetermined electric pulse signals. The panoramic sound coding format refers to a coding format corresponding to panoramic sound, such as the E-AC-3 JOC format. The set audio loudness parameter refers to a parameter for adjusting the loudness of audio data, expressed for example in LUFS (Loudness Units relative to Full Scale) or in decibels.
In practical application, if the coding format of the audio stream to be processed is identified as the panoramic sound coding format, the set audio loudness parameter is acquired. The set audio loudness parameter may be preset; alternatively, after the coding format of the audio stream to be processed is identified as the panoramic sound coding format, the coding format may be shown on a display and the audio loudness parameter set by a user in real time, so that the set audio loudness parameter is obtained.
Step 306: transcode the audio stream to be processed into an audio stream in a stereo coding format according to the set audio loudness parameter.
Specifically, the stereo coding format refers to a coding format corresponding to stereo, such as the AAC-LC format.
In practical application, the audio stream to be processed can be transcoded into an audio stream in a stereo coding format based on the set audio loudness parameter according to the set transcoding strategy.
In one or more alternative embodiments of the present specification, the audio stream to be processed may be decoded and encoded based on the set audio loudness parameter, so as to obtain the audio stream in the stereo encoding format. That is to say, the to-be-processed audio stream is transcoded into an audio stream in a stereo coding format according to the set audio loudness parameter, and the specific implementation process may be as follows:
decoding the audio stream to be processed to obtain audio sampling data;
and according to the set audio loudness parameter and a set stereo coding strategy, coding the audio sampling data to obtain an audio stream in a stereo coding format.
Specifically, the audio sampling data is audio data in PCM (pulse code modulation) format, which is the format closest to the original audio and is also called raw audio. The set stereo coding strategy refers to an encoding method, scheme, or the like for converting panoramic sound into stereo sound.
In practical application, a decoder may be used to decode the audio stream to be processed to obtain the original audio data, that is, the audio sampling data. Then, according to the set audio loudness parameter and the set stereo coding strategy, an encoder is used to encode the audio sampling data to obtain an audio stream in the stereo coding format, such that the difference between the loudness of the audio stream in the stereo coding format and the set audio loudness parameter is within a set range. In this way, the audio transcoding efficiency, the quality of the audio stream in the stereo coding format, and user satisfaction can all be improved.
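One way to picture the decode-then-encode step is with an FFmpeg command line, sketched below in Python. The description does not name FFmpeg; it is used here only as a stand-in. The sketch relies on FFmpeg's `loudnorm` audio filter (options `I`, `TP`, `LRA`) and the `-ac`, `-ar`, `-c:a` and `-vn` options. The function name and the default values of -16 LUFS, -1.5 dBTP and 11 LU are illustrative assumptions, not values taken from this application.

```python
def build_stereo_transcode_cmd(src, dst, target_lufs=-16.0,
                               true_peak_db=-1.5, loudness_range=11.0):
    """Build an FFmpeg argument list that decodes `src` and encodes stereo AAC."""
    # The loudness target plays the role of the "set audio loudness parameter".
    loudnorm = f"loudnorm=I={target_lufs}:TP={true_peak_db}:LRA={loudness_range}"
    return [
        "ffmpeg", "-y",
        "-i", src,        # input containing the panoramic sound track
        "-vn",            # audio only; video is handled separately
        "-af", loudnorm,  # loudness normalisation on the decoded PCM
        "-ac", "2",       # downmix to two channels
        "-ar", "48000",   # fix the output sample rate
        "-c:a", "aac",    # encode with FFmpeg's AAC encoder
        dst,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building the command separately from running it keeps the loudness parameter easy to inspect and to change.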
In one or more alternative embodiments of the present description, the loudness of the audio stream may be adjusted based on the set audio loudness parameter while encoding. That is, the audio sampling data is encoded according to the set audio loudness parameter and the set stereo coding strategy to obtain an audio stream in the stereo coding format, and the specific implementation process may be as follows:
and coding the audio sampling data according to a set stereo coding strategy, and adjusting the sound level of the audio sampling data within a set time length according to the set audio loudness parameter to obtain an audio stream in the stereo coding format.
Specifically, the sound level in the set time period refers to an average level of sound in a period of time.
In practical application, the encoder is configured using the set stereo coding strategy, and the configured encoder then encodes the audio sampling data. At the same time, the set audio loudness parameter is input into the encoder, so that the encoder adjusts the sound level, namely the loudness level, of the audio sampling data within a set time length, and an audio stream in the stereo coding format is obtained. In this way, the accuracy and quality of the transcoded audio stream in the stereo coding format can be improved, which further improves the quality of the target audio stream.
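The idea of adjusting the average sound level within a set time length can be sketched as follows. This is a simplified illustration and not the encoder behaviour described above: it uses a plain RMS level in dB relative to full scale as a stand-in for LUFS, whereas a true LUFS measurement also applies frequency weighting and gating. The function names, the float PCM range of -1.0 to 1.0, and the window length in samples are assumptions.

```python
import math


def rms_dbfs(samples):
    """Average level of float PCM samples (range -1..1) in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def adjust_level(samples, target_db, window):
    """Scale each block of `window` samples so its average level matches target_db."""
    out = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        level = rms_dbfs(chunk)
        # Silent blocks are left untouched; no finite gain can raise them.
        gain = 1.0 if level == float("-inf") else 10.0 ** ((target_db - level) / 20.0)
        # Clamp to the valid PCM range so that boosting cannot overflow.
        out.extend(max(-1.0, min(1.0, s * gain)) for s in chunk)
    return out
```

For example, a block at -40 dB with a target of -20 dB receives a gain of 10. Applying an independent gain to each block can cause audible jumps at block boundaries, so a practical normaliser would smooth the gain over time.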
In one or more alternative embodiments of the present description, the pending audio stream comprises a plurality of audio data packets; at this time, each audio data packet is decoded and encoded separately. That is, the audio stream to be processed is decoded to obtain audio sample data, and the specific implementation process may be as follows:
decoding each audio data packet to obtain sub-audio sampling data corresponding to each audio data packet;
correspondingly, the encoding the audio sampling data according to the set audio loudness parameter and the set stereo encoding strategy to obtain an audio stream in a stereo encoding format includes:
for each sub-audio sampling data, according to the set audio loudness parameter and a set stereo coding strategy, coding the sub-audio sampling data to obtain a sub-audio stream in a stereo coding format;
and splicing the sub audio streams of the stereo coding format to obtain the audio stream of the stereo coding format.
Specifically, the audio data packets, i.e., audio packets, are the constituent units of an audio stream: each audio stream is composed of N audio packets, and encoding and decoding can be performed packet by packet.
In practical application, for each audio data packet in the audio stream to be processed, a decoder may be used to decode the current audio data packet to obtain the original sub-audio data, that is, the sub-audio sampling data. Then, according to the set audio loudness parameter and the set stereo coding strategy, an encoder is used to encode the sub-audio sampling data to obtain a sub-audio stream in the stereo coding format, such that the difference between the loudness of the sub-audio stream in the stereo coding format and the set audio loudness parameter is within a set range. After all the audio data packets have been traversed, the sub-audio streams in the stereo coding format corresponding to the audio data packets are spliced to obtain the audio stream in the stereo coding format. In this way, the audio transcoding efficiency can be further improved, the quality of the audio stream in the stereo coding format is improved, and user satisfaction is improved.
It should be noted that the audio information, such as loudness and bit rate, differs from one audio packet to another. During transcoding, each audio packet is processed individually with the same fixed encoding parameter, namely the set loudness parameter; the actual data of the encoded audio packets still differ, because each packet is processed independently according to its own stream information.
Optionally, for each piece of sub-audio sample data, according to the set audio loudness parameter and a set stereo coding strategy, performing coding processing on the sub-audio sample data to obtain a sub-audio stream in a stereo coding format; splicing the sub audio streams of the stereo coding format to obtain the audio stream of the stereo coding format, which may further be: for each sub-audio sampling data, coding the sub-audio sampling data according to a set stereo coding strategy, and adjusting the sound level of the sub-audio sampling data within a set time length according to a set audio loudness parameter to obtain a sub-audio stream in a stereo coding format; and splicing the sub audio streams of each stereo coding format to obtain the audio stream of the stereo coding format.
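The packet-by-packet flow of decoding, encoding with the same loudness parameter, and splicing can be sketched as below. The `decode` and `encode` callables are hypothetical placeholders for a real decoder and encoder, and joining the encoded bytes stands in for splicing; a real codec may carry state across packets, which this sketch ignores.

```python
from typing import Callable, Iterable, List


def transcode_packets(
    packets: Iterable[bytes],
    decode: Callable[[bytes], List[float]],
    encode: Callable[[List[float], float], bytes],
    loudness_param: float,
) -> bytes:
    """Decode each packet, encode it with one fixed loudness parameter, then splice."""
    sub_streams = []
    for packet in packets:
        pcm = decode(packet)                             # sub-audio sampling data
        sub_streams.append(encode(pcm, loudness_param))  # stereo sub-audio stream
    # Splice the sub-audio streams in their original order.
    return b"".join(sub_streams)
```

Passing the decoder and encoder in as parameters keeps the traversal and splicing logic independent of any particular codec library.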
In an implementation embodiment of this specification, on the basis of obtaining the audio stream to be processed, the encoding format of the audio stream to be processed may be directly identified, so as to simplify the processing flow, thereby improving the processing efficiency.
In another practical embodiment of the present specification, on the basis of obtaining the audio stream to be processed, the audio track information of the audio stream to be processed may also be obtained first, and then the encoding format of the audio stream to be processed may be identified based on the audio track information. That is, before the obtaining the set audio loudness parameter, the method further includes:
acquiring audio track information of the audio stream to be processed;
and identifying the coding format of the audio stream to be processed under the condition that the audio track information is a single-channel audio track.
Specifically, an audio track refers to one track of sound, with each audio track corresponding to one audio stream. A single-channel audio track means that the audio stream to be processed corresponds to one audio stream.
In practical application, the audio track information of the audio stream to be processed can be identified according to the data attribute information of the audio stream, and the audio track information of the audio stream to be processed can also be identified by detecting the audio stream to be processed through a media file information viewing tool.
For example, the mediainfo command may be invoked to identify the track information of the audio stream to be processed.
Therefore, by first identifying the audio track information of the audio stream to be processed and then identifying the coding format only in the single-track case, identification errors caused by the two audio streams to be processed having different coding formats in the two-track case can be avoided, which further improves identification efficiency.
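The mediainfo-based track identification described above can be sketched as follows. This is only an illustration: it assumes the MediaInfo command-line tool is installed and that its JSON report lists tracks under `media.track` with `@type`, `Format`, and `Format_AdditionalFeatures` fields, which matches commonly seen MediaInfo output but is not stated in the application. The helper names `read_media_info` and `audio_track_formats` are hypothetical.

```python
import json
import subprocess
from typing import Dict, List


def read_media_info(path: str) -> Dict:
    """Run the MediaInfo CLI on a file and return its JSON report as a dict."""
    result = subprocess.run(
        ["mediainfo", "--Output=JSON", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def audio_track_formats(report: Dict) -> List[str]:
    """Return one format label per audio track, e.g. 'E-AC-3 JOC' or 'AAC LC'.

    The key names used here are assumptions about the report layout.
    """
    tracks = (report.get("media") or {}).get("track") or []
    formats: List[str] = []
    for track in tracks:
        if track.get("@type") != "Audio":
            continue  # skip General, Video, Text, ... entries
        name = track.get("Format", "unknown")
        extra = track.get("Format_AdditionalFeatures")
        formats.append(f"{name} {extra}" if extra else name)
    return formats
```

The number of entries returned gives the track count (one track or two tracks), and each entry gives the coding format of the corresponding audio stream to be processed.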
Optionally, the track information may also be two tracks, and at this time, transcoding processing needs to be performed on the to-be-processed audio stream corresponding to each track. That is, after the audio track information of the audio stream to be processed is obtained, the method further includes:
under the condition that the audio track information is two paths of audio tracks, respectively identifying the coding formats of the audio streams to be processed corresponding to the audio tracks;
and performing file format transcoding on the audio stream to be processed with the coding format being a stereo coding format to obtain a target audio stream with the stereo coding format, and/or performing file format transcoding on the audio stream to be processed with the coding format being a panoramic sound coding format to obtain a target audio stream with the panoramic sound coding format.
Specifically, two audio tracks means that the audio stream to be processed corresponds to two audio streams, that is, there are two audio streams to be processed.
In practical application, when the audio track information indicates two audio tracks, a media file information viewing tool can be called to identify the coding format of the audio stream to be processed corresponding to each audio track. File format transcoding is performed on the audio stream to be processed whose coding format is a stereo coding format to obtain a target audio stream in the stereo coding format; for example, a stereo audio stream in the AAC file format (an audio stream to be processed in the stereo coding format) is converted into a stereo audio stream in the MP3 file format. File format transcoding is likewise performed on the audio stream to be processed whose coding format is a panoramic sound coding format to obtain a target audio stream in the panoramic sound coding format.
Because there are two audio tracks, the audio stream to be processed in the stereo coding format only needs to be transcoded into the target audio stream in the stereo coding format, and the audio stream to be processed in the panoramic sound coding format only needs to be transcoded into the target audio stream in the panoramic sound coding format. There is no conversion between the stereo coding format and the panoramic sound coding format, and the loudness change before and after transcoding is very small, so the set audio loudness parameter does not need to be obtained. In this way, transcoding quality is guaranteed, the processing flow is simplified, and processing efficiency is improved.
In addition, after identifying the encoding format of the audio stream to be processed, the method further includes: and under the condition that the coding format of the audio stream to be processed is the panoramic sound coding format, the audio stream to be processed is transcoded into a target audio stream in the panoramic sound coding format.
It should be noted that the process is similar whether the audio stream to be processed in the stereo coding format is transcoded into the target audio stream in the stereo coding format, or the audio stream to be processed in the panoramic sound coding format is transcoded into the target audio stream in the panoramic sound coding format. Namely, the audio stream to be processed in the specified coding format is decoded to obtain audio sampling data, and the audio sampling data is encoded according to the target file format to obtain a target audio stream in the specified coding format. The specified coding format is a stereo coding format or a panoramic sound coding format, and the target file format is a general file format, so that various playing devices can recognize and play the target audio stream.
Step 308: and determining a target audio stream after transcoding the audio stream to be processed according to the audio stream in the stereo coding format.
In practical application, after the audio stream in the stereo coding format is obtained, the audio stream in the stereo coding format may be subjected to file format conversion to obtain a target audio stream in the stereo coding format.
In one or more alternative embodiments of the present specification, a to-be-processed video stream may be obtained, and a to-be-processed audio stream in the to-be-processed video stream may be processed. That is, the obtaining of the audio stream to be processed includes:
acquiring a video stream to be processed, wherein the video stream to be processed comprises an audio stream to be processed and an image sequence to be processed;
correspondingly, after the obtaining of the video stream to be processed, the method further includes:
transcoding the image sequence to be processed to obtain a target image sequence;
and aligning the target audio stream and the target image sequence to determine a target video stream.
In practical application, the object to be processed is a video stream to be processed, which includes an audio stream to be processed and an image sequence to be processed, where the image sequence to be processed refers to a plurality of images arranged in a certain order. The coding format of the audio stream to be processed is then identified; the set audio loudness parameter is acquired when the coding format of the audio stream to be processed is a panoramic sound coding format; the audio stream to be processed is transcoded into an audio stream in a stereo coding format according to the set audio loudness parameter; and the target audio stream after transcoding is determined according to the audio stream in the stereo coding format. The image sequence to be processed is transcoded to obtain a target image sequence in a target image format, where the target image format is a general image format, namely an image format that display devices can display. Further, the target audio stream and the target image sequence are aligned according to their corresponding timestamps to obtain the target video stream.
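The timestamp-based alignment mentioned above can be illustrated with a small sketch. It assumes, purely for illustration, that audio frames and images are each available as (timestamp in seconds, payload) pairs; the function name `align_by_timestamp` and the tuple layout are hypothetical, and a real muxer would additionally handle time bases, durations, and container constraints.

```python
from typing import Any, List, Tuple

Frame = Tuple[float, Any]  # (presentation timestamp in seconds, payload)


def align_by_timestamp(audio: List[Frame],
                       images: List[Frame]) -> List[Tuple[float, str, Any]]:
    """Interleave audio frames and images into one sequence ordered by timestamp.

    sorted() is stable, so for equal timestamps audio entries come first
    because they are listed first in the combined input.
    """
    tagged = [(ts, "audio", payload) for ts, payload in audio]
    tagged += [(ts, "image", payload) for ts, payload in images]
    return sorted(tagged, key=lambda item: item[0])
```

For instance, audio frames at 0.00 s, 0.02 s, and 0.04 s and images at 0.00 s and 0.04 s would be emitted in timestamp order, which keeps picture and sound in step in the resulting target video stream.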
Referring to fig. 4A, fig. 4A shows a processing flow chart of an audio stream processing method according to an embodiment of the present application:
In the video production phase, the meta-information of the original audio uploaded by the user, namely the audio stream to be processed, is detected: mediainfo is used to perform pre-detection (meta-information detection) on the audio stream, so as to acquire the coding format of the audio stream to be processed and the corresponding audio track information. When the coding format is E-AC-3 JOC (a panoramic sound coding format), the audio stream to be processed is panoramic sound; when the coding format is AAC LC (a stereo coding format), the audio stream to be processed is stereo. When there are multiple audio tracks, the coding formats of the different audio streams to be processed can be obtained simultaneously.
If the audio stream to be processed is detected to have a single audio track whose coding format is E-AC-3 JOC, namely a single Dolby panoramic sound track, the set audio loudness parameter of -18 LUFS is obtained, and the E-AC-3 JOC audio stream to be processed is encoded, with the audio loudness parameter set to -18 LUFS, into an AAC LC target audio stream in a general file format (panoramic sound -> stereo). That is, when the original audio stream is single-track Dolby panoramic sound, the loudness parameter that restores the volume to a normal level is applied first during audio transcoding. The loudness parameter actually used is a fixed value of -18 LUFS, which is friendly to audio with a large dynamic range; in practice, bass is not lost and treble remains clear. In addition, the E-AC-3 JOC audio stream to be processed also needs to undergo file format transcoding to obtain an E-AC-3 JOC target audio stream in a general file format (panoramic sound -> panoramic sound).
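One possible way to realize the panoramic sound -> stereo step is sketched below. The application does not name an encoding tool; FFmpeg and its `loudnorm` filter are used here only as an assumed stand-in, and the helper name `build_downmix_command` is hypothetical. Note also that FFmpeg's built-in decoder may only decode the channel-based core of an E-AC-3 JOC stream rather than render its objects, so this is an approximation of the described step rather than the authors' implementation.

```python
from typing import List


def build_downmix_command(src: str, dst: str,
                          loudness_lufs: float = -18.0) -> List[str]:
    """Build an FFmpeg command line (assumed tool) for a stereo AAC downmix.

    The integrated loudness target is passed to the loudnorm filter via I=.
    """
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-map", "0:a:0",          # first audio track of the input
        "-vn",                    # drop any video stream
        "-af", f"loudnorm=I={loudness_lufs}",
        "-ac", "2",               # downmix to two channels (stereo)
        "-ar", "48000",           # assumed output sample rate
        "-c:a", "aac",            # FFmpeg's native AAC encoder (AAC LC by default)
        dst,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; keeping the command construction separate makes the loudness target easy to change and to inspect.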
If the audio stream to be processed is detected to have two audio tracks, where the coding format of the first audio track is E-AC-3 JOC and the coding format of the second audio track is AAC LC, there is one Dolby panoramic sound track and one stereo track. During transcoding, the appropriate audio track is automatically selected according to the desired output: the stereo track in the original audio is transcoded to generate the corresponding stereo audio, and the panoramic sound track in the original audio is transcoded to generate the corresponding panoramic sound audio. That is, file format transcoding is performed on the audio stream to be processed corresponding to the first audio track to obtain an E-AC-3 JOC target audio stream in a general file format (panoramic sound -> panoramic sound), and file format transcoding is performed on the audio stream to be processed corresponding to the second audio track to obtain an AAC LC target audio stream in a general file format (stereo -> stereo).
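The single-track and two-track cases described above amount to a small routing decision, which might be sketched as follows. The format labels, the constant names, and the function `plan_transcode` are hypothetical and chosen only for illustration; the -18 LUFS value is the set audio loudness parameter given in the text.

```python
from typing import List, Optional, Tuple

PANORAMIC = "E-AC-3 JOC"   # panoramic sound coding format label (assumed spelling)
STEREO = "AAC LC"          # stereo coding format label (assumed spelling)
SET_LOUDNESS_LUFS = -18.0  # set audio loudness parameter from the text

# (track index, source format, target format, loudness parameter or None)
Job = Tuple[int, str, str, Optional[float]]


def plan_transcode(track_formats: List[str]) -> List[Job]:
    """Decide which transcoding jobs to run for the detected audio tracks."""
    if len(track_formats) == 1:
        fmt = track_formats[0]
        if fmt == PANORAMIC:
            return [
                # panoramic sound -> stereo, using the set loudness parameter
                (0, PANORAMIC, STEREO, SET_LOUDNESS_LUFS),
                # panoramic sound -> panoramic sound, file format transcoding only
                (0, PANORAMIC, PANORAMIC, None),
            ]
        return [(0, fmt, fmt, None)]
    # Two tracks: each track keeps its coding format, so no loudness parameter.
    return [(i, fmt, fmt, None) for i, fmt in enumerate(track_formats)]
```

Only the single-track panoramic sound case produces a job that carries the loudness parameter, which mirrors the observation that no format conversion, and hence no loudness compensation, is needed when both tracks are present.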
The audio stream processing method provided by the application acquires an audio stream to be processed and identifies the coding format of the audio stream to be processed; acquiring a set audio loudness parameter under the condition that the coding format of the audio stream to be processed is a panoramic sound coding format; according to the set audio loudness parameter, the audio stream to be processed is coded into an audio stream in a stereo coding format; and determining a target audio stream after transcoding the audio stream to be processed according to the audio stream in the stereo coding format. The audio stream to be processed is coded into the audio stream in a stereo coding format through the acquired set audio loudness parameter, so that the target video stream is obtained, the problem of too low volume generated when the panoramic sound is converted into stereo sound can be effectively solved, the audio quality is improved, the audio processing process is simplified, the processing efficiency is improved, and the normal production of stereo audio can be compatible.
The following will further describe the audio stream processing method by taking an application of the audio stream processing method provided by the present application in a movie scene as an example, with reference to fig. 4B. Fig. 4B shows a processing flow chart of an audio stream processing method applied to a movie scene according to an embodiment of the present application, which specifically includes the following steps:
Step 402: acquiring a video stream of a film to be processed, wherein the video stream of the film to be processed comprises an audio stream to be processed and an image sequence to be processed.
Step 404: and transcoding the image sequence to be processed to obtain a target image sequence.
Step 406: and acquiring the audio track information of the audio stream to be processed.
Step 408: in the case where the audio track information is a one-way audio track, the encoding format of the audio stream to be processed is identified.
Step 410: and acquiring a set audio loudness parameter under the condition that the coding format of the audio stream to be processed is a panoramic sound coding format.
Step 412: and decoding the audio stream to be processed to obtain audio sampling data.
Optionally, the audio stream to be processed comprises a plurality of audio data packets;
the method for decoding the audio stream to be processed to obtain audio sample data includes:
and decoding each audio data packet to obtain sub-audio sample data corresponding to each audio data packet.
Step 414: and according to the set stereo coding strategy, coding the audio sampling data, and according to the set audio loudness parameter, adjusting the sound level of the audio sampling data in the set time length to obtain an audio stream in a stereo coding format.
Optionally, the audio stream to be processed comprises a plurality of audio data packets;
according to the set stereo coding strategy, coding the audio sampling data, and according to the set audio loudness parameter, adjusting the sound level of the audio sampling data in the set time length to obtain the audio stream in the stereo coding format, including:
for each sub-audio sampling data, coding the sub-audio sampling data according to a set stereo coding strategy, and adjusting the sound level of the sub-audio sampling data within a set time length according to a set audio loudness parameter to obtain a sub-audio stream in a stereo coding format;
and splicing the sub audio streams of each stereo coding format to obtain the audio stream of the stereo coding format.
Step 416: and determining a target audio stream after transcoding the audio stream to be processed according to the audio stream in the stereo coding format.
Step 418: and under the condition that the audio track information is two paths of audio tracks, respectively identifying the coding formats of the audio streams to be processed corresponding to the audio tracks.
Step 420: and transcoding the audio stream to be processed with the coding format being the stereo coding format by using the file format to obtain the target audio stream with the stereo coding format.
Step 422: and transcoding the file format of the audio stream to be processed with the coding format of the panoramic sound coding format to obtain a target audio stream with the panoramic sound coding format.
Step 424: and aligning the target audio stream and the target image sequence to determine a target film video stream.
According to the audio stream processing method, the audio stream to be processed is coded into the audio stream in the stereo coding format through the acquired set audio loudness parameter, so that the target video stream is obtained, the problem of too low volume generated when the panoramic sound is converted into the stereo sound can be effectively solved, the audio quality is improved, the audio processing process is simplified, the processing efficiency is improved, and meanwhile, the normal production of the stereo audio can be compatible.
Corresponding to the above method embodiment, the present application further provides an audio stream processing apparatus embodiment, and fig. 5 shows a schematic structural diagram of an audio stream processing apparatus provided in an embodiment of the present application. As shown in fig. 5, the apparatus includes:
a first identification module 502 configured to obtain an audio stream to be processed and identify an encoding format of the audio stream to be processed;
an obtaining module 504, configured to obtain a set audio loudness parameter when the coding format of the audio stream to be processed is a panoramic sound coding format;
a first transcoding module 506, configured to transcode the audio stream to be processed into an audio stream in a stereo coding format according to the set audio loudness parameter;
a determining module 508 configured to determine, according to the audio stream in the stereo encoding format, a target audio stream after transcoding the audio stream to be processed.
Optionally, the first transcoding module 506 is further configured to:
decoding the audio stream to be processed to obtain audio sampling data;
and according to the set audio loudness parameter and a set stereo coding strategy, carrying out coding processing on the audio sampling data to obtain an audio stream in a stereo coding format.
Optionally, the first transcoding module 506 is further configured to:
and coding the audio sampling data according to a set stereo coding strategy, and adjusting the sound level of the audio sampling data within a set time length according to the set audio loudness parameter to obtain an audio stream in the stereo coding format.
Optionally, the audio stream to be processed includes a plurality of audio data packets;
the first transcoding module 506, further configured to:
decoding each audio data packet to obtain sub-audio sampling data corresponding to each audio data packet;
for each sub-audio sampling data, according to the set audio loudness parameter and a set stereo coding strategy, coding the sub-audio sampling data to obtain a sub-audio stream in a stereo coding format;
and splicing the sub audio streams of the stereo coding format to obtain the audio stream of the stereo coding format.
Optionally, the first identifying module 502 is further configured to:
acquiring a video stream to be processed, wherein the video stream to be processed comprises an audio stream to be processed and an image sequence to be processed;
the apparatus also includes a second transcoding module configured to:
transcoding the image sequence to be processed to obtain a target image sequence;
and aligning the target audio stream and the target image sequence to determine a target video stream.
Optionally, the apparatus further comprises a second identification module configured to:
acquiring audio track information of the audio stream to be processed;
and identifying the coding format of the audio stream to be processed under the condition that the audio track information is a single-channel audio track.
Optionally, the apparatus further comprises a third identifying module configured to:
under the condition that the audio track information is two paths of audio tracks, respectively identifying the coding formats of the audio streams to be processed corresponding to the audio tracks;
and performing file format transcoding on the audio stream to be processed with the coding format being a stereo coding format to obtain a target audio stream with the stereo coding format, and/or performing file format transcoding on the audio stream to be processed with the coding format being a panoramic sound coding format to obtain a target audio stream with the panoramic sound coding format.
The audio stream processing apparatus provided by the present application transcodes the audio stream to be processed into an audio stream in a stereo coding format according to the acquired set audio loudness parameter, and thereby obtains the target video stream. This can effectively solve the problem of excessively low volume produced when panoramic sound is converted into stereo, improve audio quality, simplify the audio processing procedure, and improve processing efficiency, while remaining compatible with the normal production of stereo audio.
The above is a schematic scheme of an audio stream processing apparatus of the present embodiment. It should be noted that the technical solution of the audio stream processing apparatus belongs to the same concept as the technical solution of the audio stream processing method described above, and for details that are not described in detail in the technical solution of the audio stream processing apparatus, reference may be made to the description of the technical solution of the audio stream processing method described above.
FIG. 6 shows a block diagram of a computing device provided in accordance with an embodiment of the present application. The components of the computing device 600 include, but are not limited to, a memory 610 and a processor 620. The processor 620 is coupled to the memory 610 via a bus 630 and a database 650 is used to store data.
Computing device 600 also includes an access device 640, which enables computing device 600 to communicate via one or more networks 660. Examples of such networks include a Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 640 may include one or more of any type of network interface (e.g., a network interface controller), wired or wireless, such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the application, the above-described components of computing device 600, as well as other components not shown in fig. 6, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 6 is for purposes of example only and is not limiting as to the scope of the present application. Those skilled in the art may add or replace other components as desired.
Computing device 600 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 600 may also be a mobile or stationary server.
Wherein the processor 620, when executing the computer instructions, performs the steps of the audio stream processing method.
The foregoing is a schematic diagram of a computing device of the present embodiment. It should be noted that the technical solution of the computing device belongs to the same concept as the technical solution of the audio stream processing method, and for details that are not described in detail in the technical solution of the computing device, reference may be made to the description of the technical solution of the audio stream processing method.
An embodiment of the present application further provides a computer readable storage medium storing computer instructions, which when executed by a processor implement the steps of the audio stream processing method as described above.
The above is an illustrative scheme of a computer-readable storage medium of the embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the above audio stream processing method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the above audio stream processing method.
The foregoing description has been directed to specific embodiments of this application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of combined acts, but those skilled in the art should understand that the present application is not limited by the described order of acts, because some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and that the acts and modules involved are not necessarily required by the present application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present application disclosed above are intended only to aid in the explanation of the application. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the teaching of this application. The embodiments were chosen and described in order to best explain the principles of the application and its practical applications, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.

Claims (10)

1. An audio stream processing method, comprising:
acquiring an audio stream to be processed, and identifying the coding format of the audio stream to be processed;
acquiring a set audio loudness parameter under the condition that the coding format of the audio stream to be processed is a panoramic sound coding format;
according to the set audio loudness parameter, the audio stream to be processed is coded into an audio stream in a stereo coding format;
and determining a target audio stream after transcoding the audio stream to be processed according to the audio stream in the stereo coding format.
2. The method of claim 1, wherein the transcoding the audio stream to be processed into an audio stream in a stereo encoding format according to the set audio loudness parameter comprises:
decoding the audio stream to be processed to obtain audio sampling data;
and according to the set audio loudness parameter and a set stereo coding strategy, carrying out coding processing on the audio sampling data to obtain an audio stream in a stereo coding format.
3. The method according to claim 2, wherein the encoding the audio sample data according to the set audio loudness parameter and the set stereo coding strategy to obtain the audio stream in the stereo coding format comprises:
and coding the audio sampling data according to a set stereo coding strategy, and adjusting the sound level of the audio sampling data within a set time length according to the set audio loudness parameter to obtain an audio stream in the stereo coding format.
4. A method according to claim 2 or 3, wherein the audio stream to be processed comprises a plurality of audio data packets;
the decoding processing of the audio stream to be processed to obtain audio sample data includes:
decoding each audio data packet to obtain sub-audio sampling data corresponding to each audio data packet;
the step of coding the audio sampling data according to the set audio loudness parameter and the set stereo coding strategy to obtain an audio stream in a stereo coding format includes:
for each piece of sub-audio sampling data, according to the set audio loudness parameter and a set stereo coding strategy, coding the sub-audio sampling data to obtain a sub-audio stream in a stereo coding format;
and splicing the sub audio streams of the stereo coding format to obtain the audio stream of the stereo coding format.
5. The method of claim 1, wherein the obtaining the audio stream to be processed comprises:
acquiring a video stream to be processed, wherein the video stream to be processed comprises an audio stream to be processed and an image sequence to be processed;
after the obtaining of the multimedia stream to be processed, the method further includes:
transcoding the image sequence to be processed to obtain a target image sequence;
and aligning the target audio stream and the target image sequence to determine a target video stream.
6. The method of claim 1, wherein before obtaining the set audio loudness parameter, further comprising:
acquiring audio track information of the audio stream to be processed;
and identifying the coding format of the audio stream to be processed under the condition that the audio track information is a single-channel audio track.
7. The method according to claim 6, wherein after the obtaining of the track information of the audio stream to be processed, the method further comprises:
under the condition that the audio track information is two paths of audio tracks, respectively identifying the coding formats of the audio streams to be processed corresponding to the audio tracks;
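and performing file format transcoding on the audio stream to be processed with the coding format being a stereo coding format to obtain a target audio stream with the stereo coding format, and/or performing file format transcoding on the audio stream to be processed with the coding format being a panoramic sound coding format to obtain a target audio stream with the panoramic sound coding format.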
8. An audio stream processing apparatus, comprising:
a first identification module configured to acquire an audio stream to be processed and identify the coding format of the audio stream to be processed;
an acquisition module configured to acquire a set audio loudness parameter under the condition that the coding format of the audio stream to be processed is a panoramic sound coding format;
a first transcoding module configured to transcode the audio stream to be processed into an audio stream in a stereo coding format according to the set audio loudness parameter;
the determining module is configured to determine a target audio stream after transcoding the audio stream to be processed according to the audio stream in the stereo coding format.
9. A computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1-7 when executing the computer instructions.
10. A computer-readable storage medium storing computer instructions, which when executed by a processor, perform the steps of the method of any one of claims 1 to 7.
CN202211648646.1A 2022-12-21 2022-12-21 Audio stream processing method and device Pending CN115966216A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211648646.1A CN115966216A (en) 2022-12-21 2022-12-21 Audio stream processing method and device

Publications (1)

Publication Number Publication Date
CN115966216A true CN115966216A (en) 2023-04-14

Family

ID=87357300

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211648646.1A Pending CN115966216A (en) 2022-12-21 2022-12-21 Audio stream processing method and device

Country Status (1)

Country Link
CN (1) CN115966216A (en)

Similar Documents

Publication Title
CN112262585B (en) Ambient stereo depth extraction
JP6105062B2 (en) System, method, apparatus and computer readable medium for backward compatible audio encoding
US10356545B2 (en) Method and device for processing audio signal by using metadata
KR102190201B1 (en) Identifying codebooks to use when coding spatial components of a sound field
RU2672175C2 (en) Apparatus and method for low delay object metadata coding
KR101759005B1 (en) Loudspeaker position compensation with 3d-audio hierarchical coding
US20200013426A1 (en) Synchronizing enhanced audio transports with backward compatible audio transports
US10659904B2 (en) Method and device for processing binaural audio signal
CN102122509A (en) Multi-channel encoder and multi-channel encoding method
JP2022511159A (en) Converting audio signals captured in different formats to a smaller number of formats for easier encoding and decoding operations.
US20220286799A1 (en) Apparatus and method for processing multi-channel audio signal
US11081116B2 (en) Embedding enhanced audio transports in backward compatible audio bitstreams
Stein et al. Ambisonics depth extensions for six degrees of freedom
WO2019069710A1 (en) Encoding device and method, decoding device and method, and program
Purnhagen et al. Immersive audio delivery using joint object coding
CN115966216A (en) Audio stream processing method and device
KR20220107913A (en) Apparatus and method of processing multi-channel audio signal
US11062713B2 (en) Spatially formatted enhanced audio data for backward compatible audio bitstreams
CN108206983B (en) Encoder and method for three-dimensional sound signal compatible with existing audio and video system
CN113170270A (en) Spatial audio enhancement and reproduction
WO2022262758A1 (en) Audio rendering system and method and electronic device
WO2022262750A1 (en) Audio rendering system and method, and electronic device
EP4310839A1 (en) Apparatus and method for processing multi-channel audio signal
RU2798821C2 (en) Converting audio signals captured in different formats to a reduced number of formats to simplify encoding and decoding operations
Li et al. The perceptual lossless quantization of spatial parameter for 3D audio signals

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination