WO2023125169A1 - Audio processing method, apparatus, device and storage medium - Google Patents

Audio processing method, apparatus, device and storage medium (音频处理方法、装置、设备及存储介质)

Info

Publication number
WO2023125169A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
audio
decoding
identifier
preset
Application number
PCT/CN2022/140468
Other languages
English (en)
French (fr)
Inventor
刘尧
黄益修
韩立阳
鲍琳
王维斯
Original Assignee
北京字跳网络技术有限公司
Application filed by 北京字跳网络技术有限公司
Publication of WO2023125169A1

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 — Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 — using predictive techniques
    • G10L19/16 — Vocoder architecture

Definitions

  • Embodiments of the present disclosure relate to the technical field of audio processing, for example, to an audio processing method, apparatus, device, and storage medium.
  • The sources of the audio files to be processed can vary, but the audio file usually needs to be decoded during processing.
  • The decoding process is complicated and time-consuming, which affects audio processing performance.
  • When the front end of a web page processes an audio file, it usually decodes the audio file in full.
  • Fully decoded data occupies a large amount of memory, which may crash the browser, and operating on a large amount of memory seriously affects machine performance.
  • In addition, the full decoding process is time-consuming, and it is difficult to guarantee the timeliness of audio processing.
  • Embodiments of the present disclosure provide an audio processing method, apparatus, storage medium, and device, which can improve on the audio processing solutions in the related art.
  • an embodiment of the present disclosure provides an audio processing method, including:
  • the preset frame sequence includes frame information of multiple audio frames in at least one audio resource
  • the frame information includes a frame identifier
  • the frame identifier includes an audio resource identifier and a frame index
  • the audio resource identifier is used to indicate the identity of the audio resource to which the corresponding audio frame belongs
  • the frame index is used to indicate the order of the corresponding audio frame among all audio frames of the audio resource to which it belongs.
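As a non-normative illustration, the frame information and frame identifier described here can be sketched as a small TypeScript structure (all field names are assumptions for illustration, not taken from the disclosure):

```typescript
// Illustrative sketch only: field names are assumptions, not the disclosure's.
interface FrameId {
  resourceId: string; // identity of the audio resource the frame belongs to
  frameIndex: number; // 0-based order among all frames of that resource
}

interface FrameInfo {
  id: FrameId;
  offset: number;     // byte offset of the frame inside its resource
  size: number;       // frame size in bytes
  durationMs: number; // frame duration, typically 20–50 ms
}

// The preset frame sequence is an ordered array of frame information,
// here grouped by resource and sorted by frame index.
function buildSequence(
  resources: { id: string; frames: { offset: number; size: number; durationMs: number }[] }[]
): FrameInfo[] {
  const seq: FrameInfo[] = [];
  for (const res of resources) {
    res.frames.forEach((f, i) =>
      seq.push({ id: { resourceId: res.id, frameIndex: i }, ...f })
    );
  }
  return seq;
}
```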
  • an audio processing device including:
  • a frame identifier determination module, configured to determine a decoding start frame identifier and a decoding end frame identifier in a preset frame sequence, wherein the preset frame sequence contains frame information of multiple audio frames in at least one audio resource;
  • the frame information includes a frame identifier, the frame identifier includes an audio resource identifier and a frame index, the audio resource identifier is used to indicate the identity of the audio resource to which the corresponding audio frame belongs, and the frame index is used to indicate the order of the corresponding audio frame among all audio frames of the audio resource to which it belongs;
  • the data-to-be-decoded acquisition module is configured to acquire segment data to be decoded in the audio resource associated with the corresponding audio resource identifier according to the decoding start frame identifier and the decoding end frame identifier;
  • the decoding module is configured to decode the segment data to be decoded to obtain corresponding target decoded data.
  • an embodiment of the present disclosure provides an electronic device, including a memory, a processor, and a computer program stored on the memory and runnable on the processor;
  • when the processor executes the computer program, the audio processing method provided by the embodiments of the present disclosure is implemented.
  • an embodiment of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the audio processing method provided by the embodiment of the present disclosure is implemented.
  • FIG. 1 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic structural diagram of an audio processing solution provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic flowchart of another audio processing method provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of an audio playback control process provided by an embodiment of the present disclosure.
  • FIG. 5 is a structural block diagram of an audio processing device provided by an embodiment of the present disclosure.
  • FIG. 6 is a structural block diagram of an electronic device provided by an embodiment of the present disclosure.
  • the term “comprise” and its variations are open-ended, that is, “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • FIG. 1 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure.
  • the method can be executed by an audio processing device, and can be applied to an application scenario of decoding audio.
  • the device can be implemented by software and/or hardware, and generally can be integrated in electronic equipment.
  • the electronic device can be a mobile device such as a mobile phone, a smart watch, a tablet computer, and a personal digital assistant; it can also be other devices such as a desktop computer.
  • the method includes:
  • Step 101 Determine the decoding start frame identifier and the decoding end frame identifier in the preset frame sequence, wherein the preset frame sequence includes frame information of multiple audio frames in at least one audio resource, and the frame information includes the frame identifier,
  • the frame identifier includes an audio resource identifier and a frame index.
  • the audio resource identifier is used to indicate the identity of the audio resource to which the corresponding audio frame belongs, and the frame index is used to indicate the sequence of the corresponding audio frame among all audio frames of the audio resource.
  • the audio resource can be understood as the original audio file, and its specific source is not limited: it can be an audio file stored locally on the electronic device, an audio file stored on a server (such as the cloud), or an audio file from another source.
  • the audio resource stored on the server may be an audio file uploaded by the user to the server, or may be an audio file converted (such as format conversion, etc.) from the audio file uploaded by the user.
  • An audio resource is associated with an audio resource identifier, and the audio resource identifier is used to represent the identity of the audio resource, which may be recorded as a resource identification (ID).
  • an audio file is composed of a series of encoded audio frames (Frame).
  • An audio frame can be understood as the smallest unit that can independently decode an audio clip.
  • the frame structure of an audio frame may differ between audio file formats; based on acoustic principles, the duration of each frame is generally between 20 ms (milliseconds) and 50 ms.
  • Information related to an audio frame can be maintained for it, such as the resource ID associated with the audio frame (that is, the audio resource identifier of the audio resource to which the frame belongs), the order of the audio frame among all audio frames of the audio resource to which it belongs, the position of the audio frame within that audio resource, the data size of the audio frame, and the meta information of the audio resource to which it belongs.
  • the audio resource may be divided into frames to obtain frame information of multiple audio frames in the audio resource, and a preset frame sequence may be constructed according to the frame information.
  • framing processing can be understood as determining corresponding frame information for each audio frame in the audio resource.
  • the required information can be obtained in advance from the information maintained for multiple audio frames in one or more audio resources, and the frame information corresponding to the multiple audio frames can be obtained by direct extraction and/or secondary calculation.
  • the frame information may include a frame identifier, and may also include other information, which is not specifically limited.
  • the frame index is used to indicate the order of the corresponding audio frame in all audio frames of the audio resource.
  • the frame index of the first audio frame in the audio resource can be recorded as 0, the frame index of the second audio frame can be recorded as 1, and so on.
  • the frame information of multiple audio frames can be arranged in a preset order to obtain a preset frame sequence, that is, objects in the preset frame sequence are sorted in units of frame information.
  • the preset sequence can be set according to actual needs, without specific limitation, and can also be dynamically adjusted according to actual needs during the application process.
  • the preset order may be sorted by audio resource identifier, that is, frame information associated with the same audio resource identifier is arranged together; frame information associated with the same audio resource identifier may be sorted by frame index, so that the ordering of the frame information is consistent with the original order of the audio frames in the audio resource to which they belong; other orderings are also possible.
  • the preset order may also interleave frame information corresponding to different audio resources; for example, frame information with resource ID 2 may appear between two pieces of frame information with resource ID 1.
  • the decoding start frame identifier can be understood as the frame identifier in the frame information corresponding to the first audio frame that needs to be decoded this time
  • the decoding end frame identifier can be understood as the frame identifier corresponding to the last audio frame that needs to be decoded this time.
  • the decoding start frame identifier and the decoding end frame identifier may be determined in the preset frame sequence.
  • the trigger conditions of preset decoding events are not limited, and can be set according to actual decoding requirements.
  • decoding requirements can include playback requirements, decoding data buffering requirements, audio-to-text requirements, and audio waveform drawing requirements, etc.
  • the decoding requirement can be determined automatically based on the current usage scenario, or determined according to an operation input by the user.
  • the preset decoding event may indicate the requirement parameters of the current decoding requirement, and the requirement parameter may include, for example, a decoding start frame identifier, a decoding end frame identifier, or a target decoding duration, and the like.
  • the decoding start frame identifier and the decoding end frame identifier are determined in the preset frame sequence according to the required parameters.
  • for example, if the requirement parameters include the decoding start frame identifier and the decoding end frame identifier, both can be found directly in the preset frame sequence;
  • if the requirement parameters include the decoding start frame identifier and a target decoding duration, the decoding start frame identifier can be found in the preset frame sequence, the durations of the audio frames corresponding to subsequent frame information in the preset frame sequence are accumulated in sequence starting from the audio frame corresponding to the decoding start frame identifier until the target decoding duration is reached, and the decoding end frame identifier is determined according to the frame identifier in the frame information at that point.
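The duration-accumulation lookup described above can be sketched as follows (a minimal illustration; the type shape and field names are assumptions, not the disclosure's own):

```typescript
// Minimal sketch: starting from a given position in the preset frame
// sequence, accumulate frame durations until the target decoding duration
// is reached and return the index of the decoding end frame.
type Frame = { resourceId: string; frameIndex: number; durationMs: number };

function findEndFrame(seq: Frame[], startPos: number, targetMs: number): number {
  let accumulated = 0;
  for (let i = startPos; i < seq.length; i++) {
    accumulated += seq[i].durationMs;
    if (accumulated >= targetMs) return i; // cumulative duration reached
  }
  return seq.length - 1; // sequence exhausted before the target duration
}
```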
  • Step 102 According to the decoding start frame identifier and the decoding end frame identifier, obtain the segment data to be decoded in the audio resource associated with the corresponding audio resource identifier.
  • the corresponding audio resource identifier can be understood as the audio resource identifier included in the decoding start frame identifier and/or the decoding end frame identifier.
  • the frame information to which the decoding start frame identifier belongs can be recorded as the start frame information
  • the frame information to which the decoding end frame identifier belongs can be recorded as the end frame information
  • if the audio resource identifiers of the start frame information, the end frame information, and the frame information between them in the preset frame sequence (which can be recorded as intermediate frame information) are consistent, the audio frames to be decoded come from the same audio resource;
  • the corresponding audio frames can then be located in that audio resource according to the frame indexes contained in the decoding start frame identifier, the decoding end frame identifier, and the intermediate frame identifiers, to obtain the segment data to be decoded.
  • if the audio frames to be decoded come from at least two audio resources,
  • the corresponding audio frames can be obtained in sequence from the audio resources associated with the corresponding audio resource identifiers, according to the frame indexes contained in the decoding start frame identifier, the decoding end frame identifier, and the intermediate frame identifiers, to obtain the segment data to be decoded.
  • Step 103 Decode the segment data to be decoded to obtain corresponding target decoded data.
  • the segment data to be decoded can be decoded by using a preset decoding algorithm or calling a preset decoding interface, and the target decoded data required for this decoding can be determined according to the decoding result.
  • The audio processing method provided by the embodiments of the present disclosure determines a decoding start frame identifier and a decoding end frame identifier in a preset frame sequence, wherein the preset frame sequence contains frame information of multiple audio frames in at least one audio resource, the frame information contains a frame identifier, and the frame identifier includes an audio resource identifier and a frame index.
  • The audio resource identifier is used to indicate the identity of the audio resource to which the corresponding audio frame belongs,
  • and the frame index is used to indicate the order of the corresponding audio frame among all audio frames of the audio resource to which it belongs.
  • According to the decoding start frame identifier and the decoding end frame identifier, the segment data to be decoded in the audio resource associated with the corresponding audio resource identifier is obtained, and the segment data to be decoded is decoded to obtain the corresponding target decoded data.
  • With this technical solution, the frame information of multiple audio frames in an audio resource is stored in sequence in advance; when decoding is required, the data range to be decoded is accurately located according to the decoding start frame identifier and the decoding end frame identifier, and the corresponding segment data is obtained from the audio resource and decoded, without decoding the entire audio file. On-demand decoding is thus realized, making decoding more flexible and improving audio processing efficiency.
  • Determining the decoding start frame identifier and the decoding end frame identifier in the preset frame sequence includes: determining a target decoding duration and the decoding start frame identifier;
  • starting traversal in the preset frame sequence from the frame information corresponding to the decoding start frame identifier;
  • and, when a preset traversal termination condition is met, determining the decoding end frame identifier according to the corresponding frame information, wherein the preset traversal termination condition includes: the cumulative duration of the audio frames corresponding to the traversed frame information reaches the target decoding duration.
  • the duration of each audio frame is generally related to the sampling rate of the audio resource to which it belongs; the corresponding sampling rate can be obtained according to the audio resource identifier in the traversed current frame information, and then the duration of the audio frame corresponding to the current frame information can be determined.
  • the frame identifier in the current frame information may be determined as the decoding end frame identifier.
  • the preset traversal termination condition further includes at least one of the following: the audio resource identifier in the current frame information is inconsistent with the audio resource identifier in the previous frame information; the frame index in the current frame information is not consecutive with the frame index in the previous frame information; the frame index in the current frame information is the last one in the audio resource to which it belongs.
  • Determining the decoding end frame identifier according to the corresponding frame information includes: when any one of the preset traversal termination conditions is met, determining the decoding end frame identifier according to the corresponding frame information.
  • In this way, the segment data to be decoded comes from the same audio resource, and the audio frames in the segment data to be decoded are continuous.
  • Terminating the traversal when any of these conditions is met ensures that each time segment data to be decoded is acquired, it is obtained from the same audio resource, which reduces the difficulty of obtaining the segment data and improves data acquisition efficiency.
  • If the audio resource identifier in the current frame information is inconsistent with that in the previous frame information, the frame identifier in the previous frame information can be determined as the decoding end frame identifier; if the frame index in the current frame information is not consecutive with the frame index in the previous frame information, the frame identifier in the previous frame information can be determined as the decoding end frame identifier; if the frame index in the current frame information is the last one in the audio resource to which it belongs, the frame identifier in the current frame information can be determined as the decoding end frame identifier.
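The full set of termination conditions can be sketched as a predicate checked on each step of the traversal (a non-normative illustration; field names are assumptions, and whether the end identifier is taken from the current or the previous frame information depends on which condition fired, as described above):

```typescript
// Sketch of the traversal-termination check: stop when the cumulative
// duration reaches the target, when the resource changes, when frame
// indexes become discontinuous, or when the last frame of the resource
// is reached.
type FrameRef = { resourceId: string; frameIndex: number; isLast: boolean };

function shouldStop(prev: FrameRef | null, cur: FrameRef,
                    accumulatedMs: number, targetMs: number): boolean {
  if (accumulatedMs >= targetMs) return true;                       // target reached
  if (prev && cur.resourceId !== prev.resourceId) return true;      // resource switch
  if (prev && cur.frameIndex !== prev.frameIndex + 1) return true;  // index gap
  if (cur.isLast) return true;                                      // end of resource
  return false;
}
```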
  • In some embodiments, the frame information also includes a frame offset and a frame data amount. Acquiring, according to the decoding start frame identifier and the decoding end frame identifier, the segment data to be decoded in the audio resource associated with the corresponding audio resource identifier
  • includes: determining the audio resource associated with the audio resource identifier corresponding to the decoding start frame identifier as the target audio resource; determining a data start position according to the first frame offset corresponding to the decoding start frame identifier; determining a data end position according to the second frame offset and the frame data amount corresponding to the decoding end frame identifier; determining a target data range according to the data start position and the data end position; and acquiring the audio data within the target data range in the target audio resource to obtain the segment data to be decoded. In this way, the segment data to be decoded can be obtained more quickly and accurately.
  • the frame offset may be understood as the starting position of the audio frame in the audio resource to which it belongs, and the unit may be bytes.
  • the amount of frame data can be understood as the size of the audio frame in the audio resource to which it belongs, and the unit is generally the same as the frame offset, which can be bytes.
  • the frame offset corresponding to a frame identifier can be understood as the frame offset contained in the frame information where that frame identifier is located, that is, the corresponding frame identifier and frame offset are in the same frame information, and likewise for the frame data amount.
  • In this case, the decoding start frame identifier and the decoding end frame identifier correspond to the same audio resource; the corresponding audio resource identifier can be determined according to either of the two, and the associated audio resource determined as the target audio resource.
  • the start position of the data to be acquired in the target audio resource can be determined according to the first frame offset;
  • the end position of the data to be acquired in the target audio resource can be determined according to the second frame offset and the frame data amount (for example, the end position can be expressed as second frame offset + frame data amount − 1), thereby obtaining the target data range, and the corresponding audio data can be extracted from the target audio resource according to the target data range.
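The byte-range calculation just described can be sketched as follows (an illustrative example only; the helper names are assumptions):

```typescript
// Compute the inclusive byte range to fetch from the target audio resource:
// start = first frame's offset, end = last frame's offset + its size - 1.
function targetDataRange(firstFrameOffset: number,
                         lastFrameOffset: number,
                         lastFrameSize: number): [number, number] {
  return [firstFrameOffset, lastFrameOffset + lastFrameSize - 1];
}

// e.g. slicing the fetched resource bytes:
function sliceSegment(resource: Uint8Array, range: [number, number]): Uint8Array {
  return resource.subarray(range[0], range[1] + 1); // subarray end is exclusive
}
```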
  • Starting the traversal from the frame information corresponding to the decoding start frame identifier in the preset frame sequence includes: determining the format of the audio frame corresponding to the decoding start frame identifier; and, if the format is a preset format, starting the traversal from the frame information corresponding to a target frame index in the preset frame sequence, where the target frame index is obtained by tracing forward from the start frame index in the decoding start frame identifier by a preset frame index difference.
  • Correspondingly, obtaining the segment data to be decoded in the audio resource associated with the corresponding audio resource identifier includes: obtaining the segment data to be decoded according to the target frame identifier corresponding to the target frame index and the decoding end frame identifier.
  • For some formats, audio frames may not be completely independent; a certain number of preceding audio frames (called pre-frames) can be traced back and added to the segment data to be decoded, to ensure the integrity and accuracy of the decoded data.
  • the preset frame index difference can be set according to a preset format.
  • the preset format may include Moving Picture Experts Group Audio Layer III (MP3) format, and the corresponding preset frame index difference may be 1.
  • Otherwise, the decoding start frame identifier can be regarded as the target frame identifier.
  • In some embodiments, decoding the segment data to be decoded to obtain the corresponding target decoded data includes: decoding the segment data to be decoded to obtain corresponding initial decoded data; and removing redundant decoded data from the initial decoded data to obtain the corresponding target decoded data, wherein the redundant decoded data includes the decoded data of the audio frames whose frame indexes precede the start frame index.
  • Since the segment data to be decoded determined in the above steps contains the redundant data of the pre-frames, the decoded data of the pre-frames is also contained in the initial decoded data obtained by decoding; to avoid reuse of decoded data, such as repeated playback, the decoded data of the pre-frames can be eliminated.
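The pre-frame backtracking and trimming can be sketched as follows (illustrative only; `samplesPerFrame` and the function names are assumptions, not the disclosure's own API):

```typescript
// For formats such as MP3, back the start index up by a preset frame-index
// difference before fetching, then drop the decoded samples that belong to
// the extra leading frames.
function backtrackStartIndex(startIndex: number, presetDiff: number): number {
  return Math.max(0, startIndex - presetDiff); // cannot go before frame 0
}

function trimRedundant(decoded: Float32Array,
                       framesBacktracked: number,
                       samplesPerFrame: number): Float32Array {
  // Remove the leading samples contributed by the pre-frames.
  return decoded.subarray(framesBacktracked * samplesPerFrame);
}
```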
  • In some embodiments, the method further includes: recording the decoding end frame identifier and the decoding duration corresponding to the target decoded data. With the above preset traversal termination conditions, if the actual decoding duration is not equal to the target decoding duration, the current decoding position and the actual decoding duration are recorded promptly to facilitate subsequent decoding on this basis.
  • As noted above, when the front end of a web page processes an audio file, it usually decodes the audio file in full; the full decoding process is time-consuming, and it is difficult to ensure the timeliness of audio processing.
  • the audio processing solution in the embodiments of the present disclosure may be applicable to an application scenario of a web page (Web) front end.
  • In some embodiments, the method can be applied to the front end of a web page, and before determining the decoding start frame identifier and the decoding end frame identifier in the preset frame sequence, the method further includes: performing framing processing on the audio resource to obtain frame information of multiple audio frames in the audio resource; and storing the obtained frame information into the preset frame sequence at the front end of the web page. Maintaining the preset frame sequence at the front end of the web page avoids storing the full amount of decoded data, reduces heavy memory usage, and improves browser and device performance.
  • the audio resources subjected to frame division processing may include all or part of the audio resources involved in this session.
  • In some embodiments, before determining the decoding start frame identifier and the decoding end frame identifier in the preset frame sequence, the method further includes: acquiring meta information of the audio resources, wherein the meta information includes storage information of the audio resources, and the storage information includes the storage location and/or resource data of the audio resource; and storing the meta information in a resource table at the front end of the web page, wherein the resource table includes the association between the audio resource identifiers of the audio resources involved in this session and the storage information.
  • Acquiring the segment data to be decoded in the audio resource associated with the corresponding audio resource identifier according to the decoding start frame identifier and the decoding end frame identifier includes: obtaining, from the resource table according to the decoding start frame identifier and the decoding end frame identifier, the target storage information associated with the corresponding audio resource identifier, and obtaining the segment data to be decoded based on the target storage information.
  • the storage information corresponding to the audio resources can be kept in the form of a resource table at the front end, making it convenient to quickly obtain the segment data to be decoded through the resource table.
  • the meta information may include global information of audio resources, and may include storage information of audio resources.
  • the storage information includes the storage location and/or resource data of the audio resource, and the storage location may include a Uniform Resource Locator (Uniform Resource Locator, URL) address or a local storage path, etc.
  • the resource data may be understood as the complete data of the audio resource.
  • the storage location and the resource data may be present as alternatives to each other.
  • the meta information can also include the format of the audio resource (which can be an enumerated type), the total file size of the audio resource (the unit can be bytes), the total duration of the audio resource (the unit can be seconds), the sampling rate of the audio resource (the unit can be hertz), the number of channels of the audio file, and other information (such as custom information).
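The front-end resource table described above can be sketched as a simple map from resource identifier to storage and meta information (a non-normative illustration; all field names and the URL are hypothetical):

```typescript
// Sketch of the front-end resource table: audio resource identifiers mapped
// to storage information (URL or raw bytes) plus meta information.
interface ResourceMeta {
  format: string;       // e.g. "mp3" (enumerated type in practice)
  totalBytes: number;   // total file size in bytes
  totalSeconds: number; // total duration in seconds
  sampleRateHz: number; // sampling rate in hertz
  channels: number;     // number of channels
}

interface ResourceEntry {
  url?: string;      // storage location, if the resource is remote
  data?: Uint8Array; // full resource data, if held locally (alternative to url)
  meta: ResourceMeta;
}

const resourceTable = new Map<string, ResourceEntry>();

resourceTable.set("res-1", {
  url: "https://example.com/audio/res-1.mp3", // hypothetical URL
  meta: { format: "mp3", totalBytes: 1_048_576, totalSeconds: 65.3, sampleRateHz: 44100, channels: 2 },
});
```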
  • In some embodiments, the method may further include: receiving a preset audio editing operation; and performing a corresponding editing operation on the corresponding frame information in the preset frame sequence according to the frame identifier to be adjusted indicated by the preset audio editing operation, so as to realize audio editing, wherein the editing operation includes deleting and/or reordering frame information.
  • the preset audio editing operations may include insertion, deletion, and sorting, and the number of frame identifiers to be adjusted indicated by different preset audio editing operations may differ;
  • the audio resource identifiers they contain may be the same or different.
  • for an insertion operation, the frame identifier to be adjusted may include the frame identifier of the audio frame to be inserted (which may be recorded as the first frame identifier; there may be one or more), and may also include the frame identifier of an audio frame used to indicate the insertion position
  • (which may be recorded as the second frame identifier); for example, the frame information corresponding to the first frame identifier is inserted after the frame information corresponding to the second frame identifier.
  • the frame identifier to be adjusted may include the frame identifier of the audio frame to be deleted.
  • for a sorting operation, the frame identifier to be adjusted may include multiple frame identifiers of the audio frames to be sorted (which may be recorded as third frame identifiers), and the preset audio editing operation may also indicate a target ordering;
  • the frame information corresponding to the third frame identifiers is reordered according to the target ordering, enabling more precise audio editing.
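Since editing operates on frame information rather than on decoded audio, deletion and insertion can be sketched as simple list operations on the preset frame sequence (illustrative only; type and function names are assumptions):

```typescript
// Editing the preset frame sequence by manipulating frame information.
type Info = { resourceId: string; frameIndex: number };

const sameId = (a: Info, b: Info) =>
  a.resourceId === b.resourceId && a.frameIndex === b.frameIndex;

// Deletion: drop the frame information matching the frame identifier to adjust.
function deleteFrame(seq: Info[], target: Info): Info[] {
  return seq.filter(f => !sameId(f, target));
}

// Insertion: place new frame information after the anchor (second) identifier.
function insertAfter(seq: Info[], anchor: Info, inserted: Info[]): Info[] {
  const pos = seq.findIndex(f => sameId(f, anchor));
  if (pos < 0) return seq.slice(); // anchor not found: return an unchanged copy
  return [...seq.slice(0, pos + 1), ...inserted, ...seq.slice(pos + 1)];
}
```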
  • In some embodiments, the preset frame sequence further includes waveform summary information corresponding to the multiple audio frames, and the method further includes: in response to receiving a preset waveform drawing instruction,
  • acquiring the target waveform summary information corresponding to the frame information indicated by the preset waveform drawing instruction in the preset frame sequence; and drawing a corresponding waveform diagram according to the target waveform summary information.
  • the waveform summary information corresponding to multiple audio frames is stored in the preset frame sequence.
  • the waveform summary information may include multiple amplitude values, and may also include the time interval between every two adjacent amplitude values;
  • the multiple amplitude values may be uniformly or non-uniformly distributed in the time dimension, which is not limited here.
  • the preset waveform drawing instructions can be automatically generated according to the current scene, or generated according to user input operations, etc.
  • In some embodiments, before acquiring the target waveform summary information corresponding to the corresponding frame information in the preset frame sequence, the method further includes: decoding the audio resource corresponding to the preset frame sequence; for the decoded frame data of each audio frame, dividing the current decoded frame data into a first preset number of pieces of sub-interval data, determining the interval amplitudes corresponding to the multiple pieces of sub-interval data, and determining the waveform summary information corresponding to the current audio frame according to the multiple interval amplitudes; and storing the waveform summary information corresponding to the multiple audio frames into the preset frame sequence and associating it with the corresponding frame information.
  • the decoded frame data is divided into intervals and the interval amplitude is determined in units of sub-intervals, so that the waveform summary information corresponding to multiple audio frames can be obtained quickly and accurately and stored in the preset frame sequence, which facilitates subsequent waveform diagram drawing.
  • when dividing the intervals, equal intervals may be used, that is, each interval may have the same size, so as to ensure a uniform amplitude distribution across intervals and reflect the variation of the audio signal more accurately.
  • the first preset number can be determined according to the duration of the audio frame and the preset amplitude interval. The preset amplitude interval indicates that an amplitude is calculated every preset duration; for example, if an amplitude is calculated every 20 ms and the audio frame duration is 40 ms, the first preset number is 2, and the current decoded frame data is divided into 2 pieces of sub-interval data.
  • the maximum value of the amplitude in the sub-interval data may be determined as the interval amplitude.
  • the interval amplitudes can be collected in the order of the corresponding sub-interval data to form the waveform summary information corresponding to the audio frame, and stored at the position of the frame information corresponding to the audio frame, or added to the frame information, so as to establish an association with the corresponding frame information.
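  • the sub-interval division described above can be sketched as follows (a minimal illustration; `buildWaveSummary` and its signature are assumptions, not the SDK's actual API):

```javascript
// Split a frame's decoded samples into equal sub-intervals and take the
// peak absolute amplitude of each sub-interval as one summary value.
function buildWaveSummary(decodedSamples, subIntervalCount) {
  const size = Math.ceil(decodedSamples.length / subIntervalCount);
  const summary = [];
  for (let i = 0; i < subIntervalCount; i++) {
    let peak = 0;
    const start = i * size;
    const end = Math.min(start + size, decodedSamples.length);
    for (let j = start; j < end; j++) {
      peak = Math.max(peak, Math.abs(decodedSamples[j]));
    }
    summary.push(peak);
  }
  return summary;
}
```

  • for a 40 ms frame with a 20 ms preset amplitude interval, `subIntervalCount` would be 2, matching the example above.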
  • a target duration (which may be the above-mentioned target decoding duration) can be set, and the audio resource is decoded in batches with the target duration as a unit, that is, the waveform summary information of the audio frames in each batch is determined batch by batch.
  • the decoded data can be deleted to reduce the occupation of storage resources.
  • a waveform sketch is drawn according to the subsequence amplitudes respectively corresponding to the multiple subsequences.
  • the second preset number can be set according to actual needs, that is, the number of amplitudes to be output. For example, to output the amplitude values of the entire preset frame sequence at preset-value equal divisions, the second preset number can be equal to that preset value.
  • the partially decoding the current subsequence and determining the subsequence amplitude corresponding to the current subsequence according to the decoding result includes: dividing the current subsequence into a third preset number of decoding units; for each decoding unit, obtaining the data to be decoded according to the start frame identifier corresponding to the current decoding unit and the preset number of decoding frames, and after decoding the data to be decoded, determining the maximum amplitude of the obtained decoded data as the unit amplitude of the current decoding unit; and determining the maximum unit amplitude among the multiple unit amplitudes as the subsequence amplitude corresponding to the current subsequence.
  • partial decoding is performed within each decoding unit, so that the partially decoded data is distributed more evenly and reflects the overall variation of the audio signal more accurately.
  • the maximum and minimum numbers of audio frames contained in a single decoding unit can be preset; the number of audio frames per decoding unit is estimated according to the total number of frames in the current subsequence and clamped between the minimum and maximum, and the third preset number is then determined according to the total frame count and that per-unit frame count.
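  • a minimal sketch of sizing the decoding units under this clamping rule (`planDecodingUnits` and its parameter names are illustrative assumptions):

```javascript
// Estimate frames per decoding unit from the subsequence's total frame
// count, clamp it between configured bounds, then derive the unit count
// (the "third preset number" in the text).
function planDecodingUnits(totalFrames, minFramesPerUnit, maxFramesPerUnit, targetUnitCount) {
  let framesPerUnit = Math.round(totalFrames / targetUnitCount);
  framesPerUnit = Math.min(Math.max(framesPerUnit, minFramesPerUnit), maxFramesPerUnit);
  const unitCount = Math.ceil(totalFrames / framesPerUnit);
  return { framesPerUnit, unitCount };
}
```

  • with 1000 frames, bounds of 5 and 20 frames per unit, and a target of 10 units, the raw estimate of 100 frames per unit is clamped to 20, yielding 50 units.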
  • the target decoded data is stored in the playback buffer.
  • the method further includes: determining, according to the data amount of unplayed decoded data in the playback buffer, whether to determine the decoding start frame identifier and the decoding end frame identifier in the preset frame sequence.
  • in audio playback scenarios, for the on-demand audio decoding method, a playback buffer is set to realize a fill-and-play mode, and whether more audio data needs to be decoded is dynamically determined according to the amount of unplayed decoded data remaining in the buffer, so as to ensure smooth playback.
  • the decoding start frame identifier and the decoding end frame identifier are determined in the preset frame sequence.
  • the decoding start frame identifier can be determined according to the decoding end frame identifier recorded after the last decoding was completed; for example, the frame identifier of the frame information following, in the preset frame sequence, the frame information to which that decoding end frame identifier belongs is determined as the current decoding start frame identifier.
  • it is applied to the front end of the webpage, and further includes: synchronizing the preset frame sequence corresponding to this session to the server.
  • the preset frame sequence can be synchronized to the server to ensure that the preset frame sequence is not lost when the webpage is refreshed or the like.
  • other relevant data of this session, such as the resource table, may also be synchronized.
  • the data volume of the preset frame sequence may be relatively large. In this case, the preset frame sequence may be compressed and then synchronized.
  • FIG. 2 is a schematic structural diagram of an audio processing solution provided by an embodiment of the present disclosure.
  • the architecture mainly includes cloud, software development kit (Software Development Kit, SDK) and web container.
  • the audio processing method provided in the embodiments of the present disclosure can be realized through an SDK, where the SDK can be understood as the packaging and interface exposure of the functions realized by the audio processing method. The SDK can include a frame splitter, a decoder, a player, a waveform drawer, a serializer, a compressor, and the like.
  • the framer is responsible for parsing the source file and extracting meta information and frame information; the decoder encapsulates segment-by-segment audio decoding, serving the player and the waveform drawer; the player encapsulates the ability to load, decode, and play audio in real time; the waveform drawer is responsible for drawing the waveform and constructing the per-frame waveform summary; the serializer is responsible for serializing frame information into binary data, and the corresponding deserialization, for persistent storage; the compressor is responsible for compressing and decompressing the serialized data.
  • the web container contains the data that needs to be maintained by the front end of the webpage when applying the SDK, which may include resource tables, frame sequences (that is, preset frame sequences) and waveforms (waveform diagrams).
  • FIG. 3 is a schematic flowchart of another audio processing method provided by an embodiment of the present disclosure.
  • the embodiments of the present disclosure are refined based on multiple exemplary solutions in the foregoing embodiments, which can be understood in conjunction with FIG. 3 .
  • the method includes the steps of:
  • Step 301 Frame the audio resource to obtain the frame information of multiple audio frames in the audio resource and the meta information of the audio resource, store the obtained frame information into the preset frame sequence at the front end of the web page, and store the meta information into the resource table at the front end of the web page.
  • the source file corresponding to the audio resource may be divided into frames by using a framer.
  • the framing processing methods may be different.
  • the format of the audio resource can be analyzed first, and then the corresponding framing method matched, that is, a framer of the corresponding format is used for framing. For example, the estimated file format can first be determined from the file name suffix, and the source file then inspected to determine whether it is actually in the estimated format (that is, whether the file name suffix matches the real format); if so, the framer corresponding to the estimated file format is selected.
  • the set of preset file formats may be traversed, the format matching the source file is determined as the target format, and then the frame splitter corresponding to the target format is selected.
  • the preset file format set may include all audio file formats supported by the embodiments of the present disclosure, such as MP3, MP4, Waveform Audio (WAV), and Advanced Audio Coding (AAC), which is not limited here.
  • the obtained meta information of the audio resource can include the audio format enumeration type (type), total audio file size (size), total audio duration (duration), audio file storage address (url), complete audio file data (data, which generally exists as an alternative to url), audio file sampling rate (sampleRate), and audio file channel count (channelCount), etc.
  • Frame information can include the resource id (uri) associated with the frame, the original order (index, generally starting with 0) of the frame in all frames of the original audio file, and the starting position (offset) of the frame in the original audio file , the size of the frame in the original audio file (size), and the number of sample points stored in each channel of the frame (sampleSize), etc.
  • the preset frame sequence can also include waveform summary information (wave), which is subsequently constructed by the waveform plotter.
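  • for illustration, the meta information and frame information described above might be shaped as follows (the field names follow the text; the concrete values are made up):

```javascript
// Illustrative resource-table entry (meta information of one audio resource).
const resourceMeta = {
  type: 'mp3',                      // audio format enumeration type
  size: 1048576,                    // total audio file size in bytes
  duration: 65.3,                   // total audio duration in seconds
  url: 'https://example.com/a.mp3', // audio file storage address
  sampleRate: 44100,                // audio file sampling rate
  channelCount: 2,                  // audio file channel count
};

// Illustrative frame-info record in the preset frame sequence.
const frameInfo = {
  uri: 'a.mp3',     // resource id the frame is associated with
  index: 0,         // original order of the frame in the source file
  offset: 417,      // starting position of the frame in bytes
  size: 1044,       // size of the frame in the source file in bytes
  sampleSize: 1152, // sample points stored per channel
  wave: null,       // waveform summary, filled in later by the drawer
};
```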
  • the storage space for the wave can be reserved, and will be filled after the waveform plotter obtains the waveform summary information.
  • Step 302 Divide the preset frame sequence into multiple sub-sequences, respectively determine the amplitudes of the sub-sequences corresponding to the multiple sub-sequences, and draw a waveform sketch according to the amplitudes of the sub-sequences corresponding to the multiple sub-sequences.
  • the preset frame sequence is divided into a second preset number of subsequences; for each subsequence, the current subsequence is divided into a third preset number of decoding units; for each decoding unit, the data to be decoded is obtained according to the corresponding start frame identifier and the preset number of decoding frames; the maximum amplitude in the obtained decoded data is determined as the unit amplitude of the current decoding unit, the maximum unit amplitude among the multiple unit amplitudes is determined as the subsequence amplitude corresponding to the current subsequence, and a waveform sketch is drawn according to the subsequence amplitudes respectively corresponding to the multiple subsequences.
  • the waveform drawing is divided into the first drawing and the drawing according to the waveform summary.
  • the two processes of the first drawing and the construction of the waveform summary can be performed in parallel.
  • the audio is partially decoded, and a rough waveform sketch can be drawn quickly.
  • the waveform sketch can be drawn by the waveform drawer.
  • the preset frame sequence (frames), the resource table (resourceMap), and the number of amplitudes to be output (that is, the second preset number, which can be recorded as ampCount) can be input into the waveform drawer, which outputs the amplitude values of the entire preset frame sequence at ampCount equal divisions, that is, ampCount amplitude values (each between 0 and 1), to form a waveform sketch.
  • the start and end frame numbers of the current segment are the sequence numbers of the frame information in the preset frame sequence.
  • (end-begin)/n may be calculated, rounded and adjusted to be between minSegLen and maxSegLen, wherein n may be preset, and the specific value is not limited, for example, it may be 10.
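  • the per-segment decode-frame count described above can be sketched as (the function name is an assumption):

```javascript
// Compute (end - begin) / n, round it, and clamp the result to the
// range [minSegLen, maxSegLen], as described in the text.
function decodeFrameCount(begin, end, n, minSegLen, maxSegLen) {
  const raw = Math.round((end - begin) / n);
  return Math.min(Math.max(raw, minSegLen), maxSegLen);
}
```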
  • for each segment, the decoder is called to decode decodeFrameCount frames of data starting from beginIndex (equivalent to the decoding start frame identifier).
  • the audio resources are sampled and decoded on demand, which greatly reduces the time for the first drawing.
  • the longer the audio, the more obvious the improvement; it has been verified that for 90 minutes of audio in MP3 format, performance improves more than tenfold compared with full decoding.
  • Step 303 Decode the audio resource corresponding to the preset frame sequence, determine the waveform summary information corresponding to multiple audio frames, store the waveform summary information in the preset frame sequence, and establish association with the corresponding frame information.
  • the audio resource corresponding to the preset frame sequence is decoded, and for the decoded frame data of each decoded audio frame, the current decoded frame data is divided into a first preset number of sub-interval data, and the multiple sub-interval data are determined respectively corresponding interval amplitudes, and determine the waveform abstract information corresponding to the current audio frame according to the multiple interval amplitudes, store the waveform abstract information corresponding to the multiple audio frames in the preset frame sequence, and establish associations with the corresponding frame information.
  • a preset frame sequence (frames) and a resource table (resourceMap) can be input into the waveform drawer, and the wave attribute of each audio frame is output through the waveform drawer, that is, the waveform summary information.
  • the format can be Uint8Array, and each amplitude can be a value between 0 and 255.
  • the following parameters can be set: preset amplitude interval (msPerAmp) and target duration (decodeTime, that is, the duration of each decoding).
  • for example, the decoder is called to perform full decoding with decodeTime as the target duration of a single decoding pass.
  • Step 304 Receive a preset audio editing operation, and perform corresponding editing operations on corresponding frame information in the preset frame sequence according to the frame identifier to be adjusted indicated by the preset audio editing operation, so as to realize audio editing.
  • the frame information may be arranged according to the order of the original audio frames in the audio resource.
  • the embodiment of the present disclosure does not need to edit the decoded data; editing can be done quickly by operating on the preset frame sequence to adjust the order of the frame information.
  • Step 305 Determine the target decoding duration and the decoding start frame ID, start traversing in the preset frame sequence with the frame information corresponding to the decoding start frame ID as the starting point, and when any of the preset traversal termination conditions is met, according to The corresponding frame information determines the decoding end frame identifier.
  • the preset traversal termination conditions include: the cumulative duration of the audio frame corresponding to the traversed frame information reaches the target decoding duration, the audio resource identifier in the current frame information is inconsistent with the audio resource identifier in the previous frame information, and the frame in the current frame information The index is not continuous with the frame index in the previous frame information, and the frame index in the current frame information is the last one in the audio resources it belongs to.
  • frame information of audio frames of other audio resources may be interspersed between the frame information of two audio frames of the same audio resource.
  • the above-mentioned preset traversal termination condition is set, and the decoding end frame identifier is dynamically determined.
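  • the traversal that determines the decoding end frame identifier can be sketched as follows (the frame shape, the fixed `frameDuration` parameter, and the `isLastInResource` flag are simplifying assumptions for illustration):

```javascript
// Walk the preset frame sequence from the start frame and stop when any
// termination condition holds: accumulated duration reaches the target,
// the resource id changes, the frame index is not contiguous, or the
// frame is the last one of its resource. Returns the end frame's index
// in the sequence.
function findDecodeEndIndex(frames, startIdx, targetDuration, frameDuration) {
  let elapsed = 0;
  for (let i = startIdx; i < frames.length; i++) {
    const cur = frames[i];
    const prev = i > startIdx ? frames[i - 1] : null;
    if (prev && (cur.uri !== prev.uri || cur.index !== prev.index + 1)) {
      return i - 1; // resource changed or frame index not contiguous
    }
    elapsed += frameDuration;
    if (elapsed >= targetDuration || cur.isLastInResource) {
      return i; // target duration reached or last frame of its resource
    }
  }
  return frames.length - 1;
}
```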
  • the playback buffer is usually empty. At this time, this step can be performed after the frame processing is completed.
  • the target decoding duration can be determined according to the player's settings, and the decoding start frame identifier can be the frame identifier in the first frame information in the preset frame sequence.
  • as the session continues, whether this step needs to be performed can be determined according to the actual situation.
  • FIG. 4 is a schematic diagram of an audio playback control process provided by an embodiment of the present disclosure.
  • a circular playback buffer area can be set, and whether a new segment needs to be decoded is currently determined through a data loading scheduling strategy.
  • the audio playback context (AudioContext) in Figure 4 can contain one or more audio processing nodes, such as a ScriptProcessor, which can process audio data through scripts; the content to be played is controlled by filling the audio processing node with decoded audio data.
  • the volume control node (GainNode) can be used for playback control: the ScriptProcessor is connected to it when playing, and the two are disconnected when pausing.
  • the playback buffer is also called the data buffer, which can be a ring buffer (RingBuffer).
  • the loaded audio data is written into the playback buffer, and during playback audio data is read from it and filled into the ScriptProcessor.
  • the data loading scheduling strategy will constantly or periodically judge whether new data needs to be loaded as the playback progresses. If necessary, it will call the decoder to load new data and write it into the playback buffer.
  • a fill-and-play design can be realized based on the ScriptProcessor, which better fits real-time load-and-play scenarios, ensures playback fluency, and enhances perception and control of playback progress and status.
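  • a minimal ring-buffer sketch of this fill-and-play design (illustrative only; the SDK's actual RingBuffer and data loading scheduling strategy will differ):

```javascript
// Ring buffer holding decoded samples between the decoder and playback.
// `needsMoreData` is the scheduling check: decode a new segment when the
// unplayed amount drops below a threshold.
class RingBuffer {
  constructor(capacity) {
    this.buf = new Float32Array(capacity);
    this.readPos = 0;
    this.writePos = 0;
    this.available = 0; // unplayed decoded samples
  }
  write(samples) {
    for (const s of samples) {
      this.buf[this.writePos] = s;
      this.writePos = (this.writePos + 1) % this.buf.length;
      this.available = Math.min(this.available + 1, this.buf.length);
    }
  }
  read(count) {
    const out = new Float32Array(Math.min(count, this.available));
    for (let i = 0; i < out.length; i++) {
      out[i] = this.buf[this.readPos];
      this.readPos = (this.readPos + 1) % this.buf.length;
      this.available--;
    }
    return out;
  }
  needsMoreData(threshold) {
    return this.available < threshold;
  }
}
```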
  • the preset frame sequence, resource table, decoding start frame identifier, target decoding duration, and decoding sampling rate can be input to the decoder, and the decoder outputs the actually decoded frame identifier, the actual decoding segment duration, and the decoded The sample data of the audio resource and whether it reaches the end of the file of the audio resource, etc.
  • the frame index preceding the frame index in the decoding start frame identifier can be determined first; if that frame index exists, its corresponding frame identifier is taken as the new decoding start frame identifier and frame information traversal starts from it, that is, traversal starts from the frame information of the frame preceding the start frame to be decoded in the original audio.
  • Step 306 Determine the audio resource associated with the audio resource identifier corresponding to the decoding start frame identifier as the target audio resource, determine the data start position according to the first frame offset corresponding to the decoding start frame identifier, and determine the data start position according to the decoding end frame identifier The corresponding second frame offset and frame data amount determine the data end position, determine the target data range according to the data start position and data end position, obtain the target storage information associated with the target audio resource from the resource table, and store The information acquires the audio data within the target data range in the target audio resource, and obtains the segment data to be decoded.
  • the first frame (beginFrame) and last frame (endFrame) that need to be decoded are obtained. Since the preset traversal termination conditions ensure that these two frames and the frames between them belong to the same audio resource and occupy contiguous positions in the source file, a HyperText Transfer Protocol (HTTP) data request can be made.
  • the request address is resourceMap[beginFrame.uri].url, and the requested data range is beginFrame.offset to endFrame.offset + endFrame.size - 1.
  • the segment data to be decoded (AudioClipData) can be obtained.
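  • building the HTTP range request described above can be sketched as (the helper name is an assumption; only request construction is shown, not the actual network call):

```javascript
// Build the URL and Range header for fetching the byte span covering
// beginFrame through endFrame from the resource's storage address.
function buildClipRequest(resourceMap, beginFrame, endFrame) {
  return {
    url: resourceMap[beginFrame.uri].url,
    headers: {
      Range: `bytes=${beginFrame.offset}-${endFrame.offset + endFrame.size - 1}`,
    },
  };
}
```

  • the returned object could then be passed to `fetch(req.url, { headers: req.headers })` to obtain the AudioClipData.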
  • Step 307 Decode the segment data to be decoded to obtain the corresponding target decoded data, and record the decoding completion frame identifier and the decoding duration corresponding to the target decoded data.
  • the audio decoding interface at the front end of the web page (such as BaseAudioContext.decodeAudioData) can be called to perform decoding and obtain decoded audio sample data.
  • the initial decoded data obtained by calling the audio decoding interface is trimmed to remove redundant decoded data to obtain target decoded data.
  • Step 308 Play the target decoded data.
  • the target decoded data obtained after decoding may be pushed to the playback buffer first, and when it needs to be played, it is filled into the audio processing node for playback.
  • Step 309 In response to receiving the preset waveform drawing instruction, obtain, according to the frame identifier to be drawn indicated by the preset waveform drawing instruction, the target waveform summary information corresponding to the corresponding frame information in the preset frame sequence, and draw the corresponding waveform diagram according to the target waveform summary information.
  • the waveform drawer can also be responsible for drawing the waveform diagram according to the waveform summary information.
  • when drawing the waveform diagram according to the waveform summary information, the waveform diagram corresponding to the entire preset frame sequence can be drawn, in which case the frame identifier to be drawn covers all frames; alternatively, the waveform diagram corresponding to part of the frame information in the preset frame sequence can be drawn.
  • the to-be-drawn identifier may include a start to-be-drawn frame identifier and an end to-be-drawn frame identifier.
  • the frame sequence, the resource table and the output amplitude quantity (which may be recorded as the preset amplitude quantity) may be input to the waveform drawer.
  • the preset frame sequence is divided by time into a preset-amplitude-count number of subsequences; for each subsequence, the start frame identifier and end frame identifier corresponding to the current subsequence are determined, the waveform summary information corresponding to all frame information from the frame information to which the start frame identifier belongs to the frame information to which the end frame identifier belongs is traversed, and the maximum amplitude value is determined as the amplitude value corresponding to the current subsequence; amplitude values of the preset amplitude count are thus obtained, yielding the waveform diagram quickly.
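  • drawing from stored summaries can be sketched as follows (`amplitudesFromSummaries` is a hypothetical helper; for simplicity the buckets here are split by frame count rather than by time):

```javascript
// Split the frame sequence into `ampCount` subsequences and take the
// peak of all waveform-summary values inside each one as that
// subsequence's amplitude.
function amplitudesFromSummaries(frames, ampCount) {
  const perBucket = Math.ceil(frames.length / ampCount);
  const amps = [];
  for (let i = 0; i < ampCount; i++) {
    let peak = 0;
    const bucket = frames.slice(i * perBucket, (i + 1) * perBucket);
    for (const f of bucket) {
      for (const v of f.wave || []) peak = Math.max(peak, v);
    }
    amps.push(peak);
  }
  return amps;
}
```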
  • Step 310 Synchronize the preset frame sequence and resource table corresponding to this session with the server.
  • the preset frame sequence and resource table can be synchronized to the cloud for the first time, and synchronization can also continue for the duration of the session.
  • the synchronization process may be real-time, or triggered at a preset time interval, or triggered when a resource table or a preset frame sequence changes, which is not specifically limited.
  • the amount of data in the resource table is generally small and can be stored in JSON format without serialization and compression processing.
  • the preset frame sequence data volume is generally large, and the preset frame sequence can be serialized into a binary format, and then compressed, such as gzip compression, to meet the needs of network transmission.
  • the frame information per hour of audio can occupy only about 1.2 MB of data.
  • a frame field enumeration (FrameField) and a value type (FrameType) for each field can be defined, so that each field name of a frame can be stored in a uint8, and field values can be read and written in a specific format.
  • the waveform summary information can adopt a custom data format: the first byte stores the number of amplitudes, and each subsequent byte stores one amplitude value. Each field in the frame is traversed: the field id is written in uint8 format, then the specific value is written according to the field value type, and the next field is processed in the same way. After all fields are serialized, the total length is written in uint8 format at the beginning of the serialized result.
  • the serialization of multiple frames can splice the serialization results of each frame to obtain the serialization results of the preset frame sequence.
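  • a simplified sketch of this serialization scheme (the field ids and the fixed uint32 value width are assumptions; the real format writes each value according to its field's declared type):

```javascript
// Serialize one frame: each field is a uint8 field id followed by its
// value (little-endian uint32 here for simplicity), with the total body
// length prefixed as a uint8 at the start of the result.
function serializeFrame(fields) { // fields: [{ id, value }]
  const body = [];
  for (const f of fields) {
    body.push(f.id & 0xff);
    for (let shift = 0; shift < 32; shift += 8) {
      body.push((f.value >>> shift) & 0xff);
    }
  }
  return Uint8Array.from([body.length, ...body]);
}
```

  • serializing multiple frames then just concatenates the per-frame results, as the text describes.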
  • the audio processing method provided by the embodiment of the present disclosure performs frame processing on audio resources, and outputs frame sequences and resource tables.
  • when audio decoding is required, on-demand decoding can be realized; frames of different audio resources and different formats can be stored together in the preset frame sequence, and the required audio clips are automatically calculated and loaded based on the input and the characteristics of the frames, making decoding more flexible and improving audio processing efficiency.
  • the first waveform drawing process and the waveform summary construction process run in parallel; first drawing via partial decoding greatly reduces the first-drawing time, and after the waveform summary information is constructed once, the performance of subsequent waveform drawing is greatly improved.
  • the resource table and frame sequence are synchronized to the cloud in a timely manner, and the amount of data transmission is reduced through serialization and compression processing to ensure that session information is not lost.
  • FIG. 5 is a structural block diagram of an audio processing device provided by an embodiment of the present disclosure.
  • the device can be implemented by software and/or hardware, and generally can be integrated into an electronic device, and can perform audio processing by executing an audio processing method. As shown in Figure 5, the device includes:
  • the frame identifier determining module 501 is configured to determine a decoding start frame identifier and a decoding end frame identifier in a preset frame sequence, wherein the preset frame sequence includes frame information of multiple audio frames in at least one audio resource,
  • the frame information includes a frame identifier, the frame identifier includes an audio resource identifier and a frame index, the audio resource identifier is used to indicate the identity of the audio resource to which the corresponding audio frame belongs, and the frame index is used to indicate the corresponding audio frame order among all audio frames of the belonging audio resource;
  • the data to be decoded acquisition module 502 is configured to acquire segment data to be decoded in the audio resource associated with the corresponding audio resource identifier according to the decoding start frame identifier and the decoding end frame identifier;
  • the decoding module 503 is configured to decode the segment data to be decoded to obtain corresponding target decoded data.
  • the audio processing device stores the frame information of multiple audio frames in the audio resource in sequence in advance, and accurately locates the frame information to be decoded according to the decoding start frame identifier and the decoding end frame identifier when decoding is required. Data range, obtain segment data from the corresponding audio resources and decode it, without decoding the entire audio file, realize on-demand decoding, make decoding more flexible, and improve audio processing efficiency.
  • the frame identifier determination module includes: a first determination unit configured to determine a target decoding duration and a decoding start frame identifier; a second determination unit configured to start traversing in the preset frame sequence with the frame information corresponding to the decoding start frame identifier as the starting point, and determine the decoding end frame identifier according to the corresponding frame information when the preset traversal termination condition is satisfied.
  • the preset traversal termination condition includes: the accumulative duration of the audio frame corresponding to the traversed frame information reaches the target decoding duration.
  • the preset traversal termination condition further includes at least one of the following: the audio resource identifier in the current frame information is inconsistent with the audio resource identifier in the previous frame information; the frame index in the current frame information is not continuous with the frame index in the previous frame information; the frame index in the current frame information is the last one in the audio resource to which it belongs.
  • the second determination unit is configured to: when any one of the preset traversal termination conditions is satisfied, determine the decoding end frame identifier according to the corresponding frame information.
  • the frame information also includes a frame offset and a frame data amount.
  • the data to be decoded acquisition module includes: a target audio resource determination unit configured to determine the audio resource associated with the audio resource identifier corresponding to the decoding start frame identifier as a target audio resource; a target data range determination unit configured to Determine the data start position according to the first frame offset corresponding to the decoding start frame identifier, determine the data end position according to the second frame offset and the frame data amount corresponding to the decoding end frame identifier, and determine the data end position according to the data The start position and the end position of the data determine the target data range; the data acquisition unit is configured to acquire audio data within the target data range in the target audio resource, and obtain segment data to be decoded.
  • the second determination unit when the second determination unit performs traversal in the preset frame sequence starting from the frame information corresponding to the decoding start frame identifier, it is configured to: determine the audio frame corresponding to the decoding start frame identifier Format; when the format is a preset format, start traversing in the preset frame sequence starting from the frame information corresponding to the target frame index, where the target frame index is in the decoding start frame identifier The frame index obtained after tracing back the difference of the preset frame index based on the starting frame index; wherein, the data to be decoded acquisition module is set to: according to the target frame identifier corresponding to the target frame index and the end of decoding The frame ID is used to obtain the segment data to be decoded in the audio resource associated with the corresponding audio resource ID.
  • the decoding module is configured to: when the format is a preset format, decode the segment data to be decoded to obtain corresponding initial decoded data; remove redundant decoded data from the initial decoded data to obtain Corresponding target decoded data, wherein the redundant decoded data includes decoded data of an audio frame corresponding to a frame index preceding the start frame index.
  • the device further includes: a recording module, configured to record the decoded end frame identifier and the decoding duration corresponding to the target decoded data after decoding the segment data to be decoded to obtain the corresponding target decoded data .
  • the device is integrated in the web front end and further includes: a frame information acquisition module configured to divide the audio resource into frames before the decoding start frame identifier and the decoding end frame identifier are determined in the preset frame sequence, to obtain frame information of multiple audio frames in the audio resource; and a frame information storage module configured to store the obtained frame information into the preset frame sequence at the web front end.
  • the device is integrated in the web front end and further includes: a meta-information acquisition module configured to acquire meta-information of the audio resource before the decoding start frame identifier and the decoding end frame identifier are determined in the preset frame sequence, where the meta-information includes storage information of the audio resource, and the storage information includes a storage location and/or resource data of the audio resource; and a meta-information storage module configured to store the meta-information into a resource table at the web front end, where the resource table includes the association between the audio resource identifiers involved in the current session and the storage information; correspondingly, the to-be-decoded data acquisition module is configured to: acquire, according to the decoding start frame identifier and the decoding end frame identifier, the target storage information associated with the corresponding audio resource identifier from the resource table, and acquire the segment data to be decoded based on the target storage information.
  • the device further includes: an editing operation receiving module configured to receive a preset audio editing operation; and an audio editing module configured to perform, according to the frame identifier to be adjusted indicated by the preset audio editing operation, a corresponding editing operation on the corresponding frame information in the preset frame sequence to implement audio editing, where the editing operation includes deleting and/or reordering the frame information.
  • the preset frame sequence further includes waveform summary information corresponding to multiple audio frames; the device further includes: a waveform summary acquisition module configured to, in response to receiving a preset waveform drawing instruction, acquire, according to the frame identifier to be drawn indicated by the preset waveform drawing instruction, the target waveform summary information corresponding to the corresponding frame information in the preset frame sequence; and a waveform diagram drawing module configured to draw a corresponding waveform diagram according to the target waveform summary information.
  • the device includes: an audio resource decoding module configured to decode the audio resource corresponding to the preset frame sequence before the target waveform summary information corresponding to the corresponding frame information in the preset frame sequence is acquired;
  • a waveform summary determination module configured to, for the decoded frame data of each decoded audio frame, divide the current decoded frame data into a first preset number of sub-interval data, determine the interval amplitudes respectively corresponding to the multiple sub-interval data, and determine the waveform summary information corresponding to the current audio frame according to the multiple interval amplitudes;
  • a waveform summary storage module configured to store the waveform summary information corresponding to the multiple audio frames into the preset frame sequence and associate it with the corresponding frame information.
  • the device includes: a first division module configured to divide the preset frame sequence into a second preset number of subsequences; a subsequence amplitude determination module configured to, for each subsequence, partially decode the current subsequence and determine the subsequence amplitude corresponding to the current subsequence according to the decoding result; and a waveform sketch drawing module configured to draw a waveform sketch according to the subsequence amplitudes respectively corresponding to the multiple subsequences.
  • the subsequence amplitude determination module includes: a first division unit configured to divide the current subsequence into a third preset number of decoding units; a unit amplitude determination unit configured to, for each decoding unit, obtain data to be decoded according to the start frame identifier corresponding to the current decoding unit and a preset number of decoded frames, and, after decoding the data to be decoded, determine the maximum amplitude in the obtained decoded data as the unit amplitude of the current decoding unit; and a subsequence amplitude determination unit configured to determine the maximum unit amplitude among the multiple unit amplitudes as the subsequence amplitude corresponding to the current subsequence.
  • the target decoded data is to be stored in a play buffer, and the device further includes: a data amount determination module configured to determine, according to the data amount of the unplayed decoded data in the play buffer, whether to determine the decoding start frame identifier and the decoding end frame identifier in the preset frame sequence.
  • the device is applied to the web front end and further includes: a synchronization module configured to synchronize the preset frame sequence corresponding to the current session to the server.
  • FIG. 6 shows a schematic structural diagram of an electronic device 600 suitable for implementing an embodiment of the present disclosure.
  • the electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (PADs), portable multimedia players (PMPs), and vehicle-mounted terminals (e.g., car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 6 is only an example, and should not limit the functions and application scope of the embodiments of the present disclosure.
  • an electronic device 600 may include a processing device (e.g., a central processing unit, a graphics processing unit, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603.
  • various programs and data necessary for the operation of the electronic device 600 are also stored in the RAM 603.
  • the processing device 601, ROM 602, and RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604.
  • the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; output devices 607 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; storage devices 608 including, for example, a magnetic tape and a hard disk; and a communication device 609.
  • the communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While FIG. 6 shows electronic device 600 having various means, it should be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via communication means 609, or from storage means 608, or from ROM 602.
  • when the computer program is executed by the processing device 601, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two.
  • a computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device: determines the decoding start frame identifier and the decoding end frame identifier in the preset frame sequence, wherein, the preset frame sequence includes frame information of multiple audio frames in at least one audio resource, the frame information includes a frame identifier, the frame identifier includes an audio resource identifier and a frame index, and the audio resource identifier It is used to indicate the identity of the audio resource to which the corresponding audio frame belongs, and the frame index is used to indicate the order of the corresponding audio frame in all audio frames of the audio resource; according to the decoding start frame identifier and the decoding end frame ID, to obtain segment data to be decoded in the audio resource associated with the corresponding audio resource ID; decode the segment data to be decoded to obtain corresponding target decoded data.
  • Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the modules involved in the embodiments described in the present disclosure may be implemented by software or by hardware. Wherein, the name of the module does not constitute a limitation on the module itself under certain circumstances.
  • the decoding module can also be described as "a module that decodes the segment data to be decoded to obtain corresponding target decoded data".
  • For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • an audio processing method including:
  • the preset frame sequence includes frame information of multiple audio frames in at least one audio resource
  • the frame information includes a frame identifier
  • the frame identifier includes an audio resource identifier and a frame index
  • the audio resource identifier is used to indicate the identity of the audio resource to which the corresponding audio frame belongs
  • the frame index is used to indicate the order of the corresponding audio frame among all audio frames of the audio resource to which it belongs
  • the determination of the decoding start frame identifier and the decoding end frame identifier in the preset frame sequence includes:
  • the frame information corresponding to the decoding start frame identifier is used as the starting point of the traversal, and when the preset traversal termination condition is satisfied, the decoding end frame identifier is determined according to the corresponding frame information;
  • the preset traversal termination conditions include:
  • the cumulative duration of the audio frames corresponding to the traversed frame information reaches the target decoding duration.
  • the preset traversal termination condition further includes at least one of the following:
  • the audio resource identifier in the current frame information is inconsistent with the audio resource identifier in the previous frame information
  • the frame index in the current frame information is not consecutive with the frame index in the previous frame information
  • the frame index in the current frame information is the last one in the audio resource to which it belongs;
  • determining the decoding end frame identifier according to the corresponding frame information includes:
  • the decoding end frame identifier is determined according to the corresponding frame information.
  • the frame information further includes a frame offset and a frame data amount; acquiring the segment data to be decoded in the audio resource associated with the corresponding audio resource identifier according to the decoding start frame identifier and the decoding end frame identifier includes:
  • the traversal starting from the frame information corresponding to the decoding start frame identifier in the preset frame sequence includes:
  • the traversal starts from the frame information corresponding to the target frame index in the preset frame sequence, where the target frame index is the frame index obtained by tracing back a preset frame index difference from the start frame index in the decoding start frame identifier;
  • obtaining segment data to be decoded in the audio resource associated with the corresponding audio resource identifier includes:
  • the segment data to be decoded in the audio resource associated with the corresponding audio resource identifier is acquired.
  • the decoding of the segment data to be decoded to obtain corresponding target decoded data includes:
  • the redundant decoded data is removed from the initial decoded data to obtain corresponding target decoded data, wherein the redundant decoded data includes decoded data of an audio frame corresponding to a frame index preceding the start frame index.
  • after decoding the segment data to be decoded to obtain the corresponding target decoded data, the method further includes:
  • before determining the decoding start frame identifier and the decoding end frame identifier in the preset frame sequence, the method further includes:
  • Framing the audio resource to obtain frame information of multiple audio frames in the audio resource
  • the obtained frame information is stored in the preset frame sequence at the front end of the webpage.
  • before determining the decoding start frame identifier and the decoding end frame identifier in the preset frame sequence, the method further includes:
  • the meta-information includes storage information of the audio resource, and the storage information includes a storage location and/or resource data of the audio resource;
  • the resource table includes the association relationship between the audio resource identifiers involved in this session and the stored information
  • the acquisition of segment data to be decoded in the audio resource associated with the corresponding audio resource identifier according to the decoding start frame identifier and the decoding end frame identifier includes:
  • the target storage information associated with the corresponding audio resource identifier is obtained from the resource table, and the segment data to be decoded is obtained based on the target storage information.
  • the method also includes:
  • according to the frame identifier to be adjusted indicated by the preset audio editing operation, a corresponding editing operation is performed on the corresponding frame information in the preset frame sequence to implement audio editing, where the editing operation includes deleting and/or reordering the frame information.
  • the preset frame sequence also includes waveform summary information corresponding to multiple audio frames; the method further includes:
  • before acquiring the target waveform summary information corresponding to the corresponding frame information in the preset frame sequence, the method further includes:
  • the waveform summary information corresponding to multiple audio frames is stored in the preset frame sequence and associated with the corresponding frame information.
  • the method also includes:
  • the partially decoding the current subsequence, and determining the subsequence amplitude corresponding to the current subsequence according to the decoding result includes:
  • the data to be decoded is obtained according to the start frame identifier corresponding to the current decoding unit and the number of preset decoding frames, and after decoding the data to be decoded, the maximum amplitude in the obtained decoded data is determined as the The unit amplitude of the current decoding unit;
  • the target decoded data is to be stored in a play buffer, and the method further includes:
  • when applied to the web front end, the method further includes: synchronizing the preset frame sequence corresponding to the current session to the server.
  • an audio processing device including:
  • a frame identifier determination module configured to determine a decoding start frame identifier and a decoding end frame identifier in a preset frame sequence, where the preset frame sequence contains frame information of multiple audio frames in at least one audio resource, the frame information includes a frame identifier, the frame identifier includes an audio resource identifier and a frame index, the audio resource identifier is used to indicate the identity of the audio resource to which the corresponding audio frame belongs, and the frame index is used to indicate the order of the corresponding audio frame among all audio frames of that audio resource;
  • the data-to-be-decoded acquisition module is configured to acquire segment data to be decoded in the audio resource associated with the corresponding audio resource identifier according to the decoding start frame identifier and the decoding end frame identifier;
  • the decoding module is configured to decode the segment data to be decoded to obtain corresponding target decoded data.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

An audio processing method, apparatus, device, and storage medium. The method includes: determining a decoding start frame identifier and a decoding end frame identifier in a preset frame sequence, where the preset frame sequence contains frame information of multiple audio frames in at least one audio resource, the frame information contains a frame identifier, the frame identifier includes an audio resource identifier and a frame index, and the frame index indicates the order of the audio frame among all audio frames of the audio resource (101); acquiring, according to the decoding start frame identifier and the decoding end frame identifier, segment data to be decoded in the audio resource associated with the corresponding audio resource identifier (102); and decoding the segment data to be decoded to obtain corresponding target decoded data (103).

Description

Audio processing method, apparatus, device, and storage medium
This application claims priority to Chinese patent application No. 202111654061.6, filed with the China National Intellectual Property Administration on December 30, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present disclosure relate to the technical field of audio processing, and for example to an audio processing method, apparatus, device, and storage medium.
Background
With the development of audio technology, more and more application scenarios involve processing audio, such as playing or editing it. In these scenarios, the audio files to be processed may come from a variety of sources, but processing usually requires decoding the audio file.
At present, when an audio file is large or long, the decoding process is complex and time-consuming, which degrades audio processing performance. Taking application scenarios involving the web front end as an example, when processing an audio file, the web front end usually decodes the file in full. When the file is large or long, decoding can easily consume a large amount of memory and crash the browser, and operating on a large amount of memory severely affects machine performance. Meanwhile, full decoding is time-consuming, making it difficult to guarantee the timeliness of audio processing.
Summary
Embodiments of the present disclosure provide an audio processing method, apparatus, storage medium, and device, which can improve audio processing schemes in the related art.
In a first aspect, an embodiment of the present disclosure provides an audio processing method, including:
determining a decoding start frame identifier and a decoding end frame identifier in a preset frame sequence, where the preset frame sequence contains frame information of multiple audio frames in at least one audio resource, the frame information contains a frame identifier, the frame identifier includes an audio resource identifier and a frame index, the audio resource identifier indicates the identity of the audio resource to which the corresponding audio frame belongs, and the frame index indicates the order of the corresponding audio frame among all audio frames of that audio resource;
acquiring, according to the decoding start frame identifier and the decoding end frame identifier, segment data to be decoded in the audio resource associated with the corresponding audio resource identifier; and
decoding the segment data to be decoded to obtain corresponding target decoded data.
In a second aspect, an embodiment of the present disclosure provides an audio processing apparatus, including:
a frame identifier determination module configured to determine a decoding start frame identifier and a decoding end frame identifier in a preset frame sequence, where the preset frame sequence contains frame information of multiple audio frames in at least one audio resource, the frame information contains a frame identifier, the frame identifier includes an audio resource identifier and a frame index, the audio resource identifier indicates the identity of the audio resource to which the corresponding audio frame belongs, and the frame index indicates the order of the corresponding audio frame among all audio frames of that audio resource;
a to-be-decoded data acquisition module configured to acquire, according to the decoding start frame identifier and the decoding end frame identifier, segment data to be decoded in the audio resource associated with the corresponding audio resource identifier; and
a decoding module configured to decode the segment data to be decoded to obtain corresponding target decoded data.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the audio processing method provided by the embodiments of the present disclosure.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the audio processing method provided by the embodiments of the present disclosure.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic architecture diagram of an audio processing scheme provided by an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of another audio processing method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an audio playback control process provided by an embodiment of the present disclosure;
FIG. 5 is a structural block diagram of an audio processing apparatus provided by an embodiment of the present disclosure;
FIG. 6 is a structural block diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
It should be understood that the multiple steps described in the method embodiments of the present disclosure may be executed in a different order and/or in parallel. In addition, the method embodiments may include additional steps and/or omit some of the illustrated steps. The scope of the present disclosure is not limited in this respect.
As used herein, the term "include" and its variants are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units, and are not intended to limit the order or interdependence of the functions performed by these apparatuses, modules, or units.
It should be noted that the modifiers "one" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive. Those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.
In the following embodiments, each embodiment provides both example features and example implementations. The multiple features described in the embodiments may be combined to form multiple example solutions, and each numbered embodiment should not be regarded as only a single technical solution.
FIG. 1 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method may be executed by an audio processing apparatus and is applicable to application scenarios in which audio is decoded. The apparatus may be implemented by software and/or hardware and may generally be integrated in an electronic device. The electronic device may be a mobile device such as a mobile phone, a smart watch, a tablet computer, or a personal digital assistant, or another device such as a desktop computer. As shown in FIG. 1, the method includes:
Step 101: determine a decoding start frame identifier and a decoding end frame identifier in a preset frame sequence, where the preset frame sequence contains frame information of multiple audio frames in at least one audio resource, the frame information contains a frame identifier, the frame identifier includes an audio resource identifier and a frame index, the audio resource identifier indicates the identity of the audio resource to which the corresponding audio frame belongs, and the frame index indicates the order of the corresponding audio frame among all audio frames of that audio resource.
In the embodiments of the present disclosure, an audio resource may be understood as an original audio file, and its source is not limited: it may be an audio file stored locally on the electronic device, an audio file stored on a server (e.g., in the cloud), or an audio file from another source. An audio resource stored on the server may be an audio file uploaded by a user, or an audio file obtained by converting (e.g., format conversion) a file uploaded by a user. An audio resource is associated with an audio resource identifier, which indicates the identity of the audio resource and may be recorded as a resource identification (ID).
Generally, an audio file consists of a series of encoded audio frames. An audio frame may be understood as the smallest unit of an independently decodable audio segment; the frame structure of audio frames may differ between audio file formats. Based on acoustic principles, the duration of each frame is generally between 20 ms (milliseconds) and 50 ms. An audio frame may maintain information related to it, such as the resource ID associated with the frame (i.e., the audio resource identifier of the audio resource to which the frame belongs), the order of the frame among all audio frames of that resource, the position of the frame within the resource, the data size of the frame, and the meta-information of the audio resource to which the frame belongs.
In the embodiments of the present disclosure, before this step, the audio resource may be divided into frames to obtain frame information of multiple audio frames in the audio resource, and the preset frame sequence may be constructed from the frame information. Frame division may be understood as determining the corresponding frame information for each audio frame in the audio resource. For example, the required information may be obtained in advance from the information maintained by multiple audio frames in one or more audio resources, and the frame information corresponding to the multiple audio frames may be obtained through direct extraction and/or secondary computation. The frame information may contain a frame identifier and may also include other information, without limitation. The frame index indicates the order of the corresponding audio frame among all audio frames of the audio resource to which it belongs; for example, the frame index of the first audio frame in an audio resource may be recorded as 0, the frame index of the second as 1, and so on. After the frame information of the multiple audio frames is obtained, the frame information may be arranged in a preset order to obtain the preset frame sequence; that is, the objects in the preset frame sequence are ordered in units of frame information. The preset order may be set according to actual needs, without limitation, and may also be dynamically adjusted during use. For example, the preset order may sort by audio resource identifier, i.e., frame information associated with the same audio resource identifier is arranged together; frame information associated with the same audio resource identifier may be ordered by frame index, i.e., consistent with the original order of the audio frames in the resource, or in another order, e.g., other frame information may be interleaved between frame information with adjacent frame indices (for example, frame information with frame index 3 may lie between frame information with frame index 1 and frame information with frame index 2). As another example, the preset order may interleave frame information corresponding to different audio resources, e.g., frame information with resource ID 2 may lie between two pieces of frame information with resource ID 1.
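As an illustration only (not the patent's actual implementation), the frame-information records and the preset frame sequence described above can be sketched in Python; the field names (`resource_id`, `frame_index`, `offset`, `size`, `duration_ms`) are assumptions chosen for readability:

```python
from dataclasses import dataclass

@dataclass
class FrameInfo:
    resource_id: str    # identity of the audio resource the frame belongs to
    frame_index: int    # order of the frame among all frames of that resource
    offset: int         # start position of the frame within the resource, in bytes
    size: int           # frame data amount, in bytes
    duration_ms: float  # frame duration, typically 20-50 ms

def build_frame_sequence(resources: dict) -> list:
    """Build a preset frame sequence: frames of the same resource are grouped
    together and ordered by frame index (one of the orders the text allows).
    `resources` maps resource_id -> list of per-frame dicts."""
    sequence = []
    for rid, frames in resources.items():
        for i, f in enumerate(frames):
            sequence.append(FrameInfo(rid, i, f["offset"], f["size"], f["duration_ms"]))
    return sequence
```

Other orderings permitted by the text (e.g., interleaving frames of different resources) would only change the final sort, not the record shape.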
In the embodiments of the present disclosure, the decoding start frame identifier may be understood as the frame identifier in the frame information corresponding to the first audio frame to be decoded this time, and the decoding end frame identifier as the frame identifier in the frame information corresponding to the last audio frame to be decoded this time. For example, the decoding start frame identifier and the decoding end frame identifier may be determined in the preset frame sequence when a preset decoding event is detected to have been triggered. The trigger condition of the preset decoding event is not limited and may be set according to the actual decoding requirement, which may include, for example, a playback requirement, a decoded-data buffering requirement, a speech-to-text requirement, an audio waveform drawing requirement, and so on. The decoding requirement may be determined automatically from the current usage scenario, or determined from an operation input by the user. The preset decoding event may indicate requirement parameters of the current decoding requirement, which may include, for example, the decoding start frame identifier, the decoding end frame identifier, or a target decoding duration. For example, the decoding start frame identifier and the decoding end frame identifier are determined in the preset frame sequence according to the requirement parameters. For example, when the requirement parameters include the decoding start frame identifier and the decoding end frame identifier, both can be looked up directly in the preset frame sequence. As another example, when the requirement parameters include the decoding start frame identifier and a target decoding duration, the decoding start frame identifier may first be located in the preset frame sequence; then, starting from the audio frame corresponding to the decoding start frame identifier, the durations of the audio frames corresponding to subsequent frame information in the preset frame sequence are accumulated until the target decoding duration is reached, and the decoding end frame identifier is determined from the frame identifier in the frame information at that point.
Step 102: acquire, according to the decoding start frame identifier and the decoding end frame identifier, segment data to be decoded in the audio resource associated with the corresponding audio resource identifier.
For example, the corresponding audio resource identifier may be understood as the audio resource identifier contained in the decoding start frame identifier and/or the decoding end frame identifier.
For example, suppose the frame information to which the decoding start frame identifier belongs is recorded as the start frame information, and the frame information to which the decoding end frame identifier belongs is recorded as the end frame information. If the audio resource identifiers corresponding to the start frame information, the end frame information, and the frame information between them in the preset frame sequence (which may be recorded as intermediate frame information) are all the same, the audio frames to be decoded come from the same audio resource; the audio frames of the corresponding order may then be obtained from that audio resource according to the frame indices respectively contained in the decoding start frame identifier, the decoding end frame identifier, and the frame identifiers contained in the intermediate frame information (which may be recorded as intermediate frame identifiers), thereby obtaining the segment data to be decoded.
For example, if at least two different audio resource identifiers exist among those corresponding to the start frame information, the end frame information, and the intermediate frame information, the audio frames to be decoded come from at least two audio resources; the audio frames of the corresponding order may be obtained from the audio resources associated with the corresponding audio resource identifiers according to the frame indices contained in the decoding start frame identifier, the decoding end frame identifier, and the intermediate frame identifiers, thereby obtaining the segment data to be decoded.
Step 103: decode the segment data to be decoded to obtain corresponding target decoded data.
For example, after the segment data to be decoded is acquired, it may be decoded by a preset decoding algorithm or by calling a preset decoding interface, and the target decoded data required for this decoding may be determined from the decoding result.
In the audio processing method provided by the embodiments of the present disclosure, a decoding start frame identifier and a decoding end frame identifier are determined in a preset frame sequence, where the preset frame sequence contains frame information of multiple audio frames in at least one audio resource, the frame information contains a frame identifier, the frame identifier includes an audio resource identifier and a frame index, the audio resource identifier indicates the identity of the audio resource to which the corresponding audio frame belongs, and the frame index indicates the order of the corresponding audio frame among all audio frames of that audio resource; segment data to be decoded in the audio resource associated with the corresponding audio resource identifier is acquired according to the decoding start frame identifier and the decoding end frame identifier; and the segment data to be decoded is decoded to obtain corresponding target decoded data. With the above technical solution, the frame information of multiple audio frames in an audio resource is stored in advance in the form of a sequence. When decoding is needed, the data range to be decoded is precisely located according to the decoding start frame identifier and the decoding end frame identifier, and the segment data is obtained from the corresponding audio resource and decoded. There is no need to decode the audio file in full; on-demand decoding is realized, making decoding more flexible and improving audio processing efficiency.
In some embodiments, determining the decoding start frame identifier and the decoding end frame identifier in the preset frame sequence includes: determining a target decoding duration and the decoding start frame identifier; traversing the preset frame sequence starting from the frame information corresponding to the decoding start frame identifier; and, when a preset traversal termination condition is satisfied, determining the decoding end frame identifier according to the corresponding frame information, where the preset traversal termination condition includes: the cumulative duration of the audio frames corresponding to the traversed frame information reaches the target decoding duration. By accumulating, frame by frame, the durations of the audio frames corresponding to the traversed frame information and ending the traversal when the target decoding duration is reached, decoding of audio frame data with a specified start position and a specified duration can be realized according to the decoding start frame identifier and the target decoding duration. The duration of each audio frame is generally related to the sampling rate of the audio resource to which it belongs; the corresponding sampling rate may be obtained according to the audio resource identifier in the currently traversed frame information, and the duration of the audio frame corresponding to the current frame information may then be determined.
For example, if the cumulative duration obtained after accumulating the duration of the audio frame corresponding to the current frame information is greater than or equal to the target decoding duration, the frame identifier in the current frame information may be determined as the decoding end frame identifier.
In some embodiments, the preset traversal termination condition further includes at least one of the following: the audio resource identifier in the current frame information is inconsistent with the audio resource identifier in the previous frame information; the frame index in the current frame information is not consecutive with the frame index in the previous frame information; or the frame index in the current frame information is the last one in the audio resource to which it belongs. Determining the decoding end frame identifier according to the corresponding frame information when the preset traversal termination condition is satisfied includes: determining the decoding end frame identifier according to the corresponding frame information when any one of the preset traversal termination conditions is satisfied. By enriching the items in the preset traversal termination condition, the segment data to be decoded can come from the same audio resource and its audio frames can be consecutive. Terminating the traversal when any one of the conditions is met ensures that the segment data to be decoded is always obtained from a single audio resource each time, reducing the difficulty of obtaining it and improving data acquisition efficiency.
For example, if the audio resource identifier in the current frame information is inconsistent with that in the previous frame information, the frame identifier in the previous frame information may be determined as the decoding end frame identifier; if the frame index in the current frame information is not consecutive with that in the previous frame information, the frame identifier in the previous frame information may be determined as the decoding end frame identifier; and if the frame index in the current frame information is the last one in the audio resource to which it belongs, the frame identifier in the current frame information may be determined as the decoding end frame identifier.
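The traversal just described can be sketched as follows; this is a minimal illustration, assuming frame information is represented as dicts with `resource_id`, `frame_index`, and `duration_ms` keys, and that a caller-supplied `is_last` predicate tells whether a frame is the last one of its resource:

```python
def find_end_frame(sequence, start_pos, target_ms, is_last):
    """Traverse `sequence` from index `start_pos` and return the position of
    the frame whose identifier becomes the decoding end frame identifier.
    Terminates on whichever condition is met first: resource change or
    non-consecutive frame index (end at the previous frame), cumulative
    duration reaching `target_ms`, or the last frame of the resource."""
    total = 0.0
    prev = None
    for pos in range(start_pos, len(sequence)):
        cur = sequence[pos]
        if prev is not None:
            # resource changed, or frame indices not consecutive -> previous frame ends the range
            if (cur["resource_id"] != prev["resource_id"]
                    or cur["frame_index"] != prev["frame_index"] + 1):
                return pos - 1
        total += cur["duration_ms"]
        # cumulative duration reached, or last frame of its resource
        if total >= target_ms or is_last(cur):
            return pos
        prev = cur
    return len(sequence) - 1
```

Because every termination condition ends the range before (or at) a resource boundary, the resulting [start, end] span always lies within a single audio resource, as the text notes.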
In some embodiments, the frame information further includes a frame offset and a frame data amount, and acquiring the segment data to be decoded in the audio resource associated with the corresponding audio resource identifier according to the decoding start frame identifier and the decoding end frame identifier includes: determining the audio resource associated with the audio resource identifier corresponding to the decoding start frame identifier as a target audio resource; determining a data start position according to a first frame offset corresponding to the decoding start frame identifier; determining a data end position according to a second frame offset and a frame data amount corresponding to the decoding end frame identifier; determining a target data range according to the data start position and the data end position; and acquiring the audio data within the target data range in the target audio resource to obtain the segment data to be decoded. In this way, the segment data to be decoded can be obtained more quickly and accurately.
For example, the frame offset may be understood as the start position of an audio frame within the audio resource to which it belongs, and its unit may be bytes. The frame data amount may be understood as the size of the audio frame within the audio resource to which it belongs; its unit is generally the same as that of the frame offset and may be bytes. The frame offset corresponding to a frame identifier may be understood as the frame offset contained in the frame information in which the frame identifier resides, i.e., the corresponding frame identifier and frame offset are in the same frame information; the same applies to the frame data amount. When the preset traversal termination condition includes all four of the above items, the decoding start frame identifier and the decoding end frame identifier are guaranteed to correspond to the same audio resource, so the corresponding audio resource identifier may be determined from either of the two, and the associated audio resource may then be determined as the target audio resource. The start position of the data to be acquired in the target audio resource may be determined from the first frame offset, and the end position may be determined from the second frame offset and the frame data amount (for example, the end position may be expressed as second frame offset + frame data amount - 1), thereby obtaining the target data range, according to which the corresponding audio data may be extracted from the target audio resource.
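The byte-range computation above reduces to simple arithmetic; a sketch under the assumption that offsets and sizes are expressed in bytes and the range is inclusive:

```python
def target_data_range(first_offset: int, second_offset: int, second_size: int):
    """Return the inclusive (start, end) byte range to fetch from the target
    audio resource: start is the offset of the decoding start frame; end is
    the last byte of the decoding end frame (offset + size - 1)."""
    return first_offset, second_offset + second_size - 1

def slice_resource(resource: bytes, first_offset: int, second_offset: int, second_size: int) -> bytes:
    """Extract the segment data to be decoded from the raw resource bytes."""
    start, end = target_data_range(first_offset, second_offset, second_size)
    return resource[start:end + 1]  # Python slices exclude the end, hence +1
```

In a web front end the same (start, end) pair could equally drive an HTTP `Range` request against the resource's URL rather than a slice of in-memory bytes.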
In some embodiments, traversing the preset frame sequence starting from the frame information corresponding to the decoding start frame identifier includes: determining the format of the audio frame corresponding to the decoding start frame identifier; and, when the format is a preset format, traversing the preset frame sequence starting from the frame information corresponding to a target frame index, where the target frame index is the frame index obtained by tracing back a preset frame index difference from the start frame index in the decoding start frame identifier. Acquiring the segment data to be decoded in the audio resource associated with the corresponding audio resource identifier according to the decoding start frame identifier and the decoding end frame identifier then includes: acquiring the segment data to be decoded in the audio resource associated with the corresponding audio resource identifier according to the target frame identifier corresponding to the target frame index and the decoding end frame identifier. For audio resources of some formats, audio frames may not be completely independent of each other; a certain number of audio frames (which may be called leading frames) may be traced back and added to the segment data to be decoded to ensure the completeness and accuracy of the decoded data. The preset frame index difference may be set according to the preset format. For example, the preset format may include the Moving Picture Experts Group Audio Layer III (MP3) format, for which the corresponding preset frame index difference may be 1. It should be noted that in some special cases, for example when the start frame index in the decoding start frame identifier is 0, the first audio frame of the target audio resource is to be decoded, and the decoding start frame identifier may then be regarded as the target frame identifier.
In some embodiments, when the format is a preset format, decoding the segment data to be decoded to obtain corresponding target decoded data includes: decoding the segment data to be decoded to obtain corresponding initial decoded data; and removing redundant decoded data from the initial decoded data to obtain corresponding target decoded data, where the redundant decoded data includes the decoded data of the audio frame corresponding to the frame index preceding the start frame index. For the preset format, the segment data to be decoded determined in the above steps contains the data of the extra leading frame, so the initial decoded data obtained by decoding also contains the decoded data of the leading frame. To avoid repeated use of decoded data, such as repeated playback, the decoded data of the leading frame may be removed.
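The leading-frame handling for formats like MP3 can be sketched as follows. This is a simplified illustration, assuming a preset frame index difference of 1 and a caller-supplied `decode_frame` stub standing in for the real codec; `frames` mapping index to encoded bytes is an assumption for clarity:

```python
def decode_with_leading_frame(frames, start_index, end_index, decode_frame, lead=1):
    """Decode frames [start_index - lead, end_index] so the decoder has the
    inter-frame context it needs, then drop the decoded output of the leading
    frames so they are not played or used twice."""
    actual_start = max(start_index - lead, 0)  # frame 0 has no leading frame
    initial = [decode_frame(frames[i]) for i in range(actual_start, end_index + 1)]
    redundant = start_index - actual_start     # number of leading frames decoded
    return initial[redundant:]                 # target decoded data
```

For MP3, the cross-frame dependency comes from the bit-reservoir mechanism, which is why a previous frame's data can be needed to decode the nominal start frame correctly.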
In some embodiments, after decoding the segment data to be decoded to obtain corresponding target decoded data, the method further includes: recording the decoding end frame identifier and the decoding duration corresponding to the target decoded data. With the preset traversal termination conditions set as above, the actual decoding duration may not equal the target decoding duration; promptly recording the current decoding position and the actual decoding duration facilitates continuing decoding from this point later.
In application scenarios involving the web front end, the web front end usually decodes an audio file in full when processing it. When the audio file is large or long (e.g., tens of minutes or even more than one hour), the decoding process can easily consume a large amount of memory and crash the browser, and operating on a large amount of memory severely affects machine performance. Meanwhile, full decoding is time-consuming, making it difficult to guarantee the timeliness of audio processing. The audio processing scheme in the embodiments of the present disclosure is applicable to web front-end application scenarios.
在一些实施例中,该方法可应用于网页前端,在所述在预设帧序列中确定解码起始帧标识和解码结束帧标识之前,还包括:对音频资源进行分帧处理,得到所述音频资源中多个音频帧的帧信息;将所得帧信息存入网页前端的预设帧序列中。在网页前端维护预设帧序列,不需要存储全量解码数据,减少对内存的大量占用,提升浏览器以及设备的性能。其中,进行分帧处理的音频资源可以包括本次会话中涉及的全部或部分音频资源。
在一些实施例中,在所述在预设帧序列中确定解码起始帧标识和解码结束帧标识之前,还包括:获取音频资源的元信息,其中,所述元信息中包括音频资源的存储信息,所述存储信息包括音频资源的存储位置和/或资源数据;将所述元信息存入网页前端的资源表中,其中,所述资源表中包括本次会话涉及的音频资源标识与所述存储信息的关联关系。相应的,所述根据所述解码起始帧标识和所述解码结束帧标识,获取对应的音频资源标识所关联的音频资源中的待解码片段数据,包括:根据所述解码起始帧标识和所述解码结束帧标识,从所述资源表中获取对应的音频资源标识所关联的目标存储信息,并基于所述目标存储信息获取待解码片段数据。这样可以在前端通过存储资源表的形式存储音频资源对应的存储信息,方便通过资源表快速获取待解码片段数据。
示例性的,元信息可以包括音频资源的全局信息,可包括音频资源的存储信息。其中,存储信息包括音频资源的存储位置和/或资源数据,存储位置可以包括统一资源定位符(Uniform Resource Locator,URL)地址或本地存储路径等,资源数据可以理解为音频资源的完整数据。一般地,为了节省存储资源,存储位置和资源数据可以择一存在。此外,元信息中还可以包括如音频资源的格式(可以是枚举类型)、音频资源的文件总大小(单位可以为字节)、音频资源的总时长(单位可以是秒)、音频资源的采样率(单位可以是赫兹)、音频文件声道数、以及其他信息(如自定义信息)等。
在一些实施例中,还可包括:接收预设音频编辑操作;根据所述预设音频编辑操作指示的待调整帧标识,对所述预设帧序列中相应的帧信息进行相应的编辑操作,以实现音频编辑,其中,所述编辑操作包括对帧信息进行删除和/或顺序调整。通过对帧序列中的帧信息进行编辑,实现音频帧粒度的音频编辑,无需操作原始资源数据,可以大大提升音频编辑效率和精准度。
示例性的,预设音频编辑操作可包括插入、删除以及排序等,不同的预设音频编辑操作所指示的待调整帧标识的数量可能不同,当待调整帧标识的数量为多个时,所包含的音频资源标识可以相同,也可以不同。例如,对于插入来说,待调整帧标识可以包括待插入的音频帧的帧标识(可记为第一帧标识,数量可以是一个或多个),还可包括用于表示插入位置的音频帧的帧标识(可记为第二帧标识),例如将第一帧标识对应的帧信息插入到第二帧标识对应的帧信息之后。又如,对于删除来说,待调整帧标识可以包括待删除的音频帧的帧标识。再如,对于排序来说,待调整帧标识可以包括待排序的音频帧的多个帧标识(可记为第三帧标识),预设音频编辑操作还可指示目标排序,按照目标排序对多个第三帧标识对应的帧信息进行重新排序,以实现更精准的音频编辑。
在一些实施例中,所述预设帧序列中还包括多个音频帧对应的波形摘要信息;所述方法还包括:响应于接收到预设波形绘制指令,根据所述预设波形绘制指令指示的待绘制帧标识,获取所述预设帧序列中相应的帧信息对应的目标波形摘要信息;根据所述目标波形摘要信息绘制相应的波形图。在预设帧序列中存储多个音频帧对应的波形摘要信息,在需要进行波形图绘制时,不需要对音频数据进行解码,直接根据待绘制的音频帧的波形摘要信息绘制波形图,可有效提高波形图的绘制效率。
示例性的,波形摘要信息中可以包括多个振幅值,还可包括每两个相邻振幅值之间的时间间隔,多个振幅值在时间维度可以均匀分布,也可非均匀分布,具体不做限定。预设波形绘制指令可以根据当前场景自动生成,也可以根据用户输入的操作生成。
在一些实施例中,在所述获取所述预设帧序列中相应的帧信息对应的目标波形摘要信息之前,还包括:对所述预设帧序列对应的音频资源进行解码;针对解码后的每个音频帧的解码帧数据,将当前解码帧数据划分为第一预设数量的子区间数据,确定多个子区间数据分别对应的区间振幅,并根据多个区间振幅确定当前音频帧对应的波形摘要信息;将多个音频帧对应的波形摘要信息存入所述预设帧序列中,并与相应的帧信息建立关联。对解码后的帧数据进行区间划分,以子区间为单位确定区间振幅,进而可以快速准确地得到多个音频帧对应的波形摘要信息并在预设帧序列中进行存储,方便后续的波形图绘制。
示例性的,在进行区间划分时,可以按照等间隔方式进行划分,也即每个区间的大小可以一致,保证区间振幅分布的均匀性,能够更准确地反映音频信号的变化规律。例如,可以根据音频帧的时长和预设振幅间隔来确定第一预设数量,如预设振幅间隔可以用于表示每隔预设时长计算一次振幅,例如每20ms计算一次振幅,音频帧时长为40ms,第一预设数量可以是2,则将当前解码帧数据划分为2个子区间数据。在确定子区间数据对应的区间振幅时,可将子区间数据中的振幅最大值确定为区间振幅。在得到一个音频帧对应的所有区间振幅后,可以对区间振幅按照对应的子区间数据的顺序进行汇总,形成该音频帧对应的波形摘要信息,存入与该音频帧对应的帧信息的位置,或加入该帧信息中,从而建立与相应的帧信息的关联。
示例性的,在对预设帧序列对应的音频资源进行解码时,可以设定目标时长(可以是上述目标解码时长),以目标时长为单位对音频资源进行分批解码,也即分批次确定本批次内音频帧的波形摘要信息,在单次解码完成并已确定波形摘要信息后,可以删除解码数据,减少对存储资源的占用。
在一些实施例中,还包括:将所述预设帧序列划分为第二预设数量的子序列;针对每个子序列,对当前子序列进行部分解码,并根据解码结果确定所述当前子序列对应的子序列振幅;根据多个子序列分别对应的子序列振幅绘制波形草图。通过部分解码的方式,可以快速地有选择性地获取到部分振幅信息,从而及时获取到音频信号的整体变化规律。
示例性的,在进行子序列划分时,可以按照等间隔方式进行划分,也即每个子序列的大小可以一致,保证子序列振幅分布的均匀性,能够更准确地反映音频信号的整体变化规律。第二预设数量可以根据实际需求设置,也即需要输出的振幅数量,例如,想要输出整个预设帧序列在预设数值等分处的振幅值,则第二预设数量可以等于该预设数值。
在一些实施例中,所述对当前子序列进行部分解码,并根据解码结果确定所述当前子序列对应的子序列振幅,包括:将当前子序列划分为第三预设数量的解码单元;针对每个解码单元,根据当前解码单元对应的起始帧索引和预设解码帧数获取待解码数据,对所述待解码数据进行解码后,将得到的解码数据中的最大振幅确定为所述当前解码单元的单元振幅;将多个单元振幅中的最大单元振幅确定为所述当前子序列对应的子序列振幅。在对子序列进行部分解码时,进行划分,以解码单元为单位进行解码单元内部的部分解码,使得被部分解码的数据分布更加均匀,准确地反映音频信号的整体变化规律。
示例性的,可以预先设置单个解码单元中所包含音频帧的最大数量和最小数量,根据当前子序列的总帧数预估单个解码单元中包含的音频帧数,使得该音频帧数处于最大数量和最小数量之间,再根据总帧数与该音频帧数来确定第三预设数量。
在一些实施例中,所述目标解码数据用于存入播放缓冲区,所述方法还包括:根据所述播放缓冲区中的未播解码数据的数据量确定是否在预设帧序列中确定解码起始帧标识和解码结束帧标识。对于音频播放场景,针对按需解码的音频解码方式,设置播放缓冲区,可以实现填充式播放模式,根据缓冲区中剩余的未播解码数据的数据量来动态决定是否需要进行更多音频数据的解码,保证播放的流畅性。
示例性的,若未播解码数据的数据量小于预设数据量阈值,则在预设帧序列中确定解码起始帧标识和解码结束帧标识。其中,解码起始帧标识可以根据上次解码完成后记录的解码结束帧标识来确定,例如将预设帧序列中的该解码结束帧标识所属的帧信息的下一个帧信息的帧标识确定为当前的解码起始帧标识。
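该填充式调度逻辑可示意如下(数据量阈值与帧序列结构均为假设的简化,返回值为新的解码起始帧在预设帧序列中的位置):

```python
def schedule_decode(unplayed_bytes, threshold_bytes, last_end_pos, frames):
    """播放缓冲区的数据加载调度示意。

    缓冲区中未播解码数据不足时,以上次记录的解码结束帧的
    下一个帧信息作为新的解码起始帧;无需解码或帧序列已
    解码完毕时返回 None。
    """
    if unplayed_bytes >= threshold_bytes:
        return None  # 缓冲区数据充足,暂不触发新的解码
    next_pos = last_end_pos + 1
    if next_pos >= len(frames):
        return None  # 预设帧序列已全部解码
    return next_pos
```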
在一些实施例中,应用于网页前端,还包括:向服务端同步本次会话对应的预设帧序列。这样可以将预设帧序列同步至服务端,保证在网页刷新等情况下,预设帧序列不丢失。
示例性的,还可以向服务端同步本次会话的其他相关数据,例如资源表等。对于需要向服务端同步的数据,可以根据数据量确定是否需要进行压缩。一般的,当音频资源的时长较长或音频资源数量较多等,预设帧序列的数据量可能比较大,此时可以对预设帧序列进行压缩处理后再进行同步。
下面以网页前端应用场景为例,对本公开实施例进行说明。图2为本公开实施例提供的一种音频处理方案的架构示意图。该架构中主要包括云端、软件开发工具包(Software Development Kit,SDK)和Web容器。本公开实施例中提供的音频处理方法可以通过SDK实现,SDK可以理解为音频处理方法所实现功能的打包封装和接口暴露,SDK中可以包括分帧器、解码器、播放器、波形绘制器、序列化器和压缩器等。其中,分帧器负责分析源文件,提取元信息和帧信息;解码器封装按片段解码音频的功能,服务于播放器和波形绘制器;播放器封装音频实时加载、解码以及播放的能力;波形绘制器负责绘制波形、以及构建帧的波形摘要;序列化器负责将帧信息序列化为二进制数据以及相应的反序列化,便于持久化存储;压缩器负责对序列化后的数据进行压缩和解压缩;Web容器中包含网页前端在应用SDK时需要维护的数据,可包括资源表、帧序列(也即预设帧序列)以及波形(波形图)。
图3为本公开实施例提供的另一种音频处理方法的流程示意图,本公开实施例以上述实施例中多个示例方案为基础进行细化,可结合图3进行理解。
例如,该方法包括如下步骤:
步骤301、对音频资源进行分帧处理,得到音频资源中多个音频帧的帧信息以及音频资源的元信息,将所得帧信息存入网页前端的预设帧序列中,将元信息存入网页前端的资源表中。
示例性的,可利用分帧器对音频资源对应的源文件进行分帧处理。对于不同格式的音频文件,分帧处理方式可能不同,在分帧处理之前,可先对音频资源的格式进行分析,进而匹配相应的分帧方式,也即利用相应格式的分帧器进行分帧处理。例如,可以先根据文件名后缀确定预估文件格式,对源文件进行检测,判断源文件是否为预估文件格式(也即判断文件名后缀是否与真实格式匹配),若源文件为预估文件格式,则选择预估文件格式对应的分帧器。若不为预估文件格式,则可遍历预设文件格式集合,将与源文件匹配的格式确定为目标格式,进而选择目标格式对应的分帧器。预设文件格式集合可包括本公开实施例能够支持的所有音频文件格式,例如可包括MP3、MP4、波形音频(Windows Wave,WAV)以及高级音频编码(Advanced Audio Coding,AAC)等,具体不做限定。
经过分帧处理后,得到的音频资源的元信息可包括音频格式枚举类型(type)、音频文件总大小(size)、音频总时长(duration)、音频文件存放地址(url)、音频文件完整数据(data,一般与url不同时存在)、音频文件采样率(sampleRate)以及音频文件声道数(channelCount)等。帧信息可包括该帧关联的资源id(uri)、该帧在原始音频文件所有帧中的原始次序(index,一般以0起始)、该帧在原始音频文件中的起始位置(offset)、该帧在原始音频文件中的大小(size)、以及该帧每声道存储的采样点个数(sampleSize)等。根据上述帧信息构建预设帧序列,帧标识包括uri和index。预设帧序列中还可包括波形摘要信息(wave),后续由波形绘制器构建,在构建预设帧序列时,可以预留wave的存储空间,待波形绘制器得到波形摘要信息后进行填充。
步骤302、将预设帧序列划分为多个子序列,分别确定多个子序列对应的子序列振幅,根据多个子序列分别对应的子序列振幅绘制波形草图。
例如,将预设帧序列划分为第二预设数量的子序列,针对每个子序列,将当前子序列划分为第三预设数量的解码单元,针对每个解码单元,根据当前解码单元对应的起始帧标识和预设解码帧数获取待解码数据,对待解码数据进行解码后,将得到的解码数据中的最大振幅确定为当前解码单元的单元振幅,将多个单元振幅中的最大单元振幅确定为当前子序列对应的子序列振幅,根据多个子序列分别对应的子序列振幅绘制波形草图。
示例性的,波形绘制分为首次绘制和根据波形摘要绘制,分帧完成后,首次绘制和构建波形摘要两个流程可并行进行。在首次绘制中,对音频进行部分解码,可以快速绘制出粗略的波形草图,本步骤中,可以由波形绘制器进行波形草图的绘制。
例如,可向波形绘制器中输入预设帧序列(frames)、资源表(resourceMap)以及待输出的振幅数量(也即第二预设数量,可记为ampCount),通过波形绘制器输出整个预设帧序列在ampCount等分处的振幅值,也即输出ampCount个振幅值(每个振幅值范围可以在0到1之间),构成波形草图。
示例性的,可以设定如下参数:每个解码单元最小帧数(如minSegLen=6),每个解码单元最大帧数(如maxSegLen=60)以及每次解码的帧数(也即预设解码帧数,如decodeFrameCount=3)。
对于当前的预设帧序列,计算如果将其划分为ampCount个区间,每个区间的平均帧数avgRangeLen=frames.length/ampCount,其中,frames.length表示预设帧序列中的帧信息的个数。如果avgRangeLen小于minSegLen,说明ampCount过高导致需要解码的数据量过大,以至近似于全量解码,则可终止首次绘制流程,待波形摘要构建完成后根据摘要绘制波形图。若avgRangeLen不小于minSegLen,则可将frames按时间等分为ampCount个片段(子序列),针对每个片段执行下述操作:
a、记当前片段起止帧序号(也即在预设帧序列中的帧信息的序号)为begin到end;
b、根据当前片段长度end-begin计算解码单元长度segLen:
例如,可以计算(end-begin)/n,取整并调整到minSegLen到maxSegLen之间,其中,n可以预先设置,具体取值不做限定,例如可以是10。
c、根据segLen计算当前片段包含解码单元个数segCount(第三预设数量);
d、对于当前片段的每个解码单元,执行下列操作:
记当前解码单元起始帧位为beginIndex(相当于解码起始帧标识),调用解码器,从beginIndex开始解码decodeFrameCount帧数据,在解码结果中找到最大振幅作为当前解码单元的单元振幅;
e、得到当前片段中每个解码单元对应的单元振幅后,以最大的单元振幅作为当前片段的片段振幅(子序列振幅)。
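上述步骤a至e可整理为如下示意代码(decode_unit 为假设的解码回调,输入起始帧位与解码帧数,返回解码得到的振幅序列;minSegLen、maxSegLen、decodeFrameCount 等参数取值仅沿用上文示例):

```python
def clamp(x, lo, hi):
    return max(lo, min(x, hi))

def sketch_amplitudes(frames, amp_count, decode_unit, min_seg=6, max_seg=60, n=10):
    """首次绘制波形草图的示意实现,返回 amp_count 个片段振幅。

    平均区间帧数小于 min_seg 时返回 None,表示终止首次绘制,
    待波形摘要构建完成后再根据摘要绘制。
    """
    if len(frames) / amp_count < min_seg:
        return None  # 近似于全量解码,放弃首次绘制
    per = len(frames) // amp_count
    amps = []
    for k in range(amp_count):
        begin, end = k * per, (k + 1) * per
        # 解码单元长度:(end-begin)/n 取整并钳制到 [min_seg, max_seg]
        seg_len = clamp((end - begin) // n, min_seg, max_seg)
        seg_count = max((end - begin) // seg_len, 1)
        unit_amps = []
        for s in range(seg_count):
            begin_index = begin + s * seg_len
            # 每个解码单元只解码 decodeFrameCount(此处为 3)帧
            unit_amps.append(max(decode_unit(begin_index, 3)))
        amps.append(max(unit_amps))  # 片段振幅取各单元振幅的最大值
    return amps
```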
在首次绘制波形时,对音频资源进行按需采样解码,大大降低了首次绘制时间,音频越长,提升效果越明显,经验证,对于90分钟的MP3格式的音频,性能相较于全量解码提升超过10倍。
步骤303、对预设帧序列对应的音频资源进行解码,确定多个音频帧对应的波形摘要信息,将波形摘要信息存入预设帧序列中,并与相应的帧信息建立关联。
例如,对预设帧序列对应的音频资源进行解码,针对解码后的每个音频帧的解码帧数据,将当前解码帧数据划分为第一预设数量的子区间数据,确定多个子区间数据分别对应的区间振幅,并根据多个区间振幅确定当前音频帧对应的波形摘要信息,将多个音频帧对应的波形摘要信息存入所述预设帧序列中,并与相应的帧信息建立关联。
示例性的,构建波形摘要流程中,可向波形绘制器中输入预设帧序列(frames)和资源表(resourceMap),通过波形绘制器输出每个音频帧的wave属性,也即波形摘要信息,格式可以为Uint8Array,每个振幅可以为0至255之间的数值。可以设定如下参数:预设振幅间隔(msPerAmp)和目标时长(decodeTime,也即每次解码的时长)。
例如,调用解码器,以decodeTime为单次解码目标时长进行全量解码,每次解码过程中,记录本次解码的帧范围beginIndex(相当于解码起始帧标识)至endIndex(相当于解码结束帧标识)以及解码后的数据Data,遍历其中包含的每个帧,对每一帧执行下列操作:
a、根据该帧起止时间计算出其所对应数据在Data中的范围:frameBeginSampleIndex到frameEndSampleIndex,并将其从Data中切出,记为frameData(解码帧数据);
b、根据该帧时长和msPerAmp参数计算出该帧需要产生的振幅值个数Count(第一预设数量),也即该帧时长/msPerAmp;
c、将frameData等分为Count个区间(子区间数据),针对每个区间,找到振幅最大值作为该区间的振幅(区间振幅),最终得到一个包含Count个振幅值的Uint8Array,记为波形摘要信息,将波形摘要信息作为wave属性添加到预设帧序列中的帧信息处。
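单个音频帧的波形摘要构建(步骤b、c)可示意如下(输入假设为该帧解码后、归一化到0~1的采样振幅序列):

```python
def frame_wave_summary(frame_data, frame_ms, ms_per_amp=20):
    """构建单个音频帧的波形摘要。

    按该帧时长与预设振幅间隔计算振幅个数 Count,
    将解码帧数据等分为 Count 个子区间,每个子区间取振幅最大值,
    并量化为 0~255 的整数,对应 Uint8Array 形式的 wave 属性。
    """
    count = max(int(frame_ms // ms_per_amp), 1)
    size = len(frame_data) // count
    summary = []
    for i in range(count):
        chunk = frame_data[i * size:(i + 1) * size]
        summary.append(round(max(chunk) * 255))  # 区间振幅量化为 uint8
    return bytes(summary)
```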
通过一次性建立波形摘要并保存在预设帧序列中,使得后续波形绘制无需解码操作,性能非常高。
步骤304、接收预设音频编辑操作,根据预设音频编辑操作指示的待调整帧标识,对预设帧序列中相应的帧信息进行相应的编辑操作,以实现音频编辑。
示例性的,在首次构建预设帧序列时,可以按照音频资源中原始的音频帧的顺序对帧信息进行排列。在使用过程中,可能存在多种编辑需求,例如,想要将音频1中的一些音频帧插入至音频2中的某两个音频帧之间,此时,本公开实施例不需要对解码数据进行操作,通过对预设帧序列进行操作以调整帧信息的顺序即可快速完成编辑。
步骤305、确定目标解码时长以及解码起始帧标识,在预设帧序列中以解码起始帧标识对应的帧信息为起点开始遍历,在满足预设遍历终止条件中的任一项时,根据对应的帧信息确定解码结束帧标识。
其中,预设遍历终止条件包括:已遍历帧信息对应的音频帧的累计时长达到目标解码时长、当前帧信息中的音频资源标识与上一帧信息的音频资源标识不一致、当前帧信息中的帧索引与上一帧信息中的帧索引不连续、以及当前帧信息中的帧索引为所属音频资源中的最后一个。
示例性的,在初始的预设帧序列基础上经过编辑后,可能出现存在于同一音频资源的两个音频帧的帧信息之间穿插有其他音频资源的音频帧的帧信息的情况,此时为了保证参与解码的待解码数据来源于相同的音频资源且在音频资源中连续,设定上述预设遍历终止条件,动态确定解码结束帧标识。
例如,可根据播放缓冲区中的未播解码数据的数据量确定是否需要确定目标解码时长以及解码起始帧标识。网页首次打开时,播放缓冲区内通常为空,此时可以在分帧处理结束后执行本步骤,此时,目标解码时长可以根据播放器的设置确定,解码起始帧标识可以是预设帧序列中首个帧信息中的帧标识。在会话持续过程中,可以根据实际情况确定是否需要执行本步骤。
图4为本公开实施例提供的一种音频播放控制过程示意图,如图4所示,可设置环形的播放缓冲区,通过数据加载调度策略确定当前是否需要解码新片段。图4中音频播放上下文(AudioContext)中可包含一个或多个音频处理节点,如ScriptProcessor,可通过脚本处理音频数据,通过对该音频处理节点进行解码后的音频数据填充来控制要播放的内容。音量控制节点(GainNode)可以用于播放控制,播放时将ScriptProcessor与其连接,暂停时将两者断开。播放缓冲区又称数据缓冲区,可以是环形缓冲区(RingBuffer),加载的音频数据会写入播放缓冲区,播放时从中读取音频数据填充至ScriptProcessor。数据加载调度策略会随着播放进度的进行,不断地或定时地判断是否需要加载新的数据,如果需要,会调用解码器进行新数据的加载并写入播放缓冲区。本公开实施例中,可以基于ScriptProcessor实现填充式播放设计,使其更好地贴合实时加载播放的场景,保证播放的流畅性,加强了对播放进度和状态的感知和控制。
示例性的,可以向解码器输入预设帧序列、资源表、解码起始帧标识、目标解码时长以及解码采样率,通过解码器输出实际解码到的帧标识、实际解码片段的时长、解码后的采样数据以及是否到达音频资源的文件末尾等。
示例性的,若解码起始帧标识对应的音频帧的帧类型为MP3,可以先确定解码起始帧标识中帧索引的上一个帧索引,若该帧索引存在,则将其对应的帧标识作为新的解码起始帧标识,并开始帧信息的遍历,也即从需要解码的起始帧在原始音频中的前一帧的帧信息开始遍历。
步骤306、将解码起始帧标识对应的音频资源标识所关联的音频资源确定为目标音频资源,根据解码起始帧标识对应的第一帧偏移量确定数据起始位置,根据解码结束帧标识对应的第二帧偏移量和帧数据量确定数据终止位置,根据数据起始位置和数据终止位置确定目标数据范围,从资源表中获取目标音频资源所关联的目标存储信息,并基于目标存储信息获取目标音频资源中的目标数据范围内的音频数据,得到待解码片段数据。
示例性的,在遍历结束后,得到需要解码的第一帧(beginFrame)和最后一帧(endFrame),由于预设遍历终止条件可以保证这两帧以及中间帧从属于同一音频资源且在源文件中位置连续,可以进行超文本传输协议(HyperText Transfer Protocol,HTTP)数据请求,请求地址为resourceMap[beginFrame.uri].url,请求数据范围:beginFrame.offset~endFrame.offset+endFrame.size-1。数据请求成功后,可以得到待解码片段数据(AudioClipData)。
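该数据请求的构造可示意如下(资源表结构与示例URL均为假设,Range 请求头按上文的字节范围拼装):

```python
def clip_request(resource_map, begin_frame, end_frame):
    """构造获取待解码片段数据的 HTTP 请求参数。

    resource_map 对应资源表:uri -> 存储信息(此处假设含 url 字段)。
    返回请求地址与携带字节范围的请求头。
    """
    url = resource_map[begin_frame["uri"]]["url"]
    start = begin_frame["offset"]
    # 请求数据范围:beginFrame.offset ~ endFrame.offset + endFrame.size - 1
    stop = end_frame["offset"] + end_frame["size"] - 1
    headers = {"Range": f"bytes={start}-{stop}"}
    return url, headers
```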
步骤307、对待解码片段数据进行解码,得到对应的目标解码数据,并记录解码结束帧标识和目标解码数据对应的解码时长。
示例性的,在得到AudioClipData后,可以调用网页前端的音频解码接口(如BaseAudioContext.decodeAudioData)进行解码,得到解码后的音频采样数据。
例如,对于上述音频帧为MP3格式的情况,对通过调用音频解码接口得到的初始解码数据进行裁剪,去除冗余解码数据,得到目标解码数据。
步骤308、对目标解码数据进行播放。
示例性的,如上文所述,解码后得到的目标解码数据可以先被推送至播放缓冲区,待需要播放时,填充至音频处理节点进行播放。
步骤309、响应于接收到预设波形绘制指令,根据预设波形绘制指令指示的待绘制帧标识,获取预设帧序列中相应的帧信息对应的目标波形摘要信息,根据目标波形摘要信息绘制相应的波形图。
示例性的,波形绘制器还可负责根据波形摘要信息绘制波形图。在根据波形摘要信息绘制波形图时,可以绘制整个预设帧序列对应的波形图,此时待绘制帧标识可以为全部,也可以绘制预设帧序列中部分帧信息对应的波形图,此时待绘制帧标识可以包括起始待绘制帧标识和结束待绘制帧标识。
示例性的,以绘制整个预设帧序列对应的波形图为例,可以向波形绘制器输入帧序列、资源表以及输出的振幅数量(可记为预设振幅数量)。将预设帧序列按时间等分为预设振幅数量个子序列,针对每个子序列,确定当前子序列对应的开始帧标识和终止帧标识,遍历开始帧标识所属帧信息到终止帧标识所属帧信息之间的所有帧信息对应的波形摘要信息,将最大的振幅值确定为当前子序列对应的振幅值,得到预设振幅数量的振幅值,从而快速得到波形图。
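根据波形摘要绘制波形图的流程可示意如下(为简化按帧数而非时间等分,waves 为各帧波形摘要对应的字节串列表,均为假设的输入形式):

```python
def draw_from_summaries(waves, amp_count):
    """根据波形摘要信息计算波形图的各振幅值,无需解码。

    将帧序列等分为 amp_count 个子序列,
    每个子序列取其范围内所有波形摘要中的最大振幅值。
    """
    per = len(waves) // amp_count
    result = []
    for k in range(amp_count):
        segment = waves[k * per:(k + 1) * per]
        # 子序列振幅 = 范围内所有摘要振幅的最大值
        result.append(max(max(w) for w in segment))
    return result
```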
步骤310、向服务端同步本次会话对应的预设帧序列和资源表。
示例性的,在首次得到预设帧序列和资源表后,可以向云端同步,在会话持续过程中,也可继续同步。需要说明的是,该同步过程可以是实时的,也可以是每隔预设时间间隔触发的,还可以是资源表或预设帧序列发生变化时触发的,具体不做限定。
示例性的,资源表的数据量一般较少,可以以JSON格式存储,可以不进行序列化以及压缩处理。预设帧序列数据量一般较大,可以将预设帧序列序列化为二进制格式,再进行压缩,如gzip压缩,以满足网络传输的需要,一般可以达到每小时帧信息仅占约1.2M数据量的效果。
示例性的,可定义帧字段枚举(FrameField)以及每个字段的值类型(FrameType),这样帧的每个字段名可以用一个uint8存储,字段值以特定的格式进行读写。波形摘要信息可以采用自定义数据格式,其结构为第一个字节存储振幅个数,后面每个字节存储每个振幅的振幅值。遍历帧中每个字段,以uint8的格式写入字段id,再根据字段值类型写入具体值,之后以相同的方式处理下一个字段。所有字段序列化完毕后,在序列化结果开头以uint8的格式写入总长度。多个帧的序列化可以将每帧的序列化结果进行拼接,得到预设帧序列的序列化结果。
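单帧序列化的格式可示意如下(字段枚举及其取值为假设的简化,真实实现中字段值类型由 FrameType 决定,此处统一按 uint32 小端写入):

```python
import struct

# 帧字段枚举与字段 id 的对应关系,仅为示意性假设
FIELDS = {"index": 0, "offset": 1, "size": 2}

def serialize_frame(frame):
    """将单个帧信息序列化为二进制。

    每个字段先以 uint8 写入字段 id,再写入字段值;
    所有字段写完后,在序列化结果开头以 uint8 写入总长度。
    """
    body = b""
    for name, fid in FIELDS.items():
        body += struct.pack("<BI", fid, frame[name])
    return struct.pack("<B", len(body)) + body
```

多个帧的序列化结果按上文所述直接拼接即可;序列化后的二进制数据再交由压缩器做 gzip 压缩。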
本公开实施例提供的音频处理方法,对音频资源进行分帧处理,并输出帧序列和资源表,在需要对音频进行解码时,可以实现按需解码,且可以支持不同音频资源以及不同格式帧在预设帧序列中进行混合存储,基于输入和帧的特性自动计算并加载所需要的音频片段,使得解码更灵活,提高音频处理效率。采用并行的首次绘制波形流程和波形摘要构建流程,通过部分解码方式进行首次绘制可以大大降低首次绘制时间,一次性构建波形摘要信息后,可以大大提升后续波形图绘制的性能。并且,及时将资源表和帧序列同步至云端,通过序列化以及压缩处理减少数据传输量,保证会话信息不丢失。
图5为本公开实施例提供的一种音频处理装置的结构框图,该装置可由软件和/或硬件实现,一般可集成在电子设备中,可通过执行音频处理方法来进行音频处理。如图5所示,该装置包括:
帧标识确定模块501,设置为在预设帧序列中确定解码起始帧标识和解码结束帧标识,其中,所述预设帧序列中包含至少一个音频资源中的多个音频帧的帧信息,所述帧信息中包含帧标识,所述帧标识包括音频资源标识和帧索引,所述音频资源标识用于表示对应的音频帧所属音频资源的身份,所述帧索引用于表示对应的音频帧在所属音频资源的所有音频帧中的次序;
待解码数据获取模块502,设置为根据所述解码起始帧标识和所述解码结束帧标识,获取对应的音频资源标识所关联的音频资源中的待解码片段数据;
解码模块503,设置为对所述待解码片段数据进行解码,得到对应的目标解码数据。
本公开实施例中提供的音频处理装置,预先以序列形式存储音频资源中多个音频帧的帧信息,在需要解码时,根据解码起始帧标识和解码结束帧标识精准地定位所需解码的数据范围,从相应的音频资源中获取片段数据并进行解码,无需全量解码音频文件,实现按需解码,使得解码更灵活,提高音频处理效率。
例如,帧标识确定模块包括:第一确定单元,设置为确定目标解码时长以及解码起始帧标识;第二确定单元,设置为在预设帧序列中以所述解码起始帧标识对应的帧信息为起点开始遍历,在满足预设遍历终止条件时,根据对应的帧信息确定解码结束帧标识。其中,所述预设遍历终止条件包括:已遍历帧信息对应的音频帧的累计时长达到目标解码时长。
例如,所述预设遍历终止条件还包括以下至少一项:当前帧信息中的音频资源标识与上一帧信息中的音频资源标识不一致;当前帧信息中的帧索引与上一帧信息中的帧索引不连续;当前帧信息中的帧索引为所属音频资源中的最后一个。其中,所述第二确定单元设置为:在满足所述预设遍历终止条件中的任一项时,根据对应的帧信息确定解码结束帧标识。
例如,所述帧信息中还包括帧偏移量和帧数据量。所述待解码数据获取模块,包括:目标音频资源确定单元,设置为将所述解码起始帧标识对应的音频资源标识所关联的音频资源确定为目标音频资源;目标数据范围确定单元,设置为根据所述解码起始帧标识对应的第一帧偏移量确定数据起始位置,根据所述解码结束帧标识对应的第二帧偏移量和帧数据量确定数据终止位置,根据所述数据起始位置和所述数据终止位置确定目标数据范围;数据获取单元,设置为获取所述目标音频资源中的所述目标数据范围内的音频数据,得到待解码片段数据。
例如,所述第二确定单元在执行在预设帧序列中以所述解码起始帧标识对应的帧信息为起点开始遍历时,设置为:确定所述解码起始帧标识对应的音频帧的格式;在所述格式为预设格式的情况下,在预设帧序列中以目标帧索引对应的帧信息为起点开始遍历,其中,所述目标帧索引为在所述解码起始帧标识中的起始帧索引基础上向前追溯预设帧索引差值后得到的帧索引;其中,所述待解码数据获取模块设置为:根据所述目标帧索引对应的目标帧标识和所述解码结束帧标识,获取对应的音频资源标识所关联的音频资源 中的待解码片段数据。
例如,解码模块设置为:在所述格式为预设格式的情况下,对所述待解码片段数据进行解码,得到对应的初始解码数据;在所述初始解码数据中去除冗余解码数据,得到对应的目标解码数据,其中,所述冗余解码数据包括所述起始帧索引之前的帧索引对应的音频帧的解码数据。
例如,该装置还包括:记录模块,设置为在所述对所述待解码片段数据进行解码,得到对应的目标解码数据之后,记录所述解码结束帧标识和所述目标解码数据对应的解码时长。
例如,该装置集成于网页前端,还包括:帧信息获取模块,设置为在所述在预设帧序列中确定解码起始帧标识和解码结束帧标识之前,对音频资源进行分帧处理,得到所述音频资源中多个音频帧的帧信息;帧信息存入模块,设置为将所得帧信息存入网页前端的预设帧序列中。
例如,该装置集成于网页前端,还包括:元信息获取模块,设置为在所述在预设帧序列中确定解码起始帧标识和解码结束帧标识之前,获取音频资源的元信息,其中,所述元信息中包括音频资源的存储信息,所述存储信息包括音频资源的存储位置和/或资源数据;元信息存入模块,设置为将所述元信息存入网页前端的资源表中,其中,所述资源表中包括本次会话涉及的音频资源标识与所述存储信息的关联关系;相应的,所述待解码数据获取模块设置为:根据所述解码起始帧标识和所述解码结束帧标识,从所述资源表中获取对应的音频资源标识所关联的目标存储信息,并基于目标存储信息获取待解码片段数据。
例如,该装置还包括:编辑操作接收模块,设置为接收预设音频编辑操作;音频编辑模块,设置为根据所述预设音频编辑操作指示的待调整帧标识,对所述预设帧序列中相应的帧信息进行相应的编辑操作,以实现音频编辑,其中,所述编辑操作包括对帧信息进行删除和/或顺序调整。
例如,所述预设帧序列中还包括多个音频帧对应的波形摘要信息;该装置还包括:波形摘要获取模块,设置为响应于接收到预设波形绘制指令,根据所述预设波形绘制指令指示的待绘制帧标识,获取所述预设帧序列中相应的帧信息对应的目标波形摘要信息;波形图绘制模块,设置为根据所述目标波形摘要信息绘制相应的波形图。
例如,该装置包括:音频资源解码模块,设置为在所述获取所述预设帧序列中相应的帧信息对应的目标波形摘要信息之前,对所述预设帧序列对应的音频资源进行解码;波形摘要确定模块,设置为针对解码后的每个音频帧的解码帧数据,将当前解码帧数据划分为第一预设数量的子区间数据,确定多个子区间数据分别对应的区间振幅,并根据多个区间振幅确定当前音频帧对应的波形摘要信息;波形摘要存入模块,设置为将多个音频帧对应的波形摘要信息存入所述预设帧序列中,并与相应的帧信息建立关联。
例如,该装置包括:第一划分模块,设置为将所述预设帧序列划分为第二预设数量的子序列;子序列振幅确定模块,设置为针对每个子序列,对当前子序列进行部分解码,并根据解码结果确定所述当前子序列对应的子序列振幅;波形草图绘制模块,设置为根据多个子序列分别对应的子序列振幅绘制波形草图。
例如,所述子序列振幅确定模块包括:第一划分单元,设置为将当前子序列划分为第三预设数量的解码单元;单元振幅确定单元,设置为针对每个解码单元,根据当前解码单元对应的起始帧标识和预设解码帧数获取待解码数据,对所述待解码数据进行解码后,将得到的解码数据中的最大振幅确定为所述当前解码单元的单元振幅;子序列振幅确定单元,设置为将多个单元振幅中的最大单元振幅确定为所述当前子序列对应的子序列振幅。
例如,所述目标解码数据用于存入播放缓冲区,所述装置还包括:数据量判定模块,设置为根据所述播放缓冲区中的未播解码数据的数据量确定是否在预设帧序列中确定解码起始帧标识和解码结束帧标识。
例如,该装置应用于网页前端,还包括:同步模块,设置为向服务端同步本次会话对应的预设帧序列。
下面参考图6,其示出了适于用来实现本公开实施例的电子设备600的结构示意图。本公开实施例中的电子设备可以包括但不限于诸如移动电话、笔记本电脑、数字广播接收器、PDA(个人数字助理)、PAD(平板电脑)、PMP(便携式多媒体播放器)、车载终端(例如车载导航终端)等等的移动终端以及诸如数字TV、台式计算机等等的固定终端。图6示出的电子设备仅仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。
如图6所示,电子设备600可以包括处理装置(例如中央处理器、图形处理器等)601,其可以根据存储在只读存储器(ROM)602中的程序或者从存储装置608加载到随机访问存储器(RAM)603中的程序而执行多种适当的动作和处理。在RAM 603中,还存储有电子设备600操作所需的多种程序和数据。处理装置601、ROM 602以及RAM 603通过总线604彼此相连。输入/输出(I/O)接口605也连接至总线604。
通常,以下装置可以连接至I/O接口605:包括例如触摸屏、触摸板、键盘、鼠标、摄像头、麦克风、加速度计、陀螺仪等的输入装置606;包括例如液晶显示器(LCD)、扬声器、振动器等的输出装置607;包括例如磁带、硬盘等的存储装置608;以及通信装置609。通信装置609可以允许电子设备600与其他设备进行无线或有线通信以交换数据。虽然图6示出了具有多种装置的电子设备600,但是应理解的是,并不要求实施或具备所有示出的装置。可以替代地实施或具备更多或更少的装置。
根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在非暂态计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信装置609从网络上被下载和安装,或者从存储装置608被安装,或者从ROM 602被安装。在该计算机程序被处理装置601执行时,执行本公开实施例的方法中限定的上述功能。
需要说明的是,本公开上述的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本公开中,计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读信号介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:电线、光缆、RF(射频)等等,或者上述的任意合适的组合。
上述计算机可读介质可以是上述电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。
上述计算机可读介质承载有一个或者多个程序,当上述一个或者多个程序被该电子设备执行时,使得该电子设备:在预设帧序列中确定解码起始帧标识和解码结束帧标识,其中,所述预设帧序列中包含至少一个音频资源中的多个音频帧的帧信息,所述帧信息中包含帧标识,所述帧标识包括音频资源标识和帧索引,所述音频资源标识用于表示对应的音频帧所属音频资源的身份,所述帧索引用于表示对应的音频帧在所属音频资源的所有音频帧中的次序;根据所述解码起始帧标识和所述解码结束帧标识,获取对应的音频资源标识所关联的音频资源中的待解码片段数据;对所述待解码片段数据进行解码,得到对应的目标解码数据。
可以以一种或多种程序设计语言或其组合来编写用于执行本公开的操作的计算机程序代码,上述程序设计语言包括但不限于面向对象的程序设计语言—诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络——包括局域网(LAN)或广域网(WAN)—连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
附图中的流程图和框图,图示了按照本公开多种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,该模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本公开实施例中所涉及到的模块可以通过软件的方式实现,也可以通过硬件的方式来实现。其中,模块的名称在某种情况下并不构成对该模块本身的限定,例如,解码模块还可以被描述为“对所述待解码片段数据进行解码,得到对应的目标解码数据的模块”。
本文中以上描述的功能可以至少部分地由一个或多个硬件逻辑部件来执行。例如,非限制性地,可以使用的示范类型的硬件逻辑部件包括:现场可编程门阵列(FPGA)、专用集成电路(ASIC)、专用标准产品(ASSP)、片上系统(SOC)、复杂可编程逻辑设备(CPLD)等等。
在本公开的上下文中,机器可读介质可以是有形的介质,其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体系统、装置或设备,或者上述内容的任何合适组合。机器可读存储介质的更具体示例会包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的任何合适组合。
根据本公开的一个或多个实施例,提供了一种音频处理方法,包括:
在预设帧序列中确定解码起始帧标识和解码结束帧标识,其中,所述预设帧序列中包含至少一个音频资源中的多个音频帧的帧信息,所述帧信息中包含帧标识,所述帧标识包括音频资源标识和帧索引,所述音频资源标识用于表示对应的音频帧所属音频资源的身份,所述帧索引用于表示对应的音频帧在所属音频资源的所有音频帧中的次序;
根据所述解码起始帧标识和所述解码结束帧标识,获取对应的音频资源标识所关联的音频资源中的待解码片段数据;
对所述待解码片段数据进行解码,得到对应的目标解码数据。
例如,所述在预设帧序列中确定解码起始帧标识和解码结束帧标识,包括:
确定目标解码时长以及解码起始帧标识;
在预设帧序列中以所述解码起始帧标识对应的帧信息为起点开始遍历,在满足预设遍历终止条件时,根据对应的帧信息确定解码结束帧标识;
其中,所述预设遍历终止条件包括:
已遍历帧信息对应的音频帧的累计时长达到目标解码时长。
例如,所述预设遍历终止条件还包括以下至少一项:
当前帧信息中的音频资源标识与上一帧信息中的音频资源标识不一致;
当前帧信息中的帧索引与上一帧信息中的帧索引不连续;
当前帧信息中的帧索引为所属音频资源中的最后一个;
其中,所述在满足预设遍历终止条件时,根据对应的帧信息确定解码结束帧标识,包括:
在满足所述预设遍历终止条件中的任一项时,根据对应的帧信息确定解码结束帧标识。
例如,所述帧信息中还包括帧偏移量和帧数据量;所述根据所述解码起始帧标识和所述解码结束帧标识,获取对应的音频资源标识所关联的音频资源中的待解码片段数据,包括:
将所述解码起始帧标识对应的音频资源标识所关联的音频资源确定为目标音频资源;
根据所述解码起始帧标识对应的第一帧偏移量确定数据起始位置,根据所述解码结束帧标识对应的第二帧偏移量和帧数据量确定数据终止位置,根据所述数据起始位置和所述数据终止位置确定目标数据范围;
获取所述目标音频资源中的所述目标数据范围内的音频数据,得到待解码片段数据。
例如,所述在预设帧序列中以所述解码起始帧标识对应的帧信息为起点开始遍历,包括:
确定所述解码起始帧标识对应的音频帧的格式;
在所述格式为预设格式的情况下,在预设帧序列中以目标帧索引对应的帧信息为起点开始遍历,其中,所述目标帧索引为在所述解码起始帧标识中的起始帧索引基础上向前追溯预设帧索引差值后得到的帧索引;
其中,根据所述解码起始帧标识和所述解码结束帧标识,获取对应的音频资源标识所关联的音频资源中的待解码片段数据,包括:
根据所述目标帧索引对应的目标帧标识和所述解码结束帧标识,获取对应的音频资源标识所关联的音频资源中的待解码片段数据。
例如,在所述格式为预设格式的情况下,所述对所述待解码片段数据进行解码,得到对应的目标解码数据,包括:
对所述待解码片段数据进行解码,得到对应的初始解码数据;
在所述初始解码数据中去除冗余解码数据,得到对应的目标解码数据,其中,所述冗余解码数据包括所述起始帧索引之前的帧索引对应的音频帧的解码数据。
例如,在所述对所述待解码片段数据进行解码,得到对应的目标解码数据之后,还包括:
记录所述解码结束帧标识和所述目标解码数据对应的解码时长。
例如,应用于网页前端,在所述在预设帧序列中确定解码起始帧标识和解码结束帧标识之前,还包括:
对音频资源进行分帧处理,得到所述音频资源中多个音频帧的帧信息;
将所得帧信息存入网页前端的预设帧序列中。
例如,在所述在预设帧序列中确定解码起始帧标识和解码结束帧标识之前,还包括:
获取音频资源的元信息,其中,所述元信息中包括音频资源的存储信息,所述存储信息包括音频资源的存储位置和/或资源数据;
将所述元信息存入网页前端的资源表中,其中,所述资源表中包括本次会话涉及的音频资源标识与所述存储信息的关联关系;
相应的,所述根据所述解码起始帧标识和所述解码结束帧标识,获取对应的音频资源标识所关联的音频资源中的待解码片段数据,包括:
根据所述解码起始帧标识和所述解码结束帧标识,从所述资源表中获取对应的音频资源标识所关联的目标存储信息,并基于所述目标存储信息获取待解码片段数据。
例如,所述方法还包括:
接收预设音频编辑操作;
根据所述预设音频编辑操作指示的待调整帧标识,对所述预设帧序列中相应的帧信息进行相应的编辑操作,以实现音频编辑,其中,所述编辑操作包括对帧信息进行删除和/或顺序调整。
例如,所述预设帧序列中还包括多个音频帧对应的波形摘要信息;所述方法还包括:
响应于接收到预设波形绘制指令,根据所述预设波形绘制指令指示的待绘制帧标识,获取所述预设帧序列中相应的帧信息对应的目标波形摘要信息;
根据所述目标波形摘要信息绘制相应的波形图。
例如,在所述获取所述预设帧序列中相应的帧信息对应的目标波形摘要信息之前,还包括:
对所述预设帧序列对应的音频资源进行解码;
针对解码后的每个音频帧的解码帧数据,将当前解码帧数据划分为第一预设数量的子区间数据,确定多个子区间数据分别对应的区间振幅,并根据多个区间振幅确定当前音频帧对应的波形摘要信息;
将多个音频帧对应的波形摘要信息存入所述预设帧序列中,并与相应的帧信息建立关联。
例如,所述方法还包括:
将所述预设帧序列划分为第二预设数量的子序列;
针对每个子序列,对当前子序列进行部分解码,并根据解码结果确定所述当前子序列对应的子序列振幅;
根据多个子序列分别对应的子序列振幅绘制波形草图。
例如,所述对当前子序列进行部分解码,并根据解码结果确定所述当前子序列对应的子序列振幅,包括:
将当前子序列划分为第三预设数量的解码单元;
针对每个解码单元,根据当前解码单元对应的起始帧标识和预设解码帧数获取待解码数据,对所述待解码数据进行解码后,将得到的解码数据中的最大振幅确定为所述当前解码单元的单元振幅;
将多个单元振幅中的最大单元振幅确定为所述当前子序列对应的子序列振幅。
例如,所述目标解码数据用于存入播放缓冲区,所述方法还包括:
根据所述播放缓冲区中的未播解码数据的数据量确定是否在预设帧序列中确定解码起始帧标识和解码结束帧标识。
例如,应用于网页前端,还包括:向服务端同步本次会话对应的预设帧序列。
根据本公开的一个或多个实施例,提供了一种音频处理装置,包括:
帧标识确定模块,设置为在预设帧序列中确定解码起始帧标识和解码结束帧标识,其中,所述预设帧序列中包含至少一个音频资源中的多个音频帧的帧信息,所述帧信息中包含帧标识,所述帧标识包括音频资源标识和帧索引,所述音频资源标识用于表示对应的音频帧所属音频资源的身份,所述帧索引用于表示对应的音频帧在所属音频资源的所有音频帧中的次序;
待解码数据获取模块,设置为根据所述解码起始帧标识和所述解码结束帧标识,获取对应的音频资源标识所关联的音频资源中的待解码片段数据;
解码模块,设置为对所述待解码片段数据进行解码,得到对应的目标解码数据。
此外,虽然采用特定次序描绘了多种操作,但是这不应当理解为要求这些操作以所示出的特定次序或以顺序次序执行来执行。在一定环境下,多任务和并行处理可能是有利的。同样地,虽然在上面论述中包含了若干具体实现细节,但是这些不应当被解释为对本公开的范围的限制。在单独的实施例的上下文中描述的某些特征还可以组合地实现在单个实施例中。相反地,在单个实施例的上下文中描述的多种特征也可以单独地或以任何合适的子组合的方式实现在多个实施例中。

Claims (19)

  1. 一种音频处理方法,包括:
    在预设帧序列中确定解码起始帧标识和解码结束帧标识,其中,所述预设帧序列中包含至少一个音频资源中的多个音频帧的帧信息,所述帧信息中包含帧标识,所述帧标识包括音频资源标识和帧索引,所述音频资源标识用于表示对应的音频帧所属音频资源的身份,所述帧索引用于表示对应的音频帧在所属音频资源的所有音频帧中的次序;
    根据所述解码起始帧标识和所述解码结束帧标识,获取对应的音频资源标识所关联的音频资源中的待解码片段数据;
    对所述待解码片段数据进行解码,得到对应的目标解码数据。
  2. 根据权利要求1所述的方法,其中,所述在预设帧序列中确定解码起始帧标识和解码结束帧标识,包括:
    确定目标解码时长以及解码起始帧标识;
    在预设帧序列中以所述解码起始帧标识对应的帧信息为起点开始遍历,响应于确定满足预设遍历终止条件,根据对应的帧信息确定解码结束帧标识;
    其中,所述预设遍历终止条件包括:
    已遍历帧信息对应的音频帧的累计时长达到所述目标解码时长。
  3. 根据权利要求2所述的方法,其中,所述预设遍历终止条件还包括以下至少一项:
    当前帧信息中的音频资源标识与上一帧信息中的音频资源标识不一致;
    当前帧信息中的帧索引与上一帧信息中的帧索引不连续;
    当前帧信息中的帧索引为所属音频资源中的最后一个;
    其中,所述响应于确定满足预设遍历终止条件,根据对应的帧信息确定解码结束帧标识,包括:
    响应于确定满足所述预设遍历终止条件中的任一项,根据对应的帧信息确定解码结束帧标识。
  4. 根据权利要求3所述的方法,其中,所述帧信息中还包括帧偏移量和帧数据量;所述根据所述解码起始帧标识和所述解码结束帧标识,获取对应的音频资源标识所关联的音频资源中的待解码片段数据,包括:
    将所述解码起始帧标识对应的音频资源标识所关联的音频资源确定为目标音频资源;
    根据所述解码起始帧标识对应的第一帧偏移量确定数据起始位置,根据所述解码结束帧标识对应的第二帧偏移量和帧数据量确定数据终止位置,根据所述数据起始位置和所述数据终止位置确定目标数据范围;
    获取所述目标音频资源中的所述目标数据范围内的音频数据,得到待解码片段数据。
  5. 根据权利要求2所述的方法,其中,所述在预设帧序列中以所述解码起始帧标识对应的帧信息为起点开始遍历,包括:
    确定所述解码起始帧标识对应的音频帧的格式;
    响应于确定所述格式为预设格式,在所述预设帧序列中以目标帧索引对应的帧信息为起点开始遍历,其中,所述目标帧索引为在所述解码起始帧标识中的起始帧索引基础上向前追溯预设帧索引差值后得到的帧索引;
    其中,所述根据所述解码起始帧标识和所述解码结束帧标识,获取对应的音频资源标识所关联的音频资源中的待解码片段数据,包括:
    根据所述目标帧索引对应的目标帧标识和所述解码结束帧标识,获取对应的音频资源标识所关联的音频资源中的待解码片段数据。
  6. 根据权利要求5所述的方法,其中,响应于确定所述格式为预设格式,所述对所述待解码片段数据进行解码,得到对应的目标解码数据,包括:
    对所述待解码片段数据进行解码,得到对应的初始解码数据;
    在所述初始解码数据中去除冗余解码数据,得到对应的目标解码数据,其中,所述冗余解码数据包括所述起始帧索引之前的帧索引对应的音频帧的解码数据。
  7. 根据权利要求3所述的方法,在所述对所述待解码片段数据进行解码,得到对应的目标解码数据之后,还包括:
    记录所述解码结束帧标识和所述目标解码数据对应的解码时长。
  8. 根据权利要求1所述的方法,应用于网页前端,在所述在预设帧序列中确定解码起始帧标识和解码结束帧标识之前,还包括:
    对音频资源进行分帧处理,得到所述音频资源中多个音频帧的帧信息;
    将所得帧信息存入所述网页前端的预设帧序列中。
  9. 根据权利要求8所述的方法,在所述在预设帧序列中确定解码起始帧标识和解码结束帧标识之前,还包括:
    获取音频资源的元信息,其中,所述元信息中包括音频资源的存储信息,所述存储信息包括音频资源的存储位置和资源数据中的至少之一;
    将所述元信息存入所述网页前端的资源表中,其中,所述资源表中包括本次会话涉及的音频资源标识与所述存储信息的关联关系;
    所述根据所述解码起始帧标识和所述解码结束帧标识,获取对应的音频资源标识所关联的音频资源中的待解码片段数据,包括:
    根据所述解码起始帧标识和所述解码结束帧标识,从所述资源表中获取对应的音频资源标识所关联的目标存储信息,并基于所述目标存储信息获取待解码片段数据。
  10. 根据权利要求1所述的方法,还包括:
    接收预设音频编辑操作;
    根据所述预设音频编辑操作指示的待调整帧标识,对所述预设帧序列中相应的帧信息进行相应的编辑操作,以实现音频编辑,其中,所述编辑操作包括对帧信息进行删除和顺序调整中的至少之一。
  11. 根据权利要求1所述的方法,其中,所述预设帧序列中还包括多个音频帧对应的波形摘要信息;所述方法还包括:
    响应于接收到预设波形绘制指令,根据所述预设波形绘制指令指示的待绘制帧标识,获取所述预设帧序列中相应的帧信息对应的目标波形摘要信息;
    根据所述目标波形摘要信息绘制相应的波形图。
  12. 根据权利要求11所述的方法,在所述获取所述预设帧序列中相应的帧信息对应的目标波形摘要信息之前,还包括:
    对所述预设帧序列对应的音频资源进行解码;
    针对解码后的每个音频帧的解码帧数据,将当前解码帧数据划分为第一预设数量的子区间数据,确定所述第一预设数量的子区间数据分别对应的区间振幅,并根据所述第一预设数量的区间振幅确定当前音频帧对应的波形摘要信息;
    将所述第一预设数量的音频帧对应的波形摘要信息存入所述预设帧序列中,并与相应的帧信息建立关联。
  13. 根据权利要求1所述的方法,还包括:
    将所述预设帧序列划分为第二预设数量的子序列;
    针对每个子序列,对当前子序列进行部分解码,并根据解码结果确定所述当前子序列对应的子序列振幅;
    根据所述第二预设数量的子序列分别对应的子序列振幅绘制波形草图。
  14. 根据权利要求13所述的方法,其中,所述对当前子序列进行部分解码,并根据解码结果确定所述当前子序列对应的子序列振幅,包括:
    将当前子序列划分为第三预设数量的解码单元;
    针对每个解码单元,根据当前解码单元对应的起始帧标识和预设解码帧数获取待解码数据,对所述待解码数据进行解码后,将得到的解码数据中的最大振幅确定为所述当前解码单元的单元振幅;
    将所述第三预设数量的单元振幅中的最大单元振幅确定为所述当前子序列对应的子序列振幅。
  15. 根据权利要求1所述的方法,其中,所述目标解码数据用于存入播放缓冲区,所述方法还包括:
    根据所述播放缓冲区中的未播解码数据的数据量,判断是否在所述预设帧序列中确定所述解码起始帧标识和所述解码结束帧标识。
  16. 根据权利要求1所述的方法,应用于网页前端,还包括:
    向服务端同步本次会话对应的所述预设帧序列。
  17. 一种音频处理装置,包括:
    帧标识确定模块,设置为在预设帧序列中确定解码起始帧标识和解码结束帧标识,其中,所述预设帧序列中包含至少一个音频资源中的多个音频帧的帧信息,所述帧信息中包含帧标识,所述帧标识包括音频资源标识和帧索引,所述音频资源标识用于表示对应的音频帧所属音频资源的身份,所述帧索引用于表示对应的音频帧在所属音频资源的所有音频帧中的次序;
    待解码数据获取模块,设置为根据所述解码起始帧标识和所述解码结束帧标识,获取对应的音频资源标识所关联的音频资源中的待解码片段数据;
    解码模块,设置为对所述待解码片段数据进行解码,得到对应的目标解码数据。
  18. 一种电子设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,其特征在于,所述处理器执行所述计算机程序时实现如权利要求1-16任一项所述的方法。
  19. 一种计算机可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执行时实现如权利要求1-16任一项所述的方法。
PCT/CN2022/140468 2021-12-30 2022-12-20 音频处理方法、装置、设备及存储介质 WO2023125169A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111654061.6 2021-12-30
CN202111654061.6A CN114299972A (zh) 2021-12-30 2021-12-30 音频处理方法、装置、设备及存储介质

Publications (1)

Publication Number Publication Date
WO2023125169A1 true WO2023125169A1 (zh) 2023-07-06

Family

ID=80974199

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/140468 WO2023125169A1 (zh) 2021-12-30 2022-12-20 音频处理方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN114299972A (zh)
WO (1) WO2023125169A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117877525A (zh) * 2024-03-13 2024-04-12 广州汇智通信技术有限公司 一种基于可变粒度特征的音频检索方法和装置

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114299972A (zh) * 2021-12-30 2022-04-08 北京字跳网络技术有限公司 音频处理方法、装置、设备及存储介质
CN118230743A (zh) * 2022-12-20 2024-06-21 北京字跳网络技术有限公司 音频处理方法、装置及设备
CN116320095B (zh) * 2023-05-15 2023-08-04 北京融为科技有限公司 通信能力重组方法、系统及电子设备
CN116996810A (zh) * 2023-07-31 2023-11-03 广州星际悦动股份有限公司 口腔护理设备的音频处理方法、装置、设备及存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101980533A (zh) * 2010-11-12 2011-02-23 中国华录集团有限公司 一种基于索引文件实现传输流文件特技模式功能的方法
CN102522088A (zh) * 2011-11-25 2012-06-27 展讯通信(上海)有限公司 音频的解码方法及装置
CN107295402A (zh) * 2017-08-11 2017-10-24 成都品果科技有限公司 视频解码方法及装置
US20170329849A1 (en) * 2016-05-12 2017-11-16 Dolby International Ab Indexing variable bit stream audio formats
CN111866542A (zh) * 2019-04-30 2020-10-30 腾讯科技(深圳)有限公司 音频信号处理方法、多媒体信息处理方法、装置及电子设备
CN112925943A (zh) * 2019-12-06 2021-06-08 浙江宇视科技有限公司 数据处理方法、装置、服务器及存储介质
CN114299972A (zh) * 2021-12-30 2022-04-08 北京字跳网络技术有限公司 音频处理方法、装置、设备及存储介质



Also Published As

Publication number Publication date
CN114299972A (zh) 2022-04-08

