WO2018120627A1 - Method and apparatus for processing audio data - Google Patents

Method and apparatus for processing audio data

Info

Publication number
WO2018120627A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
signal frame
audio
playing
frozen
Prior art date
Application number
PCT/CN2017/086238
Other languages
English (en)
French (fr)
Inventor
谭利文
李玉龙
孙伟
曹海恒
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201780009194.3A (CN108605162B)
Priority to US16/474,836 (US10979469B2)
Publication of WO2018120627A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/75 Media network packet handling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/007 Protection circuits for transducers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80 Responding to QoS
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/04 Circuits for transducers, loudspeakers or microphones for correcting frequency response
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2499/00 Aspects covered by H04R or H04S not otherwise provided for in their subgroups
    • H04R2499/10 General applications
    • H04R2499/11 Transducers incorporated or for use in hand-held devices, e.g. mobile phones, PDA's, camera's

Definitions

  • the present application relates to multimedia technologies, and in particular, to a method and an apparatus for processing audio data.
  • Audio playback on user equipment usually uses one of two modes. In the normal mode, audio is decoded by a software audio decoder on the central processing unit (CPU) side, and the audio mixer (Audio Mixer) then synthesizes the multiple tracks for playback. In the Offload mode, audio is decoded by a DSP decoder on the Digital Signal Processing (DSP) side and then undergoes mixing and similar processing.
  • The Offload mode generally consumes less power, while the normal mode generally consumes more. It is therefore necessary to reduce the power consumption of the normal mode.
  • the embodiment of the present invention provides a method and a device for processing audio data, so as to reduce power consumption in a silent play scenario when the user equipment adopts a normal mode processing mode.
  • an embodiment of the present application provides a method for processing audio data, including:
  • the frozen playing condition includes: an average power of the at least one audio signal frame is lower than a first preset threshold, and the audio playing program is in a background running mode.
  • the mute play can be effectively recognized, and the audio play program is processed correspondingly, thereby effectively reducing the power consumption when the normal mode processing mode is adopted.
  • The at least one audio signal frame includes M audio signal frames, and acquiring the average power of the at least one audio signal frame includes: separately acquiring the average power of each audio signal frame.
  • The frozen playing condition includes: the proportion of audio signal frames, among the M audio signal frames, whose average power is lower than a first preset threshold exceeds a second preset threshold, and the audio playing program is in the background running mode.
  • M is any positive integer greater than 1.
  • In this way, silent play can be effectively recognized and the audio playing program handled accordingly, which effectively reduces power consumption in the normal mode. In addition, judging whether the frozen playing condition is satisfied based on the average power of the M audio signal frames further improves the accuracy of recognizing the silent play scenario.
  • Receiving the decoded at least one audio signal frame includes: receiving the decoded at least one audio signal frame at every preset time interval. The method further includes:
  • if the frozen playing condition is satisfied multiple consecutive times, increasing the length of the preset time interval.
  • An updated audio signal frame is acquired at each preset time interval, whether it satisfies the frozen playing condition is determined, and the handling of the audio playing program is dynamically adjusted according to the result.
  • The method further includes: if the frozen playing condition is not met, triggering the audio playing program to play an audio signal frame.
  • the acquiring the average power of each audio signal frame separately includes:
  • n is the number of the audio signal frame
  • S_0 is the number of sampling points of the audio signal frame
  • the method further includes:
  • Smoothing is performed when playback switches from paused to playing, which can effectively improve the audio experience.
  • an embodiment of the present application provides a processing apparatus for audio data, where the processing apparatus of the audio data has a function of implementing behavior of a processing apparatus of audio data in the foregoing method embodiment.
  • This function can be implemented in hardware, or by hardware executing the corresponding software.
  • the hardware or software includes one or more modules corresponding to the functions described above.
  • an embodiment of the present application provides a user equipment, including: a processor, a memory, a communication bus, and a communication interface; the memory is configured to store a computer execution instruction, and the processor is connected to the memory through the communication bus.
  • the processor executes the computer-executed instructions stored in the memory to cause the user equipment to perform the processing method of the audio data according to any of the above first aspects.
  • The embodiment of the present application provides a computer readable storage medium for storing computer software instructions used by the user equipment, which, when run on a computer, enable the computer to perform the method for processing audio data according to any one of the foregoing first aspects.
  • an embodiment of the present application provides a computer program product comprising instructions, which when executed on a computer, enable the computer to perform the method of processing audio data according to any one of the above first aspects.
  • In the method and apparatus for processing audio data in the embodiments of the present application, the decoded at least one audio signal frame is received, the average power of the at least one audio signal frame is acquired, and whether the frozen playing condition is met is determined. If the frozen playing condition is met, the audio playing program is triggered to enter a frozen playing state, in which it pauses playback. Silent play is thereby effectively recognized and the audio playing program handled accordingly, which effectively reduces power consumption in the normal mode.
  • FIG. 1 is a schematic diagram of an application scenario of a method for processing audio data according to an embodiment of the present disclosure
  • FIG. 2 is a flowchart of a method for processing audio data according to an embodiment of the present application
  • FIG. 3 is a flowchart of another method for processing audio data according to an embodiment of the present application.
  • FIG. 4A is a schematic structural diagram of a silent frame power consumption engine 14 according to an embodiment of the present application.
  • FIG. 4B is a flowchart of another method for processing audio data according to an embodiment of the present application.
  • FIG. 4C is a schematic explanatory diagram of a freeze play mechanism according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an apparatus for processing audio data according to an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a user equipment according to an embodiment of the present disclosure.
  • The User Equipment may represent any applicable end user equipment, and may include (or may represent) a Wireless Transmit/Receive Unit (WTRU), a mobile station, a mobile node, a mobile device, a fixed or mobile subscriber unit, a pager, a mobile phone, a personal digital assistant (PDA), a smart phone, a notebook computer, a computer, a touch screen device, a wireless sensor, or a consumer electronics device.
  • a "mobile" station/node/device herein refers to a station/node/device connected to a wireless (or mobile) network and is not necessarily related to the actual mobility of the station/node/device.
  • "A plurality" refers to two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, A and/or B may indicate three cases: A exists alone, A and B exist at the same time, and B exists alone.
  • The character "/" generally indicates an "or" relationship between the objects before and after it.
  • Silent play refers specifically to the user equipment playing audio signal frames with low audio energy, either for a certain length of time or continuously.
  • the length of time can be flexibly set according to requirements, for example, 10 min, 30 min, 1 h, and the like.
  • low audio energy specifically refers to the energy of the audio signal frame being less than a preset threshold. For a specific explanation of the preset threshold, reference may be made to the following embodiments.
  • the processing method of the audio data in the embodiment of the present application can be applied to the user equipment, and an exemplary application scenario of the audio data processing method in the embodiment of the present application is explained, as shown in FIG. 1 below.
  • the processing method of the audio data in the embodiment of the present application can also be applied to other implementable application scenarios.
  • FIG. 1 is a schematic diagram of an application scenario of a method for processing audio data according to an embodiment of the present disclosure.
  • The application scenario may specifically include an application 11, a parser 12, a decoder 13, a silent frame power consumption engine 14, an audio mixer 15, and a speaker 16.
  • The application 11 may specifically be a media playing application, a game application, or another type of application.
  • The parser 12 may be specifically configured to receive an audio file sent by the application 11, and the audio file may be in MP3 format, OGG (Ogg Vorbis) format, or Advanced Audio Coding (AAC) format.
  • the decoder 13 is configured to decode the audio file to obtain an audio signal stream, and the audio signal stream may specifically be a Pulse Code Modulation (PCM) code stream.
  • The silent frame power consumption engine 14 is configured to perform the method for processing audio data in the embodiments of the present application: it recognizes that the audio playing program is running in the background in a silent play scenario and handles the audio playing program accordingly, thereby reducing power consumption in the normal mode.
  • the audio mixer 15 is for mixing the input audio signals for output.
  • the speaker 16 converts the audio signal output from the audio mixer 15 into an acoustic signal.
  • This structure includes a kernel layer, an application framework layer, and an application layer. The benefit of layering is that each layer uses what the layer below provides to offer a unified service to the layer above, while hiding the differences between itself and the lower layers, so that changes in this layer or the layers below do not affect the upper layers. In other words, each layer performs its own function and provides a fixed service access point (SAP).
  • The above application 11 is located in the application layer, and the silent frame power consumption engine 14 is located between the application layer and the hardware driver layer. For example, it may be located in the Libraries layer or in another layer, depending on the layering of the specific system.
  • The application 11 may be provided with an audio playing program; that is, the audio playing program may be a sub-function of the application 11 for processing audio files. The audio playing program may send the audio file to the parser 12, and the parser 12 and the decoder 13 decode the audio file to obtain an audio signal stream, which may specifically be a Pulse Code Modulation (PCM) code stream. Alternatively, the audio playing program may decode the audio file itself to obtain the audio signal stream.
  • The silent frame power consumption engine 14 receives the audio signal stream from the audio playing program and processes it, so that power consumption in the normal mode is reduced when the audio playing program runs in the background and plays audio with low audio energy for a long time or continuously.
  • the audio signal stream, or PCM stream, referred to herein includes, in particular, a plurality of PCM signals.
  • The audio signal frame referred to herein specifically refers to the PCM signals within a unit time, where the unit time may be 1 ms, 10 ms, 20 ms, or the like, and can be flexibly set according to requirements. The number of PCM signals per unit time depends on the sampling rate: the higher the sampling rate, the more PCM signals per unit time.
  • the PCM signal can also be referred to as a sampled signal.
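  • The relationship between unit time, sampling rate, and the number of PCM samples per frame can be sketched as follows. This is a minimal illustration only; the function names `samples_per_frame` and `split_into_frames`, and the choice to drop a trailing partial frame, are assumptions and do not come from the application.

```python
def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """Number of PCM samples in one frame lasting frame_ms milliseconds."""
    return sample_rate_hz * frame_ms // 1000


def split_into_frames(pcm, sample_rate_hz: int, frame_ms: int):
    """Split a sequence of PCM samples into whole frames.

    Assumption for this sketch: a trailing partial frame is dropped.
    """
    size = samples_per_frame(sample_rate_hz, frame_ms)
    if size <= 0:
        raise ValueError("frame must contain at least one sample")
    return [pcm[i:i + size] for i in range(0, len(pcm) - size + 1, size)]
```

  • For example, at a 48 kHz sampling rate a 10 ms frame holds 480 samples, while at 8 kHz a 20 ms frame holds 160 samples.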
  • FIG. 2 is a flowchart of a method for processing audio data according to an embodiment of the present application. As shown in FIG. 2, the method in this embodiment may include:
  • Step 101: Receive at least one decoded audio signal frame, and obtain an average power of the at least one audio signal frame.
  • The decoded audio signal frame may be obtained by the decoder 13 decoding an audio signal frame of the audio playing program, or may be obtained by the audio playing program performing the decoding itself.
  • The silent frame power consumption engine 14 receives the one or more decoded audio signal frames.
  • The silent frame power consumption engine 14 acquires the average power of the received one or more audio signal frames. The way of obtaining the average power of an audio signal frame can be flexibly selected according to requirements. For example, the energy values of the sampled signals of the audio signal frame can be obtained, and the average power of the audio signal frame can be determined from those energy values.
  • Step 102: Determine whether the frozen playing condition is met. If yes, perform step 103; otherwise, perform step 104.
  • The frozen playing condition includes: the average power of the at least one audio signal frame is lower than a first preset threshold, and the audio playing program is in the background running mode. That is, when the average power of one or more audio signal frames is lower than the first preset threshold, the one or more audio signal frames can be determined to be silent audio signal frames; if the audio playing program is also in the background running mode, it can be triggered to enter the frozen playing state.
  • Step 103: If the frozen playing condition is met, trigger the audio playing program to pause playing.
  • The silent frame power consumption engine 14 triggers the audio playing program to enter the frozen playing state, in which the audio playing program pauses playing.
  • Pausing the audio playing program may be implemented as follows: (1) in Pull mode, the audio subsystem is stopped from requesting audio signal frames from the audio playing program; (2) in Push mode, the audio playing program is blocked from pushing audio signal frames to the audio subsystem.
  • The audio subsystem specifically includes a parser, a decoder, a soft decoder, and a mixer. That is, in the frozen playing state, all audio processing activities of the audio subsystem stop. After the audio playing program is paused, audio-related network access activity, such as the audio playing program requesting data from a server in the network, also stops, thereby effectively reducing power consumption.
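  • The two pause mechanisms described for Step 103 can be pictured with a small gate object. This is a hedged sketch, not the application's implementation: the class `PlaybackGate`, its `frozen` flag, and the choice to reject a push rather than block the caller are all assumptions made for illustration.

```python
class PlaybackGate:
    """Hypothetical gate between an audio playing program and the audio subsystem."""

    def __init__(self):
        self.frozen = False   # True while in the frozen playing state
        self.delivered = []   # frames that reached the audio subsystem

    def pull(self, source):
        """Pull mode: the audio subsystem requests a frame from the player.

        While frozen, the request is simply not issued.
        """
        if self.frozen:
            return None
        frame = source()
        self.delivered.append(frame)
        return frame

    def push(self, frame) -> bool:
        """Push mode: the player offers a frame to the audio subsystem.

        While frozen, the frame is refused (a real system might block instead).
        """
        if self.frozen:
            return False
        self.delivered.append(frame)
        return True
```

  • In either mode no frame reaches the subsystem while `frozen` is set, which is what lets the downstream processing stay idle.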
  • Step 104: If the frozen playing condition is not met, the audio playing program plays an audio signal frame.
  • The current playing state of the audio playing program may be acquired. If the current playing state is frozen playing, the audio playing program may be triggered to release the frozen playing state, after which it plays the audio signal frame of the corresponding time point. If the current playing state is non-frozen playing, the audio playing program is left alone so that the audio signal frame of the corresponding time point is played normally.
  • In this embodiment, the decoded at least one audio signal frame is received from the audio playing program, the average power of the at least one audio signal frame is acquired, and whether the frozen playing condition is met is determined. If the frozen playing condition is met, the audio playing program is triggered to enter the frozen playing state, in which it pauses playing. Silent play is thereby effectively recognized and the audio playing program handled accordingly, which effectively reduces power consumption in the normal mode.
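  • Steps 101 to 104 can be summarized in a short sketch. It assumes that average power means the mean of the squared sample values and that, when several frames are received, every frame must fall below the first preset threshold; both are assumptions for illustration, and the names `average_power`, `should_freeze`, and `decide` are hypothetical.

```python
def average_power(frame):
    """Mean of squared sample values (assumed definition of average power)."""
    if not frame:
        raise ValueError("frame must contain at least one sample")
    return sum(x * x for x in frame) / len(frame)


def should_freeze(frames, first_threshold, in_background):
    """Frozen playing condition of the FIG. 2 embodiment, as interpreted here:
    all received frames are below the threshold AND the player runs in the background.
    """
    quiet = all(average_power(f) < first_threshold for f in frames)
    return quiet and in_background


def decide(frames, first_threshold, in_background):
    """Return 'pause' (step 103) or 'play' (step 104)."""
    return "pause" if should_freeze(frames, first_threshold, in_background) else "play"
```

  • A quiet frame from a foreground player still yields "play", since both parts of the condition must hold.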
  • FIG. 3 is a flowchart of another method for processing audio data according to an embodiment of the present application.
  • This embodiment continuously calculates the energy of multiple audio signal frames and determines, according to the average power of the consecutive audio signal frames, whether to trigger the audio playing program to enter the frozen playing state, which further improves the accuracy of recognizing the silent play scenario.
  • the method in this embodiment may include:
  • Step 201: Continuously receive the decoded M audio signal frames from the audio playing program.
  • M is any positive integer greater than 1, and the specific value can be flexibly set according to requirements.
  • The silent frame power consumption engine 14 continuously receives the decoded M audio signal frames from the audio playing program.
  • Step 202: Acquire the energy values of the multiple sampling signals of each audio signal frame, and determine the average power of each audio signal frame according to the energy values of its sampling signals.
  • The silent frame power consumption engine 14 calculates the average power of each audio signal frame.
  • Step 203: Determine whether the frozen playing condition is met. If yes, perform step 204; otherwise, perform step 205.
  • The frozen playing condition includes: the proportion of audio signal frames, among the M audio signal frames, whose average power is lower than a first preset threshold exceeds a second preset threshold, and the audio playing program is in the background running mode. That is, the frozen playing condition of this embodiment requires that the audio signal frames whose average power is lower than the first preset threshold account for a proportion of the M audio signal frames that exceeds the second preset threshold.
  • Step 204: If the frozen playing condition is met, trigger the audio playing program to enter the frozen playing state, in which the audio playing program pauses playing.
  • Step 205: If the frozen playing condition is not met, the audio playing program plays the audio signal frame of the corresponding time point.
  • A specific implementation of step 202 is: separately obtain the sampling values x of the multiple sampling signals of each audio signal frame, and calculate the average power p_x(m) of each audio signal frame according to formula (1).
  • n is the number of the audio signal frame
  • S_0 is the number of sampling points of the audio signal frame
  • The proportion of audio signal frames, among the M audio signal frames, whose average power is lower than the first preset threshold may be obtained by calculating the ratio according to formula (2).
  • C_fe(m) is the number of audio signal frames whose average power is lower than the first preset threshold. If the ratio is greater than the second preset threshold, it is determined that the frozen playing condition is satisfied.
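  • The equation bodies of formulas (1) and (2) are not reproduced in this text. One form consistent with the surrounding definitions, assuming the usual mean-square definition of average power, is given below. The per-frame sample notation x_{m,i}, the sample index i, and the ratio symbol r are introduced here only for illustration and may differ from the notation of the original formulas.

```latex
% Assumed form of formula (1): mean-square power of frame m over its S_0 samples
p_x(m) = \frac{1}{S_0} \sum_{i=1}^{S_0} x_{m,i}^{2}

% Assumed form of formula (2): share of low-power frames among the M frames
r = \frac{C_{fe}(m)}{M}
```

  • Under this reading, the frozen playing condition additionally requires r to exceed the second preset threshold while the audio playing program runs in the background.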
  • In this embodiment, the decoded M audio signal frames are continuously received, the energy values of the multiple sampling signals of each audio signal frame are acquired, the average power of each audio signal frame is determined from those energy values, and whether the frozen playing condition is met is determined. If the frozen playing condition is met, the audio playing program is triggered to enter the frozen playing state, in which it pauses playing. Silent play is thereby effectively recognized and the audio playing program handled accordingly, which effectively reduces power consumption in the normal mode.
  • In addition, judging whether the frozen playing condition is satisfied according to the average power of the M audio signal frames further improves the accuracy of recognizing the silent play scenario.
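  • The M-frame variant of the condition can be sketched as follows, again assuming mean-square power. The function names and the example threshold values are hypothetical and chosen only to make the ratio test concrete.

```python
def average_power(frame):
    """Mean of squared sample values (assumed definition of average power)."""
    if not frame:
        raise ValueError("frame must contain at least one sample")
    return sum(x * x for x in frame) / len(frame)


def low_power_ratio(frames, first_threshold):
    """Share of frames whose average power is below the first preset threshold."""
    low = sum(1 for f in frames if average_power(f) < first_threshold)
    return low / len(frames)


def frozen_condition_met(frames, first_threshold, second_threshold, in_background):
    """FIG. 3 condition: low-power share exceeds the second threshold, player in background."""
    if len(frames) < 2:
        raise ValueError("M must be greater than 1")
    ratio = low_power_ratio(frames, first_threshold)
    return in_background and ratio > second_threshold
```

  • Using a ratio rather than a single frame means one loud frame among many silent ones does not by itself prevent the freeze, and a single silent frame does not trigger it.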
  • The silent frame power consumption engine 14 may specifically include: a silence frame determining module 141, a freeze play policy module 142, and a play control module 143.
  • The silent frame power consumption engine 14 may further include a delay buffer module 144 and a noise floor smoothing module 145.
  • The division of the silent frame power consumption engine 14 into modules is a logical division; other divisions are also possible.
  • The embodiment of the present application is schematically explained using the above structure.
  • FIG. 4B is a flowchart of another method for processing audio data according to an embodiment of the present application, and FIG. 4C is a schematic explanatory diagram of a freeze play mechanism according to an embodiment of the present application. As shown in FIG. 4B, the method in this embodiment may include:
  • the application sends an audio signal frame to the silence frame determining module 141.
  • the silence frame determination module 141 receives the audio signal frame transmitted by the application.
  • The silence frame determining module 141 receives the audio signal frame, acquires the energy values of the multiple sampled signals of the audio signal frame, and determines the average power of the audio signal frame according to those energy values.
  • the average power of the audio signal frame can be calculated by the formula (1).
  • the silence frame determining module 141 determines whether the average power of the audio signal frame is less than a first preset threshold. If yes, execute S304, if no, execute S302.
  • the silence frame determination module 141 triggers the play control module 143 to perform play control.
  • the silence frame determining module 141 may send the determination result of S303 to the play control module 143.
  • The play control module 143 calculates the proportion of audio signal frames, among the M audio signal frames, whose average power is lower than the first preset threshold, and determines whether the frozen playing condition is met. If yes, execute S306; otherwise, execute S307.
  • The play control module 143 continuously receives the silence frame determination results sent by the silence frame determining module 141; that is, if the average power of multiple consecutive audio signal frames is less than the first preset threshold, the play control module 143 may calculate the above ratio.
  • Whether the frozen playing condition is met may be determined as follows: determine whether the ratio exceeds the second preset threshold and whether the application is in the background running mode; if both hold, the frozen playing condition is satisfied.
  • the play control module 143 may send the determination result satisfying the freeze play condition to the freeze play policy module 142, and the freeze play policy module 142 returns a specific play control policy to the play control module 143, for example, the control application enters the freeze play state.
  • the play control module 143 triggers the application to enter a freeze play state.
  • FIG. 4C is used as an example for illustration.
  • The horizontal axis of FIG. 4C is the time axis, and the vertical axis indicates the energy value of the sampling signal.
  • The application is in the foreground before entering the frozen playing state.
  • The play control module 143 may mark the application as entering a to-be-frozen state. In the to-be-frozen state, the application plays audio signal frames normally. When the play control module 143 continuously obtains from the silence frame determining module 141 that the average power of multiple audio signal frames is lower than the first preset threshold, and the proportion of audio signal frames whose power is lower than the first preset threshold exceeds the second preset threshold, the play control module 143 may mark the application as entering the frozen playing state and trigger the audio playing program to enter the frozen playing state. As shown in FIG. 4C, in the frozen playing state, playback of audio signal frames is suspended.
  • A freeze play maintenance window may be further configured. After the audio playing program is triggered to enter the frozen playing state, the freeze play maintenance window is used to obtain, at a preset time interval, the decoded audio signal frame of the corresponding time point from the audio playing program and to determine whether the frozen playing condition is satisfied; if the frozen playing condition is satisfied, the length of the preset time interval is increased.
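  • The growing check interval of the freeze play maintenance window resembles a capped backoff. The sketch below assumes a doubling factor, an upper bound, and that the window ends as soon as one check fails; none of these specifics come from the application, and the names `next_interval` and `maintenance_schedule` are hypothetical.

```python
def next_interval(current_s, factor=2.0, max_s=8.0):
    """Lengthen the check interval after the frozen playing condition held again."""
    return min(current_s * factor, max_s)


def maintenance_schedule(check_results, initial_s=1.0, factor=2.0, max_s=8.0):
    """Simulate the maintenance window.

    check_results: iterable of booleans, True if the frozen playing condition
    was still satisfied at that check.
    Returns (intervals waited before each check, whether the freeze was released).
    """
    interval = initial_s
    waits = []
    for still_silent in check_results:
        waits.append(interval)
        if not still_silent:
            return waits, True   # condition failed: release the frozen state
        interval = next_interval(interval, factor, max_s)
    return waits, False
```

  • Lengthening the interval while silence persists reduces how often the paused program has to be sampled, at the cost of a slower reaction when audio returns; the cap bounds that delay.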
  • the play control module 143 controls the application to play the audio signal frame at the corresponding time point.
  • The noise floor smoothing module 145 performs smoothing on the audio signal frame of the corresponding time point to obtain a smoothed output signal, and the application is controlled to play the smoothed output signal. The delay buffer module 144 is configured to buffer audio signal frames and to provide the audio signal frames required for smoothing, that is, the mute signal s(n) required by formula (3). As shown in FIG. 4C, smoothing is performed between the frozen playing state and the release of the frozen playing state, which ensures a good audio experience during the switch.
  • the specific smoothing method can be: use the following interpolation function:
  • the original audio signal and the mute signal are m(n) and s(n), respectively, and the smoothed output is Sout(n).
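  • The interpolation function of formula (3) is not reproduced in this text. As an illustration only, the sketch below uses a linear crossfade from the mute signal s(n) to the original signal m(n); the linear weighting and the function name `smooth_transition` are assumptions and may differ from the interpolation actually used.

```python
def smooth_transition(m, s):
    """Linearly interpolate from the mute signal s to the original signal m.

    The weight of m rises from 0 at the first sample to 1 at the last sample,
    so the output starts at s and ends at m (assumed crossfade shape).
    """
    if len(m) != len(s):
        raise ValueError("m and s must have the same length")
    total = len(m)
    out = []
    for n in range(total):
        w = n / (total - 1) if total > 1 else 1.0
        out.append((1.0 - w) * s[n] + w * m[n])
    return out
```

  • Ramping in the original signal this way avoids an abrupt jump in level at the moment the frozen playing state is released.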
  • In this embodiment, the decoded M audio signal frames are continuously received, the energy values of the multiple sampling signals of each audio signal frame are acquired, the average power of each audio signal frame is determined from those energy values, and whether the frozen playing condition is met is determined. If the frozen playing condition is met, the audio playing program is triggered to enter the frozen playing state, in which it pauses playing. Silent play is thereby effectively recognized and the audio playing program handled accordingly, which effectively reduces power consumption in the normal mode.
  • In addition, judging whether the frozen playing condition is satisfied according to the average power of the M audio signal frames further improves the accuracy of recognizing the silent play scenario.
  • Moreover, smoothing when playback switches from paused to playing can effectively improve the audio experience.
  • the audio data processing apparatus of the embodiment of the present application is the silent frame power consumption engine shown in FIG. 1.
  • FIG. 5 is a schematic structural diagram of an apparatus for processing audio data according to an embodiment of the present disclosure.
  • the apparatus in this embodiment may include a receiving module 11 and a processing module 12. The receiving module 11 is configured to receive at least one decoded audio signal frame, and the processing module 12 is configured to obtain the average power of the at least one audio signal frame.
  • the processing module 12 is further configured to determine whether the frozen playing condition is met and, if it is met, trigger the audio playing program to pause playing; the frozen playing condition comprises: the average power of the at least one audio signal frame is below the first preset threshold, and the audio playing program is in background mode.
  • optionally, the at least one audio signal frame includes M audio signal frames, and the processing module 12 obtains the average power of the at least one audio signal frame by obtaining the average power of each audio signal frame separately. In this case the frozen playing condition includes: among the M audio signal frames, the proportion of frames whose average power is below the first preset threshold exceeds a second preset threshold, and the audio playing program is in background mode; M is any positive integer greater than 1.
  • optionally, the receiving module 11 receives at least one decoded audio signal frame at every preset time interval, and the processing module 12 is further configured to increase the length of the preset time interval if the frozen playing condition is satisfied multiple consecutive times.
  • processing module 12 is further configured to: if the frozen playing condition is not met, trigger the audio playing program to play an audio signal frame.
  • the processing module 12 is configured to obtain the average power of each audio signal frame separately by obtaining the sample values x of the plurality of sampling signals of each audio signal frame and calculating the average power px(m) of each audio signal frame according to formula (1), where m is the number of the audio signal frame and S0 is the number of sampling points of the audio signal frame.
  • the processing module 12 is further configured to: if the audio playing program switches from paused to playing, smooth the audio signal frame to obtain a smoothed output signal, and control the audio playing program to play the smoothed output signal.
  • the device in the embodiment of the present application may further include a storage module, where the storage module is used to store program code and data of the processing device of the audio data.
  • the device in this embodiment may be used to implement the technical solution of the foregoing method embodiment, and the implementation principle and the technical effect are similar, and details are not described herein again.
  • FIG. 6 is a schematic structural diagram of a user equipment according to an embodiment of the present disclosure.
  • the user equipment in this embodiment may include a communication bus 601, and at least one processor 602 and a memory 603 connected to the communication bus 601.
  • the communication bus 601 is used to implement connection communication between devices.
  • the processor 602 can be a central processing unit (CPU), an application-specific integrated circuit (ASIC), one or more integrated circuits that implement the embodiments of the present application, or a system on chip (SoC).
  • One or more programs are stored in the memory 603, and the one or more programs include instructions. When the instructions are executed by the user equipment, the user equipment performs the technical solution of the foregoing method embodiment, and the implementation principle and the technical effect are similar.
  • the user equipment in this embodiment may further include a transceiver 604. The processor 602 may call the instruction code in the memory 603 to control the transceiver 604 to perform the operations of the foregoing method embodiment; the implementation principle and technical effects are similar and are not described here again.
  • the receiving module 11 in the embodiment of the present application may correspond to the transceiver 604 of the user equipment.
  • Processing module 12 may correspond to processor 602 of the user device.
  • the transceiver 604 in the embodiment of the present application may also be a communication interface.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division into units is only a division by logical function; in an actual implementation there may be other divisions. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
  • the above-described integrated unit implemented in the form of a software functional unit can be stored in a computer readable storage medium.
  • the above software functional unit is stored in a storage medium and includes a plurality of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present application.
  • the foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Telephone Function (AREA)

Abstract

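Embodiments of this application provide an audio data processing method and apparatus. The audio data processing method of this application can effectively identify silent playing and process the audio playing program accordingly, thereby effectively reducing power consumption.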

Description

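Audio data processing method and apparatus
This application claims priority to Chinese Patent Application No. 201611259388.2, filed with the Chinese Patent Office on December 30, 2016 and entitled "Method and device for reducing audio playing power consumption of a mobile device", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to multimedia technologies, and in particular, to an audio data processing method and apparatus.
Background
Audio playing on user equipment generally has two modes. One is the normal mode, in which audio is decoded by a software audio decoder on the central processing unit (CPU) side, and multiple audio tracks are then combined by an audio mixer for playing. The other is the offload mode, in which audio is decoded by a DSP decoder on the digital signal processing (DSP) side and then played after processing such as mixing.
Processing in the offload mode generally consumes less power, whereas processing in the normal mode generally consumes more. The power consumption of normal-mode processing therefore needs to be reduced.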
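Summary
Embodiments of this application provide an audio data processing method and apparatus, to reduce the power consumption of user equipment in a silent playing scenario when normal-mode processing is used.
According to a first aspect, an embodiment of this application provides an audio data processing method, including:
receiving at least one decoded audio signal frame, and obtaining the average power of the at least one audio signal frame; and
determining whether a frozen playing condition is satisfied, and if the frozen playing condition is satisfied, triggering the audio playing program to pause playing;
where the frozen playing condition includes: the average power of the at least one audio signal frame is lower than a first preset threshold, and the audio playing program is in background running mode.
This implementation can effectively identify silent playing and process the audio playing program accordingly, thereby effectively reducing the power consumed by normal-mode processing.
With reference to the first aspect, in a possible implementation of the first aspect, the at least one audio signal frame includes M audio signal frames, and the obtaining the average power of the at least one audio signal frame includes: obtaining the average power of each audio signal frame separately;
where the frozen playing condition includes: among the M audio signal frames, the proportion of audio signal frames whose average power is lower than the first preset threshold exceeds a second preset threshold, and the audio playing program is in background running mode; and
M is any positive integer greater than 1.
This implementation can effectively identify silent playing and process the audio playing program accordingly, thereby effectively reducing the power consumed by normal-mode processing. In addition, determining whether the frozen playing condition is satisfied based on the average power of M audio signal frames can further improve the accuracy of identifying a silent playing scenario.
With reference to the first aspect or the foregoing possible implementation of the first aspect, in another possible implementation of the first aspect, the receiving at least one decoded audio signal frame includes: receiving at least one decoded audio signal frame at every preset time interval; and the method further includes:
if the frozen playing condition is satisfied multiple consecutive times, increasing the length of the preset time interval.
In this implementation, updated audio signal frames are obtained promptly at each preset time interval, whether the updated frames satisfy the frozen playing condition is determined, and the handling of the audio playing program is adjusted dynamically based on the result.
With reference to the first aspect or any possible implementation of the first aspect, in another possible implementation of the first aspect, the method further includes: if the frozen playing condition is not satisfied, triggering the audio playing program to play audio signal frames.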
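With reference to the first aspect or any possible implementation of the first aspect, in another possible implementation of the first aspect, the obtaining the average power of each audio signal frame separately includes:
obtaining sample values x of a plurality of sampling signals of each audio signal frame; and
calculating, according to the formula
Figure PCTCN2017086238-appb-000001
the average power px(m) of each audio signal frame;
where m is the number of the audio signal frame, and S0 is the number of sampling points of the audio signal frame.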
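With reference to the first aspect or any possible implementation of the first aspect, in another possible implementation of the first aspect, the method further includes:
if the audio playing program switches from paused to playing, smoothing the audio signal frame to obtain a smoothed output signal; and
controlling the audio playing program to play the smoothed output signal.
In this implementation, smoothing is applied when switching from paused to playing, which can effectively improve the audio experience.
According to a second aspect, an embodiment of this application provides an audio data processing apparatus that has the function of implementing the behavior of the audio data processing apparatus in the foregoing method embodiments. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function.
According to a third aspect, an embodiment of this application provides user equipment, including a processor, a memory, a communication bus, and a communication interface. The memory is configured to store computer-executable instructions, and the processor is connected to the memory through the communication bus. When the user equipment runs, the processor executes the computer-executable instructions stored in the memory, so that the user equipment performs the audio data processing method according to any implementation of the first aspect.
According to a fourth aspect, an embodiment of this application provides a computer-readable storage medium, configured to store computer software instructions used by the foregoing user equipment. When the instructions are run on a computer, the computer can perform the audio data processing method according to any implementation of the first aspect or any implementation of the second aspect.
According to a fifth aspect, an embodiment of this application provides a computer program product containing instructions. When the instructions are run on a computer, the computer can perform the audio data processing method according to any implementation of the first aspect.
In the audio data processing method and apparatus of the embodiments of this application, at least one decoded audio signal frame is received, the average power of the at least one audio signal frame is obtained, and whether the frozen playing condition is satisfied is determined. If the frozen playing condition is satisfied, the audio playing program is triggered to enter a frozen playing state, in which it pauses playing. Silent playing is thus effectively identified and the audio playing program handled accordingly, which can effectively reduce the power consumed by normal-mode processing.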
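Brief Description of Drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the following briefly introduces the accompanying drawings needed for describing the embodiments or the prior art.
FIG. 1 is a schematic diagram of an application scenario of the audio data processing method according to an embodiment of this application;
FIG. 2 is a flowchart of an audio data processing method according to an embodiment of this application;
FIG. 3 is a flowchart of another audio data processing method according to an embodiment of this application;
FIG. 4A is a schematic structural diagram of a silent frame power consumption engine 14 according to an embodiment of this application;
FIG. 4B is a flowchart of another audio data processing method according to an embodiment of this application;
FIG. 4C is a schematic illustration of the frozen playing mechanism according to an embodiment of this application;
FIG. 5 is a schematic structural diagram of an audio data processing apparatus according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of user equipment according to an embodiment of this application.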
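Description of Embodiments
To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the following describes the technical solutions in the embodiments clearly and completely with reference to the accompanying drawings.
User equipment (UE) as used herein may be any applicable end-user device, and may include (or represent) devices such as a wireless transmit/receive unit (WTRU), a mobile station, a mobile node, a mobile device, a fixed or mobile subscriber unit, a pager, a mobile phone, a personal digital assistant (PDA), a smartphone, a notebook computer, a computer, a touchscreen device, a wireless sensor, or a consumer electronics device. A "mobile" station/node/device here means a station/node/device connected to a wireless (or mobile) network, and does not necessarily relate to the actual mobility of that station/node/device.
"A plurality of" as used herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
"Silent playing" as used herein means that the user equipment plays audio signal frames with low audio energy for a certain length of time or continuously. The length of time can be set flexibly as required, for example, 10 min, 30 min, or 1 h. "Low audio energy" means that the energy of an audio signal frame is less than a preset threshold; the preset threshold is explained in the following embodiments.
To make it clear how the audio data processing method of the embodiments of this application can be applied to user equipment, an example application scenario is described below, as shown in FIG. 1. The method can also be applied to other feasible application scenarios.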
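FIG. 1 is a schematic diagram of an application scenario of the audio data processing method according to an embodiment of this application. As shown in FIG. 1, the scenario may include an application program 11, a parser 12, a decoder 13, a silent frame power consumption engine 14, an audio mixer 15, and a speaker 16. The application program 11 may be a media playing application, a game application, or an application of another functional type. The parser 12 may be configured to receive an audio file sent by the application program 11; the audio file may be in MP3 format, OGG (Ogg Vorbis) format, Advanced Audio Coding (AAC) format, or the like. The decoder 13 decodes the audio file to obtain an audio signal stream, which may be a pulse code modulation (PCM) stream. The silent frame power consumption engine 14 performs the audio data processing method of the embodiments of this application: it identifies that the audio playing program is running in the background and is in a silent playing scenario, and processes the audio playing program accordingly, thereby reducing the power consumed by normal-mode processing. The audio mixer 15 mixes the input audio signals and outputs the result. The speaker 16 converts the audio signal output by the audio mixer 15 into sound.
Take the layered structure of the Android system as an example. The structure includes a kernel layer, an application framework layer, an applications layer, and so on. The benefit of layering is that each layer uses what the layer below provides to offer a uniform service to the layer above, hiding the differences of the current and lower layers, so that changes in the current or lower layers do not affect the upper layers. In other words, each layer has its own responsibility and provides a fixed service access point (SAP). The application program 11 is located at the applications layer, and the silent frame power consumption engine 14 is located between the applications layer and the hardware driver (kernel) layer; for example, it may be located in the Libraries layer, or in another layer, depending on the layered structure of the system.
Specifically, an audio playing program may be provided in the application program 11; that is, the audio playing program may be a sub-function of the application program 11 that handles audio files. The audio playing program may send an audio file to the parser 12, and the parser 12 and the decoder 13 decode the audio file to obtain an audio signal stream, which may be a pulse code modulation (PCM) stream. Alternatively, the audio playing program may decode the audio file itself to obtain the audio signal stream. The silent frame power consumption engine 14 receives the audio signal stream from the audio playing program and processes it, so that in a scenario where the audio playing program runs in the background and plays low-energy audio for a long time or continuously, the power consumed by normal-mode processing is reduced. For the specific implementation, see the explanation in the following embodiments.
An audio signal stream, or PCM stream, as used herein includes a plurality of PCM signals. An audio signal frame as used herein is the PCM signal within a unit of time, where the unit of time may be 1 ms, 10 ms, 20 ms, or the like, and can be set flexibly as required. The number of PCM signals per unit of time depends on the sampling rate: the higher the sampling rate, the more PCM signals per unit of time. A PCM signal may also be called a sampling signal.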
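The relationship between frame duration, sampling rate, and the number of sampling signals per frame can be illustrated with a short calculation. This is only a sketch: the function name `samples_per_frame` and the example rates are assumptions chosen for illustration and do not appear in the application.

```python
def samples_per_frame(sample_rate_hz: int, frame_ms: float) -> int:
    """Number of PCM samples (per channel) in one frame of the given duration.

    Illustrative helper, not part of the original text: a frame is the PCM
    signal within a unit of time, so its sample count is rate * duration.
    """
    if sample_rate_hz <= 0 or frame_ms <= 0:
        raise ValueError("sample rate and frame duration must be positive")
    return int(round(sample_rate_hz * frame_ms / 1000.0))


if __name__ == "__main__":
    # A higher sampling rate gives more samples for the same frame duration.
    for rate in (8000, 44100, 48000):
        print(rate, "Hz, 10 ms frame:", samples_per_frame(rate, 10), "samples")
```

With a 10 ms frame, for instance, doubling the sampling rate doubles the number of samples the power calculation has to visit per frame.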
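FIG. 2 is a flowchart of an audio data processing method according to an embodiment of this application. As shown in FIG. 2, the method of this embodiment may include:
Step 101: Receive at least one decoded audio signal frame, and obtain the average power of the at least one audio signal frame.
The decoded audio signal frame may be obtained by the decoder 13 decoding the audio signal frame of the audio playing program, or may be obtained by the audio playing program decoding it itself.
Specifically, the silent frame power consumption engine 14 receives one or more decoded audio signal frames and obtains the average power of the received frame or frames. There are many ways to obtain the average power of an audio signal frame, and one can be chosen flexibly as required; for example, the energy values of the sampling signals of the audio signal frame can be obtained, and the average power of the frame determined from those energy values.
Step 102: Determine whether the frozen playing condition is satisfied. If yes, perform step 103; otherwise, perform step 104.
The frozen playing condition includes: the average power of the at least one audio signal frame is lower than a first preset threshold, and the audio playing program is in background running mode. That is, when the average power of one or more audio signal frames is lower than the first preset threshold, those frames can be determined to be silent audio signal frames; if the audio playing program is also in background running mode, the audio playing program can be triggered to enter the frozen playing state.
Step 103: If the frozen playing condition is satisfied, trigger the audio playing program to pause playing.
Specifically, when the frozen playing condition is satisfied, the silent frame power consumption engine 14 triggers the audio playing program to enter the frozen playing state, in which the audio playing program pauses playing. Pausing may be implemented as follows: (1) in pull mode, the audio subsystem stops requesting audio signal frames from the audio playing program; (2) in push mode, the audio playing program is blocked from pushing audio signal frames to the audio subsystem.
The audio subsystem includes the parser, the decoder, the software decoder, and the mixer. That is, in the frozen playing state, all audio processing activities of the audio subsystem are stopped. After the audio playing program pauses, audio-related network access activities also stop, for example, requests for data from a server in the network, so power consumption can be effectively reduced.
Step 104: If the frozen playing condition is not satisfied, the audio playing program plays audio signal frames.
Specifically, if the frozen playing condition is not satisfied, the current playing state of the audio playing program can be obtained. If the current state is frozen playing, the audio playing program can be triggered to leave the frozen playing state, after which it plays the audio signal frame at the corresponding time point. If the current state is not frozen playing, nothing is done to the audio playing program, so that it plays the audio signal frame at the corresponding time point normally.
In this embodiment, at least one decoded audio signal frame is received from the audio playing program, the average power of the at least one audio signal frame is obtained, and whether the frozen playing condition is satisfied is determined. If it is satisfied, the audio playing program is triggered to enter the frozen playing state, in which it pauses playing. Silent playing is thus effectively identified and the audio playing program handled accordingly, which can effectively reduce the power consumed by normal-mode processing.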
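The decision in steps 102 to 104 can be sketched as a small pure function. This is a minimal illustration under stated assumptions: the names `Action` and `decide` and the idea of returning an action value are hypothetical, and the average power and background flag are taken as inputs rather than obtained from a real audio subsystem.

```python
from enum import Enum


class Action(Enum):
    PAUSE = "pause"    # step 103: enter the frozen playing state
    RESUME = "resume"  # step 104: leave the frozen playing state and play
    NONE = "none"      # keep the current state unchanged


def decide(avg_power: float, first_threshold: float,
           in_background: bool, currently_frozen: bool) -> Action:
    """Sketch of steps 102-104 for a single average-power value."""
    # Step 102: both parts of the frozen playing condition must hold.
    condition_met = avg_power < first_threshold and in_background
    if condition_met:
        # Already frozen: nothing to trigger; otherwise pause (step 103).
        return Action.NONE if currently_frozen else Action.PAUSE
    # Step 104: unfreeze if frozen, otherwise let playback continue as is.
    return Action.RESUME if currently_frozen else Action.NONE


if __name__ == "__main__":
    print(decide(0.0, 1e-4, in_background=True, currently_frozen=False))
```

Keeping the decision free of side effects makes it easy to test separately from whatever mechanism (pull-mode or push-mode blocking) actually pauses playback.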
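The technical solution of the method embodiment shown in FIG. 2 is described in detail below using several specific embodiments.
FIG. 3 is a flowchart of another audio data processing method according to an embodiment of this application. As shown in FIG. 3, this embodiment calculates the energy of multiple consecutive audio signal frames and decides, based on the average power of those frames, whether to trigger the audio playing program to enter the frozen playing state, which can further improve the accuracy of identifying a silent playing scenario. The method of this embodiment may include:
Step 201: Continuously receive M decoded audio signal frames from the audio playing program.
M is any positive integer greater than 1, and its value can be set flexibly as required.
Specifically, the silent frame power consumption engine 14 continuously receives M decoded audio signal frames from the audio playing program.
Step 202: Obtain the energy values of the plurality of sampling signals of each audio signal frame, and determine the average power of each audio signal frame from those energy values.
Specifically, the silent frame power consumption engine 14 calculates the average power of each audio signal frame separately.
Step 203: Determine whether the frozen playing condition is satisfied. If yes, perform step 204; otherwise, perform step 205.
The frozen playing condition includes: among the M audio signal frames, the proportion of audio signal frames whose average power is lower than the first preset threshold exceeds a second preset threshold, and the audio playing program is in background running mode. That is, the frozen playing condition of this embodiment requires that the frames whose average power is lower than the first preset threshold account for more than the second preset threshold of the M audio signal frames.
Step 204: If the frozen playing condition is satisfied, trigger the audio playing program to enter the frozen playing state, in which the audio playing program pauses playing.
Step 205: If the frozen playing condition is not satisfied, the audio playing program plays the audio signal frame at the corresponding time point.
For details of step 204 and step 205, see step 103 and step 104 of the embodiment shown in FIG. 2; they are not repeated here.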
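Optionally, one specific implementation of step 202 is: obtain the sample values x of the plurality of sampling signals of each audio signal frame, and calculate the average power px(m) of each audio signal frame according to formula (1).
Figure PCTCN2017086238-appb-000002
where m is the number of the audio signal frame, and S0 is the number of sampling points of the audio signal frame.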
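The image of formula (1) is not reproduced in this text, so the following sketch assumes the conventional mean-square definition of frame power, px(m) = (1/S0) · Σ x(n)² over the S0 samples of frame m. That form is consistent with the stated roles of x and S0 but may differ in detail from the original figure; the function name is hypothetical.

```python
from typing import Sequence


def frame_average_power(samples: Sequence[float]) -> float:
    """Mean-square power of one frame (assumed form of formula (1)).

    `samples` holds the S0 sample values x of a single audio signal frame.
    """
    s0 = len(samples)
    if s0 == 0:
        raise ValueError("a frame must contain at least one sample")
    # Sum of per-sample energies x^2, divided by the number of samples S0.
    return sum(x * x for x in samples) / s0


if __name__ == "__main__":
    print(frame_average_power([0.0, 0.0, 0.0, 0.0]))   # digital silence
    print(frame_average_power([1.0, -1.0, 1.0, -1.0]))  # full-scale square wave
```

Squaring each sample makes the measure independent of sign, so a frame of near-zero samples yields a power near zero regardless of waveform shape.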
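Before step 203, the proportion of audio signal frames among the M frames whose average power is lower than the first preset threshold may also be obtained. Specifically, the proportion may be calculated according to formula (2).
γ = Cfe(m) / M × 100%               (2)
where Cfe(m) is the number of audio signal frames whose average power is lower than the first preset threshold, and γ is the proportion. If γ is greater than the second preset threshold, the frozen playing condition is determined to be satisfied.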
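Formula (2) and the resulting multi-frame condition can be sketched as follows. The function names are hypothetical, and the proportion is returned as a fraction between 0 and 1 rather than a percentage, so the second threshold in this sketch is assumed to be expressed on the same scale.

```python
from typing import Sequence


def silent_ratio(frame_powers: Sequence[float], first_threshold: float) -> float:
    """Formula (2) as a fraction: Cfe / M, where Cfe counts low-power frames."""
    m = len(frame_powers)
    if m == 0:
        raise ValueError("at least one frame power is required")
    cfe = sum(1 for p in frame_powers if p < first_threshold)
    return cfe / m


def freeze_condition_met(frame_powers: Sequence[float], first_threshold: float,
                         second_threshold: float, in_background: bool) -> bool:
    """Condition of step 203: the ratio exceeds the second threshold, and the
    audio playing program runs in the background."""
    return in_background and silent_ratio(frame_powers, first_threshold) > second_threshold


if __name__ == "__main__":
    powers = [0.0, 0.0, 0.5, 0.0]  # three of four frames are below 0.01
    print(silent_ratio(powers, 0.01))
    print(freeze_condition_met(powers, 0.01, 0.7, in_background=True))
```

Using a proportion instead of requiring every frame to be silent tolerates an occasional louder frame, which is why the text describes it as improving recognition accuracy.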
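In this embodiment, M decoded audio signal frames are received continuously, the energy values of the plurality of sampling signals of each audio signal frame are obtained, the average power of each audio signal frame is determined from those energy values, and whether the frozen playing condition is satisfied is determined. If it is satisfied, the audio playing program is triggered to enter the frozen playing state, in which it pauses playing. Silent playing is thus effectively identified and the audio playing program handled accordingly, which can effectively reduce the power consumed by normal-mode processing.
In addition, determining whether the frozen playing condition is satisfied based on the average power of M audio signal frames can further improve the accuracy of identifying a silent playing scenario.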
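FIG. 4A is a schematic structural diagram of a silent frame power consumption engine 14 according to an embodiment of this application. As shown in FIG. 4A, the silent frame power consumption engine 14 may include a silent frame determining module 141, a frozen playing policy module 142, and a play control module 143. Optionally, the silent frame power consumption engine 14 may further include a delay buffer module 144 and a noise floor smoothing module 145.
It should be noted that the foregoing division of the silent frame power consumption engine 14 into modules is a logical division, and other divisions are possible. The embodiments of this application use the foregoing structure for illustration.
FIG. 4B is a flowchart of another audio data processing method according to an embodiment of this application, and FIG. 4C is a schematic illustration of the frozen playing mechanism according to an embodiment of this application. As shown in FIG. 4B, the method of this embodiment may include:
S301: The application program sends audio signal frames to the silent frame determining module 141.
The silent frame determining module 141 receives the audio signal frames sent by the application program.
S302: The silent frame determining module 141 receives an audio signal frame, obtains the energy values of the plurality of sampling signals of the audio signal frame, and determines the average power of the audio signal frame from those energy values.
Specifically, the average power of the audio signal frame can be calculated using formula (1).
S303: The silent frame determining module 141 determines whether the average power of the audio signal frame is less than the first preset threshold. If yes, S304 is performed; otherwise, S302 is performed.
S304: The silent frame determining module 141 triggers the play control module 143 to perform play control.
Specifically, the silent frame determining module 141 may send the determination result of S303 to the play control module 143.
S305: The play control module 143 calculates the proportion of audio signal frames among the M frames whose average power is lower than the first preset threshold, and determines whether the frozen playing condition is satisfied. If yes, S306 is performed; otherwise, S307 is performed.
Specifically, the play control module 143 continuously receives silent-frame determination results from the silent frame determining module 141; that is, when the average power of multiple consecutive audio signal frames is less than the first preset threshold, the play control module 143 can calculate the foregoing proportion. Whether the frozen playing condition is satisfied may be determined by checking whether the proportion exceeds the second preset threshold and whether the application program is in background running mode; if both hold, the frozen playing condition is satisfied. The play control module 143 may send the result that the frozen playing condition is satisfied to the frozen playing policy module 142, and the frozen playing policy module 142 returns a specific play control policy to the play control module 143, for example, controlling the application program to enter the frozen playing state.
S306: The play control module 143 triggers the application program to enter the frozen playing state.
FIG. 4C is used as an illustrative example. The horizontal axis of FIG. 4C is time, and the vertical axis indicates the energy value of the sampling signals. As shown on the left of FIG. 4C, before the frozen playing state is entered, the application program runs in the foreground. When the application program runs in the background and the silent frame determining module 141 identifies an audio signal frame whose average power is lower than the first preset threshold, the play control module 143 may mark the application program as being in a pending-freeze state, in which the application program plays audio signal frames normally. When the play control module 143 learns from the silent frame determining module 141 that the average power of multiple consecutive audio signal frames is lower than the first preset threshold, and the proportion of frames whose power is lower than the first preset threshold exceeds the second preset threshold, the play control module 143 may mark the application program as being in the frozen playing state and trigger the audio playing program to enter the frozen playing state. As shown in FIG. 4C, playing of audio signal frames is paused in the frozen playing state.
Optionally, a frozen playing maintenance window may also be set. After the audio playing program has been triggered to enter the frozen playing state, the maintenance window is used to obtain, at every preset time interval, the decoded audio signal frame at the corresponding time point from the audio playing program and to determine whether the frozen playing condition is satisfied; if it is satisfied, the length of the preset time interval is increased. As shown in FIG. 4C, each time the maintenance window check results in the frozen playing state being kept, the interval of the maintenance window is gradually increased, for example, t1 = 30 ms, t2 = 2 s, t3 = 5 min, ..., tn = 6 h.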
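The growing maintenance-window interval can be sketched as a lookup into an increasing schedule. This is an illustration only: the tuple below contains just the four example values named in the text (the values between t3 and tn are elided in the source and are not filled in here), and the function names and the cap at the last value are assumptions.

```python
from typing import Iterable, List, Sequence

# Example values from the text, in seconds: 30 ms, 2 s, 5 min, 6 h.
EXAMPLE_INTERVALS_S = (0.03, 2.0, 300.0, 6 * 3600.0)


def next_check_interval(step: int,
                        intervals: Sequence[float] = EXAMPLE_INTERVALS_S) -> float:
    """Wait before the next maintenance check.

    `step` counts how many consecutive checks have kept the frozen state;
    the interval grows with it and is capped at the last schedule entry
    (the cap is an assumption of this sketch).
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    return intervals[min(step, len(intervals) - 1)]


def waits_until_unfreeze(check_results: Iterable[bool],
                         intervals: Sequence[float] = EXAMPLE_INTERVALS_S) -> List[float]:
    """Waits performed until a check no longer satisfies the freeze condition.

    Each element of `check_results` is True if that check still meets the
    frozen playing condition.
    """
    waits = []
    for step, still_silent in enumerate(check_results):
        waits.append(next_check_interval(step, intervals))
        if not still_silent:
            break  # condition no longer met: playback would be unfrozen here
    return waits


if __name__ == "__main__":
    print(waits_until_unfreeze([True, True, False]))
```

The longer the silence persists, the less often the engine wakes up to check, which is where most of the saving in a long silent scenario would come from.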
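S307: The play control module 143 controls the application program to play the audio signal frame at the corresponding time point.
Optionally, if the application program switches from the frozen playing state to the non-frozen playing state, that is, from paused to playing, the noise floor smoothing module 145 smooths the audio signal frame at the corresponding time point to obtain a smoothed output signal, and the application program is controlled to play the smoothed output signal. The delay buffer module 144 buffers audio signal frames and provides the audio signal frames needed for smoothing, that is, the mute signal s(n) required by formula (3). As shown in FIG. 4C, smoothing is performed between the frozen playing state and the released (unfrozen) state, which ensures a good audio experience during the switch.
The smoothing may specifically use the following interpolation function:
Figure PCTCN2017086238-appb-000003
where the original audio signal and the mute signal are m(n) and s(n), respectively, and the smoothed output is Sout(n). M is the length of the smooth transition, and ramp is the transition time variable, which ranges from 0 to M. The smoothness of the switch is determined jointly by ramp and M: when ramp = 0, Sout(n) = s(n); when ramp = M, Sout(n) = m(n).
With formula (3), on the unfreezing side, as the sample index n increases, the original audio signal gradually dominates the smoothed output and the mute component gradually weakens, until the output finally consists entirely of the original audio.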
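The image of formula (3) is not reproduced in this text. The sketch below assumes a linear crossfade, Sout(n) = (ramp/M) · m(n) + (1 − ramp/M) · s(n), which satisfies the two boundary conditions stated above (ramp = 0 gives s(n), ramp = M gives m(n)) but may not be the exact interpolation function of the original figure. The function name, the parameter name `ramp_len` (standing in for the transition length M), and the choice of advancing ramp by one per sample are assumptions.

```python
from typing import List, Sequence


def smooth_transition(original: Sequence[float], silence: Sequence[float],
                      ramp_len: int) -> List[float]:
    """Linear crossfade from the mute signal s(n) to the original signal m(n).

    Assumed form of formula (3): the weight of the original signal rises
    from 0 to 1 over `ramp_len` samples and then stays at 1.
    """
    if ramp_len <= 0:
        raise ValueError("ramp_len must be positive")
    out = []
    for n, (m_n, s_n) in enumerate(zip(original, silence)):
        ramp = min(n, ramp_len)   # transition variable, 0..ramp_len
        w = ramp / ramp_len       # share of the original signal
        out.append(w * m_n + (1.0 - w) * s_n)
    return out


if __name__ == "__main__":
    # Unfreezing: fade from silence into a constant full-scale signal.
    print(smooth_transition([1.0] * 6, [0.0] * 6, ramp_len=4))
```

A gradual ramp avoids the abrupt level jump (an audible click) that would occur if playback switched directly from the mute signal to the original signal.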
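In this embodiment, M decoded audio signal frames are received continuously, the energy values of the plurality of sampling signals of each audio signal frame are obtained, the average power of each audio signal frame is determined from those energy values, and whether the frozen playing condition is satisfied is determined. If it is satisfied, the audio playing program is triggered to enter the frozen playing state, in which it pauses playing. Silent playing is thus effectively identified and the audio playing program handled accordingly, which can effectively reduce the power consumed by normal-mode processing.
In addition, determining whether the frozen playing condition is satisfied based on the average power of M audio signal frames can further improve the accuracy of identifying a silent playing scenario.
Furthermore, smoothing when switching from the frozen playing state to the released state can effectively improve the audio experience.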
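The audio data processing apparatus of the embodiments of this application is the silent frame power consumption engine shown in FIG. 1.
FIG. 5 is a schematic structural diagram of an audio data processing apparatus according to an embodiment of this application. As shown in FIG. 5, the apparatus of this embodiment may include a receiving module 11 and a processing module 12. The receiving module 11 is configured to receive at least one decoded audio signal frame, and the processing module 12 is configured to obtain the average power of the at least one audio signal frame. The processing module 12 is further configured to determine whether the frozen playing condition is satisfied and, if it is satisfied, trigger the audio playing program to pause playing; where the frozen playing condition includes: the average power of the at least one audio signal frame is lower than the first preset threshold, and the audio playing program is in background running mode.
Optionally, the at least one audio signal frame includes M audio signal frames, and the processing module 12 obtains the average power of the at least one audio signal frame by obtaining the average power of each audio signal frame separately; where the frozen playing condition includes: among the M audio signal frames, the proportion of audio signal frames whose average power is lower than the first preset threshold exceeds the second preset threshold, and the audio playing program is in background running mode; M is any positive integer greater than 1.
Optionally, the receiving module 11 being configured to receive at least one decoded audio signal frame includes: receiving at least one decoded audio signal frame at every preset time interval; and the processing module 12 is further configured to: if the frozen playing condition is satisfied multiple consecutive times, increase the length of the preset time interval.
Optionally, the processing module 12 is further configured to: if the frozen playing condition is not satisfied, trigger the audio playing program to play audio signal frames.
Optionally, the processing module 12 being configured to obtain the average power of each audio signal frame separately includes: obtaining the sample values x of the plurality of sampling signals of each audio signal frame; and calculating, according to the formula
Figure PCTCN2017086238-appb-000004
the average power px(m) of each audio signal frame; where m is the number of the audio signal frame, and S0 is the number of sampling points of the audio signal frame.
Optionally, the processing module 12 is further configured to: if the audio playing program switches from paused to playing, smooth the audio signal frame to obtain a smoothed output signal; and control the audio playing program to play the smoothed output signal.
Optionally, the apparatus of this embodiment may further include a storage module, configured to store the program code and data of the audio data processing apparatus.
The apparatus of this embodiment may be used to perform the technical solutions of the foregoing method embodiments; its implementation principle and technical effects are similar and are not repeated here.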
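FIG. 6 is a schematic structural diagram of user equipment according to an embodiment of this application. As shown in FIG. 6, the user equipment of this embodiment may include a communication bus 601, and at least one processor 602 and a memory 603 connected to the communication bus 601. The communication bus 601 is used for connection and communication between the components. The processor 602 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), one or more integrated circuits that implement the embodiments of this application, or a system on chip (SoC). The memory 603 stores one or more programs, and the one or more programs include instructions. When the instructions are executed by the user equipment, the user equipment performs the technical solutions of the foregoing method embodiments; the implementation principle and technical effects are similar and are not repeated here. Optionally, the user equipment of this embodiment may further include a transceiver 604. The processor 602 may call the instruction code in the memory 603 to control the transceiver 604 to perform the operations of the foregoing method embodiments; the implementation principle and technical effects are similar and are not repeated here.
In one implementation, the receiving module 11 in the embodiments of this application may correspond to the transceiver 604 of the user equipment, and the processing module 12 may correspond to the processor 602 of the user equipment.
In one implementation, the transceiver 604 in the embodiments of this application may also be a communication interface.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function; in an actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the foregoing division into functional modules is used only as an example. In practical applications, the functions can be assigned to different functional modules as needed; that is, the internal structure of the apparatus can be divided into different functional modules to complete all or part of the functions described above. For the specific working process of the apparatus described above, refer to the corresponding process in the foregoing method embodiments; details are not repeated here.
Finally, it should be noted that the foregoing embodiments are intended only to describe the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.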

Claims (14)

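  1. An audio data processing method, comprising:
    receiving at least one decoded audio signal frame, and obtaining the average power of the at least one audio signal frame; and
    determining whether a frozen playing condition is satisfied, and if the frozen playing condition is satisfied, triggering an audio playing program to pause playing;
    wherein the frozen playing condition comprises: the average power of the at least one audio signal frame is lower than a first preset threshold, and the audio playing program is in background running mode.
  2. The method according to claim 1, wherein the at least one audio signal frame comprises M audio signal frames, and the obtaining the average power of the at least one audio signal frame comprises:
    obtaining the average power of each audio signal frame separately;
    wherein the frozen playing condition comprises: among the M audio signal frames, the proportion of audio signal frames whose average power is lower than the first preset threshold exceeds a second preset threshold, and the audio playing program is in background running mode; and
    M is any positive integer greater than 1.
  3. The method according to claim 1 or 2, wherein the receiving at least one decoded audio signal frame comprises:
    receiving at least one decoded audio signal frame at every preset time interval;
    and the method further comprises:
    if the frozen playing condition is satisfied multiple consecutive times, increasing the length of the preset time interval.
  4. The method according to any one of claims 1 to 3, wherein the method further comprises: if the frozen playing condition is not satisfied, triggering the audio playing program to play audio signal frames.
  5. The method according to any one of claims 2 to 4, wherein the obtaining the average power of each audio signal frame separately comprises:
    obtaining sample values x of a plurality of sampling signals of each audio signal frame; and
    calculating, according to the formula
    Figure PCTCN2017086238-appb-100001
    the average power px(m) of each audio signal frame;
    wherein m is the number of the audio signal frame, and S0 is the number of sampling points of the audio signal frame.
  6. The method according to any one of claims 1 to 5, wherein the method further comprises:
    if the audio playing program switches from paused to playing, smoothing the audio signal frame to obtain a smoothed output signal; and
    controlling the audio playing program to play the smoothed output signal.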
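  7. An audio data processing apparatus, comprising:
    a receiving module, configured to receive at least one decoded audio signal frame; and
    a processing module, configured to obtain the average power of the at least one audio signal frame;
    wherein the processing module is further configured to determine whether a frozen playing condition is satisfied, and if the frozen playing condition is satisfied, trigger the audio playing program to pause playing;
    wherein the frozen playing condition comprises: the average power of the at least one audio signal frame is lower than a first preset threshold, and the audio playing program is in background running mode.
  8. The apparatus according to claim 7, wherein the at least one audio signal frame comprises M audio signal frames, and the processing module being configured to obtain the average power of the at least one audio signal frame comprises:
    obtaining the average power of each audio signal frame separately;
    wherein the frozen playing condition comprises: among the M audio signal frames, the proportion of audio signal frames whose average power is lower than the first preset threshold exceeds a second preset threshold, and the audio playing program is in background running mode; M is any positive integer greater than 1.
  9. The apparatus according to claim 8, wherein the receiving module being configured to receive at least one decoded audio signal frame comprises:
    receiving at least one decoded audio signal frame at every preset time interval;
    and the processing module is further configured to:
    if the frozen playing condition is satisfied multiple consecutive times, increase the length of the preset time interval.
  10. The apparatus according to any one of claims 7 to 9, wherein the processing module is further configured to: if the frozen playing condition is not satisfied, trigger the audio playing program to play audio signal frames.
  11. The apparatus according to any one of claims 8 to 10, wherein the processing module being configured to obtain the average power of each audio signal frame separately comprises:
    obtaining sample values x of a plurality of sampling signals of each audio signal frame; and
    calculating, according to the formula
    Figure PCTCN2017086238-appb-100002
    the average power px(m) of each audio signal frame;
    wherein m is the number of the audio signal frame, and S0 is the number of sampling points of the audio signal frame.
  12. The apparatus according to any one of claims 7 to 11, wherein the processing module is further configured to:
    if the audio playing program switches from paused to playing, smooth the audio signal frame to obtain a smoothed output signal; and
    control the audio playing program to play the smoothed output signal.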
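  13. User equipment, wherein the user equipment comprises a memory, a bus system, and at least one processor, and the memory and the at least one processor are connected through the bus system;
    the memory stores one or more programs, the one or more programs comprise instructions, and the instructions, when executed by the user equipment, cause the user equipment to perform the method according to any one of claims 1 to 6.
  14. A computer-readable storage medium storing one or more programs, wherein the one or more programs comprise instructions, and the instructions, when executed by user equipment, cause the user equipment to perform the method according to any one of claims 1 to 6.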
PCT/CN2017/086238 2016-12-30 2017-05-27 Audio data processing method and apparatus WO2018120627A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201780009194.3A CN108605162B (zh) 2016-12-30 2017-05-27 Audio data processing method and apparatus, user equipment, and storage medium
US16/474,836 US10979469B2 (en) 2016-12-30 2017-05-27 Audio data processing method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611259388 2016-12-30
CN201611259388.2 2016-12-30

Publications (1)

Publication Number Publication Date
WO2018120627A1 true WO2018120627A1 (zh) 2018-07-05

Family

ID=62706903

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/086238 WO2018120627A1 (zh) 2016-12-30 2017-05-27 Audio data processing method and apparatus

Country Status (3)

Country Link
US (1) US10979469B2 (zh)
CN (1) CN108605162B (zh)
WO (1) WO2018120627A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
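CN111459759A (zh) * 2020-03-31 2020-07-28 Oppo广东移动通信有限公司 Electronic device, method for managing applications thereof, and computer storage medium
CN111787268B (zh) * 2020-07-01 2022-04-22 广州视源电子科技股份有限公司 Audio signal processing method and apparatus, electronic device, and storage medium
CN114005469A (zh) * 2021-10-20 2022-02-01 广州市网星信息技术有限公司 Audio playing method and system for automatically skipping silent segments
CN113986190A (zh) * 2021-11-02 2022-01-28 维沃移动通信有限公司 Application processing method and apparatus, and electronic device
CN117193697A (zh) * 2022-05-30 2023-12-08 华为技术有限公司 Audio playing method and apparatus, and electronic device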

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
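KR100820905B1 (ko) * 2006-11-20 2008-04-11 (주)한우리아이티 Audio signal output control apparatus and control method thereof
CN105404654A (zh) * 2015-10-30 2016-03-16 魅族科技(中国)有限公司 Audio file playing method and apparatus
CN105429984A (zh) * 2015-11-27 2016-03-23 刘军 Media playing method and device, and music teaching system
CN105704609A (zh) * 2016-01-25 2016-06-22 广州视源电子科技股份有限公司 Audio equipment mode adjustment method and apparatus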

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8868023B2 (en) * 2008-01-04 2014-10-21 3D Radio Llc Digital radio systems and methods
CN1992535A (zh) * 2005-12-31 2007-07-04 英华达(南京)科技有限公司 Method and apparatus for automatically muting a radio
CN101848280A (zh) 2009-03-25 2010-09-29 深圳富泰宏精密工业有限公司 Power-saving system and method for playing music in mute mode
US8983640B2 (en) * 2009-06-26 2015-03-17 Intel Corporation Controlling audio players using environmental audio analysis
CN102098606A (zh) 2009-12-10 2011-06-15 腾讯科技(深圳)有限公司 Method and apparatus for dynamic volume adjustment
US10244102B2 (en) * 2015-08-20 2019-03-26 Samsung Electronics Co., Ltd. Method and apparatus for managing application data usage

Also Published As

Publication number Publication date
CN108605162A (zh) 2018-09-28
CN108605162B (zh) 2020-11-06
US20190327284A1 (en) 2019-10-24
US10979469B2 (en) 2021-04-13

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17887605

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17887605

Country of ref document: EP

Kind code of ref document: A1