CN111787268A - Audio signal processing method and device, electronic equipment and storage medium - Google Patents

Publication number
CN111787268A
Authority
CN
China
Prior art keywords
audio signal
audio
frame
amplitude
energy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010622129.1A
Other languages
Chinese (zh)
Other versions
CN111787268B (en
Inventor
刘荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Guangzhou Shirui Electronics Co Ltd
Original Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Guangzhou Shirui Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shiyuan Electronics Thecnology Co Ltd, Guangzhou Shirui Electronics Co Ltd filed Critical Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority to CN202010622129.1A priority Critical patent/CN111787268B/en
Publication of CN111787268A publication Critical patent/CN111787268A/en
Application granted granted Critical
Publication of CN111787268B publication Critical patent/CN111787268B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • H04N7/155 Conference systems involving storage of or access to video conference sessions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H04L65/403 Arrangements for multi-party communication, e.g. for conferences
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/75 Media network packet handling

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention provides a method and a device for processing audio signals, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring an audio signal sent by an opposite terminal communication node received in an anti-jitter buffer of terminal equipment; and adjusting the playing speed of the audio frames in the audio signal according to the energy and/or amplitude of the audio frames in the audio signal. Compared with the prior art, when the playing speed of the audio frame in the audio signal is adjusted, the energy and/or the amplitude of the audio frame in the audio signal are referred to, the audio frame with lower energy or lower amplitude in the audio signal is adjusted greatly, and the audio frame with higher energy or larger amplitude in the audio signal is adjusted slightly, so that the stability of the audio signal during playing can be improved, the perception of a user on the speed change of the audio signal is reduced, and the experience of the user is improved.

Description

Audio signal processing method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of voice communication, and in particular, to a method and an apparatus for processing an audio signal, an electronic device, and a storage medium.
Background
In a video conference, when the network is unstable, audio signals are briefly lost and accumulated. In this case, the anti-jitter buffer provided in the terminal device is usually increased to make the sound playing smooth. But too large an anti-jitter buffer may result in too much sound delay and thus degrade the user experience.
In the prior art, in order to avoid the excessive sound delay caused by the anti-jitter buffer, there is a method for automatically adjusting the playing speed. When the network is abnormal, the playing speed of the audio signal is slowed down, and when the network is recovered, the previous audio signal is obtained again and the playing speed of the audio signal is accelerated.
However, although the conventional method for adjusting the playing speed of the audio signal avoids excessive sound delay when the network is abnormal, the audio signal may be played alternately fast and slow, which causes poor stability during playback and further degrades the user experience.
Disclosure of Invention
The invention provides a method and a device for processing an audio signal, electronic equipment and a storage medium, which aim to solve the problem of poor stability of the audio signal during playing in the prior art.
A first aspect of the present invention provides a method of processing an audio signal, the method comprising:
acquiring an audio signal sent by an opposite terminal communication node received in an anti-jitter buffer of terminal equipment;
and adjusting the playing speed of the audio frames in the audio signal according to the energy and/or amplitude of the audio frames in the audio signal.
In an optional embodiment, the adjusting the playing speed of the audio frames in the audio signal according to the energy and/or amplitude of the audio frames in the audio signal includes:
determining a speed variation corresponding to each audio frame in the audio signal according to the energy and/or amplitude of the audio frame in the audio signal;
and adjusting the playing speed of each audio frame in the audio signal according to the speed variation corresponding to each audio frame in the audio signal.
In an optional embodiment, the determining, according to the energy and/or amplitude of an audio frame in the audio signal, a speed variation corresponding to each audio frame in the audio signal includes:
determining an average amplitude of audio in the audio signal from the amplitude of each audio frame in the audio signal;
and determining the speed variation corresponding to each audio frame in the audio signal according to the average amplitude of the audio in the audio signal and the amplitude of each audio frame in the audio signal.
In an optional embodiment, the adjusting the playing speed of the audio frames in the audio signal according to the energy and/or amplitude of the audio frames in the audio signal includes:
determining a mute frame in the audio signal according to the energy and/or amplitude of an audio frame in the audio signal;
and adjusting the playing speed of the mute frame in the audio signal.
In an optional embodiment, the adjusting the playing speed of the audio frames in the audio signal according to the energy and/or amplitude of the audio frames in the audio signal includes:
and if the network abnormality of the terminal equipment is detected, reducing the playing speed of the audio frame in the audio signal according to the energy and/or amplitude of the audio frame in the audio signal.
In an optional embodiment, the adjusting the playing speed of the audio frames in the audio signal according to the energy and/or amplitude of the audio frames in the audio signal includes:
and if the terminal equipment is detected to recover from the network abnormality, accelerating the playing speed of the audio frame in the audio signal according to the energy and/or amplitude of the audio frame in the audio signal.
A second aspect of the present invention provides an apparatus for processing an audio signal, the apparatus comprising:
an acquisition module, configured to acquire an audio signal sent by an opposite terminal communication node and received in an anti-jitter buffer of terminal equipment;
and the adjusting module is used for adjusting the playing speed of the audio frame in the audio signal according to the energy and/or the amplitude of the audio frame in the audio signal.
In an optional implementation manner, the adjusting module is specifically configured to determine, according to energy and/or amplitude of an audio frame in the audio signal, a speed variation corresponding to each audio frame in the audio signal; and adjusting the playing speed of each audio frame in the audio signal according to the speed variation corresponding to each audio frame in the audio signal.
In an optional embodiment, the adjusting module is specifically configured to determine an average amplitude of audio in the audio signal according to an amplitude of each audio frame in the audio signal; and determining the speed variation corresponding to each audio frame in the audio signal according to the average amplitude of the audio in the audio signal and the amplitude of each audio frame in the audio signal.
In an optional embodiment, the adjusting module is specifically configured to determine a mute frame in the audio signal according to an energy and/or an amplitude of an audio frame in the audio signal; and adjusting the playing speed of the mute frame in the audio signal.
In an optional implementation manner, the adjusting module is specifically configured to, if a network anomaly of the terminal device is detected, reduce a playing speed of an audio frame in the audio signal according to energy and/or amplitude of the audio frame in the audio signal.
In an optional implementation manner, the adjusting module is specifically configured to, if it is detected that the terminal device recovers from a network abnormality, increase a playing speed of an audio frame in the audio signal according to energy and/or amplitude of the audio frame in the audio signal.
A third aspect of the present invention provides an electronic device, including: a memory, a processor and a computer program, the computer program being stored in the memory, and the processor running the computer program to perform the audio signal processing method of the first aspect and its various optional embodiments.
A fourth aspect of the present invention provides a storage medium having stored thereon a computer program for executing the method of processing the audio signal according to the first aspect and its various alternatives.
According to the audio signal processing method and device, the electronic device and the storage medium, the audio signal sent by the opposite terminal communication node and received in the anti-jitter buffer of the terminal device is acquired, and the playing speed of the audio frames in the audio signal is then adjusted according to the energy and/or amplitude of those audio frames. Compared with the prior art, the energy and/or amplitude of each audio frame is taken into account when the playing speed is adjusted: audio frames with lower energy or lower amplitude are adjusted more, and audio frames with higher energy or higher amplitude are adjusted less. This improves the stability of the audio signal during playback, reduces the user's perception of speed changes, and improves the user experience.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the following briefly introduces the drawings needed to be used in the description of the embodiments or the prior art, and obviously, the drawings in the following description are some embodiments of the present invention, and those skilled in the art can obtain other drawings according to the drawings without inventive labor.
Fig. 1 is a schematic view of an application scenario of a method for processing an audio signal according to an embodiment of the present application;
Fig. 2 is a flowchart illustrating a method for processing an audio signal according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of another audio signal processing method according to an embodiment of the present application;
Fig. 4 is a flowchart illustrating a further audio signal processing method according to an embodiment of the present application;
Fig. 5 is a flowchart illustrating a further audio signal processing method according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of an apparatus for processing an audio signal according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In a video conference, when the network is unstable, audio signals are briefly lost and accumulated. In this case, the anti-jitter buffer provided in the terminal device is usually increased to make the sound playing smooth. But too large an anti-jitter buffer may result in too much sound delay and thus degrade the user experience. In the prior art, in order to avoid the excessive sound delay caused by the anti-jitter buffer, there is a method for automatically adjusting the playing speed. When the network is abnormal, the playing speed of the audio signal is slowed down, and when the network is recovered, the previous audio signal is obtained again and the playing speed of the audio signal is accelerated.
However, although the conventional method for adjusting the playing speed of the audio signal avoids excessive sound delay when the network is abnormal, the audio signal may be played alternately fast and slow, which causes poor stability during playback and thereby degrades the user experience.
In order to solve the above problem, embodiments of the present application provide a method and an apparatus for processing an audio signal, an electronic device, and a storage medium, so as to solve the problem of poor stability when the audio signal is played. The invention conception of the application is as follows: when the playing speed of the audio signal is adjusted, the audio frame with lower energy or lower amplitude in the audio signal is adjusted greatly, and the audio frame with higher energy or larger amplitude in the audio signal is adjusted slightly, so that the stability of the audio signal during playing is improved, the perception of a user on the speed change of the audio signal is reduced, and the experience is improved.
Fig. 1 is a schematic view of an application scenario of a method for processing an audio signal according to an embodiment of the present application. As shown in fig. 1, a user performs a video conference through a first terminal apparatus 101 and a second terminal apparatus 102, and the first terminal apparatus 101 transmits an audio signal to the second terminal apparatus 102 through a server 103. At this time, if the network of the first terminal device 101 is abnormal, the first terminal device 101 may decrease the playing speed of the audio signal that has been received in the anti-jitter buffer. Subsequently, after waiting for the network of the first terminal device 101 to recover from the abnormality, the subsequent audio signal may be received again, and the playing speed of the subsequently received audio signal may be increased, so as to reduce the delay of both parties of the video conference.
The first terminal device 101 and the second terminal device 102 may each be a mobile phone, a tablet (pad), a computer with a wireless transceiving function, a Virtual Reality (VR) terminal device, an Augmented Reality (AR) terminal device, a wireless terminal in industrial control, a wireless terminal in remote surgery (remote medical surgery), a wireless terminal in a smart grid, a wireless terminal in a smart home, and the like. In the embodiment of the present application, the apparatus for implementing the function of the terminal may be a terminal device, or may be an apparatus capable of supporting the terminal in implementing the function, for example, a chip system, and the apparatus may be installed in the terminal device. In the embodiment of the present application, the chip system may consist of a chip, or may include a chip and other discrete devices.
The server 103 may be a server or a server in a cloud service platform. The server is configured to receive the video signal and the audio signal sent by the first terminal device 101 and the second terminal device 102, and send the video signal and the audio signal to the opposite-end call terminal.
It should be noted that the application scenario in the technical solution of the present application may be the application scenario in fig. 1, but is not limited to this, and may also be applied to other scenarios requiring voice call.
It can be understood that the processing method of the audio signal can be implemented by the processing apparatus of the audio signal provided in the embodiment of the present application, and the processing apparatus of the audio signal can be a part or all of a certain device, and for example, can be a terminal device or a processor of the terminal device.
The following describes in detail the technical solutions of the embodiments of the present application with specific embodiments, taking a terminal device integrated or installed with a relevant execution code as an example. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 2 is a schematic flowchart of a processing method of an audio signal according to an embodiment of the present application, where an execution main body of the embodiment is a terminal device, and the embodiment relates to a specific process of how to adjust a playing speed of an audio frame in an audio signal. As shown in fig. 2, the method includes:
S201, obtaining an audio signal sent by an opposite terminal communication node and received in an anti-jitter buffer of the terminal equipment.
In this application, the terminal device may be provided with an anti-jitter buffer. When the terminal device and the opposite terminal communication node are in a video conference or voice call, the opposite terminal communication node continuously sends audio signals to the terminal device, and the terminal device temporarily stores them in the anti-jitter buffer. When the terminal device needs to play an audio signal or adjust the playing speed of an audio frame in the audio signal, it can obtain the audio signal sent by the opposite terminal communication node from its anti-jitter buffer.
The anti-jitter buffer is a shared data area in which audio signals are collected, stored and sent to the processor at regular intervals. On receiving an audio signal, the anti-jitter buffer intentionally delays it, thereby making voice playback during the call more stable.
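The buffering behaviour described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the class name `JitterBuffer`, the pre-fill depth `min_depth`, and the re-buffering rule after an underrun are all assumptions made for illustration.

```python
from collections import deque


class JitterBuffer:
    """Toy anti-jitter buffer: delays playback until a few frames are queued."""

    def __init__(self, min_depth=3):
        self.min_depth = min_depth  # hypothetical pre-fill depth, in frames
        self._frames = deque()
        self._primed = False        # True once the pre-fill depth was reached

    def push(self, frame):
        """Store a frame received from the peer."""
        self._frames.append(frame)
        if len(self._frames) >= self.min_depth:
            self._primed = True

    def pop(self):
        """Return the oldest frame, or None while pre-filling or when empty."""
        if not self._frames:
            self._primed = False    # underrun: pre-fill again before resuming
            return None
        if not self._primed:
            return None
        return self._frames.popleft()

    def buffered_seconds(self, frame_seconds):
        """Duration currently held in the buffer (X in the later equations)."""
        return len(self._frames) * frame_seconds
```

A larger `min_depth` tolerates more network jitter at the cost of more delay, which is the trade-off the background section describes.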
S202, adjusting the playing speed of the audio frame in the audio signal according to the energy and/or the amplitude of the audio frame in the audio signal.
In this step, after the terminal device obtains the audio signal sent by the peer end communication node and received in the anti-jitter buffer, the playing speed of the audio frame in the audio signal may be adjusted according to the energy and/or amplitude of the audio frame in the audio signal.
The adjusting the playing speed of the audio frame in the audio signal includes increasing the playing speed of the audio frame in the audio signal and decreasing the playing speed of the audio frame in the audio signal.
In some embodiments, if the network abnormality of the terminal device is detected, the terminal device reduces the playing speed of the audio frame in the audio signal according to the energy and/or amplitude of the audio frame in the audio signal. In other embodiments, if it is detected that the terminal device recovers from the network anomaly, the playing speed of the audio frame in the audio signal is increased according to the energy and/or amplitude of the audio frame in the audio signal.
In some embodiments, because a speaker naturally pauses during normal conversation, the audio signal contains mute frames. If the audio signal is slowed down or sped up only within these silent sections, only the silences are lengthened or shortened, and the change is barely perceptible to the user. For example, the terminal device may first determine the mute frames in the audio signal received in the anti-jitter buffer according to the energy and/or amplitude of the audio frames in the audio signal. The terminal device then adjusts the playing speed of the mute frames in the audio signal.
In other embodiments, if there are no silence frames in some audio signals, the speed variation of the audio signals may be assigned according to the energy and/or amplitude of the sound. When the energy or amplitude of the audio signal is low, a large rate of speed change is assigned. When the energy or amplitude of the audio signal is high, a smaller rate of speed change is assigned. For example, the terminal device may first determine a speed variation corresponding to each audio frame in the audio signal according to the energy and/or amplitude of the audio frame in the audio signal. And then, the terminal equipment adjusts the playing speed of each audio frame in the audio signal according to the speed variation corresponding to each audio frame in the audio signal.
According to the audio signal processing method provided by the embodiment of the application, the audio signal sent by the opposite terminal communication node and received in the anti-jitter buffer of the terminal device is acquired, and the playing speed of the audio frames in the audio signal is then adjusted according to the energy and/or amplitude of those audio frames. Compared with the prior art, the energy and/or amplitude of each audio frame is taken into account when the playing speed is adjusted: audio frames with lower energy or lower amplitude are adjusted more, and audio frames with higher energy or higher amplitude are adjusted less. This improves the stability of the audio signal during playback, reduces the user's perception of speed changes, and improves the user experience.
On the basis of the above embodiments, the following provides two ways to adjust the playing speed of the audio frames in the audio signal based on the energy and/or amplitude of the audio frames in the audio signal. Fig. 3 is a flowchart illustrating another audio signal processing method according to an embodiment of the present application, and fig. 3 is a first manner of adjusting a playing speed of an audio frame in an audio signal based on an energy and/or an amplitude of the audio frame in the audio signal. As shown in fig. 3, the audio signal processing method includes:
S301, obtaining an audio signal sent by the opposite terminal communication node and received in the anti-jitter buffer of the terminal device.
Technical terms, technical effects, technical features and optional embodiments of step S301 can be understood with reference to step S201 shown in fig. 2, and repeated contents will not be described herein.
S302, determining the speed variation corresponding to each audio frame in the audio signal according to the energy and/or amplitude of the audio frame in the audio signal.
In this step, after acquiring the audio signal sent by the peer end communication node and received in the anti-jitter buffer, the terminal device may determine, according to the energy and/or amplitude of the audio frame in the audio signal, a speed variation corresponding to each audio frame in the audio signal.
The embodiment of the application does not limit how to determine the speed variation corresponding to each audio frame, and only needs to ensure that a larger speed variation proportion is allocated when the energy or amplitude of the audio signal is lower, and a smaller speed variation proportion is allocated when the energy or amplitude of the audio signal is higher.
In some embodiments, the terminal device may determine an average amplitude of audio in the audio signal from the amplitude of each audio frame in the audio signal. Then, the terminal device determines the speed variation corresponding to each audio frame in the audio signal according to the average amplitude of the audio in the audio signal and the amplitude of each audio frame in the audio signal.
In the first mode, the terminal device may allocate the speed variation to each audio frame such that the speed variation is inversely proportional to the frame's amplitude.
For example, if the duration of the audio signal in the anti-jitter buffer is X seconds, the signal needs to be stretched/compressed to Y seconds, and the frame length of the audio signal is T seconds, then the audio signal can be divided into N = X/T frames. Correspondingly, the terminal device may compute the average amplitude A of the whole X-second audio signal and, for the n-th audio frame of duration T, its amplitude An. Subsequently, the terminal device may substitute the average amplitude A of the audio signal and the amplitude An of the n-th audio frame into equations (1) and (2) to determine the speed variation corresponding to each audio frame in the audio signal. Equations (1) and (2) are as follows:
$$C_n = s \cdot \frac{A}{A_n} \tag{1}$$
$$s = \frac{Y - X}{\sum_{i=1}^{N} A / A_i} \tag{2}$$
where Cn is the speed variation corresponding to the n-th audio frame, s is a coefficient to be determined, A is the average amplitude of the audio signal, An is the amplitude of the n-th audio frame, X is the original duration of the audio signal, and Y is the target duration of the audio signal after stretching/compression. By construction, the speed variations of all N frames sum to Y-X.
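Equations (1) and (2) can be sketched in Python as follows. This is an illustrative sketch only: the function name `allocate_linear` is hypothetical, A is assumed to be the mean of the per-frame amplitudes, and Cn is interpreted as the change in duration (in seconds) assigned to frame n, which follows from the allocations summing to Y-X.

```python
def allocate_linear(amplitudes, original_seconds, target_seconds):
    """Per-frame change C_n = s * A / A_n, scaled so that sum(C_n) = Y - X."""
    if not amplitudes or any(a <= 0 for a in amplitudes):
        # Equations (1)-(2) divide by A_n, so zero amplitudes are not allowed.
        raise ValueError("amplitudes must be non-empty and strictly positive")
    avg = sum(amplitudes) / len(amplitudes)                 # A (assumed: frame mean)
    weights = [avg / a for a in amplitudes]                 # A / A_n
    s = (target_seconds - original_seconds) / sum(weights)  # equation (2)
    return [s * w for w in weights]                         # equation (1)


# Three 20 ms frames (X = 0.06 s) stretched to Y = 0.09 s.
changes = allocate_linear([1.0, 2.0, 4.0], 0.06, 0.09)
```

With these inputs the quietest frame receives four times the change of the loudest one, because the allocation is inversely proportional to amplitude. The value of A cancels out of the final result, since it appears in both the numerator of equation (1) and the denominator of equation (2).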
In addition, when An is equal to 0 or very small, the equations become unsolvable or the allocation differences become too large. To prevent this, the terminal device may add a small positive number M to An. In this case, the terminal device may substitute the average amplitude A of the audio signal and the amplitude An of the n-th audio frame into equations (3) and (4) to determine the speed variation corresponding to each audio frame in the audio signal. Equations (3) and (4) are as follows:
$$C_n = s \cdot \frac{A}{A_n + M} \tag{3}$$
$$s = \frac{Y - X}{\sum_{i=1}^{N} A / (A_i + M)} \tag{4}$$
It should be noted that, in the embodiment of the present application, the value of M is not limited and may be set according to the actual situation; for example, M may be 5% of the average amplitude.
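A minimal sketch of equations (3) and (4), using the 5% example value for M, might look like the following. The function name `allocate_with_floor` and the parameter `m_ratio` are hypothetical, and A is again assumed to be the mean of the per-frame amplitudes.

```python
def allocate_with_floor(amplitudes, original_seconds, target_seconds, m_ratio=0.05):
    """Equations (3)-(4): C_n = s * A / (A_n + M), with M = m_ratio * A."""
    if not amplitudes:
        raise ValueError("at least one frame is required")
    avg = sum(amplitudes) / len(amplitudes)   # A (assumed: frame mean)
    m = m_ratio * avg                         # M, e.g. 5% of the average amplitude
    if m <= 0:
        # An all-silent signal gives A = 0, so M = 0 and the division fails.
        raise ValueError("average amplitude must be positive")
    weights = [avg / (a + m) for a in amplitudes]            # A / (A_n + M)
    s = (target_seconds - original_seconds) / sum(weights)   # equation (4)
    return [s * w for w in weights]                          # equation (3)


# A fully silent frame (A_n = 0) is now handled and gets the largest share.
stretch = allocate_with_floor([0.0, 1.0, 1.0], 0.06, 0.09)
# Compression (Y < X) yields negative changes.
compress = allocate_with_floor([0.0, 1.0, 1.0], 0.09, 0.06)
```

Tying M to the average amplitude keeps the floor proportional to the signal level, but it also means an entirely silent buffer still needs separate handling, which this sketch signals with an exception.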
In the second mode, the terminal device may instead allocate the speed variation to each audio frame such that the relationship between amplitude and speed variation is nonlinear.
For example, the terminal device may substitute the average amplitude A of the audio signal and the amplitude An of the n-th audio frame into equations (5) and (6) to determine the speed variation corresponding to each audio frame in the audio signal. Equations (5) and (6) are as follows:
$$C_n = s \cdot \left(\frac{A}{A_n + M}\right)^{z} \tag{5}$$
$$s = \frac{Y - X}{\sum_{i=1}^{N} \left(A / (A_i + M)\right)^{z}} \tag{6}$$
where z is a constant that can be set according to the actual situation. If more speed variation needs to be assigned to audio frames with small amplitudes, z may take a value greater than 1. If the speed variation should be distributed more evenly across audio frames of different amplitudes, z may take a value less than 1.
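The effect of the exponent z in equations (5) and (6) can be explored with a short sketch. The function name `allocate_power` and the default values of `z` and `m_ratio` are assumptions for illustration, not values taken from the patent.

```python
def allocate_power(amplitudes, original_seconds, target_seconds, z=2.0, m_ratio=0.05):
    """Equations (5)-(6): C_n = s * (A / (A_n + M)) ** z."""
    if not amplitudes:
        raise ValueError("at least one frame is required")
    avg = sum(amplitudes) / len(amplitudes)   # A (assumed: frame mean)
    m = m_ratio * avg                         # M (assumed: fraction of A)
    if m <= 0:
        raise ValueError("average amplitude must be positive")
    weights = [(avg / (a + m)) ** z for a in amplitudes]
    s = (target_seconds - original_seconds) / sum(weights)   # equation (6)
    return [s * w for w in weights]                          # equation (5)


amps = [1.0, 2.0, 4.0]
even = allocate_power(amps, 0.06, 0.09, z=0.0)    # every frame gets the same change
gentle = allocate_power(amps, 0.06, 0.09, z=0.5)  # flatter than the linear case
steep = allocate_power(amps, 0.06, 0.09, z=2.0)   # quiet frames absorb most of it
```

Setting z = 1 reproduces equations (3) and (4), and z = 0 degenerates to a uniform split, so z acts as a single knob between "treat all frames alike" and "push the change into the quietest frames".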
It should be noted that the embodiment of the present application describes the adjustment of the playing speed of the audio frames in the audio signal by taking amplitude as an example, but the present application is not limited thereto. The amplitude A in equations (1) to (6) of the embodiments of the present application may be replaced by energy E or power P.
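The patent does not define how amplitude, energy or power are measured per frame. The sketch below assumes common definitions (mean absolute sample value, sum of squared samples, and energy divided by frame length); the function name `frame_metrics` is hypothetical, and a trailing partial frame is simply dropped.

```python
def frame_metrics(samples, frame_len):
    """Return (amplitude, energy, power) for each complete frame of samples."""
    metrics = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        amplitude = sum(abs(x) for x in frame) / frame_len  # assumed: mean |x|
        energy = sum(x * x for x in frame)                  # assumed: sum of x^2
        power = energy / frame_len                          # assumed: mean of x^2
        metrics.append((amplitude, energy, power))
    return metrics


# Two complete 4-sample frames; the last two samples do not fill a frame.
per_frame = frame_metrics([1, -1, 1, -1, 0, 0, 0, 0, 2, 2], frame_len=4)
```

Any of the three columns can be fed into the allocation equations in place of the amplitude; energy and power emphasise loud frames more strongly than mean amplitude because the samples are squared.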
And S303, adjusting the playing speed of each audio frame in the audio signal according to the speed variation corresponding to each audio frame in the audio signal.
In this step, after the terminal device determines the speed variation corresponding to each audio frame in the audio signal, the playing speed of each audio frame in the audio signal may be adjusted based on the speed variation corresponding to each audio frame in the audio signal.
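The patent does not specify how a frame is actually stretched or compressed once its speed variation is known. The sketch below assumes that Cn is a change in duration in seconds and converts it into a per-frame speed factor and a target sample count; `resample_linear` is then a deliberately naive linear-interpolation resampler that also shifts pitch, so a real system would more likely use a pitch-preserving time-scale modification technique. Both function names are hypothetical.

```python
def frame_playback_plan(changes, frame_seconds, sample_rate):
    """Map each duration change C_n to (speed factor, target sample count)."""
    plan = []
    for change in changes:
        new_seconds = frame_seconds + change
        if new_seconds <= 0:
            raise ValueError("a frame cannot shrink to zero or negative length")
        speed = frame_seconds / new_seconds   # < 1 is slower, > 1 is faster
        plan.append((speed, round(new_seconds * sample_rate)))
    return plan


def resample_linear(frame, new_len):
    """Naive stretch/compress by linear interpolation (changes pitch)."""
    if new_len < 2 or len(frame) < 2:
        return list(frame[:new_len])
    step = (len(frame) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * step
        lo = min(int(pos), len(frame) - 2)    # keep lo + 1 inside the frame
        frac = pos - lo
        out.append(frame[lo] * (1 - frac) + frame[lo + 1] * frac)
    return out


# 20 ms frames at 16 kHz: lengthen the first, keep the second, shorten the third.
plan = frame_playback_plan([0.01, 0.0, -0.005], frame_seconds=0.02, sample_rate=16000)
stretched = resample_linear([0.0, 1.0, 2.0, 3.0], 7)
```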
Fig. 4 is a flowchart illustrating a further audio signal processing method according to an embodiment of the present application, and Fig. 4 shows the second manner of adjusting the playing speed of an audio frame in an audio signal based on the energy and/or amplitude of the audio frame in the audio signal. As shown in Fig. 4, the audio signal processing method includes:
S401, obtaining an audio signal sent by the opposite terminal communication node and received in the anti-jitter buffer of the terminal device.
The technical terms, technical effects, technical features and optional embodiments of step S401 can be understood by referring to step S201 shown in fig. 2, and repeated contents will not be described herein.
S402, determining a mute frame in the audio signal according to the energy and/or amplitude of the audio frame in the audio signal.
In this step, after the terminal device obtains the audio signal sent by the peer end communication node and received in the anti-jitter buffer, the terminal device may determine a mute frame in the audio signal according to the energy and/or amplitude of an audio frame in the audio signal.
In the embodiments of the present application, there is no limitation on how to determine the mute frame in the audio signal, and in some embodiments, when the energy and/or amplitude of a certain frame of audio frame in the audio signal is zero, the terminal device may determine that the certain frame of audio frame is a mute frame. In other embodiments, when the energy of a frame of audio frames in the audio signal is below an energy threshold and/or the amplitude is below an amplitude threshold, the terminal device may determine that the audio frame is a silence frame.
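Both detection variants described above can be sketched with one helper. The function name `silent_frame_indices`, the input layout (one `(amplitude, energy)` pair per frame), and the `require_both` flag for the "and/or" choice are assumptions. The comparison uses "at or below the threshold", so the default thresholds of zero reproduce the "energy and/or amplitude is zero" variant.

```python
def silent_frame_indices(frames, amplitude_threshold=0.0, energy_threshold=0.0,
                         require_both=True):
    """Return indices of frames whose (amplitude, energy) mark them as mute."""
    indices = []
    for i, (amplitude, energy) in enumerate(frames):
        quiet_amp = amplitude <= amplitude_threshold
        quiet_energy = energy <= energy_threshold
        # require_both=True is the "and" reading, False is the "or" reading.
        silent = (quiet_amp and quiet_energy) if require_both else (quiet_amp or quiet_energy)
        if silent:
            indices.append(i)
    return indices


frames = [(0.0, 0.0), (0.3, 1.2), (0.01, 0.002), (0.0, 0.0)]
exact = silent_frame_indices(frames)                        # zero-valued frames only
thresholded = silent_frame_indices(frames, 0.02, 0.01)      # also catches faint noise
```

Non-zero thresholds are usually the more practical choice for microphone input, since background noise rarely produces frames that are exactly zero.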
S403, adjusting the playing speed of the mute frames in the audio signal.
In this step, after the terminal device determines the mute frame in the audio signal according to the energy and/or amplitude of the audio frame in the audio signal, the playing speed of the mute frame in the audio signal may be adjusted.
The embodiments of the present application do not limit how the playing speed of the mute frames in the audio signal is adjusted. In some embodiments, the terminal device may allocate the speed variation equally among the mute frames. In other embodiments, the terminal device may allocate a larger share of the speed variation to consecutive mute frames.
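The two allocation strategies above can be sketched as follows. This is only an illustration under stated assumptions: the "speed variation" is modelled as a single total budget (for example, milliseconds of playout time to gain or lose) that is split across mute frames, and "more for consecutive mute frames" is modelled by weighting each mute frame by the length of the run it belongs to. Neither the function names nor the weighting rule come from the application.

```python
from typing import List, Sequence


def allocate_equally(is_mute: Sequence[bool], total: float) -> List[float]:
    """Split the total speed variation evenly across all mute frames."""
    mute_count = sum(1 for m in is_mute if m)
    if mute_count == 0:
        return [0.0] * len(is_mute)
    share = total / mute_count
    return [share if m else 0.0 for m in is_mute]


def allocate_by_run_length(is_mute: Sequence[bool], total: float) -> List[float]:
    """Give mute frames in longer consecutive runs a larger share.

    Each mute frame is weighted by the length of its run (an assumed rule),
    and the total is divided in proportion to those weights.
    """
    weights = [0] * len(is_mute)
    i = 0
    while i < len(is_mute):
        if not is_mute[i]:
            i += 1
            continue
        j = i
        while j < len(is_mute) and is_mute[j]:
            j += 1
        for k in range(i, j):
            weights[k] = j - i  # run length shared by every frame in the run
        i = j
    weight_sum = sum(weights)
    if weight_sum == 0:
        return [0.0] * len(is_mute)
    return [total * w / weight_sum for w in weights]
```

In both functions, non-mute frames receive no share, so their playing speed is left unchanged.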
On the basis of the above embodiment, the following provides a method for processing an audio signal after a network abnormality occurs during a video conference. Fig. 5 is a schematic flowchart of another audio signal processing method according to an embodiment of the present application, and as shown in fig. 5, the audio signal processing method includes:
S501, acquiring an audio signal that is sent by the opposite terminal communication node and received in the anti-jitter buffer of the terminal device.
S502, if a network abnormality of the terminal device is detected, reducing the playing speed of the audio frames in the audio signal according to the energy and/or amplitude of the audio frames in the audio signal.
S503, if it is detected that the terminal device has recovered from the network abnormality, increasing the playing speed of the audio frames in the audio signal according to the energy and/or amplitude of the audio frames in the audio signal.
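Steps S502 and S503 can be sketched as a choice of direction driven by the network state. The sketch below is an assumption-laden illustration: the NetworkState enumeration, the function name, the step size of 0.25, and the decision to change only mute frames (one of the energy/amplitude-based manners described for fig. 4) are hypothetical and not specified by the application.

```python
from enum import Enum
from typing import List, Sequence


class NetworkState(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"      # network abnormality detected (S502)
    RECOVERED = "recovered"    # recovered from the abnormality (S503)


def rates_for_network_state(
    is_mute: Sequence[bool],
    state: NetworkState,
    step: float = 0.25,  # hypothetical relative speed change
) -> List[float]:
    """Return a playback-rate factor per frame (1.0 means normal speed).

    Only frames flagged as mute are slowed down or sped up, so that
    frames carrying speech keep their original speed.
    """
    if state is NetworkState.ABNORMAL:
        target = 1.0 - step   # play more slowly so the buffer lasts longer
    elif state is NetworkState.RECOVERED:
        target = 1.0 + step   # play faster to work off the accumulated delay
    else:
        target = 1.0
    return [target if m else 1.0 for m in is_mute]
```

For example, with is_mute = [True, False], the abnormal state yields [0.75, 1.0] and the recovered state yields [1.25, 1.0].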
According to the audio signal processing method provided by the embodiments of the present application, the audio signal sent by the opposite terminal communication node and received in the anti-jitter buffer of the terminal device is acquired, and the playing speed of the audio frames in the audio signal is adjusted according to the energy and/or amplitude of those audio frames. Compared with the prior art, the adjustment takes the energy and/or amplitude of each audio frame into account: audio frames with lower energy or lower amplitude are adjusted by a larger amount, and audio frames with higher energy or higher amplitude are adjusted by a smaller amount. This improves the stability of the audio signal during playback, makes the speed change less perceptible to the user, and improves the user experience.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be implemented by hardware executing program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
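The principle stated above (quieter frames absorb more of the speed change, louder frames less) can be illustrated with a small sketch that compares each frame's amplitude with the average amplitude of the signal. The weighting formula average / (average + amplitude) is an assumption chosen only because it is bounded and decreases as the frame gets louder; the application does not give this formula, and the function names are hypothetical. Amplitudes are assumed to be non-negative.

```python
from typing import List, Sequence


def speed_change_weights(amplitudes: Sequence[float]) -> List[float]:
    """Weight in (0, 1] per frame: 1.0 for a silent frame, smaller for louder frames."""
    if not amplitudes:
        return []
    average = sum(amplitudes) / len(amplitudes)
    if average == 0.0:
        # Every frame is silent, so all frames may absorb the full change.
        return [1.0] * len(amplitudes)
    return [average / (average + a) for a in amplitudes]


def amplitude_weighted_rates(
    amplitudes: Sequence[float], max_change: float
) -> List[float]:
    """Playback-rate factor per frame.

    max_change is the largest relative change applied to a silent frame:
    negative to slow playback down, positive to speed it up.
    """
    return [1.0 + max_change * w for w in speed_change_weights(amplitudes)]
```

With this weighting, a frame at the average amplitude receives half of the maximum change, and no frame is left completely unadjusted, which matches the "larger amount" versus "smaller amount" wording above.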
Fig. 6 is a schematic structural diagram of an apparatus for processing an audio signal according to an embodiment of the present disclosure. The processing device of the audio signal may be implemented by software, hardware or a combination of the two, and may be, for example, the terminal device or the chip of the terminal device in the above embodiments, so as to execute the processing method of the audio signal in the above embodiments. As shown in fig. 6, the audio signal processing apparatus includes: an obtaining module 601 and an adjusting module 602.
The obtaining module 601 is configured to obtain an audio signal that is sent by an opposite terminal communication node and received in an anti-jitter buffer of a terminal device;
the adjusting module 602 is configured to adjust a playing speed of an audio frame in the audio signal according to an energy and/or an amplitude of the audio frame in the audio signal.
In an optional implementation manner, the adjusting module 602 is specifically configured to determine, according to energy and/or amplitude of an audio frame in an audio signal, a speed variation corresponding to each audio frame in the audio signal; and adjusting the playing speed of each audio frame in the audio signal according to the speed variation corresponding to each audio frame in the audio signal.
In an optional implementation, the adjusting module 602 is specifically configured to determine an average amplitude of audio in the audio signal according to an amplitude of each audio frame in the audio signal; and determining the speed variation corresponding to each audio frame in the audio signal according to the average amplitude of the audio in the audio signal and the amplitude of each audio frame in the audio signal.
In an optional implementation manner, the adjusting module 602 is specifically configured to determine a mute frame in the audio signal according to the energy and/or amplitude of an audio frame in the audio signal, and to adjust the playing speed of the mute frame in the audio signal.
In an optional implementation manner, the adjusting module 602 is specifically configured to, if a network anomaly of the terminal device is detected, reduce a playing speed of an audio frame in the audio signal according to energy and/or amplitude of the audio frame in the audio signal.
In an optional implementation manner, the adjusting module 602 is specifically configured to, if it is detected that the terminal device recovers from the network anomaly, increase the playing speed of the audio frame in the audio signal according to the energy and/or amplitude of the audio frame in the audio signal.
The audio signal processing apparatus provided in the embodiment of the present application may perform the audio signal processing method in the foregoing method embodiment, and the implementation principle and the technical effect are similar, which are not described herein again.
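The two-module structure of fig. 6 can be sketched in software as two small classes. This is a minimal illustration only: the anti-jitter buffer is modelled as a double-ended queue, and the class names, the amplitude threshold, and the quiet-frame rate of 1.2 are hypothetical values that do not come from the application.

```python
from collections import deque
from typing import Deque, List, Sequence

Frame = Sequence[float]


class ObtainingModule:
    """Obtains received audio frames from the anti-jitter buffer."""

    def __init__(self, jitter_buffer: Deque[Frame]) -> None:
        self.jitter_buffer = jitter_buffer

    def obtain(self, max_frames: int) -> List[Frame]:
        """Remove and return up to max_frames frames, oldest first."""
        count = min(max_frames, len(self.jitter_buffer))
        return [self.jitter_buffer.popleft() for _ in range(count)]


class AdjustingModule:
    """Chooses a playback-rate factor per frame from the frame amplitude."""

    def __init__(self, amplitude_threshold: float = 0.01, quiet_rate: float = 1.2) -> None:
        self.amplitude_threshold = amplitude_threshold  # hypothetical value
        self.quiet_rate = quiet_rate                    # hypothetical value

    def adjust(self, frames: Sequence[Frame]) -> List[float]:
        """Return quiet_rate for quiet frames and 1.0 for all other frames."""
        rates = []
        for frame in frames:
            peak = max((abs(s) for s in frame), default=0.0)
            rates.append(self.quiet_rate if peak <= self.amplitude_threshold else 1.0)
        return rates
```

A caller would first invoke ObtainingModule.obtain to take frames out of the buffer and then pass them to AdjustingModule.adjust to get the rate to apply to each frame.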
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 7, the electronic device may include at least one processor 701 and a memory 702. Fig. 7 takes an electronic device with one processor as an example.
The memory 702 is configured to store programs. Specifically, a program may include program code, and the program code includes computer operating instructions.
The memory 702 may include a high-speed RAM, and may also include a non-volatile memory, such as at least one disk memory.
The processor 701 is configured to execute the computer-executable instructions stored in the memory 702 to implement the audio signal processing method described above.
The processor 701 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
Optionally, in a specific implementation, if the communication interface, the memory 702 and the processor 701 are implemented independently, they may be connected to each other through a bus and communicate with each other over it. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. Buses may be classified as address buses, data buses, control buses, and so on; this classification does not mean that there is only one bus or only one type of bus.
Alternatively, in a specific implementation, if the communication interface, the memory 702 and the processor 701 are integrated into a chip, the communication interface, the memory 702 and the processor 701 may complete communication through an internal interface.
The embodiment of the application also provides a chip which comprises a processor and an interface. Wherein the interface is used for inputting and outputting data or instructions processed by the processor. The processor is configured to perform the methods provided in the above method embodiments. The chip can be applied to a processing device of audio signals.
The present application also provides a computer-readable storage medium, which may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk. Specifically, the computer-readable storage medium stores program information, and the program information is used to perform the audio signal processing method described above.
Embodiments of the present application also provide a program, which when executed by a processor, is configured to perform the method for processing an audio signal provided by the above method embodiments.
Embodiments of the present application further provide a program product, such as a computer-readable storage medium, having stored therein instructions, which, when run on a computer, cause the computer to perform the method for processing an audio signal provided by the above method embodiments.
The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (14)

1. A method of processing an audio signal, the method comprising:
acquiring an audio signal sent by an opposite terminal communication node received in an anti-jitter buffer of terminal equipment;
and adjusting the playing speed of the audio frames in the audio signal according to the energy and/or amplitude of the audio frames in the audio signal.
2. The method of claim 1, wherein the adjusting the playing speed of the audio frames in the audio signal according to the energy and/or amplitude of the audio frames in the audio signal comprises:
determining a speed variation corresponding to each audio frame in the audio signal according to the energy and/or amplitude of the audio frame in the audio signal;
and adjusting the playing speed of each audio frame in the audio signal according to the speed variation corresponding to each audio frame in the audio signal.
3. The method of claim 2, wherein determining the speed change amount corresponding to each audio frame in the audio signal according to the energy and/or amplitude of the audio frame in the audio signal comprises:
determining an average amplitude of audio in the audio signal from the amplitude of each audio frame in the audio signal;
and determining the speed variation corresponding to each audio frame in the audio signal according to the average amplitude of the audio in the audio signal and the amplitude of each audio frame in the audio signal.
4. The method of claim 1, wherein the adjusting the playing speed of the audio frames in the audio signal according to the energy and/or amplitude of the audio frames in the audio signal comprises:
determining a mute frame in the audio signal according to the energy and/or amplitude of an audio frame in the audio signal;
and adjusting the playing speed of the mute frame in the audio signal.
5. The method according to any one of claims 1-4, wherein said adjusting the playing speed of the audio frames in the audio signal according to the energy and/or amplitude of the audio frames in the audio signal comprises:
and if the network abnormality of the terminal equipment is detected, reducing the playing speed of the audio frame in the audio signal according to the energy and/or amplitude of the audio frame in the audio signal.
6. The method according to any one of claims 1-4, wherein said adjusting the playing speed of the audio frames in the audio signal according to the energy and/or amplitude of the audio frames in the audio signal comprises:
and if the terminal equipment is detected to recover from the network abnormality, accelerating the playing speed of the audio frame in the audio signal according to the energy and/or amplitude of the audio frame in the audio signal.
7. An apparatus for processing an audio signal, the apparatus comprising:
an acquisition module, configured to acquire an audio signal that is sent by an opposite terminal communication node and received in an anti-jitter buffer of a terminal device;
and the adjusting module is used for adjusting the playing speed of the audio frame in the audio signal according to the energy and/or the amplitude of the audio frame in the audio signal.
8. The apparatus according to claim 7, wherein the adjusting module is specifically configured to determine, according to energy and/or amplitude of audio frames in the audio signal, a speed variation corresponding to each audio frame in the audio signal; and adjusting the playing speed of each audio frame in the audio signal according to the speed variation corresponding to each audio frame in the audio signal.
9. The apparatus according to claim 8, wherein the adjusting module is specifically configured to determine an average amplitude of audio in the audio signal according to an amplitude of each audio frame in the audio signal; and determining the speed variation corresponding to each audio frame in the audio signal according to the average amplitude of the audio in the audio signal and the amplitude of each audio frame in the audio signal.
10. The apparatus according to claim 7, wherein the adjusting module is specifically configured to determine the mute frame in the audio signal according to an energy and/or an amplitude of an audio frame in the audio signal; and adjusting the playing speed of the mute frame in the audio signal.
11. The apparatus according to any one of claims 7 to 10, wherein the adjusting module is specifically configured to, if a network anomaly of the terminal device is detected, reduce a playing speed of an audio frame in the audio signal according to an energy and/or an amplitude of the audio frame in the audio signal.
12. The apparatus according to any one of claims 7 to 10, wherein the adjusting module is specifically configured to, if it is detected that the terminal device recovers from a network anomaly, increase a playing speed of an audio frame in the audio signal according to an energy and/or an amplitude of the audio frame in the audio signal.
13. An electronic device, comprising: a memory and a processor;
the memory for storing executable instructions of the processor;
the processor is configured to perform the method of any of claims 1-6 via execution of the executable instructions.
14. A storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method of any of claims 1-6.
CN202010622129.1A 2020-07-01 2020-07-01 Audio signal processing method and device, electronic equipment and storage medium Active CN111787268B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010622129.1A CN111787268B (en) 2020-07-01 2020-07-01 Audio signal processing method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111787268A true CN111787268A (en) 2020-10-16
CN111787268B CN111787268B (en) 2022-04-22

Family

ID=72760528

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010622129.1A Active CN111787268B (en) 2020-07-01 2020-07-01 Audio signal processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111787268B (en)



Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070263672A1 (en) * 2006-05-09 2007-11-15 Nokia Corporation Adaptive jitter management control in decoder
CN101119323A (en) * 2007-09-21 2008-02-06 腾讯科技(深圳)有限公司 Method and device for solving network jitter
US20100290454A1 (en) * 2007-11-30 2010-11-18 Telefonaktiebolaget Lm Ericsson (Publ) Play-Out Delay Estimation
US20160180857A1 (en) * 2013-06-21 2016-06-23 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Jitter Buffer Control, Audio Decoder, Method and Computer Program
CN103747287A (en) * 2014-01-13 2014-04-23 合一网络技术(北京)有限公司 Video playing speed regulation method and system applied to flash
TW201627990A (en) * 2015-01-21 2016-08-01 宇智網通股份有限公司 Time domain based voice event detection method and related device
CN105245496A (en) * 2015-08-26 2016-01-13 广州市百果园网络科技有限公司 Audio data play method and device
CN106559635A (en) * 2015-09-30 2017-04-05 杭州萤石网络有限公司 A kind of player method and device of multimedia file
WO2017095276A1 (en) * 2015-11-30 2017-06-08 Telefonaktiebolaget Lm Ericsson (Publ) Method and receiving device for adapting a play-out rate of a jitter buffer
CN108605162A (en) * 2016-12-30 2018-09-28 华为技术有限公司 The treating method and apparatus of audio data
CN110799936A (en) * 2017-08-18 2020-02-14 Oppo广东移动通信有限公司 Volume adjusting method and device, terminal equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
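Luo Zhihua: "Software Design and Implementation of a Digital Voice Transmission System", China Excellent Master's Theses Full-text Database (Electronic Journal) *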

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114979798A (en) * 2022-04-21 2022-08-30 维沃移动通信有限公司 Play speed control method and electronic equipment
CN114979798B (en) * 2022-04-21 2024-03-22 维沃移动通信有限公司 Playing speed control method and electronic equipment

Also Published As

Publication number Publication date
CN111787268B (en) 2022-04-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant