WO2022062979A1 - Audio processing method, computer-readable storage medium, and electronic device - Google Patents


Info

Publication number
WO2022062979A1
Authority
WO
WIPO (PCT)
Prior art keywords: audio, audio signal, signal, sound, electronic device
Application number
PCT/CN2021/118398
Other languages
English (en)
French (fr)
Inventor
杨枭
田立生
李肖
张海宏
朱统
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP21871350.1A, published as EP4210344A4
Publication of WO2022062979A1


Classifications

    • H04R1/10 Earpieces; Attachments therefor; Earphones; Monophonic headphones
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/12 Circuits for distributing signals to two or more loudspeakers
    • H04R5/033 Headphones for stereophonic communication
    • H04R5/04 Circuit arrangements, e.g. for selective connection of amplifier inputs/outputs to loudspeakers, for loudspeaker detection, or for adaptation of settings to personal preferences or hearing impairments
    • H04N21/2187 Live feed
    • H04N21/637 Control signals issued by the client directed to the server or network components
    • H04N21/658 Transmission by the client directed to the server
    • H04N21/8106 Monomedia components involving special audio data, e.g. different tracks for different languages
    • H04N21/854 Content authoring
    • H04H60/04 Studio equipment; Interconnection of studios
    • H04S2400/15 Aspects of sound capture and related signal processing for recording or reproduction
    • Y02D30/70 Reducing energy consumption in wireless communication networks

Definitions

  • the present application relates to the field of Internet technologies, and in particular, to an audio processing method, a computer-readable storage medium, and an electronic device.
  • live broadcasting has gradually become a popular application with a large number of users. Some internet celebrity anchors even have millions of "fans".
  • the live broadcast method can give full play to the advantages of the mobile Internet, broadcasting product demonstrations, conferences, evaluations, surveys, interviews, classes, trainings, and other content on the spot. After the live broadcast is completed, its audio and video content can also be replayed or watched on demand at any time, maximizing the value of the live broadcast content.
  • the main devices for capturing audio and video are mobile phones and live sound cards.
  • compared with professional recording, live audio capture on a mobile phone is constrained: a professional-grade sound acquisition structure cannot be designed into the phone, nor can a large microphone device be used.
  • during a live broadcast, it is often necessary to mix the host's voice with the background accompaniment to form the published audio, and the host also needs to hear audio feedback from the remote audience.
  • the present application provides an audio processing method, an electronic device, and a computer-readable storage medium.
  • the audio processing method avoids the time-delay problem, requires few connections, and is easy to implement.
  • the applicant has found through research that the rapid popularization of wireless earphones is an industry trend. With the gradual increase in sales of binaural true wireless Bluetooth earphones (TWS), more and more users are using such earphones.
  • TWS earphones are small, completely eliminate cables, and do not restrict the user's movement.
  • when a TWS earphone is used for sound collection, it is worn on the outer ear, so its position is essentially fixed and very close to the mouth, and the relative position of the two is also essentially fixed; this gives TWS earphones a unique advantage for capturing the human voice.
  • the applicant therefore proposes a live audio processing solution that uses wireless earphones, especially TWS earphones, for live broadcast: through cooperation between the mobile phone and the wireless earphone, the TWS earphone is tightly integrated with the live broadcast application, eliminating cables, reducing the number of devices, and improving portability.
  • the present application provides an audio processing method for an audio processing system, where the audio processing system includes an electronic device, an accompaniment sound providing device, and a wireless earphone.
  • the audio processing method includes:
  • the electronic device starts a live broadcast application, the live broadcast application publishes audio, and receives feedback audio associated with the audio;
  • the electronic device receives the audio signal including the accompaniment sound signal sent by the accompaniment sound providing device, and uses the received audio signal, directly or after processing, as the first audio signal;
  • the electronic device publishes the first audio signal through the live broadcast application, and receives feedback audio through the Internet through the live broadcast application, where the feedback audio is a second audio signal;
  • the electronic device obtains a third audio signal by mixing the first audio signal and the second audio signal;
  • the electronic device transmits the third audio signal wirelessly to the wireless headset associated with the electronic device for listening.
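The five steps above form a publish–feedback–mix–monitor cycle. The following is an illustrative sketch only, not the patent's implementation; the `LiveApp` stub, the `mix` helper, and all names are assumptions introduced for the example.

```python
# Illustrative sketch of the claimed cycle; all class and function names
# are assumptions, not taken from the patent.
class LiveApp:
    """Stub standing in for the live broadcast application."""
    def __init__(self):
        self.published = []       # audio released over the Internet
        self.feedback_queue = []  # audience feedback awaiting delivery

    def publish(self, audio):
        self.published.append(audio)

    def receive_feedback(self):
        # second audio signal: audience feedback received via the Internet
        return self.feedback_queue.pop(0) if self.feedback_queue else []

def mix(a, b):
    """Sample-wise mix of two signals, padding the shorter with silence
    and clipping to the normalized range [-1, 1]."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return [max(-1.0, min(1.0, x + y)) for x, y in zip(a, b)]

def live_audio_cycle(first_audio, app, send_to_headset):
    app.publish(first_audio)               # publish the first audio signal
    second_audio = app.receive_feedback()  # receive the second audio signal
    third_audio = mix(first_audio, second_audio)
    send_to_headset(third_audio)           # wireless transmission for monitoring
    return third_audio
```

In a real device each step runs continuously on audio frames; the sketch only shows how the three signals relate.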
  • the electronic device may be a terminal used by the host for live broadcast, such as a mobile phone, a tablet computer and other electronic devices.
  • the accompaniment sound providing device may be a device capable of providing audio output, such as an accompaniment mobile phone, an audio player, and the like.
  • the wireless earphone is wirelessly connected to the electronic device to listen to the third audio signal (that is, the audio formed by mixing the accompaniment sound, the host's voice, and the audience sound) that the electronic device obtains by mixing the first audio signal and the second audio signal; removing the cable does not restrict the user's movement and improves portability.
  • the host can carry out the live broadcast with fewer electronic devices and fewer wired connections, improving portability, and the live broadcast effect is better because there is no delay. This avoids the problems of many devices and complicated connections shown in Figure 2(a) and the delay difference shown in Figure 2(b); a delay difference is hard for users to notice, yet it directly degrades the live broadcast effect.
  • the wireless earphone can be, for example, a Bluetooth earphone, which can be connected with a live broadcast mobile phone through a Bluetooth module to transmit audio signals.
  • any wireless headset that can transmit audio signals between the headset and the mobile phone in a non-wired manner should be understood as falling within the scope of the present application.
  • the audio processing system further includes an audio processor (providing audio processing such as sound mixing) and a sound collection device (collecting the host's voice); the accompaniment sound providing device and the sound collection device are each connected to the audio processor, and the audio processor is connected to the electronic device. In this case, the step in which the electronic device receives the audio signal including the accompaniment sound signal and uses it, directly or after processing, as the first audio signal includes: the audio processor receives, via wired communication, the accompaniment sound signal provided by the accompaniment sound providing device and the host sound signal collected by the sound collection device, and mixes them to obtain the first audio signal; the electronic device then receives the first audio signal from the audio processor via wired communication.
  • the audio signal received by the electronic device is the audio signal mixed by the audio processor, which is directly used as the first audio signal.
  • the accompaniment sound signal is transmitted to the audio processor by the accompaniment sound providing device, the host sound signal is collected by the sound collection device and transmitted to the audio processor, and the audio processor mixes the accompaniment sound signal and the host sound signal to obtain the first audio signal.
  • the electronic device is connected to the audio processor through wired communication to receive the first audio signal from the audio processor. Therefore, it can be ensured that the audio signal from the audio processor is free from external interference, and the signal loss during the communication process is relatively small, so that the sound quality can be improved.
  • after the electronic device publishes the first audio signal through the live broadcast application, it receives the feedback audio signal via the Internet; this lets the host reach a wider audience, and the audience can listen and watch more conveniently.
  • the audio processor, the sound collection device, and the accompaniment sound providing device may be integrated into one electronic device, or these functions may be jointly implemented by multiple electronic devices.
  • the sound collection device can be a standalone microphone or an electronic device such as a mobile phone with a microphone component, as long as it can provide the host sound signal.
  • the accompaniment sound providing device may be an accompaniment mobile phone, a record player, or other electronic device capable of playing audio, and capable of providing accompaniment sound signals. Both the accompaniment sound providing device and the sound acquisition device can be connected to the audio processor in a wired manner, and the audio processor mixes the accompaniment sound signal with the main broadcast sound signal to obtain the first audio signal. Therefore, the first audio signal with better quality can be obtained conveniently through the audio processor, the sound collection device, and the accompaniment sound providing device.
  • the first audio signal is a digital audio signal. That is, the first audio signal output by the audio processor may be a digital audio signal connected to the electronic device through a digital input; for example, the audio processor is connected to the USB or Type-C interface of the electronic device through a USB or Type-C connector.
  • the first audio signal may also be an analog audio signal.
  • the electronic device converts the analog first audio signal into a digital first audio signal and publishes the digital first audio signal.
  • the first audio output from the audio processor may be an analog audio signal, which is connected to the electronic device through analog input, for example, the audio processor is connected to an earphone socket of the electronic device through an earphone connector.
  • the electronic device can convert the analog audio signal into a digital audio signal through the underlying Codec and then release the digital audio signal over the wireless network. That is, the electronic device can adapt to audio processors that output different types of audio signals and in every case transmit a digital audio signal, which improves the accuracy and stability of audio transmission.
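The Codec's analog-to-digital step amounts to sampling and quantization. As a rough, hypothetical illustration of the quantization stage only (a real codec also filters and resamples, and `adc_quantize` is an invented name, not a codec API):

```python
def adc_quantize(samples, bits=16):
    """Quantize normalized analog sample values in [-1.0, 1.0] to signed
    integers, as the ADC stage of a codec would (illustrative only)."""
    max_int = 2 ** (bits - 1) - 1  # 32767 for 16-bit audio
    out = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clamp out-of-range input
        out.append(round(s * max_int))
    return out
```

For example, full-scale input maps to ±32767 at 16 bits, the usual range of PCM audio samples.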
  • the electronic device includes a sound mixing module and a channel control module.
  • the channel control module When the electronic device starts the live broadcast application, the channel control module enables the electronic device to enable a wireless audio interface and a wired audio interface at the same time.
  • the channel control module activates the sound mixing module for mixing and simultaneously establishes a first channel, a second channel, a third channel, and a fourth channel, wherein: the first channel sends the first audio signal from the wired audio interface to the live broadcast application for publication over the Internet; the second channel sends the first audio signal from the wired audio interface to the sound mixing module; the third channel sends the second audio signal, received by the live broadcast application through the wireless network, to the sound mixing module; and the fourth channel sends the third audio signal, obtained by the sound mixing module by mixing the first audio signal and the second audio signal, to the wireless audio interface for wireless transmission so that the wireless earphone can monitor it.
  • the electronic device starts the live broadcast through the live broadcast application at the application layer, and performs mixing and channel control through the sound mixing module and channel control module at the framework layer, thereby realizing audio mixing and transmission. By placing the channel control module and sound mixing module at the framework layer of the system, audio can be transmitted accurately and stably on top of the mixing, enabling wireless earphone monitoring.
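The four channels amount to a fixed routing table between the audio interfaces, the live application, and the mixer. A minimal sketch, with endpoint names invented for illustration (they are not patent terminology):

```python
# Endpoint names are illustrative assumptions, not patent terminology.
CHANNELS = {
    "first":  ("wired_audio_interface", "live_app"),          # publish first audio signal
    "second": ("wired_audio_interface", "mixing_module"),     # first audio signal to mixer
    "third":  ("live_app", "mixing_module"),                  # second (feedback) audio to mixer
    "fourth": ("mixing_module", "wireless_audio_interface"),  # mixed third audio to earphone
}

def routes_from(source):
    """List the destinations fed by a given source endpoint."""
    return sorted(dst for src, dst in CHANNELS.values() if src == source)
```

Note that the wired interface fans out to both the live application and the mixer, which is why the first audio signal can be published and monitored simultaneously.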
  • the electronic device receives an audio signal including an accompaniment sound signal sent by the accompaniment sound providing device, and using the received audio signal, directly or after processing, as the first audio signal includes:
  • the electronic device receives the accompaniment sound signal sent by the accompaniment sound providing device via wired communication
  • the wireless earphone collects the host sound signal and sends it to the electronic device via wireless communication;
  • the electronic device performs mixing based on the accompaniment sound signal and the host sound signal to obtain the first audio signal.
  • the electronic device uses the received audio signal after mixing processing as the first audio signal.
  • the electronic device may be a live broadcast terminal of the host, such as a mobile phone, a tablet computer and other electronic devices.
  • the accompaniment sound providing device can be an accompaniment mobile phone, a record player and other electronic devices capable of playing audio to provide accompaniment sound signals.
  • the electronic device and the accompaniment sound providing device are connected through wired communication, which ensures that the accompaniment sound signal from the accompaniment sound providing device is free from external interference and that signal loss during transmission is relatively small, improving sound quality.
  • after the electronic device publishes the first audio signal through the live broadcast application, it receives the feedback audio signal, which lets the host reach a wider audience; the audience can conveniently listen and watch over the Internet.
  • the wireless earphone is wirelessly connected to the electronic device both to pick up the host's voice and provide it to the electronic device and to receive the third audio signal (that is, the audio signal formed by mixing the accompaniment sound, the host's voice, and the audience sound); the wireless connection completely eliminates cables, does not restrict the user's movement, and improves portability.
  • since the wireless earphone is worn on the outer ear, its position is essentially fixed and very close to the mouth, and the relative position of the two is essentially fixed, which gives wireless earphones a unique advantage for capturing the human voice.
  • the host can perform the live broadcast with fewer devices: only the electronic device, the accompaniment sound providing device, and the wireless earphone are needed, and only the accompaniment sound providing device and the electronic device are wired together. The reduced number of connections further improves portability, and the live broadcast effect is better because there is no delay, avoiding the many devices and complicated connections shown in Figure 2(a) and the delay difference shown in Figure 2(b); a delay difference is hard for users to notice, yet it directly degrades the live broadcast effect.
  • the accompaniment sound signal is a digital audio signal.
  • the audio output from the accompaniment sound providing device may be a digital audio signal, connected to the electronic device through a digital input.
  • the accompaniment sound signal may also be an analog audio signal.
  • the electronic device converts the analog accompaniment sound signal into a digital accompaniment sound signal, after which it mixes the digital accompaniment sound signal with the host sound signal.
  • the first audio output from the accompaniment sound providing device may be an analog audio signal, which may be connected to the electronic device through analog input, for example, the accompaniment sound providing device is connected to an earphone socket of the electronic device through an earphone connector.
  • the electronic device can convert the analog audio signal into a digital audio signal through the underlying Codec (codec) and then release the digital audio signal through the network. The electronic device can therefore adapt to different audio signal types from the accompaniment sound providing device and publish a digital audio signal in every case, improving the accuracy and stability of audio transmission.
  • the electronic device includes a sound mixing module and a channel control module.
  • the channel control module When the electronic device starts the live broadcast application, the channel control module enables the electronic device to enable a wireless audio interface and a wired audio interface at the same time.
  • the channel control module activates the sound mixing module for mixing and simultaneously establishes channel a, channel b, channel c, channel d, and channel e, wherein: channel a sends the accompaniment sound signal from the wired audio interface to the sound mixing module; channel b sends the host sound signal from the wireless audio interface to the sound mixing module; channel c sends the first audio signal, obtained by the sound mixing module by mixing the accompaniment sound signal and the host sound signal, from the sound mixing module to the live broadcast application for publication over the Internet; channel d sends the second audio signal, received by the live broadcast application through the Internet, to the sound mixing module; and channel e sends the third audio signal, obtained by the sound mixing module by mixing the first audio signal and the second audio signal, to the wireless audio interface for transmission over the wireless network to the wireless earphone associated with the electronic device.
  • the electronic device starts the live broadcast through the live broadcast application at the application layer and performs mixing and channel control through the sound mixing module and channel control module at the framework layer, realizing audio mixing and transmission. By placing the channel control module and sound mixing module at the framework layer, the wireless earphone can pick up the sound, the sound mixing module can mix the picked-up host voice with the accompaniment sound and the audience feedback (the second audio signal), and the mix can then be transmitted to the host over wireless communication for monitoring.
  • the sound mixing is performed by any mixing algorithm among the linear method, the fixed-weight method, and the dynamic-weight method. That is, the audio can be mixed using a mixing algorithm such as the linear method, the fixed-weight method, or the dynamic-weight method, so mixing can be implemented easily with good results.
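The three named algorithms can be sketched as follows. The exact formulas are illustrative assumptions (the patent text does not define them here); in particular, the dynamic-weight variant shown is a common clip-avoiding normalization, only one of several possibilities.

```python
def mix_linear(a, b):
    """Linear method: sample-wise sum, clipped to the normalized range."""
    return [max(-1.0, min(1.0, x + y)) for x, y in zip(a, b)]

def mix_fixed_weight(a, b, wa=0.5, wb=0.5):
    """Fixed-weight method: weighted sum with constant, preset weights."""
    return [max(-1.0, min(1.0, wa * x + wb * y)) for x, y in zip(a, b)]

def mix_dynamic_weight(a, b):
    """Dynamic-weight method (one common variant): sum the signals, then
    scale down adaptively only when the sum would clip."""
    mixed = [x + y for x, y in zip(a, b)]
    peak = max((abs(s) for s in mixed), default=0.0)
    if peak > 1.0:
        mixed = [s / peak for s in mixed]
    return mixed
```

The trade-off is the usual one: the linear method can clip when both signals are loud, fixed weights attenuate even when headroom is available, and dynamic weights adapt per frame at a small computational cost.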
  • an audio processing method for an electronic device at the live broadcast end includes: starting a live broadcast application that publishes audio and receives feedback audio associated with the published audio; receiving an audio signal including an accompaniment sound signal, and using the received audio signal, directly or after processing, as the first audio signal for publishing; publishing the first audio signal through the live broadcast application, and receiving the feedback audio through the Internet via the live broadcast application, where the feedback audio is a second audio signal; mixing the first audio signal and the second audio signal to obtain a third audio signal; and transmitting the third audio signal through wireless communication so that a wireless earphone can monitor it.
  • the audio signal for release (i.e., the first audio signal)
  • the feedback audio signal (i.e., the second audio signal)
  • the host can carry fewer electronic devices with simpler, fewer connections, and can carry out the live broadcast conveniently.
  • receiving an audio signal including an accompaniment sound signal, and using the received audio signal, directly or after processing, as the first audio signal includes: receiving the first audio signal via wired communication;
  • the first audio signal is obtained by mixing an accompaniment sound signal and a host sound signal.
  • the electronic device directly receives, through wired communication, the first audio signal in which the accompaniment sound signal and the host sound signal are already mixed, publishes it through the Internet, receives the second audio signal associated with the first audio signal, mixes the first and second audio signals, and sends the result through wireless communication, realizing wireless earphone monitoring.
  • the first audio signal is a digital audio signal.
  • the first audio signal is an analog audio signal
  • the electronic device converts the analog audio signal into a digital audio signal for generating the published audio.
  • the electronic device includes a sound mixing module and a channel control module. When the electronic device starts the live broadcast application, the channel control module enables the electronic device to enable the wireless audio interface and the wired audio interface at the same time, activates the sound mixing module for mixing, and simultaneously establishes a first channel, a second channel, a third channel, and a fourth channel, wherein: the first channel sends the first audio signal from the wired audio interface to the live broadcast application for publication; the second channel sends the first audio signal from the wired audio interface to the sound mixing module; the third channel sends the second audio signal, received by the live broadcast application through the wireless network, to the sound mixing module; and the fourth channel sends the third audio signal, obtained by mixing the first audio signal and the second audio signal, to the wireless audio interface through wireless communication so that the wireless earphone can monitor it.
  • receiving an audio signal including an accompaniment sound signal, and using the received audio signal, directly or after processing, as the first audio signal includes: receiving the accompaniment sound signal via wired communication; receiving the host sound signal via wireless communication; and mixing the accompaniment sound signal with the host sound signal to obtain the first audio signal.
  • the electronic device receives the accompaniment sound signal through wired communication; the wireless earphone picks up the host's voice and sends the picked-up host sound signal to the electronic device through wireless communication; the electronic device mixes the accompaniment sound signal and the host sound signal to form the first audio signal;
  • the second audio signal associated with the first audio signal is received through the Internet, and the first audio signal and the second audio signal are mixed and sent through wireless communication for wireless earphone monitoring.
  • the wireless earphone thus provides both sound pickup and monitoring, further reducing the number of devices and connections and making the setup more convenient.
  • the accompaniment sound signal is a digital audio signal.
  • the accompaniment sound signal is an analog audio signal
  • the analog audio signal is converted into a digital audio signal by the electronic device, and the electronic device performs the mixing based on the digital accompaniment sound signal and the host sound signal.
  • the electronic device includes a sound mixing module and a channel control module. When the electronic device starts the live broadcast application, the channel control module enables the electronic device to enable the wireless audio interface and the wired audio interface at the same time, activates the sound mixing module for mixing, and simultaneously establishes channel a, channel b, channel c, channel d, and channel e, wherein: channel a sends the accompaniment sound signal from the wired audio interface to the sound mixing module; channel b sends the host sound signal from the wireless audio interface to the sound mixing module; channel c sends the first audio signal, obtained by mixing the accompaniment sound signal and the host sound signal, from the sound mixing module to the live broadcast application for publication through the Internet; channel d sends the second audio signal, received by the live broadcast application through the Internet, to the sound mixing module; and channel e sends the third audio signal, obtained by the sound mixing module by mixing the first audio signal and the second audio signal, to the wireless audio interface to be sent over a wireless network to the wireless earphone associated with the electronic device.
  • the sound mixing is performed by any one of a linear method, a fixed weight method, and a dynamic weight method.
  • the present application provides a computer-readable storage medium storing computer-readable code that, when executed by one or more processors, causes the processors to perform the audio processing method according to any implementation of the second aspect above.
  • the present application provides an electronic device for publishing audio through the live broadcast application and receiving feedback audio associated with the published audio, including: a wireless audio interface and a wired audio interface; an audio signal acquisition module for receiving audio signals, including an accompaniment sound signal, through the wireless audio interface and the wired audio interface; and a channel control module and a sound mixing module, where the channel control module is used, when the electronic device starts the live broadcast application, to enable the wireless audio interface and the wired audio interface and to send the audio signals collected by the audio signal acquisition module to the sound mixing module, and the sound mixing module is used to generate a first audio signal based on the audio signals received by the audio signal acquisition module;
  • the sound mixing module is further configured to mix the first audio signal with a second audio signal received as feedback through the live broadcast application, to generate a third audio signal;
  • the channel control module is further configured to send the third audio signal to the wireless audio interface for transmission through wireless communication, so that the wireless headset associated with the electronic device can monitor it.
  • when the electronic device starts the live broadcast application, the access control module enables the electronic device to enable the wireless audio interface and the wired audio interface at the same time, starts the mixing module for mixing, and establishes a first path, a second path, a third path, and a fourth path, wherein the first path sends the first audio signal from the wired audio interface to the live broadcast application;
  • the second path sends the first audio signal from the wired audio interface to the sound mixing module;
  • the third path has the live broadcast application send the second audio signal received through the Internet to the mixing module, and the mixing module mixes the first audio signal and the second audio signal to obtain the third audio signal;
  • the fourth path sends the third audio signal from the sound mixing module to the wireless audio interface.
  • when the application layer starts the live application, the access control module enables the electronic device to enable the wireless audio interface and the wired audio interface at the same time, starts the mixing module for mixing, and establishes channels a, b, c, d, and e at the same time,
  • the a channel sends the accompaniment sound signal from the wired audio interface to the mixing module
  • the b channel sends the main broadcast audio signal from the wireless audio interface to the mixing module
  • channel c sends the first audio signal, obtained by the sound mixing module by mixing the accompaniment sound signal and the main broadcast sound signal, to the live broadcast application for publishing through the Internet,
  • channel d has the live broadcast application send the second audio signal received through the wireless network to the mixing module,
  • channel e sends the third audio signal, obtained by the sound mixing module by mixing the first audio signal and the second audio signal, to the wireless audio interface, to be sent to the wireless headphones through a wireless network.
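The five channels enumerated above can be summarized as a small routing table. The sketch below is only a restatement of the described topology; the module names, channel labels, and the helper function are descriptive assumptions for illustration, not APIs disclosed in this application:

```python
# Sketch of the five audio channels (a-e) described above, expressed as a
# routing table from source module to destination module. All names are
# hypothetical labels for the described topology, not real APIs.

CHANNELS = {
    "a": ("wired_audio_interface", "mixing_module"),       # accompaniment sound in
    "b": ("wireless_audio_interface", "mixing_module"),    # main broadcast sound in
    "c": ("mixing_module", "live_broadcast_application"),  # first audio signal out to Internet
    "d": ("live_broadcast_application", "mixing_module"),  # second (feedback) audio signal in
    "e": ("mixing_module", "wireless_audio_interface"),    # third audio signal out to headset
}

def destinations(module):
    """List the channels that carry audio out of a given module."""
    return sorted(ch for ch, (src, _dst) in CHANNELS.items() if src == module)

print(destinations("mixing_module"))  # channels leaving the mixing module
```

The routing table makes the data flow of the claim visible at a glance: the mixing module is the hub, with two inputs (a, b) plus the feedback input (d), and two outputs (c, e).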
  • FIG. 1 is an application scenario diagram of an audio processing method provided according to an embodiment of the present application.
  • FIG. 2(a) is a schematic diagram of a live audio processing system according to one prior-art solution;
  • FIG. 2(b) is a schematic diagram of a live audio processing system according to another prior-art solution;
  • Fig. 3 is the architecture diagram of the live audio processing system
  • FIG. 4 is a schematic structural diagram of an electronic device provided according to an embodiment of the present application.
  • FIG. 5 is a block diagram of a software structure of an electronic device provided according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an audio processing system provided according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of an implementation manner in which a live mobile phone simultaneously enables multiple earphones (audio interfaces) according to some embodiments of the present application;
  • FIG. 8(a) is a schematic flowchart of an audio processing method using the audio processing system of FIG. 6;
  • FIG. 8(b) is another schematic flowchart of the audio processing method using the audio processing system of FIG. 6;
  • Fig. 9 is the architecture diagram of the host side in the audio processing system according to Fig. 6;
  • FIG. 10 is a system architecture diagram of a live mobile phone according to the audio processing system of FIG. 6;
  • FIG. 11 is a schematic diagram of an audio processing system provided according to another embodiment of the present application.
  • Fig. 12(a) is a schematic flowchart of an audio processing method applying the audio processing system of Fig. 11;
  • Fig. 12(b) is another schematic flowchart of the audio processing method applying the audio processing system of Fig. 11;
  • Fig. 13 is the architecture diagram of the anchor side in Fig. 11;
  • FIG. 14 is a system architecture diagram of the live mobile phone in FIG. 13 .
  • FIG. 1 is an exemplary application scenario of the audio processing method and system according to the present application.
  • Figure 1 shows that the host (ie, the performer) conducts a live broadcast on the Internet through a specific live broadcast application, such as Kuaishou™, Douyin™, etc., via a host device (an electronic device such as a mobile phone or a tablet computer).
  • the audio of the live broadcaster, such as singing, recitations, or product introductions, is also released.
  • the audience watches the live broadcast through the live broadcast application through the audience device (electronic device such as a mobile phone, a tablet computer, etc.).
  • the audience can give corresponding feedback through the audience-side device.
  • the audience can also interact with the live broadcaster by voice. That is, the viewer-side device can receive the audio sent by the live-broadcast-side device, and can also send feedback audio in response to that audio to the live-broadcast-side device, so that the live broadcaster can better interact with the audience.
  • the anchor releases the video and audio of the singing song through his mobile phone.
  • the audience can also interact with the live broadcaster through mobile phones and other devices, for example, by commenting or singing along.
  • the equipment on the live broadcast side includes a live broadcast mobile phone 1001, a sound card 1002, an accompaniment mobile phone 1003, a microphone 1004, a wired headset 1005a for live broadcast, and the like.
  • the live broadcast mobile phone 1001 runs a live broadcast application, and interacts with the audience through the live broadcast mobile phone 1001 .
  • Sound card 1002 includes multiple input and output interfaces.
  • the live broadcast mobile phone 1001 is connected to one of the output interfaces of the sound card 1002 in a wired manner, and the sound card 1002 outputs audio to the live broadcast mobile phone 1001 .
  • the accompaniment mobile phone 1003, the microphone 1004, and the wired earphone 1005a are respectively connected to the sound card 1002 by wired means, so that the accompaniment sound and the main broadcast sound collected by the microphone 1004 are output to the sound card 1002 for mixing; the mixed audio is, on the one hand, transmitted to the audience through the live broadcast mobile phone 1001 and, on the other hand, provided to the anchor through the wired headset 1005a for monitoring.
  • the above solution can capture both the human voice (for example, the singing voice of the host) and the accompaniment (accompaniment music for the sung songs), can also obtain the audience sound (that is, the interactive voice of the audience), and realizes the functions of sound collection, mixing, and monitoring. However, it also leads to too many devices, complicated operations, and overly cluttered connections, which hinder the host's activities. In addition, to give the live broadcast a good sound effect, the host generally needs to purchase a professional microphone 1004, sound card 1002, wired earphone 1005a, accompaniment mobile phone 1003, and other equipment, which are bulky.
  • a live broadcast solution using the Bluetooth headset 1005b is also proposed.
  • three mobile phones are used in this method, one of which is used as a live broadcast mobile phone 1001 , and the other two are used as accompaniment mobile phones 1003 for providing accompaniment sounds.
  • One of the accompaniment mobile phones 1003A is connected to the Bluetooth headset 1005b to provide the accompaniment sound to the host through Bluetooth communication
  • the other accompaniment mobile phone 1003B is connected to the sound card 1002 to mix the accompaniment sound with the host sound collected by the microphone 1004 through the sound card 1002 sound.
  • the principle of live broadcast is to push the audio and video recorded by the host to the server, and then the server will distribute it to the audience to watch. Viewers can participate and interact with the host while watching.
  • the process mainly includes: audio and video acquisition, data processing, audio and video encoding, streaming, data distribution, streaming, audio and video decoding, audio and video playback, and interaction.
  • the system architecture mainly includes a collection end 1 (ie, the host end), a streaming media server 2 , and a playback end 3 (ie, the viewer end).
  • the acquisition terminal 1 realizes audio and video collection, data processing, and audio and video encoding, and performs audio and video packaging and stream pushing;
  • the streaming media server 2 realizes data distribution, transcoding, content moderation, screenshots, and the like; the playback terminal 3 realizes stream pulling, audio and video decoding, and audio and video playback.
  • the reason why the live broadcast is popular mainly lies in the participation of the audience, thereby generating interaction with the anchor.
  • in addition to playing audio and video, the playback terminal 3 also collects the audience's feedback (which may include audio and video), encodes and encapsulates it, and pushes the stream to the streaming media server 2; the streaming media server 2 performs data distribution and feeds the stream back to the acquisition terminal 1, so that the host can listen to it, thereby realizing interaction.
  • the host usually needs to monitor the audio content of the live broadcast and the feedback audio content of the audience through headphones.
  • This application is an audio processing solution proposed for the acquisition terminal 1, in order to make monitoring easier for the anchor.
  • the audio processing scheme of the present application realizes monitoring by simultaneously enabling the wireless audio interface and the wired audio interface of the live broadcast mobile phone 1001, and using the wireless earphone to receive the audio sent by the wireless audio interface.
  • in one solution, the accompaniment sound and the main broadcast sound are mixed and input into the live broadcast mobile phone in a wired manner, and the live broadcast mobile phone publishes them through the Internet.
  • the feedback audio is sent to the server and then transmitted to the live broadcast application in the live broadcast mobile phone.
  • the live broadcast mobile phone mixes the accompaniment sound, the anchor sound, and the audience feedback sound and sends the result through wireless communication to the wireless headset that communicates wirelessly with the live broadcast mobile phone, providing a monitoring function. That is to say, the monitoring function is realized through the wireless headphones.
  • in another solution, the accompaniment sound is input into the live broadcast mobile phone in a wired manner, the wireless headset communicates wirelessly with the live broadcast mobile phone to pick up the main broadcast sound, and the live broadcast mobile phone mixes the accompaniment sound and the human voice and publishes the result through the Internet so that the audience can listen on their mobile phones and give feedback; the live broadcast mobile phone then mixes the accompaniment sound, vocals, and audience feedback sound and sends the result to the wireless headset to realize the monitoring function. That is to say, the functions of sound pickup and monitoring are realized at the same time through the wireless headphones.
  • FIG. 4 shows a schematic structural diagram of an electronic device 100 according to some embodiments of the present application.
  • the electronic device 100 may be the terminal device of the collection side 1, such as the live broadcast mobile phone mentioned in the above application scenarios of this application.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) connector 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the structures illustrated in the embodiments of the present invention do not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown, combine some components, split some components, or use a different arrangement of components.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units; for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. The different processing units may be independent devices, or may be integrated in one or more processors.
  • the processor 110 may generate an operation control signal according to the instruction operation code and the timing signal, and complete the control of fetching and executing the instruction.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in the processor 110 is a cache memory. This memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to use the instructions or data again, it can call them directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby increasing the efficiency of the system. For example, taking the live broadcast mobile phone according to a solution of the present invention as an example, after the live broadcast mobile phone starts the application, it runs instructions to establish multiple audio channels, perform the corresponding sound mixing, and send each audio signal to each corresponding device.
  • the live broadcast mobile phone receives the audio signal for distribution sent by the audio processor (the audio signal for distribution is a mix of the accompaniment sound signal and the main broadcast sound signal) and sends the audio signal for distribution to the audience's mobile phones through the Internet;
  • the audio signal for distribution and the feedback audio signal are mixed to generate the audio signal for monitoring, and the audio signal for monitoring is sent to the Bluetooth headset through Bluetooth communication.
  • the processor 110 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, and a subscriber identity module (SIM) interface.
  • the I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL).
  • the processor 110 may contain multiple sets of I2C buses.
  • the processor 110 can be respectively coupled to the touch sensor 180K, the charger, the flash, the camera 193 and the like through different I2C bus interfaces.
  • the processor 110 may couple the touch sensor 180K through the I2C interface, so that the processor 110 and the touch sensor 180K communicate with each other through the I2C bus interface, so as to realize the touch function of the electronic device 100 .
  • the I2S interface can be used for audio communication.
  • the processor 110 may contain multiple sets of I2S buses.
  • the processor 110 may be coupled with the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170 .
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the I2S interface, so as to realize the function of answering a call through the Bluetooth headset 1005b.
  • the audio signal for distribution (a signal formed by mixing the accompaniment sound and the main broadcast sound) obtained by the audio module 170 is transmitted to the remote audience for sharing, and the audio signal for monitoring (that is, the audio signal formed by mixing the accompaniment audio, the audience audio, and the host audio) is transmitted to the Bluetooth headset 1005b to realize monitoring.
  • the PCM interface can also be used for audio communications, sampling, quantizing and encoding analog signals.
  • the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
  • the audio module 170 can also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering a call through the Bluetooth headset 1005b. Both the I2S interface and the PCM interface can be used for audio communication.
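To make "sampling, quantizing and encoding" concrete, the sketch below samples a sine tone and quantizes each sample into a signed 16-bit PCM value. The sample rate, bit depth, and function name are illustrative assumptions, not values specified in this application:

```python
import math

# Minimal PCM illustration: sample an analog waveform at discrete instants,
# then quantize each sample to a signed 16-bit integer.
# SAMPLE_RATE and the 16-bit depth are example values only.

SAMPLE_RATE = 8000   # samples per second (example value)
FULL_SCALE = 32767   # maximum value of a signed 16-bit sample

def pcm_encode(freq_hz, duration_s):
    n = int(SAMPLE_RATE * duration_s)
    samples = []
    for i in range(n):
        t = i / SAMPLE_RATE                              # sampling instant
        analog = math.sin(2 * math.pi * freq_hz * t)     # analog value in [-1, 1]
        samples.append(int(round(analog * FULL_SCALE)))  # quantization step
    return samples

frames = pcm_encode(freq_hz=440.0, duration_s=0.001)
print(len(frames), frames[0])
```

A real PCM audio interface performs this conversion in hardware and streams the resulting samples over the bus; the sketch only shows the arithmetic involved.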
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus may be a bidirectional communication bus. It converts the data to be transmitted between serial and parallel forms.
  • a UART interface is typically used to connect the processor 110 with the wireless communication module 160 .
  • the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function.
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the UART interface, so as to realize the function of playing music through the Bluetooth headset 1005b.
  • the internal memory 121 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions.
  • the internal memory 121 may include any suitable non-volatile memory such as flash memory and/or any suitable non-volatile storage device, such as one or more hard disk drives (HDDs), one or more compact disc (CD) drives, and/or one or more digital versatile disc (DVD) drives.
  • the memory 121 which is a computer-readable storage medium, stores instructions.
  • when the instructions are executed, the processor 110 performs the audio processing method according to the embodiments of the present application; for details, refer to the methods of the above embodiments, which will not be repeated here.
  • the interface connection relationship between the modules illustrated in the embodiment of the present invention is only a schematic illustration, and does not constitute a structural limitation of the electronic device 100 .
  • the electronic device 100 may also adopt different interface connection manners in the foregoing embodiments, or a combination of multiple interface connection manners.
  • a digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the frequency-point energy, and so on.
  • Video codecs are used to compress or decompress digital video.
  • Electronic device 100 may support one or more video codecs.
  • the electronic device 100 can play or record videos in various encoding formats, such as Moving Picture Experts Group (MPEG)-1, MPEG-2, MPEG-3, MPEG-4, and so on.
  • the software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • the embodiments of the present invention take an Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100.
  • FIG. 5 is a block diagram of a software structure of an electronic device 100 according to an embodiment of the present invention.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces.
  • the Android system is divided into four layers, which are, from top to bottom, the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.
  • the application layer can include a series of application packages.
  • the live broadcast APP that initiates the audio processing method of the present application is located in the application layer.
  • the application package can include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message and so on.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer may include a window manager, a content provider, a view system, a phone manager, and a resource manager, as well as a channel control module and a mixing module, etc.
  • the channel control module is used to configure the audio channel and establish the audio channel to transmit the audio signal.
  • the mixing module is used to mix audio signals.
  • each audio channel is essentially a data transfer between buffers. That is, at the beginning of the audio path there is a data buffer, and at the end there is another buffer. Under the control of the access control module, the data is moved from the buffer at the beginning to the buffer at the end; logically, this constitutes a path.
  • the access control module is a software module implemented via the CPU running specific code.
  • the access control module reads the data from the input buffer and, after processing, stores it in the output buffer, thereby achieving the processing of the data.
  • the above code exists in the form of a "library", and the library encapsulates a plurality of functions for realizing the channel control functions. By calling these functions, one can create input and output buffers, start data movement and copying, and also release the buffers to tear down a "path". For example, calling the interface functions of the "library" can realize reading data, processing, and output, that is, the establishment of audio channels.
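The buffer-to-buffer model described above can be sketched as follows. All class and method names here are hypothetical illustrations of the described mechanism, not code disclosed in this application:

```python
from collections import deque

# Hypothetical sketch of the buffer-to-buffer "path" model described above.
# A path is: an input buffer at the beginning, an output buffer at the end,
# and a transfer step that optionally processes data as it is moved.

class AudioPath:
    def __init__(self, process=None):
        self.input_buffer = deque()   # buffer at the beginning of the path
        self.output_buffer = deque()  # buffer at the end of the path
        self.process = process or (lambda frame: frame)

    def write(self, frame):
        self.input_buffer.append(frame)

    def transfer(self):
        # Move every pending frame from input to output,
        # processing each one on the way.
        while self.input_buffer:
            frame = self.input_buffer.popleft()
            self.output_buffer.append(self.process(frame))

    def release(self):
        # Tearing down the path releases both buffers.
        self.input_buffer.clear()
        self.output_buffer.clear()

# A path whose processing step doubles the amplitude of each frame.
path = AudioPath(process=lambda frame: [2 * s for s in frame])
path.write([0.1, -0.25])
path.transfer()
print(list(path.output_buffer))
```

The `process` callback stands in for whatever the access control module does between reading the input buffer and writing the output buffer; replacing it with an identity function gives a plain copy path.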
  • the mixing module is also a software module implemented by running a specific code via the CPU.
  • the essence of the mixing module is the summation of the input signals. Mixing by the mixing module can be performed by any mixing algorithm among the linear method, the fixed weight method (fixed weights, for example chosen according to signal amplitude), and the dynamic weight method (parameter changes alter the weights of the summation).
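The three mixing approaches named above can be illustrated with a short sketch. The function names, the particular weights, and the clipping strategy are illustrative assumptions, not details disclosed in this application:

```python
# Illustrative sketches of the three mixing approaches named above.
# Samples are assumed to be floats in [-1.0, 1.0]; all names are hypothetical.

def mix_linear(a, b):
    """Linear method: direct summation, clipped to the valid range."""
    return [max(-1.0, min(1.0, x + y)) for x, y in zip(a, b)]

def mix_fixed_weight(a, b, wa=0.5, wb=0.5):
    """Fixed-weight method: each input contributes a constant weight."""
    return [wa * x + wb * y for x, y in zip(a, b)]

def mix_dynamic_weight(a, b):
    """Dynamic-weight method: the weight adapts per sample so that the
    summed amplitude never exceeds full scale."""
    out = []
    for x, y in zip(a, b):
        s = x + y
        w = 1.0 / abs(s) if abs(s) > 1.0 else 1.0  # shrink only when needed
        out.append(w * s)
    return out

accompaniment = [0.8, -0.3, 0.6]
voice = [0.5, -0.2, 0.1]
print(mix_linear(accompaniment, voice))
print(mix_fixed_weight(accompaniment, voice))
print(mix_dynamic_weight(accompaniment, voice))
```

The trade-off is visible in the first sample: the linear method clips it, the fixed-weight method attenuates everything uniformly, and the dynamic-weight method only reduces gain where the sum would overflow.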
  • Android Runtime includes core libraries and a virtual machine. Android runtime is responsible for scheduling and management of the Android system.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes the java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, safety and exception management, and garbage collection.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display drivers, camera drivers, audio drivers, and sensor drivers.
  • in the following, the Bluetooth headset exchanges audio signals with the live broadcast mobile phone through Bluetooth communication, and the above two solutions of the present application will be described in more detail.
  • the audio processing system for implementing the audio processing method of this embodiment includes: a live broadcast mobile phone 1001 , an accompaniment mobile phone 1003 , a sound card 1002 , a microphone 1004 and a Bluetooth headset 1005b.
  • FIG. 6 also shows a viewer mobile phone 1006 for playing the audio released by the live broadcast mobile phone 1001 .
  • the live broadcast mobile phone 1001 is an example of an electronic device
  • the sound card 1002 is an example of an audio processor
  • the accompaniment mobile phone 1003 is an example of an accompaniment sound providing device
  • the microphone 1004 is an example of a sound acquisition device
  • the sound card 1002 is an external sound card, which is used to transmit the audio signals from the microphone 1004 and the accompaniment mobile phone 1003 to the live broadcast mobile phone 1001 after corresponding processing. That is to say, the sound card 1002 has a plurality of audio input interfaces, so as to input audio signals from devices such as the microphone 1004 and the accompaniment mobile phone 1003 respectively, and the sound card 1002 performs corresponding processing. In addition, the sound card 1002 also has an audio output interface, so as to output the processed audio signal to the live broadcast mobile phone 1001 through the audio output interface.
  • the sound card 1002 can perform sound-beautification processing, such as electronic sound processing, sound mixing processing, voice changing processing, and the like.
  • the sound card 1002 can also add its own fun sound effects, such as applause, kisses, contempt, and laughter, to make the live broadcast less monotonous.
  • the sound card 1002 may also have an automatic volume-ducking function, so that the background music is lowered while the anchor is speaking and the volume is restored when the anchor finishes speaking.
  • common sound cards on the market include the ICKB™ SO8 sound card, the Kesuosi™ FX5 sound card, the Desheng™ MX1 live sound card, the Senranboba™ sound card, the Singba™ K10 sound card, etc.
  • the accompaniment mobile phone 1003 and the microphone 1004 are connected to the audio input interface of the sound card 1002 through a cable.
  • the accompaniment mobile phone 1003 is used to input the accompaniment sound of the sung song into the sound card 1002
  • the microphone 1004 is used to collect the voice sung by the host and input it into the sound card 1002 .
  • the sound card 1002 mixes the audio of the accompaniment sound and the audio of the singing sound, performs the corresponding processing, and outputs the result to the live broadcast mobile phone 1001 through the audio output interface in a wired communication manner.
  • the live broadcast mobile phone 1001 and the sound card 1002 are connected in a wired manner.
  • the live broadcast mobile phone 1001 publishes, through the live broadcast application running on it, the mixed sound of the accompaniment sound sent by the sound card 1002 and the host's singing sound, and pushes the stream to the streaming media server; the audience mobile phone 1006 can pull the stream from the streaming media server and play it.
  • the live broadcast mobile phone 1001 also receives the feedback audio from the audience side through the Internet via the live broadcast application, mixes the feedback audio with the accompaniment sound and human voice, and sends the result to the Bluetooth headset 1005b through Bluetooth communication. Through the Bluetooth headset 1005b, the live broadcaster can monitor the accompaniment sound, his own singing voice, and the feedback audio from the audience mobile phone 1006.
  • An audio signal for distribution formed after mixing the main broadcast sound signal collected by the microphone 1004 and the accompaniment sound signal provided by the accompaniment mobile phone 1003 is used as an example of the first audio signal.
  • the so-called audio signal for distribution means that the audio signal is distributed by live broadcasting through a live broadcasting application.
  • An example of the second audio signal is a feedback audio signal embodying a feedback sound on the viewer side.
  • the audio signal for monitoring, which the live broadcast mobile phone 1001 obtains by mixing the received feedback audio signal with the accompaniment sound and the human voice, is used as an example of the third audio signal.
  • the so-called audio signal for monitoring means that the audio signal is transmitted to the host so that the host can monitor the live content.
  • the live cell phone 1001 is used to launch the application to publish audio to the viewer cell phone 1006 through the application, and to receive feedback from the viewer cell phone 1006 .
  • Both the accompaniment mobile phone 1003 and the microphone 1004 are wired to the sound card 1002 , and the sound card 1002 is wired to the live broadcast mobile phone 1001 .
  • the sound card 1002 is used to mix the voice sung by the host collected by the microphone 1004 and the accompaniment sound provided by the accompaniment mobile phone 1003 to form a publishing audio signal, and send the publishing audio signal to the live broadcast mobile phone 1001 through a wired connection.
  • the viewer's mobile phone 1006 is connected to the live broadcast mobile phone 1001 through the Internet, so as to receive the audio signal for publishing issued by the live broadcast mobile phone 1001, and send the audio signal reflecting the feedback to the live broadcast mobile phone 1001.
  • the Bluetooth headset 1005b is connected with the live broadcast mobile phone 1001 through Bluetooth communication, and the live broadcast mobile phone 1001 is also used to mix the audio signal for release and the feedback audio signal, generate the audio signal for monitoring, and send the audio signal for monitoring to the Bluetooth headset 1005b for the host to listen to.
  • the live broadcast mobile phone 1001 is connected to the sound card 1002 by wire (ie, to the audio output interface of the sound card 1002) to input the audio signal for distribution, and at the same time the live broadcast mobile phone 1001 is connected to the Bluetooth headset 1005b through Bluetooth communication to output the audio signal for monitoring; that is, the live broadcast mobile phone 1001 enables the analog headset and the Bluetooth headset 1005b at the same time.
  • multiple audio channels are established between the live broadcast mobile phone 1001, the sound card 1002, the Bluetooth headset 1005b, and the audience mobile phone 1006; the audio signal for release from the sound card 1002 and the feedback audio signal from the audience mobile phone 1006 are mixed to generate an audio signal for monitoring, which is provided to the Bluetooth headset 1005b through Bluetooth communication, thereby realizing the monitoring function of the Bluetooth headset.
  • allowing a mobile phone to enable both the analog headset and the Bluetooth headset 1005b at the same time can be achieved in software.
  • the audiopolicy (audio policy) configuration in the Android framework can be modified so that the input and output devices include the IOProfile (input/output profile) of both headphones.
  • in the audio configuration file audio_policy.conf,
  • multiple audio interfaces are defined at the same time, and each audio interface contains several outputs and inputs.
  • each output and input supports multiple IOProfiles at the same time, and each IOProfile supports several devices. Therefore, the two headphones, as two devices, can be configured into the corresponding IOProfiles. That is, by modifying the configuration file, multiple audio interfaces can be enabled at the same time, and on this basis the audio processing method of the present application is implemented.
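As an illustration of this mechanism, a simplified, hypothetical fragment of a legacy audio_policy.conf might declare the wired headset under the primary audio interface and the Bluetooth (A2DP) device under a second interface, so both output routes exist at the same time (device and format names follow AOSP conventions; the exact profiles are device-specific):

```conf
audio_hw_modules {
  primary {
    outputs {
      primary {
        sampling_rates 44100|48000
        channel_masks AUDIO_CHANNEL_OUT_STEREO
        formats AUDIO_FORMAT_PCM_16_BIT
        devices AUDIO_DEVICE_OUT_WIRED_HEADSET|AUDIO_DEVICE_OUT_SPEAKER
        flags AUDIO_OUTPUT_FLAG_PRIMARY
      }
    }
    inputs {
      primary {
        sampling_rates 8000|16000|48000
        channel_masks AUDIO_CHANNEL_IN_MONO
        formats AUDIO_FORMAT_PCM_16_BIT
        devices AUDIO_DEVICE_IN_WIRED_HEADSET|AUDIO_DEVICE_IN_BUILTIN_MIC
      }
    }
  }
  a2dp {
    outputs {
      a2dp {
        sampling_rates 44100
        channel_masks AUDIO_CHANNEL_OUT_STEREO
        formats AUDIO_FORMAT_PCM_16_BIT
        devices AUDIO_DEVICE_OUT_ALL_A2DP
      }
    }
  }
}
```

With both interfaces defined, the policy layer can open an output stream on each, which is the precondition for feeding the monitoring signal to the Bluetooth headset while the analog jack carries the sound card's signal.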
  • the audio processing method for the audio processing system of the above-mentioned embodiment includes:
  • Step S110: First, the live broadcast mobile phone 1001 starts the live broadcast application.
  • the live broadcast mobile phone 1001 can release audio to the audience mobile phone 1006 watching the live broadcast application via the streaming media server.
  • the live broadcast mobile phone 1001 can also receive feedback audio from the audience obtained via the streaming media server.
  • the live broadcast application may be software applied to live broadcast, such as Douyin TM, Kuaishou TM, and the like.
  • the host opens the live broadcast application on the live broadcast mobile phone 1001, and the live broadcast application sends the audio over the Internet, via the streaming media server, to the audience mobile phone 1006.
  • the feedback audio, that is, the audience audio signal, is sent to the live broadcast mobile phone 1001 via the streaming media server over the Internet.
  • the host receives feedback audio from the audience through the live broadcast mobile phone 1001, such as the user's comment, the user's chorus, and the like.
  • the live broadcast mobile phone 1001 starts the listening function of the Bluetooth headset 1005b. Thereafter, steps S120-S160 in the audio processing method as shown in FIG. 8(a) are performed.
  • Step S120: After the listening function of the Bluetooth headset 1005b is activated, the live broadcast mobile phone 1001 receives the audio signal for distribution sent by the sound card 1002 via wired communication; the audio signal for distribution is obtained by mixing the accompaniment sound signal and the main broadcast sound signal. More specifically, referring to the above description of FIG. 6, the sound card 1002 is connected to the accompaniment mobile phone 1003 and the microphone 1004 through its wired audio input interfaces, and is connected to the live broadcast mobile phone 1001 through its wired audio output interface.
  • the accompaniment phone 1003 transmits the accompaniment sound signal to the sound card 1002 through wired communication
  • the microphone 1004 transmits the main broadcast sound signal to the sound card 1002 through wired communication
  • the sound card 1002 mixes the accompaniment sound signal and the main broadcast sound signal to obtain the audio signal for distribution.
  • the sound card 1002 sends the audio signal for distribution to the live broadcast mobile phone 1001 through wired communication.
  • Step S130: the live broadcast mobile phone 1001 publishes the audio signal for publishing over the Internet through the application.
  • the live broadcast application sends the audio signal for publishing, obtained by mixing the accompaniment sound signal and the main broadcast sound signal, to a server associated with the application through the Internet, so that it can be played by the viewer's mobile phone 1006.
  • step S140 the live broadcast mobile phone 1001 receives the audience audio from the audience mobile phone 1006 through the Internet through the live broadcast application. That is, the audience mobile phone 1006 receives the audio signal for distribution, and sends the feedback audio signal of the audio signal for distribution, that is, the audience voice (ie, the second audio signal), to the live broadcast mobile phone 1001 via the Internet.
  • step S150 the live broadcast mobile phone 1001 mixes the audio signal for release and the audio signal for feedback to generate an audio signal for monitoring.
  • the mixing is performed by the mixing module in the application framework layer of the live broadcast mobile phone 1001, and is not repeated here.
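The mixing in step S150 can be sketched as sample-wise addition of two 16-bit PCM buffers with saturation. This is an illustrative sketch, not the patent's implementation; the class and method names are hypothetical:

```java
// Sketch of frame-layer mixing: sum two 16-bit PCM streams sample by sample
// and clamp the result to the 16-bit range so loud passages do not wrap around.
public class Mixer {
    // Mix two equally long 16-bit PCM buffers into one.
    public static short[] mix(short[] publishPcm, short[] feedbackPcm) {
        short[] out = new short[publishPcm.length];
        for (int i = 0; i < out.length; i++) {
            int sum = publishPcm[i] + feedbackPcm[i];          // widen to int to avoid overflow
            if (sum > Short.MAX_VALUE) sum = Short.MAX_VALUE;  // clamp (saturate) high
            if (sum < Short.MIN_VALUE) sum = Short.MIN_VALUE;  // clamp (saturate) low
            out[i] = (short) sum;
        }
        return out;
    }

    public static void main(String[] args) {
        short[] publish  = { 1000, 20000, -20000 };
        short[] feedback = { 500, 20000, -20000 };
        short[] monitor  = mix(publish, feedback);
        System.out.println(monitor[0] + " " + monitor[1] + " " + monitor[2]); // 1500 32767 -32768
    }
}
```

Real mixers usually also apply per-stream gain before summing; plain clamped addition is the minimal form of the operation.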
  • step S160 the live mobile phone 1001 sends the audio signal for monitoring to the Bluetooth headset 1005b through the Bluetooth communication network.
  • the live broadcast mobile phone 1001 starts the live broadcast application.
  • the live broadcast mobile phone 1001 allows the wired headset (analog headset) and the Bluetooth headset 1005b to be enabled at the same time by modifying the default audio channel configuration of the operating system. It confirms whether the analog headset is inserted by detecting the wired audio interface, and confirms whether the Bluetooth headset 1005b is successfully paired by detecting the Bluetooth pairing state. When the analog headset is confirmed to be inserted and the Bluetooth headset 1005b is successfully paired, the live broadcast mobile phone 1001 enters the listening mode.
  • step S11 the live broadcast mobile phone 1001 obtains the audio for the audience formed by mixing the accompaniment sound and the main broadcast sound through the sound card 1002.
  • step S12 the live broadcast mobile phone 1001 transmits the audience audio to the audience mobile phone 1006 through an application.
  • step S13 the live broadcast mobile phone 1001 receives the feedback audio from the audience mobile phone 1006 .
  • step S14 the live mobile phone 1001 sends the audio for monitoring, which is formed by mixing the audience audio and the feedback audio, to the Bluetooth headset 1005b through Bluetooth communication.
  • the sound card 1002 is wired to the analog headphone jack of the live broadcast mobile phone 1001 through its audio output interface, which ensures that the audio signal from the sound card 1002 is not disturbed by outside interference and that signal loss during transmission is relatively small, so the sound quality can be improved.
  • the live broadcast mobile phone 1001 and the audience mobile phone 1006 are connected through the Internet, for example, the connection can be made through a network server, which enables the anchor to target a wider audience, and the audience can listen and watch more conveniently.
  • the Bluetooth headset 1005b and the live broadcast mobile phone 1001 are connected through Bluetooth communication, which completely removes the cable, does not restrict the user's activities, and improves portability.
  • the live broadcast mobile phone 1001 inputs the audio signal for publishing from the sound card 1002 (the accompaniment sound signal and the main broadcast sound are mixed) and the feedback audio signal from the audience mobile phone 1006 (audience audio signal reflecting the feedback).
  • the output of the sound card 1002 is an audio signal for distribution.
  • the input of the audience cell phone 1006 is the audio signal for distribution, and the output is the audio signal for feedback.
  • the Bluetooth earphone 1005b is used for listening, and the input is the audio signal for monitoring (the audio signal for publishing and the audio signal for feedback are mixed).
  • the system for implementing the audio processing method according to the present application shown in FIG. 9 includes each electronic device shown in FIG. 5 . Furthermore, FIG. 9 also shows various hardware modules included in the live broadcast mobile phone 1001 and related to running the solution of this embodiment.
  • an application processor (AP)
  • a communication module (cellular or WiFi communication module)
  • a Bluetooth module
  • a codec (Codec)
  • a display module, etc.
  • the AP is used for program processing, that is, the program runs in the AP to implement corresponding functions, such as sound mixing, communication within the device, and so on.
  • the communication module is used to communicate between different devices to realize audio signal transmission.
  • the accompaniment mobile phone 1003 plays music or background sound as the accompaniment to provide the accompaniment audio
  • the microphone 1004 collects the host's vocals to provide the host audio
  • the microphone 1004 and the accompaniment mobile phone 1003 each output their audio signals to the sound card 1002 to be mixed into an audio signal for distribution.
  • the sound card 1002 sends the audio signal for release to the live broadcast mobile phone 1001 .
  • the audio signal for publishing may be sent to the live broadcast mobile phone 1001 in the form of an analog signal, or in the form of a digital signal. When an analog signal is sent, the live broadcast mobile phone 1001 converts it into a digital audio signal (corresponding to signal A in FIG. 9).
  • the viewer's cell phone 1006 sends the viewer's feedback audio (corresponding to the far-end sound signal C in FIG. 9 ) according to the viewer's audio to the live cell phone 1001 .
  • the live broadcast mobile phone mixes the publishing audio signal A and the feedback audio signal C to generate the audio signal for monitoring (corresponding to signal B in FIG. 9), and sends the audio signal for monitoring to the Bluetooth headset 1005b through the Bluetooth module.
  • the live mobile phone 1001 connects to the sound card 1002 through wired communication and receives the audio signal (A signal in the figure) for publishing sent by the sound card 1002 via wired communication, which corresponds to step S120 in FIG. 8( a ).
  • the live broadcast mobile phone 1001 receives the accompaniment and the main broadcast sound from the sound card 1002 .
  • the audio signal for distribution generated by the sound card 1002 is a digital signal.
  • the sound card 1002 is connected to the live broadcast mobile phone 1001 by means of digital input.
  • the sound card 1002 can provide digital audio signals and is equipped with a USB connector. When the USB connector is inserted into the USB port of the live broadcast mobile phone 1001, the live broadcast mobile phone 1001 receives the audio signal for distribution provided by the sound card 1002 through the USB data path and passes it to the AP.
  • the audio signal for distribution generated by the sound card 1002 is an analog signal.
  • the sound card 1002 is connected to the live broadcast mobile phone 1001 through an analog input method.
  • the analog audio signal is converted into a digital audio signal through a codec (Codec) at the bottom of the system of the live broadcast mobile phone 1001 , and the live broadcast mobile phone 1001 sends the digital audio signal to the audience mobile phone 1006 .
  • the sound card 1002 is inserted into the 3.5mm earphone socket of the live broadcast mobile phone 1001 through a 3.5mm headphone plug (wired, also called an analog headset). The live broadcast mobile phone 1001 transmits the analog audio signal for publishing from the sound card 1002 to the Codec, where it is sampled by the internal ADC and converted into a digital audio signal for publishing. The audio signal for publishing (signal A in the figure) is then input to the AP through a bus (such as I2S, an inter-IC audio bus, or SLIMbus, a serial low-power inter-chip media bus).
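What the Codec's ADC stage does to the analog publishing signal can be illustrated with a small sketch (the class name, method names, and the 48 kHz rate are assumptions, not taken from the patent): each analog amplitude is clamped and quantized to a signed 16-bit PCM value before the samples travel to the AP over the audio bus.

```java
// Illustrative sketch of ADC sampling and quantization at the Codec.
public class AdcSketch {
    static final int SAMPLE_RATE = 48000; // assumed sampling rate

    // Quantize one analog amplitude in [-1.0, 1.0] to a signed 16-bit value.
    public static short quantize(double amplitude) {
        double clamped = Math.max(-1.0, Math.min(1.0, amplitude));
        return (short) Math.round(clamped * Short.MAX_VALUE);
    }

    public static void main(String[] args) {
        // One millisecond of a 1 kHz tone, sampled and quantized.
        short[] pcm = new short[SAMPLE_RATE / 1000];
        for (int n = 0; n < pcm.length; n++) {
            double t = (double) n / SAMPLE_RATE;
            pcm[n] = quantize(Math.sin(2 * Math.PI * 1000 * t));
        }
        System.out.println(pcm.length); // 48 samples per millisecond at 48 kHz
    }
}
```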
  • the AP sends the audio signal for publishing to the viewer's mobile phone 1006 through the communication module, which corresponds to step S130 in FIG. 8( a ).
  • the live broadcast mobile phone 1001 transmits the audio signal for publishing to the communication module through the AP, and the communication module sends it to the streaming media server so that the audience mobile phone 1006 can play it. In this way, the live broadcast mobile phone 1001 publishes the audio signal for publishing through the live broadcast application.
  • the live broadcast mobile phone 1001 receives the feedback audio signal (the C signal in FIG. 9 ) as feedback from the viewer, which corresponds to step S140 in FIG. 8( a ).
  • the feedback audio signal from the viewer's mobile phone 1006 is received by the cellular or network communication module through the Internet, and the AP receives the feedback audio signal from the cellular or network communication module.
  • the live mobile phone 1001 receives the feedback audio signal from the audience.
  • the live broadcast mobile phone 1001 mixes the publishing audio signal and the feedback audio signal to generate the monitoring audio signal (B signal in the figure), which corresponds to step S150 in FIG. 8( a ).
  • the live broadcast mobile phone 1001 mixes the release audio signal from the sound card 1002 and the feedback audio signal from the audience mobile phone 1006 through the AP to generate the monitoring audio signal.
  • the live mobile phone 1001 connects the bluetooth headset 1005b through bluetooth communication and sends the audio signal for monitoring to the bluetooth headset 1005b through the bluetooth communication, so as to realize the listening of the bluetooth headset 1005b, which is corresponding to step S160 in Fig. 8(a).
  • the live mobile phone 1001 sends the monitoring audio signal to the Bluetooth headset 1005b through the Bluetooth module through the AP, so that the anchor can hear the monitoring audio signal through the Bluetooth headset 1005b.
  • the operating system of the live mobile phone 1001 includes an application layer, a framework layer, and a hardware access layer.
  • the application live application
  • the access control module and the mixing module are set at the framework layer
  • the hardware is accessed through the hardware access layer.
  • the hardware on the host side includes a sound card 1002 , a live broadcast mobile phone 1001 , and a Bluetooth headset 1005b , and the hardware on the viewer side includes a viewer mobile phone 1006 .
  • after the live broadcast application is started, the operating system notifies the access control module of the start information, and the access control module determines whether both the Bluetooth headset 1005b and the analog headset are successfully connected.
  • the channel control module activates the mixing module for mixing, and simultaneously establishes the first channel, the second channel, the third channel, and the fourth channel.
  • the first channel sends the audio signal for release from the sound card 1002 to the live broadcast application for release, so that the viewer's mobile phone 1006 can play through it.
  • the second channel sends the audio signal for distribution from the sound card 1002 to the mixing module, and the mixing module processes it at the frame layer.
  • through the third channel, the live broadcast application sends the feedback audio signal from the viewer's mobile phone 1006, received through the Internet, to the mixing module, and the mixing module processes it at the framework layer.
  • the mixing module mixes the release audio signal and the feedback audio signal to obtain an audio signal for monitoring.
  • the fourth channel sends the audio signal for monitoring obtained by mixing the audio signal for distribution and the audio signal for feedback to the Bluetooth headset 1005b through the Bluetooth network. Thereby, the listening function of the Bluetooth headset 1005b on the host side can be realized.
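The four channels established by the channel control module can be modelled as a small routing table between sources and sinks. This is an illustrative sketch; the names soundCard, liveApp, mixer, and btHeadset are hypothetical labels, not identifiers from the patent:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the four routes between the sound card, the live
// application, the mixing module, and the Bluetooth headset.
public class ChannelTable {
    private final Map<String, String> routes = new HashMap<>();

    // Register a channel as a "source -> sink" route.
    public void connect(String channel, String source, String sink) {
        routes.put(channel, source + " -> " + sink);
    }

    public String describe(String channel) {
        return routes.get(channel);
    }

    public static void main(String[] args) {
        ChannelTable t = new ChannelTable();
        t.connect("first",  "soundCard", "liveApp");   // publishing signal out to the viewers
        t.connect("second", "soundCard", "mixer");     // publishing signal into the mixer
        t.connect("third",  "liveApp",   "mixer");     // viewer feedback into the mixer
        t.connect("fourth", "mixer",     "btHeadset"); // monitoring signal to the headset
        System.out.println(t.describe("fourth"));      // prints "mixer -> btHeadset"
    }
}
```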
  • the Bluetooth headset 1005b not only enables its listening function but also enables its sound pickup function; that is, the main broadcast sound picked up by the Bluetooth headset 1005b is sent to the live broadcast mobile phone 1001 via Bluetooth communication for mixing.
  • the audio processing system implementing the audio processing method of the embodiment includes: a live broadcast mobile phone 1001 , an accompaniment mobile phone 1003 , a viewer mobile phone 1006 , and a Bluetooth headset 1005b .
  • the live broadcast mobile phone 1001 is an example of an electronic device
  • the accompaniment mobile phone 1003 is an example of an accompaniment sound providing device.
  • the live cell phone 1001 is used to launch the application to publish audio to the viewer cell phone 1006 through the application, and to receive feedback from the viewer cell phone 1006 .
  • the live broadcast mobile phone 1001 and the accompaniment mobile phone 1003 are wired to receive the accompaniment sound signal sent by the accompaniment mobile phone 1003 via wired communication.
  • the live broadcast mobile phone 1001 and the Bluetooth headset 1005b are connected by Bluetooth communication, and the host audio signal is received through the Bluetooth headset 1005b.
  • the live broadcast mobile phone 1001 mixes the accompaniment sound signal and the main broadcast sound signal to obtain the audio signal for distribution, sends the audio signal for distribution to the audience mobile phone 1006 connected to it through the wireless network, and receives from the audience mobile phone 1006 the feedback audio signal (ie, the audience audio signal) that embodies the feedback audio.
  • the live broadcast mobile phone 1001 mixes the release audio signal and the feedback audio signal, generates the audio signal for monitoring, and sends the audio signal for monitoring to the Bluetooth headset 1005b through the Bluetooth communication network to realize the monitoring function.
  • the Bluetooth headset 1005b not only has a listening function, but also has a sound pickup function.
  • the host audio signal is sent to the live broadcast mobile phone 1001 through Bluetooth communication, and the live broadcast mobile phone performs mixing based on the host audio signal and publishes it to the audience side.
  • the audio channel configuration needs to be modified to allow the wired headset and the Bluetooth headset 1005b to be enabled at the same time.
  • the audio processing method implemented by the above-mentioned audio processing system includes:
  • step S210 the live broadcast mobile phone 1001 starts the live broadcast application.
  • Step S220 the live broadcast mobile phone 1001 receives the accompaniment sound signal sent by the accompaniment mobile phone 1003 via wired communication.
  • the accompaniment mobile phone 1003 is connected to the live broadcast mobile phone 1001 through a wired audio interface, so that the live broadcast mobile phone 1001 receives the accompaniment sound signal sent by the accompaniment mobile phone 1003 via wired communication.
  • Step S230 the live broadcast mobile phone 1001 receives the main broadcast audio signal from the Bluetooth headset 1005b through wireless communication.
  • the Bluetooth headset 1005b receives the host audio signal, and sends the host audio signal to the live broadcast mobile phone 1001 via Bluetooth communication.
  • the main broadcast audio signal is picked up by the Bluetooth headset 1005b and transmitted to the live broadcast mobile phone 1001 by Bluetooth communication. Therefore, compared with the above-mentioned embodiment, there is no need to specially equip the sound card 1002; the accompaniment sound signal and the main broadcast sound signal are mixed directly by the mixing module in the live broadcast mobile phone 1001. This further simplifies the equipment and wiring.
  • when the Bluetooth headset 1005b is used for sound pickup, since it is worn on the outer ear, its position is basically fixed and very close to the mouth, and the relative position of the two is also basically fixed, which shows the unique advantage of a TWS headset for capturing the human voice.
  • the use of the Bluetooth headset 1005b to pick up sound can be implemented by existing methods, such as picking up sound through a microphone hidden in the Bluetooth headset 1005b, and the detailed description thereof is omitted here.
  • Step S240 the live broadcast mobile phone 1001 performs mixing processing based on the accompaniment sound signal and the main broadcast sound signal to obtain an audio signal for distribution.
  • Step S250: the live broadcast mobile phone 1001 publishes the audio signal for publishing via the application. That is, the live broadcast mobile phone 1001 publishes the audio signal through the live broadcast application, pushing the stream to the streaming media server, and the audience mobile phone 1006 pulls the stream from the streaming media server through the Internet to play the published audio signal.
  • Step S260 the live broadcast mobile phone 1001 receives the feedback audio signal through the live broadcast application.
  • the audience mobile phone 1006 receives the audio signal for distribution via the application, and sends the feedback audio signal to the live broadcast mobile phone 1001 via the wireless network.
  • Step S270 the live broadcast mobile phone 1001 mixes the audio signal for publishing and the audio signal for feedback to generate an audio signal for monitoring.
  • step S280 the live mobile phone 1001 sends the audio signal for monitoring to the Bluetooth headset 1005b through Bluetooth communication.
  • Step S20: First, the live broadcast mobile phone 1001 starts the application. Then, the live broadcast mobile phone 1001 obtains the accompaniment sound through the accompaniment mobile phone 1003 and picks up the main broadcast sound through the Bluetooth headset 1005b. Step S21: After that, the live broadcast mobile phone 1001 generates the audio signal for distribution formed by mixing the two audio signals. Step S22: The live broadcast mobile phone 1001 transmits the audio signal for distribution to the viewer mobile phone 1006 through the application. Step S23: It receives the feedback audio signal from the viewer's mobile phone 1006. Step S24: After that, the live broadcast mobile phone sends the audio signal for monitoring, formed by mixing the audio signal for publishing and the audio signal for feedback, to the Bluetooth headset 1005b through Bluetooth communication.
  • the mixing module performs two mixing operations: it mixes the accompaniment sound signal and the main broadcast sound signal to obtain the audio signal for distribution, which is provided to the audience mobile phone 1006; in addition, it further mixes the audio signal for distribution with the audience audio signal from the audience mobile phone 1006, and provides the result to the host for monitoring through the Bluetooth module.
  • the Bluetooth headset 1005b is used for both listening and sound pickup: its input is the audio signal for monitoring, and its output is the main broadcast sound. Because the Bluetooth headset 1005b is worn on the outer ear, its position is basically fixed and very close to the mouth, and the relative position of the two is also basically fixed, which shows the unique advantage of using the Bluetooth headset 1005b to capture the human voice. Therefore, the host can carry out the live broadcast with fewer electronic devices: only the live broadcast mobile phone 1001, the accompaniment sound providing device, and the Bluetooth headset 1005b are needed, and only the accompaniment mobile phone 1003 and the live broadcast mobile phone 1001 need a wired connection.
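The two mixing stages described above can be sketched as follows (an illustrative sketch with hypothetical names, not the patent's implementation): stage one mixes the accompaniment and the host's voice into the distribution signal; stage two mixes the distribution signal with the viewer feedback into the monitoring signal.

```java
// Sketch of the two-stage mix: accompaniment + host voice -> distribution;
// distribution + viewer feedback -> monitoring.
public class TwoStageMix {
    // Clamp an int sum back into the 16-bit PCM range.
    public static short clamp(int v) {
        return (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, v));
    }

    // Sample-wise clamped addition of two equally long PCM buffers.
    public static short[] mix(short[] a, short[] b) {
        short[] out = new short[a.length];
        for (int i = 0; i < a.length; i++) out[i] = clamp(a[i] + b[i]);
        return out;
    }

    public static void main(String[] args) {
        short[] accompaniment = { 100, 200 };
        short[] hostVoice     = { 10, 20 };  // picked up by the Bluetooth headset
        short[] feedback      = { 1, 2 };    // viewer audio from the Internet

        short[] distribution = mix(accompaniment, hostVoice); // sent to the viewers
        short[] monitoring   = mix(distribution, feedback);   // sent to the headset
        System.out.println(monitoring[0] + " " + monitoring[1]); // 111 222
    }
}
```

Running both stages on the phone is what removes the need for the external sound card in this embodiment.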
  • the system shown in FIG. 13 for implementing the audio processing method according to the present application includes various pieces of hardware as shown in FIG. 11 . Further, FIG. 13 shows that the live broadcast mobile phone 1001 includes an application processor (AP), a communication module (cellular or WIFI communication module), a Bluetooth module, a codec (Codec), a display module, and the like.
  • the accompaniment mobile phone 1003 plays music or background sound to provide the accompaniment audio
  • the Bluetooth headset 1005b collects the host's vocals to provide the host audio
  • the Bluetooth headset 1005b and the accompaniment mobile phone 1003 each output their audio to the live broadcast mobile phone 1001, which mixes them into the audience audio.
  • the live mobile phone 1001 sends the audience audio to the audience.
  • the viewer's cell phone 1006 sends the viewer's feedback audio according to the viewer's audio to the host cell phone.
  • the host mobile phone mixes the audience audio and the feedback audio to generate the monitoring audio, and sends the monitoring audio to the Bluetooth headset 1005b.
  • the live broadcast mobile phone 1001 is connected to the accompaniment mobile phone 1003 through wired communication and receives the accompaniment sound signal (A signal in the figure) sent by the accompaniment mobile phone 1003 via wired communication, which corresponds to step S220 in FIG. 12( a ).
  • the accompaniment sound signal received by the live broadcast mobile phone 1001 from the accompaniment mobile phone 1003 may be an analog signal or a digital signal.
  • the live broadcast mobile phone 1001 connects to the Bluetooth headset 1005b through Bluetooth communication, and receives the main broadcast audio signal (the E signal in the figure) through the Bluetooth headset 1005b, which corresponds to step S230 in FIG. 12(a).
  • the live broadcast mobile phone 1001 receives the main broadcast audio signal through the Bluetooth module and inputs the main broadcast audio signal to the AP.
  • the live broadcast mobile phone 1001 mixes the accompaniment sound signal and the main broadcast sound signal to obtain an audio signal for distribution (D signal in the figure), which corresponds to step S240 in FIG. 12( a ).
  • the live broadcast mobile phone 1001 mixes the accompaniment sound signal from the accompaniment mobile phone 1003 and the main broadcast sound signal from the Bluetooth headset 1005b through the AP to obtain an audio signal for publishing.
  • the live broadcast mobile phone 1001 connects the audience mobile phone 1006 via the wireless network through the application and sends the audio signal for publishing to the audience mobile phone 1006 , which corresponds to step S250 in FIG. 12( a ).
  • the audience mobile phone 1006 receives the audio signal for publishing via the application and sends the feedback audio signal (C signal in the figure) to the live broadcast mobile phone 1001 via the wireless network, which corresponds to step S260 in FIG. 12( a ).
  • the live broadcast mobile phone 1001 transmits the audio signal for publishing to the communication module through the AP, and the communication module sends the audio signal for publishing to the audience mobile phone 1006 .
  • the live broadcast mobile phone 1001 transmits the audio signal for distribution to the audience mobile phone 1006 .
  • the wireless communication module receives the feedback audio signal from the viewer's mobile phone 1006, and the AP receives the feedback audio signal from the wireless communication module.
  • the live mobile phone 1001 receives the feedback audio signal from the audience.
  • the live broadcast mobile phone 1001 mixes the audio signal for publishing and the audio signal for feedback to generate the audio signal for monitoring (B signal in the figure), which corresponds to step S270 in FIG. 12( a ).
  • the live broadcast mobile phone 1001 mixes the broadcast audio signal and the feedback audio signal from the audience mobile phone 1006 through the AP to generate the monitoring audio signal.
  • the live mobile phone 1001 sends the audio signal for monitoring to the Bluetooth headset 1005b, which corresponds to step S280 in FIG. 12(a).
  • the live mobile phone 1001 sends the monitoring audio signal to the Bluetooth module through the AP, and sends the monitoring audio signal to the Bluetooth headset 1005b through the Bluetooth module, so that the anchor can hear the monitoring audio signal through the Bluetooth headset 1005b.
  • the operating system of the live broadcast mobile phone 1001 includes an application layer, a framework layer, and a hardware access layer.
  • the live broadcast application is set at the application layer
  • the channel control module and the mixing module are set at the framework layer
  • the hardware is accessed through the hardware access layer.
  • the hardware on the host side includes an accompaniment mobile phone 1003 , a live broadcast mobile phone 1001 , and a Bluetooth headset 1005b
  • the hardware on the audience side includes a viewer mobile phone 1006 .
  • the channel control module starts the mixing module for mixing, and establishes the a channel, the b channel, the c channel, the d channel, and the e channel at the same time. In this way, the functions of sound pickup and listening of the Bluetooth headset 1005b are realized.
  • the a channel sends the accompaniment sound signal from the accompaniment mobile phone 1003, that is, the accompaniment sound signal from the wired audio interface, to the mixing module.
  • the channel b sends the main broadcast audio signal from the Bluetooth headset 1005b, that is, the main broadcast audio signal from the wireless audio interface, to the mixing module.
  • the channel c sends the audio signal for distribution obtained by mixing the accompaniment sound signal and the main broadcast sound signal by the sound mixing module from the sound mixing module to the live broadcast application, so as to be sent to the viewer's mobile phone 1006 through the Internet.
  • the channel d sends the feedback audio signal from the viewer's mobile phone 1006 received through the Internet to the mixing module by the live application.
  • the e channel sends the audio signal for monitoring obtained by mixing the audio signal for distribution and the audio signal for feedback to the wireless audio interface, so as to be sent to the Bluetooth headset 1005b through Bluetooth communication.
  • the live broadcast mobile phone 1001 is used as an example of an electronic device.
  • the accompaniment cell phone 1003 is an example of an accompaniment sound providing device.
  • Microphone 1004 is an example of a sound collection device. Both the microphone 1004 and the accompaniment mobile phone 1003 are wired to the sound card 1002 .
  • the audio signal for publishing is an example of the first audio signal
  • the audio signal for feedback is an example of the second audio signal
  • the audio signal for monitoring is an example of the third audio signal.
  • each unit/module mentioned in each device embodiment of this application is a logical unit/module.
  • a logical unit/module may be a physical unit/module, or a part of a physical unit/module.
  • it may also be implemented by a combination of multiple physical units/modules.
  • the physical implementation of these logical units/modules is not the most important; the combination of functions implemented by these logical units/modules is the key to solving the technical problem raised in this application.
  • the above device embodiments of this application do not introduce units/modules that are not closely related to solving the technical problem raised in this application, which does not mean that other units/modules do not exist in the above device embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

The present application provides an audio processing method, a computer-readable storage medium, and an electronic device. An audio processing method according to one embodiment includes: the electronic device starts a live broadcast application, the live broadcast application publishing audio and receiving feedback audio associated with the audio; the electronic device receives audio signals including an accompaniment sound signal sent by an accompaniment sound providing device, and uses the received audio signals, directly or after processing, as a first audio signal for publishing; the electronic device publishes the first audio signal through the live broadcast application and receives, via the live broadcast application over the Internet, a second audio signal as feedback audio; the electronic device mixes the first audio signal and the second audio signal to obtain a third audio signal; and the electronic device sends the third audio signal via wireless communication to a wireless headset associated with the electronic device for monitoring.

Description

Audio processing method, computer-readable storage medium, and electronic device
This application claims priority to Chinese patent application No. 202011008015.4, filed with the Chinese Patent Office on September 23, 2020 and entitled "Audio processing method, computer-readable storage medium, and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of Internet technologies, and in particular to an audio processing method, a computer-readable storage medium, and an electronic device.
Background
With the development of network technology, live streaming has gradually become a popular application with a very large user base. Some Internet-celebrity anchors even have millions of "fans". Live streaming fully exploits the advantages of the mobile Internet: product demonstrations, conferences, reviews, surveys, interviews, classes, training, and other content can be broadcast on site. After a live broadcast ends, its audio and video content can be replayed or played on demand at any time to maximize its value.
Therefore, to improve the live-streaming experience, high-quality on-site audio and video capture is required. At present, the main devices for capturing audio and video are mobile phones and live-streaming sound cards.
For video capture, whether with the front or rear camera, mobile-phone cameras have reached a very high level of performance; using a mobile phone for image capture meets the technical requirements, so an anchor does not need to buy a separate camera for live streaming.
Audio capture at the live-broadcast site is different, however. On the one hand, limited by size, a professional-grade sound-pickup structure cannot be built into a mobile phone, nor can large microphone components be used. On the other hand, during a live broadcast the anchor's voice usually needs to be mixed with the background accompaniment to form the published audio, while the anchor also needs to hear the audio fed back by remote audiences.
Summary
In view of this, the present application provides an audio processing method, an electronic device, and a computer-readable storage medium. The audio processing method has no latency problem, requires few cables, and is easy to implement.
The applicant has found through research that the rapid, large-scale adoption of wireless headsets is an industry trend; as sales of true wireless stereo Bluetooth headsets (TWS) grow, more and more users are using such headsets.
The advantage of a TWS headset is that it is compact and completely cable-free, so it does not restrict the user's movement.
Moreover, when used for sound pickup, because it is worn on the outer ear its position is essentially fixed: it is very close to the mouth, and the relative position between the two is also essentially fixed, which shows the unique advantage of using TWS to capture the human voice. On this basis, the applicant proposes using wireless headsets, especially TWS headsets, for live streaming: a live-broadcast audio processing solution in which the mobile phone and the wireless headset cooperate and the TWS headset is tightly integrated with the live broadcast application, so as to eliminate cables, reduce the number of devices, and improve convenience.
This application is described below from several aspects; the implementations and beneficial effects of the following aspects may be cross-referenced.
In a first aspect, this application provides an audio processing method for an audio processing system, the audio processing system including an electronic device, an accompaniment sound providing device, and a wireless headset.
According to one embodiment of this application, the audio processing method includes:
the electronic device starts a live broadcast application, the live broadcast application publishing audio and receiving feedback audio associated with the audio;
the electronic device receives audio signals including an accompaniment sound signal sent by the accompaniment sound providing device, and uses the received audio signals, directly or after processing, as a first audio signal;
the electronic device publishes the first audio signal through the live broadcast application and receives feedback audio via the live broadcast application over the Internet, the feedback audio being a second audio signal;
the electronic device mixes the first audio signal and the second audio signal to obtain a third audio signal;
the electronic device sends the third audio signal via wireless communication to the wireless headset associated with the electronic device for monitoring.
Taking a live-streaming scenario as an example, the electronic device may be the terminal the anchor uses for live streaming, such as a mobile phone or a tablet. The accompaniment sound providing device may be any device capable of audio output, such as an accompaniment mobile phone or an audio player. The wireless headset is wirelessly connected to the electronic device and is used to listen to the third audio signal obtained by the electronic device by mixing the first audio signal and the second audio signal (that is, the audio signal formed by mixing the accompaniment sound, the anchor sound, and the audience sound); removing the cable does not restrict the user's movement and improves portability. Thus, the anchor can broadcast while carrying fewer devices, with fewer wired connections, better portability, and a good live effect without delay, avoiding the problems of the many devices and complicated cabling shown in FIG. 2(a) and the latency difference shown in FIG. 2(b) — a latency difference that the user can hardly detect but that directly affects the live-broadcast effect. The wireless headset may be, for example, a Bluetooth headset, which may be connected to the live broadcast mobile phone through a Bluetooth module to transfer audio signals. Of course, this is not limiting: any wireless headset that can transfer audio signals between the headset and the mobile phone in a non-wired manner should be understood to fall within the scope of this application.
In a possible implementation of the first aspect, the audio processing system further includes an audio processor (providing mixing and other audio processing) and a sound collection device (for collecting the anchor sound). The accompaniment sound providing device and the sound collection device are each connected to the audio processor, and the audio processor is connected to the electronic device. The electronic device receiving audio signals including the accompaniment sound signal sent by the accompaniment sound providing device, and using the received audio signals, directly or after processing, as the first audio signal includes: the audio processor receives, via wired communication, the accompaniment sound signal provided by the accompaniment sound providing device and the anchor sound signal collected by the sound collection device, and mixes them to obtain the first audio signal; the electronic device receives, via wired communication, the first audio signal provided by the audio processor.
That is, in this implementation, the audio signal received by the electronic device is the signal already mixed by the audio processor and is used directly as the first audio signal. Specifically, the accompaniment sound signal is transmitted by the accompaniment sound providing device to the audio processor, the anchor sound signal is collected by the sound collection device and transmitted to the audio processor, the audio processor mixes the accompaniment sound signal with the anchor sound signal to obtain the first audio signal, and the electronic device, connected to the audio processor via wired communication, receives the first audio signal from the audio processor. This ensures that the audio signal from the audio processor is free from external interference, and little signal is lost during transmission, so sound quality is improved. Meanwhile, while the electronic device publishes the first audio signal through the live broadcast application, it also receives the feedback audio signal via the Internet, enabling the anchor to face a wider audience and making it convenient for the audience to listen and watch.
The audio processor, the sound collection device, and the accompaniment sound providing device may be integrated in one electronic device, or these functions may be realized jointly by multiple electronic devices. The sound collection device may be a standalone microphone, or an electronic device such as a mobile phone with a microphone component, capable of providing the anchor sound signal. The accompaniment sound providing device may be an accompaniment mobile phone, a record player, or another electronic device capable of playing audio, providing the accompaniment sound signal. Both the accompaniment sound providing device and the sound collection device may be connected to the audio processor by wire, and the audio processor mixes the accompaniment sound signal with the anchor sound signal to obtain the first audio signal. Thus, through the audio processor, the sound collection device, and the accompaniment sound providing device, a first audio signal of good quality can be conveniently obtained.
Optionally, the first audio signal is a digital audio signal. That is, the first audio output by the audio processor may be a digital audio signal connected to the electronic device through a digital input, for example via a USB or Type-C connector plugged into the USB or Type-C port of the electronic device.
Alternatively, the first audio signal may be an analog audio signal. In this case, the electronic device converts the analog first audio signal into a digital first audio signal and publishes the digital first audio signal. That is, the first audio output by the audio processor may be analog, connected to the electronic device through an analog input, for example via a headphone plug inserted into the headphone jack of the electronic device. To facilitate publishing the first audio signal over the wireless network, the electronic device may convert the analog audio signal into a digital audio signal through the underlying codec, and then publish the digital audio signal over the wireless network. In other words, the electronic device can adapt to audio processors outputting different kinds of audio signals while always transmitting digital audio, improving the accuracy and stability of audio transmission.
Further, the electronic device includes a mixing module and a channel control module. When the electronic device starts the live broadcast application, the channel control module causes the electronic device to enable the wireless audio interface and the wired audio interface simultaneously, starts the mixing module for mixing, and at the same time establishes a first channel, a second channel, a third channel, and a fourth channel, where the first channel sends the first audio signal from the wired audio interface to the live broadcast application for publishing through the application; the second channel sends the first audio signal from the wired audio interface to the mixing module; through the third channel the live broadcast application sends the second audio signal received over the wireless network to the mixing module; and the fourth channel sends the third audio signal, obtained by the mixing module by mixing the first audio signal and the second audio signal, to the wireless audio interface via wireless communication for monitoring by the wireless headset.
That is, the electronic device starts the live broadcast through the live broadcast application at the application layer of the system, and performs mixing and channel control through the mixing module and the channel control module at the framework layer, thereby realizing audio mixing and transmission. By setting the channel control module and the mixing module at the framework layer of the system, audio can be transmitted accurately and stably on the basis of audio mixing, and listening through the wireless headset can be realized.
In addition, in another possible implementation of the first aspect, the electronic device receiving audio signals including the accompaniment sound signal sent by the accompaniment sound providing device, and using the received audio signals, directly or after processing, as the first audio signal includes:
the electronic device receives, via wired communication, the accompaniment sound signal sent by the accompaniment sound providing device;
the wireless headset collects the anchor sound signal and sends it to the electronic device via wireless communication;
the electronic device performs mixing processing based on the accompaniment sound signal and the anchor sound signal to obtain the first audio signal.
That is, in this implementation, the electronic device uses the received audio signals, after mixing processing, as the first audio signal.
Taking a live-streaming scenario as an example, the electronic device may be the anchor's live-streaming terminal, such as a mobile phone or a tablet. The accompaniment sound providing device may be an accompaniment mobile phone, a record player, or another device capable of playing audio, to provide the accompaniment sound signal. The electronic device and the accompaniment sound providing device are connected via wired communication, which ensures that the accompaniment sound signal from the accompaniment sound providing device is free from external interference with little signal loss, improving sound quality. While publishing the first audio signal through the live broadcast application, the electronic device receives the feedback audio signal, enabling the anchor to face a wider audience (the audience can listen and watch over the Internet), and making listening and watching convenient for the audience. The wireless headset, connected wirelessly to the electronic device, both picks up the anchor sound and provides it to the electronic device, and receives the third audio signal obtained by mixing (that is, the signal formed by mixing the accompaniment sound, the anchor sound, and the audience sound); the wireless connection completely removes cables, does not restrict the user's movement, and improves portability. Because the wireless headset is worn on the outer ear, its position is essentially fixed, very close to the mouth, and the relative position of the two is also essentially fixed, which shows the unique advantage of using a wireless headset to capture the human voice. Thus, the anchor can broadcast while carrying even fewer devices — only the electronic device, the accompaniment sound providing device, and the wireless headset — with only one wired connection, between the accompaniment sound providing device and the electronic device, further improving portability, with a good, delay-free live effect, avoiding the many devices and complicated cabling shown in FIG. 2(a) and the latency difference shown in FIG. 2(b), which the user can hardly detect but which directly affects the live-broadcast effect.
Optionally, the accompaniment sound signal is a digital audio signal. The audio output by the accompaniment sound providing device may be digital, connected to the electronic device through a digital input, for example via a USB or Type-C connector plugged into the USB or Type-C port of the electronic device.
Alternatively, the accompaniment sound signal may be an analog audio signal. In this case, the electronic device converts the analog accompaniment sound signal into a digital accompaniment sound signal, after which the electronic device mixes the digital accompaniment sound signal with the anchor sound signal. That is, the audio output by the accompaniment sound providing device may be analog, connected to the electronic device through an analog input, for example via a headphone plug inserted into the headphone jack of the electronic device. To facilitate publishing audio signals over the network, the electronic device may convert the analog audio signal into a digital one through the underlying codec, and then publish the digital audio signal over the network. Thus, the electronic device can adapt to different kinds of audio signals from the accompaniment sound providing device while always publishing digital audio, improving the accuracy and stability of audio transmission.
Further, the electronic device includes a mixing module and a channel control module. When the electronic device starts the live broadcast application, the channel control module causes the electronic device to enable the wireless audio interface and the wired audio interface simultaneously, starts the mixing module for mixing, and at the same time establishes channel a, channel b, channel c, channel d, and channel e, where channel a sends the accompaniment sound signal from the wired audio interface to the mixing module; channel b sends the anchor sound signal from the wireless audio interface to the mixing module; channel c sends the first audio signal, obtained by the mixing module by mixing the accompaniment sound signal and the anchor sound signal, from the mixing module to the live broadcast application for publishing over the Internet; through channel d the live broadcast application sends the second audio signal received over the Internet to the mixing module; and channel e sends the third audio signal, obtained by the mixing module by mixing the first audio signal and the second audio signal, to the wireless audio interface, to be sent over the wireless network to the wireless headset associated with the electronic device.
That is, the electronic device starts the live broadcast through the application at the application layer of the system, and performs mixing and channel control through the mixing module and the channel control module at the framework layer, realizing audio mixing and transmission. By setting the channel control module and the mixing module at the framework layer of the system, sound pickup by the wireless headset can be realized, the picked-up anchor sound can be mixed by the mixing module with the accompaniment sound and the audience feedback sound (the second audio signal), and the result can then be transmitted to the anchor via wireless communication for listening.
In a possible implementation of the first aspect, the mixing is performed by any one of a linear method, a fixed-weight method, and a dynamic-weight method. That is, the audio may be mixed by mixing algorithms such as the linear method, the fixed-weight method, or the dynamic-weight method, so mixing can be realized simply and with good effect.
In a second aspect, this application provides an audio processing method for an electronic device at the live-broadcast end, the method including: starting a live broadcast application, the live broadcast application publishing audio and receiving feedback audio associated with the published audio; receiving audio signals including an accompaniment sound signal, and using the received audio signals, directly or after processing, as a first audio signal for publishing; publishing the first audio signal through the live broadcast application and receiving feedback audio via the live broadcast application over the Internet, the feedback audio being a second audio signal; mixing the first audio signal and the second audio signal to obtain a third audio signal; and transmitting the third audio signal via wireless communication for monitoring by the wireless headset.
That is, the live-broadcast-end electronic device mixes the publishing audio signal (the first audio signal) with the feedback audio signal (the second audio signal) and transmits the result to the wireless headset via wireless communication, realizing on-site monitoring. The anchor can thus carry fewer devices, with simpler and fewer cables, and broadcast conveniently.
In a possible implementation of the second aspect, receiving audio signals including the accompaniment sound signal and using the received audio signals, directly or after processing, as the first audio signal includes: receiving, via wired communication, a first audio signal in which the accompaniment sound signal and the anchor sound signal are mixed.
That is, in the audio processing method of this implementation, the electronic device directly inputs, via wired communication, the first audio signal in which the accompaniment sound signal and the anchor sound signal are mixed, publishes it over the Internet, receives the second audio signal associated with the first audio signal, mixes the first and second audio signals, and sends the result via wireless communication to realize monitoring by the wireless headset.
Optionally, the first audio signal is a digital audio signal.
Optionally, the first audio signal is an analog audio signal, which is converted by the electronic device into a digital audio signal used to generate the published audio.
In a possible implementation of this application, the electronic device includes a mixing module and a channel control module. When the electronic device starts the live broadcast application, the channel control module causes the electronic device to enable the wireless audio interface and the wired audio interface simultaneously, starts the mixing module for mixing, and at the same time establishes a first channel, a second channel, a third channel, and a fourth channel, where the first channel sends the first audio signal from the wired audio interface to the live broadcast application for publishing through the application; the second channel sends the first audio signal from the wired audio interface to the mixing module; through the third channel the live broadcast application sends the second audio signal received over the wireless network to the mixing module; and the fourth channel sends the third audio signal, obtained by the mixing module by mixing the first audio signal and the second audio signal, to the wireless audio interface via wireless communication for monitoring by the wireless headset.
In another possible implementation of the second aspect, receiving audio signals including the accompaniment sound signal and using the received audio signals, directly or after processing, as the first audio signal includes: receiving the accompaniment sound signal via wired communication; receiving the anchor sound signal via wireless communication; and mixing the accompaniment sound signal with the anchor sound signal to obtain the first audio signal.
That is, in the audio processing method of this implementation, the electronic device inputs the accompaniment sound signal via wired communication; the wireless headset picks up the anchor sound and sends the picked-up anchor sound signal to the electronic device via wireless communication; the electronic device mixes the accompaniment sound signal and the anchor sound signal into the first audio signal, publishes it over the Internet, receives the second audio signal associated with the first audio signal, mixes the first and second audio signals, and sends the result via wireless communication to realize monitoring by the wireless headset. In other words, the wireless headset realizes both sound pickup and monitoring, further reducing devices and cables and adding convenience.
Optionally, the accompaniment sound signal is a digital audio signal.
Optionally, the accompaniment sound signal is an analog audio signal, which is converted by the electronic device into a digital audio signal; the electronic device mixes the digital accompaniment sound signal with the anchor sound signal.
In another possible implementation of this application, the electronic device includes a mixing module and a channel control module. When the electronic device starts the live broadcast application, the channel control module causes the electronic device to enable the wireless audio interface and the wired audio interface simultaneously, starts the mixing module for mixing, and at the same time establishes channel a, channel b, channel c, channel d, and channel e, where channel a sends the accompaniment sound signal from the wired audio interface to the mixing module; channel b sends the anchor sound signal from the wireless audio interface to the mixing module; channel c sends the first audio signal, obtained by the mixing module by mixing the accompaniment sound signal and the anchor sound signal, from the mixing module to the live broadcast application for publishing over the Internet; through channel d the live broadcast application sends the second audio signal received over the Internet to the mixing module; and channel e sends the third audio signal, obtained by the mixing module by mixing the first audio signal and the second audio signal, to the wireless audio interface, to be sent over the wireless network to the wireless headset associated with the electronic device.
In a possible implementation of the second aspect, the mixing is performed by any one of a linear method, a fixed-weight method, and a dynamic-weight method.
In a third aspect, this application provides a computer-readable storage medium storing computer-readable code which, when run by one or more processors, causes the processors to execute the audio processing method according to any implementation of the second aspect.
In a fourth aspect, this application provides an electronic device for publishing audio through the live broadcast application and receiving feedback audio associated with the published audio, including: a wireless audio interface and a wired audio interface; an audio signal collection module for receiving, through the wireless audio interface and the wired audio interface, audio signals including an accompaniment sound signal; and a channel control module and a mixing module. The channel control module is configured to enable the wireless audio interface and the wired audio interface when the electronic device starts the live broadcast application, and to send the audio signals collected by the audio signal collection module to the mixing module. The mixing module is configured to obtain, based on the audio signals received by the audio signal collection module, a first audio signal to be published through the live broadcast application, and to mix the first audio signal with a second audio signal received by the live broadcast application as feedback to generate a third audio signal. The channel control module is further configured to send the third audio signal to the wireless audio interface for transmission via wireless communication, for monitoring by the wireless headset associated with the electronic device.
In a possible implementation of the fourth aspect, when the electronic device starts the live broadcast application, the channel control module causes the electronic device to enable the wireless audio interface and the wired audio interface simultaneously, starts the mixing module for mixing, and establishes a first channel, a second channel, a third channel, and a fourth channel, where the first channel sends the first audio signal from the wired audio interface to the live broadcast application for playback through the live broadcast application; the second channel sends the first audio signal from the wired audio interface to the mixing module; through the third channel the live broadcast application sends the second audio signal received over the Internet to the mixing module; the mixing module mixes the first audio signal with the second audio signal to obtain the third audio signal; and the fourth channel sends the third audio signal from the mixing module to the wireless audio interface.
In another possible implementation of the fourth aspect, when the application layer starts the live broadcast application, the channel control module causes the electronic device to enable the wireless audio interface and the wired audio interface simultaneously, starts the mixing module for mixing, and at the same time establishes channel a, channel b, channel c, channel d, and channel e,
where channel a sends the accompaniment sound signal from the wired audio interface to the mixing module,
channel b sends the anchor sound signal from the wireless audio interface to the mixing module,
channel c sends the first audio signal, obtained by the mixing module by mixing the accompaniment sound signal and the anchor sound signal, from the mixing module to the live broadcast application for publishing over the Internet,
through channel d the live broadcast application sends the second audio signal received over the wireless network to the mixing module,
and channel e sends the third audio signal, obtained by the mixing module by mixing the first audio signal and the second audio signal, to the wireless audio interface, to be sent to the wireless headset over the wireless network.
Brief Description of the Drawings
FIG. 1 is an application scenario diagram of the audio processing method provided according to an embodiment of this application;
FIG. 2(a) is a schematic diagram of a live-broadcast audio processing system according to one prior art;
FIG. 2(b) is a schematic diagram of a live-broadcast audio processing system according to another prior art;
FIG. 3 is an architecture diagram of a live-broadcast audio processing system;
FIG. 4 is a schematic structural diagram of an electronic device provided according to an embodiment of this application;
FIG. 5 is a software structure block diagram of an electronic device provided according to an embodiment of this application;
FIG. 6 is a schematic diagram of an audio processing system provided according to an embodiment of this application;
FIG. 7 is a schematic diagram of an implementation in which the live broadcast mobile phone simultaneously enables multiple headsets (audio interfaces) according to some embodiments of this application;
FIG. 8(a) is a schematic flowchart of an audio processing method using the audio processing system of FIG. 6;
FIG. 8(b) is another schematic flowchart of an audio processing method using the audio processing system of FIG. 6;
FIG. 9 is an architecture diagram of the anchor side in the audio processing system of FIG. 6;
FIG. 10 is a system architecture diagram of the live broadcast mobile phone in the audio processing system of FIG. 6;
FIG. 11 is a schematic diagram of an audio processing system provided according to another embodiment of this application;
FIG. 12(a) is a schematic flowchart of an audio processing method using the audio processing system of FIG. 11;
FIG. 12(b) is another schematic flowchart of the audio processing method using the audio processing system of FIG. 11;
FIG. 13 is an architecture diagram of the anchor side in FIG. 11;
FIG. 14 is a system architecture diagram of the live broadcast mobile phone in FIG. 13.
Detailed Description
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings of the embodiments of this application.
The embodiments of this application are described in further detail below with reference to the drawings.
FIG. 1 shows an exemplary application scenario of the audio processing method and system of this application. FIG. 1 shows an anchor (i.e., a performer) carrying out a live broadcast on the Internet through an anchor-side device (an electronic device such as a mobile phone or tablet) using a specific live broadcast application, such as Kuaishou™ or Douyin™. During the live broadcast, in addition to publishing video from the broadcast side, the broadcaster's audio is also published, for example singing, recitation, or product introductions. As the recipients of the live broadcast, audiences watch it via the same live broadcast application through audience-side devices (electronic devices such as mobile phones and tablets). Meanwhile, audiences can give corresponding feedback on the broadcast through their devices. Besides commenting, purchasing, liking, and other operations performed through the live broadcast application, they can also interact with the broadcaster by voice. That is, the audience-side device can receive the audio sent by the broadcast-side device and can also send feedback audio in response to that audio to the broadcast-side device, so that the broadcaster can interact better with the audience.
Taking a singing live broadcast as an example, on the platform provided by the live broadcast application, the anchor publishes the video and audio of a song through his or her mobile phone; when audiences watch the program through a mobile phone or tablet, in addition to commenting and liking through the application as usual, they can also interact with the broadcaster by voice through their devices, for example by commenting on or singing along with the performance.
As one prior-art solution for the above live broadcast activity, as shown in FIG. 2(a), the broadcast-side equipment includes a live broadcast mobile phone 1001 used for broadcasting, a sound card 1002, an accompaniment mobile phone 1003, a microphone 1004, a wired headset 1005a, and the like. A live broadcast application runs on the live broadcast mobile phone 1001, through which the anchor interacts with the audience. The sound card 1002 includes multiple input ports and output ports. The live broadcast mobile phone 1001 is wired to one of the output ports of the sound card 1002, and the sound card 1002 outputs audio to the live broadcast mobile phone 1001. The accompaniment mobile phone 1003, the microphone 1004, and the wired headset 1005a are each wired to the sound card 1002, so that the accompaniment sound and the anchor sound collected by the microphone 1004 are output to the sound card 1002 for mixing; the mixed audio is transmitted to the audience through the live broadcast mobile phone 1001 on the one hand, and provided to the anchor through the wired headset 1005a for monitoring on the other.
This solution can provide both the human voice (e.g., the anchor's singing) and the accompaniment (the accompaniment music of the song), and can also obtain the audience sound (i.e., the audience's interactive voices), realizing the functions of sound collection, mixing, listening, and monitoring. At the same time, however, it leads to too many devices, complicated operation, and excessively messy cabling, which hampers the anchor's activity. In addition, to achieve a good sound effect, the anchor generally needs to purchase a professional microphone 1004, sound card 1002, wired headset 1005a, accompaniment mobile phone 1003, and other bulky equipment; this solution is basically unsuitable for outdoor use, especially when the broadcast location changes frequently.
In addition, a live broadcast solution using a Bluetooth headset 1005b has also been proposed. As shown in FIG. 2(b), this method uses three mobile phones: one as the live broadcast mobile phone 1001, and the other two as accompaniment mobile phones 1003 providing the accompaniment sound. One accompaniment mobile phone 1003A connects to the Bluetooth headset 1005b to provide the accompaniment sound to the anchor via Bluetooth communication; the other accompaniment mobile phone 1003B connects to the sound card 1002 so that its accompaniment sound is mixed, through the sound card 1002, with the anchor sound collected by the microphone 1004.
In use, however, this solution requires the two accompaniment mobile phones 1003A and 1003B to start the same music app, enter the same playback interface, and have the user tap the two play buttons with both hands at the same time. Although using the Bluetooth headset 1005b reduces cabling, tapping the play buttons of two phones with two hands can hardly be strictly synchronized, so a latency difference is inevitable; the user can hardly detect it, yet it directly affects the live-broadcast effect. Moreover, because the Bluetooth headset can only hear the accompaniment sound and cannot monitor the audience feedback or the anchor's own voice, the usage scenario is limited, typically only to singing.
In contrast to this, the present application proposes the following solution. First, the principle, flow, and system architecture of a complete live broadcast application are briefly described.
The principle of live streaming is to push the audio and video recorded by the anchor to a server, which then distributes them to audiences for viewing. While watching, audiences can participate and interact with the anchor.
The flow mainly includes: audio/video capture, data processing, audio/video encoding, stream pushing, data distribution, stream pulling, audio/video decoding, audio/video playback, and interaction.
In terms of system architecture, as shown in FIG. 3, it mainly includes a capture end 1 (i.e., the anchor end), a streaming media server 2, and a playback end 3 (i.e., the audience end). The capture end 1 performs audio/video capture, data processing, audio/video encoding, packaging, and stream pushing; the streaming media server 2 performs data distribution, transcoding, content moderation, screenshotting, and the like; the playback end 3 mainly performs stream pulling, audio/video decoding, and audio/video playback.
Moreover, live streaming is popular mainly because audiences participate and thus interact with the anchor. In an interactive scenario, besides playing audio and video, the playback end 3 also captures the audience's feedback (which may include audio and video), encodes and packages it, and pushes the stream to the streaming media server 2, which distributes the data and feeds it back to the capture end 1, so that the anchor can listen and interaction is realized. For better interaction, the anchor usually needs to monitor both the audio content of the broadcast and the audience's feedback audio through a headset.
This application proposes an audio processing solution for the capture end 1, so that the anchor can monitor more conveniently.
In the audio processing solution of this application, the wireless audio interface and the wired audio interface of the live broadcast mobile phone 1001 are enabled simultaneously, and a wireless headset receives the audio sent from the wireless audio interface to realize monitoring.
According to one specific solution of this application, the accompaniment sound and the anchor sound are mixed and input to the live broadcast mobile phone by wire; the live broadcast mobile phone publishes the mix over the Internet, in other words provides it to the audience-side mobile phone; the audience-side mobile phone sends feedback audio for the published audio over the Internet to the server, which passes it to the application in the live broadcast mobile phone; the live broadcast mobile phone mixes the accompaniment sound, the anchor sound, and the audience feedback sound and sends the result via wireless communication to the wireless headset, providing a monitoring function for the wireless headset in wireless communication with the live broadcast mobile phone. That is, the monitoring function is realized through the wireless headset.
According to another specific solution of this application, the accompaniment sound is input to the live broadcast mobile phone by wire, while the wireless headset in wireless communication with the live broadcast mobile phone picks up the anchor sound; the live broadcast mobile phone mixes the accompaniment sound and the voice and publishes the mix over the Internet for the audience-side mobile phone to listen to and give feedback; the live broadcast mobile phone then mixes the accompaniment sound, the voice, and the audience feedback sound and sends the result to the wireless headset to realize its monitoring function. That is, the wireless headset realizes both sound pickup and monitoring.
FIG. 4 shows a schematic structural diagram of an electronic device 100 according to some embodiments of this application, for example the capture-side 1 terminal device such as the live broadcast mobile phone mentioned in the above application scenarios.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) connector 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure illustrated in this embodiment of the present invention does not constitute a specific limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may include more or fewer components than shown, combine certain components, split certain components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent components or may be integrated into one or more processors.
The processor 110 may generate operation control signals according to instruction opcodes and timing signals, completing the control of instruction fetching and execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache, which may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs the instruction or data again, it can be fetched directly from this memory, avoiding repeated accesses, reducing the waiting time of the processor 110, and thus improving system efficiency. For example, taking the live broadcast mobile phone of one solution of the present invention as an example, after the live broadcast mobile phone starts the application, it runs instructions to establish multiple audio channels, perform the corresponding mixing, and send each audio signal to the corresponding device. Specifically, the live broadcast mobile phone receives the publishing audio signal sent by the audio processor (in which the accompaniment sound signal and the anchor sound signal are mixed) and sends it to the audience mobile phone over the Internet; it receives the feedback sound signal from the audience and mixes the publishing audio signal with the feedback audio signal to generate the monitoring audio signal; and it sends the monitoring audio signal to the Bluetooth headset via Bluetooth communication.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, and a subscriber identity module (SIM) interface.
The I2C interface is a bidirectional synchronous serial bus including a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may include multiple sets of I2C buses and may be coupled to the touch sensor 180K, a charger, a flash, the camera 193, etc. through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface, realizing the touch function of the electronic device 100.
The I2S interface may be used for audio communication. In some embodiments, the processor 110 may include multiple sets of I2S buses and may be coupled to the audio module 170 through an I2S bus, realizing communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transfer audio signals to the wireless communication module 160 through the I2S interface, realizing the function of answering calls through the Bluetooth headset 1005b. In the audio processing method of this application, the publishing audio signal obtained by the audio module 170 (the signal formed by mixing the accompaniment sound and the anchor sound) is transmitted to remote audiences for sharing, and the monitoring audio signal (the audio signal formed by mixing the accompaniment sound, the audience sound, and the anchor sound) is transmitted to the Bluetooth headset 1005b for monitoring.
The PCM interface may also be used for audio communication, sampling, quantizing, and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transfer audio signals to the wireless communication module 160 through the PCM interface, realizing the function of answering calls through the Bluetooth headset 1005b. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communication. The bus may be a bidirectional communication bus that converts the data to be transmitted between serial and parallel communication. In some embodiments, the UART interface is typically used to connect the processor 110 and the wireless communication module 160. For example, the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface, realizing the Bluetooth function. In some embodiments, the audio module 170 may transfer audio signals to the wireless communication module 160 through the UART interface, realizing the function of playing music through the Bluetooth headset 1005b.
The internal memory 121, as a computer-readable storage medium, may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. For example, the internal memory 121 may include any suitable non-volatile memory such as flash memory and/or any suitable non-volatile storage device, for example one or more hard-disk drives (HDDs), one or more compact disc (CD) drives, and/or one or more digital versatile disc (DVD) drives. According to some embodiments of this application, the memory 121, as a computer-readable storage medium, stores instructions which, when executed on a computer, cause the processor 110 to execute the audio processing method according to the embodiments of this application; for details, refer to the methods of the above embodiments, which are not repeated here.
It can be understood that the interface connection relationships between the modules illustrated in this embodiment of the present invention are only schematic and do not constitute a structural limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may also adopt interface connection modes different from those of the above embodiments, or a combination of multiple interface connection modes.
The digital signal processor is used to process digital signals; besides digital image signals, it can also process other digital signals. For example, when the electronic device 100 performs frequency point selection, the digital signal processor is used to perform a Fourier transform or the like on the frequency point energy.
The video codec is used to compress or decompress digital video. The electronic device 100 may support one or more video codecs, so that the electronic device 100 can play or record video in multiple encoding formats, for example moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.
The software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservices architecture, or a cloud architecture. The embodiments of the present invention take the Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100.
FIG. 5 is a software structure block diagram of the electronic device 100 according to an embodiment of the present invention.
The layered architecture divides the software into several layers, each with a clear role and division of labor; the layers communicate through software interfaces. In some embodiments, the Android system is divided into four layers, from top to bottom: the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.
The application layer may include a series of application packages. For example, the live broadcast app that starts the audio processing method of this application resides in this application layer.
As shown in FIG. 5, the application packages may include applications such as camera, gallery, calendar, phone, maps, navigation, WLAN, Bluetooth, music, video, and SMS.
The application framework layer provides an application programming interface (API) and a programming framework for the applications of the application layer. The application framework layer includes some predefined functions.
As shown in FIG. 5, in the live broadcast mobile phone (as an example of the electronic device) in the audio processing system of this application, the application framework layer may include a window manager, content providers, a view system, a telephony manager, a resource manager, a channel control module, a mixing module, and the like.
The channel control module is used to configure and establish audio channels for the transmission of audio signals. The mixing module is used to mix audio signals.
Each audio channel is, in essence, data movement between buffers. That is, at the start of an audio channel is one data buffer, and at its end is another. Under the control of the channel control module, data is moved from the start buffer to the end buffer; logically, this constitutes a channel.
The channel control module is a software module implemented by the CPU running specific code. The channel control module reads data from an input buffer, processes it, and stores it in an output buffer, thereby processing the data. The code exists in the form of a "library" that encapsulates multiple functions for realizing channel control. Calling these functions establishes the input and output buffers and starts the work of moving and copying data; the library also includes the function of tearing down buffers to release a "channel". For example, calling the interface functions of the library realizes reading, processing, and outputting data, that is, the establishment of an audio channel.
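The buffer-to-buffer data movement described above can be sketched in a few lines; the `Channel` class and `pump` method are illustrative assumptions, not the framework's actual library API.

```python
# Sketch of a "channel" as buffer-to-buffer data movement: the channel
# control module moves frames from a source buffer to a destination
# buffer, which logically constitutes one channel.
from collections import deque

class Channel:
    def __init__(self, src: deque, dst: deque):
        self.src, self.dst = src, dst

    def pump(self):
        # Move every frame currently in the source buffer to the
        # destination buffer.
        while self.src:
            self.dst.append(self.src.popleft())

wired_in = deque([b"frame1", b"frame2"])  # buffer at the channel start
mixer_in = deque()                        # buffer at the channel end
Channel(wired_in, mixer_in).pump()
```

Tearing a channel down would then amount to discarding the `Channel` object and releasing its buffers.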
The mixing module is also a software module implemented by the CPU running specific code. The essence of the mixing module is the summation of input signals. The mixing may be performed by any of the following mixing algorithms: the linear method, the fixed-weight method (weights fixed with respect to signal amplitude), or the dynamic-weight method (parameter changes alter the summation weights).
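The three summation strategies just named can be sketched as minimal routines; the function names, the clipping step, and the peak-based dynamic weighting are illustrative assumptions, not the patent's implementation.

```python
# Minimal sketches of the three mixing strategies: linear,
# fixed-weight, and dynamic-weight. Frames are assumed to be
# lists of float samples in [-1.0, 1.0].

def mix_linear(a, b):
    # Linear method: plain sample-wise sum, clipped to the valid range.
    return [max(-1.0, min(1.0, x + y)) for x, y in zip(a, b)]

def mix_fixed_weight(a, b, wa=0.5, wb=0.5):
    # Fixed-weight method: constant weights chosen in advance.
    return [wa * x + wb * y for x, y in zip(a, b)]

def mix_dynamic_weight(a, b):
    # Dynamic-weight method: weights adapted per frame, here derived
    # from the frames' peak amplitudes so the louder signal is
    # attenuated more (one possible weighting rule).
    pa = max((abs(x) for x in a), default=0.0)
    pb = max((abs(y) for y in b), default=0.0)
    total = (pa + pb) or 1.0
    wa, wb = 1.0 - pa / total, 1.0 - pb / total
    return [wa * x + wb * y for x, y in zip(a, b)]

accompaniment = [0.2, 0.4, -0.3]
anchor = [0.1, -0.2, 0.5]
published = mix_linear(accompaniment, anchor)   # sample-wise sum
monitored = mix_fixed_weight(accompaniment, anchor)
```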
The Android runtime includes the core libraries and the virtual machine; it is responsible for scheduling and management of the Android system.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files, and performs object lifecycle management, stack management, thread management, security and exception management, garbage collection, and other functions.
The kernel layer is the layer between hardware and software; it contains at least the display driver, the camera driver, the audio driver, and the sensor driver.
Below, taking a Bluetooth headset as an example of the wireless headset — the Bluetooth headset transfers audio signals with the live broadcast mobile phone via Bluetooth communication — the above two solutions of this application are described in more detail.
First, with reference to FIGS. 6 to 10, an audio processing method of one embodiment of this application and the audio processing system implementing it are described.
As shown in FIG. 6, the audio processing system implementing the audio processing method of this embodiment includes: a live broadcast mobile phone 1001, an accompaniment mobile phone 1003, a sound card 1002, a microphone 1004, and a Bluetooth headset 1005b. To facilitate understanding of the workflow of this audio processing system, FIG. 6 also shows an audience mobile phone 1006 used to play the audio published by the live broadcast mobile phone 1001.
In this embodiment, the live broadcast mobile phone 1001 is an example of the electronic device, the sound card 1002 is an example of the audio processor, the accompaniment mobile phone 1003 is an example of the accompaniment sound providing device, and the microphone 1004 is an example of the sound collection device.
The sound card 1002 is an external sound card used to process the audio signals from the microphone 1004 and the accompaniment mobile phone 1003 and transmit the result to the live broadcast mobile phone 1001. That is, the sound card 1002 has multiple audio input ports for inputting audio signals from devices such as the microphone 1004 and the accompaniment mobile phone 1003 and processing them accordingly; it also has one audio output port through which the processed audio signal is output to the live broadcast mobile phone 1001.
The sound card 1002 can, for example, perform voice-beautification processing such as electronic-sound processing, mixing, and voice changing. In addition, the live-broadcast sound card 1002 can use built-in fun sound effects such as applause, kisses, disdain, and laughter to make the broadcast less monotonous. The sound card 1002 may also have a ducking function, so that the background music is automatically lowered while the anchor speaks and restored to full volume when the anchor finishes.
Sound cards currently common on the market include, for example, the ICKB™ SO8 sound card, the XOX™ FX5 sound card, the Takstar™ MX1 live-broadcast sound card, the Senran™ Boba sound card, and the Changba™ K10 sound card.
The accompaniment mobile phone 1003 and the microphone 1004 are wired to the audio input ports of the sound card 1002. In a live-singing scenario, the accompaniment mobile phone 1003 inputs the accompaniment of the song being sung into the sound card 1002, and the microphone 1004 collects the anchor's singing and inputs it into the sound card 1002. The sound card 1002 mixes and processes the accompaniment audio and the singing audio, then outputs the result via wired communication to the live broadcast mobile phone 1001 through the audio output port.
The live broadcast mobile phone 1001 is wired to the sound card 1002. Through the live broadcast application running on it, the live broadcast mobile phone 1001 publishes the mix of the accompaniment sound and the anchor's singing sent by the sound card 1002, pushing the stream to the streaming media server; the audience mobile phone 1006 plays it by pulling the stream from that server. In addition, the live broadcast mobile phone 1001 receives audience-side feedback audio over the Internet through the live broadcast application, mixes the feedback audio with the accompaniment and the voice, and sends the result to the Bluetooth headset 1005b via Bluetooth communication. Through the Bluetooth headset 1005b, the broadcaster can monitor the accompaniment, his or her own singing, and the feedback audio from the audience mobile phone 1006.
The publishing audio signal formed by mixing the anchor sound signal collected by the microphone 1004 with the accompaniment sound signal provided by the accompaniment mobile phone 1003 is an example of the first audio signal. A "publishing audio signal" means an audio signal that is published live through the live broadcast application.
The feedback audio signal reflecting the audience-side feedback sound is an example of the second audio signal. The monitoring audio signal obtained by the live broadcast mobile phone 1001 by mixing the received feedback audio signal with the accompaniment and the voice is an example of the third audio signal. A "monitoring audio signal" means an audio signal transmitted to the anchor so that the broadcast content can be monitored.
The live broadcast mobile phone 1001 starts the application to publish audio to the audience mobile phone 1006 through the application and to receive feedback from the audience mobile phone 1006.
Both the accompaniment mobile phone 1003 and the microphone 1004 are wired to the sound card 1002, and the sound card 1002 is wired to the live broadcast mobile phone 1001. The sound card 1002 mixes the anchor's singing collected by the microphone 1004 with the accompaniment sound provided by the accompaniment mobile phone 1003 to form the publishing audio signal, and sends the publishing audio signal to the live broadcast mobile phone 1001 over the wired connection.
The audience mobile phone 1006 connects to the live broadcast mobile phone 1001 via the Internet to receive the publishing audio signal published by the live broadcast mobile phone 1001, and sends the audio signal reflecting the feedback to the live broadcast mobile phone 1001.
The Bluetooth headset 1005b is connected to the live broadcast mobile phone 1001 via Bluetooth communication. The live broadcast mobile phone 1001 further mixes the publishing audio signal with the feedback audio signal to generate the monitoring audio signal, and sends the monitoring audio signal to the Bluetooth headset 1005b for the anchor to monitor.
According to the audio processing system of this application, the live broadcast mobile phone 1001 is on the one hand wired to the sound card 1002 (i.e., to the audio output port of the sound card 1002) to input the publishing audio signal, and on the other hand connected to the Bluetooth headset 1005b via Bluetooth communication to output the monitoring audio signal; that is, the live broadcast mobile phone 1001 enables the analog-headset and Bluetooth-headset 1005b functions simultaneously. On this basis, multiple audio channels are established among the live broadcast mobile phone 1001, the sound card 1002, the Bluetooth headset 1005b, and the audience mobile phone 1006; the publishing audio signal from the sound card 1002 is mixed with the feedback audio signal from the audience mobile phone 1006 to generate the monitoring audio signal, which is provided to the Bluetooth headset 1005b via Bluetooth communication, thereby realizing the monitoring function of the Bluetooth headset.
With reference to FIG. 7, an implementation in which the above live broadcast mobile phone 1001 enables two audio interfaces (the analog headset and the Bluetooth headset 1005b) simultaneously is described below.
Allowing a mobile phone to enable an analog headset and a Bluetooth headset 1005b at the same time can be achieved in software.
Specifically, taking the Android system as an example, the audiopolicy (audio control) configuration in the Android framework can be modified so that the input/output devices include IOProfiles for both headsets. Specifically, as shown in FIG. 7, the audio configuration file (audio_policy.conf) defines multiple audio interfaces at the same time; each audio interface contains several outputs and inputs, each output and input in turn supports multiple IOProfiles, and each IOProfile supports several devices. Therefore, the two headsets, as two devices, need only be configured into the corresponding IOProfiles. That is, by modifying the configuration file, multiple audio interfaces can be enabled simultaneously, and on this basis the audio processing method of this application is implemented.
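As an illustration of the configuration change described above, a fragment in the style of `audio_policy.conf` might list both the wired headset and the Bluetooth device under one profile. The exact keys and device names below are assumptions — they vary by platform and Android version — so this is a sketch, not the patent's actual configuration.

```
audio_hw_modules {
  primary {
    outputs {
      primary {
        sampling_rates 44100|48000
        channel_masks AUDIO_CHANNEL_OUT_STEREO
        formats AUDIO_FORMAT_PCM_16_BIT
        devices AUDIO_DEVICE_OUT_WIRED_HEADSET|AUDIO_DEVICE_OUT_BLUETOOTH_A2DP
      }
    }
    inputs {
      primary {
        sampling_rates 8000|16000|44100|48000
        channel_masks AUDIO_CHANNEL_IN_MONO|AUDIO_CHANNEL_IN_STEREO
        formats AUDIO_FORMAT_PCM_16_BIT
        devices AUDIO_DEVICE_IN_WIRED_HEADSET|AUDIO_DEVICE_IN_BLUETOOTH_SCO_HEADSET
      }
    }
  }
}
```

Listing both device flags in one IOProfile is what allows the audio policy to treat the analog headset and the Bluetooth headset as simultaneously available endpoints.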
As shown in FIG. 8(a), the audio processing method used in the audio processing system of the above embodiment includes:
Step S110: First, the live broadcast mobile phone 1001 starts the live broadcast application. Through the live broadcast application, the live broadcast mobile phone 1001 can publish audio via the streaming media server to the audience mobile phones 1006 watching the live application, and can also receive feedback audio from the audience obtained via the streaming media server.
The live broadcast application may be software used for live streaming, such as Douyin™ or Kuaishou™. The anchor opens the live broadcast application on the live broadcast mobile phone 1001 and, through it, sends audio over the Internet via the streaming media server to the audience mobile phone 1006; the audience opens the same application on the audience mobile phone 1006 to listen to the audio from the anchor, and sends the feedback audio, i.e., the audience sound signal, back to the live broadcast mobile phone 1001 over the Internet via the streaming media server. The anchor receives the audience's feedback audio through the live broadcast mobile phone 1001, for example the users' comments or sing-along. According to some embodiments, in the audio processing system of this application, when the live broadcast mobile phone 1001 starts the live broadcast application, it enables the listening function of the Bluetooth headset 1005b. Thereafter, steps S120 to S160 of the audio processing method shown in FIG. 8(a) are executed.
Step S120: After enabling the listening function of the Bluetooth headset 1005b, the live broadcast mobile phone 1001 receives, via wired communication, the publishing audio signal sent by the sound card 1002, in which the accompaniment sound signal and the anchor sound signal are mixed. More specifically, referring to the description of FIG. 6 above, the sound card 1002 is connected to the accompaniment mobile phone 1003 and the microphone 1004 through its wired audio input ports, and to the live broadcast mobile phone 1001 through its wired audio output port. The accompaniment mobile phone 1003 transmits the accompaniment sound signal to the sound card 1002 via wired communication, the microphone 1004 transmits the anchor sound signal to the sound card 1002 via wired communication, and the sound card 1002 mixes the accompaniment sound signal and the anchor sound signal to obtain the publishing audio signal, which it sends to the live broadcast mobile phone 1001 via wired communication.
Step S130: The live broadcast mobile phone 1001 publishes the publishing audio signal over the Internet via the application. Specifically, the live broadcast application sends the publishing audio signal, obtained by mixing the accompaniment sound signal and the anchor sound signal, over the Internet to the server associated with the application, so that the audience mobile phone 1006 can play it.
Step S140: The live broadcast mobile phone 1001 receives the audience sound from the audience mobile phone 1006 over the Internet via the live broadcast application. That is, the audience mobile phone 1006 receives the publishing audio signal and sends the feedback audio signal for it, i.e., the audience sound (that is, the second audio signal), to the live broadcast mobile phone 1001 over the Internet.
Step S150: The live broadcast mobile phone 1001 mixes the publishing audio signal with the feedback audio signal to generate the monitoring audio signal. The mixing is performed, as described above, by the mixing module in the application framework layer of the live broadcast mobile phone 1001, and is not repeated here.
Step S160: The live broadcast mobile phone 1001 sends the monitoring audio signal to the Bluetooth headset 1005b via the Bluetooth communication network.
FIG. 8(b) gives a more intuitive view of the exemplary signal-processing flow among the devices in the above steps.
Step S10: The live broadcast mobile phone 1001 starts the live broadcast application. When the live broadcast application is started, the live broadcast mobile phone 1001 modifies the operating system's default audio channel configuration to allow the wired headset (analog headset) and the Bluetooth headset 1005b to be enabled simultaneously. It confirms whether the analog headset is plugged in by detecting the wired audio interface, and whether the Bluetooth headset 1005b is paired successfully by detecting Bluetooth pairing. When the analog headset is confirmed to be plugged in and the Bluetooth headset 1005b is paired successfully, the live broadcast mobile phone 1001 enters listening mode.
Step S11: The live broadcast mobile phone 1001 obtains, through the sound card 1002, the audience audio formed by mixing the accompaniment sound and the anchor sound. Step S12: The live broadcast mobile phone 1001 transmits the audience audio to the audience mobile phone 1006 through the application. Step S13: The live broadcast mobile phone 1001 receives the feedback audio from the audience mobile phone 1006. Step S14: The live broadcast mobile phone 1001 mixes the audience audio with the feedback audio to form the monitoring audio and sends it to the Bluetooth headset 1005b via Bluetooth communication.
That is, the sound card 1002 is wired through its audio output port to the analog-headset port of the live broadcast mobile phone 1001, which ensures that the audio signal from the sound card 1002 is free from external interference with little signal loss during transmission, improving sound quality. The live broadcast mobile phone 1001 and the audience mobile phone 1006 are connected via the Internet, for example through a network server, enabling the anchor to face a wider audience and making listening and watching convenient for the audience. The Bluetooth headset 1005b and the live broadcast mobile phone 1001 are connected via Bluetooth communication, completely removing cables, not restricting the user's movement, and improving portability. The inputs of the live broadcast mobile phone 1001 are the publishing audio signal from the sound card 1002 (a mix of the accompaniment sound signal and the anchor sound) and the feedback audio signal from the audience mobile phone 1006 (the audience sound signal reflecting the feedback). The output of the sound card 1002 is the publishing audio signal. The input of the audience mobile phone 1006 is the publishing audio signal, and its output is the feedback audio signal. The Bluetooth headset 1005b, used for listening, takes the monitoring audio signal (a mix of the publishing audio signal and the feedback audio signal) as input. Thus, the anchor can broadcast with fewer devices and fewer wired connections, with better portability and a good, delay-free live effect, avoiding the many devices and complicated cabling shown in FIG. 2(a) and the latency difference shown in FIG. 2(b), which the user can hardly detect but which directly affects the live-broadcast effect.
Below, the signal transfer among the electronic devices implementing the above audio processing method is described with reference to FIG. 9.
The system implementing the audio processing method of this application shown in FIG. 9 includes the electronic devices shown in FIG. 6. Further, FIG. 9 also shows the hardware modules in the live broadcast mobile phone 1001 that are relevant to running the solution of this embodiment: the application processor (AP), the communication module (cellular or WiFi communication module), the Bluetooth module, the codec, the display module, and so on. The AP is used for program processing, i.e., programs run in the AP to realize the corresponding functions, such as mixing and communication within the device. The communication module is used for communication between different devices to realize audio signal transmission.
After the live broadcast application is started (step S110 above), the accompaniment mobile phone 1003 plays music or background sound as accompaniment to provide the accompaniment audio, the microphone 1004 collects the anchor's voice to provide the anchor audio, and the microphone 1004 and the accompaniment mobile phone 1003 output their audio signals to the sound card 1002 to be mixed into the publishing audio signal. The sound card 1002 sends the publishing audio signal to the live broadcast mobile phone 1001, either as an analog signal or as a digital signal. When it is sent as an analog signal, the live broadcast mobile phone 1001 converts it into a digital audio signal through the codec (corresponding to signal A in FIG. 9), after which the application processor AP sends the publishing audio signal to the audience through the communication module. The audience mobile phone 1006 sends the audience's feedback audio for the audience audio (corresponding to the far-end sound signal C in FIG. 9) to the live broadcast mobile phone 1001. The live broadcast mobile phone mixes the audience audio A with the feedback audio signal C to generate the monitoring audio signal (corresponding to signal B in FIG. 9), and sends the monitoring audio signal to the Bluetooth headset 1005b through the Bluetooth module.
Specifically, first, the live broadcast mobile phone 1001 connects to the sound card 1002 via wired communication and receives the publishing audio signal sent by the sound card 1002 (signal A in the figure), corresponding to step S120 in FIG. 8(a).
There are two ways for the live broadcast mobile phone 1001 to receive the accompaniment and anchor sound from the sound card 1002.
Mode 1: The publishing audio signal produced by the sound card 1002 is a digital signal, and the sound card 1002 connects to the live broadcast mobile phone 1001 through a digital input.
For example, the sound card 1002 can provide a digital audio signal and is equipped with a USB connector; the USB connector is plugged into the USB port of the live broadcast mobile phone 1001, which receives the publishing audio signal provided by the sound card 1002 through the USB data path and passes it to the AP.
Mode 2: The publishing audio signal produced by the sound card 1002 is an analog signal, and the sound card 1002 connects to the live broadcast mobile phone 1001 through an analog input. The analog audio signal is converted into a digital audio signal by the codec at the system bottom layer of the live broadcast mobile phone 1001, and the live broadcast mobile phone 1001 sends the digital audio signal to the audience mobile phone 1006.
For example, the sound card 1002 is plugged through a 3.5 mm headphone plug (a wired headset, also called an analog headset) into the 3.5 mm headphone jack of the live broadcast mobile phone 1001. The live broadcast mobile phone 1001 passes the analog publishing audio signal from the sound card 1002 to the codec, where it is sampled by the internal ADC and converted into a digital publishing audio signal; the publishing audio signal (signal A in the figure) is input to the AP over a bus (e.g., I2S, the inter-integrated-circuit sound bus, or SLIMbus, the serial low-power inter-chip media bus).
Thereafter, the AP sends the publishing audio signal to the audience mobile phone 1006 through the communication module, corresponding to step S130 in FIG. 8(a).
The live broadcast mobile phone 1001 passes the publishing audio signal through the AP to the communication module, which sends the publishing audio signal to the streaming media server for playback by the audience mobile phone 1006. In this way, the live broadcast mobile phone 1001 publishes the publishing audio signal through the live broadcast application.
Then, the live broadcast mobile phone 1001 receives the feedback audio signal from the audience as feedback (signal C in FIG. 9), corresponding to step S140 in FIG. 8(a).
Specifically, the cellular or network communication module receives the feedback audio signal from the audience mobile phone 1006 over the Internet, and the AP receives the feedback audio signal from the cellular or network communication module. In this way, the live broadcast mobile phone 1001 receives the feedback audio signal from the audience.
Then, the live broadcast mobile phone 1001 mixes the publishing audio signal with the feedback audio signal to generate the monitoring audio signal (signal B in the figure), corresponding to step S150 in FIG. 8(a).
The live broadcast mobile phone 1001 mixes, through the AP, the publishing audio signal from the sound card 1002 and the feedback audio signal from the audience mobile phone 1006 to generate the monitoring audio signal.
Then, the live broadcast mobile phone 1001 connects to the Bluetooth headset 1005b via Bluetooth communication and sends the monitoring audio signal to it via Bluetooth communication, realizing listening on the Bluetooth headset 1005b, corresponding to step S160 in FIG. 8(a).
The live broadcast mobile phone 1001 sends the monitoring audio signal through the AP to the Bluetooth module and on to the Bluetooth headset 1005b, so that the anchor can hear the monitoring audio signal through the Bluetooth headset 1005b.
Next, the implementation of the audio processing method of the live broadcast mobile phone 1001 in the operating system is described in further detail with reference to the functional block diagram of the live broadcast mobile phone 1001 in FIG. 10.
As shown in FIG. 10, the operating system of the live broadcast mobile phone 1001 includes an application layer, a framework layer, and a hardware access layer. The application (the live broadcast application) is set at the application layer, the channel control module and the mixing module are set at the framework layer, and the hardware is accessed through the hardware access layer.
The hardware on the anchor side includes the sound card 1002, the live broadcast mobile phone 1001, and the Bluetooth headset 1005b; the hardware on the audience side includes the audience mobile phone 1006.
After the live broadcast application is started, the operating system notifies the channel control module of the start, and the channel control module determines whether the Bluetooth headset 1005b and the analog headset are connected successfully. When it is determined that the Bluetooth headset 1005b is paired successfully and the sound card 1002 is wired to the live broadcast mobile phone 1001, the channel control module starts the mixing module for mixing and at the same time establishes the first channel, the second channel, the third channel, and the fourth channel.
The first channel sends the publishing audio signal from the sound card 1002 to the live broadcast application for publishing, so that the audience mobile phone 1006 can play it.
The second channel sends the publishing audio signal from the sound card 1002 to the mixing module, to be processed by the mixing module at the framework layer.
Through the third channel, the live broadcast application sends the feedback audio signal from the audience mobile phone 1006, received over the Internet, to the mixing module, to be processed by the mixing module at the framework layer.
The mixing module mixes the publishing audio signal with the feedback audio signal to obtain the monitoring audio signal.
The fourth channel sends the monitoring audio signal, obtained by the mixing module by mixing the publishing audio signal and the feedback audio signal, to the Bluetooth headset 1005b via the Bluetooth network. In this way, the listening function of the Bluetooth headset 1005b on the anchor side is realized.
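The four channels just described can be sketched as a small routing loop over buffers; the queue names and the dispatch function are illustrative assumptions, not the framework's actual code.

```python
from collections import deque

# Buffers at the ends of the four channels of this embodiment.
wired_in = deque()      # publishing audio from the sound card
app_out = deque()       # toward the live broadcast application
mixer_in = deque()      # toward the mixing module
app_feedback = deque()  # feedback received by the application
bt_out = deque()        # toward the Bluetooth (wireless) audio interface

def pump():
    # Channel 1: wired interface -> application (for publishing).
    # Channel 2: wired interface -> mixing module (same frame, fanned out).
    while wired_in:
        frame = wired_in.popleft()
        app_out.append(frame)
        mixer_in.append(frame)
    # Channel 3: application (Internet feedback) -> mixing module.
    # Channel 4: mixed monitoring audio -> wireless audio interface.
    while mixer_in and app_feedback:
        publish = mixer_in.popleft()
        feedback = app_feedback.popleft()
        bt_out.append(publish + feedback)  # stand-in for real mixing

wired_in.append(1.0)
app_feedback.append(0.25)
pump()
```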
Below, with reference to FIGS. 11 to 14, an audio processing method of another embodiment of this application and the audio processing system implementing it are described. Unlike the above embodiment, in this embodiment the Bluetooth headset 1005b enables not only its listening function but also its sound-pickup function; that is, after the anchor sound is picked up by the Bluetooth headset 1005b, it is sent via Bluetooth communication to the live broadcast mobile phone 1001 for mixing.
Specifically, as shown in FIG. 11, the audio processing system implementing the audio processing method of this embodiment includes: a live broadcast mobile phone 1001, an accompaniment mobile phone 1003, an audience mobile phone 1006, and a Bluetooth headset 1005b.
The live broadcast mobile phone 1001 is an example of the electronic device, and the accompaniment mobile phone 1003 is an example of the accompaniment sound providing device.
The live broadcast mobile phone 1001 starts the application to publish audio to the audience mobile phone 1006 through the application and to receive feedback from the audience mobile phone 1006.
The live broadcast mobile phone 1001 and the accompaniment mobile phone 1003 are wired so that the accompaniment sound signal sent by the accompaniment mobile phone 1003 is received via wired communication.
The live broadcast mobile phone 1001 and the Bluetooth headset 1005b are connected via Bluetooth communication, and the anchor sound signal is received through the Bluetooth headset 1005b.
The live broadcast mobile phone 1001 mixes the accompaniment sound signal with the anchor sound signal to obtain the publishing audio signal, sends the publishing audio signal to the audience mobile phone 1006 connected to the live broadcast mobile phone 1001 over the wireless network, and receives from the audience mobile phone 1006 the feedback audio signal reflecting the feedback audio (i.e., the audience sound signal).
In addition, the live broadcast mobile phone 1001 mixes the publishing audio signal with the feedback audio signal to generate the monitoring audio signal, and sends the monitoring audio signal to the Bluetooth headset 1005b via the Bluetooth communication network to realize the monitoring function.
That is, unlike the embodiment described above with reference to FIGS. 6 to 10, in this embodiment the Bluetooth headset 1005b has not only the listening function but also the sound-pickup function: the anchor sound is picked up by the Bluetooth headset 1005b, the anchor sound signal is sent to the live broadcast mobile phone 1001 via Bluetooth communication, and the live broadcast mobile phone mixes based on this anchor sound signal and publishes the result to the audience side.
To realize the sound-pickup-plus-monitoring function of the Bluetooth headset 1005b, likewise, after starting the application the live broadcast mobile phone 1001 needs to modify the audio channel configuration to allow the wired headset and the Bluetooth headset 1005b to be enabled simultaneously. For the specific implementation, refer to the description above with reference to FIG. 7; the detailed description is omitted here.
Accordingly, as shown in FIG. 12(a), the audio processing method realized by the above audio processing system includes:
Step S210: The live broadcast mobile phone 1001 starts the live broadcast application. For specific details of the live broadcast application, refer to the above embodiment; the detailed description is omitted here.
Step S220: The live broadcast mobile phone 1001 receives, via wired communication, the accompaniment sound signal sent by the accompaniment mobile phone 1003. Specifically, the accompaniment mobile phone 1003 is connected to the live broadcast mobile phone 1001 through the wired audio interface, so that the live broadcast mobile phone 1001 receives the accompaniment sound signal sent by the accompaniment mobile phone 1003 via wired communication.
Step S230: The live broadcast mobile phone 1001 receives the anchor sound signal from the Bluetooth headset 1005b via wireless communication. Specifically, the Bluetooth headset 1005b picks up the anchor sound signal and sends it to the live broadcast mobile phone 1001 via Bluetooth communication.
That is, in this embodiment, unlike the above embodiment, the anchor sound signal is picked up by the Bluetooth headset 1005b and transmitted to the live broadcast mobile phone 1001 via Bluetooth communication. Compared with the above implementation, there is thus no need to provide a dedicated sound card 1002: the accompaniment sound signal and the anchor sound signal are mixed directly by the mixing module in the live broadcast mobile phone 1001, further simplifying the equipment and cabling.
Moreover, when the Bluetooth headset 1005b is used for sound collection, because it is worn on the outer ear its position is essentially fixed: it is very close to the mouth, and the relative position of the two is also essentially fixed, which shows the unique advantage of using TWS to capture the human voice. Sound pickup with the Bluetooth headset 1005b can be realized by existing methods, for example through a microphone hidden in the Bluetooth headset 1005b; the detailed description is omitted here.
Step S240: The live broadcast mobile phone 1001 performs mixing processing based on the accompaniment sound signal and the anchor sound signal to obtain the publishing audio signal.
Step S250: The live broadcast mobile phone 1001 publishes the publishing audio signal via the application. That is, the live broadcast mobile phone 1001 publishes the audio signal through the live broadcast application by pushing the stream to the streaming media server, and the audience mobile phone 1006 pulls the stream from that server over the Internet and plays the publishing audio signal.
Step S260: The live broadcast mobile phone 1001 receives the feedback audio signal through the live broadcast application. Specifically, the audience mobile phone 1006 receives the publishing audio signal via the application and sends the feedback audio signal to the live broadcast mobile phone 1001 over the wireless network.
Step S270: The live broadcast mobile phone 1001 mixes the publishing audio signal with the feedback audio signal to generate the monitoring audio signal.
Step S280: The live broadcast mobile phone 1001 sends the monitoring audio signal to the Bluetooth headset 1005b via Bluetooth communication.
The above steps can be understood more intuitively from FIG. 12(b).
Step S20: First, the live broadcast mobile phone 1001 starts the application; it then obtains the accompaniment sound through the accompaniment mobile phone 1003 and picks up the anchor sound through the Bluetooth headset 1005b. Step S21: The live broadcast mobile phone 1001 mixes the two audio streams to form the publishing audio signal. Step S22: The live broadcast mobile phone 1001 transmits the publishing audio signal to the audience mobile phone 1006 through the application. Step S23: It receives the feedback audio signal from the audience mobile phone 1006. Step S24: The live broadcast mobile phone then mixes the publishing audio signal with the feedback audio signal to form the monitoring audio signal, and sends it to the Bluetooth headset 1005b via Bluetooth communication.
That is, in this implementation the mixing module performs two mixes: it mixes the accompaniment sound signal with the anchor sound signal to obtain the publishing audio signal, which is provided to the audience mobile phone 1006; in addition, it further mixes the publishing audio signal with the audience sound signal from the audience mobile phone 1006 and provides the result to the anchor for monitoring through the Bluetooth module.
According to this embodiment, the Bluetooth headset 1005b is used for both listening and sound pickup; its input is the monitoring audio signal and its output is the anchor sound. Because the Bluetooth headset 1005b is worn on the outer ear, its position is essentially fixed, very close to the mouth, and the relative position of the two is also essentially fixed, which shows the unique advantage of using the Bluetooth headset 1005b to capture the human voice. Thus, the anchor can broadcast with even fewer devices — only the live broadcast mobile phone 1001, the accompaniment sound providing device, and the Bluetooth headset 1005b — and only the accompaniment mobile phone 1003 and the live broadcast mobile phone 1001 are wired, further improving portability, with a good, delay-free live effect, avoiding the many devices and complicated cabling shown in FIG. 2(a) and the latency difference shown in FIG. 2(b), which the user can hardly detect but which directly affects the live-broadcast effect.
Below, the signal transfer among the electronic devices implementing the above audio processing method is described with reference to FIG. 13.
The system implementing the audio processing method of this application shown in FIG. 13 includes the hardware shown in FIG. 11. Further, FIG. 13 shows that the live broadcast mobile phone 1001 includes the application processor (AP), the communication module (cellular or WiFi communication module), the Bluetooth module, the codec, the display module, and so on.
After the application is started (step S210 above), the accompaniment mobile phone 1003 plays music or background sound to provide the accompaniment audio, the Bluetooth headset 1005b collects the anchor's voice to provide the anchor audio, and the Bluetooth headset 1005b and the accompaniment mobile phone 1003 output their audio to the live broadcast mobile phone 1001 to be mixed into the audience audio. The live broadcast mobile phone 1001 sends the audience audio to the audience. The audience mobile phone 1006 sends the audience's feedback audio for the audience audio to the anchor's mobile phone, which mixes the audience audio and the feedback audio to generate the monitoring audio and sends the monitoring audio to the Bluetooth headset 1005b.
First, the live broadcast mobile phone 1001 connects to the accompaniment mobile phone 1003 via wired communication and receives the accompaniment sound signal sent by the accompaniment mobile phone 1003 (signal A in the figure), corresponding to step S220 in FIG. 12(a).
As in the above embodiment, the accompaniment sound signal that the live broadcast mobile phone 1001 receives from the accompaniment mobile phone 1003 may be an analog signal or a digital signal. For the specific signal processing, refer to the handling of the publishing audio signal in the above embodiment; the detailed description is omitted here.
Next, the live broadcast mobile phone 1001 connects to the Bluetooth headset 1005b via Bluetooth communication and receives the anchor sound signal through the Bluetooth headset 1005b (signal E in the figure), corresponding to step S230 in FIG. 12(a).
The live broadcast mobile phone 1001 receives the anchor sound signal through the Bluetooth module and inputs the anchor sound signal to the AP.
Next, the live broadcast mobile phone 1001 mixes the accompaniment sound signal with the anchor sound signal to obtain the publishing audio signal (signal D in the figure), corresponding to step S240 in FIG. 12(a).
The live broadcast mobile phone 1001 mixes, through the AP, the accompaniment sound signal from the accompaniment mobile phone 1003 and the anchor sound signal from the Bluetooth headset 1005b to obtain the publishing audio signal.
Then, the live broadcast mobile phone 1001 connects to the audience mobile phone 1006 over the wireless network via the application and sends the publishing audio signal to the audience mobile phone 1006, corresponding to step S250 in FIG. 12(a).
The audience mobile phone 1006 receives the publishing audio signal via the application and sends the feedback audio signal reflecting the feedback (signal C in the figure) to the live broadcast mobile phone 1001 over the wireless network, corresponding to step S260 in FIG. 12(a).
The live broadcast mobile phone 1001 passes the publishing audio signal through the AP to the communication module, which sends the publishing audio signal to the audience mobile phone 1006. In this way, the live broadcast mobile phone 1001 sends the publishing audio signal to the audience mobile phone 1006.
The wireless communication module receives the feedback audio signal from the audience mobile phone 1006, and the AP receives the feedback audio signal from the wireless communication module. In this way, the live broadcast mobile phone 1001 receives the feedback audio signal from the audience.
Thereafter, the live broadcast mobile phone 1001 mixes the publishing audio signal with the feedback audio signal to generate the monitoring audio signal (signal B in the figure), corresponding to step S270 in FIG. 12(a).
The live broadcast mobile phone 1001 mixes, through the AP, the publishing audio signal and the feedback audio signal from the audience mobile phone 1006 to generate the monitoring audio signal.
Finally, the live broadcast mobile phone 1001 sends the monitoring audio signal to the Bluetooth headset 1005b, corresponding to step S280 in FIG. 12(a).
The live broadcast mobile phone 1001 sends the monitoring audio signal through the AP to the Bluetooth module, and via the Bluetooth module to the Bluetooth headset 1005b, so that the anchor can hear the monitoring audio signal through the Bluetooth headset 1005b.
Below, the workflow of the above live broadcast mobile phone 1001 in implementing the audio processing method is described with reference to FIG. 14.
The operating system of the live broadcast mobile phone 1001 includes an application layer, a framework layer, and a hardware access layer. The application (the live broadcast application) is set at the application layer, the channel control module and the mixing module are set at the framework layer, and the hardware is accessed through the hardware access layer. The hardware on the anchor side includes the accompaniment mobile phone 1003, the live broadcast mobile phone 1001, and the Bluetooth headset 1005b; the hardware on the audience side includes the audience mobile phone 1006.
When it is determined that the Bluetooth pairing is successful and the accompaniment mobile phone 1003 is wired to the live broadcast mobile phone 1001, the channel control module starts the mixing module for mixing and at the same time establishes channel a, channel b, channel c, channel d, and channel e. In this way, the sound-pickup and listening functions of the Bluetooth headset 1005b are realized.
Channel a sends the accompaniment sound signal from the accompaniment mobile phone 1003, i.e., the accompaniment sound signal from the wired audio interface, to the mixing module.
Channel b sends the anchor sound signal from the Bluetooth headset 1005b, i.e., the anchor sound signal from the wireless audio interface, to the mixing module.
Channel c sends the publishing audio signal, obtained by the mixing module by mixing the accompaniment sound signal and the anchor sound signal, from the mixing module to the live broadcast application, to be sent over the Internet to the audience mobile phone 1006.
Through channel d, the live broadcast application sends the feedback audio signal from the audience mobile phone 1006, received over the Internet, to the mixing module.
Channel e sends the monitoring audio signal, obtained by the mixing module by mixing the publishing audio signal and the feedback audio signal, to the wireless audio interface, to be sent to the Bluetooth headset 1005b via Bluetooth communication.
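The two-stage mixing carried by channels a–e can be sketched end to end for one frame; the function names and the equal-weight mix are illustrative assumptions, not the mixing module's actual algorithm.

```python
def mix(a, b):
    # Stand-in for the mixing module: equal-weight sample average.
    return [(x + y) / 2 for x, y in zip(a, b)]

def live_frame(accompaniment, anchor, feedback):
    # Channels a and b feed the mixer; channel c carries the first mix
    # (the publishing signal) to the live broadcast application.
    publishing = mix(accompaniment, anchor)
    # Channel d brings the audience feedback back to the mixer; channel
    # e carries the second mix (the monitoring signal) to the wireless
    # audio interface for the Bluetooth headset.
    monitoring = mix(publishing, feedback)
    return publishing, monitoring

pub, mon = live_frame([0.4, 0.0], [0.2, 0.6], [0.0, 0.2])
# pub is approximately [0.3, 0.3]; mon is approximately [0.15, 0.25]
```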
上述的实施例中,直播手机1001作为电子设备的示例。伴奏手机1003为伴奏音提供设备的示例。话筒1004为声音采集装置的示例。话筒1004和伴奏手机1003均与声卡1002有线连接。发布用音频信号为第一音频信号的示例,反馈音频信号为第二音频信号的示例,监听用音频信号为第三音频信号的示例。
在附图中,可以以特定布置和/或顺序示出一些结构或方法特征。然而,应该理解,可能不需要这样的特定布置和/或排序。而是,在一些实施例中,这些特征可以以不同于说明书附图中所示的方式和/或顺序来布置。另外,在特定图中包括结构或方法特征并不意味着暗示在所有实施例中都需要这样的特征,并且在一些实施例中,可以不包括这些特征或者可以与其他特征组合。
需要说明的是,本申请各设备实施例中提到的各单元/模块都是逻辑单元/模块,在物理上,一个逻辑单元/模块可以是一个物理单元/模块,也可以是一个物理单元/模块的一部分,还可以以多个物理单元/模块的组合实现,这些逻辑单元/模块本身的物理实现方式并不是最重要的,这些逻辑单元/模块所实现的功能的组合才是解决本申请所提出的技术问题的关键。此外,为了突出本申请的创新部分,本申请上述各设备实施例并没有将与解决本申请所提出的技术问题关系不太密切的单元/模块引入,这并不表明上述设备实施例并不存在其它的单元/模块。
需要说明的是,在本专利的示例和说明书中,诸如第一和第二等之类的关系术语仅仅用来将 一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。
Although this application has been illustrated and described with reference to certain preferred embodiments thereof, those of ordinary skill in the art will understand that the foregoing describes only specific implementations of this application, and the protection scope of this application is not limited thereto. Any person skilled in the art may, within the technical scope disclosed in this application and without departing from its spirit and scope, make various changes, variations, or substitutions in form and detail, all of which shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
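The claims name three families of mixing algorithms: a linear method, a fixed-weight method, and a dynamic-weight method. The patent does not fix concrete formulas for them, so the sketch below shows one common variant of each; the weight choices, the int16 range, and the peak-based scaling are assumptions for illustration.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def clip(v):
    """Clamp a sample to the signed 16-bit PCM range."""
    return max(INT16_MIN, min(INT16_MAX, int(v)))

def mix_linear(a, b):
    """Linear method: add samples directly, clipping any overflow."""
    return [clip(x + y) for x, y in zip(a, b)]

def mix_fixed_weight(a, b, wa=0.5, wb=0.5):
    """Fixed-weight method: weights are constants chosen in advance."""
    return [clip(wa * x + wb * y) for x, y in zip(a, b)]

def mix_dynamic_weight(a, b):
    """Dynamic-weight method: one common variant scales each frame's sum
    by its peak so the mixed frame never clips."""
    sums = [x + y for x, y in zip(a, b)]
    peak = max((abs(s) for s in sums), default=0)
    scale = 1.0 if peak <= INT16_MAX else INT16_MAX / peak
    return [clip(s * scale) for s in sums]
```

The linear method is the cheapest but can clip loud passages; fixed weights trade headroom for a quieter mix; dynamic weighting adapts per frame at a small extra cost.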

Claims (25)

  1. An audio processing method for an audio processing system, the audio processing system comprising an electronic device, an accompaniment providing device, and a wireless headset, the method comprising:
    the electronic device starting a live-streaming application, the live-streaming application publishing audio and receiving feedback audio associated with the audio;
    the electronic device receiving an audio signal including an accompaniment signal sent by the accompaniment providing device, and taking the received audio signal, either as received or after processing, as a first audio signal;
    the electronic device publishing the first audio signal through the live-streaming application, and receiving feedback audio via the live-streaming application over the Internet, the feedback audio being a second audio signal;
    the electronic device mixing the first audio signal and the second audio signal to obtain a third audio signal; and
    the electronic device sending the third audio signal via wireless communication to the wireless headset associated with the electronic device for monitoring.
  2. The audio processing method according to claim 1, wherein the audio processing system further comprises an audio processor and a sound collection apparatus, the accompaniment providing device and the sound collection apparatus are each connected to the audio processor, and the audio processor is connected to the electronic device, and wherein the electronic device receiving the audio signal including the accompaniment signal sent by the accompaniment providing device and taking the received audio signal, either as received or after processing, as the first audio signal comprises:
    the audio processor receiving, via wired communication, the accompaniment audio signal provided by the accompaniment providing device and the anchor voice signal collected by the sound collection apparatus, and mixing them to obtain the first audio signal; and
    the electronic device receiving, via wired communication, the first audio signal provided by the audio processor.
  3. The audio processing method according to claim 2, wherein the first audio signal obtained by the audio processor is a digital audio signal.
  4. The audio processing method according to claim 2, wherein the first audio signal obtained by the audio processor is an analog signal, and the electronic device converts the analog first audio signal into a digital first audio signal and publishes the digital first audio signal through the live-streaming application.
  5. The audio processing method according to claim 3 or 4, wherein the electronic device comprises a mixing module and a path control module,
    when the electronic device starts the live-streaming application, the path control module enables a wireless audio interface and a wired audio interface simultaneously, and the path control module starts the mixing module for mixing while establishing a first path, a second path, a third path, and a fourth path,
    wherein the first path sends the first audio signal from the wired audio interface to the live-streaming application for publishing through the application,
    the second path sends the first audio signal from the wired audio interface to the mixing module,
    through the third path, the live-streaming application sends the second audio signal received over the wireless network to the mixing module, and
    the fourth path sends the third audio signal, obtained by the mixing module by mixing the first audio signal with the second audio signal, to the wireless audio interface for transmission via wireless communication so that the wireless headset can monitor it.
  6. The audio processing method according to claim 1, wherein the electronic device receiving the audio signal including the accompaniment signal sent by the accompaniment providing device and taking the received audio signal, either as received or after processing, as the first audio signal comprises:
    the electronic device receiving, via wired communication, the accompaniment signal sent by the accompaniment providing device;
    the wireless headset collecting an anchor voice signal and sending the anchor voice signal to the electronic device via wireless communication; and
    the electronic device performing mixing based on the accompaniment signal and the anchor voice signal to obtain the first audio signal.
  7. The audio processing method according to claim 6, wherein the accompaniment signal is a digital audio signal.
  8. The audio processing method according to claim 6, wherein the accompaniment signal is an analog signal, the electronic device converts the analog accompaniment signal into a digital accompaniment signal, and the electronic device performs mixing based on the digital accompaniment signal and the anchor voice signal.
  9. The audio processing method according to claim 7 or 8, wherein the electronic device comprises a mixing module and a path control module,
    when the electronic device starts the live-streaming application, the path control module causes the electronic device to enable a wireless audio interface and a wired audio interface simultaneously, and the path control module starts the mixing module for mixing while establishing path a, path b, path c, path d, and path e,
    wherein path a sends the accompaniment signal from the wired audio interface to the mixing module,
    path b sends the anchor voice signal from the wireless audio interface to the mixing module,
    path c sends the first audio signal, obtained by the mixing module by mixing the accompaniment signal with the anchor voice signal, from the mixing module to the live-streaming application for publishing over the Internet,
    through path d, the live-streaming application sends the second audio signal received over the Internet to the mixing module, and
    path e sends the third audio signal, obtained by the mixing module by mixing the first audio signal with the second audio signal, to the wireless audio interface for transmission over the wireless network to the wireless headset associated with the electronic device.
  10. The audio processing method according to any one of claims 1 to 9, wherein the mixing is performed using any one of a linear method, a fixed-weight method, and a dynamic-weight method.
  11. An audio processing method for an electronic device, comprising:
    starting a live-streaming application, the live-streaming application publishing audio and receiving feedback audio associated with the published audio;
    receiving an audio signal including an accompaniment signal, and taking the received audio signal, either as received or after processing, as a first audio signal to be published by the live-streaming application;
    publishing the first audio signal through the live-streaming application, and receiving feedback audio via the live-streaming application over the Internet, the feedback audio being a second audio signal;
    mixing the first audio signal with the second audio signal to obtain a third audio signal; and
    transmitting the third audio signal to a wireless headset via wireless communication.
  12. The audio processing method according to claim 11, wherein receiving the audio signal including the accompaniment signal and taking the received audio signal, either as received or after processing, as the first audio signal comprises:
    receiving the first audio signal via wired communication, the first audio signal being a mix of an accompaniment signal and an anchor voice signal.
  13. The audio processing method according to claim 12, wherein the first audio signal is a digital audio signal.
  14. The audio processing method according to claim 12, wherein the first audio signal is an analog audio signal, and the analog audio signal is converted by the electronic device into a digital audio signal used to generate the published audio.
  15. The audio processing method according to claim 12, wherein the electronic device comprises a mixing module and a path control module,
    when the electronic device starts the live-streaming application, the path control module causes the electronic device to enable a wireless audio interface and a wired audio interface simultaneously, and the path control module starts the mixing module for mixing while establishing a first path, a second path, a third path, and a fourth path,
    wherein the first path sends the first audio signal from the wired audio interface to the live-streaming application for publishing through the application,
    the second path sends the first audio signal from the wired audio interface to the mixing module,
    through the third path, the live-streaming application sends the second audio signal received over the wireless network to the mixing module, and
    the fourth path sends the third audio signal, obtained by the mixing module by mixing the first audio signal with the second audio signal, to the wireless audio interface for transmission via wireless communication so that the wireless headset can monitor it.
  16. The audio processing method according to claim 11, wherein receiving the audio signal including the accompaniment signal and taking the received audio signal, either as received or after processing, as the first audio signal comprises:
    receiving the accompaniment signal via wired communication;
    receiving an anchor voice signal via wireless communication; and
    performing mixing based on the accompaniment signal and the anchor voice signal to obtain the first audio signal.
  17. The method according to claim 16, wherein the accompaniment signal is a digital audio signal.
  18. The audio processing method according to claim 16, wherein the accompaniment signal is an analog audio signal, the analog audio signal is converted by the electronic device into a digital audio signal, and the electronic device performs mixing based on the digital accompaniment signal and the anchor voice signal.
  19. The audio processing method according to claim 16, wherein the electronic device comprises a mixing module and a path control module,
    when the electronic device starts the live-streaming application, the path control module causes the electronic device to enable a wireless audio interface and a wired audio interface simultaneously, and the path control module starts the mixing module for mixing while establishing path a, path b, path c, path d, and path e,
    wherein path a sends the accompaniment signal from the wired audio interface to the mixing module,
    path b sends the anchor voice signal from the wireless audio interface to the mixing module,
    path c sends the first audio signal, obtained by the mixing module by mixing the accompaniment signal with the anchor voice signal, from the mixing module to the live-streaming application for publishing over the Internet,
    through path d, the live-streaming application sends the second audio signal received over the Internet to the mixing module, and
    path e sends the third audio signal, obtained by the mixing module by mixing the first audio signal with the second audio signal, to the wireless audio interface for transmission over the wireless network to the wireless headset associated with the electronic device.
  20. The audio processing method according to any one of claims 11 to 19, wherein the mixing is performed using any one of a linear method, a fixed-weight method, and a dynamic-weight method.
  21. A computer-readable storage medium storing computer-readable code which, when run by one or more processors, causes the processors to perform the audio processing method according to any one of claims 11 to 20.
  22. An electronic device for publishing audio through a live-streaming application and receiving feedback audio associated with the published audio, comprising:
    a wireless audio interface and a wired audio interface;
    an audio signal collection module configured to receive, through the wireless audio interface and the wired audio interface, an audio signal including an accompaniment signal; and
    a path control module and a mixing module,
    the path control module being configured to, when the electronic device starts the live-streaming application, enable the wireless audio interface and the wired audio interface and send the audio signal collected by the audio signal collection module to the mixing module,
    the mixing module being configured to obtain, based on the audio signal received by the audio signal collection module, a first audio signal for publishing through the live-streaming application, and further configured to mix the first audio signal with a second audio signal received by the live-streaming application as feedback to generate a third audio signal,
    the path control module being further configured to send the third audio signal to the wireless audio interface for transmission via wireless communication, so that a wireless headset associated with the electronic device can monitor it.
  23. The electronic device according to claim 22, wherein the electronic device comprises the mixing module and the path control module,
    when the electronic device starts the live-streaming application, the path control module causes the electronic device to enable the wireless audio interface and the wired audio interface simultaneously,
    the path control module starts the mixing module for mixing and establishes a first path, a second path, a third path, and a fourth path,
    wherein the first path sends the first audio signal from the wired audio interface to the live-streaming application for publishing through the live-streaming application,
    the second path sends the first audio signal from the wired audio interface to the mixing module,
    through the third path, the live-streaming application sends the second audio signal received over the Internet to the mixing module,
    the mixing module mixes the first audio signal with the second audio signal to obtain the third audio signal, and
    through the fourth path, the mixing module sends the third audio signal to the wireless audio interface.
  24. The electronic device according to claim 22, wherein the electronic device comprises a mixing module and a path control module,
    when the electronic device starts the live-streaming application, the path control module causes the electronic device to enable the wireless audio interface and the wired audio interface simultaneously,
    the path control module starts the mixing module for mixing while establishing path a, path b, path c, path d, and path e,
    wherein path a sends the accompaniment signal from the wired audio interface to the mixing module,
    path b sends the anchor voice signal from the wireless audio interface to the mixing module,
    path c sends the first audio signal, obtained by the mixing module by mixing the accompaniment signal with the anchor voice signal, from the mixing module to the live-streaming application for publishing over the Internet,
    through path d, the live-streaming application sends the second audio signal received over the wireless network to the mixing module, and
    path e sends the third audio signal, obtained by the mixing module by mixing the first audio signal with the second audio signal, to the wireless audio interface for transmission over the wireless network to the wireless headset.
  25. The electronic device according to any one of claims 22 to 24, wherein the mixing module performs mixing using any one of a linear method, a fixed-weight method, and a dynamic-weight method.
PCT/CN2021/118398 2020-09-23 2021-09-15 Audio processing method, computer-readable storage medium, and electronic device WO2022062979A1

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP21871350.1A EP4210344A4 (en) 2020-09-23 2021-09-15 AUDIO PROCESSING METHOD, COMPUTER READABLE STORAGE MEDIUM AND ELECTRONIC DEVICE

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011008015.4A 2020-09-23 2020-09-23 Audio processing method, computer-readable storage medium, and electronic device
CN202011008015.4 2020-09-23

Publications (1)

Publication Number Publication Date
WO2022062979A1 2022-03-31

Family

ID=80789770

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/118398 Audio processing method, computer-readable storage medium, and electronic device 2020-09-23 2021-09-15

Country Status (3)

Country Link
EP (1) EP4210344A4 (zh)
CN (2) CN116437256A (zh)
WO (1) WO2022062979A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116095564A * 2023-04-10 2023-05-09 深圳市嘉润原新显科技有限公司 Display sound-mixing circuit and display
WO2023217003A1 * 2022-05-07 2023-11-16 北京字跳网络技术有限公司 Audio processing method and apparatus, device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120317240A1 (en) * 2011-06-10 2012-12-13 Shazam Entertainment Ltd. Methods and Systems for Identifying Content in a Data Stream
CN106375846A * 2016-09-19 2017-02-01 北京小米移动软件有限公司 Live-streaming audio processing method and apparatus
CN108769851A * 2018-05-28 2018-11-06 广州三星通信技术研究有限公司 Digital earphone
CN110856062A * 2019-11-28 2020-02-28 广东辉杰智能科技股份有限公司 Musical instrument sound pickup system for webcast
CN111326132A * 2020-01-22 2020-06-23 北京达佳互联信息技术有限公司 Audio processing method and apparatus, storage medium, and electronic device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106210763B * 2016-07-20 2019-07-09 平安健康互联网股份有限公司 Interaction system between IM client and anchor client, and method thereof
CN110692252B * 2017-04-03 2022-11-01 思妙公司 Audiovisual collaboration method with latency management for wide-area broadcast
CN107124661B * 2017-04-07 2020-05-19 广州市百果园网络科技有限公司 Communication method, apparatus, and system in a live-streaming channel
CN206728217U * 2017-05-05 2017-12-08 江西创成微电子有限公司 Cross-platform voice co-hosting system for live streaming
CN107396137B * 2017-07-14 2020-06-30 腾讯音乐娱乐(深圳)有限公司 Online interaction method, apparatus, and system
CN109788139A * 2019-03-05 2019-05-21 北京会播科技有限公司 Mobile phone with live-streaming function
CN210053547U * 2019-05-27 2020-02-11 深圳市君睿诚电子有限公司 Integrated wireless live-streaming speaker

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4210344A4


Also Published As

Publication number Publication date
CN116437256A (zh) 2023-07-14
EP4210344A1 (en) 2023-07-12
CN114257905B (zh) 2023-04-07
CN114257905A (zh) 2022-03-29
EP4210344A4 (en) 2024-03-20


Legal Events

Date Code Title Description
121  Ep: the epo has been informed by wipo that ep was designated in this application
     Ref document number: 21871350; Country of ref document: EP; Kind code of ref document: A1
ENP  Entry into the national phase
     Ref document number: 2021871350; Country of ref document: EP; Effective date: 20230406
NENP Non-entry into the national phase
     Ref country code: DE