CN113611272A - Multi-mobile-terminal-based loudspeaking method, device and storage medium - Google Patents

Multi-mobile-terminal-based loudspeaking method, device and storage medium

Info

Publication number
CN113611272A
CN113611272A
Authority
CN
China
Prior art keywords
audio
information
mobile terminal
mobile
sound
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110771032.1A
Other languages
Chinese (zh)
Other versions
CN113611272B (en)
Inventor
魏耀都
王国腾
陈华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaochang Technology Co ltd
Original Assignee
Beijing Xiaochang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaochang Technology Co., Ltd.
Priority to CN202110771032.1A
Publication of CN113611272A
Application granted
Publication of CN113611272B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00: Details of electrophonic musical instruments
    • G10H1/36: Accompaniment arrangements
    • G10H1/361: Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K: SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00: Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/16: Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/175: Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
    • G10K11/178: Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
    • G10K11/1781: Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase characterised by the analysis of input or output signals, e.g. frequency range, modes, transfer functions
    • G10K11/17813: Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase characterised by the analysis of input or output signals, e.g. frequency range, modes, transfer functions characterised by the analysis of the acoustic paths, e.g. estimating, calibrating or testing of transfer functions or cross-terms
    • G10K11/17819: Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase characterised by the analysis of input or output signals, e.g. frequency range, modes, transfer functions characterised by the analysis of the acoustic paths, e.g. estimating, calibrating or testing of transfer functions or cross-terms between the output signals and the reference signals, e.g. to prevent howling
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K: SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00: Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/16: Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/175: Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
    • G10K11/178: Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
    • G10K11/1785: Methods, e.g. algorithms; Devices
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00: Circuits for transducers, loudspeakers or microphones
    • H04R3/02: Circuits for transducers, loudspeakers or microphones for preventing acoustic reaction, i.e. acoustic oscillatory feedback
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W56/00: Synchronisation arrangements
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Telephone Function (AREA)

Abstract

The invention provides a loudspeaking method, device and storage medium based on multiple mobile terminals. The method comprises the following steps: configuring a plurality of mobile terminals so that each reaches a loudspeaker state based on received configuration signals; the microphone of each mobile terminal captures audio and generates audio information, the audio comprising at least a plurality of human voices, ambient sound and crosstalk; each mobile terminal processes the audio information to remove the ambient sound, crosstalk and other people's voices, retaining only the target voice information corresponding to the current terminal; the target voice information of each mobile terminal is synchronized to the other mobile terminals; each mobile terminal mixes its retained target voice information, the target voice information received from the other terminals and the accompaniment information to generate outgoing audio; and the speaker of each mobile terminal plays aloud based on the outgoing audio. The invention enables different people to sing and amplify their voices simultaneously through multiple mobile terminals, improving both the volume and the sound quality of the amplified singing.

Description

Multi-mobile-terminal-based loudspeaking method, device and storage medium
Technical Field
The invention relates to the technical field of audio data processing, and in particular to a loudspeaking method, device and storage medium based on multiple mobile terminals.
Background
With the rapid development of technology and music culture, the ways of singing karaoke have become increasingly diverse. Karaoke based on a mobile terminal is convenient and easy to operate, and is therefore widely favored. Existing schemes for mobile-terminal karaoke include at least the following:
In scheme 1, as shown in fig. 1A, an external microphone (i.e., one not belonging to the mobile terminal) converts the singing voice in the space into a digital signal, called digitized singing voice. The digitized singing voice is transmitted over a digital connection channel between the microphone and the terminal. The mobile terminal beautifies the digitized singing voice and mixes it with the digitized accompaniment music into digitized audio. The mixed digitized audio is transmitted over a digital connection channel between the mobile terminal and an external earphone. The external earphone converts the digitized audio into sound waves, which propagate in the closed space between the earphone and the ear. That is, both an external earphone and an external microphone are required.
Scheme 2, as shown in fig. 1B, is similar to scheme 1, except that a connection channel exists between the external microphone and the external earphone (for example, both are integrated on the same device), so the digitized singing voice enters the earphone directly through that channel; the mobile terminal transmits only the digitized accompaniment to the earphone, and the mixing of the digitized singing voice and accompaniment is performed in the earphone. That is, an integrated external earphone and microphone is required.
In scheme 3, as shown in fig. 1C, similar to scheme 1, the mobile terminal plays the mixed digitized audio through an internal speaker, so no external earphone is used, but an external microphone is still needed.
In scheme 4, as shown in fig. 1D, similar to scheme 1, the microphone built into the mobile terminal is used instead of an external microphone, but an external earphone is still required.
In scheme 5, as shown in fig. 1E, an external speaker-microphone (a portable device with both a speaker and a microphone) digitizes the singing voice; the mobile terminal transmits the digitized accompaniment to the speaker-microphone, which mixes the digitized singing voice with the digitized accompaniment and plays the mixed audio into the space through its built-in speaker. That is, an external speaker-microphone is required.
In scheme 6, as shown in fig. 1F, the mobile terminal is held at normal screen-viewing distance; its built-in microphone converts the singing voice in the space into digitized singing voice, which the terminal beautifies; the terminal plays the accompaniment audio through its built-in speaker but does not play the singing voice. That is, no external device is needed, but the singing voice cannot be played.
Scheme 7, as shown in fig. 1G, includes two sub-schemes. In sub-scheme 1, the earpiece of the mobile terminal is held against the ear, as when answering a phone call; in this holding mode the terminal applies the processing common in communication systems, such as noise reduction, echo cancellation and automatic gain control, so that the sound quality reaches an intelligible level, giving an experience similar to an ordinary phone call. In sub-scheme 2, the terminal is held at normal screen-viewing distance and the same communication-system processing is applied. That is, the audio is processed by the communication system without any external device.
In scheme 8, as shown in fig. 1H, multiple external microphones convert the singing voices in the space into digitized singing voice, which is transmitted over digital connection channels to the terminal. The mobile terminal beautifies the digitized singing voices, mixes them with the digitized accompaniment music into digitized audio, and plays the mixed audio through an internal speaker (or an external playback device). This mode requires external speaker-microphone equipment for karaoke, and it is inconvenient for several people to read the lyrics on the same terminal.
To perform karaoke with the mobile terminal itself amplifying the sound, at least the following two requirements must be satisfied:
1) the played volume must be able to cover the sound from the singer's mouth and reach both ears at the same time;
2) the voice feedback (hearing one's own singing while singing) must be synchronized in real time, and sound effects such as reverberation must be added to the audio.
Scheme 9, as shown in fig. 1I, consists of 1 mobile terminal and a plurality of external speaker-microphones, with each user using 1 speaker-microphone. The speaker-microphones have a primary/secondary relationship: the primary speaker-microphone is connected to the secondary ones through wired or wireless digital connection channels, and is also connected to the mobile terminal. In use, the mobile terminal provides the digitized audio of the accompaniment music to the primary speaker-microphone over the digital channel; the primary unit synchronously forwards it to all secondary units, and all speaker-microphones play it together.
The singing voice of user A is picked up by speaker-microphone A and played only by A's speaker; the singing voice of user B is picked up by speaker-microphone B and played only by B's speaker.
The disadvantages of this scheme include:
1) external speaker-microphone devices are still required for multi-person karaoke.
The existing karaoke schemes 1 to 9 therefore have at least the following problems:
1. external equipment (devices) is needed;
2. the audio collected for karaoke is processed by the communication system, so the processing effect is poor and the karaoke effect suffers;
3. one mobile terminal cannot simultaneously play the singing voices collected by several mobile terminals.
Disclosure of Invention
The embodiments of the invention provide a loudspeaking method, device and storage medium based on multiple mobile terminals, which enable different people to sing karaoke aloud simultaneously through multiple mobile terminals without any external equipment, effectively process the karaoke audio of each terminal holder, improve the volume and sound quality of the amplified singing, and allow one mobile terminal to simultaneously play the singing collected by the other mobile terminals.
In a first aspect of the embodiments of the present invention, a loudspeaking method based on multiple mobile terminals is provided, including:
configuring a plurality of mobile terminals so that each reaches a loudspeaker state based on the received configuration signals;
a microphone of each mobile terminal captures audio to generate audio information, wherein the audio comprises at least a plurality of human voices, ambient sound and crosstalk;
each mobile terminal processes the audio information so that no ambient sound, crosstalk or other people's voices remain in the audio information, retaining only the target voice information corresponding to the current terminal;
synchronizing the target voice information of each mobile terminal to the other mobile terminals;
each mobile terminal mixes the retained target voice information, the target voice information received from the other terminals and the accompaniment information to generate outgoing audio;
the speaker of the mobile terminal plays aloud based on the outgoing audio.
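As a rough sketch of the claimed pipeline for a single audio frame (all function names and the simple sample-wise subtraction model for removing interference are illustrative assumptions; the claims do not fix an implementation):

```python
# Hypothetical sketch of the per-terminal pipeline: capture -> isolate the
# target voice -> receive the other terminals' voices -> mix -> play out.

def process_audio(captured, interference):
    """Remove ambient sound, crosstalk and other voices, keeping only the
    target voice (modeled here as simple sample-wise subtraction)."""
    return [c - i for c, i in zip(captured, interference)]

def mix_tracks(local_voice, remote_voices, accompaniment):
    """Mix the retained target voice, the voices received from the other
    terminals, and the accompaniment into the outgoing audio."""
    mixed = []
    for n in range(len(local_voice)):
        sample = local_voice[n] + accompaniment[n]
        for voice in remote_voices:
            sample += voice[n]
        mixed.append(sample)
    return mixed

# One frame on terminal A (values are arbitrary illustration data):
captured = [0.5, 0.7, 0.2]         # microphone frame (voice + interference)
interference = [0.1, 0.2, 0.0]     # estimated ambient + crosstalk component
local_voice = process_audio(captured, interference)
remote_voices = [[0.1, 0.1, 0.1]]  # target voice synced from terminal B
accompaniment = [0.2, 0.2, 0.2]
outgoing = mix_tracks(local_voice, remote_voices, accompaniment)
```

In a real implementation each step would run on streaming frames and the interference estimate would come from the speech-enhancement stage described later.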
Optionally, in a possible implementation manner of the first aspect, the method further includes:
configuring the same accompaniment information for each mobile terminal in advance;
and controlling each mobile terminal to play the accompaniment information synchronously.
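Synchronous accompaniment playback across terminals is commonly achieved by agreeing on a start time on a shared reference clock; the following minimal sketch assumes such a clock and an estimated per-terminal clock offset, and is not taken from the patent text:

```python
def playback_position(shared_start, clock_offset, local_time):
    """Given a start time agreed on a reference clock, this terminal's
    estimated offset from that clock, and the current local time, return
    how many seconds into the accompaniment this terminal should be."""
    reference_now = local_time - clock_offset  # map local time to reference time
    return max(0.0, reference_now - shared_start)

# Terminal whose clock runs 0.5 s ahead of the reference clock:
pos = playback_position(shared_start=100.0, clock_offset=0.5, local_time=105.5)
```

Every terminal that computes its position this way plays the same accompaniment sample at the same wall-clock moment, up to the error of the offset estimate.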
Optionally, in a possible implementation manner of the first aspect, synchronizing the target voice information of each mobile terminal to other mobile terminals includes:
presetting a maximum delay time;
acquiring the transmission delay with which the current terminal receives the target voice information sent by another mobile terminal, and comparing the transmission delay with the maximum delay time;
and if the transmission delay is greater than the maximum delay time, playing the target voice information sent by that mobile terminal after the maximum delay time.
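The delay comparison above can be sketched as a clamp on the playout delay of each received frame (the 30 ms tolerance is an illustrative value; the patent derives the maximum delay from the ear's sound-discrimination interval):

```python
def playout_delay(transmission_delay_ms, max_delay_ms=30.0):
    """Decide the playout delay of a frame of remote target-voice audio:
    frames within the tolerance are played on arrival, later frames are
    pinned to the maximum-delay deadline so the terminals stay aligned."""
    return min(transmission_delay_ms, max_delay_ms)

on_time = playout_delay(12.0)  # arrives quickly: play when it arrives
late = playout_delay(80.0)     # arrives late: play at the deadline
```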
Optionally, in a possible implementation manner of the first aspect, each mobile terminal mixing the retained target voice information, the target voice information received from the other terminals, and the accompaniment information to generate the outgoing audio includes:
performing voice beautification and master-tape mixing based on the acoustic feedback loop characteristics preconfigured on the mobile terminal, the target voice information of the current terminal, the target voice information of the other terminals and the accompaniment information to obtain the outgoing audio.
Optionally, in a possible implementation manner of the first aspect, the process of configuring the acoustic feedback loop characteristics for the mobile terminal includes:
sending a preset audio signal into the space where the mobile terminal is located through the speaker;
receiving the feedback audio signal of the space through the microphone;
and comparing the preset audio signal with the feedback audio signal to obtain the acoustic feedback loop characteristics of the current space.
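A minimal sketch of the comparison step, under the simplifying assumption that the speaker-room-microphone loop is a pure delay-and-attenuate path (a real loop needs a full transfer-function estimate; the probe/feedback values below are illustration data):

```python
def estimate_loop_characteristics(probe, feedback):
    """Compare the known probe signal with the microphone's feedback
    recording to estimate the delay (in samples) and gain of the
    speaker -> room -> microphone loop, by finding the alignment with
    maximum correlation and dividing by the probe energy."""
    best_delay, best_score = 0, float("-inf")
    for d in range(len(feedback) - len(probe) + 1):
        score = sum(p * feedback[d + i] for i, p in enumerate(probe))
        if score > best_score:
            best_delay, best_score = d, score
    energy = sum(p * p for p in probe)
    gain = best_score / energy if energy else 0.0
    return best_delay, gain

probe = [1.0, -1.0, 0.5]
# Feedback: the probe delayed by 2 samples and attenuated by 0.5.
feedback = [0.0, 0.0, 0.5, -0.5, 0.25, 0.0]
delay, gain = estimate_loop_characteristics(probe, feedback)
```

The estimated delay and gain would then parameterize the howling-suppression and mixing stages.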
Optionally, in a possible implementation manner of the first aspect, the speaker of the mobile terminal playing aloud based on the outgoing audio comprises:
performing digital amplification processing on the outgoing audio, and playing the digitally amplified outgoing audio through the speaker.
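A minimal sketch of digital amplification, assuming hard limiting to keep the louder signal within the speaker's sample range (the patent does not specify the limiter):

```python
def amplify(samples, gain, limit=1.0):
    """Apply digital gain to the outgoing audio, hard-limiting each
    sample so the louder signal still fits in [-limit, limit]."""
    return [max(-limit, min(limit, s * gain)) for s in samples]

louder = amplify([0.2, -0.6, 0.9], gain=2.0)
```

A production implementation would use a soft limiter or compressor to avoid the distortion that hard clipping introduces.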
Optionally, in a possible implementation manner of the first aspect, each mobile terminal processing the audio information so that no ambient sound, crosstalk or other people's voices remain, retaining only the target voice information corresponding to the current terminal, includes:
obtaining the transfer function and the amplitude adjustment coefficient between any two terminals;
predicting, as a reference signal, how the audio played by the other terminals is recorded at the current terminal, based on the transfer function and the amplitude adjustment coefficient;
and eliminating the reference signal through the speech enhancement module so that only the target voice is retained.
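A minimal sketch of the reference-signal prediction and cancellation, modeling the inter-terminal transfer function as a short FIR filter (an assumption; the patent does not specify the filter form, and real speech enhancement would use adaptive filtering rather than direct subtraction):

```python
def predict_reference(remote_audio, transfer, amplitude):
    """Predict how audio played by another terminal arrives at the current
    terminal's microphone: convolve it with the inter-terminal transfer
    function and scale by the amplitude adjustment coefficient."""
    out = [0.0] * len(remote_audio)
    for i in range(len(remote_audio)):
        for k, h in enumerate(transfer):
            if i - k >= 0:
                out[i] += h * remote_audio[i - k]
    return [amplitude * s for s in out]

def cancel_reference(mic_frame, reference):
    """Subtract the predicted reference signal, leaving (ideally) only
    the target voice of the current terminal's user."""
    return [m - r for m, r in zip(mic_frame, reference)]

remote = [1.0, 0.0, 0.0]           # frame played by the other terminal
transfer = [0.0, 0.5]              # assumed 1-sample delay, 0.5 attenuation
reference = predict_reference(remote, transfer, amplitude=0.8)
cleaned = cancel_reference([0.2, 0.5, 0.1], reference)
```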
Optionally, in a possible implementation manner of the first aspect, the method further includes:
and storing the outgoing audio and/or the target voice information.
In a second aspect of the embodiments of the present invention, there is provided a speaker device based on multiple mobile terminals, including:
the configuration module is used for configuring a plurality of mobile terminals so that each reaches a loudspeaker state based on the received configuration signals;
the acquisition module is used for making the microphone of each mobile terminal capture audio to generate audio information, wherein the audio comprises at least a plurality of human voices, ambient sound and crosstalk;
the processing module is used for making each mobile terminal process the audio information so that no ambient sound, crosstalk or other people's voices remain, retaining only the target voice information corresponding to the current terminal;
the synchronization module is used for synchronizing the target voice information of each mobile terminal to the other mobile terminals;
the mixing module is used for making each mobile terminal mix the retained target voice information, the target voice information received from the other terminals and the accompaniment information to generate outgoing audio;
and the play-out module is used for making the speaker of the mobile terminal play aloud based on the outgoing audio.
In a third aspect of the embodiments of the present invention, a readable storage medium is provided, in which a computer program is stored, which, when being executed by a processor, is adapted to carry out the method according to the first aspect of the present invention and various possible designs of the first aspect of the present invention.
The multi-mobile-terminal-based loudspeaking method, device and storage medium allow several people to sing and amplify their voices synchronously through multiple mobile terminals, without changing the terminal hardware and without any external equipment. The audio information collected by the microphone is processed to reduce the interference of ambient sound, crosstalk and the voices of the other terminal holders, improving the fidelity of the target voice; and the voices from the different terminals are mixed with the accompaniment and digitally processed, so that the mixed sound played by the speaker is louder.
Drawings
FIG. 1A is a schematic diagram of the transmission of voice and data between devices in prior art scheme 1;
FIG. 1B is a schematic diagram of the transmission of voice and data between devices in prior art scheme 2;
FIG. 1C is a schematic diagram of the transmission of voice and data between devices in prior art scheme 3;
FIG. 1D is a schematic diagram of the transmission of voice and data between devices in prior art scheme 4;
FIG. 1E is a schematic diagram of the transmission of voice and data between devices in prior art scheme 5;
FIG. 1F is a schematic diagram of the transmission of voice and data between devices in prior art scheme 6;
FIG. 1G is a schematic diagram of the transmission of voice and data between devices in prior art scheme 7;
FIG. 1H is a schematic diagram of the transmission of voice and data between devices in prior art scheme 8;
FIG. 1I is a schematic diagram of the transmission of voice and data between devices in prior art scheme 9;
FIG. 2 is a flow chart of a multi-mobile terminal based loudspeaking method;
FIG. 3 is a schematic diagram illustrating a propagation process of sound/audio in an embodiment of the present invention;
FIG. 4 is a schematic illustration of the process of human voice beautification and mastering;
FIG. 5 is a schematic diagram of a measurement phase of an acoustic feedback loop characteristic;
FIG. 6 is a schematic diagram of a tracking phase of an acoustic feedback loop feature;
FIG. 7 is a schematic illustration of audio information processing;
fig. 8 is a schematic structural diagram of a speaker device based on a multi-mobile terminal.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein.
It should be understood that, in various embodiments of the present invention, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the internal logic of the processes, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
It should be understood that in the present application, "comprising" and "having" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that, in the present invention, "a plurality" means two or more. "And/or" merely describes an association between objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates an "or" relationship between the associated objects. "Comprising A, B and C" and "comprising A, B, C" mean that all three of A, B and C are comprised; "comprising A, B or C" means that one of A, B and C is comprised; "comprising A, B and/or C" means that any one, any two, or all three of A, B and C are comprised.
It should be understood that, in the present invention, "B corresponding to A", "A corresponds to B" or "B corresponds to A" means that B is associated with A and that B can be determined from A. Determining B from A does not mean determining B from A alone; B may be determined from A and/or other information. A matching B means that the similarity between A and B is greater than or equal to a preset threshold.
As used herein, "if" may be interpreted as "when", "upon", "in response to determining" or "in response to detecting", depending on the context.
The technical solution of the present invention will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
The invention provides a loudspeaking method based on multiple mobile terminals, which is shown in a flow chart of fig. 2 and comprises the following steps:
step S110, configuring the plurality of mobile terminals to reach speaker states respectively based on the received configuration signals. In step S110, a plurality of mobile terminals are configured according to the environment, because the plurality of mobile terminals are located at different positions and under different scenes, different sound capturing conditions may be generated. For example, in a relatively open space, there is a possibility that an echo may occur when a mobile terminal plays a sound, and therefore, it is necessary to automatically configure the mobile terminal according to the environment of the field when the mobile terminal is used for speaker.
Step S120, a microphone of each mobile terminal captures audio to generate audio information, wherein the audio comprises at least a plurality of human voices, ambient sound and crosstalk. The main purpose of the mobile terminal is to amplify the user's voice, such as the voice produced when the user sings; in practice, however, many noises exist at the terminal's location, such as ambient sound and crosstalk from the other mobile terminals. Step S120 generates the audio information in preparation for extracting the human voice.
Step S130, each mobile terminal processes the audio information so that no ambient sound, crosstalk or other people's voices remain, retaining only the target voice information corresponding to the current terminal. When multiple mobile terminals are synchronized, several people may sing simultaneously in the same space; to ensure high fidelity of the voice at each terminal, the collected audio information is processed to remove the ambient sound, crosstalk and other people's voices. The ambient sound includes sound generated by devices in the environment and the voices of people who do not hold a mobile terminal.
Step S140, synchronizing the target voice information of each mobile terminal to the other mobile terminals. After each mobile terminal obtains its corresponding target voice information, it synchronizes this information to the other mobile terminals, so that each mobile terminal processes multiple pieces of target voice information from different terminals.
Step S150, each mobile terminal mixes the retained target voice information, the target voice information received from the other terminals and the accompaniment information to generate the outgoing audio. When several users hold different mobile terminals and sing karaoke or chorus at the same time, each mobile terminal mixes its own target voice information, the target voice information of the other mobile terminals and the accompaniment information, so that the voices of the different terminal holders are combined into a single chorus audio, i.e., the outgoing audio.
Step S160, the speaker of the mobile terminal plays aloud based on the outgoing audio.
In one possible implementation, the sound/audio propagation process is shown in fig. 3, illustrated with just 2 mobile terminals. In the space there exist at least the human singing voice (i.e., the human voice), ambient sound and crosstalk, where the crosstalk is produced by the speakers of the mobile terminals. The sound heard by the ear includes the sound from the singer's own mouth, which reaches the ear along 3 paths: intracranial conduction, spatial propagation, and the mixed sound played through the mobile terminal. When processing the voice, the mobile terminal mixes the accompaniment, the target voice information of the current terminal and the target voice information of the other terminals to obtain the sound to be played aloud, thereby realizing loudspeaking, karaoke and chorus.
In one possible embodiment, the method further comprises:
The same accompaniment information is configured for each mobile terminal in advance. Because the invention targets simultaneous karaoke and chorus across multiple mobile terminals, the accompaniment must be identical on every terminal; each terminal is therefore configured with the same accompaniment information before the method is used.
Each mobile terminal is then controlled to play the accompaniment information synchronously, so that the users of the different mobile terminals stay in time with each other while singing and chorusing.
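One way to realize synchronous accompaniment playback is to agree on a shared wall-clock start instant; this is a sketch under the assumption that the terminals' clocks are already aligned (e.g. via NTP), and the function name and API are illustrative:

```python
import time

def wait_until_start(shared_start_epoch, clock=time.time):
    """Return the remaining seconds until the agreed start instant.
    Each terminal sleeps for this duration and then starts the
    accompaniment, so playback begins simultaneously on all terminals
    (sketch; assumes the terminal clocks are synchronized)."""
    remaining = shared_start_epoch - clock()
    return max(0.0, remaining)       # never wait a negative amount
```

A terminal that receives the start instant late simply begins immediately (remaining time clamps to zero).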
In one possible embodiment, synchronizing the target vocal information of each mobile terminal to the other mobile terminals comprises:
The maximum delay time is set in advance. Human hearing has limited temporal resolution: if the interval between two sounds is short enough, the ear perceives them as a single sound. The maximum delay time used in the method can therefore be derived from the ear's discrimination interval.
The transmission delay with which the current terminal receives the target voice information sent by another mobile terminal is measured and compared with the maximum delay time. This comparison determines whether the karaoke or chorus would fall out of sync.
If the transmission delay exceeds the maximum delay time, chorus playback could drift out of sync; the target voice information sent by that mobile terminal is therefore played after the maximum delay time, keeping karaoke and chorus within the perceptual synchronization window.
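The delay comparison described above can be sketched as follows; the 30 ms figure is an assumed value in the range of the ear's discrimination interval, not a number given in the patent:

```python
MAX_DELAY_MS = 30.0  # assumed preset near the ear's discrimination interval

def schedule_remote_vocal(transmission_delay_ms, max_delay_ms=MAX_DELAY_MS):
    """Decide when to play a remote target voice stream (sketch).

    If the measured transmission delay exceeds the preset maximum,
    the stream is scheduled at the maximum-delay boundary; otherwise
    it is played on arrival."""
    if transmission_delay_ms > max_delay_ms:
        return max_delay_ms            # play after the maximum delay time
    return transmission_delay_ms       # within tolerance: play on arrival
```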
In one possible embodiment, as shown in fig. 4, generating the play-out audio by mixing the retained target voice information, the target voice information received from the other terminals, and the accompaniment information includes:
performing voice beautification and master mixing based on the acoustic feedback loop characteristics pre-configured on the mobile terminal, the target voice information of the current terminal, the target voice information of the other terminals, and the accompaniment information, to obtain the play-out audio.
The invention includes at least two modules, voice beautification and master mixing. The multi-channel audio input (the target voice information and the accompaniment information) is processed by these two modules into a single-channel output (the play-out audio), thereby beautifying the audio.
In one possible embodiment, the process of configuring the acoustic feedback loop characteristics of the mobile terminal includes:
sending a preset audio signal into the space where the mobile terminal is located through the loudspeaker;
receiving the feedback audio signal of the space through the microphone;
and comparing the preset audio signal with the feedback audio signal to obtain the acoustic feedback loop characteristics of the current space.
In one possible implementation, as shown in fig. 5, the mobile terminal is configured before loudspeaking/karaoke begins, while its loudspeaker is not yet playing singing or music. The loudspeaker first actively plays several acoustic-feedback probe signals, the microphone records them, and the echo power, frequency response, and environment reverberation impulse response are obtained by analysis. The results are saved as the acoustic feedback loop characteristics.
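A minimal sketch of this configuration step, assuming the probe and the recording are aligned single-channel float arrays (the patent does not give the estimation formulas): the frequency response is estimated bin-by-bin as the ratio of the recorded spectrum to the probe spectrum, and the impulse response as its inverse transform.

```python
import numpy as np

def estimate_feedback_loop(probe, recorded, eps=1e-12):
    """Estimate simple acoustic-feedback-loop characteristics by
    comparing the played probe signal with the microphone recording
    (illustrative; the patent names echo power, frequency response,
    and an environment reverberation impulse response)."""
    n = min(len(probe), len(recorded))
    x, y = probe[:n], recorded[:n]
    echo_power = float(np.mean(y ** 2))                # average echo power
    X, Y = np.fft.rfft(x), np.fft.rfft(y)
    freq_response = Y / (X + eps)                      # per-bin transfer estimate
    impulse_response = np.fft.irfft(freq_response, n)  # room response estimate
    return echo_power, freq_response, impulse_response
```

A real implementation would average over several probes and guard against bins where the probe has no energy; the `eps` regularizer is a placeholder for that.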
In one possible implementation, as shown in fig. 6, the environment of the mobile terminal may change during loudspeaking/singing. The invention can update the acoustic feedback loop characteristics as the environment changes, so that the terminal adapts automatically: when the environment information or position information of the mobile terminal changes (including its attitude, position, and moving speed), the echo power, frequency response, and environment reverberation impulse response are updated based on that information.
In one possible embodiment, the method further comprises:
loudspeaking by the loudspeaker of the mobile terminal based on the play-out audio comprises:
performing digital amplification on the play-out audio and playing the digitally amplified audio through the loudspeaker.
The technical scheme provided by the invention can thus amplify the play-out audio and realize a software-based volume boost.
In one possible implementation, as shown in fig. 7, the processing by which each mobile terminal removes environmental sound, crosstalk sound, and other users' voices from the audio information, retaining only the target voice information corresponding to the current terminal, includes:
obtaining the transfer function and the amplitude adjustment coefficient between any two terminals; the transfer functions and amplitude adjustment coefficients of the audio between different terminals can be obtained from the interaction between the terminals.
Based on the transfer function and the amplitude adjustment coefficient, the audio of the other terminals as it would be recorded by the current terminal is predicted and used as a reference signal.
The reference signal is then cancelled by the voice enhancement module so that only the target voice remains. Introducing the predicted reference signal effectively removes non-stationary noise, making the play-out audio more stable.
After the current terminal receives the audio of another terminal, it retrieves the transfer function between the two terminals, adjusts the signal amplitude by the amplitude adjustment coefficient, and provides the adjusted signal together with the accompaniment information to the voice enhancement module.
After voice enhancement, the target voice of the current terminal is obtained; acoustic feedback suppression is then applied, and finally the target voice, the audio information received from the other terminals, and the accompaniment information are mixed to generate the play-out audio.
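The reference-signal prediction and cancellation above can be sketched as a linear model: the remote audio is convolved with the inter-terminal transfer function, scaled by the amplitude adjustment coefficient, and subtracted from the local recording. This is a simplified stand-in for the voice enhancement module, not the patent's actual algorithm:

```python
import numpy as np

def predict_reference(remote_audio, transfer_ir, amp_coeff):
    """Predict how another terminal's audio appears in the local
    microphone: convolve with the inter-terminal transfer function
    (as an impulse response) and scale by the amplitude adjustment
    coefficient (sketch)."""
    echo = np.convolve(remote_audio, transfer_ir)[:len(remote_audio)]
    return amp_coeff * echo

def cancel_references(mic_audio, references):
    """Subtract every predicted reference (crosstalk) from the local
    recording, leaving an estimate of the local target voice
    (simplified linear cancellation)."""
    cleaned = mic_audio.astype(np.float64).copy()
    for ref in references:
        cleaned[:len(ref)] -= ref
    return cleaned
```

In practice an adaptive filter would track the transfer function over time rather than assuming it fixed.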
In use, under a multi-user karaoke scenario, each terminal can display the lyrics and singing information so that the singers can follow them. At the playback level of multi-user karaoke, no external equipment is required.
Each terminal can synchronously play the sound recorded by itself and by the other terminals, whereas in existing schemes a terminal can play only its own recording or only the recordings of other terminals.
Sound-quality enhancement processing is also performed on the target voice information.
In one possible embodiment, the method further comprises:
The play-out audio and/or the target voice information is stored. Recording may be required during loudspeaking, karaoke, and chorus; the audio and/or the target voice information can be stored accordingly for later retrieval and playback.
In another embodiment of the present invention, a loudspeaking device based on multiple mobile terminals, as shown in fig. 8, includes:
a configuration module for configuring the plurality of mobile terminals so that each reaches a loudspeaking state based on the received configuration signal;
an acquisition module for causing the microphone of each mobile terminal to capture audio and generate audio information, the audio including at least multiple human voices, environmental sound, and crosstalk sound;
a processing module for causing each mobile terminal to process the audio information so that environmental sound, crosstalk sound, and other users' voices are removed, retaining only the target voice information corresponding to the current terminal;
a synchronization module for synchronizing the target voice information of each mobile terminal to the other mobile terminals;
a mixing module for causing each mobile terminal to mix the retained target voice information, the received target voice information of the other terminals, and the accompaniment information to generate the play-out audio;
and a play-out module for causing the loudspeaker of the mobile terminal to perform loudspeaking based on the play-out audio.
The readable storage medium may be a computer storage medium or a communication medium. Communication media include any medium that facilitates transfer of a computer program from one place to another. Computer storage media may be any available media accessible by a general-purpose or special-purpose computer. For example, a readable storage medium is coupled to the processor so that the processor can read information from, and write information to, it; the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an application-specific integrated circuit (ASIC), and the ASIC may reside in user equipment; alternatively, the processor and the readable storage medium may reside as discrete components in a communication device. The readable storage medium may be a read-only memory (ROM), a random-access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
The present invention also provides a program product comprising execution instructions stored in a readable storage medium. The at least one processor of the device may read the execution instructions from the readable storage medium, and the execution of the execution instructions by the at least one processor causes the device to implement the methods provided by the various embodiments described above.
In the above embodiments of the terminal or the server, it should be understood that the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of hardware and software modules within the processor.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A loudspeaking method based on multiple mobile terminals is characterized by comprising the following steps:
configuring a plurality of mobile terminals to respectively reach a loudspeaker state based on the received configuration signals;
a microphone of each mobile terminal acquires audio to generate audio information, wherein the audio includes at least multiple human voices, environmental sound, and crosstalk sound;
each mobile terminal processes the audio information so that environmental sound, crosstalk sound, and other users' voices are removed from the audio information, and only the target voice information corresponding to the current terminal is retained;
synchronizing the target voice information of each mobile terminal to other mobile terminals;
each mobile terminal mixes the reserved target voice information, the received target voice information of other terminals and the accompaniment information to generate an external audio;
the loudspeaker of the mobile terminal performs loudspeaking based on the play-out audio.
2. The multi-mobile-terminal based loudspeaking method of claim 1, further comprising:
configuring the same accompaniment information for each mobile terminal in advance;
and controlling each mobile terminal to synchronously play the accompaniment information.
3. The multi-mobile-terminal based loudspeaking method of claim 1,
synchronizing the target voice information of each mobile terminal to other mobile terminals includes:
presetting maximum delay time;
acquiring transmission delay of a current terminal in receiving target voice information sent by other mobile terminals, and comparing the transmission delay with the maximum delay time;
and if the transmission delay is larger than the maximum delay time, playing the target voice information sent by the mobile terminal after the maximum delay time.
4. The multi-mobile-terminal based loudspeaking method of claim 1,
generating, by each mobile terminal, the play-out audio by mixing the reserved target voice information, the target voice information received from other terminals, and the accompaniment information comprises:
and carrying out voice beautification and mother tape mixing based on voice feedback loop characteristics pre-configured by the mobile terminal, target voice information of the current terminal, target voice information of other terminals and accompaniment information to obtain the play audio.
5. The multi-mobile-terminal based loudspeaking method of claim 4,
the process of carrying out the acoustic feedback loop characteristic configuration on the mobile terminal comprises the following steps:
sending a preset audio signal to a space where the mobile terminal is located through the loudspeaker;
receiving a feedback audio signal of a space through the microphone;
and comparing the preset audio signal with the feedback audio signal to obtain the characteristics of the acoustic feedback loop in the current space.
6. The multi-mobile-terminal based loudspeaking method of claim 1, further comprising:
loudspeaking by the loudspeaker of the mobile terminal based on the play-out audio comprises:
performing digital amplification on the play-out audio and playing the digitally amplified audio through the loudspeaker.
7. The multi-mobile-terminal based loudspeaking method of claim 1,
each mobile terminal processing the audio information so that no environmental sound, crosstalk sound, or other users' voices exist in the audio information, retaining only the target voice information corresponding to the current terminal, comprises:
obtaining a transfer function and an amplitude adjustment coefficient between any two terminals;
predicting, based on the transfer function and the amplitude adjustment coefficient, the audio of the other terminals as recorded by the current terminal, and using the prediction as a reference signal;
and cancelling the reference signal through the voice enhancement module so that only the target voice is retained.
8. The multi-mobile-terminal based loudspeaking method of claim 1, further comprising:
and storing the external audio and/or the target voice information.
9. A loudspeaking device based on multiple mobile terminals, comprising:
a configuration module for configuring the plurality of mobile terminals so that each reaches a loudspeaking state based on the received configuration signal;
an acquisition module for causing the microphone of each mobile terminal to capture audio and generate audio information, the audio including at least multiple human voices, environmental sound, and crosstalk sound;
a processing module for causing each mobile terminal to process the audio information so that environmental sound, crosstalk sound, and other users' voices are removed, retaining only the target voice information corresponding to the current terminal;
a synchronization module for synchronizing the target voice information of each mobile terminal to the other mobile terminals;
a mixing module for causing each mobile terminal to mix the retained target voice information, the received target voice information of the other terminals, and the accompaniment information to generate the play-out audio;
and a play-out module for causing the loudspeaker of the mobile terminal to perform loudspeaking based on the play-out audio.
10. A readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the method of any one of claims 1 to 8.
CN202110771032.1A 2021-07-08 2021-07-08 Multi-mobile-terminal-based loudspeaker method, device and storage medium Active CN113611272B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110771032.1A CN113611272B (en) 2021-07-08 2021-07-08 Multi-mobile-terminal-based loudspeaker method, device and storage medium

Publications (2)

Publication Number Publication Date
CN113611272A true CN113611272A (en) 2021-11-05
CN113611272B CN113611272B (en) 2023-09-29

Family

ID=78304140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110771032.1A Active CN113611272B (en) 2021-07-08 2021-07-08 Multi-mobile-terminal-based loudspeaker method, device and storage medium

Country Status (1)

Country Link
CN (1) CN113611272B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114495874A (en) * 2022-01-27 2022-05-13 广州艾美网络科技有限公司 Audio data synchronization system, control method thereof and chorus system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101278597A (en) * 2005-10-01 2008-10-01 三星电子株式会社 Method and apparatus to generate spatial sound
US20130025437A1 (en) * 2009-06-01 2013-01-31 Matt Serletic System and Method for Producing a More Harmonious Musical Accompaniment
CN109887523A (en) * 2019-01-21 2019-06-14 北京小唱科技有限公司 Audio data processing method and device, electronic equipment and storage medium for application of singing
CN209657794U (en) * 2018-12-20 2019-11-19 孙卫平 A kind of Karaoke control circuit, Karaoke control device and mobile terminal
CN110970045A (en) * 2019-11-15 2020-04-07 北京达佳互联信息技术有限公司 Mixing processing method, mixing processing device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113611272B (en) 2023-09-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant