CN113611272B - Multi-mobile-terminal-based loudspeaker method, device and storage medium - Google Patents

Multi-mobile-terminal-based loudspeaker method, device and storage medium

Info

Publication number
CN113611272B
Authority
CN
China
Prior art keywords
mobile terminal
audio
information
target voice
voice information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110771032.1A
Other languages
Chinese (zh)
Other versions
CN113611272A (en)
Inventor
魏耀都
王国腾
陈华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaochang Technology Co ltd
Original Assignee
Beijing Xiaochang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaochang Technology Co ltd filed Critical Beijing Xiaochang Technology Co ltd
Priority to CN202110771032.1A priority Critical patent/CN113611272B/en
Publication of CN113611272A publication Critical patent/CN113611272A/en
Application granted granted Critical
Publication of CN113611272B publication Critical patent/CN113611272B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/36 Accompaniment arrangements
    • G10H1/361 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00 Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/16 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/175 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
    • G10K11/178 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
    • G10K11/1781 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase characterised by the analysis of input or output signals, e.g. frequency range, modes, transfer functions
    • G10K11/17813 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase characterised by the analysis of input or output signals, e.g. frequency range, modes, transfer functions characterised by the analysis of the acoustic paths, e.g. estimating, calibrating or testing of transfer functions or cross-terms
    • G10K11/17819 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase characterised by the analysis of input or output signals, e.g. frequency range, modes, transfer functions characterised by the analysis of the acoustic paths, e.g. estimating, calibrating or testing of transfer functions or cross-terms between the output signals and the reference signals, e.g. to prevent howling
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00 Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/16 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/175 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
    • G10K11/178 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
    • G10K11/1785 Methods, e.g. algorithms; Devices
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/02 Circuits for transducers, loudspeakers or microphones for preventing acoustic reaction, i.e. acoustic oscillatory feedback
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W56/00 Synchronisation arrangements
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Telephone Function (AREA)

Abstract

The invention provides a loudspeaker method, device and storage medium based on multiple mobile terminals. The loudspeaker method includes: configuring the plurality of mobile terminals based on received configuration signals so that each reaches a loudspeaker state; the microphone of each mobile terminal acquiring audio to generate audio information, where the audio includes at least multiple human voices, ambient sound and crosstalk sound; each mobile terminal processing the audio information to remove ambient sound, crosstalk sound and other people's voices, keeping only the target voice information corresponding to the current terminal; synchronizing the target voice information of each mobile terminal to the other mobile terminals; each mobile terminal mixing the retained target voice information, the target voice information received from the other terminals and accompaniment information to generate play-out audio; and the speaker of the mobile terminal playing the play-out audio aloud. The invention enables different people to sing karaoke aloud simultaneously on multiple mobile terminals, and improves the volume and sound quality of the played singing.

Description

Multi-mobile-terminal-based loudspeaker method, device and storage medium
Technical Field
The present invention relates to the field of audio data processing technologies, and in particular, to a loudspeaker method and apparatus based on multiple mobile terminals, and a storage medium.
Background
With the rapid development of technology and music culture, the ways of singing karaoke (K-song) have become increasingly diverse. Karaoke on a mobile terminal is convenient and easy to operate, and is therefore the most popular. Current mobile-terminal karaoke schemes include at least the following:
scheme 1, as shown in fig. 1A, converts the singing voice in space into a digital signal by an external microphone (external microphone does not belong to the mobile terminal), which is called a digital singing voice. The digitized singing voice is transmitted through a digitized connection channel between the microphone and the terminal. The mobile terminal beautifies the digitized singing voice and mixes the beautified digitized singing voice and the digitized accompaniment music into digitized audio. The mixed digital audio is transmitted through a digital connecting channel between the mobile terminal and the external earphone. The external earphone converts the digital audio into sound waves in space, and the sound waves are transmitted in a closed space between the earphone and the human ear. I.e. an external earphone and an external microphone are required.
Scheme 2, as shown in fig. 1B, the technical scheme is similar to scheme 1, in which a connection channel exists between the external microphone and the external earphone (for example, integrated on the same device), and the digitized singing voice directly enters the external earphone through the connection channel; the mobile terminal only transmits the digital accompaniment to the external earphone; a mix of digitized singing and digitized accompaniment is performed in the headphones. I.e. an integrated external earphone and external microphone is required.
Scheme 3, as shown in fig. 1C, is similar to scheme 1 in that the mobile terminal plays the mixed digitized audio through the internal speaker without using an external earphone, but requires an external microphone.
Scheme 4, as shown in fig. 1D, uses a microphone built in the mobile terminal, does not use an external microphone for speaker, but requires an external earphone, similar to scheme 1.
Scheme 5, as shown in fig. 1E, the external speaker microphone (portable device with speaker and microphone) is used to digitally process the song, the mobile terminal transmits the digital accompaniment to the speaker microphone, the speaker microphone mixes the digital song with the digital accompaniment, and the speaker microphone spatially propagates the mixed audio through its built-in speaker. I.e. an external horn microphone is required.
Scheme 6, as shown in fig. 1F, the mobile terminal is held at a distance for normal viewing of the screen, and the microphone built in the mobile terminal converts singing voice in space into digital singing voice; the mobile terminal beautifies the digitized singing voice; the mobile terminal plays the accompaniment audio through the built-in speaker, but does not play singing. I.e. no external device is needed, but no singing voice can be played.
Scheme 7, as shown in fig. 1G, includes two sub-schemes. Sub-scheme 1: the mobile terminal is held with its earpiece against the ear, as when answering a phone call; in this holding mode the terminal relies on the noise reduction, echo cancellation, automatic gain control and other processing common in call systems to keep the sound quality intelligible, and the experience resembles an everyday phone call. Sub-scheme 2: the mobile terminal is held at a normal screen-viewing distance, and performs the same call-system processing of noise reduction, echo cancellation, automatic gain control and the like. That is, no external device is needed, but the audio is processed through the call system.
Scheme 8, as shown in fig. 1H, uses multiple external microphones to convert the singing voices on multiple spatial paths into digitized singing voices, which are transmitted through the digitized connection channels between the microphones and the terminal. The mobile terminal then beautifies the digitized singing voices, mixes them with the digitized accompaniment music into digitized audio, and plays the mixed audio through an internal speaker (or an external audio playback device). In this mode karaoke can only be carried out with external speaker-microphone devices, and it is inconvenient for several singers to watch the lyrics on the same terminal.
To sing karaoke aloud through the mobile terminal's own speaker, at least two points are required:
1) The perceived playback volume must mask the sound coming directly from the singer's mouth and must reach both ears at the same time;
2) The singer's voice must be fed back (monitored) in real time and in sync, with sound effects such as reverberation added to the audio.
Scheme 9, as shown in fig. 1I, consists of 1 mobile terminal and a plurality of external horn microphones, 1 horn microphone is used by each user. A primary-secondary relationship exists among the plurality of horn microphones, and the primary horn microphones are connected with the secondary horn microphones through wired or wireless digital connection channels. The main loudspeaker is connected with the mobile terminal. When the mobile terminal is used, the mobile terminal provides digital audio of accompaniment music for the main speaker microphone through the digital connecting channel, and the digital audio of the accompaniment music is synchronously transmitted to all the secondary microphones by the main microphone and is played together by the plurality of speaker microphones.
The singing voice of the user A is collected by the horn microphone A and is broadcasted only by the horn of the horn microphone A; the singing voice of the user B is collected by the horn microphone B and played only by the horn of the horn microphone B.
Disadvantages of this solution include:
1) An external speaker-microphone device must be used to make a multi-person K song.
In the existing K song schemes 1 to 9, there are at least the following problems:
1. external equipment (devices) are needed;
2. the audio required to be acquired by the K song is processed through the communication system, so that the processing effect is poor, and the K song effect is poor;
3. singing voice collected by a plurality of mobile terminals cannot be played in one mobile terminal at the same time.
Disclosure of Invention
The embodiments of the invention provide a loudspeaker method, device and storage medium based on multiple mobile terminals, which enable different people to sing karaoke aloud simultaneously on multiple mobile terminals, effectively process the singing audio of each mobile terminal's holder, improve the volume and sound quality of the karaoke, and allow one mobile terminal to simultaneously acquire the singing voices collected by the other mobile terminals.
In a first aspect of the embodiments of the present invention, a loudspeaker method based on multiple mobile terminals is provided, including:
configuring the plurality of mobile terminals based on the received configuration signals to respectively reach a loudspeaker state;
the microphone of each mobile terminal acquires audio to generate audio information, wherein the audio at least comprises a plurality of human voices, environment voices and crosstalk voices;
each mobile terminal processes the audio information so that no environmental sound, crosstalk sound and other voice exist in the audio information, and only target voice information corresponding to the current terminal is reserved;
synchronizing the target voice information of each mobile terminal to other mobile terminals;
each mobile terminal mixes the reserved target voice information, the target voice information received by other terminals and accompaniment information to generate playing audio;
and the speaker of the mobile terminal playing the play-out audio aloud.
Optionally, in one possible implementation manner of the first aspect, the method further includes:
the same accompaniment information is configured for each mobile terminal in advance;
and controlling each mobile terminal to synchronously play the accompaniment information.
Optionally, in one possible implementation manner of the first aspect, synchronizing the target voice information of each mobile terminal to other mobile terminals includes:
presetting a maximum delay time;
acquiring the transmission delay of the current terminal in receiving the target voice information sent by other mobile terminals, and comparing the transmission delay with the maximum delay time;
and if the transmission delay is larger than the maximum delay time, playing the target voice information sent by the mobile terminal after the maximum delay time.
Optionally, in one possible implementation manner of the first aspect, generating, by each mobile terminal, the playback audio by performing audio mixing processing on the reserved target voice information, the target voice information received from the other terminals, and the accompaniment information includes:
and performing voice beautification and master-tape mixdown based on the acoustic feedback loop characteristics pre-configured on the mobile terminal, the target voice information of the current terminal and the target voice information of the other terminals, to obtain the play-out audio.
Optionally, in a possible implementation manner of the first aspect, the process of configuring the acoustic feedback loop feature of the mobile terminal includes:
transmitting a preset audio signal to a space where the mobile terminal is located through the loudspeaker;
receiving a spatial feedback audio signal through the microphone;
and comparing the preset audio signal with the feedback audio signal to obtain the acoustic feedback loop characteristic of the current space.
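The three configuration steps above can be sketched numerically: play a known probe signal through the speaker, record the room's feedback through the microphone, and compare the two spectra. This is a minimal illustration only; the cross-spectral estimator and all names here are assumptions, not the patent's actual implementation.

```python
import numpy as np

def estimate_feedback_loop(played: np.ndarray, recorded: np.ndarray) -> np.ndarray:
    """Estimate the speaker-to-microphone frequency response by comparing
    the known played signal with the microphone's spatial feedback.
    Both arrays are time-aligned mono sample buffers of equal length."""
    # Cross-spectral estimate H(f) = Y(f)X*(f) / |X(f)|^2; the small epsilon
    # guards against division by zero in bands the probe does not excite.
    x = np.fft.rfft(played)
    y = np.fft.rfft(recorded)
    eps = 1e-12
    return (y * np.conj(x)) / (np.abs(x) ** 2 + eps)

# Toy check: if the "room" is a pure 0.5x attenuation with a 3-sample
# (circular) delay, the estimated magnitude response is ~0.5 everywhere.
rng = np.random.default_rng(0)
probe = rng.standard_normal(4096)
feedback = 0.5 * np.roll(probe, 3)
h = estimate_feedback_loop(probe, feedback)
```

A real calibration would use a swept sine or maximum-length sequence as the probe and average several measurements, but the compare-played-to-recorded structure is the same.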
Optionally, in one possible implementation manner of the first aspect, playing the play-out audio through the speaker of the mobile terminal includes:
performing digital sound-amplification processing on the play-out audio, and playing the processed play-out audio through the speaker.
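The digital sound-amplification step can be illustrated with a minimal sketch. The gain value and the tanh soft limiter are assumptions for illustration, not specified by the patent:

```python
import numpy as np

def amplify(audio: np.ndarray, gain_db: float = 12.0) -> np.ndarray:
    """Apply digital gain to the mixed play-out audio, then soft-clip with
    tanh so the louder signal still fits the [-1, 1] sample range."""
    gain = 10.0 ** (gain_db / 20.0)  # convert dB to a linear factor
    return np.tanh(audio * gain)

# A quiet 440 Hz tone at 48 kHz becomes audibly louder without clipping.
quiet = 0.05 * np.sin(2 * np.pi * 440 * np.arange(4800) / 48000)
loud = amplify(quiet)
```

Production systems typically use a look-ahead limiter with attack/release smoothing instead of instantaneous tanh shaping, which distorts loud material; tanh keeps the sketch to two lines.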
Optionally, in one possible implementation manner of the first aspect, each mobile terminal processing the audio information so that no ambient sound, crosstalk sound or other people's voices remain, keeping only the target voice information corresponding to the current terminal, includes:
acquiring a transfer function and an amplitude adjustment coefficient between any two terminals;
predicting, based on the transfer function and the amplitude adjustment coefficient, how the audio of the other terminals is recorded by the current terminal, and using this prediction as a reference signal;
and cancelling the reference signal through a speech enhancement module so that only the target voice is retained.
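A minimal numerical sketch of this cancellation, assuming the inter-terminal transfer function is available as an FIR impulse response. The function and variable names are illustrative; a real speech-enhancement module would track the transfer function adaptively rather than subtract a fixed prediction:

```python
import numpy as np

def cancel_crosstalk(mic: np.ndarray, remote_voice: np.ndarray,
                     transfer_fir: np.ndarray, amp_coeff: float) -> np.ndarray:
    """Predict how the other terminal's voice arrives at this microphone
    (the reference signal), then subtract it, keeping the local target voice.
    transfer_fir and amp_coeff stand in for the measured inter-terminal
    transfer function and amplitude adjustment coefficient."""
    reference = amp_coeff * np.convolve(remote_voice, transfer_fir)[: len(mic)]
    return mic - reference

# Toy check: microphone = local voice + delayed, attenuated remote voice.
n = 1000
rng = np.random.default_rng(1)
local = rng.standard_normal(n)
remote = rng.standard_normal(n)
fir = np.zeros(8)
fir[5] = 1.0  # pretend 5-sample propagation delay between terminals
mixed = local + 0.3 * np.convolve(remote, fir)[:n]
clean = cancel_crosstalk(mixed, remote, fir, 0.3)
```

With a perfectly known transfer function the subtraction is exact; in practice the residual after an imperfect estimate is what the speech enhancement module must further suppress.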
Optionally, in one possible implementation manner of the first aspect, the method further includes:
and storing the play-out audio and/or the target voice information.
In a second aspect of the embodiments of the present invention, there is provided a loudspeaker device based on multiple mobile terminals, including:
the configuration module is used for configuring the plurality of mobile terminals based on the received configuration signals so as to respectively reach a loudspeaker state;
the acquisition module is used for enabling the microphone of each mobile terminal to acquire audio to generate audio information, wherein the audio at least comprises a plurality of human voices, environment voices and crosstalk voices;
the processing module is used for enabling each mobile terminal to process the audio information so that the audio information does not contain environmental sound, crosstalk sound and other voice, and only target voice information corresponding to the current terminal is reserved;
the synchronization module is used for synchronizing the target voice information of each mobile terminal to other mobile terminals;
the audio mixing module is used for generating playing audio by audio mixing processing of the reserved target voice information, the target voice information received by other terminals and the accompaniment information;
and the play-out module is used for causing the speaker of the mobile terminal to play the play-out audio aloud.
In a third aspect of the embodiments of the present invention, there is provided a readable storage medium having stored therein a computer program for implementing the method of the first aspect and the various possible designs of the first aspect when the computer program is executed by a processor.
According to the loudspeaker method, device and storage medium based on multiple mobile terminals provided by the invention, without changing mobile terminal hardware and without any external equipment, multiple people can sing karaoke aloud synchronously through multiple mobile terminals. The audio information collected by the microphones is processed to reduce interference from ambient sound, crosstalk sound and the voices of other mobile terminals' holders, improving the fidelity of the target voice; and digital sound-amplification processing is performed after the voices from different terminals are mixed with the accompaniment, so that the mixed audio played by the speaker is louder.
Drawings
FIG. 1A is a schematic diagram of voice and data transmission between devices according to scheme 1 in the prior art;
FIG. 1B is a schematic diagram of voice and data transmission between devices according to scheme 2 of the prior art;
FIG. 1C is a schematic diagram of voice and data transmission between devices according to scheme 3 of the prior art;
FIG. 1D is a schematic diagram of voice and data transmission between devices according to scheme 4 of the prior art;
FIG. 1E is a schematic diagram of voice and data transmission between devices according to scheme 5 of the prior art;
FIG. 1F is a schematic diagram of voice and data transmission between devices according to scheme 6 of the prior art;
FIG. 1G is a schematic diagram of voice and data transmission between devices according to scheme 7 of the prior art;
FIG. 1H is a schematic diagram of voice and data transmission between devices according to scheme 8 of the prior art;
FIG. 1I is a diagram showing the transmission of voice and data between devices according to scheme 9 of the prior art;
fig. 2 is a flow chart of a multi-mobile terminal based loudspeaker method;
FIG. 3 is a schematic diagram illustrating a sound/audio propagation process according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the process of human voice beautification and master-tape mixdown;
FIG. 5 is a schematic diagram of a measurement phase of an acoustic feedback loop feature;
FIG. 6 is a schematic diagram of a tracking phase of an acoustic feedback loop feature;
FIG. 7 is a schematic diagram of audio information processing;
fig. 8 is a schematic structural diagram of a speaker device based on a multi-mobile terminal.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein.
It should be understood that, in various embodiments of the present invention, the sequence number of each process does not mean that the execution sequence of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
It should be understood that in the present invention, "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements that are expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present invention, "plurality" means two or more. "And/or" merely describes an association between objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B both exist, or B exists alone. The character "/" generally indicates an "or" relationship between the surrounding objects. "Comprising A, B and C" and "comprising A, B, C" mean that all three of A, B and C are comprised; "comprising A, B or C" means that one of A, B and C is comprised; and "comprising A, B and/or C" means that any one, any two, or all three of A, B and C are comprised.
It should be understood that in the present invention, "B corresponding to A" or "A corresponds to B" means that B is associated with A and B can be determined from A. Determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information. A matches B when the similarity between A and B is greater than or equal to a preset threshold.
As used herein, "if" may be interpreted as "when" or "upon" or "in response to determining" or "in response to detecting", depending on the context.
The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
The invention provides a multi-mobile terminal-based loudspeaker method, as shown in a flow chart of FIG. 2, comprising the following steps:
step S110, a plurality of mobile terminals are configured based on the received configuration signals so as to respectively reach the loudspeaker state. In step S110, a plurality of mobile terminals are configured according to the environmental conditions, because the plurality of mobile terminals are located at different positions and under different scenes, and different sound acquisition conditions may occur. For example, in a relatively open space, echo may be generated when the mobile terminal plays sound, so that when the mobile terminal is used for playing sound, automatic configuration is required according to the environmental condition of the site.
Step S120, the microphone of each mobile terminal acquires audio to generate audio information, where the audio includes at least multiple human voices, ambient sound and crosstalk sound. The main purpose is to amplify the user's voice, such as the voice produced when the user sings; in practice, however, the mobile terminal is surrounded by noise, such as people speaking in the environment and crosstalk from other mobile terminals. Step S120 generates the audio information in preparation for extracting the human voice.
Step S130, each mobile terminal processes the audio information so that no ambient sound, crosstalk sound or other people's voices remain, keeping only the target voice information corresponding to the current terminal. Because multiple people may sing simultaneously in one space when multiple mobile terminals are synchronized, the collected audio information is processed to remove ambient sound, crosstalk sound and other people's voices, ensuring high fidelity of each mobile terminal's target voice. Ambient sound includes sounds produced by devices in the environment as well as by people in the environment who do not hold a mobile terminal.
Step S140, synchronizing the target voice information of each mobile terminal to the other mobile terminals. After each mobile terminal obtains its corresponding target voice information, it transmits that information synchronously to the other mobile terminals, so that one mobile terminal can process multiple pieces of target voice information from different mobile terminals.
Step S150, each mobile terminal mixes the retained target voice information, the target voice information received from the other terminals, and the accompaniment information to generate the play-out audio. When multiple users holding different mobile terminals sing together, each mobile terminal mixes its own target voice information, the target voice information of the other mobile terminals, and the accompaniment information, so that the holders of the different terminals obtain the same chorus audio, i.e. the play-out audio.
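The mixing in step S150 can be sketched as a sample-wise sum with peak protection. The gain values and the peak normalization are illustrative choices, not specified by the patent:

```python
import numpy as np

def mix_playout(local_voice, remote_voices, accompaniment,
                voice_gain=1.0, accomp_gain=0.6):
    """Mix the terminal's own target voice, the target voices synchronized
    from the other terminals, and the shared accompaniment into one
    play-out buffer. All buffers are equal-length sample arrays."""
    out = accomp_gain * np.asarray(accompaniment, dtype=float)
    out = out + voice_gain * np.asarray(local_voice, dtype=float)
    for voice in remote_voices:
        out = out + voice_gain * np.asarray(voice, dtype=float)
    # Normalize only if the sum would clip the [-1, 1] range.
    peak = np.max(np.abs(out))
    return out / peak if peak > 1.0 else out

# Own voice + one remote voice + accompaniment, two samples each.
mixed = mix_playout([0.5, 0.5], [[0.2, 0.2]], [0.5, 0.5])
```

Normalizing by the frame peak is the simplest safeguard; a production mixer would instead use a limiter so quiet frames are not rescaled frame-to-frame.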
Step S160, the speaker of the mobile terminal plays the play-out audio aloud.
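Steps S110 to S160 can be summarized as one per-frame loop per terminal. The following sketch uses an in-memory stand-in for a terminal; every method name and the additive-interference model are illustrative assumptions, not a real API:

```python
class LoopbackTerminal:
    """Minimal in-memory stand-in for one mobile terminal (illustrative)."""

    def __init__(self, voice, remote_voices):
        self._voice = voice            # the holder's own singing, per frame
        self._remote = remote_voices   # voices already synchronized from peers
        self.sent = None
        self.played = None

    def capture(self):
        # Step S120: microphone acquisition; +0.1 stands in for interference.
        return [s + 0.1 for s in self._voice]

    def extract_target_voice(self, audio):
        # Step S130: stand-in for the speech enhancement that removes
        # everything except the terminal holder's own voice.
        return [s - 0.1 for s in audio]

    def broadcast(self, target):
        # Step S140 (outgoing): synchronize own target voice to peers.
        self.sent = target

    def receive_remote_voices(self):
        # Step S140 (incoming): target voices from the other terminals.
        return self._remote

    def mix(self, target, remote, accompaniment):
        # Step S150: sample-wise sum of all voices with the accompaniment.
        out = list(accompaniment)
        for voice in [target] + remote:
            out = [a + b for a, b in zip(out, voice)]
        return out

    def play(self, frame):
        # Step S160: loudspeaker output.
        self.played = frame


def loudspeaker_frame(terminal, accompaniment_frame):
    """One frame of the method: capture, clean, synchronize, mix, play."""
    audio = terminal.capture()
    target = terminal.extract_target_voice(audio)
    terminal.broadcast(target)
    remote = terminal.receive_remote_voices()
    playout = terminal.mix(target, remote, accompaniment_frame)
    terminal.play(playout)
    return playout
```

Each real terminal would run this loop on short audio frames, with the broadcast/receive pair carried over the wireless channel described in the synchronization step.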
In one possible implementation, the sound/audio propagation process is shown in fig. 3, illustrated with only 2 mobile terminals. The space contains at least the singing voice (human voice), ambient sound and crosstalk sound, where the crosstalk sound is produced by the speakers of the mobile terminals. The sound heard by the human ear includes the sound from the singer's own mouth, which reaches the ear through 3 paths: intracranial conduction, spatial propagation, and propagation after being mixed through the mobile terminal. When processing the voice, the mobile terminal mixes the accompaniment, the target voice information of the current mobile terminal and the target voice information of the other mobile terminals to obtain the mixed play-out audio, achieving the effect of loud solo and chorus singing.
In one possible embodiment, the method further comprises:
the same accompaniment information is configured for each mobile terminal in advance. Since the invention is based on several mobile terminals singing together, the accompaniment must be identical, so the same accompaniment information is configured on each mobile terminal before the method provided by the invention is used.
And controlling each mobile terminal to play the accompaniment information synchronously, so that the users of the different mobile terminals stay in time with one another while singing.
In one possible implementation, synchronizing the target voice information of each mobile terminal to other mobile terminals includes:
the maximum delay time is preset. In an actual karaoke session the human ear needs a certain interval to distinguish two sounds; if the interval between them is short enough, the ear perceives them as a single sound. The maximum delay time used in the present invention can therefore be derived from the temporal resolution of the human ear.
And acquiring the transmission delay with which the current terminal receives the target voice information sent by the other mobile terminals, and comparing the transmission delay with the maximum delay time. Comparing this transmission delay with the maximum delay determines whether the karaoke or chorus may fall out of sync.
And if the transmission delay is greater than the maximum delay time, the target voice information sent by that mobile terminal is played after the maximum delay time. A transmission delay longer than the maximum delay time means the chorus may fall out of sync, so the received target voice information is scheduled accordingly to keep the karaoke and chorus synchronized.
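One consistent reading of this rule is a common playout deadline: frames that arrive within the tolerance are buffered up to the maximum-delay boundary so all terminals stay aligned, while late frames are played as soon as they arrive, i.e. after the maximum delay time. The sketch below encodes that reading; the 30 ms constant and the function name are illustrative assumptions, not values taken from the patent.

```python
MAX_DELAY_S = 0.030  # hypothetical value standing in for the ear's temporal resolution

def playout_time(arrival_delay_s, max_delay_s=MAX_DELAY_S):
    """Return the playout instant (seconds after capture) for a frame of
    target voice received from another terminal."""
    if arrival_delay_s > max_delay_s:
        # Too late to buffer to the common boundary: play on arrival,
        # i.e. after the maximum delay time has elapsed.
        return arrival_delay_s
    # Within tolerance: hold the frame until the shared boundary.
    return max_delay_s
```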
In one possible implementation, as shown in fig. 4, generating, by each mobile terminal, the playback audio by mixing the reserved target voice information, the target voice information received from the other terminals, and the accompaniment information includes:
and carrying out vocal beautification and mixdown based on the acoustic feedback loop characteristics pre-configured on the mobile terminal, the target voice information of the current terminal, the target voice information of the other terminals, and the accompaniment information, to obtain the playback audio.
The invention comprises at least two modules, vocal beautification and mixdown (mastering), which process multiple input audio streams (target voice information and accompaniment information) into a single output stream (the playback audio), thereby beautifying the audio.
In one possible implementation, the process of configuring the acoustic feedback loop feature of the mobile terminal includes:
transmitting a preset audio signal to a space where the mobile terminal is located through the loudspeaker;
receiving a spatial feedback audio signal through the microphone;
and comparing the preset audio signal with the feedback audio signal to obtain the acoustic feedback loop characteristic of the current space.
In one possible implementation, as shown in fig. 5, the mobile terminal must be configured before the loudspeaker/karaoke session starts, while its loudspeaker is not yet playing any singing or music. First, the loudspeaker of the mobile terminal actively plays various acoustic-feedback detection signals, which are recorded by the microphone of the mobile terminal; the echo power, the frequency response, and the room-reverberation impulse response are then obtained through analysis and calculation. The result is saved as the acoustic feedback loop characteristic.
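This calibration step can be sketched as a frequency-domain deconvolution of the microphone recording against the played detection signal. The patent does not give the computation, so the formulation below (regularized spectral division) and all names are illustrative assumptions.

```python
import numpy as np

def estimate_feedback_loop(test_signal, recorded):
    """Estimate the acoustic feedback loop from a played detection signal
    and the microphone recording (frequency-domain deconvolution sketch)."""
    n = max(len(test_signal), len(recorded))
    X = np.fft.rfft(test_signal, n=n)
    Y = np.fft.rfft(recorded, n=n)
    eps = 1e-8                                   # regularize near-zero bins
    H = Y * np.conj(X) / (np.abs(X) ** 2 + eps)  # frequency response
    impulse_response = np.fft.irfft(H, n=n)      # loop/room impulse response
    echo_power = float(np.mean(np.asarray(recorded, dtype=float) ** 2))
    return {"frequency_response": H,
            "impulse_response": impulse_response,
            "echo_power": echo_power}
```

The three returned quantities correspond to the echo power, frequency response, and impulse response that the text says are saved as the acoustic feedback loop characteristic.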
In one possible implementation, as shown in fig. 6, the environment of the mobile terminal may change during the loudspeaker/karaoke session, so the present invention updates the acoustic feedback loop characteristics as the environment changes, allowing the mobile terminal to adapt automatically. That is, when the environment information and position information of the mobile terminal change, where this information includes the attitude, position, and moving speed of the mobile terminal, the echo power, the frequency response, and the room-reverberation impulse response are updated based on that information.
In one possible embodiment, playing the playback audio through the loudspeaker of the mobile terminal includes:
performing digital amplification on the playback audio, and playing the digitally amplified playback audio through the loudspeaker.
According to the technical scheme provided by the invention, the playback audio can be digitally amplified, so that volume enhancement is achieved purely in software.
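A minimal sketch of such software volume enhancement: apply a decibel gain and soft-limit the result so it stays within digital full scale. The soft clipping via `tanh` is an illustrative choice, not something the patent specifies.

```python
import numpy as np

def digital_amplify(audio, gain_db=6.0):
    """Apply digital gain with soft limiting so the amplified signal
    stays in [-1, 1] (software-based volume enhancement sketch)."""
    gain = 10.0 ** (gain_db / 20.0)              # dB to linear gain
    amplified = np.asarray(audio, dtype=float) * gain
    return np.tanh(amplified)                    # soft clip, no hard wrap
```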
In one possible implementation, as shown in fig. 7, each mobile terminal processing the audio information so that no ambient sound, crosstalk, or other voices remain, retaining only the target voice information corresponding to the current terminal, includes:
acquiring the transfer function and amplitude-adjustment coefficient between any two terminals; the audio transfer function and amplitude-adjustment coefficient between different terminals can be obtained from the interaction process between the terminals.
And based on the transfer function and the amplitude-adjustment coefficient, predicting the audio of the other terminals as it is recorded by the current terminal, and using the prediction as a reference signal.
And cancelling the reference signal through the speech-enhancement module so that only the target voice remains. By introducing the predicted reference signal, non-stationary noise can be cancelled effectively, making the playback audio more stable.
And after the current terminal receives the audio of the other terminals, obtaining the transfer function corresponding to the two terminals, adjusting the signal amplitude according to the amplitude-adjustment coefficient, and providing the result, together with the accompaniment information, to the speech-enhancement module.
And after speech enhancement, the target voice of the current terminal is obtained; acoustic feedback suppression is then performed, and finally the target voice, the received audio information of the other terminals, and the accompaniment information are mixed to generate the playback audio.
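The prediction-and-cancellation pipeline above can be sketched as follows: each remote stream is passed through the inter-terminal transfer function (modeled here as a finite impulse response), scaled by the amplitude-adjustment coefficient, and subtracted from the microphone capture. This is a simplified linear sketch; a real speech-enhancement module would use adaptive filtering, and all names here are illustrative.

```python
import numpy as np

def predict_reference(remote_audio, transfer_ir, amplitude_coeff):
    """Predict how a remote terminal's audio appears in the local
    recording: convolve with the inter-terminal transfer function (as an
    impulse response) and scale by the amplitude-adjustment coefficient."""
    convolved = np.convolve(np.asarray(remote_audio, dtype=float), transfer_ir)
    return amplitude_coeff * convolved[:len(remote_audio)]

def cancel_crosstalk(mic_audio, references):
    """Subtract the predicted reference signals from the microphone
    capture, leaving (approximately) only the local target voice."""
    out = np.asarray(mic_audio, dtype=float).copy()
    for ref in references:
        n = min(len(out), len(ref))
        out[:n] -= ref[:n]
    return out
```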
According to this technical scheme, in a multi-user karaoke scenario each terminal can display the lyrics and singing information, which is convenient for the singers. In addition, the multi-user karaoke process requires no assistance from external equipment.
According to the invention, each terminal can synchronously play the sound recorded by the current terminal and by the other terminals, whereas in existing schemes a terminal can only play either the sound it recorded itself or the sound recorded by other terminals.
In one possible embodiment, tone-quality enhancement processing is also performed on the target voice information.
In one possible embodiment, the method further comprises:
and storing the playback audio and/or the target voice information. In an actual loudspeaker, karaoke, or chorus session there may be a need to record, so the playback audio and/or the target voice information can be stored as required, making it convenient to retrieve and listen to later.
The technical scheme of the invention also provides a loudspeaker device based on multiple mobile terminals, as shown in the structural schematic diagram of fig. 8, comprising:
the configuration module is used for configuring the plurality of mobile terminals based on the received configuration signals so as to respectively reach a loudspeaker state;
the acquisition module is used for enabling the microphone of each mobile terminal to acquire audio to generate audio information, wherein the audio at least comprises a plurality of human voices, environment voices and crosstalk voices;
the processing module, used to make each mobile terminal process the audio information so that it contains no ambient sound, crosstalk, or other voices, retaining only the target voice information corresponding to the current terminal;
the synchronization module is used for synchronizing the target voice information of each mobile terminal to other mobile terminals;
the mixing module, used to generate the playback audio by mixing the retained target voice information, the target voice information received from the other terminals, and the accompaniment information;
and the playback module, used to make the loudspeaker of the mobile terminal play based on the playback audio.
The readable storage medium may be a computer storage medium or a communication medium. Communication media includes any medium that facilitates transfer of a computer program from one place to another. Computer storage media can be any available media that can be accessed by a general purpose or special purpose computer. For example, a readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. In the alternative, the readable storage medium may be integral to the processor. The processor and the readable storage medium may reside in an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC). In addition, the ASIC may reside in a user device. The processor and the readable storage medium may also reside as discrete components in a communication device. The readable storage medium may be a read-only memory (ROM), a random-access memory (RAM), a CD-ROM, magnetic tape, a floppy disk, an optical data storage device, etc.
The present invention also provides a program product comprising execution instructions stored in a readable storage medium. The at least one processor of the device may read the execution instructions from the readable storage medium, the execution instructions being executed by the at least one processor to cause the device to implement the methods provided by the various embodiments described above.
In the above embodiments of the terminal or the server, it should be understood that the processor may be a central processing unit (Central Processing Unit, CPU), another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor for execution, or executed by a combination of hardware and software modules in a processor.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (8)

1. A multi-mobile terminal-based loudspeaker method, comprising:
configuring the plurality of mobile terminals based on the received configuration signals to respectively reach a loudspeaker state;
the microphone of each mobile terminal acquires audio to generate audio information, wherein the audio at least comprises a plurality of human voices, environment voices and crosstalk voices;
each mobile terminal processes the audio information so that no environmental sound, crosstalk sound and other voice exist in the audio information, and only target voice information corresponding to the current terminal is reserved;
synchronizing the target voice information of each mobile terminal to other mobile terminals;
each mobile terminal mixes the reserved target voice information, the target voice information received by other terminals and accompaniment information to generate playing audio;
a loudspeaker of the mobile terminal plays based on the playback audio;
each mobile terminal mixing the retained target voice information, the target voice information received from the other terminals, and the accompaniment information to generate the playback audio comprises: performing vocal beautification and mixdown based on the acoustic feedback loop characteristics pre-configured on the mobile terminal, the target voice information of the current terminal, the target voice information of the other terminals, and the accompaniment information, to obtain the playback audio;
the process for configuring the acoustic feedback loop characteristics of the mobile terminal comprises the following steps: transmitting a preset audio signal to a space where the mobile terminal is located through the loudspeaker; receiving a spatial feedback audio signal through the microphone; and comparing the preset audio signal with the feedback audio signal to obtain the acoustic feedback loop characteristic of the current space.
2. The multi-mobile terminal-based speaker method of claim 1, further comprising:
the same accompaniment information is configured for each mobile terminal in advance;
and controlling each mobile terminal to synchronously play the accompaniment information.
3. The multi-mobile terminal-based speaker method of claim 1, wherein,
synchronizing the target voice information of each mobile terminal to other mobile terminals includes:
presetting a maximum delay time;
acquiring the transmission delay of the current terminal in receiving the target voice information sent by other mobile terminals, and comparing the transmission delay with the maximum delay time;
and if the transmission delay is larger than the maximum delay time, playing the target voice information sent by the mobile terminal after the maximum delay time.
4. The multi-mobile terminal-based speaker method of claim 1, further comprising:
the loudspeaker of the mobile terminal playing based on the playback audio comprises:
performing digital amplification on the playback audio, and playing the digitally amplified playback audio through the loudspeaker.
5. The multi-mobile terminal-based speaker method of claim 1, wherein,
each mobile terminal processes the audio information so that no environmental sound, crosstalk sound and other voice exist in the audio information, and obtaining the target voice information which only keeps corresponding to the current terminal comprises the following steps:
acquiring a transfer function and an amplitude adjustment coefficient between any two terminals;
predicting the audio recorded by the current terminal by other terminals as a reference signal based on the transfer function and the amplitude adjustment coefficient;
and only the target voice is reserved after the reference signal is eliminated by the voice enhancement module.
6. The multi-mobile terminal-based speaker method of claim 1, further comprising:
and storing the play-out audio and/or the target voice information.
7. A multi-mobile terminal-based speaker device, comprising:
the configuration module is used for configuring the plurality of mobile terminals based on the received configuration signals so as to respectively reach a loudspeaker state;
the acquisition module is used for enabling the microphone of each mobile terminal to acquire audio to generate audio information, wherein the audio at least comprises a plurality of human voices, environment voices and crosstalk voices;
the processing module is used for enabling each mobile terminal to process the audio information so that the audio information does not contain environmental sound, crosstalk sound and other voice, and only target voice information corresponding to the current terminal is reserved;
the synchronization module is used for synchronizing the target voice information of each mobile terminal to other mobile terminals;
the mixing module, used to generate the playback audio by mixing the retained target voice information, the target voice information received from the other terminals, and the accompaniment information;
the playback module, used to make the loudspeaker of the mobile terminal play based on the playback audio;
each mobile terminal mixing the retained target voice information, the target voice information received from the other terminals, and the accompaniment information to generate the playback audio comprises: performing vocal beautification and mixdown based on the acoustic feedback loop characteristics pre-configured on the mobile terminal, the target voice information of the current terminal, the target voice information of the other terminals, and the accompaniment information, to obtain the playback audio;
the process for configuring the acoustic feedback loop characteristics of the mobile terminal comprises the following steps: transmitting a preset audio signal to a space where the mobile terminal is located through the loudspeaker; receiving a spatial feedback audio signal through the microphone; and comparing the preset audio signal with the feedback audio signal to obtain the acoustic feedback loop characteristic of the current space.
8. A readable storage medium, characterized in that the readable storage medium has stored therein a computer program for implementing the method of any of claims 1 to 6 when being executed by a processor.
CN202110771032.1A 2021-07-08 2021-07-08 Multi-mobile-terminal-based loudspeaker method, device and storage medium Active CN113611272B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110771032.1A CN113611272B (en) 2021-07-08 2021-07-08 Multi-mobile-terminal-based loudspeaker method, device and storage medium


Publications (2)

Publication Number Publication Date
CN113611272A CN113611272A (en) 2021-11-05
CN113611272B true CN113611272B (en) 2023-09-29

Family

ID=78304140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110771032.1A Active CN113611272B (en) 2021-07-08 2021-07-08 Multi-mobile-terminal-based loudspeaker method, device and storage medium

Country Status (1)

Country Link
CN (1) CN113611272B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101278597A (en) * 2005-10-01 2008-10-01 三星电子株式会社 Method and apparatus to generate spatial sound
CN109887523A (en) * 2019-01-21 2019-06-14 北京小唱科技有限公司 Audio data processing method and device, electronic equipment and storage medium for application of singing
CN209657794U (en) * 2018-12-20 2019-11-19 孙卫平 A kind of Karaoke control circuit, Karaoke control device and mobile terminal
CN110970045A (en) * 2019-11-15 2020-04-07 北京达佳互联信息技术有限公司 Mixing processing method, mixing processing device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8779268B2 (en) * 2009-06-01 2014-07-15 Music Mastermind, Inc. System and method for producing a more harmonious musical accompaniment


Also Published As

Publication number Publication date
CN113611272A (en) 2021-11-05

Similar Documents

Publication Publication Date Title
CN106162413B (en) The Headphone device of specific environment sound prompting mode
CN110166882B (en) Far-field pickup equipment and method for collecting human voice signals in far-field pickup equipment
EP2314077B1 (en) Wearable headset with self-contained vocal feedback and vocal command
US20050281421A1 (en) First person acoustic environment system and method
US20050106546A1 (en) Electronic communications device with a karaoke function
CN106170108B (en) Earphone device with decibel reminding mode
JP2008096483A (en) Sound output control device and sound output control method
JP2014174255A (en) Signal processing device, signal processing method, and storage medium
CN110636402A (en) Earphone device with local call condition confirmation mode
WO2021244056A1 (en) Data processing method and apparatus, and readable medium
US20230147435A1 (en) Audio Processing Method, Apparatus, and System
US20160267925A1 (en) Audio processing apparatus that outputs, among sounds surrounding user, sound to be provided to user
WO2021180115A1 (en) Recording method and recording system using true wireless earbuds
US11741984B2 (en) Method and apparatus and telephonic system for acoustic scene conversion
CN109658910A (en) A kind of wireless K song system
CN113611272B (en) Multi-mobile-terminal-based loudspeaker method, device and storage medium
CN113270082A (en) Vehicle-mounted KTV control method and device and vehicle-mounted intelligent networking terminal
CN113612881B (en) Loudspeaking method and device based on single mobile terminal and storage medium
CN113611266B (en) Audio synchronization method, device and storage medium suitable for multi-user K songs
WO2021004067A1 (en) Display device
JP2523366B2 (en) Audio playback method
JP7359896B1 (en) Sound processing equipment and karaoke system
CN113611271B (en) Digital volume augmentation method and device suitable for mobile terminal and storage medium
JP2012194295A (en) Speech output system
CN209746534U (en) Small-size portable headset K sings effect enhancer and K sings system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant