CN113612881A

CN113612881A - Loudspeaking method and device based on single mobile terminal and storage medium

Info

Publication number: CN113612881A
Application number: CN202110771046.3A
Authority: CN
Inventors: 王国腾; 魏耀都; 陈华
Original assignee: Beijing Xiaochang Technology Co ltd
Current assignee: Beijing Xiaochang Technology Co ltd
Priority date: 2021-07-08
Filing date: 2021-07-08
Publication date: 2021-11-05
Anticipated expiration: 2041-07-08
Also published as: CN113612881B

Abstract

The invention provides a loudspeaking method, a loudspeaking device, and a storage medium based on a single mobile terminal. The method comprises the following steps: configuring the mobile terminal, based on a received configuration signal, so that it reaches a loudspeaking state; acquiring audio through a microphone of the mobile terminal to generate audio information; processing the audio information so that it contains no ambient sound or crosstalk sound and only the human voice is retained; mixing the voice-only audio information with pre-stored accompaniment information to generate play-out audio; and playing the play-out audio through the loudspeaker of the mobile terminal. With this technical solution, the monitoring function needed during karaoke can be realized without changing the hardware of the mobile terminal and without any external equipment. The audio information collected by the microphone is processed to reduce the interference of ambient sound and crosstalk sound, and digital amplification is applied after the voice and the accompaniment are mixed, so that the mixed sound played by the loudspeaker is loud enough to cover the singer's direct voice and reach both ears, achieving the karaoke effect.

Description

Loudspeaking method and device based on single mobile terminal and storage medium

Technical Field

The invention relates to the technical field of audio data processing, in particular to a loudspeaking method and device based on a single mobile terminal and a storage medium.

Background

With the rapid development of science, technology, and music culture, the ways of singing karaoke have become increasingly diverse. Karaoke on a mobile terminal is convenient and easy to operate, and is therefore widely favored. Existing schemes for karaoke on a mobile terminal include at least the following:

Scheme 1, as shown in FIG. 1A: an external microphone (one that is not part of the mobile terminal) converts the singing voice in the space into a digital signal, referred to as the digitized singing voice. The digitized singing voice is transmitted over a digital connection channel between the microphone and the terminal. The mobile terminal beautifies the digitized singing voice and mixes it with the digitized accompaniment into digitized audio. The mixed digitized audio is transmitted over a digital connection channel between the mobile terminal and an external earphone. The external earphone converts the digitized audio into sound waves, which propagate in the closed space between the earphone and the ear. This scheme requires an external earphone and an external microphone.

Scheme 2, as shown in FIG. 1B, is similar to scheme 1, except that a connection channel exists between the external microphone and the external earphone (for example, they are integrated in the same device), and the digitized singing voice enters the external earphone directly through this channel. The mobile terminal transmits only the digitized accompaniment to the external earphone, and the digitized singing voice and the digitized accompaniment are mixed in the earphone. This scheme requires an integrated external earphone and microphone.

Scheme 3, as shown in FIG. 1C, is similar to scheme 1, except that the mobile terminal plays the mixed digitized audio through its built-in loudspeaker instead of an external earphone. An external microphone is still required.

Scheme 4, as shown in FIG. 1D, is similar to scheme 1, except that the microphone built into the mobile terminal is used instead of an external microphone. An external earphone is still required.

Scheme 5, as shown in FIG. 1E: an external speaker-microphone (a portable device with both a loudspeaker and a microphone) digitizes the singing voice. The mobile terminal transmits the digitized accompaniment to the speaker-microphone, which mixes the digitized singing voice with the digitized accompaniment and plays the mixed audio into the space through its built-in loudspeaker. This scheme requires an external speaker-microphone.

Scheme 6, as shown in FIG. 1F: the mobile terminal is held at normal screen-viewing distance, and its built-in microphone converts the singing voice in the space into digitized singing voice. The mobile terminal beautifies the digitized singing voice and plays the accompaniment through its built-in loudspeaker, but does not play the singing voice. This scheme needs no external device, but cannot play back the singing voice.

Scheme 7, as shown in FIG. 1G, includes two sub-schemes. Sub-scheme 1: the terminal is held as when answering a phone call, with its earpiece close to the ear. In this holding mode, the mobile terminal uses processing common in communication systems, such as noise reduction, echo cancellation, and automatic gain control, to bring the sound quality to an intelligible level, and the experience resembles an everyday phone call; the listening volume then differs between the two ears. Sub-scheme 2: the mobile terminal is held at normal screen-viewing distance and applies the same communication-system processing, such as noise reduction, echo cancellation, and automatic gain control. Both sub-schemes process the audio through the communication system and need no external device, but the playback sound quality is poor.

Scheme 8, as shown in FIG. 1H: external microphones convert multiple singing voices in the space into digitized singing voices, which are transmitted over digital connection channels between the microphones and the terminal. The mobile terminal then beautifies the digitized singing voices and mixes them with the digitized accompaniment into digitized audio, which the terminal plays through its built-in loudspeaker (or an external audio playback device). This scheme requires external speaker-microphone equipment for karaoke, and it is inconvenient for several people to read the lyrics on the same terminal.

To sing karaoke with the mobile terminal in its loudspeaking state, at least the following two requirements must be met:

1) the playback volume must be high enough to cover the sound coming directly from the singer's mouth and to reach both ears simultaneously;

2) the voice feedback (monitoring) must be synchronized with the singing in real time, with sound effects such as reverberation added to the audio.

Without any additional equipment, none of the existing karaoke schemes 1 to 8 can meet both requirements.

Disclosure of Invention

Embodiments of the invention provide a loudspeaking method, a loudspeaking device, and a storage medium based on a single mobile terminal, which realize a monitoring function on the mobile terminal without any external equipment and process the audio collected during loudspeaking and karaoke to achieve the corresponding karaoke effect.

In a first aspect of the embodiments of the present invention, a loudspeaking method based on a single mobile terminal is provided, including:

configuring the mobile terminal to reach a loudspeaking state based on the received configuration signal;

acquiring audio based on a microphone of a mobile terminal to generate audio information, wherein the audio at least comprises human voice, environmental sound and crosstalk sound;

processing the audio information so that no ambient sound or crosstalk sound remains and only the human voice is retained;

mixing the voice-only audio information with pre-stored accompaniment information to generate play-out audio;

and playing the play-out audio through the loudspeaker of the mobile terminal.

Optionally, in a possible implementation manner of the first aspect, mixing the voice-only audio information with the pre-stored accompaniment information to generate the play-out audio includes:

performing voice beautification and master mixdown based on the acoustic feedback loop characteristics pre-configured for the mobile terminal, the voice-only audio information, and the accompaniment information, to obtain the play-out audio.

Optionally, in a possible implementation manner of the first aspect, the process of configuring the acoustic feedback loop feature for the mobile terminal includes:

sending a preset audio signal to a space where the mobile terminal is located through the loudspeaker;

receiving a feedback audio signal of a space through the microphone;

and comparing the preset audio signal with the feedback audio signal to obtain the characteristics of the acoustic feedback loop in the current space.

Optionally, in a possible implementation manner of the first aspect, playing the play-out audio through the loudspeaker of the mobile terminal includes:

performing digital amplification on the play-out audio, and playing the digitally amplified play-out audio through the loudspeaker.

Optionally, in a possible implementation manner of the first aspect, processing the audio information so that no ambient sound or crosstalk sound remains and only the human voice is retained includes:

receiving audio features of a target user;

extracting, from the audio information, the audio information corresponding to the audio features of the target user as the voice-only audio information;

and performing sound quality enhancement on the voice-only audio information.

Optionally, in a possible implementation manner of the first aspect, the method further includes:

storing the play-out audio and/or the voice-only audio information.

In a second aspect of the embodiments of the present invention, there is provided a speaker device based on a single mobile terminal, including:

a configuration module, configured to configure the mobile terminal, based on the received configuration signal, so that it reaches a loudspeaking state;

an acquisition module, configured to acquire audio through a microphone of the mobile terminal to generate audio information, wherein the audio comprises at least human voice, ambient sound, and crosstalk sound;

a processing module, configured to process the audio information so that no ambient sound or crosstalk sound remains and only the human voice is retained;

a mixing module, configured to mix the voice-only audio information with pre-stored accompaniment information to generate play-out audio;

and a play-out module, configured to play the play-out audio through the loudspeaker of the mobile terminal.

Optionally, in a possible implementation manner of the second aspect, the mixing module is further configured to perform the following steps, including:

Optionally, in a possible implementation manner of the second aspect, the configuration module is further configured to perform the following steps, including:

receiving a feedback audio signal of a space through the microphone;

In a third aspect of the embodiments of the present invention, a readable storage medium is provided, in which a computer program is stored. When executed by a processor, the computer program carries out the method according to the first aspect of the present invention and its various possible designs.

The loudspeaking method, device, and storage medium based on a single mobile terminal provided by the invention realize the monitoring function needed during karaoke without changing the hardware of the mobile terminal and without any external equipment. The audio information collected by the microphone is processed to reduce the interference of ambient sound and crosstalk sound, and digital amplification is applied after the voice and the accompaniment are mixed, so that the mixed sound played by the loudspeaker is loud enough to cover the singer's direct voice and reach both ears, achieving the karaoke effect.

Drawings

FIG. 1A is a schematic diagram of the transmission of voice and data between devices in prior art scheme 1;

FIG. 1B is a schematic diagram of the transmission of voice and data between devices in prior art scheme 2;

FIG. 1C is a schematic diagram of the transmission of voice and data between devices in prior art scheme 3;

FIG. 1D is a schematic diagram of the transmission of voice and data between devices in prior art scheme 4;

FIG. 1E is a schematic diagram of the transmission of voice and data between devices in prior art scheme 5;

FIG. 1F is a schematic diagram of the transmission of voice and data between devices in prior art scheme 6;

FIG. 1G is a schematic diagram of the transmission of voice and data between devices in prior art scheme 7;

FIG. 1H is a schematic diagram of the transmission of voice and data between devices in prior art scheme 8;

FIG. 2 is a flow chart of a first embodiment of a loudspeaking method based on a single mobile terminal;

FIG. 3 is a schematic diagram illustrating a propagation process of sound/audio in an embodiment of the present invention;

FIG. 4 is a schematic illustration of the process of human voice beautification and mastering;

FIG. 5 is a schematic diagram of a measurement phase of an acoustic feedback loop characteristic;

FIG. 6 is a schematic diagram of a tracking phase of an acoustic feedback loop feature;

FIG. 7 is a schematic illustration of audio information processing;

FIG. 8 is a structural diagram of a first embodiment of a loudspeaking device based on a single mobile terminal.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein.

It should be understood that, in various embodiments of the present invention, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the internal logic of the processes, and should not constitute any limitation on the implementation process of the embodiments of the present invention.

It should be understood that in the present application, "comprising" and "having" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be understood that, in the present invention, "a plurality" means two or more. "And/or" merely describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it. "Comprises A, B and C" and "comprises A, B, C" mean that all three of A, B, and C are included; "comprises A, B or C" means that one of A, B, and C is included; and "comprises A, B and/or C" means that any one, any two, or all three of A, B, and C are included.

It should be understood that, in the present invention, "B corresponding to A", "A corresponds to B", or "B corresponds to A" means that B is associated with A and that B can be determined from A. Determining B from A does not mean that B is determined from A alone; B may also be determined from A and/or other information. A matching B means that the similarity between A and B is greater than or equal to a preset threshold.

As used herein, "if" may be interpreted as "when", "at the time of", "in response to determining", or "in response to detecting", depending on the context.

The technical solution of the present invention will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.

The invention provides a loudspeaking method based on a single mobile terminal. As shown in the flow chart of FIG. 2, the method comprises the following steps:

Step S110: configure the mobile terminal, based on the received configuration signal, so that it reaches a loudspeaking state. In step S110 the mobile terminal is configured according to its environment, because sound pickup conditions differ with the terminal's position and the scene. For example, in a relatively open space an echo may occur when the mobile terminal plays sound. The mobile terminal therefore needs to be configured automatically according to the on-site environment when it is used for loudspeaking.

Step S120: acquire audio through a microphone of the mobile terminal to generate audio information, wherein the audio comprises at least human voice, ambient sound, and crosstalk sound. The main purpose of the mobile terminal here is to amplify the user's voice, such as the voice produced when the user sings. In practice, however, the location of the mobile terminal contains many noises, such as ambient sound and the crosstalk sound of the mobile terminal itself. The audio information generated in step S120 prepares for the subsequent extraction of the human voice.

Step S130: process the audio information so that no ambient sound or crosstalk sound remains and only the human voice is retained. To ensure high fidelity of the human voice, the collected audio information is processed to remove the ambient sound and crosstalk sound. Ambient sound includes sound generated by various devices in the environment as well as sound made by other people in the environment.

Step S140: mix the voice-only audio information with the pre-stored accompaniment information to generate play-out audio. Once the voice-only audio information is obtained, it is mixed with the accompaniment information to obtain play-out audio that contains only the target human voice and the accompaniment.

Step S150: the loudspeaker of the mobile terminal plays the play-out audio.
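Steps S120 to S150 can be sketched as a per-frame function. This is a minimal illustration, not the patented implementation: the function name `loudspeak_frame`, the `isolate_voice` callback, the `gain` parameter, and the use of float samples in the range [-1.0, 1.0] are all assumptions introduced here.

```python
from typing import Callable, List

Samples = List[float]


def loudspeak_frame(
    mic_frame: Samples,
    accompaniment_frame: Samples,
    isolate_voice: Callable[[Samples], Samples],
    gain: float = 1.0,
) -> Samples:
    """Run one frame through steps S120 to S150 (hypothetical sketch)."""
    # S120: mic_frame is the captured audio information
    # (voice + ambient sound + crosstalk sound).
    # S130: keep only the human voice; the isolation method is supplied
    # by the caller because the source does not fix a specific algorithm.
    voice = isolate_voice(mic_frame)
    # S140: mix the voice with the pre-stored accompaniment, sample by sample.
    mixed = [v + a for v, a in zip(voice, accompaniment_frame)]
    # S150: scale for playback and clamp to the valid range [-1.0, 1.0].
    return [max(-1.0, min(1.0, gain * s)) for s in mixed]
```

Passing the isolation step in as a callback keeps the frame loop independent of whichever voice extraction method is actually used.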

In one possible implementation, the propagation of sound and audio is as shown in FIG. 3. The space contains at least the singer's voice (the human voice), ambient sound, and crosstalk sound, where the crosstalk sound is produced by playback through the loudspeaker of the mobile terminal. The sound heard by the human ear includes the sound emitted by the singer's mouth, which reaches the ear through three paths: intracranial conduction, propagation through the space, and the mixed sound played by the mobile terminal. When processing the human voice, the mobile terminal mixes the accompaniment with the voice to obtain the mixed play-out sound, thereby realizing loudspeaking and karaoke.

In one possible embodiment, as shown in FIG. 4, mixing the voice-only audio information with the pre-stored accompaniment information to generate play-out audio includes:

performing voice beautification and master mixdown based on the acoustic feedback loop characteristics pre-configured for the mobile terminal, the voice-only audio information, and the accompaniment information, to obtain the play-out audio. The invention comprises at least two modules, voice beautification and master mixdown. These two modules process the multi-channel audio input (the voice information and the accompaniment information) and output single-channel audio (the play-out audio), thereby beautifying the audio.
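As a rough illustration of the mixdown stage, the sketch below sums a voice channel and an accompaniment channel into one output channel. The gains, the zero-padding, and the peak normalization are assumptions chosen for this example; the source does not specify how the master mixdown is computed, and voice beautification is omitted.

```python
from typing import List


def mix_down(
    voice: List[float],
    accompaniment: List[float],
    voice_gain: float = 1.0,
    accompaniment_gain: float = 0.7,
) -> List[float]:
    """Mix two input channels into one output channel (hypothetical sketch)."""
    length = max(len(voice), len(accompaniment))
    # Pad the shorter channel with silence so both have the same length.
    v = voice + [0.0] * (length - len(voice))
    a = accompaniment + [0.0] * (length - len(accompaniment))
    mixed = [voice_gain * x + accompaniment_gain * y for x, y in zip(v, a)]
    # Normalize only when the sum would exceed full scale, to avoid clipping.
    peak = max((abs(s) for s in mixed), default=0.0)
    if peak > 1.0:
        mixed = [s / peak for s in mixed]
    return mixed
```

Normalizing by the peak keeps the relative balance between voice and accompaniment, at the cost of lowering the overall level when the sum is too loud.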

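The process of configuring the acoustic feedback loop characteristics for the mobile terminal comprises the following steps:

sending a preset audio signal through the loudspeaker into the space where the mobile terminal is located;

receiving a feedback audio signal from the space through the microphone;

and comparing the preset audio signal with the feedback audio signal to obtain the acoustic feedback loop characteristics of the current space.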

In one possible implementation, as shown in FIG. 5, the mobile terminal must be configured before loudspeaking or karaoke begins, while its loudspeaker is not playing singing voice or music. The loudspeaker of the mobile terminal first actively plays various acoustic feedback detection signals, which the microphone of the mobile terminal records. The echo power, the frequency response, and the environmental sound-mixing impulse response are then obtained through analysis and calculation. The calculation results are saved as the acoustic feedback loop characteristics.
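The measurement phase can be illustrated with a simulated room. The sketch assumes a probe whose autocorrelation is close to an impulse (a single click in the usage below), so cross-correlating the probe with the recording recovers the impulse response. The function names and the toy echo path are hypothetical, and the source does not state which detection signals or estimators are used.

```python
import cmath
from typing import List


def convolve(x: List[float], h: List[float]) -> List[float]:
    """Simulate the loudspeaker-to-microphone path h applied to signal x."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def estimate_impulse_response(probe: List[float], recorded: List[float], taps: int) -> List[float]:
    """Cross-correlate probe and recording; valid for impulse-like probes."""
    energy = sum(s * s for s in probe)
    response = []
    for k in range(taps):
        acc = sum(s * recorded[n + k] for n, s in enumerate(probe) if n + k < len(recorded))
        response.append(acc / energy)
    return response


def echo_power(recorded: List[float]) -> float:
    """Mean squared amplitude of the recorded feedback signal."""
    return sum(s * s for s in recorded) / len(recorded)


def frequency_response(h: List[float], bins: int) -> List[float]:
    """Magnitude of the discrete Fourier transform of h at `bins` frequencies."""
    return [
        abs(sum(hk * cmath.exp(-2j * cmath.pi * b * k / bins) for k, hk in enumerate(h)))
        for b in range(bins)
    ]


# Usage: play a click through an assumed echo path and analyze the recording.
probe = [1.0, 0.0, 0.0, 0.0]
room = [0.5, 0.0, 0.25]          # hypothetical echo path
recorded = convolve(probe, room)
features = {
    "impulse_response": estimate_impulse_response(probe, recorded, taps=3),
    "echo_power": echo_power(recorded),
    "frequency_response": frequency_response(room, bins=4),
}
```

The resulting dictionary plays the role of the saved acoustic feedback loop characteristics in this toy setting.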

In a possible implementation, as shown in FIG. 6, the environment of the mobile terminal may change during loudspeaking or karaoke. The invention can update the acoustic feedback loop characteristics as the environment changes, so that the mobile terminal adapts automatically. That is, when the environment information and position information of the mobile terminal change, where this information includes the attitude, position, and moving speed of the mobile terminal, the echo power, the frequency response, and the environmental sound-mixing impulse response are updated based on the environment information and position information.
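One common way to keep an echo-path estimate current while the terminal moves is a normalized least-mean-squares (NLMS) adaptive filter. The source does not name its tracking algorithm, so NLMS is a stand-in here, and `nlms_cancel`, `mu`, and `eps` are illustrative names. The sketch assumes the signal sent to the loudspeaker is available as a reference and that the microphone signal contains its echo.

```python
from typing import List, Tuple


def nlms_cancel(
    reference: List[float],
    mic: List[float],
    taps: int,
    mu: float = 0.5,
    eps: float = 1e-8,
) -> Tuple[List[float], List[float]]:
    """Track the echo path with NLMS and return (residual, weights)."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        # Predict the echo from the current path estimate.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        # Normalized update: step size scales with the reference energy.
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights
```

Because the weights are updated on every sample, the estimate follows a slowly changing echo path without a separate measurement pass; the residual is what remains of the microphone signal after the predicted echo is subtracted.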

In one possible embodiment, the method further comprises:

The technical solution provided by the invention can digitally amplify the play-out audio, achieving software-based volume enhancement.
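A minimal sketch of software-based amplification is shown below. It applies a gain expressed in decibels and then a tanh soft limiter so that the output stays within full scale. The limiter choice, the function name `amplify`, and the `ceiling` parameter are assumptions for illustration; the source does not describe the amplification algorithm.

```python
import math
from typing import List


def amplify(samples: List[float], gain_db: float, ceiling: float = 1.0) -> List[float]:
    """Apply gain in dB, then soft-limit with tanh to stay within +/- ceiling."""
    gain = 10.0 ** (gain_db / 20.0)  # convert decibels to a linear factor
    return [ceiling * math.tanh(gain * s / ceiling) for s in samples]
```

A soft limiter trades some distortion on loud peaks for a higher average level, which is one way to raise perceived loudness without hard clipping.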

In one possible embodiment, as shown in FIG. 7, processing the audio information so that no ambient sound or crosstalk sound remains and only the human voice is retained includes:

Receiving audio features of a target user. The audio feature may be timbre; since each person's timbre is different, the source of a piece of audio can be identified by its timbre.

Extracting, from the audio information, the audio information corresponding to the audio features of the target user as the voice-only audio information. For example, if the target user is user A, who is holding the mobile terminal, the singing voice (audio information) that A records while singing karaoke is processed, and the audio information matching A's timbre features is extracted. The extracted audio information is then mixed with the accompaniment information to obtain the play-out audio. Finally, sound quality enhancement is applied to obtain a high-quality voice before playback; that is, sound quality enhancement is performed on the voice-only audio information.
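The timbre-based selection can be sketched as a similarity test between per-frame feature vectors and the target user's feature vector, using the "similarity greater than or equal to a preset threshold" notion of matching defined earlier. How timbre features are computed is not specified in the source, so the feature vectors, the cosine similarity measure, and the threshold value below are all assumptions.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def keep_target_frames(
    frames: List[List[float]],
    frame_features: List[List[float]],
    target_feature: List[float],
    threshold: float = 0.8,
) -> List[List[float]]:
    """Keep frames whose feature matches the target user; silence the rest."""
    kept = []
    for frame, feature in zip(frames, frame_features):
        if cosine_similarity(feature, target_feature) >= threshold:
            kept.append(frame)
        else:
            kept.append([0.0] * len(frame))
    return kept
```

Frame-level gating like this is deliberately coarse; it only shows where the target user's audio features enter the decision.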

In one possible embodiment, the method further comprises:

Storing the play-out audio and/or the voice-only audio information. During actual loudspeaking and karaoke there may be a need to record; the play-out audio and/or the voice-only audio information can be stored as required, which makes subsequent retrieval and listening easier.
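As one possible way to store a recording, the sketch below encodes float samples as 16-bit mono PCM in a WAV container using Python's standard `wave` module. The file format, the sample rate, and the function names are assumptions for illustration; the source does not say how the audio is stored.

```python
import io
import struct
import wave
from typing import List


def encode_wav(samples: List[float], sample_rate: int = 44100) -> bytes:
    """Encode float samples in [-1.0, 1.0] as 16-bit mono PCM WAV bytes."""
    pcm = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)      # single channel, like the play-out audio
        wav_file.setsampwidth(2)      # 2 bytes per sample (16-bit)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


def save_recording(path: str, samples: List[float], sample_rate: int = 44100) -> None:
    """Write the encoded WAV bytes to a file at the given path."""
    with open(path, "wb") as out_file:
        out_file.write(encode_wav(samples, sample_rate))
```

Calling `save_recording` once with the play-out audio and once with the voice-only audio would cover both storage options mentioned above.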

The technical solution of the present invention further provides a speaker device based on a single mobile terminal, as shown in fig. 8, including:

In one embodiment, the mixing module is further configured to perform the following steps, including:

In one embodiment, the configuration module is further configured to perform steps comprising:

receiving a feedback audio signal of a space through the microphone;

The readable storage medium may be a computer storage medium or a communication medium. Communication media include any medium that facilitates transfer of a computer program from one place to another. Computer storage media may be any available media that can be accessed by a general-purpose or special-purpose computer. For example, a readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. The readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an application-specific integrated circuit (ASIC). Additionally, the ASIC may reside in user equipment. The processor and the readable storage medium may also reside as discrete components in a communication device. The readable storage medium may be a read-only memory (ROM), a random-access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

The present invention also provides a program product comprising execution instructions stored in a readable storage medium. The at least one processor of the device may read the execution instructions from the readable storage medium, and the execution of the execution instructions by the at least one processor causes the device to implement the methods provided by the various embodiments described above.

In the above embodiments of the terminal or the server, it should be understood that the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), and so on. A general-purpose processor may be a microprocessor or any conventional processor. The steps of a method disclosed in connection with the present invention may be executed directly by a hardware processor, or by a combination of hardware and software modules within the processor.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A loudspeaking method based on a single mobile terminal is characterized by comprising the following steps:

2. The single mobile terminal based loudspeaking method of claim 1,

wherein mixing the voice-only audio information with the pre-stored accompaniment information to generate the play-out audio comprises:

3. The single mobile terminal based loudspeaking method of claim 2,

receiving a feedback audio signal of a space through the microphone;

4. The single mobile terminal based loudspeaking method of claim 1, further comprising:

5. The single mobile terminal based loudspeaking method of claim 1,

wherein processing the audio information so that no ambient sound or crosstalk sound remains and only the human voice is retained comprises:

receiving audio features of a target user;

6. The single mobile terminal based loudspeaking method of claim 1, further comprising:

7. A speaker device based on a single mobile terminal, comprising:

8. The single mobile terminal based speaker device according to claim 1,

the mixing module is further configured to perform the following steps, including:

9. The single mobile terminal based speaker device according to claim 8,

the configuration module is further configured to perform steps comprising:

receiving a feedback audio signal of a space through the microphone;

10. A readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the method of any one of claims 1 to 6.