CN111370012A - Bluetooth voice audio acquisition method and system - Google Patents

Bluetooth voice audio acquisition method and system

Info

Publication number
CN111370012A
Authority
CN
China
Prior art keywords
audio data
amplitude
audio
preset
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010460221.2A
Other languages
Chinese (zh)
Other versions
CN111370012B (en)
Inventor
江德祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to CN202010460221.2A priority Critical patent/CN111370012B/en
Publication of CN111370012A publication Critical patent/CN111370012A/en
Application granted granted Critical
Publication of CN111370012B publication Critical patent/CN111370012B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10KSOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K15/00Acoustics not otherwise provided for
    • G10K15/08Arrangements for producing a reverberation or echo sound
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003Changing voice quality, e.g. pitch or formants
    • G10L21/007Changing voice quality, e.g. pitch or formants characterised by the process used
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003Changing voice quality, e.g. pitch or formants
    • G10L21/007Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/01Correction of time axis

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Telephone Function (AREA)

Abstract

The invention provides a Bluetooth voice audio acquisition method and system. The method is applied to a mobile terminal connected to a multi-channel audio acquisition device and comprises the following steps: simultaneously acquiring a plurality of first audio data through the multi-channel audio acquisition device; processing the plurality of first audio data based on a preset processing rule to obtain second audio data; and sending the second audio data to a voice receiving terminal. The Bluetooth voice audio acquisition method of the invention achieves clear acquisition of the user's voice; the user does not need to care which microphone picks up the sound and thus obtains a better voice operation experience.

Description

Bluetooth voice audio acquisition method and system
Technical Field
The invention relates to the technical field of voice acquisition, in particular to a Bluetooth voice audio acquisition method and system.
Background
At present, when a user connects a mobile terminal such as a mobile phone to a Bluetooth audio peripheral such as a Bluetooth headset or speaker and performs voice operations (dialing a call, WeChat/QQ voice and video, sending WeChat voice messages, Skype voice calls, and so on), the voice acquisition is usually completed automatically by the application together with the phone's audio system. User voice may be collected in one of two ways: the microphone of the audio peripheral (for example the headset) picks up the sound and transmits it to the phone over the BT SCO link, or a microphone on the phone itself picks up the sound. Because the phone microphone is used in some scenarios and the peripheral microphone in others, the user can easily be confused about which microphone is active. Moreover, depending on the scene and on hardware factors of the devices, either the phone microphone (mic) or the headset mic may yield the better voice quality, yet the user cannot choose the better microphone for input. An optimized scheme that collects multi-channel voice input simultaneously can therefore improve both voice acquisition quality and user experience.
In addition, in existing voice acquisition scenarios the two microphone paths, on the mobile phone and on the accessory such as a headset, are generally independent and mutually exclusive, and only one of them can be selected. To obtain high-quality voice data the user must therefore speak close to the headset or close to the mobile phone, and must figure out which microphone is doing the acquisition, which results in a poor user experience.
Disclosure of Invention
One object of the invention is to provide a Bluetooth voice audio acquisition method that achieves clear acquisition of the user's voice; the user does not need to care which microphone picks up the sound and thus obtains a better voice operation experience.
The embodiment of the invention provides a Bluetooth voice audio acquisition method, which is applied to a mobile terminal connected with a multi-channel audio acquisition device and comprises the following steps:
simultaneously acquiring a plurality of first audio data through a multi-channel audio acquisition device;
processing the plurality of first audio data based on a preset processing rule to obtain second audio data;
and sending the second audio data to the voice receiving terminal.
Preferably, the preset processing rule includes:
respectively extracting parameters of the plurality of first audio data to obtain parameters representing the quality of the first audio data;
and comparing the parameters, and taking the first audio data of the best quality among the plurality of first audio data as the second audio data.
Preferably, the parameters include: frequency response, THD+N (total harmonic distortion plus noise), and volume.
Preferably, the preset processing rule further includes:
and carrying out fusion processing on the plurality of first audio data to obtain second audio data.
Preferably, the fusion processing includes one or more of MIX (mixing) and enhancement compensation.
The invention also provides a bluetooth voice audio acquisition system, which is applied to a mobile terminal connected with a multi-channel audio acquisition device and comprises:
the first audio acquisition module is used for acquiring a plurality of first audio data through a multi-channel audio acquisition device at the same time;
the second audio generation module is used for processing the plurality of first audio data based on a preset processing rule to acquire second audio data;
and the audio sending module is used for sending the second audio data to the voice receiving terminal.
Preferably, the second audio generating module includes:
the parameter extraction module is used for respectively extracting parameters of the plurality of first audio data to obtain parameters representing the quality of the first audio data;
and the parameter comparison module is used for comparing the parameters and taking the first audio data of the best quality among the plurality of first audio data as the second audio data.
Preferably, the parameters include: frequency response, THD + N, volume.
Preferably, the second audio generating module further comprises:
and the audio fusion module is used for carrying out fusion processing on the plurality of first audio data to obtain second audio data.
Preferably, the fusion processing includes one or more of MIX (mixing) and enhancement compensation.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
fig. 1 is a schematic diagram of a bluetooth voice audio acquisition method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a conventional voice audio acquisition method;
fig. 3 is a schematic diagram of another conventional voice audio acquisition method.
Detailed Description
The preferred embodiments of the present invention will be described below in conjunction with the accompanying drawings; it should be understood that they are provided only for illustration and explanation and are not intended to limit the invention.
The embodiment of the invention provides a Bluetooth voice audio acquisition method, which is applied to a mobile terminal connected with a multi-channel audio acquisition device and comprises the following steps:
simultaneously acquiring a plurality of first audio data through a multi-channel audio acquisition device;
processing the plurality of first audio data based on a preset processing rule to obtain second audio data;
and sending the second audio data to the voice receiving terminal.
The working principle and the beneficial effects of the technical scheme are as follows:
When a user uses a mobile terminal 1, such as a mobile phone, connected to a Bluetooth audio peripheral (Bluetooth terminal 2) such as a Bluetooth headset or speaker, voice operations such as making a call, WeChat/QQ voice and video, sending WeChat voice messages, or Skype voice calls are often performed to send voice data to a remote receiving terminal 3; these voice acquisition operations are usually completed automatically by the application with the assistance of the phone's audio software and hardware system. Current voice acquisition software designs only support collecting from a single microphone, either on the mobile terminal 1 (as shown in fig. 2) or on the Bluetooth terminal 2 (as shown in fig. 3), and the user is confused because it is unclear whether the voice is collected from the mobile terminal 1 or from the Bluetooth terminal 2. In addition, depending on environmental factors or device hardware, either the mic on the mobile terminal 1 or the mic on the Bluetooth terminal 2 may yield the better voice quality, and the user cannot choose the better microphone for input.
Fig. 1 shows an application scenario with a two-path audio acquisition device. In this scenario, the mobile terminal 1 conducts a voice call with the voice receiving terminal 3, and the Bluetooth terminal 2 is connected to the mobile terminal 1 through a Bluetooth wireless link. When a voice application on the mobile terminal 1 actively or passively triggers a voice operation, such as a telephone call or sending a WeChat voice message, the application sets the audio input channel either to the phone microphone or to Bluetooth BT SCO (input through the Bluetooth microphone) by calling the audio software framework interface of the phone system. The phone microphone is one path of audio acquisition device; the Bluetooth microphone is the other path.
Two-path voice acquisition: the audio system software architecture on the mobile terminal 1 is optimized so that, when audio channel selection is triggered on the mobile terminal 1 and the Bluetooth terminal 2 is currently connected to it, the audio software system of the mobile terminal 1 simultaneously starts a microphone on the mobile terminal 1 (such as phone microphone 1) and establishes a BT SCO link (starting the headset microphone). The two microphones begin voice collection at the same time; in the actual scenario the user may speak close to the Bluetooth terminal 2 or close to the mobile terminal 1. The two collected voice streams converge in the ADSP digital audio processing of the mobile terminal 1, where the audio voice data collected by the two audio nodes are compared and fused. After the fusion processing, a second audio of good quality is finally obtained and transmitted to the receiving terminal 3, so that the remote device obtains a clearer voice signal.
The Bluetooth voice audio acquisition method of the invention realizes clear user voice acquisition; the user does not need to care which microphone collects the sound, and the user obtains better voice operation experience.
In one embodiment, the preset processing rule includes:
respectively extracting parameters of the plurality of first audio data to obtain parameters representing the quality of the first audio data;
and comparing the parameters to obtain the best quality of the plurality of first audio data as second audio data.
Wherein the parameters include: frequency response, THD + N, volume.
The working principle and the beneficial effects of the technical scheme are as follows:
In the application scenario of fig. 1, the ADSP audio fusion policy works as follows: after the two paths of audio data enter the ADSP, a third-party algorithm compares indicators such as the frequency response, THD+N, and volume of the two collected audio paths, directly selects the audio signal with the better parameters, and discards the path with the poorer signal, i.e., a preferential selection is made. Whether the user speaks close to the Bluetooth terminal 2 or close to the mobile terminal 1, the audio that the mobile terminal 1 sends to the receiving terminal 3 is the clearest; the user does not need to figure out which microphone is picking up the sound, low acquisition quality caused by the user guessing wrongly is avoided, and the user experience is improved.
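As a rough illustration of this preferential selection, the following Python sketch scores each captured path and keeps the better one. It is a minimal sketch under stated assumptions: measuring frequency response and THD+N properly requires reference signals, so RMS level (volume) and a crude noise-floor SNR estimate serve as stand-in quality indicators, and the function names and weights are illustrative rather than taken from the patent.

import numpy as np

def channel_quality(x, sr=16000):
    # Rough quality score for one captured path. The patent compares frequency
    # response, THD+N and volume; as stand-ins this sketch uses RMS level and an
    # SNR estimate whose noise floor is taken from the quietest 10 ms frames.
    x = x.astype(np.float64)
    rms = np.sqrt(np.mean(x ** 2) + 1e-12)
    frame = sr // 100                       # 10 ms frames
    n = (len(x) // frame) * frame
    frame_rms = np.sqrt(np.mean(x[:n].reshape(-1, frame) ** 2, axis=1) + 1e-12)
    noise = np.percentile(frame_rms, 10)
    snr_db = 20 * np.log10(rms / (noise + 1e-12))
    return 0.5 * snr_db + 0.5 * 20 * np.log10(rms + 1e-12)

def select_best_channel(phone_mic, bt_mic, sr=16000):
    # Keep whichever first-audio path scores higher; the other path is discarded.
    if channel_quality(phone_mic, sr) >= channel_quality(bt_mic, sr):
        return phone_mic
    return bt_mic

In practice the comparison would run frame by frame inside the ADSP rather than once over a whole recording, but the selection logic is the same.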
In one embodiment, the preset processing rule further includes:
and carrying out fusion processing on the plurality of first audio data to obtain second audio data.
The fusion processing includes one or more of MIX (mixing) and enhancement compensation.
The working principle and the beneficial effects of the technical scheme are as follows:
In the application scenario of fig. 1, the ADSP audio fusion policy works as follows: after the two paths of audio data enter the ADSP, a third-party algorithm compares indicators such as the frequency response, THD+N, and volume of the two collected audio paths, and MIX mixing and enhancement compensation are performed on the basis of the two signals; fusing the two paths makes the acquisitions complement each other and reduces the probability of audio signal distortion, after which the processed signal is output. Thus, whether the user speaks close to the Bluetooth terminal 2 or close to the mobile terminal 1, the audio that the mobile terminal 1 sends to the receiving terminal 3 presents the user's voice at its best; the user does not need to figure out which microphone is picking up the sound, low acquisition quality caused by the user guessing wrongly is avoided, and the user experience is improved.
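As an equally rough illustration of MIX mixing with enhancement compensation, the sketch below mixes the two (already aligned) paths and applies a compensating gain toward a target level. The equal weights, target RMS and gain cap are assumptions for illustration, not values from the patent.

import numpy as np

def mix_with_compensation(a, b, target_rms=0.1):
    # Equal-weight MIX of the two first-audio paths, followed by an
    # enhancement-compensation gain that restores the level to a target RMS.
    n = min(len(a), len(b))                          # assumes time alignment
    mixed = 0.5 * (a[:n].astype(np.float64) + b[:n].astype(np.float64))
    rms = np.sqrt(np.mean(mixed ** 2) + 1e-12)
    gain = min(target_rms / rms, 10.0)               # cap the compensation gain
    return np.clip(mixed * gain, -1.0, 1.0)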
In one embodiment, performing fusion processing on a plurality of first audio data to obtain second audio data includes the following operations:
step S1: performing time domain alignment operation on a plurality of first audio data;
step S2: calculating an amplitude sequence of each of the first audio data; the amplitude sequence includes: an amplitude value of each frame of the first audio data;
step S3: calculating an effective value corresponding to each amplitude value; the calculation formula is given in the original publication only as an image and is not reproduced here. In the formula, the result is the effective value corresponding to the i-th amplitude value in the amplitude sequence, and the two remaining quantities are a preset maximum standard amplitude and a preset minimum standard amplitude, respectively;
step S4: calculating a confidence value of the first audio data based on the effective value corresponding to each amplitude value in the amplitude sequence; the calculation formula is likewise given only as an image, in which N represents the number of amplitude values in the amplitude sequence. When the confidence value is smaller than or equal to a preset value, the corresponding first audio data is discarded;
step S5: when only one first audio data remains after the processing of step S4, performing amplitude enhancement on that first audio data so that the amplitude values in its amplitude sequence meet a preset requirement, and using the result as the second audio data; the preset requirement is that the number of amplitude values in the amplitude sequence reaching a preset amplitude intensity reaches a set number;
step S6: when more than one first audio data remains after the processing of step S4, performing amplitude enhancement on each first audio data so that the amplitude values in each amplitude sequence meet the preset requirement, averaging the amplitude values of the amplitude sequences to form a new amplitude sequence, and using the new first audio data corresponding to the new amplitude sequence as the second audio data.
The working principle and the beneficial effects of the technical scheme are as follows:
When processing the first audio data, it must be considered that the audio acquisition devices that produced them are different, are mounted at different positions, and are at different distances from the position where the user speaks, so the audio data they directly acquire differ in the time domain. The first audio data are therefore first aligned in the time domain; after alignment their validity is verified, and first audio data acquired by an audio acquisition device too far from the user's speaking position are removed. The remaining first audio data are then fused to obtain better second audio data; furthermore, the manner of fusion is not limited to fusion in amplitude.
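The sketch below illustrates steps S2 to S6 under stated assumptions. Because the effective-value and confidence formulas appear only as images in the publication, the confidence here is assumed to be the fraction of frame amplitudes falling inside the preset standard-amplitude range; the frame size, thresholds and target levels are illustrative.

import numpy as np

def amplitude_sequence(x, frame=160):
    # Step S2: per-frame amplitude (peak absolute value) of one first audio.
    n = (len(x) // frame) * frame
    return np.abs(x[:n].reshape(-1, frame)).max(axis=1)

def confidence(amps, a_min=0.01, a_max=0.9):
    # Steps S3-S4 (assumed stand-in): fraction of frame amplitudes lying inside
    # the preset [a_min, a_max] standard-amplitude range.
    return float(((amps >= a_min) & (amps <= a_max)).mean()) if len(amps) else 0.0

def fuse(first_audios, conf_threshold=0.5, target_peak=0.5, frame=160):
    # Steps S4-S6: discard low-confidence channels, amplitude-enhance the rest,
    # then average them into the second audio data.
    kept = []
    for x in first_audios:
        if confidence(amplitude_sequence(x, frame)) > conf_threshold:
            peak = np.abs(x).max() + 1e-12
            kept.append(x * min(target_peak / peak, 10.0))   # amplitude enhancement
    if not kept:
        return None
    n = min(len(k) for k in kept)                            # assumes time alignment
    return np.mean([np.asarray(k[:n], dtype=np.float64) for k in kept], axis=0)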
In one embodiment, step S1: performing time domain alignment operation on a plurality of first audio data; the method specifically comprises the following steps:
step S11: sampling the first audio data based on a first preset time to obtain a plurality of pieces of first short-time energy data;
step S12: respectively calculating short-time energy values of the first short-time energy data, and arranging the short-time energy values according to a sampling sequence to form a short-time energy sequence;
step S13: comparing the sum of the short-time energy values in the short-time energy sequences of the first audio data, and taking the short-time energy sequence corresponding to the first audio data with the largest sum as a standard short-time energy sequence;
step S14: acquiring the length of an alignment label, and performing translation extraction in the standard short-time energy sequence based on the length of the alignment label to obtain a plurality of short-time energy labels;
step S15: comparing the energy sum value of each short-time energy label, and taking the short-time energy label with the maximum sum value as an alignment label;
step S16: performing translation detection with the alignment label over each first audio data based on the first preset time, and taking the position in the first audio data with the highest matching degree with the alignment label as the alignment position;
step S17: based on the alignment position determined for each piece of first audio data, performing the alignment operation on each piece of first audio data in the time domain.
The working principle and the beneficial effects of the technical scheme are as follows:
Using the short-time energy sequence as the alignment standard makes the alignment operation accurate; the alignment label is taken from the portion of the audio data where the energy is most concentrated, so it is distinctive as a marker. The alignment steps of this embodiment ensure the accuracy of the subsequent generation of the second audio data, and the mobile terminal achieves clear acquisition of the user's voice. The position in the first audio data with the highest matching degree with the alignment label is specifically the position at which the ratios between the short-time energy values of the first audio data are closest to the ratios between the short-time energy values of the alignment label.
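The sketch below follows steps S11 to S16 under stated assumptions: a fixed frame length stands in for the first preset time, the label length is fixed, and normalised correlation of the short-time energies is used as the matching degree in place of the patent's ratio-closeness criterion.

import numpy as np

def short_time_energy(x, frame=160):
    # Steps S11-S12: short-time energy sequence of one first audio.
    n = (len(x) // frame) * frame
    frames = x[:n].reshape(-1, frame).astype(np.float64)
    return (frames ** 2).sum(axis=1)

def pick_alignment_label(energies, label_len=20):
    # Steps S13-S15: the channel with the largest total energy gives the standard
    # sequence; the label_len-frame window with the largest energy sum is the label.
    std = max(energies, key=np.sum)
    sums = [std[i:i + label_len].sum() for i in range(len(std) - label_len + 1)]
    start = int(np.argmax(sums))
    return std[start:start + label_len]

def alignment_offsets(energies, label):
    # Step S16: slide the label over each energy sequence and keep the offset with
    # the highest matching degree (normalised correlation used as an assumption).
    lab = label / (np.linalg.norm(label) + 1e-12)
    offsets = []
    for e in energies:
        scores = [float(e[i:i + len(label)] @ lab) /
                  (np.linalg.norm(e[i:i + len(label)]) + 1e-12)
                  for i in range(len(e) - len(label) + 1)]
        offsets.append(int(np.argmax(scores)))
    return offsets

Shifting each first audio by (offset - min(offsets)) frames then performs the time-domain alignment of step S17.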
In one embodiment, step S1: performing time domain alignment operation on a plurality of first audio data; the method specifically comprises the following steps:
step S11: sampling the first audio data based on a first preset time to obtain a plurality of pieces of first short-time energy data;
step S12: respectively calculating short-time energy values of the first short-time energy data, and arranging the short-time energy values according to a sampling sequence to form a short-time energy sequence;
step S13: comparing the sum of the short-time energy values in the short-time energy sequences of the first audio data, and taking the short-time energy sequence corresponding to the first audio data with the largest sum as a standard short-time energy sequence;
step S14: acquiring the length of an alignment label, and performing translation extraction in the standard short-time energy sequence based on the length of the alignment label to obtain a plurality of short-time energy labels;
step S15: comparing the energy sum value of each short-time energy label, and taking the short-time energy label with the maximum sum value as an alignment label;
step S16: performing translation detection with the alignment label over each first audio data based on the first preset time, and taking the position in the first audio data with the highest matching degree with the alignment label as a matching position;
step S16A: acquiring the matched audio data corresponding to the matching position, and acquiring the label audio data corresponding to the alignment label;
step S16B: intercepting and discarding data of a second preset time at the front end of the matched audio data;
step S16C: sampling the remaining matched audio data (after the front-end data has been discarded) based on the first preset time to obtain a plurality of pieces of second short-time energy data; the second preset time is 1/M of the first preset time;
step S16D: respectively calculating short-time energy values of the second short-time energy data, and arranging according to a sampling sequence to form a short-time energy sequence;
step S16E: continuing to intercept and discard data of the second preset time at the front end of the matched audio data, and repeating steps S16C to S16D until M short-time energy sequences are obtained, discarding the last energy value in each short-time energy sequence obtained before the M-th one;
step S16F: discarding the first and last short-time energy values in the short-time energy sequence corresponding to the alignment label to obtain a second standard short-time energy sequence;
step S16G: comparing the matching degrees between the second standard short-time energy sequence and each of the M short-time energy sequences, and taking the position corresponding to the highest matching degree as the alignment position;
step S17: based on the alignment position determined for each piece of first audio data, performing the alignment operation on each piece of first audio data in the time domain.
The working principle and the beneficial effects of the technical scheme are as follows:
By sampling the matching position and the alignment label again, a more accurate alignment position is obtained; this ensures the accuracy of the subsequent generation of the second audio data, and the mobile terminal achieves clear acquisition of the user's voice.
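A possible reading of steps S16A to S16G is sketched below: the matched audio is shifted forward in sub-frame steps of one M-th of a frame, the short-time energy sequence is recomputed for each shift, and the shift whose energies best match the label's energy sequence with its first and last values dropped is kept. This is an assumed interpretation for illustration only.

import numpy as np

def refine_alignment(matched_audio, label_energy, frame=160, M=4):
    # Sub-frame refinement: try shifts of k * (frame // M) samples, k = 1..M,
    # and keep the shift whose energy sequence best matches the trimmed label.
    sub = frame // M
    ref = label_energy[1:-1]                         # second standard sequence (S16F)
    ref = ref / (np.linalg.norm(ref) + 1e-12)
    best_shift, best_score = 0, -np.inf
    for k in range(1, M + 1):
        x = matched_audio[k * sub:]
        n = (len(x) // frame) * frame
        e = (x[:n].reshape(-1, frame).astype(np.float64) ** 2).sum(axis=1)[:len(ref)]
        if len(e) < len(ref):
            continue
        score = float(e @ ref) / (np.linalg.norm(e) + 1e-12)
        if score > best_score:
            best_shift, best_score = k * sub, score
    return best_shift                                # samples to trim for finer alignment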
The invention also provides a bluetooth voice audio acquisition system, which is applied to a mobile terminal connected with a multi-channel audio acquisition device and comprises:
the first audio acquisition module is used for acquiring a plurality of first audio data through a multi-channel audio acquisition device at the same time;
the second audio generation module is used for processing the plurality of first audio data based on a preset processing rule to acquire second audio data;
and the audio sending module is used for sending the second audio data to the voice receiving terminal.
The working principle and the beneficial effects of the technical scheme are as follows:
When a user uses a mobile terminal 1, such as a mobile phone, connected to a Bluetooth audio peripheral (Bluetooth terminal 2) such as a Bluetooth headset or speaker, voice operations such as making a call, WeChat/QQ voice and video, sending WeChat voice messages, or Skype voice calls are often performed to send voice data to a remote receiving terminal 3; these voice acquisition operations are usually completed automatically by the application with the assistance of the phone's audio software and hardware system. Current voice acquisition software designs only support collecting from a single microphone, either on the mobile terminal 1 (as shown in fig. 2) or on the Bluetooth terminal 2 (as shown in fig. 3), and the user is confused because it is unclear whether the voice is collected from the mobile terminal 1 or from the Bluetooth terminal 2. In addition, depending on environmental factors or device hardware, either the mic on the mobile terminal 1 or the mic on the Bluetooth terminal 2 may yield the better voice quality, and the user cannot choose the better microphone for input.
Fig. 1 shows an application scenario with a two-path audio acquisition device. In this scenario, the mobile terminal 1 conducts a voice call with the voice receiving terminal 3, and the Bluetooth terminal 2 is connected to the mobile terminal 1 through a Bluetooth wireless link. When a voice application on the mobile terminal 1 actively or passively triggers a voice operation, such as a telephone call or sending a WeChat voice message, the application sets the audio input channel either to the phone microphone or to Bluetooth BT SCO (input through the Bluetooth microphone) by calling the audio software framework interface of the phone system. The phone microphone is one path of audio acquisition device; the Bluetooth microphone is the other path.
Two-path voice acquisition: the audio system software architecture on the mobile terminal 1 is optimized so that, when audio channel selection is triggered on the mobile terminal 1 and the Bluetooth terminal 2 is currently connected to it, the audio software system of the mobile terminal 1 simultaneously starts a microphone on the mobile terminal 1 (such as phone microphone 1) and establishes a BT SCO link (starting the headset microphone). The two microphones begin voice collection at the same time; in the actual scenario the user may speak close to the Bluetooth terminal 2 or close to the mobile terminal 1. The two collected voice streams converge in the ADSP digital audio processing of the mobile terminal 1, where the audio voice data collected by the two audio nodes are compared and fused. After the fusion processing, a second audio of good quality is finally obtained and transmitted to the receiving terminal 3, so that the remote device obtains a clearer voice signal.
The Bluetooth voice audio acquisition system realizes clear voice acquisition of a user; the user does not need to care which microphone collects the sound, and the user obtains better voice operation experience.
In one embodiment, the second audio generation module comprises:
the parameter extraction module is used for respectively extracting parameters of the plurality of first audio data to obtain parameters representing the quality of the first audio data;
and the parameter comparison module is used for comparing the parameters and taking the first audio data of the best quality among the plurality of first audio data as the second audio data.
Wherein the parameters include: frequency response, THD + N, volume.
The working principle and the beneficial effects of the technical scheme are as follows:
In the application scenario of fig. 1, the ADSP audio fusion policy works as follows: after the two paths of audio data enter the ADSP, a third-party algorithm compares indicators such as the frequency response, THD+N, and volume of the two collected audio paths, directly selects the audio signal with the better parameters, and discards the path with the poorer signal, i.e., a preferential selection is made. Whether the user speaks close to the Bluetooth terminal 2 or close to the mobile terminal 1, the audio that the mobile terminal 1 sends to the receiving terminal 3 is the clearest; the user does not need to figure out which microphone is picking up the sound, low acquisition quality caused by the user guessing wrongly is avoided, and the user experience is improved.
In one embodiment, the second audio generation module further comprises:
and the audio fusion module is used for carrying out fusion processing on the plurality of first audio data to obtain second audio data.
The fusion processing includes one or more of MIX (mixing) and enhancement compensation.
The working principle and the beneficial effects of the technical scheme are as follows:
In the application scenario of fig. 1, the ADSP audio fusion policy works as follows: after the two paths of audio data enter the ADSP, a third-party algorithm compares indicators such as the frequency response, THD+N, and volume of the two collected audio paths, and MIX mixing and enhancement compensation are performed on the basis of the two signals; fusing the two paths makes the acquisitions complement each other and reduces the probability of audio signal distortion, after which the processed signal is output. Thus, whether the user speaks close to the Bluetooth terminal 2 or close to the mobile terminal 1, the audio that the mobile terminal 1 sends to the receiving terminal 3 presents the user's voice at its best; the user does not need to figure out which microphone is picking up the sound, low acquisition quality caused by the user guessing wrongly is avoided, and the user experience is improved.
In one embodiment, the second audio generation module performs operations comprising:
step S1: performing time domain alignment operation on a plurality of first audio data;
step S2: calculating an amplitude sequence of each of the first audio data; the amplitude sequence includes: an amplitude value of each frame of the first audio data;
step S3: calculating an effective value corresponding to each amplitude value; the calculation formula is given in the original publication only as an image and is not reproduced here. In the formula, the result is the effective value corresponding to the i-th amplitude value in the amplitude sequence, and the two remaining quantities are a preset maximum standard amplitude and a preset minimum standard amplitude, respectively;
step S4: calculating a confidence value of the first audio data based on the effective value corresponding to each amplitude value in the amplitude sequence; the calculation formula is likewise given only as an image, in which N represents the number of amplitude values in the amplitude sequence. When the confidence value is smaller than or equal to a preset value, the corresponding first audio data is discarded;
step S5: when only one first audio data remains after the processing of step S4, performing amplitude enhancement on that first audio data so that the amplitude values in its amplitude sequence meet a preset requirement, and using the result as the second audio data; the preset requirement is that the number of amplitude values in the amplitude sequence reaching a preset amplitude intensity reaches a set number;
step S6: when more than one first audio data remains after the processing of step S4, performing amplitude enhancement on each first audio data so that the amplitude values in each amplitude sequence meet the preset requirement, averaging the amplitude values of the amplitude sequences to form a new amplitude sequence, and using the new first audio data corresponding to the new amplitude sequence as the second audio data.
The working principle and the beneficial effects of the technical scheme are as follows:
When processing the first audio data, it must be considered that the audio acquisition devices that produced them are different, are mounted at different positions, and are at different distances from the position where the user speaks, so the audio data they directly acquire differ in the time domain. The first audio data are therefore first aligned in the time domain; after alignment their validity is verified, and first audio data acquired by an audio acquisition device too far from the user's speaking position are removed. The remaining first audio data are then fused to obtain better second audio data; furthermore, the manner of fusion is not limited to fusion in amplitude.
In one embodiment, step S1: performing time domain alignment operation on a plurality of first audio data; the method specifically comprises the following steps:
step S11: sampling the first audio data based on a first preset time to obtain a plurality of pieces of first short-time energy data;
step S12: respectively calculating short-time energy values of the first short-time energy data, and arranging the short-time energy values according to a sampling sequence to form a short-time energy sequence;
step S13: comparing the sum of the short-time energy values in the short-time energy sequences of the first audio data, and taking the short-time energy sequence corresponding to the first audio data with the largest sum as a standard short-time energy sequence;
step S14: acquiring the length of an alignment label, and performing translation extraction in the standard short-time energy sequence based on the length of the alignment label to obtain a plurality of short-time energy labels;
step S15: comparing the energy sum value of each short-time energy label, and taking the short-time energy label with the maximum sum value as an alignment label;
step S16: performing translation detection with the alignment label over each first audio data based on the first preset time, and taking the position in the first audio data with the highest matching degree with the alignment label as the alignment position;
step S17: based on the alignment position determined for each piece of first audio data, performing the alignment operation on each piece of first audio data in the time domain.
The working principle and the beneficial effects of the technical scheme are as follows:
Using the short-time energy sequence as the alignment standard makes the alignment operation accurate; the alignment label is taken from the portion of the audio data where the energy is most concentrated, so it is distinctive as a marker. The alignment steps of this embodiment ensure the accuracy of the subsequent generation of the second audio data, and the mobile terminal achieves clear acquisition of the user's voice. The position in the first audio data with the highest matching degree with the alignment label is specifically the position at which the ratios between the short-time energy values of the first audio data are closest to the ratios between the short-time energy values of the alignment label.
In one embodiment, step S1: performing time domain alignment operation on a plurality of first audio data; the method specifically comprises the following steps:
step S11: sampling the first audio data based on a first preset time to obtain a plurality of pieces of first short-time energy data;
step S12: respectively calculating short-time energy values of the first short-time energy data, and arranging the short-time energy values according to a sampling sequence to form a short-time energy sequence;
step S13: comparing the sum of the short-time energy values in the short-time energy sequences of the first audio data, and taking the short-time energy sequence corresponding to the first audio data with the largest sum as a standard short-time energy sequence;
step S14: acquiring the length of an alignment label, and performing translation extraction in the standard short-time energy sequence based on the length of the alignment label to obtain a plurality of short-time energy labels;
step S15: comparing the energy sum value of each short-time energy label, and taking the short-time energy label with the maximum sum value as an alignment label;
step S16: performing translation detection with the alignment label over each first audio data based on the first preset time, and taking the position in the first audio data with the highest matching degree with the alignment label as a matching position;
step S16A: acquiring the matched audio data corresponding to the matching position, and acquiring the label audio data corresponding to the alignment label;
step S16B: intercepting and discarding data of a second preset time at the front end of the matched audio data;
step S16C: sampling the remaining matched audio data (after the front-end data has been discarded) based on the first preset time to obtain a plurality of pieces of second short-time energy data; the second preset time is 1/M of the first preset time;
step S16D: respectively calculating short-time energy values of the second short-time energy data, and arranging according to a sampling sequence to form a short-time energy sequence;
step S16E: continuing to intercept and discard data of the second preset time at the front end of the matched audio data, and repeating steps S16C to S16D until M short-time energy sequences are obtained, discarding the last energy value in each short-time energy sequence obtained before the M-th one;
step S16F: discarding the first and last short-time energy values in the short-time energy sequence corresponding to the alignment label to obtain a second standard short-time energy sequence;
step S16G: comparing the matching degrees between the second standard short-time energy sequence and each of the M short-time energy sequences, and taking the position corresponding to the highest matching degree as the alignment position;
step S17: based on the alignment position determined for each piece of first audio data, performing the alignment operation on each piece of first audio data in the time domain.
The working principle and the beneficial effects of the technical scheme are as follows:
By sampling the matching position and the alignment label again, a more accurate alignment position is obtained; this ensures the accuracy of the subsequent generation of the second audio data, and the mobile terminal achieves clear acquisition of the user's voice.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. A Bluetooth voice audio acquisition method, applied to a mobile terminal connected with a multi-channel audio acquisition device, characterized by comprising the following steps:
simultaneously acquiring a plurality of first audio data through a multi-channel audio acquisition device;
processing the plurality of first audio data based on a preset processing rule to obtain second audio data;
sending the second audio data to a voice receiving terminal;
the preset processing rule comprises:
performing fusion processing on the plurality of first audio data to obtain second audio data;
the method for fusing the plurality of first audio data to obtain the second audio data comprises the following operations:
step S1: performing time domain alignment operation on a plurality of first audio data;
step S2: calculating an amplitude sequence of each of the first audio data; the amplitude sequence includes: an amplitude value of each frame of the first audio data;
step S3: calculating an effective value corresponding to each amplitude value; the calculation formula is given in the original publication only as an image and is not reproduced here. In the formula, the result is the effective value corresponding to the i-th amplitude value in the amplitude sequence, and the two remaining quantities are a preset maximum standard amplitude and a preset minimum standard amplitude, respectively;
step S4: calculating a confidence value of the first audio data based on the effective value corresponding to each amplitude value in the amplitude sequence; the calculation formula is likewise given only as an image, in which N represents the number of amplitude values in the amplitude sequence. When the confidence value is smaller than or equal to a preset value, the corresponding first audio data is discarded;
step S5: when only one first audio data remains after the processing of step S4, performing amplitude enhancement on that first audio data so that the amplitude values in its amplitude sequence meet a preset requirement, and using the result as the second audio data; the preset requirement is that the number of amplitude values in the amplitude sequence reaching a preset amplitude intensity reaches a set number;
step S6: when more than one first audio data remains after the processing of step S4, performing amplitude enhancement on each first audio data so that the amplitude values in each amplitude sequence meet the preset requirement, averaging the amplitude values of the amplitude sequences to form a new amplitude sequence, and using the new first audio data corresponding to the new amplitude sequence as the second audio data.
2. The bluetooth voice audio acquisition method according to claim 1, wherein the preset processing rule comprises:
performing parameter extraction on a plurality of first audio data respectively to obtain the parameters representing the quality of the first audio data;
and comparing the parameters, and taking the first audio data of the best quality among the plurality of first audio data as the second audio data.
3. The Bluetooth voice audio acquisition method according to claim 2, characterized in that the parameters comprise: frequency response, THD+N, and volume.
4. The Bluetooth voice audio acquisition method according to claim 1, characterized in that the fusion processing comprises one or more of MIX (mixing) and enhancement compensation.
5. A Bluetooth voice audio acquisition system, characterized in that it is applied to a mobile terminal connected with a multi-channel audio acquisition device and comprises:
the first audio acquisition module is used for acquiring a plurality of first audio data through a multi-channel audio acquisition device at the same time;
the second audio generation module is used for processing the plurality of first audio data based on a preset processing rule to acquire second audio data;
the audio sending module is used for sending the second audio data to a voice receiving terminal;
the second audio generation module further comprises:
the audio fusion module is used for carrying out fusion processing on the plurality of first audio data to obtain second audio data;
the second audio generation module performs operations comprising:
step S1: performing time domain alignment operation on a plurality of first audio data;
step S2: calculating an amplitude sequence of each of the first audio data; the amplitude sequence includes: an amplitude value of each frame of the first audio data;
step S3: calculating an effective value corresponding to each amplitude value; the calculation formula is given in the original publication only as an image and is not reproduced here. In the formula, the result is the effective value corresponding to the i-th amplitude value in the amplitude sequence, and the two remaining quantities are a preset maximum standard amplitude and a preset minimum standard amplitude, respectively;
step S4: calculating a confidence value of the first audio data based on the effective value corresponding to each amplitude value in the amplitude sequence; the calculation formula is likewise given only as an image, in which N represents the number of amplitude values in the amplitude sequence. When the confidence value is smaller than or equal to a preset value, the corresponding first audio data is discarded;
step S5: when only one first audio data remains after the processing of step S4, performing amplitude enhancement on that first audio data so that the amplitude values in its amplitude sequence meet a preset requirement, and using the result as the second audio data; the preset requirement is that the number of amplitude values in the amplitude sequence reaching a preset amplitude intensity reaches a set number;
step S6: when more than one first audio data remains after the processing of step S4, performing amplitude enhancement on each first audio data so that the amplitude values in each amplitude sequence meet the preset requirement, averaging the amplitude values of the amplitude sequences to form a new amplitude sequence, and using the new first audio data corresponding to the new amplitude sequence as the second audio data.
6. The bluetooth voice audio capture system of claim 5, wherein the second audio generation module comprises:
a parameter extraction module, configured to perform parameter extraction on the plurality of first audio data, respectively, to obtain the parameter indicating the quality of the first audio data;
and a parameter comparison module, configured to compare the parameters and take the first audio data of the best quality among the plurality of first audio data as the second audio data.
7. The Bluetooth voice audio acquisition system according to claim 6, characterized in that the parameters comprise: frequency response, THD+N, and volume.
8. The Bluetooth voice audio acquisition system according to claim 5, characterized in that the fusion processing comprises one or more of MIX (mixing) and enhancement compensation.
CN202010460221.2A 2020-05-27 2020-05-27 Bluetooth voice audio acquisition method and system Active CN111370012B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010460221.2A CN111370012B (en) 2020-05-27 2020-05-27 Bluetooth voice audio acquisition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010460221.2A CN111370012B (en) 2020-05-27 2020-05-27 Bluetooth voice audio acquisition method and system

Publications (2)

Publication Number Publication Date
CN111370012A true CN111370012A (en) 2020-07-03
CN111370012B (en) 2020-09-08

Family

ID=71211035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010460221.2A Active CN111370012B (en) 2020-05-27 2020-05-27 Bluetooth voice audio acquisition method and system

Country Status (1)

Country Link
CN (1) CN111370012B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106027755A (en) * 2016-04-28 2016-10-12 努比亚技术有限公司 Audio control method and terminal
WO2018148315A1 (en) * 2017-02-07 2018-08-16 Lutron Electronics Co., Inc. Audio-based load control system
CN108573699A (en) * 2017-03-13 2018-09-25 陈新 Voice sharing recognition methods
CN108597498A (en) * 2018-04-10 2018-09-28 广州势必可赢网络科技有限公司 A kind of multi-microphone voice acquisition method and device
CN108737615A (en) * 2018-06-27 2018-11-02 努比亚技术有限公司 microphone reception method, mobile terminal and computer readable storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111816201A (en) * 2020-08-07 2020-10-23 联想(北京)有限公司 Electronic equipment and voice signal processing method
CN111816201B (en) * 2020-08-07 2024-05-28 联想(北京)有限公司 Electronic equipment and voice signal processing method
WO2022262262A1 (en) * 2021-06-16 2022-12-22 荣耀终端有限公司 Method for sound pick-up by terminal device by means of bluetooth peripheral, and terminal device
CN117319291A (en) * 2023-11-27 2023-12-29 深圳市海威恒泰智能科技有限公司 Low-delay network audio transmission method and system
CN117319291B (en) * 2023-11-27 2024-03-01 深圳市海威恒泰智能科技有限公司 Low-delay network audio transmission method and system

Also Published As

Publication number Publication date
CN111370012B (en) 2020-09-08

Similar Documents

Publication Publication Date Title
CN105162950B (en) Mobile terminal and method for switching microphones in call
US9406296B2 (en) Two way automatic universal transcription telephone
CN111370012B (en) Bluetooth voice audio acquisition method and system
JP5740145B2 (en) Apparatus and method for recognizing earphone wearing in portable terminal
KR20090033318A (en) Apparatus having mobile terminal as input/output device of computer and related system and method
WO2015154282A1 (en) Call device and switching method and device applied thereto
KR100793299B1 (en) Apparatus and method for storing/calling telephone number in a mobile station
CN103260124A (en) Audio testing method and system of mobile terminal
WO2015131743A1 (en) Incoming call processing method, device and terminal
RU2015156799A (en) SYSTEM AND METHOD FOR CREATING A WIRELESS TUBE FOR STATIONARY PHONES USING A HOME GATEWAY AND A SMARTPHONE
EP1638306A1 (en) The system and method implementing network telephon communication by applying the instant messenger
CN107277208B (en) Communication method, first communication device and terminal
CN203747882U (en) Test device used for efficiently detecting audio performance of IP telephone
CN101345938A (en) Mobile phone terminal television receiver based on broadcast network and its application method
CN104883450A (en) Communication device and communication method for enhancing voice reception capacity
CN2847716Y (en) Telephone set with traditional telephone and network telephone function
CN111190568A (en) Volume adjusting method and device
CN206993215U (en) First communicator and terminal
CN108551514A (en) A kind of telephone device of complete acoustic control
CN210380947U (en) Distributed communication device
CN106101860A (en) A kind of family based on the Internet communication device
CN103516865A (en) Photographing system and photographing method
CN103595951A (en) Audio frequency input state processing method, sending end equipment and receiving end equipment
CN107436747B (en) Terminal application program control method and device, storage medium and electronic equipment
CN112822591A (en) Call data transmission method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant