CN111370012A - Bluetooth voice audio acquisition method and system - Google Patents
- Publication number: CN111370012A
- Application number: CN202010460221.2A
- Authority
- CN
- China
- Prior art keywords
- audio data
- amplitude
- audio
- preset
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10K—SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
- G10K15/00—Acoustics not otherwise provided for
- G10K15/08—Arrangements for producing a reverberation or echo sound
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/01—Correction of time axis
Abstract
The invention provides a Bluetooth voice audio acquisition method and system. The method is applied to a mobile terminal connected to a multi-channel audio acquisition device and comprises the following steps: simultaneously acquiring a plurality of first audio data through the multi-channel audio acquisition device; processing the plurality of first audio data according to a preset processing rule to obtain second audio data; and sending the second audio data to a voice receiving terminal. The Bluetooth voice audio acquisition method of the invention achieves clear acquisition of the user's voice: the user does not need to care which microphone picks up the sound, and obtains a better voice operation experience.
Description
Technical Field
The invention relates to the technical field of voice acquisition, in particular to a Bluetooth voice audio acquisition method and system.
Background
At present, when a user connects a mobile terminal such as a mobile phone to a Bluetooth audio peripheral such as a Bluetooth headset or speaker and performs voice operations, such as dialing a call, WeChat/QQ voice and video, sending WeChat voice messages, or Skype voice calls, the voice acquisition is usually completed automatically by the application together with the phone's audio system. User voice data may be collected in one of two ways: the sound is picked up by the microphone of an audio peripheral such as a headset and transmitted to the phone over a BT SCO (Bluetooth Synchronous Connection-Oriented) link; or the sound is picked up by a microphone built into the phone. Because different application scenarios switch between the phone microphone and the peripheral microphone, the user can be confused about which microphone is active. Moreover, depending on the scenario or the hardware of the devices, either the phone microphone (mic) or the headset microphone may deliver the better voice quality, yet the user cannot choose the better microphone for input. An optimized scheme that acquires multiple voice inputs simultaneously can therefore improve both voice acquisition quality and user experience.
In addition, in existing voice acquisition scenarios the two microphone paths, one on the mobile phone and one on an accessory such as a headset, are generally independent and incompatible, and only one path can be selected at a time. To obtain high-quality voice data, the user must therefore speak close to the headset or close to the phone, and must work out which microphone is capturing the voice, which results in a poor user experience.
Disclosure of Invention
One purpose of the invention is to provide a Bluetooth voice audio acquisition method that achieves clear acquisition of the user's voice: the user does not need to care which microphone picks up the sound, and obtains a better voice operation experience.
An embodiment of the invention provides a Bluetooth voice audio acquisition method, applied to a mobile terminal connected to a multi-channel audio acquisition device and comprising the following steps:
simultaneously acquiring a plurality of first audio data through a multi-channel audio acquisition device;
processing the plurality of first audio data based on a preset processing rule to obtain second audio data;
and sending the second audio data to the voice receiving terminal.
Preferably, the preset processing rule includes:
respectively extracting parameters from the plurality of first audio data to obtain parameters characterizing the quality of each first audio data;
and comparing the parameters and taking the first audio data with the best quality among the plurality of first audio data as the second audio data.
Preferably, the parameters include: frequency response, THD+N (total harmonic distortion plus noise), and volume.
Preferably, the preset processing rule further includes:
and carrying out fusion processing on the plurality of first audio data to obtain second audio data.
Preferably, the fusion processing method includes one or a combination of: MIX mixing and enhancement compensation.
The invention also provides a Bluetooth voice audio acquisition system, applied to a mobile terminal connected to a multi-channel audio acquisition device and comprising:
the first audio acquisition module is used for acquiring a plurality of first audio data through a multi-channel audio acquisition device at the same time;
the second audio generation module is used for processing the plurality of first audio data based on a preset processing rule to acquire second audio data;
and the audio sending module is used for sending the second audio data to the voice receiving terminal.
Preferably, the second audio generating module includes:
the parameter extraction module is used for respectively extracting parameters from the plurality of first audio data to obtain parameters characterizing the quality of each first audio data;
and the parameter comparison module is used for comparing the parameters and taking the first audio data with the best quality among the plurality of first audio data as the second audio data.
Preferably, the parameters include: frequency response, THD+N, and volume.
Preferably, the second audio generating module further comprises:
and the audio fusion module is used for carrying out fusion processing on the plurality of first audio data to obtain second audio data.
Preferably, the fusion processing method includes one or a combination of: MIX mixing and enhancement compensation.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
fig. 1 is a schematic diagram of a bluetooth voice audio acquisition method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a conventional voice audio acquisition method;
fig. 3 is a schematic diagram of another conventional voice audio acquisition method.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
An embodiment of the invention provides a Bluetooth voice audio acquisition method, applied to a mobile terminal connected to a multi-channel audio acquisition device and comprising the following steps:
simultaneously acquiring a plurality of first audio data through a multi-channel audio acquisition device;
processing the plurality of first audio data based on a preset processing rule to obtain second audio data;
and sending the second audio data to the voice receiving terminal.
The working principle and the beneficial effects of the technical scheme are as follows:
When a user connects a mobile terminal 1 such as a mobile phone to a Bluetooth audio peripheral (Bluetooth terminal 2) such as a Bluetooth headset or speaker, voice operations such as making calls, WeChat/QQ voice and video, sending WeChat voice messages, and Skype voice calls are often performed to send voice data to a remote receiving terminal 3, and the voice acquisition is usually completed automatically by the application together with the phone's audio hardware and software. Current voice acquisition software is designed to collect from only one microphone, either on the mobile terminal 1 (as shown in fig. 2) or on the Bluetooth terminal 2 (as shown in fig. 3), and the user is confused because it is unclear which of the two is collecting the voice. Moreover, depending on environmental factors or device hardware, the microphone on either the mobile terminal 1 or the Bluetooth terminal 2 may capture the better voice quality, yet the user cannot choose the better microphone for input.
Fig. 1 shows an application scenario with a two-way audio acquisition device. In this scenario, the mobile terminal 1 conducts a voice call with the voice receiving terminal 3, and the Bluetooth terminal 2 is connected to the mobile terminal 1 over a Bluetooth wireless link. When a voice application on the mobile terminal 1 actively or passively triggers a voice operation, such as a telephone call or a WeChat voice message, the application sets the audio input channel either to the phone microphone or to Bluetooth BT SCO (input through the Bluetooth microphone) by calling the audio software framework interface of the phone system. The phone microphone is one audio acquisition path; the Bluetooth microphone is the other.
Two-path voice acquisition: the audio system software architecture on the mobile terminal 1 is optimized so that, when audio channel selection is triggered on the mobile terminal 1 and the Bluetooth terminal 2 is currently connected, the audio software system of the mobile terminal 1 simultaneously starts a microphone on the mobile terminal 1 (such as phone microphone 1) and establishes a BT SCO link (starting the headset microphone). Both microphones then begin voice collection at the same time; in practice the user may speak close to the Bluetooth terminal 2 or close to the mobile terminal 1. The two voice streams converge in the ADSP (audio digital signal processor) of the mobile terminal 1, where the audio data collected by the two audio nodes are compared and fused. The fusion yields second audio data of good quality, which is transmitted to the receiving terminal 3 so that the far-end device obtains a clearer voice signal.
The Bluetooth voice audio acquisition method of the invention achieves clear acquisition of the user's voice; the user does not need to care which microphone picks up the sound, and obtains a better voice operation experience.
In one embodiment, the preset processing rule includes:
respectively extracting parameters from the plurality of first audio data to obtain parameters characterizing the quality of each first audio data;
and comparing the parameters and taking the first audio data with the best quality among the plurality of first audio data as the second audio data.
Wherein the parameters include: frequency response, THD+N, and volume.
The working principle and the beneficial effects of the technical scheme are as follows:
In the application scenario of fig. 1, the ADSP audio fusion policy is as follows: after the two paths of audio data enter the ADSP, a third-party algorithm compares indexes such as frequency response, THD+N, and volume of the two collected streams, directly selects the audio signal with the better parameters, and discards the path with the poorer signal. Whether the user speaks close to the Bluetooth terminal 2 or close to the mobile terminal 1, the audio that the mobile terminal 1 sends to the receiving terminal 3 is then the clearest available; the user does not need to work out which microphone is collecting the sound, the low acquisition quality that would result from the user guessing wrongly is avoided, and the user experience improves.
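As a rough illustration of this preferred-selection branch, the following Python sketch scores each captured stream with simple proxies and keeps the best one. The metrics used here (RMS level for volume, and a dominant-bin energy ratio standing in for the THD+N/frequency-response comparison) are assumptions for illustration only; the patent leaves the third-party comparison algorithm unspecified.

```python
import numpy as np

def channel_quality(x):
    """Crude per-channel quality proxies: RMS level for volume, and the
    fraction of spectral energy in the dominant bin as a stand-in for
    the unspecified THD+N / frequency-response comparison."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))                    # volume proxy
    spec = np.abs(np.fft.rfft(x))
    peak = int(np.argmax(spec))
    total = max(np.sum(spec ** 2), 1e-12)
    purity = spec[peak] ** 2 / total                  # 1.0 = pure tone
    return rms, purity

def select_best_channel(channels):
    """Pick the first-audio-data stream with the best combined score;
    the losing path is simply discarded, as in the selection policy."""
    scores = [sum(channel_quality(c)) for c in channels]
    return channels[int(np.argmax(scores))]
```

A clean capture scores high on both proxies, so a heavily noise-contaminated path is dropped even if its raw level is comparable.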
In one embodiment, the preset processing rule further includes:
and carrying out fusion processing on the plurality of first audio data to obtain second audio data.
The fusion processing method includes one or a combination of: MIX mixing and enhancement compensation.
The working principle and the beneficial effects of the technical scheme are as follows:
In the application scenario of fig. 1, the ADSP audio fusion policy is as follows: after the two paths of audio data enter the ADSP, a third-party algorithm compares indexes such as frequency response, THD+N, and volume of the two collected streams, and then performs MIX mixing and enhancement compensation on the basis of both signals. Fusing the two signals makes the two acquisitions complementary and reduces the probability of audio distortion, and the processed signal is then output. Thus, whether the user speaks close to the Bluetooth terminal 2 or close to the mobile terminal 1, the audio sent by the mobile terminal 1 to the receiving terminal 3 renders the user's voice at its best; the user does not need to work out which microphone is collecting the sound, the low acquisition quality that would result from the user guessing wrongly is avoided, and the user experience improves.
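The MIX-plus-compensation branch can be sketched as follows. The equal-weight mix, the RMS-based gain compensation, and the target level are illustrative assumptions, since the patent names the operations (MIX, enhancement, compensation) without fixing their algorithms.

```python
import numpy as np

def mix_and_compensate(a, b, target_rms=0.1):
    """MIX two time-aligned capture channels, then apply a simple gain
    'enhancement compensation' toward an assumed target RMS level."""
    n = min(len(a), len(b))
    mixed = 0.5 * (np.asarray(a[:n], float) + np.asarray(b[:n], float))
    rms = np.sqrt(np.mean(mixed ** 2))
    if rms > 0:
        mixed *= target_rms / rms          # gain compensation
    return np.clip(mixed, -1.0, 1.0)       # guard against overflow
```

Because both channels contribute, a dropout or local distortion on one path is partially covered by the other, which is the complementarity the fusion aims for.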
In one embodiment, performing fusion processing on a plurality of first audio data to obtain second audio data includes the following operations:
step S1: performing time domain alignment operation on a plurality of first audio data;
step S2: calculating an amplitude sequence of each of the first audio data; the amplitude sequence includes: an amplitude value of each frame of the first audio data;
step S3: calculating an effective value corresponding to each amplitude value; the calculation formula may be expressed as:
v_i = min(max((a_i - A_min) / (A_max - A_min), 0), 1)
wherein v_i denotes the effective value corresponding to the i-th amplitude value a_i in the amplitude sequence, and A_max and A_min are the preset maximum standard amplitude and the preset minimum standard amplitude, respectively;
step S4: calculating a confidence value c of the first audio data based on the effective value corresponding to each amplitude value in the amplitude sequence; the calculation formula may be expressed as:
c = (1/N) * (v_1 + v_2 + ... + v_N)
wherein N denotes the number of amplitude values in the amplitude sequence; when the confidence value c is smaller than or equal to a preset value, the corresponding first audio data is discarded;
step S5: when only one first audio data remains after the processing of step S4, performing amplitude enhancement on that first audio data so that the amplitude values in its amplitude sequence meet a preset requirement, and taking the result as the second audio data; the preset requirement includes: a set number of amplitude values in the amplitude sequence reaching a preset amplitude intensity;
step S6: when more than one first audio data remains after the processing of step S4, performing amplitude enhancement on each first audio data so that the amplitude values in its amplitude sequence meet the preset requirement, averaging the amplitude sequences position by position to form a new amplitude sequence, and taking the new first audio data corresponding to the new amplitude sequence as the second audio data.
The working principle and the beneficial effects of the technical scheme are as follows:
Note the following when processing the first audio data: each first audio data comes from a different audio acquisition device, and the devices are installed at different positions and at different distances from where the user speaks, so the directly acquired audio data differ in the time domain. The first audio data are therefore first aligned in the time domain; after alignment their validity is verified, and first audio data acquired by devices too far from the user's speaking position are discarded. The remaining first audio data are fused to obtain better second audio data. Furthermore, the manner of fusion is not limited to fusion in amplitude.
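Steps S2 through S6 above can be sketched in Python as follows. The patent does not spell out the effective-value and confidence formulas in this text, so the sketch assumes a clipped min-max normalisation against the preset standard amplitudes (step S3) and a mean over effective values for the confidence (step S4); the frame length, preset amplitudes, and confidence threshold are all illustrative, and time-domain alignment (step S1) is assumed already done.

```python
import numpy as np

FRAME = 160  # assumed frame length (10 ms at 16 kHz)

def amplitude_sequence(x, frame=FRAME):
    """Step S2: per-frame peak amplitude of one first audio data."""
    x = np.abs(np.asarray(x, float))
    n = len(x) // frame
    return x[:n * frame].reshape(n, frame).max(axis=1)

def effective_values(amps, a_min=0.01, a_max=0.9):
    """Step S3 (assumed form): normalise each amplitude against the
    preset min/max standard amplitudes and clip to [0, 1]."""
    return np.clip((amps - a_min) / (a_max - a_min), 0.0, 1.0)

def confidence(amps, **kw):
    """Step S4 (assumed form): mean effective value over the N frames."""
    return float(np.mean(effective_values(amps, **kw)))

def fuse(channels, threshold=0.1):
    """Steps S4-S6: discard low-confidence channels, then average the
    survivors sample-wise (the amplitude-enhancement step is omitted)."""
    seqs = [amplitude_sequence(c) for c in channels]
    kept = [c for c, s in zip(channels, seqs) if confidence(s) > threshold]
    if not kept:
        raise ValueError("all channels fell below the confidence threshold")
    if len(kept) == 1:
        return kept[0]
    n = min(map(len, kept))
    return np.mean([np.asarray(k[:n], float) for k in kept], axis=0)
```

A channel captured far from the speaker has uniformly small frame amplitudes, a near-zero confidence, and is discarded before fusion, matching the validity check described above.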
In one embodiment, step S1: performing time domain alignment operation on a plurality of first audio data; the method specifically comprises the following steps:
step S11: sampling the first audio data based on a first preset time to obtain a plurality of first short-time energy data;
step S12: respectively calculating short-time energy values of the first short-time energy data, and arranging the short-time energy values according to a sampling sequence to form a short-time energy sequence;
step S13: comparing the sum of the short-time energy values in the short-time energy sequences of the first audio data, and taking the short-time energy sequence corresponding to the first audio data with the largest sum as a standard short-time energy sequence;
step S14: acquiring the length of an alignment label, and performing translation extraction in the standard short-time energy sequence based on the length of the alignment label to obtain a plurality of short-time energy labels;
step S15: comparing the energy sum value of each short-time energy label, and taking the short-time energy label with the maximum sum value as an alignment label;
step S16: performing translation detection on the alignment tag in each first audio data based on first preset time, and determining a position, with the highest matching degree with the alignment tag, in the first audio data as an alignment position;
step S17: determining the alignment position of each of the first audio data, and performing the alignment operation on each of the first audio data in the time domain.
The working principle and the beneficial effects of the technical scheme are as follows:
The short-time energy sequence serves as the alignment standard, which makes the alignment operation accurate; the alignment tag is taken from the part of the audio data where the energy is most concentrated, which gives it a distinctive signature. The alignment step of this embodiment guarantees the accuracy of the subsequent second audio data generation, so that the mobile terminal achieves clear acquisition of the user's voice. The position in the first audio data with the highest matching degree to the alignment tag is, specifically, the position at which the ratios between the short-time energy values of the first audio data are closest to the ratios between the short-time energy values of the alignment tag.
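A minimal sketch of this short-time-energy alignment (steps S11 to S16): frame energies are computed per "first preset time" window, the highest-energy segment of the standard sequence becomes the alignment tag, and the tag is slid over another channel's energy sequence to find the best match. The window length, tag length, and the L2 distance between normalised energy profiles (standing in for the "closeness of ratios" matching degree) are assumptions.

```python
import numpy as np

WIN = 160  # assumed samples per "first preset time" window

def short_time_energy(x, win=WIN):
    """Steps S11-S12: frame the signal and sum squared samples."""
    x = np.asarray(x, float)
    n = len(x) // win
    return (x[:n * win].reshape(n, win) ** 2).sum(axis=1)

def alignment_tag(energies, tag_len=4):
    """Steps S14-S15: slide a tag_len window over the standard energy
    sequence and keep the segment with the largest energy sum."""
    sums = [energies[i:i + tag_len].sum()
            for i in range(len(energies) - tag_len + 1)]
    start = int(np.argmax(sums))
    return start, energies[start:start + tag_len]

def best_match(energies, tag):
    """Step S16: position whose normalised energy profile is closest
    to the tag's (L2 distance as an assumed 'matching degree')."""
    tag = tag / max(tag.sum(), 1e-12)
    best, best_d = 0, np.inf
    for i in range(len(energies) - len(tag) + 1):
        seg = energies[i:i + len(tag)]
        seg = seg / max(seg.sum(), 1e-12)
        d = np.linalg.norm(seg - tag)
        if d < best_d:
            best, best_d = i, d
    return best
```

The frame offset between the tag's position in the standard channel and the best match in another channel gives the time-domain shift to remove.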
In one embodiment, step S1: performing time domain alignment operation on a plurality of first audio data; the method specifically comprises the following steps:
step S11: sampling the first audio data based on a first preset time to obtain a plurality of first short-time energy data;
step S12: respectively calculating short-time energy values of the first short-time energy data, and arranging the short-time energy values according to a sampling sequence to form a short-time energy sequence;
step S13: comparing the sum of the short-time energy values in the short-time energy sequences of the first audio data, and taking the short-time energy sequence corresponding to the first audio data with the largest sum as a standard short-time energy sequence;
step S14: acquiring the length of an alignment label, and performing translation extraction in the standard short-time energy sequence based on the length of the alignment label to obtain a plurality of short-time energy labels;
step S15: comparing the energy sum value of each short-time energy label, and taking the short-time energy label with the maximum sum value as an alignment label;
step S16: performing translation detection on the alignment tag in each first audio data based on the first preset time, and determining a position, which is the highest in matching degree with the alignment tag, in the first audio data as a matching position;
step S16A: acquiring matched audio data corresponding to the matched position, and acquiring label audio data corresponding to the aligned label;
step S16B: intercepting data of a second preset time at the front end of the matched audio data and discarding it, the second preset time being 1/M of the first preset time;
step S16C: sampling the remaining matched audio data based on the first preset time to obtain a plurality of second short-time energy data;
step S16D: respectively calculating short-time energy values of the second short-time energy data and arranging them in sampling order to form a short-time energy sequence;
step S16E: continuing to intercept and discard data of the second preset time at the front end of the matched audio data, and repeating steps S16C to S16D until M short-time energy sequences are obtained; the last energy value of each short-time energy sequence obtained before the M-th is discarded;
step S16F: discarding the first and last short-time energy values in the short-time energy sequence corresponding to the alignment tag to obtain a second standard short-time energy sequence;
step S16G: comparing the matching degree between the second standard short-time energy sequence and each of the M short-time energy sequences, and taking the position corresponding to the highest matching degree as the alignment position;
step S17: determining the alignment position of each of the first audio data, and performing the alignment operation on each of the first audio data in the time domain.
The working principle and the beneficial effects of the technical scheme are as follows:
By sampling again around the matching position and the alignment tag, a more accurate alignment position is obtained; this guarantees the accuracy of the subsequent second audio data generation, so that the mobile terminal achieves clear acquisition of the user's voice.
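The sub-frame refinement of steps S16A to S16G can be sketched as below: the matched audio is shifted forward in steps of win/M samples, its short-time energy sequence is recomputed at each shift, and the shift whose sequence is closest to the tag's energy sequence with its first and last values discarded (step S16F) wins. The window length, M, and the L2 distance used for "matching degree" are assumed parameters.

```python
import numpy as np

def refine_alignment(matched, tag_audio, win=160, m=4):
    """Refine a coarse match to win/m resolution (steps S16A-S16G,
    simplified): try m sub-frame shifts of the matched audio and keep
    the one whose energy sequence best fits the trimmed tag sequence."""
    tag = np.asarray(tag_audio, float)
    n_tag = len(tag) // win
    tag_e = (tag[:n_tag * win].reshape(n_tag, win) ** 2).sum(axis=1)
    ref = tag_e[1:-1]                      # second standard sequence (16F)
    step = win // m
    best_k, best_d = 0, np.inf
    for k in range(m):
        seg = np.asarray(matched, float)[k * step:]
        n = len(seg) // win
        e = (seg[:n * win].reshape(n, win) ** 2).sum(axis=1)
        cand = e[1:1 + len(ref)]           # drop the leading frame too
        if len(cand) < len(ref):
            continue
        d = np.linalg.norm(cand - ref)
        if d < best_d:
            best_k, best_d = k, d
    return best_k * step                   # refined extra sample offset
```

Trimming the tag's edge frames matters because those frames straddle the true boundary and change energy as the shift varies, while the interior frames are stable references.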
The invention also provides a Bluetooth voice audio acquisition system, applied to a mobile terminal connected to a multi-channel audio acquisition device and comprising:
the first audio acquisition module is used for acquiring a plurality of first audio data through a multi-channel audio acquisition device at the same time;
the second audio generation module is used for processing the plurality of first audio data based on a preset processing rule to acquire second audio data;
and the audio sending module is used for sending the second audio data to the voice receiving terminal.
The working principle and the beneficial effects of the technical scheme are as follows:
When a user connects a mobile terminal 1 such as a mobile phone to a Bluetooth audio peripheral (Bluetooth terminal 2) such as a Bluetooth headset or speaker, voice operations such as making calls, WeChat/QQ voice and video, sending WeChat voice messages, and Skype voice calls are often performed to send voice data to a remote receiving terminal 3, and the voice acquisition is usually completed automatically by the application together with the phone's audio hardware and software. Current voice acquisition software is designed to collect from only one microphone, either on the mobile terminal 1 (as shown in fig. 2) or on the Bluetooth terminal 2 (as shown in fig. 3), and the user is confused because it is unclear which of the two is collecting the voice. Moreover, depending on environmental factors or device hardware, the microphone on either the mobile terminal 1 or the Bluetooth terminal 2 may capture the better voice quality, yet the user cannot choose the better microphone for input.
Fig. 1 shows an application scenario with a two-way audio acquisition device. In this scenario, the mobile terminal 1 conducts a voice call with the voice receiving terminal 3, and the Bluetooth terminal 2 is connected to the mobile terminal 1 over a Bluetooth wireless link. When a voice application on the mobile terminal 1 actively or passively triggers a voice operation, such as a telephone call or a WeChat voice message, the application sets the audio input channel either to the phone microphone or to Bluetooth BT SCO (input through the Bluetooth microphone) by calling the audio software framework interface of the phone system. The phone microphone is one audio acquisition path; the Bluetooth microphone is the other.
Two-path voice acquisition: the audio system software architecture on the mobile terminal 1 is optimized so that, when audio channel selection is triggered on the mobile terminal 1 and the Bluetooth terminal 2 is currently connected, the audio software system of the mobile terminal 1 simultaneously starts a microphone on the mobile terminal 1 (such as phone microphone 1) and establishes a BT SCO link (starting the headset microphone). Both microphones then begin voice collection at the same time; in practice the user may speak close to the Bluetooth terminal 2 or close to the mobile terminal 1. The two voice streams converge in the ADSP (audio digital signal processor) of the mobile terminal 1, where the audio data collected by the two audio nodes are compared and fused. The fusion yields second audio data of good quality, which is transmitted to the receiving terminal 3 so that the far-end device obtains a clearer voice signal.
The Bluetooth voice audio acquisition system of the invention achieves clear acquisition of the user's voice; the user does not need to care which microphone picks up the sound, and obtains a better voice operation experience.
In one embodiment, the second audio generation module comprises:
the parameter extraction module is used for respectively extracting parameters from the plurality of first audio data to obtain parameters characterizing the quality of each first audio data;
and the parameter comparison module is used for comparing the parameters and taking the first audio data with the best quality among the plurality of first audio data as the second audio data.
Wherein the parameters include: frequency response, THD+N, and volume.
The working principle and the beneficial effects of the technical scheme are as follows:
In the application scenario of fig. 1, the ADSP audio fusion policy is as follows: after the two paths of audio data enter the ADSP, a third-party algorithm compares indexes such as frequency response, THD+N, and volume of the two collected streams, directly selects the audio signal with the better parameters, and discards the path with the poorer signal. Whether the user speaks close to the Bluetooth terminal 2 or close to the mobile terminal 1, the audio that the mobile terminal 1 sends to the receiving terminal 3 is then the clearest available; the user does not need to work out which microphone is collecting the sound, the low acquisition quality that would result from the user guessing wrongly is avoided, and the user experience improves.
In one embodiment, the second audio generation module further comprises:
and the audio fusion module is used for carrying out fusion processing on the plurality of first audio data to obtain second audio data.
The fusion processing method comprises one or more of: MIX (mixing) and enhancement compensation.
The working principle and the beneficial effects of the technical scheme are as follows:
in the application scenario of fig. 1, the ADSP audio fusion policy is as follows: after the two paths of audio data are input into the ADSP, a third-party algorithm compares indexes such as frequency response, THD + N and volume of the two collected paths, and performs MIX mixing and enhancement compensation on the basis of the two signals; fusing the two signals makes the two acquisitions complementary, reduces the probability of audio signal distortion, and the processed signal is then output. Therefore, whether the user speaks close to the Bluetooth terminal 2 or close to the mobile terminal 1, the audio sent by the mobile terminal 1 to the receiving terminal 3 presents the user's voice at its best; the user does not need to work out which microphone is picking up the sound, situations in which a wrong guess leads to low acquisition quality are avoided, and the user experience is improved.
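A minimal sketch of the MIX-plus-enhancement-compensation path, assuming a fixed equal-weight mix and peak-level restoration as the compensation step (both are assumptions; the patent text does not specify the weights or the compensation formula):

```python
def mix_fuse(a, b, w=0.5):
    """MIX the two time-aligned channels sample by sample, then apply a
    simple gain 'enhancement compensation' restoring the peak level of
    the louder input. Equal weights and peak restoration are
    illustrative assumptions, not taken from the patent text."""
    n = min(len(a), len(b))
    if n == 0:
        return []
    mixed = [w * a[i] + (1.0 - w) * b[i] for i in range(n)]
    peak = max(abs(s) for s in mixed)
    target = max(max(abs(s) for s in a[:n]), max(abs(s) for s in b[:n]))
    if peak > 0.0:
        gain = target / peak
        mixed = [s * gain for s in mixed]
    return mixed
```

With unequal weights, `w` could instead be derived from the per-channel quality indexes compared above.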
In one embodiment, the second audio generation module performs operations comprising:
step S1: performing time domain alignment operation on a plurality of first audio data;
step S2: calculating an amplitude sequence of each of the first audio data; the amplitude sequence includes: an amplitude value of each frame of the first audio data;
step S3: calculating an effective value corresponding to the amplitude value, wherein the calculation formula is as follows:
wherein e_i denotes the effective value corresponding to the i-th amplitude value in the amplitude sequence, and A_max and A_min are respectively a preset maximum standard amplitude and a preset minimum standard amplitude;
step S4: calculating a confidence value P of the first audio data based on the effective value corresponding to each amplitude value in the amplitude sequence, wherein the calculation formula is as follows:
wherein N represents the number of amplitude values in the amplitude sequence; when the confidence value P is smaller than or equal to a preset value, the corresponding first audio data is discarded;
step S5: when only one piece of first audio data remains after step S4, performing amplitude enhancement on it so that the amplitude values in its amplitude sequence meet a preset requirement, and using the result as the second audio data; the preset requirement is that the number of amplitude values in the amplitude sequence reaching a preset amplitude intensity reaches a set number;
step S6: when more than one piece of first audio data remains after step S4, performing amplitude enhancement on each so that the amplitude values in its amplitude sequence meet the preset requirement, averaging the amplitude sequences position by position to form a new amplitude sequence, and using the new first audio data corresponding to the new amplitude sequence as the second audio data.
The working principle and the beneficial effects of the technical scheme are as follows:
While processing the first audio data, note that each piece of first audio data comes from a different audio acquisition device; because the devices are set at different positions and at different distances from where the user speaks, the directly acquired audio data differ in the time domain. The first audio data are therefore aligned in the time domain, their validity is verified after alignment, and first audio data acquired by devices too far from the user's speaking position are removed; the remaining first audio data are fused to obtain better second audio data. Furthermore, the manner of fusion is not limited to fusion in amplitude.
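The amplitude-based fusion of steps S2-S6 can be sketched as below. The formula images for the effective value and the confidence value do not survive in the text, so the [0, 1] normalisation between the preset standard amplitudes, the mean-of-effective-values confidence, and the constants `A_MAX`/`A_MIN` are all assumptions:

```python
import math

A_MAX, A_MIN = 0.9, 0.05  # preset max/min standard amplitudes (assumed values)

def amplitude_sequence(samples, frame=160):
    """Step S2: one amplitude value per frame (peak absolute sample)."""
    n = len(samples) // frame * frame
    return [max(abs(s) for s in samples[i:i + frame]) for i in range(0, n, frame)]

def effective_value(a):
    """Step S3 sketch: assume the effective value normalises the
    amplitude into [0, 1] between the preset standard amplitudes."""
    return min(max((a - A_MIN) / (A_MAX - A_MIN), 0.0), 1.0)

def confidence(amps):
    """Step S4: assumed to be the mean of the N effective values."""
    return sum(effective_value(a) for a in amps) / len(amps) if amps else 0.0

def fuse(channels, threshold=0.1, frame=160):
    """Steps S4-S6: discard low-confidence channels, 'amplitude-enhance'
    the survivors (here: peak normalisation), then average sample by sample."""
    kept = [c for c in channels
            if confidence(amplitude_sequence(c, frame)) > threshold]
    if not kept:
        return None
    n = min(len(c) for c in kept)
    enhanced = []
    for c in kept:
        peak = max(abs(s) for s in c[:n]) or 1.0
        enhanced.append([s / peak for s in c[:n]])
    return [sum(ch[i] for ch in enhanced) / len(enhanced) for i in range(n)]
```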
In one embodiment, step S1: performing time domain alignment operation on a plurality of first audio data; the method specifically comprises the following steps:
step S11: sampling the first audio data based on a first preset time to obtain a plurality of pieces of first short-time energy data;
step S12: respectively calculating short-time energy values of the first short-time energy data, and arranging the short-time energy values according to a sampling sequence to form a short-time energy sequence;
step S13: comparing the sum of the short-time energy values in the short-time energy sequences of the first audio data, and taking the short-time energy sequence corresponding to the first audio data with the largest sum as a standard short-time energy sequence;
step S14: acquiring an alignment tag length, and performing sliding (translation) extraction in the standard short-time energy sequence based on the alignment tag length to obtain a plurality of short-time energy tags;
step S15: comparing the energy sums of the short-time energy tags, and taking the short-time energy tag with the maximum sum as the alignment tag;
step S16: performing sliding (translation) detection with the alignment tag in each piece of first audio data based on the first preset time, and determining the position in the first audio data with the highest matching degree with the alignment tag as the alignment position;
step S17: performing the alignment operation on the pieces of first audio data in the time domain based on the alignment position determined for each piece.
The working principle and the beneficial effects of the technical scheme are as follows:
Using the short-time energy sequence as the alignment standard makes the alignment operation accurate, and the alignment tag takes the part of the audio data where energy is most concentrated, so it is distinctive as a marker. Through the alignment steps of this embodiment, the accuracy of the subsequent second audio data generation is ensured, and the mobile terminal achieves clear acquisition of the user's voice. The position in the first audio data with the highest matching degree with the alignment tag is specifically the position at which the ratios among the short-time energy values in the corresponding segment of the first audio data are closest to the ratios among the short-time energy values in the alignment tag.
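A sketch of the short-time-energy alignment of steps S11-S16. The "matching degree" is implemented here as closeness of the energy ratios inside the window to those inside the alignment tag, which is one reading of the explanation above; all names and the tag length are illustrative.

```python
def energy_seq(samples, frame=160):
    """Steps S11-S12: frame the signal and compute short-time energy."""
    n = len(samples) // frame * frame
    return [sum(s * s for s in samples[i:i + frame]) for i in range(0, n, frame)]

def pick_alignment_tag(std_seq, tag_len=2):
    """Steps S14-S15: slide a tag_len window over the standard sequence
    and keep the window with the greatest energy sum, plus its index."""
    best_i, best_sum = 0, float("-inf")
    for i in range(len(std_seq) - tag_len + 1):
        s = sum(std_seq[i:i + tag_len])
        if s > best_sum:
            best_i, best_sum = i, s
    return std_seq[best_i:best_i + tag_len], best_i

def match_position(seq, tag):
    """Step S16 sketch: smallest absolute difference between the
    window's normalised energy ratios and the tag's (an assumption;
    the patent gives no formula for 'matching degree')."""
    L = len(tag)
    tag_total = sum(tag) or 1e-12
    tag_n = [v / tag_total for v in tag]
    best_i, best_d = 0, float("inf")
    for i in range(len(seq) - L + 1):
        w = seq[i:i + L]
        w_total = sum(w) or 1e-12
        d = sum(abs(w[j] / w_total - tag_n[j]) for j in range(L))
        if d < best_d:
            best_i, best_d = i, d
    return best_i
```

The frame offset `match_position(...) - tag_index` gives the relative shift used to align the two streams in the time domain (step S17).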
In one embodiment, step S1: performing time domain alignment operation on a plurality of first audio data; the method specifically comprises the following steps:
step S11: sampling the first audio data based on a first preset time to obtain a plurality of pieces of first short-time energy data;
step S12: respectively calculating short-time energy values of the first short-time energy data, and arranging the short-time energy values according to a sampling sequence to form a short-time energy sequence;
step S13: comparing the sum of the short-time energy values in the short-time energy sequences of the first audio data, and taking the short-time energy sequence corresponding to the first audio data with the largest sum as a standard short-time energy sequence;
step S14: acquiring an alignment tag length, and performing sliding (translation) extraction in the standard short-time energy sequence based on the alignment tag length to obtain a plurality of short-time energy tags;
step S15: comparing the energy sums of the short-time energy tags, and taking the short-time energy tag with the maximum sum as the alignment tag;
step S16: performing sliding (translation) detection with the alignment tag in each piece of first audio data based on the first preset time, and determining the position in the first audio data with the highest matching degree with the alignment tag as the matching position;
step S16A: acquiring the matched audio data corresponding to the matching position, and acquiring the tag audio data corresponding to the alignment tag;
step S16B: intercepting and discarding data of a second preset time at the front end of the matched audio data;
step S16C: then sampling the matched audio data remaining after the discarding based on the first preset time to obtain a plurality of pieces of second short-time energy data; the second preset time is one M-th of the first preset time;
step S16D: respectively calculating the short-time energy values of the second short-time energy data, and arranging them in sampling order to form a short-time energy sequence;
step S16E: continuing to intercept and discard data of the second preset time at the front end of the matched audio data, and repeating steps S16C to S16D until M short-time energy sequences are obtained; for each short-time energy sequence obtained before the M-th, the last energy value is discarded;
step S16F: discarding the first and last short-time energy values in the short-time energy sequence corresponding to the alignment tag to obtain a second standard short-time energy sequence;
step S16G: comparing the matching degrees between the second standard short-time energy sequence and each of the M short-time energy sequences, and taking the position corresponding to the highest matching degree as the alignment position;
step S17: performing the alignment operation on the pieces of first audio data in the time domain based on the alignment position determined for each piece.
The working principle and the beneficial effects of the technical scheme are as follows:
By sampling again around the matching position and comparing against the trimmed alignment tag, a more accurate alignment position is obtained; this ensures the accuracy of the subsequent second audio data generation, and the mobile terminal achieves clear acquisition of the user's voice.
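The sub-frame refinement of steps S16A-S16G can be approximated as below. For brevity this sketch compares raw samples by mean squared difference rather than rebuilding the M truncated short-time energy sequences, so it illustrates the search granularity (frame/M shifts) rather than the exact procedure; all names are illustrative.

```python
def refine_start(samples, tag_samples, coarse_start, frame=160, M=4):
    """Around the coarse match position, try sub-frame shifts of
    frame/M samples and keep the start whose segment differs least
    from the tag. The MSE criterion is an assumption standing in for
    the patent's truncated-energy-sequence comparison."""
    step = max(frame // M, 1)
    L = len(tag_samples)
    best_start, best_d = coarse_start, float("inf")
    for m in range(M + 1):
        s = coarse_start + m * step
        if s + L > len(samples):
            break
        d = sum((samples[s + j] - tag_samples[j]) ** 2 for j in range(L)) / L
        if d < best_d:
            best_start, best_d = s, d
    return best_start
```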
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (8)
1. A Bluetooth voice audio acquisition method, applied to a mobile terminal connected with a multi-channel audio acquisition device, and comprising the following steps:
simultaneously acquiring a plurality of first audio data through a multi-channel audio acquisition device;
processing the plurality of first audio data based on a preset processing rule to obtain second audio data;
sending the second audio data to a voice receiving terminal;
the preset processing rule comprises:
performing fusion processing on the plurality of first audio data to obtain second audio data;
the method for fusing the plurality of first audio data to obtain the second audio data comprises the following operations:
step S1: performing time domain alignment operation on a plurality of first audio data;
step S2: calculating an amplitude sequence of each of the first audio data; the amplitude sequence includes: an amplitude value of each frame of the first audio data;
step S3: calculating an effective value corresponding to the amplitude value, wherein the calculation formula is as follows:
wherein e_i denotes the effective value corresponding to the i-th amplitude value in the amplitude sequence, and A_max and A_min are respectively a preset maximum standard amplitude and a preset minimum standard amplitude;
step S4: calculating a confidence value P of the first audio data based on the effective value corresponding to each amplitude value in the amplitude sequence, wherein the calculation formula is as follows:
wherein N represents the number of amplitude values in the amplitude sequence; when the confidence value P is smaller than or equal to a preset value, the corresponding first audio data is discarded;
step S5: when only one piece of first audio data remains after step S4, performing amplitude enhancement on it so that the amplitude values in its amplitude sequence meet a preset requirement, and using the result as the second audio data; the preset requirement is that the number of amplitude values in the amplitude sequence reaching a preset amplitude intensity reaches a set number;
step S6: when more than one piece of first audio data remains after step S4, performing amplitude enhancement on each so that the amplitude values in its amplitude sequence meet the preset requirement, averaging the amplitude sequences position by position to form a new amplitude sequence, and using the new first audio data corresponding to the new amplitude sequence as the second audio data.
2. The bluetooth voice audio acquisition method according to claim 1, wherein the preset processing rule comprises:
performing parameter extraction on a plurality of first audio data respectively to obtain the parameters representing the quality of the first audio data;
and comparing the parameters, and selecting the first audio data with the best quality among the plurality of first audio data as the second audio data.
3. The bluetooth voice audio capture method of claim 2, characterized in that the parameters comprise: frequency response, THD + N, volume.
4. The Bluetooth voice audio acquisition method according to claim 1, wherein the fusion processing method comprises one or more of: MIX (mixing) and enhancement compensation.
5. A Bluetooth voice audio acquisition system, characterized in that it is applied to a mobile terminal connected with a multi-channel audio acquisition device and comprises:
the first audio acquisition module is used for acquiring a plurality of first audio data through a multi-channel audio acquisition device at the same time;
the second audio generation module is used for processing the plurality of first audio data based on a preset processing rule to acquire second audio data;
the audio sending module is used for sending the second audio data to a voice receiving terminal;
the second audio generation module further comprises:
the audio fusion module is used for carrying out fusion processing on the plurality of first audio data to obtain second audio data;
the second audio generation module performs operations comprising:
step S1: performing time domain alignment operation on a plurality of first audio data;
step S2: calculating an amplitude sequence of each of the first audio data; the amplitude sequence includes: an amplitude value of each frame of the first audio data;
step S3: calculating an effective value corresponding to the amplitude value, wherein the calculation formula is as follows:
wherein e_i denotes the effective value corresponding to the i-th amplitude value in the amplitude sequence, and A_max and A_min are respectively a preset maximum standard amplitude and a preset minimum standard amplitude;
step S4: calculating a confidence value P of the first audio data based on the effective value corresponding to each amplitude value in the amplitude sequence, wherein the calculation formula is as follows:
wherein N represents the number of amplitude values in the amplitude sequence; when the confidence value P is smaller than or equal to a preset value, the corresponding first audio data is discarded;
step S5: when only one piece of first audio data remains after step S4, performing amplitude enhancement on it so that the amplitude values in its amplitude sequence meet a preset requirement, and using the result as the second audio data; the preset requirement is that the number of amplitude values in the amplitude sequence reaching a preset amplitude intensity reaches a set number;
step S6: when more than one piece of first audio data remains after step S4, performing amplitude enhancement on each so that the amplitude values in its amplitude sequence meet the preset requirement, averaging the amplitude sequences position by position to form a new amplitude sequence, and using the new first audio data corresponding to the new amplitude sequence as the second audio data.
6. The bluetooth voice audio capture system of claim 5, wherein the second audio generation module comprises:
a parameter extraction module, configured to perform parameter extraction on the plurality of first audio data, respectively, to obtain the parameter indicating the quality of the first audio data;
and the parameter comparison module is used for comparing the parameters and selecting, as the second audio data, the first audio data with the best quality among the plurality of first audio data.
7. The bluetooth voice audio capture system of claim 6, where the parameters comprise: frequency response, THD + N, volume.
8. The Bluetooth voice audio acquisition system according to claim 5, wherein the fusion processing method comprises one or more of: MIX (mixing) and enhancement compensation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010460221.2A CN111370012B (en) | 2020-05-27 | 2020-05-27 | Bluetooth voice audio acquisition method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010460221.2A CN111370012B (en) | 2020-05-27 | 2020-05-27 | Bluetooth voice audio acquisition method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111370012A true CN111370012A (en) | 2020-07-03 |
CN111370012B CN111370012B (en) | 2020-09-08 |
Family
ID=71211035
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010460221.2A Active CN111370012B (en) | 2020-05-27 | 2020-05-27 | Bluetooth voice audio acquisition method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111370012B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111816201A (en) * | 2020-08-07 | 2020-10-23 | 联想(北京)有限公司 | Electronic equipment and voice signal processing method |
CN114466283A (en) * | 2022-02-08 | 2022-05-10 | 维沃移动通信有限公司 | Audio acquisition method and device, electronic equipment and peripheral component method |
WO2022262262A1 (en) * | 2021-06-16 | 2022-12-22 | 荣耀终端有限公司 | Method for sound pick-up by terminal device by means of bluetooth peripheral, and terminal device |
CN117319291A (en) * | 2023-11-27 | 2023-12-29 | 深圳市海威恒泰智能科技有限公司 | Low-delay network audio transmission method and system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106027755A (en) * | 2016-04-28 | 2016-10-12 | 努比亚技术有限公司 | Audio control method and terminal |
WO2018148315A1 (en) * | 2017-02-07 | 2018-08-16 | Lutron Electronics Co., Inc. | Audio-based load control system |
CN108573699A (en) * | 2017-03-13 | 2018-09-25 | 陈新 | Voice sharing recognition methods |
CN108597498A (en) * | 2018-04-10 | 2018-09-28 | 广州势必可赢网络科技有限公司 | Multi-microphone voice acquisition method and device |
CN108737615A (en) * | 2018-06-27 | 2018-11-02 | 努比亚技术有限公司 | microphone reception method, mobile terminal and computer readable storage medium |
-
2020
- 2020-05-27 CN CN202010460221.2A patent/CN111370012B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106027755A (en) * | 2016-04-28 | 2016-10-12 | 努比亚技术有限公司 | Audio control method and terminal |
WO2018148315A1 (en) * | 2017-02-07 | 2018-08-16 | Lutron Electronics Co., Inc. | Audio-based load control system |
CN108573699A (en) * | 2017-03-13 | 2018-09-25 | 陈新 | Voice sharing recognition methods |
CN108597498A (en) * | 2018-04-10 | 2018-09-28 | 广州势必可赢网络科技有限公司 | Multi-microphone voice acquisition method and device |
CN108737615A (en) * | 2018-06-27 | 2018-11-02 | 努比亚技术有限公司 | microphone reception method, mobile terminal and computer readable storage medium |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111816201A (en) * | 2020-08-07 | 2020-10-23 | 联想(北京)有限公司 | Electronic equipment and voice signal processing method |
CN111816201B (en) * | 2020-08-07 | 2024-05-28 | 联想(北京)有限公司 | Electronic equipment and voice signal processing method |
WO2022262262A1 (en) * | 2021-06-16 | 2022-12-22 | 荣耀终端有限公司 | Method for sound pick-up by terminal device by means of bluetooth peripheral, and terminal device |
CN114466283A (en) * | 2022-02-08 | 2022-05-10 | 维沃移动通信有限公司 | Audio acquisition method and device, electronic equipment and peripheral component method |
CN117319291A (en) * | 2023-11-27 | 2023-12-29 | 深圳市海威恒泰智能科技有限公司 | Low-delay network audio transmission method and system |
CN117319291B (en) * | 2023-11-27 | 2024-03-01 | 深圳市海威恒泰智能科技有限公司 | Low-delay network audio transmission method and system |
Also Published As
Publication number | Publication date |
---|---|
CN111370012B (en) | 2020-09-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111370012B (en) | Bluetooth voice audio acquisition method and system | |
US9406296B2 (en) | Two way automatic universal transcription telephone | |
US20160269844A1 (en) | Stereo headset, terminal, and audio signal processing methods thereof | |
CN106101743B (en) | Panoramic video recognition methods and device | |
WO2015154282A1 (en) | Call device and switching method and device applied thereto | |
WO2015131743A1 (en) | Incoming call processing method, device and terminal | |
CN103795834A (en) | Recording method capable of uploading conversation recording file of smart phone and dedicated recording apparatus | |
KR20070098128A (en) | Apparatus and method for storing/calling telephone number in a mobile station | |
CN203747882U (en) | Test device used for efficiently detecting audio performance of IP telephone | |
CN205378147U (en) | Can realize TV box of video conversation | |
CN111190568A (en) | Volume adjusting method and device | |
CN101345938A (en) | Mobile phone terminal television receiver based on broadcast network and its application method | |
CN104883450A (en) | Communication device and communication method for enhancing voice reception capacity | |
WO2018064883A1 (en) | Method and device for sound recording, apparatus and computer storage medium | |
CN210380947U (en) | Distributed communication device | |
CN108551514A (en) | A kind of telephone device of complete acoustic control | |
CN103595951A (en) | Audio frequency input state processing method, sending end equipment and receiving end equipment | |
CN101854574A (en) | Microphone circuit in dual-mode dual-standby mobile terminal and implementation method thereof | |
CN107222634B (en) | Incoming call control method and device, storage medium and electronic equipment | |
CN212696059U (en) | Network telephone terminal and system based on WiFi | |
CN109361890A (en) | A kind of video call system | |
CN107436747A (en) | Terminal application program control method and device, storage medium and electronic equipment | |
CN113115290B (en) | Method for receiving audio data | |
KR100749748B1 (en) | Mobile station with control communication channel and its method | |
CN206461761U (en) | A kind of synchronous translation apparatus based on directional audio |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||