CN110351419B

CN110351419B - Intelligent voice system and voice processing method thereof

Info

Publication number: CN110351419B
Application number: CN201810298616.XA
Authority: CN
Inventors: 陶永耀
Original assignee: Actions Technology Co Ltd
Current assignee: Actions Technology Co Ltd
Priority date: 2018-04-04
Filing date: 2018-04-04
Publication date: 2021-08-24
Anticipated expiration: 2038-04-04
Also published as: CN110351419A

Abstract

The invention provides an intelligent voice system and a voice processing method thereof. The system comprises a Bluetooth terminal and an intelligent device; the Bluetooth terminal comprises a microphone array, a voice preprocessing device, a first transmission device and a playback device; the intelligent device comprises a second transmission device and an intelligent processor; the voice preprocessing device comprises a storage device and an encoding device, wherein the encoding device is connected with the microphone array and is used for encoding a first audio signal acquired by the microphone array; the encoded first audio signal is stored in the storage device and, after the HFP communication connection is established, is sent from the storage device to the second HFP communication device through the first HFP communication device. The system can store the voice wake-up information and the information to be uploaded while the mobile phone service is being switched, so that a smooth interactive experience is obtained during service switching.

Description

Intelligent voice system and voice processing method thereof

Technical Field

The invention relates to the technical field of multimedia, in particular to an intelligent voice system and a voice processing method thereof.

Background

With the advent of voice-based human-computer interaction interfaces, more and more products are required to support intelligent voice interaction. The intelligent voice interaction products currently on the market are all based on Wi-Fi, but Wi-Fi has high power consumption and poor portability. Moreover, the development of smartphones has turned the phone into a portable computing center. Therefore, intelligent voice interaction products that rely on Bluetooth technology to achieve portability are likely to be an important development trend. However, conventional Bluetooth devices forward voice to the mobile phone over classic Bluetooth, which forces the mobile phone to switch frequently between classic Bluetooth modes and results in a poor experience.

Audio transmission over classic Bluetooth is realized by two Bluetooth profiles: HFP (Hands-Free Profile) and A2DP (Advanced Audio Distribution Profile). HFP is used in telephone call scenarios and features real-time two-way voice communication, while A2DP is used for listening to stereo music and features one-way audio push. Both kinds of Bluetooth audio communication occupy the classic Bluetooth channel, and in current implementations the link is switched from one to the other depending on the scenario. Consequently, to capture voice data, current Bluetooth audio products must switch from the A2DP music-listening scenario to the HFP scenario, which causes two problems: 1. establishing the new link takes time, approximately 2 seconds of waiting, and the experience is very poor; 2. the voice data needs to be collected and transmitted immediately after a collection instruction is issued, so with the conventional mode-switching approach the voice data at the beginning is lost.

A new speech processing technique is needed to address the deficiencies of the prior art.

Disclosure of Invention

Based on the above problems, the invention provides an intelligent voice system, which can store voice awakening information and uploading information while switching the mobile phone service, so that smooth interactive experience can be obtained when the service is switched.

The embodiment of the invention is realized in such a way that the intelligent voice system comprises a Bluetooth terminal and intelligent equipment; the Bluetooth terminal comprises a microphone array, a voice preprocessing device, a first transmission device and a playback device; the intelligent device comprises a second transmission device and an intelligent processor; the first transmitting device comprises a first HFP communicating device and a first A2DP communicating device, the second transmitting device comprises a second HFP communicating device and a second A2DP communicating device; the voice preprocessing device comprises a storage device and an encoding device, wherein the encoding device is connected with the microphone array and is used for encoding a first audio signal acquired by the microphone array, encoding the first audio signal through the encoding device and storing the encoded first audio signal in the storage device, and sending the first audio signal in the storage device to the second HFP communication device through the first HFP communication device after the HFP communication connection is established; the playback device is connected to the first A2DP communication device for receiving a second audio signal transmitted by the second A2DP communication device via the first A2DP communication device.

Further, the encoding device further comprises a PCM encoding device and an audio encoding device; the PCM encoding device is configured to PCM encode the first audio signal acquired by the microphone array and store the PCM-encoded first audio signal in the storage device, and when the first HFP communication device is connected to the second HFP communication device, the audio encoding device further performs audio encoding on the PCM-encoded first audio signal and transmits the audio-encoded first audio signal to the second HFP communication device through the first HFP communication device.

Further, the encoding device further includes an audio encoding device, the audio encoding device is configured to perform audio encoding on the first audio signal acquired by the microphone array and store the first audio signal in the storage device, and when the first HFP communication device is connected to the second HFP communication device, the audio encoded first audio signal is transmitted to the second HFP communication device through the first HFP communication device.

Furthermore, the intelligent voice system further comprises a voice cloud server, and the voice cloud server is in remote communication with the intelligent device and acquires the first audio signal sent by the intelligent device, so as to process the first audio signal.

Further, data transmission is carried out between the voice cloud server and the intelligent device through a wireless network.

Further, the microphone array is an analog microphone array or a digital microphone array, and the microphone array comprises 1-8 microphones.

Further, the intelligent device is a smart phone, a tablet computer, a smart television or a smart set-top box.

Further, the voice preprocessing apparatus further includes:

the awakening device is connected with the microphone array and used for awakening the voice preprocessing device and the first transmission device;

the noise reduction device is connected between the microphone array and the first transmission device and is used for carrying out noise reduction processing on the acquired audio signals;

the beam forming device is connected with the microphone array and used for enhancing the voice acquisition in a specific direction;

and the echo cancellation device is connected between the noise reduction device and the first transmission device and is used for carrying out echo cancellation processing on the acquired audio signal.

Further, the smart device further includes:

the awakening device is used for awakening the voice preprocessing device and the first transmission device;

the noise reduction device is used for carrying out noise reduction processing on the acquired audio signal;

and the beam forming device is used for enhancing the voice acquisition in a specific direction.

According to another aspect of the embodiments of the present invention, the present invention further provides a voice processing method for use in an intelligent voice system, which enables the system to store voice wake-up information and upload information while switching services of a mobile phone, so that smooth interactive experience can be obtained while switching services.

The embodiment of the invention is realized in such a way that a voice processing method used in an intelligent voice system comprises the following steps:

(1) the microphone array acquires a first audio signal and sends the first audio signal to the voice preprocessing device; (2) the voice preprocessing device encodes the first audio signal and stores the encoded first audio signal in the storage device; (3) after the HFP communication connection is established, the first audio signal in the storage device is transmitted to the second HFP communication device through the first HFP communication device; (4) the intelligent device processes the first audio signal and then returns a control signal to the voice preprocessing device.

Further, the steps further include: (201) PCM (pulse code modulation) encoding the first audio signal acquired by the microphone array and then storing it in the storage device; (202) after the first HFP communication device is connected to the second HFP communication device, further audio encoding the PCM-encoded first audio signal; (203) transmitting the audio-encoded first audio signal to the second HFP communication device through the first HFP communication device.

Further, the steps further include: (204) audio encoding the first audio signal acquired by the microphone array and storing the encoded signal in the storage device; (205) after the first HFP communication device is connected with the second HFP communication device, transmitting the audio-encoded first audio signal to the second HFP communication device through the first HFP communication device.

By adopting the technical scheme, the invention has the following beneficial effects: the encoded voice captured after wake-up is buffered on the Bluetooth terminal until the HFP channel is established, and is then transmitted to the HFP service channel of the system. The coding format used for the buffered voice can be PCM, or a voice format such as CVSD or mSBC, so that the method is suitable for the native assistant of the mobile phone system, does not affect the use experience of the assistant, and achieves an effect similar to using the native assistant directly on the mobile phone. Starting from the original A2DP Bluetooth audio channel, the voice information that needs to be transmitted is sent to the mobile phone end after the switch to HFP, which mitigates the degradation in experience caused by switching away from A2DP Bluetooth audio.

Drawings

FIG. 1 is a block diagram of an intelligent speech system provided in accordance with one embodiment of the present invention;

FIG. 2 is a block diagram of an intelligent speech system according to another embodiment of the present invention;

FIG. 3 is a flowchart of a speech processing method in an intelligent speech system according to another embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

An embodiment of the present invention provides an intelligent voice system, and referring to fig. 1, a block diagram of a structure of the intelligent voice system provided in the embodiment of the present invention is provided, where the intelligent voice system includes a bluetooth terminal 1 and an intelligent device 2; the Bluetooth terminal 1 comprises a microphone array 101, a voice preprocessing device 102, a first transmission device 103 and a playback device 104; the intelligent device 2 comprises a second transmission device 202 and an intelligent processor 201; the first transmitting device 103 comprises a first HFP communication device 1032 and a first A2DP communication device 1031, the second transmitting device 202 comprises a second HFP communication device 2022 and a second A2DP communication device 2021; the speech preprocessing device 102 comprises a storage device 1021 and an encoding device 1022, wherein the encoding device 1022 is connected to the microphone array 101, and is configured to encode a first audio signal acquired by the microphone array 101, store the first audio signal in the storage device 1021 after being encoded by the encoding device 1022, and transmit the first audio signal in the storage device 1021 to the second HFP communication device 2022 through the first HFP communication device 1032 when the HFP communication connection is established; the playback device 104 is connected to the first A2DP communication device 1031 for receiving the second audio signal transmitted by the second A2DP communication device 2021 via the first A2DP communication device 1031.

The intelligent device is an intelligent mobile phone, a tablet personal computer, an intelligent television or an intelligent set top box. The following description will be made in detail by taking a mobile phone as an example.

Specifically, when the mobile phone end is connected to a Bluetooth terminal (e.g., a Bluetooth speaker) in the A2DP mode and plays a song, the mobile phone end communicates with the first A2DP communication device of the Bluetooth terminal through the second A2DP communication device to transmit the song (i.e., the second audio signal) to the Bluetooth terminal, where it is played by the playback device of the Bluetooth terminal; at this time, the connection established between the two devices uses the A2DP protocol. When the Bluetooth terminal receives a voice control request, namely a first audio signal, the collected first audio signal is buffered immediately: it is encoded by the encoding device in the voice preprocessing device and stored in the storage device. The Bluetooth terminal then establishes an HFP communication connection with the mobile phone, and when the connection is completed, the first audio signal data buffered in the storage device is sent to the mobile phone end for processing, so that the received voice control signal is not lost while waiting for the HFP communication connection to be established.
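The buffering behaviour described above can be sketched as a small event-driven state machine. This is only an illustrative sketch, not the patented implementation: the class `Terminal`, the `Link` states, and the `encode` and `send_over_hfp` callables are hypothetical names introduced here, and a real device would issue the HFP connection request through its Bluetooth stack.

```python
from collections import deque
from enum import Enum, auto


class Link(Enum):
    A2DP = auto()       # phone is streaming music to the terminal
    SWITCHING = auto()  # HFP connection requested, not yet established
    HFP = auto()        # two-way voice link is up


class Terminal:
    """Hypothetical Bluetooth terminal that buffers voice while HFP comes up."""

    def __init__(self, encode, send_over_hfp):
        self.encode = encode                # stand-in for the encoding device
        self.send_over_hfp = send_over_hfp  # stand-in for the first HFP communication device
        self.storage = deque()              # stand-in for the storage device
        self.link = Link.A2DP

    def on_voice_frame(self, frame: bytes) -> None:
        if self.link is Link.A2DP:
            # A real device would request the HFP connection here.
            self.link = Link.SWITCHING
        # Always encode and store first, so nothing is dropped during the switch.
        self.storage.append(self.encode(frame))
        if self.link is Link.HFP:
            self._flush()

    def on_hfp_connected(self) -> None:
        self.link = Link.HFP
        self._flush()

    def _flush(self) -> None:
        # Send buffered frames in capture order, oldest first.
        while self.storage:
            self.send_over_hfp(self.storage.popleft())
```

Frames captured while the link is still switching stay in `storage` and are delivered in order as soon as `on_hfp_connected` fires; frames captured afterwards pass straight through the same queue.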

An embodiment of the present invention is described below with a specific example. When the mobile phone end is connected to the Bluetooth terminal and the Bluetooth terminal plays a song from the mobile phone, the connection between the two is established through the A2DP communication devices and the second audio data is transmitted; the audio data travels one way from the mobile phone end to the Bluetooth terminal, and the playback device of the Bluetooth terminal decodes and plays it. Suppose that at this point the user issues a voice instruction to the Bluetooth terminal: "Please play a song by Deng Lijun". When the microphone array of the Bluetooth terminal receives the voice instruction, the voice instruction signal is first encoded and stored in the storage device, and meanwhile the communication link between the mobile phone and the Bluetooth terminal is switched from the A2DP mode to the HFP mode. When the link switch is completed, the Bluetooth terminal sends the first audio signal stored in the storage device to the mobile phone end over HFP; the intelligent processor at the mobile phone end processes the instruction and feeds the result back to the Bluetooth terminal, and the Bluetooth terminal, on receiving the feedback, starts playing the song by Deng Lijun. In the conventional approach, by contrast, capture starts only after the HFP communication link has been established, so a short voice instruction may already be over by then and cannot be picked up. It will be appreciated that if the voice instruction is a relatively simple command, for example play, stop, volume up, volume down, answer a call or hang up, the Bluetooth terminal can recognize and process it directly. Only when the voice instruction is complex, for example "Please help me find the nearest gas station" or "What is the weather today?", does the instruction need to be transmitted to the mobile phone end or a voice cloud server for processing.
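The split between simple and complex commands can be illustrated with a small routing function. This is a sketch under stated assumptions: it presumes the terminal already has a text label for the utterance, and the names `LOCAL_COMMANDS`, `route_command` and `phone_can_handle` are hypothetical and do not appear in the patent.

```python
from typing import Callable

# Simple commands that the Bluetooth terminal is assumed to recognize on its own.
LOCAL_COMMANDS = {
    "play", "stop", "volume up", "volume down", "answer call", "hang up",
}


def route_command(text: str, phone_can_handle: Callable[[str], bool]) -> str:
    """Return which tier should process a command: terminal, phone or cloud."""
    if text.strip().lower() in LOCAL_COMMANDS:
        return "terminal"  # handled directly, no HFP transfer needed
    if phone_can_handle(text):
        return "phone"     # sent over HFP to the intelligent processor
    return "cloud"         # forwarded by the phone to the voice cloud server
```

For example, `route_command("Stop", lambda t: True)` stays on the terminal, while a navigation question is routed to the phone, or to the cloud when the phone reports it cannot handle the request.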

In the embodiment of the present invention, there are two ways for the encoding device 1022 to encode the first audio signal and store the encoded first audio signal in the storage device 1021; the two ways are described in further detail below.

Another embodiment is provided on the basis of the above embodiment, in which the encoding device further includes a PCM encoding device and an audio encoding device. The PCM encoding device is configured to PCM encode the first audio signal acquired by the microphone array and store the PCM-encoded first audio signal in the storage device; after the first HFP communication device is connected to the second HFP communication device, the audio encoding device further performs audio encoding on the PCM-encoded first audio signal and transmits the audio-encoded first audio signal to the second HFP communication device through the first HFP communication device. Specifically, the acquired first audio signal is first PCM encoded by the PCM encoding device and buffered; after the HFP connection is established, data is read from the buffered PCM data, encoded into the CVSD or mSBC format, and transmitted to the HFP interface of the intelligent terminal through the HFP communication device.

In another embodiment, based on the above embodiment, the encoding device further includes an audio encoding device, where the audio encoding device is configured to perform audio encoding on the first audio signal acquired by the microphone array and store the encoded first audio signal in the storage device, and when the first HFP communication device is connected to the second HFP communication device, the audio-encoded first audio signal is transmitted to the second HFP communication device through the first HFP communication device. The difference from the previous embodiment is that no PCM encoding device is required; the captured first audio signal is audio encoded directly. After being collected, the first audio signal is audio encoded into the CVSD or mSBC format and the encoded voice is buffered; after the HFP connection is established, the encoded first audio signal is placed on the transmission link and sent to the mobile phone end for processing.
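The two buffering strategies can be contrasted in a short sketch. The encoder below is a deliberately fake placeholder (it keeps every other byte) that only models "encoded output is smaller than PCM input"; it is not CVSD or mSBC, and the class and method names are hypothetical.

```python
from typing import Callable, List

Encoder = Callable[[bytes], bytes]


def placeholder_encode(pcm_frame: bytes) -> bytes:
    """Hypothetical stand-in for a CVSD/mSBC encoder, NOT a real codec."""
    return pcm_frame[::2]


class PcmFirstBuffer:
    """Way 1: store PCM frames, audio-encode them only when HFP is connected."""

    def __init__(self, audio_encode: Encoder) -> None:
        self.audio_encode = audio_encode
        self.frames: List[bytes] = []

    def capture(self, pcm_frame: bytes) -> None:
        self.frames.append(pcm_frame)  # no codec work before the link is up

    def stored_bytes(self) -> int:
        return sum(len(f) for f in self.frames)

    def drain_for_hfp(self) -> List[bytes]:
        out = [self.audio_encode(f) for f in self.frames]
        self.frames.clear()
        return out


class EncodeFirstBuffer:
    """Way 2: audio-encode at capture time and store the encoded frames."""

    def __init__(self, audio_encode: Encoder) -> None:
        self.audio_encode = audio_encode
        self.frames: List[bytes] = []

    def capture(self, pcm_frame: bytes) -> None:
        self.frames.append(self.audio_encode(pcm_frame))

    def stored_bytes(self) -> int:
        return sum(len(f) for f in self.frames)

    def drain_for_hfp(self) -> List[bytes]:
        out = list(self.frames)  # already in the on-air format
        self.frames.clear()
        return out
```

Both classes hand the same encoded frames to the HFP link. The trade-off in this sketch is that the first way defers codec work until the link is up but holds larger PCM data, while the second way spends codec effort at capture time and keeps the buffer smaller.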

The storage device may be SRAM inside the chip of the voice preprocessing device, or a storage medium outside the chip, such as SRAM, DDR or NAND flash of the Bluetooth terminal.
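As a rough sizing aid for the storage device, the buffer needed to cover the link setup time can be estimated from the bit rate of the stored format. The figures below are assumptions, not values from the patent: the roughly 2-second setup time is the one cited in the Background, PCM is assumed to be 16 kHz, 16-bit mono, and the codec rates are nominal figures (64 kbit/s for CVSD; 57-byte mSBC frames every 7.5 ms, about 60.8 kbit/s, ignoring transport headers).

```python
import math


def buffer_bytes(bitrate_bps: int, seconds: float) -> int:
    """Bytes needed to hold `seconds` of audio at `bitrate_bps`."""
    return math.ceil(bitrate_bps * seconds / 8)


SETUP_SECONDS = 2.0  # approximate HFP link setup time cited in the Background

# Assumed nominal bit rates in bits per second (not taken from the patent).
RATES_BPS = {
    "pcm_16k_16bit_mono": 16_000 * 16,
    "cvsd": 64_000,
    "msbc": 60_800,
}

sizes = {name: buffer_bytes(bps, SETUP_SECONDS) for name, bps in RATES_BPS.items()}
# sizes == {"pcm_16k_16bit_mono": 64000, "cvsd": 16000, "msbc": 15200}
```

Under these assumptions, buffering raw PCM needs about 64 KB for the switch interval, while buffering already-encoded voice needs about 15 to 16 KB, which may influence whether on-chip SRAM is sufficient. A practical buffer would also need headroom for the wake word and any speech captured before the switch was triggered.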

The invention provides another embodiment. When the intelligent processor at the mobile phone end cannot meet the requirements of voice recognition processing, the voice recognition function of a voice cloud server 3 is used in addition. On the basis of the above system, this embodiment further comprises the voice cloud server, which is in remote communication with the intelligent device and acquires the first audio signal sent by the intelligent device in order to process it. Data is transmitted between the voice cloud server and the intelligent device through a wireless network. Depending on the amount of voice computation, simple voice processing can generally be completed by the intelligent processor at the mobile phone end, which satisfies the voice recognition needs of most front-end devices; when the intelligent processor at the mobile phone end cannot handle the computation or processing, the voice cloud server completes it.

According to the embodiment of the invention, the microphone array is an analog microphone array or a digital microphone array. The microphone array usually comprises 1 to 8 microphones, which is a standard far-field voice acquisition configuration; typically 2 microphones form the array, and a single microphone may suffice in quiet environments.

In another embodiment of the present invention, the voice pre-processing apparatus further comprises:

a wake-up device 1025, connected to the microphone array, for waking up the voice preprocessing device and the first transmission device;

a noise reduction device 1023 connected between the microphone array and the first transmission device for performing noise reduction processing on the acquired audio signal;

a beam forming device 1024 connected to the microphone array for enhancing the voice collection in a specific direction;

and an echo cancellation device 1026, connected between the noise reduction device and the first transmission device, for performing echo cancellation processing on the acquired audio signal.

In order to provide further advanced speech processing, as shown in FIG. 2, the voice preprocessing device 102 further comprises a wake-up device 1025, connected to the microphone array 101, for waking up the voice preprocessing device 102 and the first transmission device 103. The voice preprocessing device 102 further includes a noise reduction device 1023, connected between the microphone array 101 and the first transmission device 103, for performing noise reduction processing on the acquired audio signal. The voice preprocessing device 102 further includes a beam forming device 1024, connected to the microphone array 101, for enhancing the voice collection of the microphone array 101 in a specific direction. The voice preprocessing device 102 further comprises an echo cancellation device 1026, connected between the noise reduction device 1023 and the encoding device 1022, for performing echo cancellation processing on the acquired first audio signal. The wake-up device operates on the voice signals collected by the microphone array: it decides whether to start the voice wake-up algorithm according to the energy or characteristics of the human voice (zero-crossing detection, spectrum analysis and the like), compares the input voice against a large batch of sequences trained in advance using a maximum likelihood algorithm, determines whether the voice input is the wake-up word, and starts subsequent processing if it is. The beam forming device 1024 is used, when voice is input through multiple microphones, to determine the direction of the sound signal relative to the microphone array according to the time delay and phase difference of the voice data of each microphone, and to determine the parameters of the noise reduction device according to this information. The noise reduction device 1023 strengthens or weakens signals from different directions according to the noise reduction parameters of the beam forming algorithm or a preset noise reduction directional pattern curve, and emphasizes the signal strength in the most recently determined direction. Meanwhile, according to the differences in frequency spectrum and in time-domain correlation between human voice and environmental sound (periodic noise, music), the signal is processed in the frequency domain or the time domain, and the human voice is extracted from the background sound or noise and enhanced. When a playback module is present, the echo cancellation device 1026 applies a predetermined or predicted transfer function to the decoded playback data and cancels, in the data collected by the microphone, the reflected part of the sound emitted by the loudspeaker, so as to obtain clean voice without echo.
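Two of the front-end steps mentioned above, the energy and zero-crossing check that gates the wake-up algorithm and the delay-based direction estimate used for beam forming, can be sketched in plain Python. This is a simplified illustration under assumptions: the thresholds, the brute-force cross-correlation search, the two-microphone far-field geometry and all function names are hypothetical, and the wake-word comparison itself (the maximum likelihood step) is not shown.

```python
import math
from typing import Sequence


def frame_energy(x: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in x) / len(x)


def zero_crossing_rate(x: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(x) - 1)


def looks_like_speech(x: Sequence[float], energy_min: float, zcr_max: float) -> bool:
    """Cheap gate: loud enough, and not dominated by rapid sign changes."""
    return frame_energy(x) >= energy_min and zero_crossing_rate(x) <= zcr_max


def estimate_delay(ref: Sequence[float], other: Sequence[float], max_lag: int) -> int:
    """Lag in samples at which `other` best matches `ref` (positive: `other` is later)."""
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(len(ref)):
            m = n + lag
            if 0 <= m < len(other):
                score += ref[n] * other[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(lag: int, fs: float, spacing_m: float, c: float = 343.0) -> float:
    """Far-field angle from broadside for two microphones `spacing_m` apart."""
    s = max(-1.0, min(1.0, (lag / fs) * c / spacing_m))  # clamp to asin's domain
    return math.degrees(math.asin(s))
```

The estimated lag between two microphones maps to an arrival angle via sin(theta) = c * tau / d, where tau is the delay, d is the microphone spacing and c is the speed of sound; that angle is the kind of information the text describes as feeding the noise reduction parameters.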

It can be understood that, when the voice pre-processing device has insufficient computing power and is not suitable for performing the above processing, the wake-up device, the noise reduction device, and the beam forming device may be disposed in the smart device 2 for processing, so as to reduce the computation of the voice pre-processing device 102. The smart device further includes: the awakening device is used for awakening the voice preprocessing device and the first transmission device; the noise reduction device is used for carrying out noise reduction processing on the acquired audio signal; and the beam forming device is used for enhancing the voice acquisition in a specific direction. The above apparatus may be provided in an intelligent processor of an intelligent device.

The embodiment of the present invention is realized as follows, and as shown in fig. 3, a speech processing method for use in an intelligent speech system includes the following steps: (S101) the microphone array acquires a first audio signal and sends the first audio signal to the voice preprocessing device; (S102) the voice preprocessing means encodes the first audio signal and stores the first audio signal in the storage means; (S103) transmitting the first audio signal in the storage device to the second HFP communication device through the first HFP communication device after the HFP communication connection is established; (S104) the intelligent equipment processes the first audio signal and then returns a control signal to the voice preprocessing device.
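Steps S101 to S104 can be walked through end to end in a linear sketch. The `FakeHfpLink` class, the `process_on_phone` callable and the string control signal are hypothetical stand-ins introduced for illustration; in a real system the link setup would take on the order of seconds and the recognition would run on the intelligent processor.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class FakeHfpLink:
    """Hypothetical HFP link that refuses data until it is connected."""
    connected: bool = False
    delivered: List[bytes] = field(default_factory=list)

    def connect(self) -> None:
        self.connected = True

    def send(self, chunk: bytes) -> None:
        if not self.connected:
            raise RuntimeError("HFP link is not established")
        self.delivered.append(chunk)


def run_voice_method(
    mic_frames: Iterable[bytes],
    encode: Callable[[bytes], bytes],
    link: FakeHfpLink,
    process_on_phone: Callable[[bytes], str],
) -> str:
    # S101 + S102: acquire each frame, encode it, and keep it in storage.
    storage = [encode(frame) for frame in mic_frames]
    # Stand-in for establishing the HFP communication connection.
    link.connect()
    # S103: only now is the stored audio sent over HFP, in capture order.
    for chunk in storage:
        link.send(chunk)
    storage.clear()
    # S104: the smart device processes the audio and returns a control signal.
    return process_on_phone(b"".join(link.delivered))
```

The returned value plays the role of the control signal that the voice preprocessing device acts on, for example a "PLAY" token in this toy setup.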

The above steps may use either of two encoding modes. The first is: the first audio signal acquired by the microphone array is PCM (pulse code modulation) encoded and then stored in the storage device; after the first HFP communication device is connected to the second HFP communication device, the PCM-encoded first audio signal is further audio encoded; and the audio-encoded first audio signal is transmitted to the second HFP communication device through the first HFP communication device. Specifically, the acquired first audio signal is first PCM encoded by the PCM encoding device and buffered; after the HFP connection is established, data is read from the buffered PCM data, encoded into the CVSD or mSBC audio format, and transmitted to the HFP interface of the intelligent terminal through the HFP communication device.

The second is: the first audio signal acquired by the microphone array is audio encoded and stored in the storage device; after the first HFP communication device is connected with the second HFP communication device, the audio-encoded first audio signal is transmitted to the second HFP communication device through the first HFP communication device. The difference from the first mode is that no PCM encoding device is required; the captured first audio signal is audio encoded directly. After being collected, the first audio signal is audio encoded into the CVSD or mSBC format and the encoded voice is buffered; after the HFP connection is established, the encoded first audio signal is placed on the transmission link and sent to the intelligent terminal for processing.

The following description will be made in detail by taking a mobile phone as an example.

Specifically, when the mobile phone end is connected to the Bluetooth terminal in the A2DP mode and plays a song, the mobile phone end communicates with the first A2DP communication device of the Bluetooth terminal through the second A2DP communication device to transmit the song (i.e., the second audio signal) to the Bluetooth terminal, where it is played by the playback device of the Bluetooth terminal; at this time, the connection established between the two uses the A2DP protocol. The microphone array of the Bluetooth terminal acquires a first audio signal and sends it to the voice preprocessing device; after preprocessing, the voice preprocessing device encodes the first audio signal and caches it in the storage device, where it waits to be transmitted to the mobile phone end; after the HFP communication connection between the mobile phone end and the Bluetooth terminal is established, the encoded first audio signal is transmitted to the mobile phone end through HFP communication. The intelligent processor at the mobile phone end recognizes the first audio signal and then returns a control signal to the voice preprocessing device, and the voice preprocessing device, after acquiring the returned control signal, controls the Bluetooth terminal according to the control signal.

The invention buffers the encoded voice captured after wake-up on the Bluetooth terminal until the HFP channel is established, and then transmits the voice to the HFP service channel of the system. The buffered encoding format can be PCM, CVSD or mSBC. Therefore, the method is suitable for the native assistant of the mobile phone system, does not affect the use experience of the assistant, and achieves an effect similar to using the native assistant directly on the mobile phone. Starting from the original A2DP Bluetooth audio channel, the voice information that needs to be transmitted is sent to the mobile phone end after the switch to HFP, which mitigates the degradation in experience caused by switching away from A2DP Bluetooth audio.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. An intelligent voice system is characterized by comprising a Bluetooth terminal and intelligent equipment; the Bluetooth terminal comprises a microphone array, a voice preprocessing device, a first transmission device and a playback device; the intelligent device comprises a second transmission device and an intelligent processor; the first transmitting device comprises a first HFP communicating device and a first A2DP communicating device, the second transmitting device comprises a second HFP communicating device and a second A2DP communicating device; the voice preprocessing device comprises a storage device and an encoding device, wherein the encoding device is connected with the microphone array and is used for encoding a first audio signal acquired by the microphone array, encoding the first audio signal through the encoding device and storing the encoded first audio signal in the storage device, and sending the first audio signal in the storage device to the second HFP communication device through the first HFP communication device after the HFP communication connection is established; the playback device is connected with the first A2DP communication device and is used for receiving a second audio signal transmitted by the second A2DP communication device through the first A2DP communication device;

wherein,

the encoding device further comprises a PCM encoding device and an audio encoding device, wherein the PCM encoding device is used for PCM-encoding the first audio signal acquired by the microphone array and storing it in the storage device, and when the first HFP communication device is connected with the second HFP communication device, the audio encoding device further encodes the PCM-encoded first audio signal into CVSD or mSBC format and transmits the result to the second HFP communication device through the first HFP communication device;

or, the encoding device further comprises an audio encoding device, wherein the audio encoding device is used for encoding the first audio signal acquired by the microphone array into CVSD or mSBC format and then storing it in the storage device, and when the first HFP communication device is connected with the second HFP communication device, the first audio signal encoded in CVSD or mSBC format is transmitted to the second HFP communication device through the first HFP communication device.

2. The intelligent voice system according to claim 1, further comprising a voice cloud server, wherein the voice cloud server is in remote communication with the intelligent device and obtains the first audio signal sent by the intelligent device in order to process it.

3. The intelligent voice system according to claim 2, wherein data transmission is performed between the voice cloud server and the intelligent device through a wireless network.

4. The intelligent voice system of any one of claims 1 to 3, wherein the microphone array is an analog microphone array or a digital microphone array, and the microphone array comprises 1 to 8 microphones.

5. The intelligent voice system according to any one of claims 1 to 3, wherein the intelligent device is a smart phone, a tablet computer, a smart television or a smart set-top box.

6. The intelligent voice system according to any one of claims 1 to 3, wherein the voice preprocessing device further comprises:

7. The intelligent voice system according to any one of claims 1 to 3, wherein the intelligent device further comprises:

8. A voice processing method for use in an intelligent voice system, comprising the steps of:

(1) the microphone array acquires a first audio signal and sends the first audio signal to the voice preprocessing device;

(2) the voice preprocessing device encodes the first audio signal and stores the first audio signal in a storage device;

(3) transmitting the first audio signal in the storage device to the second HFP communication device through the first HFP communication device after the HFP communication connection is established;

(4) the intelligent device processes the first audio signal and then returns a control signal to the voice preprocessing device;

wherein the step (2) further comprises:

(201) PCM-encoding the first audio signal acquired by the microphone array and storing it in the storage device;

(202) after the first HFP communication device is connected with the second HFP communication device, further encoding the PCM-encoded first audio signal into CVSD or mSBC format;

(203) transmitting the first audio signal encoded in CVSD or mSBC format to the second HFP communication device through the first HFP communication device;

or, the step (2) further comprises:

(204) encoding the first audio signal acquired by the microphone array into CVSD or mSBC format and storing the encoded first audio signal in the storage device;

(205) when the first HFP communication device is connected with the second HFP communication device, transmitting the first audio signal encoded in CVSD or mSBC format to the second HFP communication device through the first HFP communication device.