WO2020010963A1

WO2020010963A1 - Voice handover method, apparatus, terminal, and computer-readable storage medium

Info

Publication number: WO2020010963A1
Application number: PCT/CN2019/089623
Authority: WO
Inventors: 杨柳
Original assignee: 中兴通讯股份有限公司
Priority date: 2018-07-10
Filing date: 2019-05-31
Publication date: 2020-01-16
Also published as: CN110708417B; CN110708417A

Abstract

Disclosed by the present application are a voice handover method for a voice call, an apparatus, a terminal, and a computer-readable storage medium. The method comprises: obtaining first PCM voice data received by a receiving terminal during switching of an audio device, and storing said data in a cache; obtaining, after the audio device is successfully handed over, second PCM voice data of the receiving terminal having the same duration as the first PCM voice data; and processing the first PCM voice data and the second PCM voice data according to a preset PCM voice data processing strategy to obtain processed third PCM voice data, and transmitting said third PCM voice data to the handed-over audio device.

Description

Voice switching method, device, terminal and computer-readable storage medium

Cross-reference to related applications

This application is based on a Chinese patent application with an application number of 201810752172.2 and an application date of July 10, 2018, and claims the priority of the Chinese patent application. The entire content of this Chinese patent application is incorporated herein by reference.

Technical field

This application relates to, but is not limited to, the field of mobile terminals.

Background technique

During an everyday voice call, a user may need to switch from the handheld state to the hands-free state. While the user is switching and the mobile terminal is away from the user's ear, the user cannot hear the downlink pulse code modulation (PCM) voice data normally. The switching time varies with the operating speed and habits of different users, and is generally greater than 2 seconds. If the other party keeps talking, downlink PCM voice data may not be obtained, or may be lost, during the handover. The longer the switching time, the more downlink PCM voice data is lost, which impairs the user's reception of the other party's voice information and degrades the user experience.

Therefore, it is necessary to propose a new voice switching method to solve the above problems.

Summary of the invention

In view of this, embodiments of the present application provide a voice switching method, device, terminal, and computer-readable storage medium for a voice call.

According to an aspect of the embodiments of the present application, a voice switching method applied to a mobile terminal is provided, including:

Acquiring the first PCM voice data received by the receiving end during the audio device switching process, applying approximation voice processing to it, and storing it in the buffer;

Acquiring, after the audio device is successfully switched, second PCM voice data received by the receiving end that has the same duration as the first PCM voice data, and applying approximation voice processing to it, so that playback of the first PCM voice data and the second PCM voice data is completed within the switching time.

According to another aspect of the embodiments of the present application, a voice switching device is provided, which is applied to the voice switching method. The voice switching device includes: an acquisition module, a cache module, and a processing module, where:

The acquisition module is configured to obtain first PCM voice data received by a receiving end during a switching process of an audio device, and second PCM voice data having the same duration as the first PCM voice data after the audio device is successfully switched;

The buffer module is configured to buffer the first PCM voice data and the PCM voice data produced during approximation voice processing;

The processing module is configured to perform approximation voice processing on the first PCM voice data and the second PCM voice data, so that playback of the first PCM voice data and the second PCM voice data is completed within the switching time.

According to another aspect of the present application, a terminal is provided, including: a memory, a processor, and a computer program stored on the memory and executable on the processor. When the computer program is executed by the processor, the steps of the voice switching method provided in the embodiments of the present application are implemented.

According to another aspect of the present application, a computer-readable storage medium is provided, where the computer-readable storage medium stores a program of the voice switching method, and when the program of the voice switching method is executed by a processor, the steps of the voice switching method provided in the embodiments of the present application are implemented.

Compared with the related art, the voice switching method, device, terminal, and computer-readable storage medium for a voice call provided by the embodiments of the present application acquire the first PCM voice data received by the receiving end during the audio device switching process, apply approximation voice processing to it, and store it in the buffer; they then acquire, after the audio device is successfully switched, second PCM voice data of the same duration as the first PCM voice data and apply approximation voice processing to it, so that playback of the first PCM voice data and the second PCM voice data is completed within the switching time and seamless switching is achieved. With this approximation voice processing, when the audio device is switched during a call, all downlink PCM voice data can be transmitted and played in full, so that the user does not lose any voice information during the device switch. This improves the user's understanding of the conversation and enhances the user experience.

The implementation, functional characteristics and advantages of the purpose of this application will be further described with reference to the embodiments and the drawings.

Brief description of the drawings

FIG. 1 is a schematic diagram of a hardware structure of a mobile terminal that implements various embodiments of the present application;

FIG. 2 is a structural diagram of a communication network system according to an embodiment of the present application;

FIG. 3 is a schematic flowchart of a voice switching method for a voice call according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a method for seamlessly switching voices during an audio device switching process by using an approximation method combining buffering and voice processing according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a voice switching device for a voice call according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a terminal according to an embodiment of the present application;

FIG. 7 is a schematic flowchart of a voice switching method for a voice call according to an embodiment of the present application.

Detailed description

In order to make the technical problems, technical solutions, and beneficial effects to be more clearly understood in the present application, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the application, and are not intended to limit the application.

In the following description, suffixes such as "module", "component", or "unit" used to indicate elements serve only to simplify the description of the present application and have no specific meaning in themselves. Therefore, "module", "component", and "unit" may be used interchangeably.

It should be noted that the terms “first” and “second” in the specification and claims of this application are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence.

The terminal can be implemented in various forms. For example, the terminals described in this application may include mobile terminals such as mobile phones, tablets, laptops, palmtop computers, Personal Digital Assistants (PDAs), Portable Media Players (PMPs), navigation devices, wearable devices, smart bracelets, and pedometers, as well as fixed terminals such as digital TVs and desktop computers.

In the subsequent description, a mobile terminal will be taken as an example for explanation. Those skilled in the art will understand that, in addition to the elements specifically used for mobile purposes, the configuration according to the embodiment of the present application can also be applied to a fixed type terminal.

Please refer to FIG. 1, which is a schematic diagram of a hardware structure of a mobile terminal for implementing the embodiments of the present application. The mobile terminal 100 may include a radio frequency (RF) unit 101, a WiFi module 102, an audio output unit 103, an audio/video (A/V) input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, a processor 110, a power supply 111, and other components. Those skilled in the art can understand that the structure of the mobile terminal shown in FIG. 1 does not constitute a limitation on the mobile terminal. The mobile terminal may include more or fewer components than shown in the figure, may combine some components, or may use a different component layout.

The following describes each component of the mobile terminal in detail with reference to FIG. 1:

The RF unit 101 may be configured to receive and transmit signals during transmission and reception of information or during a call. Specifically, it receives downlink information from the base station and passes it to the processor 110 for processing; in addition, it transmits uplink data to the base station. Generally, the RF unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the RF unit 101 can also communicate with a network and other devices through wireless communication. The wireless communication can use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access 2000 (CDMA2000), Wideband Code Division Multiple Access (WCDMA), Time Division-Synchronous Code Division Multiple Access (TD-SCDMA), Frequency Division Duplexing Long Term Evolution (FDD-LTE), and Time Division Duplexing Long Term Evolution (TDD-LTE).

WiFi is a short-range wireless transmission technology. The mobile terminal can help users send and receive emails, browse web pages, and access streaming media through the WiFi module 102. It provides users with wireless broadband Internet access. Although FIG. 1 shows the WiFi module 102, it can be understood that it does not belong to the necessary configuration of the mobile terminal, and can be omitted as needed without changing the essence of the invention.

When the mobile terminal 100 is in a call signal receiving mode, a call mode, a recording mode, a voice recognition mode, a broadcast receiving mode, or the like, the audio output unit 103 may convert audio data received by the RF unit 101 or the WiFi module 102, or stored in the memory 109, into an audio signal and output it as sound. Moreover, the audio output unit 103 may also provide audio output (for example, a call signal receiving sound, a message receiving sound, etc.) related to a specific function performed by the mobile terminal 100. The audio output unit 103 may include a speaker, a buzzer, and the like.

The A/V input unit 104 is configured to receive an audio or video signal. The A/V input unit 104 may include a graphics processing unit (GPU) 1041 and a microphone 1042. The graphics processor 1041 processes image data of still images or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display unit 106. The image frames processed by the graphics processor 1041 may be stored in the memory 109 (or other storage medium) or transmitted via the RF unit 101 or the WiFi module 102. The microphone 1042 can receive sound in an operation mode such as a telephone call mode, a recording mode, or a voice recognition mode, and can process such sound into audio data. In the telephone call mode, the processed audio (voice) data can be converted into a format that can be transmitted to a mobile communication base station via the RF unit 101 and output. The microphone 1042 may implement various types of noise cancellation (or suppression) algorithms to eliminate (or suppress) noise or interference generated during the process of receiving and transmitting audio signals.

The mobile terminal 100 further includes at least one sensor 105, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor and a proximity sensor, wherein the ambient light sensor can adjust the brightness of the display panel 1061 according to the brightness of the ambient light, and the proximity sensor can turn off at least one of the display panel 1061 and its backlight when the mobile terminal 100 is moved to the ear. As a type of motion sensor, an accelerometer can detect the magnitude of acceleration in various directions (generally three axes), and can detect the magnitude and direction of gravity when stationary. It can be used in applications that recognize the attitude of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection). The mobile phone can also be equipped with other sensors, such as a fingerprint sensor, pressure sensor, iris sensor, molecular sensor, gyroscope, barometer, hygrometer, thermometer, and infrared sensor, which are not described in detail here.

The display unit 106 is configured to display information input by the user or information provided to the user. The display unit 106 may include a display panel 1061, and the display panel 1061 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.

The user input unit 107 may be configured to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the mobile terminal. Specifically, the user input unit 107 may include a touch panel 1071 and other input devices 1072. The touch panel 1071, also known as a touch screen, can collect the user's touch operations on or near it (such as operations performed by the user on or near the touch panel 1071 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. The touch panel 1071 may include two parts, a touch detection device and a touch controller. The touch detection device detects the user's touch position, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 110, and can receive and execute commands sent by the processor 110. In addition, the touch panel 1071 can be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 1071, the user input unit 107 may also include other input devices 1072. Specifically, the other input devices 1072 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like, which are not limited herein.

In an embodiment, the touch panel 1071 may cover the display panel 1061. When the touch panel 1071 detects a touch operation on or near it, it transmits the touch operation to the processor 110 to determine the type of the touch event, and the processor 110 then provides corresponding visual output on the display panel 1061 according to the type of the touch event. Although in FIG. 1 the touch panel 1071 and the display panel 1061 are implemented as two independent components to implement the input and output functions of the mobile terminal, in some embodiments the touch panel 1071 and the display panel 1061 may be integrated to implement the input and output functions of the mobile terminal, which is not specifically limited here.

The interface unit 108 functions as an interface through which at least one external device can connect with the mobile terminal 100. For example, the external device may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port configured to connect a device with an identification module, an audio input/output (I/O) port, a video I/O port, a headphone port, and so on. The interface unit 108 may be configured to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the mobile terminal 100, or may be configured to transfer data between the mobile terminal 100 and the external device.

The memory 109 may be configured to store software programs and various data. The memory 109 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, at least one application required by a function (such as a sound playback function or an image playback function), and the like; the data storage area may store data created by use of the mobile phone (such as audio data and a phone book). In addition, the memory 109 may include a high-speed random access memory, and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.

The processor 110 is the control center of the mobile terminal and uses various interfaces and lines to connect the parts of the entire mobile terminal. By running or executing software programs and/or modules stored in the memory 109 and calling data stored in the memory 109, the processor 110 performs the various functions of the mobile terminal and processes data, thereby monitoring the mobile terminal as a whole. The processor 110 may include one or more processing units; preferably, the processor 110 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and so on, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 110.

The mobile terminal 100 may further include a power source 111 (such as a battery) for supplying power to the various components. Preferably, the power source 111 may be logically connected to the processor 110 through a power management system, so that functions such as charging, discharging, and power consumption management are handled through the power management system.

Although not shown in FIG. 1, the mobile terminal 100 may further include a Bluetooth module and the like, and details are not described herein again.

In order to facilitate understanding of the embodiments of the present application, the communication network system on which the mobile terminal of the present application is based is described below.

Please refer to FIG. 2, which is a structural diagram of a communication network system according to an embodiment of the present application. The communication network system is an LTE system of universal mobile telecommunications technology. The LTE system includes user equipment (UE) 201, an Evolved UMTS Terrestrial Radio Access Network (E-UTRAN) 202, an Evolved Packet Core (EPC) 203, and the operator's IP services 204.

Specifically, the UE 201 may be the foregoing terminal 100, and details are not described herein again.

The E-UTRAN 202 includes an eNodeB 2021 and other eNodeBs 2022. The eNodeB 2021 can be connected to the other eNodeBs 2022 through a backhaul (such as an X2 interface), the eNodeB 2021 is connected to the EPC 203, and the eNodeB 2021 can provide the UE 201 with access to the EPC 203.

The EPC 203 may include a Mobility Management Entity (MME) 2031, a Home Subscriber Server (HSS) 2032, other MMEs 2033, a Serving Gateway (SGW) 2034, a Packet Data Network Gateway (PDN Gateway, PGW) 2035, a Policy and Charging Rules Function (PCRF) 2036, and so on. The MME 2031 is a control node that processes signaling between the UE 201 and the EPC 203 and provides bearer and connection management. The HSS 2032 is configured to provide registers that manage functions such as the home location register (not shown in the figure), and holds user-specific information about service characteristics, data rates, and so on. All user data can be sent through the SGW 2034. The PGW 2035 can provide IP address allocation for the UE 201 and other functions. The PCRF 2036 is the policy decision point for policy and charging control of service data flows and IP bearer resources; it selects and provides available policy and charging control decisions for a policy and charging enforcement function unit (not shown).

The IP service 204 may include the Internet, an intranet, an IP Multimedia Subsystem (IMS), or other IP services.

Although the above is described by taking the LTE system as an example, those skilled in the art should know that the embodiments of the present application are applicable not only to the LTE system but also to other wireless communication systems, such as GSM, CDMA2000, WCDMA, TD-SCDMA, and future new network systems, which are not limited here.

Based on the above-mentioned mobile terminal hardware structure and communication network system, various embodiments of the method of the present application are proposed.

Please refer to FIG. 3. An embodiment of the present application provides a voice switching method for a voice call, which is applied to a mobile terminal and includes:

S1. Acquire the first PCM voice data received by the receiving end during the audio device switching process, apply approximation voice processing to it, and store it in the buffer;

S2. After the audio device is successfully switched, acquire second PCM voice data received by the receiving end that has the same duration as the first PCM voice data, and apply approximation voice processing to it, so that playback of the first PCM voice data and the second PCM voice data is completed within the switching time, achieving seamless switching.

In an embodiment, the approximation voice processing is: approximating the playback time of PCM voice data by using an approximation method combined with voice processing, to obtain approximated PCM voice data.

In an embodiment, before step S1, the method further includes a step of detecting audio device switching during the voice call, which includes: during the voice call, the proximity sensor detects movement of the mobile terminal, and if movement of the mobile terminal is detected, it is determined that audio device switching is occurring.

Here, the voice processing changes only the rate of the speech while keeping its intonation and semantics unchanged, and is divided into two stages: speech decomposition and speech synthesis. The speech decomposition stage divides the original PCM voice data into frames, and the decomposed frames are used for speech synthesis; let the frame length be N and the frame shift (the distance between two adjacent frames) be Sa. In the speech synthesis stage, according to the shift factor a = Ss / Sa, the frame shift Sa of the decomposition stage is changed to the frame shift Ss = Sa * a of the synthesis stage. Specifically, the position of the first frame from the decomposition stage is kept and the subsequent frames are moved so that the frame shift Sa becomes Ss, which yields preliminary synthesized frames.
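The decomposition and synthesis stages described above can be illustrated with a minimal overlap-add sketch. This is an assumption-laden illustration, not the method of the application: the function names `hann` and `time_scale_ola`, the Hann window, and the gain normalization are all hypothetical choices that the text does not specify. Plain overlap-add as shown only re-spaces frames; practical systems that must keep intonation unchanged usually add a waveform alignment step (for example SOLA or WSOLA) before adding each frame.

```python
import math


def hann(n: int) -> list:
    """Periodic Hann window of length n (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def time_scale_ola(samples: list, frame_len: int, sa: int, a: float) -> list:
    """Re-space frames from analysis shift `sa` to synthesis shift ss = sa * a.

    Decomposition: cut frames of length `frame_len` every `sa` samples.
    Synthesis: keep the first frame in place, put frame k at k * ss,
    and overlap-add. a < 1 shortens the audio, a > 1 lengthens it.
    """
    ss = max(1, round(sa * a))
    if len(samples) < frame_len:
        return list(samples)  # too short to frame; return unchanged
    window = hann(frame_len)
    n_frames = 1 + (len(samples) - frame_len) // sa
    out_len = (n_frames - 1) * ss + frame_len
    out = [0.0] * out_len
    norm = [0.0] * out_len  # summed window weights, used to undo gain changes
    for k in range(n_frames):
        src, dst = k * sa, k * ss
        for i in range(frame_len):
            w = window[i]
            out[dst + i] += samples[src + i] * w
            norm[dst + i] += w
    return [o / n if n > 1e-9 else 0.0 for o, n in zip(out, norm)]
```

With a = 0.5 and Sa = 160, the synthesis shift becomes Ss = 80, so one second of 16 kHz audio (16000 samples, frame length 320) comes out at roughly half its original length.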

In an embodiment, step S1 of acquiring the first PCM voice data received by the receiving end during the audio device switching process, applying approximation voice processing to it, and storing it in the buffer includes:

S11. During the voice call, the proximity sensor detects that the mobile terminal is moving and that audio device switching is occurring, such as switching between the handset and the speaker, and the first timestamp T1, at which the mobile terminal is about to leave the user's ear, is recorded;

S12. Record the second timestamp T2, at which the audio device switch is tapped on the touch screen of the mobile terminal, for example, when the touch screen is tapped to switch from handset mode to speaker mode;

S13. Acquire the first PCM voice data received by the receiving end within the audio device switching time T, where T is the difference between the second timestamp T2 and the first timestamp T1, that is, T = T2 - T1. The audio device switching time T is also the playback time of the first PCM voice data stored in the buffer during that period;

S14. Send the acquired first PCM voice data to the buffer and store it there;

S15. Apply approximation voice processing to the first PCM voice data to obtain the 1st third PCM voice data.
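Steps S11 to S14 amount to buffering every downlink frame that arrives between the two timestamps. The sketch below shows one way this bookkeeping could look; the class `SwitchCapture`, its method names, and the 20 ms frame duration are hypothetical and are not taken from the application, and the sensor and touch events are represented only by plain method calls.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SwitchCapture:
    """Buffers downlink PCM frames received between T1 and T2 (hypothetical helper)."""

    frame_ms: int = 20  # assumed duration of one PCM frame, not from the source
    t1_ms: Optional[int] = None
    t2_ms: Optional[int] = None
    frames: List[bytes] = field(default_factory=list)

    def on_leave_ear(self, now_ms: int) -> None:
        """S11: proximity sensor reports the terminal leaving the ear; record T1."""
        self.t1_ms, self.t2_ms = now_ms, None
        self.frames.clear()

    def on_downlink_frame(self, pcm: bytes) -> None:
        """S13/S14: store frames only while the switch is in progress."""
        if self.t1_ms is not None and self.t2_ms is None:
            self.frames.append(pcm)

    def on_switch_tapped(self, now_ms: int) -> int:
        """S12: user taps the audio device switch; record T2 and return T = T2 - T1."""
        if self.t1_ms is None:
            raise ValueError("switch tapped before T1 was recorded")
        self.t2_ms = now_ms
        return self.t2_ms - self.t1_ms

    def buffered_ms(self) -> int:
        """Playback time of the buffered first PCM voice data."""
        return len(self.frames) * self.frame_ms
```

If frames arrive every 20 ms, T1 = 500 ms and T2 = 2500 ms, the buffer holds 100 frames, and its playback time equals the switching time T = 2000 ms, as stated in S13.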

In an embodiment, step S2 of acquiring, after the audio device is successfully switched, second PCM voice data received by the receiving end that has the same duration as the first PCM voice data, and applying approximation voice processing to it so that playback of the first PCM voice data and the second PCM voice data is completed within the switching time, includes:

S21. Play the N-th third PCM voice data;

S22. During that playback, acquire the N-th second PCM voice data received by the receiving end after the audio device is successfully switched;

S23. Apply approximation voice processing to the N-th second PCM voice data to obtain the (N+1)-th third PCM voice data.

Steps S21 to S23 are executed in a loop until the approximated playback time is shorter than the preset playback time. At that point, N equals the number of playbacks performed when the approximated playback time falls below the preset playback time, and playback of the first PCM voice data and of the N batches of second PCM voice data is completed within the switching time, where N is an integer greater than or equal to 1.
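The loop over S21 to S23 can be sketched as a timing schedule. The function name `approach_schedule` and its parameters are hypothetical, and the sketch tracks durations only, without touching audio samples. It assumes each pass halves the playback time (`factor=0.5`, as in the example of this description). Whether the final chunk shorter than the preset playback time is still played is not stated in the text, so it is returned separately.

```python
def approach_schedule(switch_time: float, min_play: float, factor: float = 0.5):
    """Return ([(start_offset, play_time), ...], leftover_play_time).

    Offsets are measured from T2, the moment the audio device switch succeeds.
    """
    if switch_time <= 0 or min_play <= 0 or not 0.0 < factor < 1.0:
        raise ValueError("need switch_time > 0, min_play > 0 and 0 < factor < 1")
    played = []
    start = 0.0
    play = switch_time * factor  # S15: buffered audio compressed to factor * T
    while play >= min_play:
        # S21: play the current compressed chunk.
        # S22: meanwhile, `play` seconds of new downlink audio are captured.
        played.append((start, play))
        start += play
        # S23: compress the newly captured audio for the next pass.
        play *= factor
    return played, play
```

For a switching time of 2 seconds and a preset playback time of 50 milliseconds, `approach_schedule(2.0, 0.05)` yields chunks of 1, 0.5, 0.25, 0.125, and 0.0625 seconds, with a leftover chunk of 0.03125 seconds that falls below the threshold.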

Please refer to FIG. 4, which is a schematic diagram of a method for seamlessly switching voice during audio device switching by using an approximation method that combines buffering and voice processing, according to an embodiment of the present application.

In FIG. 4, the audio device of the mobile terminal is switched (for example, from the handset to the speaker) during the interval [T1, T2] of duration T, during which the first PCM voice data is obtained. The interval [T2, T3] also has duration T: the receiving end plays for the same length of time as the device switching time, and during this interval the second PCM voice data is obtained.

The approximation method combined with buffering and speech processing to approximate the playback time is as follows:

Within time T [T1, T2]: the first PCM voice data within [T1, T2] is buffered and passed through the two stages of speech decomposition and speech synthesis described above. After buffering and voice processing, the PCM voice data whose original playback time is T [T1, T2] (the first PCM voice data) is processed into PCM voice data PCM-T/2 whose playback time Tx is T/2. PCM-T/2 is the 1st third PCM voice data obtained after processing; its playback time is half the original playback time, that is, T/2.

Within time T/2 [T2, T2 + T/2]: playback of the above PCM voice data PCM-T/2 is completed (that is, PCM-T/2 is sent to the switched-to audio device for playback). During that playback, the PCM voice data arriving within [T2, T2 + T/2] (which is T/2 of second PCM voice data) is buffered and passed through the speech decomposition and speech synthesis stages described above. After buffering and voice processing, the PCM voice data whose original playback time is T/2 is processed into PCM voice data PCM-T/4 whose playback time Tx is T/4. PCM-T/4 is the 2nd third PCM voice data obtained after processing; its playback time is half its original playback time, that is, T/4.

Within time T/4 [T2 + T/2, T2 + T/2 + T/4]: playback of the above PCM voice data PCM-T/4 is completed (that is, PCM-T/4 is sent to the switched-to audio device for playback). During that playback, the PCM voice data arriving within [T2 + T/2, T2 + T/2 + T/4] (which is T/4 of second PCM voice data) is buffered and passed through the speech decomposition and speech synthesis stages described above. The PCM voice data whose original playback time is T/4 is processed into PCM voice data PCM-T/8 whose playback time Tx is T/8. PCM-T/8 is the 3rd third PCM voice data obtained after processing; its playback time is half its original playback time, that is, T/8.

... The approximation continues in this way until the playback time Tx is less than the preset playback time Tu, at which point it ends.

For example, if the audio device switching time is 2 seconds, then in step S1 above the first PCM voice data from the 2-second switching period has been acquired and buffered, and approximation speech processing of this 2-second PCM voice data yields the first third PCM voice data, with a playback time of 1 second. After the audio device is successfully switched, the first second PCM voice data is acquired while the first third PCM voice data is played (playback takes 1 second). The original playback time of the first second PCM voice data is 1 second; approximation speech processing reduces its playback time to 0.5 seconds, giving the second third PCM voice data.

The second second PCM voice data is acquired while the second third PCM voice data is played (playback takes 0.5 seconds). The original playback time of the second second PCM voice data is 0.5 seconds; approximation speech processing reduces its playback time to 0.25 seconds, giving the third third PCM voice data.

The third second PCM voice data is acquired while the third third PCM voice data is played (playback takes 0.25 seconds). The original playback time of the third second PCM voice data is 0.25 seconds; approximation speech processing reduces its playback time to 0.125 seconds, giving the fourth third PCM voice data.

... The approximation continues in this way until the playback time Tx is less than the preset playback time Tu (for example, 50 milliseconds), at which point the approximation ends.
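The 2-second example can be tabulated with a few lines of code. The sketch below is illustrative only: the function name `approximation_schedule` and its parameters are hypothetical, and it assumes that the round whose playback time first falls below Tu is still played before the approximation ends.

```python
def approximation_schedule(switch_time, min_play_time):
    """Return the playback time Tx of each round of the halving scheme.

    Round 1 plays the audio buffered during the switch in half its original
    time; every later round plays the audio that arrived during the previous
    round, again in half the time.
    """
    if switch_time <= 0 or min_play_time <= 0:
        raise ValueError("switch_time and min_play_time must be positive")
    schedule = []
    play_time = switch_time / 2.0
    while True:
        schedule.append(play_time)
        if play_time < min_play_time:
            break  # Tx < Tu: the approximation ends
        play_time /= 2.0
    return schedule


rounds = approximation_schedule(2.0, 0.05)
print(rounds)       # [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]
print(sum(rounds))  # 1.96875
```

For T = 2 seconds and Tu = 50 milliseconds the scheme runs six rounds and has played for just under 2 seconds when it stops.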

Measured from T2, the first PCM voice data playback is completed at T/2, with a playback time Tx of T/2; the second PCM voice data playback is completed at T/2 + T/4, with a playback time Tx of T/4; the third PCM voice data playback is completed at T/2 + T/4 + T/8, with a playback time Tx of T/8; and so on, until the N-th PCM voice data playback is completed at T/2 + T/4 + T/8 + … + T/2^N, with a playback time Tx of T/2^N. The playback times thus form a geometric sequence with first term T/2 and common ratio 1/2. According to the following geometric series summation formula, as N tends to infinity, the total playback time of the PCM voice data is T.

$$
T_N = \sum_{k=1}^{N} \frac{T}{2^{k}} = \frac{T}{2} \cdot \frac{1 - (1/2)^{N}}{1 - 1/2} = T\left(1 - \frac{1}{2^{N}}\right)
$$

In the formula, T_N is the total playback time, 1/2 is the common ratio between successive playback times, and N is the number of playbacks. As N approaches infinity, the value of T_N approaches T.

Through the above approximation method combined with buffering and speech processing, as the number of playbacks approaches infinity, all PCM voice data within the 2T interval [T1, T3] can be played within time T, without delay or loss, achieving seamless switching.

In the actual playback process, the approximation stops as soon as the playback time Tx is shorter than the preset playback time Tu (for example, 50 milliseconds). Because Tx is then shorter than Tu, the user can hardly perceive any PCM voice data as missing or lost.
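The number of rounds needed before the stop condition is met follows from the halving. As a worked sketch, assuming that the comparison Tx < Tu is applied to the playback time T/2^N of the N-th round and that T is at least Tu:

$$
\frac{T}{2^{N}} < T_u \iff N > \log_2\frac{T}{T_u}, \qquad N_{\text{stop}} = \left\lfloor \log_2\frac{T}{T_u} \right\rfloor + 1
$$

For T = 2 seconds and Tu = 50 milliseconds, log2(40) is about 5.32, so the approximation ends at N = 6 with Tx = 2/64 seconds, about 31 milliseconds. At that point the audio still waiting to be played is what arrived during the last round, which has the same duration T/2^N and is therefore also shorter than Tu.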

In an embodiment of the present application, the audio device switching time T is required to lie between a preset minimum switching time Tmin (such as 0.5 seconds) and a preset maximum switching time Tmax (such as 5 seconds), that is, Tmin < T < Tmax. When T is greater than Tmax, the switching has taken too long and the amount of PCM voice data to be processed is relatively large, so the user is advised to ask the other party to repeat what was said. When T is shorter than Tmin, the switching is very brief and hardly any PCM voice data is involved in the switching operation, so there is no need to perform seamless switching.
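The range check on the switching time can be sketched as a small decision function. The names `SwitchAction` and `classify_switch` are hypothetical, the default limits are the example values given above, and treating the boundary cases T = Tmin and T = Tmax as outside the range is an assumption, since the text only specifies the strict inequalities.

```python
from enum import Enum


class SwitchAction(Enum):
    SKIP = "skip"              # switch too short: no catch-up needed
    CATCH_UP = "catch_up"      # Tmin < T < Tmax: run the approximation
    ASK_REPEAT = "ask_repeat"  # switch too long: advise asking for a repeat


def classify_switch(switch_time, t_min=0.5, t_max=5.0):
    """Decide how to handle a switch that took switch_time seconds."""
    if switch_time <= t_min:   # boundary handling is an assumption
        return SwitchAction.SKIP
    if switch_time >= t_max:   # boundary handling is an assumption
        return SwitchAction.ASK_REPEAT
    return SwitchAction.CATCH_UP


print(classify_switch(2.0))  # SwitchAction.CATCH_UP
```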

With the above method provided in the embodiment of the present application, the first PCM voice data with a playback time of T (the audio device switching time) and the second PCM voice data, also with a playback time of T, are processed by the approximation speech processing with buffering into third PCM voice data with a total playback time of T, which is half of the original total playback time. This solves the problem of lost voice data when the audio device is switched during a call: all downlink PCM voice data is completely transmitted and played and the switch is seamless, so that the user does not lose any voice information during the device switching process. This improves the user's understanding of the conversation and the user experience.

Please refer to Figure 5. The embodiment of the present application provides a voice switching device for a voice call, which is applied to a mobile terminal. The voice switching device 300 includes a detection module 301, an acquisition module 302, a cache module 303, and a processing module 304, wherein:

The detection module 301 is configured to detect an audio device switch during a voice call;

The acquisition module 302 is configured to obtain the first PCM voice data received by the receiving end during the audio device switching process and, after the audio device is successfully switched, second PCM voice data of the same duration as the first PCM voice data;

The cache module 303 is configured to buffer the first PCM voice data and the PCM voice data during the approximation speech processing;

The processing module 304 is configured to perform approximation speech processing on the buffered first PCM voice data and on the second PCM voice data obtained after the audio device is successfully switched, so that playback of the first PCM voice data and the second PCM voice data is completed within the switching time.

In one embodiment, the detection module 301 is a proximity sensor.

It should be noted that the foregoing device embodiments and method embodiments belong to the same concept. For specific implementation processes, refer to the method embodiments, and the technical features in the method embodiments are correspondingly applicable in the device embodiments, and are not repeated here.

The technical solution of the present application is further described in detail in combination with the application examples below.

In the embodiment of the present application, an example is described in which a voice call is seamlessly switched from a handset to a speaker during a call.

Please refer to Figure 7. An application embodiment of the present application provides a voice switching method for a voice call, which is applied to a mobile terminal and includes:

S701. Detect audio device switching during a voice call. The proximity sensor detects the movement of the mobile terminal during a voice call. If a movement of the mobile terminal is detected, it is determined that an audio device switching situation has occurred.

S702. When movement of the mobile terminal is detected during the voice call and audio device switching occurs, that is, switching between the handset and the speaker, record the first time stamp T1 at which the mobile terminal is about to leave the human ear.

S703. Record the second time stamp T2 at which the audio device switch is tapped on the touch screen of the mobile terminal, that is, the moment at which the touch screen is tapped to switch from handset mode to speaker mode.

S704. Acquire the first PCM voice data within the audio device switching time T and store it in the buffer, where the audio device switching time T is the difference between the second time stamp T2 and the first time stamp T1, that is, T = T2 - T1. The audio device switching time T is also the playback time of the first PCM voice data stored in the buffer during this period.

S705. Perform approximation speech processing on the first PCM voice data to obtain the first third PCM voice data.

S706. Play the first third PCM voice data, acquire the first second PCM voice data within the playback time, and perform approximation speech processing on the first second PCM voice data to obtain the second third PCM voice data.

S707. Play the N-th third PCM voice data, acquire the N-th second PCM voice data within the playback time, and perform approximation speech processing on the N-th second PCM voice data to obtain the (N+1)-th third PCM voice data, where N >= 2.

S708. After the approximation speech processing, compare the approximated playback time with the preset playback time. If the approximated playback time is less than the preset playback time, stop the approximation and go to S709; otherwise, return to S707 and continue the approximation.

S709. The approximation ends.
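Steps S704 to S709 can be sketched as a loop over a recorded downlink stream. This is a minimal simulation under stated assumptions rather than the implementation of the embodiment: `catch_up`, `halve`, the 8 kHz sampling rate, and the list-of-samples representation are all hypothetical, and `halve` is only a stand-in that halves the sample count by averaging pairs (which also shifts pitch, unlike the speech decomposition and synthesis described in this application).

```python
def halve(chunk):
    """Stand-in for the approximation speech processing: averages sample pairs."""
    return [(chunk[i] + chunk[i + 1]) / 2.0 for i in range(0, len(chunk) - 1, 2)]


def catch_up(downlink, t1, t2, sample_rate=8000, min_play_time=0.05, compress=halve):
    """Simulate S704-S709 on a downlink sample list whose first sample is at T1.

    Returns the compressed chunks (third PCM voice data) in playback order.
    """
    n_first = int(round((t2 - t1) * sample_rate))  # S704: T = T2 - T1
    chunk = downlink[:n_first]                     # first PCM voice data
    pos = n_first                                  # next unread downlink sample
    played = []
    while chunk:
        third = compress(chunk)                    # S705 / S706 / S707
        played.append(third)                       # "play" the third PCM voice data
        # While `third` plays, as many new samples arrive as it takes to play it.
        chunk = downlink[pos:pos + len(third)]     # next second PCM voice data
        pos += len(third)
        if len(third) / sample_rate < min_play_time:  # S708: Tx < Tu
            break                                     # S709
    return played


stream = [0.0] * 40000                  # 5 seconds of silence at 8 kHz
chunks = catch_up(stream, t1=0.0, t2=2.0)
print([len(c) for c in chunks])         # [8000, 4000, 2000, 1000, 500, 250]
```

Passing a different function as `compress` swaps the stand-in for any processing that roughly halves the playback time of a chunk.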

In addition, an embodiment of the present application further provides a terminal. As shown in FIG. 6, the terminal 900 includes a memory 902, a processor 901, and one or more computer programs stored in the memory 902 and executable on the processor 901. The memory 902 and the processor 901 are coupled together through a bus system 903, and the one or more computer programs, when executed by the processor 901, implement the steps of the voice switching method for a voice call provided by the embodiments of the present application.

That is, when the processor 901 runs the computer program, the steps of the method of the embodiments of the present application are implemented.

The methods disclosed in the embodiments of the present application may be applied to the processor 901, or implemented by the processor 901. The processor 901 may be an integrated circuit chip and has a signal processing capability. In the implementation process, each step of the foregoing method may be completed by using an integrated logic circuit of hardware in the processor 901 or an instruction in the form of software. The processor 901 may be a general-purpose processor, a DSP, or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. The processor 901 may implement or execute various methods, steps, and logic block diagrams disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. In combination with the steps of the method disclosed in the embodiments of the present application, the steps may be directly implemented by a hardware decoding processor, or may be performed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium. The storage medium is located in the memory 902. The processor 901 reads the information in the memory 902 and completes the steps of the foregoing method in combination with its hardware.

It can be understood that the memory 902 in the embodiment of the present application may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a ferromagnetic random access memory (FRAM), a flash memory or other memory technology, a compact disc read-only memory (CD-ROM), a digital video disk (DVD) or other optical disk storage, a magnetic cassette, magnetic tape, magnetic disk storage or other magnetic storage device. The volatile memory may be a random access memory (RAM). By way of example but not limitation, many forms of RAM are available, such as static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDRSDRAM), enhanced synchronous dynamic random access memory (ESDRAM), SyncLink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM). The memories described in the embodiments of the present application are intended to include, but are not limited to, these and any other suitable types of memory.

It should be noted that the above-mentioned terminal embodiments and method embodiments belong to the same concept. For specific implementation processes, refer to the method embodiments, and the technical features in the method embodiments are correspondingly applicable in the terminal embodiments, and are not repeated here.

In addition, in an exemplary embodiment, an embodiment of the present application further provides a computer storage medium, specifically a computer-readable storage medium, such as the memory 902 storing a computer program. The computer storage medium stores one or more programs of the voice switching method for a voice call, and when the one or more programs are executed by the processor 901, the steps of the voice switching method for a voice call provided by the embodiments of the present application are implemented.

That is, when one or more programs of the voice switching method of the voice call are executed by the processor 901, the method provided in the embodiment of the present application is implemented.

It should be noted that the foregoing computer-readable storage medium embodiment, which stores the program of the voice switching method for a voice call, and the method embodiments belong to the same concept. For the specific implementation process, refer to the method embodiments; the technical features in the method embodiments are correspondingly applicable in the computer-readable storage medium embodiment and are not repeated here.

The voice switching method, device, terminal, and computer-readable storage medium for a voice call provided by the present application acquire, and perform approximation speech processing on, the first PCM voice data received by the receiving end during the audio device switching process, and store it in a buffer; they then acquire, and perform approximation speech processing on, the second PCM voice data of the same duration as the first PCM voice data received by the receiving end after the audio device is successfully switched, so that playback of the first PCM voice data and the second PCM voice data is completed within the switching time, achieving seamless switching. This approximation speech processing solves the problem of lost voice data when the audio device is switched during a call: all downlink PCM voice data is completely transmitted and played, so the user does not lose any voice information during the device switching process, which improves the user's understanding of the conversation and the user experience.

It should be noted that, in this document, the terms "comprising", "including" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further restrictions, an element qualified by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.

The above-mentioned serial numbers of the embodiments of the present application are merely for description, and do not represent the superiority or inferiority of the embodiments.

Through the description of the above embodiments, those skilled in the art can clearly understand that the methods in the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on such an understanding, the technical solution of this application, in essence or in the part that contributes over the existing technology, can be embodied in the form of a software product. The software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the methods described in the embodiments of the present application.

The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the specific implementations described above, which are only illustrative and not restrictive. Under the teaching of this application, those of ordinary skill in the art can devise many other forms without departing from the scope of the present application and the scope of protection of the claims, and all of these fall within the protection of this application.

Claims

1. A voice switching method, comprising:

acquiring, and performing approximation speech processing on, first PCM voice data received by a receiving end during an audio device switching process, and storing it in a buffer;

acquiring, and performing approximation speech processing on, second PCM voice data of the same duration as the first PCM voice data received by the receiving end after the audio device is successfully switched, so that playback of the first PCM voice data and the second PCM voice data is completed within the switching time.
2. The method according to claim 1, wherein before the step of acquiring, and performing approximation speech processing on, the first PCM voice data received by the receiving end during the audio device switching process and storing it in the buffer, the method further comprises: detecting audio device switching during a voice call.
3. The method according to claim 1, wherein the step of acquiring, and performing approximation speech processing on, the first PCM voice data received by the receiving end during the audio device switching process and storing it in the buffer comprises:

when the audio device switching occurs, recording a first time stamp T1 at which the mobile terminal is about to leave the human ear;

recording a second time stamp T2 at which the audio device switch is tapped on the touch screen of the mobile terminal;

acquiring the first PCM voice data within the audio device switching time T and storing it in the buffer, wherein the audio device switching time T = T2 - T1;

performing approximation speech processing on the first PCM voice data to obtain the first third PCM voice data.
4. The method according to claim 3, wherein the step of acquiring, and performing approximation speech processing on, the second PCM voice data of the same duration as the first PCM voice data received by the receiving end after the audio device is successfully switched, so that playback of the first PCM voice data and the second PCM voice data is completed within the switching time, comprises:

playing the N-th third PCM voice data;

acquiring, within the playback time, the N-th second PCM voice data received by the receiving end after the audio device is successfully switched;

performing approximation speech processing on the N-th second PCM voice data to obtain the (N+1)-th third PCM voice data;

performing the above steps in a loop until N equals the number of playbacks at which the playback time is shorter than the preset playback time, where N is an integer greater than or equal to 1.
5. The method according to any one of claims 1 to 4, wherein the approximation speech processing is: approximating the playback time of PCM voice data by using an approximation method in combination with speech processing, to obtain approximated PCM voice data.
6. The method according to claim 5, wherein the speech processing is divided into two stages, speech decomposition and speech synthesis, wherein:

in the speech decomposition stage, the original PCM voice data is divided into frames, and the resulting frames are used for the speech synthesis processing; the frame length is N and the frame shift is Sa;

in the speech synthesis stage, the position of the first frame from the speech decomposition stage is kept unchanged, and for each subsequent frame the frame shift Sa of the speech decomposition stage is changed to the frame shift Ss of the speech synthesis stage according to the shift factor a = Ss/Sa, that is, Ss = Sa * a.
7. A voice switching device, comprising an acquisition module, a cache module, and a processing module, wherein:

the acquisition module is configured to obtain first PCM voice data received by a receiving end during an audio device switching process and, after the audio device is successfully switched, second PCM voice data of the same duration as the first PCM voice data;

the cache module is configured to buffer the first PCM voice data and the PCM voice data during the approximation speech processing;

the processing module is configured to perform approximation speech processing on the first PCM voice data and the second PCM voice data, so that playback of the first PCM voice data and the second PCM voice data is completed within the switching time.
8. The device according to claim 7, wherein the device further comprises a detection module configured to detect audio device switching during a voice call.
9. A terminal, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the computer program is executed by the processor, the steps of the voice switching method according to any one of claims 1 to 6 are implemented.
10. A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the steps of the voice switching method according to any one of claims 1 to 6 are implemented.