WO2020078267A1 - Method and device for processing voice data in an online translation process - Google Patents

Method and device for processing voice data in an online translation process

Info

Publication number
WO2020078267A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice data
terminal
data packet
attribute value
threshold
Application number
PCT/CN2019/110556
Other languages
English (en)
French (fr)
Inventor
Zhang Xin (张鑫)
Yan Wei (闫伟)
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2020078267A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/005 Language recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L2025/783 Detection of presence or absence of voice signals based on threshold decision
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2242/00 Special services or facilities
    • H04M2242/12 Language recognition, selection or translation arrangements

Definitions

  • the embodiments of the present application relate to the field of communication technologies, and in particular, to a method and device for processing voice data in an online translation process.
  • Two users who make a call through the terminal can perform online translation to translate their voices into the other's native language when the two parties' native languages are different, so as to eliminate the language barrier between the two parties.
  • during a call between the two parties, when the voice data collected by terminal 1 includes invalid voice data (for example, audio data collected by the terminal while the user has not spoken for a period of time during the call), the terminal still uploads that voice data to the translation server so that the translation server translates it. Because translation service providers usually charge according to the volume or duration of voice data, uploading invalid voice data to the translation server makes the terminal consume more traffic, thereby increasing call costs.
  • the embodiments of the present application provide a voice data processing method and device in an online translation process, which can save the traffic consumption of online translation.
  • an embodiment of the present application provides a voice data processing method in an online translation process, which is applied to a scenario where a terminal performs voice communication.
  • the method includes: the terminal obtains a first voice data packet, where the first voice data packet includes voice data within a preset time period; and when the terminal determines that there is no valid voice data in the first voice data packet, the terminal stops uploading voice data to the translation server.
  • the terminal may determine whether there is valid voice data in the first voice data packet; if there is none, the terminal stops uploading voice data to the translation server. Because the terminal itself determines whether to stop uploading voice data to the translation server, the terminal is more intelligent; and when there is no valid voice data in the first voice data packet, the terminal no longer needs to upload voice data to the translation server, which saves the data consumption of online translation.
  • the method for the terminal to determine whether valid voice data exists in the first voice data packet may include: the terminal determines whether the first attribute value of the first voice data packet satisfies a preset condition, where the preset condition includes one or more of the following conditions: a discrete condition, a continuity condition, and a compactness condition. If the first attribute value of the first voice data packet does not satisfy the preset condition, the terminal determines that there is no valid voice data in the first voice data packet; if the first attribute value of the first voice data packet satisfies the preset condition, the terminal determines that there is valid voice data in the first voice data packet.
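A minimal sketch of how a stream of audio samples might be cut into voice data packets that each cover a preset time period. The function name `split_into_packets`, the sample rate, and the 500 ms period used below are illustrative assumptions; the application does not specify them.

```python
from typing import Iterator, List, Sequence


def split_into_packets(samples: Sequence[float], sample_rate: int,
                       packet_ms: int) -> Iterator[List[float]]:
    """Yield consecutive packets, each covering `packet_ms` milliseconds of audio.

    Hypothetical helper: the last packet is shorter when the stream length
    is not a multiple of the packet size.
    """
    per_packet = sample_rate * packet_ms // 1000  # samples per preset time period
    if per_packet <= 0:
        raise ValueError("packet duration too short for this sample rate")
    for start in range(0, len(samples), per_packet):
        yield list(samples[start:start + per_packet])
```

With 16 kHz audio and a 500 ms period, each full packet holds 8000 samples; keeping a trailing partial packet rather than dropping it is one possible design choice.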
  • the voice data processing method in the online translation process further includes: when the terminal determines that there is valid voice data in the first voice data packet, the terminal continues to upload voice data to the translation server, so that the translation server continues to translate the voice data uploaded by the terminal and the terminal's voice communication proceeds smoothly.
  • after the terminal stops sending voice data to the translation server, when the terminal obtains a new voice data packet (called a second voice data packet) and determines that there is valid voice data in the second voice data packet, the terminal restores the connection with the translation server and starts uploading the second voice data packet to the translation server.
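The stop/resume behaviour described above can be sketched as a small state machine. This is a hypothetical illustration: `UploadController` and the `send` callback are invented names, the valid-voice decision is assumed to come from a separate detector, and "restoring the connection" is modelled only as a state change rather than an actual reconnection.

```python
from enum import Enum, auto
from typing import Any, Callable


class UploadState(Enum):
    UPLOADING = auto()
    STOPPED = auto()


class UploadController:
    """Decides, packet by packet, whether voice data goes to the translation server."""

    def __init__(self, send: Callable[[Any], None]) -> None:
        self.send = send  # hypothetical callback that uploads one packet
        self.state = UploadState.UPLOADING

    def on_packet(self, packet: Any, has_valid_voice: bool) -> None:
        if has_valid_voice:
            if self.state is UploadState.STOPPED:
                # A new packet with valid voice data: resume uploading.
                self.state = UploadState.UPLOADING
            self.send(packet)
        else:
            # No valid voice data: stop uploading and do not send this packet.
            self.state = UploadState.STOPPED
```

For example, feeding packets flagged valid, invalid, invalid, valid results in only the first and last packets being sent.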
  • the voice data processing method in the online translation process further includes: the terminal sends the first voice data packet to the translation server and receives the translation result of the first voice data packet from the translation server; when the terminal determines that the first voice data packet meets a first condition, the terminal stops uploading voice data to the translation server. The first condition includes one or more of the following conditions: there is no valid voice data in the first voice data packet; the translation result of the first voice data packet is empty.
  • when the terminal determines that there is no valid voice data in the first voice data packet (that is, the first attribute value of the first voice data packet does not satisfy the preset condition), the terminal may also take into account the translation result returned by the translation server to determine whether to stop uploading voice data to the translation server.
  • the method for the terminal to stop uploading voice data to the translation server may include: the terminal displays a first prompt box; the terminal receives the user's first operation on the first prompt box and, in response to the first operation, disconnects from the translation server.
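A sketch of the first condition, assuming (this is not stated in the application) that whichever sub-conditions are enabled must all hold before uploading stops, and that an empty translation result means `None` or a whitespace-only string. The function name and flags are hypothetical.

```python
from typing import List, Optional


def meets_first_condition(has_valid_voice: bool,
                          translation_result: Optional[str],
                          use_vad: bool = True,
                          use_translation: bool = True) -> bool:
    """Return True when every enabled sub-condition of the first condition holds."""
    checks: List[bool] = []
    if use_vad:
        # Sub-condition 1: no valid voice data in the packet.
        checks.append(not has_valid_voice)
    if use_translation:
        # Sub-condition 2: the server returned an empty translation.
        checks.append(translation_result is None or translation_result.strip() == "")
    if not checks:
        raise ValueError("enable at least one sub-condition")
    return all(checks)
```

With both sub-conditions enabled, a packet the detector marks as silent still keeps uploading if the server nevertheless returned a non-empty translation, which is one way to combine the two signals as the text describes.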
  • the terminal may prompt the user whether the connection with the translation server needs to be disconnected, so that the user experience can be improved.
  • the method for the terminal to stop uploading voice data to the translation server may include: the terminal maintains a connection with the translation server, and the terminal stops sending voice data to the translation server.
  • the terminal may continue to maintain a connection with the translation server, but the terminal stops uploading voice data to the translation server. In this case, the terminal will continue to acquire the voice data packet sent by the user, and determine whether the first attribute value of the newly acquired voice data packet meets the preset condition.
  • the preset conditions include the discrete condition, the continuity condition, and the compactness condition; if the first attribute value of the first voice data packet satisfies none of the discrete condition, the continuity condition, and the compactness condition, the terminal determines that there is no valid voice data in the first voice data packet.
  • in this way, the terminal can determine more accurately that there is no valid voice data in the first voice data packet.
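The rule above (no valid voice data only when the discrete, continuity, and compactness conditions all fail) is equivalent to saying a packet contains valid voice data if at least one condition holds. A minimal sketch, in which each condition is a hypothetical caller-supplied callable over the packet's per-frame attribute values:

```python
from typing import Callable, Sequence

# A condition inspects the per-frame attribute values of one packet.
Condition = Callable[[Sequence[float]], bool]


def packet_has_valid_voice(attribute_values: Sequence[float],
                           conditions: Sequence[Condition]) -> bool:
    """Return True if at least one preset condition is satisfied.

    Equivalently, the packet is judged to contain no valid voice data only
    when every condition fails.
    """
    return any(condition(attribute_values) for condition in conditions)
```

A caller would pass the discrete, continuity, and compactness checks as the three callables; `any` also stops at the first satisfied condition.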
  • the first attribute value of the first voice data packet includes the signal-to-noise ratio of each data frame of the first voice data packet.
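The application says only that the first attribute value is the signal-to-noise ratio of each data frame; it does not say how the SNR is estimated. The sketch below assumes a fixed frame length and an externally supplied noise power estimate (mean square amplitude), and computes SNR in decibels as 10 * log10(frame power / noise power). The function and parameter names are illustrative.

```python
import math
from typing import List, Sequence


def frame_snr_db(packet: Sequence[float], frame_len: int, noise_power: float,
                 floor: float = 1e-12) -> List[float]:
    """Return one SNR value in dB for each frame of `packet`.

    `noise_power` is an assumed mean-square noise estimate obtained elsewhere
    (for example, from a stretch of audio known to contain no speech).
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    snrs = []
    for start in range(0, len(packet), frame_len):
        frame = packet[start:start + frame_len]
        power = sum(x * x for x in frame) / len(frame)  # mean-square frame power
        # The floor avoids log10(0) for silent frames or a zero noise estimate.
        snrs.append(10.0 * math.log10(max(power, floor) / max(noise_power, floor)))
    return snrs
```

For a frame of constant amplitude 1.0 against a noise power of 0.01, this gives 20 dB; a frame at the noise level gives about 0 dB.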
  • the above-mentioned discrete conditions include: the variance of the first attribute value of the first voice data packet is greater than the variance threshold.
  • when the variance of the first attribute value of the first voice data packet is greater than the variance threshold (that is, the discrete condition is satisfied), the first attribute value of the first voice data packet is discrete, which indicates that there is valid voice data in the first voice data packet (that is, the user has spoken); when the variance of the first attribute value of the first voice data packet is less than or equal to the variance threshold (that is, the discrete condition is not satisfied), the first attribute value is not discrete, which indicates that there is no valid voice data in the first voice data packet (that is, the user has not spoken).
  • when the continuity count of the first attribute value of the first voice data packet is greater than the continuity threshold (that is, the continuity condition is satisfied), the first attribute value has continuity, which indicates that there is valid voice data in the first voice data packet (that is, the user has spoken); when the continuity count of the first attribute value is less than or equal to the continuity threshold (that is, the continuity condition is not satisfied), the first attribute value does not have continuity, which indicates that there is no valid voice data in the first voice data packet (that is, the user has not spoken).
  • the first voice data packet has N first attribute values, where N is an integer greater than or equal to 1.
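A minimal sketch of the discrete condition, applied to the per-frame SNR values of one packet. The application does not say whether population or sample variance is meant; this sketch assumes population variance (`statistics.pvariance`), and the threshold value is left to the caller.

```python
from statistics import pvariance
from typing import Sequence


def satisfies_discrete_condition(snr_values: Sequence[float],
                                 variance_threshold: float) -> bool:
    """Discrete condition: variance of the per-frame SNR values > threshold.

    Requires at least one value; a variance equal to the threshold does not
    satisfy the condition.
    """
    return pvariance(snr_values) > variance_threshold
```

Steady background noise yields nearly constant SNR values and a variance near zero, while speech makes the per-frame SNR rise and fall, which is the intuition behind using variance here.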
  • the first attribute value begins to redetermine the continuity count.
  • if the discrete count of the first attribute values is greater than the discrete threshold, that is, the first i first attribute values do not have continuity, there is no valid voice data in the voice data corresponding to those first i first attribute values; therefore, the continuity count and the discrete count of the first attribute values are cleared, and the terminal continues to determine whether the remaining N-i first attribute values have continuity.
  • introducing the discrete count and the discrete threshold of the first attribute values, as described above, makes it possible to determine more accurately whether the first attribute values have continuity.
  • when the compactness count of the first attribute value of the first voice data packet is greater than the compactness threshold (that is, the compactness condition is satisfied), the first attribute value has compactness, which indicates that there is valid voice data in the first voice data packet (that is, the user has spoken); when the compactness count of the first attribute value is less than or equal to the compactness threshold (that is, the compactness condition is not satisfied), the first attribute value does not have compactness, which indicates that there is no valid voice data in the first voice data packet (that is, the user has not spoken).
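The continuity count with its discrete-count reset might be sketched as follows. The application does not say how an individual value adds to either count, so this sketch assumes a hypothetical per-frame threshold `frame_snr_threshold`: a value above it increments the continuity count, and a value at or below it increments the discrete count. All names and thresholds are illustrative.

```python
from typing import Sequence


def satisfies_continuity_condition(snr_values: Sequence[float],
                                   frame_snr_threshold: float,
                                   continuity_threshold: int,
                                   discrete_threshold: int) -> bool:
    """Return True once the continuity count exceeds `continuity_threshold`."""
    continuity = 0
    discrete = 0
    for value in snr_values:
        if value > frame_snr_threshold:
            continuity += 1
            if continuity > continuity_threshold:
                return True
        else:
            discrete += 1
            if discrete > discrete_threshold:
                # The values seen so far lack continuity: clear both counts
                # and keep examining the remaining values.
                continuity = 0
                discrete = 0
    return False
```

With a discrete threshold of 1, a single low-SNR frame inside speech is tolerated, but two such frames clear the accumulated continuity count, which mirrors the reset the text describes.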
  • an embodiment of the present application provides a terminal, including one or more processors, a memory, a communication interface, and a microphone. The memory and the communication interface are coupled to the processor; the microphone is used to capture voice data; the memory is used to store computer program code, which includes computer instructions. When the processor executes the computer instructions, the processor is used to control the microphone to obtain a first voice data packet, where the first voice data packet includes voice data within a preset time period; the processor is further used to stop uploading voice data to the translation server if it determines that there is no valid voice data in the first voice data packet.
  • the above processor is specifically configured to determine whether the first attribute value of the first voice data packet acquired by the microphone meets a preset condition, where the preset condition includes one or more of the following conditions: a discrete condition, a continuity condition, and a compactness condition. If the first attribute value of the first voice data packet does not satisfy the preset condition, the processor determines that there is no valid voice data in the first voice data packet; if the first attribute value of the first voice data packet satisfies the preset condition, the processor determines that there is valid voice data in the first voice data packet.
  • the above processor is further used to continue uploading voice data to the translation server through the communication interface when it determines that there is valid voice data in the first voice data packet.
  • the above processor is further used to send the first voice data packet to the translation server through the communication interface and to receive the translation result of the first voice data packet from the translation server; the processor is also used to stop uploading voice data to the translation server when the first voice data packet meets a first condition, where the first condition includes one or more of the following conditions: there is no valid voice data in the first voice data packet; the translation result of the first voice data packet is empty.
  • the terminal provided by the embodiment of the present application further includes a touch screen; the above processor is also used to control the touch screen to display a first prompt box, to receive the user's first operation on the first prompt box displayed on the touch screen, and, in response to the first operation, to disconnect from the translation server.
  • the above processor is also used to maintain the connection with the translation server and stop sending voice data to the translation server.
  • the preset conditions include the discrete condition, the continuity condition, and the compactness condition; the processor is specifically used to determine that there is no valid voice data in the first voice data packet when the first attribute value of the first voice data packet satisfies none of the discrete condition, the continuity condition, and the compactness condition.
  • the first attribute value of the first voice data packet includes the signal-to-noise ratio of each data frame of the first voice data packet.
  • the above-mentioned discrete conditions include: the variance of the first attribute value of the first voice data packet is greater than the variance threshold.
  • an embodiment of the present application provides a computer storage medium including computer instructions.
  • when the computer instructions run on a terminal, the terminal executes the voice data processing method in the online translation process described in any one of the first aspect and its possible designs.
  • an embodiment of the present application provides a computer program product that, when run on a computer, causes the computer to perform the voice data processing method in the online translation process described in any one of the first aspect and its possible designs.
  • FIG. 1 is a schematic hardware diagram of a terminal according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram 1 of an example of a voice communication scenario provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram 2 of an example of a voice communication scenario provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram 1 of a voice data processing method in an online translation process provided by an embodiment of this application;
  • FIG. 5 is a second schematic diagram of a voice data processing method in an online translation process provided by an embodiment of this application.
  • FIG. 6 is a schematic diagram of an example of a display interface provided by the prior art
  • FIG. 7 is a schematic diagram 3 of a voice data processing method in an online translation process provided by an embodiment of this application.
  • FIG. 8 is a schematic structural diagram of a terminal according to an embodiment of the present application.
  • multiple processing units refer to two or more processing units; multiple systems refer to two or more systems.
  • Embodiments of the present application provide a method and device for processing voice data in an online translation process, which can be applied to voice communication between two terminals. The terminal uses voice activity detection (VAD) technology to determine whether the voice data it collects contains valid voice data, and then determines whether to upload the voice data to the translation server. Specifically, after the terminal acquires voice data within a preset time period (hereinafter referred to as the first voice data packet), if the terminal determines that there is no valid voice data (that is, real voice data) in the first voice data packet, the terminal stops uploading voice data to the translation server.
  • the terminal may determine whether the first attribute value of the first voice data packet meets the preset condition; if it does not, there is no valid voice data in the first voice data packet, and the terminal stops uploading voice data to the translation server, that is, the terminal does not subsequently upload voice data to the translation server.
  • the preset condition includes one or more of the following conditions: discrete condition, continuity condition, and compactness condition.
  • if the first attribute value of the first voice data packet satisfies the preset condition, the terminal continues or starts uploading voice data to the translation server.
  • when the terminal determines that a voice data packet it has obtained contains no valid voice data, it stops uploading voice data to the translation server.
  • the terminal is more intelligent, and the data consumption of online translation can be saved.
  • the terminal in the embodiments of the present application may be a portable computer (such as a mobile phone), a notebook computer, a personal computer (PC), a wearable terminal (such as a smart watch), a tablet computer, an augmented reality (AR)/virtual reality (VR) device, an in-vehicle computer, or the like; the following embodiments do not specifically limit the form of the terminal.
  • AR augmented reality
  • VR virtual reality
  • FIG. 1 illustrates a schematic structural diagram of a terminal 100 provided by an embodiment of the present application.
  • the terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on.
  • SIM subscriber identification module
  • the sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the terminal 100.
  • the terminal 100 may include more or less components than shown, or combine some components, or split some components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units.
  • the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • ISP image signal processor
  • DSP digital signal processor
  • NPU neural-network processing unit
  • different processing units may be independent devices, or may be integrated in one or more processors.
  • the above controller may be the nerve center and command center of the terminal 100.
  • the controller can generate the operation control signal according to the instruction operation code and the timing signal to complete the control of fetching instructions and executing instructions.
  • the processor 110 may also be provided with a memory for storing instructions and data.
  • the memory in the processor 110 is a cache memory.
  • the memory may store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from this memory, which avoids repeated access, reduces the waiting time of the processor 110, and thereby improves system efficiency.
  • the processor 110 may include one or more interfaces.
  • Interfaces can include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • I2C inter-integrated circuit
  • I2S inter-integrated circuit sound
  • PCM pulse code modulation
  • UART universal asynchronous receiver/transmitter
  • MIPI mobile industry processor interface
  • GPIO general-purpose input / output
  • SIM subscriber identity module
  • USB universal serial bus
  • the I2C interface is a bidirectional synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL).
  • the processor 110 may include multiple sets of I2C buses.
  • the processor 110 may be coupled to the touch sensor 180K, the charger, the flash, the camera 193, etc. through different I2C bus interfaces.
  • the processor 110 may couple the touch sensor 180K through the I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to realize the touch function of the terminal 100.
  • the I2S interface can be used for audio communication.
  • the processor 110 may include multiple sets of I2S buses.
  • the processor 110 may be coupled to the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170.
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the I2S interface, to realize the function of answering the phone call through the Bluetooth headset.
  • the PCM interface can also be used for audio communication, to sample, quantize, and encode analog signals.
  • the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
  • the audio module 170 can also transmit audio signals to the wireless communication module 160 through the PCM interface to realize the function of answering the call through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • the UART interface is generally used to connect the processor 110 and the wireless communication module 160.
  • the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function.
  • the audio module 170 may transmit audio signals to the wireless communication module 160 through the UART interface, so as to realize the function of playing music through the Bluetooth headset.
  • the MIPI interface can be used to connect the processor 110 to peripheral devices such as the display screen 194 and the camera 193.
  • MIPI interface includes camera serial interface (camera serial interface, CSI), display serial interface (display serial interface, DSI) and so on.
  • the processor 110 and the camera 193 communicate through a CSI interface to implement the shooting function of the terminal 100.
  • the processor 110 and the display screen 194 communicate through a DSI interface to realize the display function of the terminal 100.
  • the GPIO interface can be configured via software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface may be used to connect the processor 110 to the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like.
  • GPIO interface can also be configured as I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 130 is an interface that conforms to the USB standard, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, etc.
  • the USB interface 130 may be used to connect a charger to charge the terminal 100, or may be used to transfer data between the terminal 100 and peripheral devices. It can also be used to connect headphones and play audio through the headphones.
  • the interface can also be used to connect other terminals, such as AR devices.
  • the interface connection relationship between the modules illustrated in the embodiments of the present application is only a schematic description, and does not constitute a limitation on the structure of the terminal 100.
  • the terminal 100 may also use interface connection methods different from those in the foregoing embodiments, or a combination of multiple interface connection methods.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger.
  • the charging management module 140 may receive the charging input of the wired charger through the USB interface 130.
  • the charging management module 140 may receive wireless charging input through the wireless charging coil of the terminal 100. While the charging management module 140 charges the battery 142, it can also supply power to the terminal through the power management module 141.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and / or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160.
  • the power management module 141 can also be used to monitor battery capacity, battery cycle times, battery health status (leakage, impedance) and other parameters.
  • the power management module 141 may also be disposed in the processor 110.
  • the power management module 141 and the charging management module 140 may also be set in the same device.
  • the wireless communication function of the terminal 100 can be realized by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
  • Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the terminal 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 may provide a wireless communication solution including 2G / 3G / 4G / 5G and the like applied to the terminal 100.
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and the like.
  • the mobile communication module 150 can receive the electromagnetic wave from the antenna 1, filter and amplify the received electromagnetic wave, and transmit it to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modulation and demodulation processor and convert it to electromagnetic wave radiation through the antenna 1.
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110.
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low-frequency baseband signal to be transmitted into a high-frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
  • the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is processed by the baseband processor and then passed to the application processor.
  • the application processor outputs a sound signal through an audio device (not limited to a speaker 170A, a receiver 170B, etc.), or displays an image or video through a display screen 194.
  • the modem processor may be an independent device.
  • the modem processor may be independent of the processor 110, and may be set in the same device as the mobile communication module 150 or other functional modules.
  • The wireless communication module 160 can provide wireless communication solutions applied to the terminal 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite systems (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives the electromagnetic wave via the antenna 2, frequency-modulates and filters the electromagnetic wave signal, and sends the processed signal to the processor 110.
  • the wireless communication module 160 may also receive the signal to be transmitted from the processor 110, frequency-modulate it, amplify it, and convert it to electromagnetic waves through the antenna 2 to radiate it out.
  • the antenna 1 of the terminal 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the terminal 100 can communicate with the network and other devices through wireless communication technology.
  • The wireless communication technology may include the global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology.
  • The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite-based augmentation systems (SBAS).
  • the terminal 100 implements a display function through a GPU, a display screen 194, and an application processor.
  • the GPU is a microprocessor for image processing, connecting the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations, and is used for graphics rendering.
  • the processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos and the like.
  • the display screen 194 includes a display panel.
  • The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like.
  • the terminal 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
  • the terminal 100 can realize a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
  • The ISP processes data fed back by the camera 193. For example, when a photo is taken, the shutter opens and light passes through the lens onto the camera's photosensitive element, which converts the optical signal into an electrical signal and transmits it to the ISP; the ISP processes the signal and converts it into an image visible to the naked eye.
  • The ISP can also apply algorithmic optimization to image noise, brightness, and skin tone, and can optimize shooting-scene parameters such as exposure and color temperature. In some embodiments, the ISP may be disposed in the camera 193.
  • the camera 193 is used to capture still images or videos.
  • An optical image of the object is formed through the lens and projected onto the photosensitive element.
  • The photosensitive element may be a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV.
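  • The source does not say which conversion matrix the DSP applies. As an illustration only, the sketch below assumes the ITU-R BT.601 luma weights and the corresponding analog YUV matrix for normalized RGB input; the function name `rgb_to_yuv` is hypothetical.

```python
def rgb_to_yuv(r: float, g: float, b: float) -> tuple:
    """Convert normalized RGB (each channel in 0..1) to analog YUV.

    Assumes BT.601 luma weights; real ISPs/DSPs may use other matrices
    (for example BT.709) or integer fixed-point variants.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b          # luma
    u = -0.14713 * r - 0.28886 * g + 0.436 * b     # blue-difference chroma
    v = 0.615 * r - 0.51499 * g - 0.10001 * b      # red-difference chroma
    return y, u, v


# Usage: pure white has full luma and (almost) zero chroma.
y, u, v = rgb_to_yuv(1.0, 1.0, 1.0)
print(round(y, 4), round(u, 4), round(v, 4))
```

  • Because the weights of each chroma row sum to (almost) zero, any gray input yields near-zero U and V, which is why white maps to full luma only.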
  • the terminal 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
  • The digital signal processor is used to process digital signals; in addition to digital image signals, it can process other digital signals. For example, when the terminal 100 performs frequency selection, the digital signal processor performs a Fourier transform on the energy at the selected frequency point.
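  • To illustrate what a Fourier transform on the energy at a frequency point can look like, here is a minimal single-bin DFT sketch. It assumes the signal is a list of real samples and that the frequency point maps to the nearest DFT bin; `bin_energy` is a hypothetical name, and production DSP code would more likely use an FFT or the Goertzel algorithm.

```python
import cmath
import math


def bin_energy(samples, sample_rate, freq_hz):
    """Return |X[k]|^2 for the DFT bin k nearest to freq_hz.

    Evaluates a single DFT bin directly, costing O(n) per bin.
    """
    n = len(samples)
    k = round(freq_hz * n / sample_rate)  # nearest bin index
    acc = 0j
    for t, x in enumerate(samples):
        acc += x * cmath.exp(-2j * math.pi * k * t / n)
    return abs(acc) ** 2


# Usage: a 1 kHz tone sampled at 8 kHz concentrates its energy in the 1 kHz bin.
fs = 8000
tone = [math.cos(2 * math.pi * 1000 * t / fs) for t in range(800)]
print(bin_energy(tone, fs, 1000) > bin_energy(tone, fs, 2000))
```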
  • The video codec is used to compress or decompress digital video.
  • The terminal 100 may support one or more video codecs, so that it can play or record video in multiple encoding formats, such as Moving Picture Experts Group (MPEG)-1, MPEG-2, MPEG-3, and MPEG-4.
  • The NPU is a neural-network (NN) computing processor.
  • The NPU can implement intelligent cognition applications of the terminal 100, such as image recognition, face recognition, speech recognition, and text understanding.
  • the external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to achieve expansion of the storage capacity of the terminal 100.
  • The external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example, saving files such as music and videos on the external memory card.
  • the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
  • the processor 110 executes instructions stored in the internal memory 121 to execute various functional applications and data processing of the terminal 100.
  • the internal memory 121 may include a storage program area and a storage data area.
  • The storage program area may store an operating system, application programs required by at least one function (such as a sound playback function or an image playback function), and the like.
  • the storage data area may store data (such as audio data, phone book, etc.) created during the use of the terminal 100 and the like.
  • The internal memory 121 may include a high-speed random access memory and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
  • The terminal 100 may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the application processor, and the like.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and also used to convert analog audio input into digital audio signal.
  • the audio module 170 can also be used to encode and decode audio signals.
  • the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
  • The speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • The user can listen to music or to a hands-free call through the speaker 170A of the terminal 100.
  • The receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • When the terminal 100 answers a call or plays a voice message, the user can listen by holding the receiver 170B close to the ear.
  • The microphone 170C, also called a "mic" or "mike", is used to convert sound signals into electrical signals.
  • The user can speak with the mouth close to the microphone 170C to input a sound signal into the microphone 170C.
  • the terminal 100 may be provided with at least one microphone 170C. In other embodiments, the terminal 100 may be provided with two microphones 170C. In addition to collecting sound signals, it may also implement a noise reduction function. In other embodiments, the terminal 100 may also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
  • the headset interface 170D is used to connect wired headsets.
  • The headset jack 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
  • the pressure sensor 180A may be provided on the display screen 194.
  • For example, a capacitive pressure sensor may include at least two parallel plates made of conductive material. When force is applied to the pressure sensor 180A, the capacitance between the electrodes changes.
  • the terminal 100 determines the intensity of the pressure according to the change in capacitance.
  • the terminal 100 detects the intensity of the touch operation according to the pressure sensor 180A.
  • the terminal 100 may calculate the touched position based on the detection signal of the pressure sensor 180A.
  • touch operations that act on the same touch position but have different touch operation intensities may correspond to different operation instructions. For example, when a touch operation with a touch operation intensity less than the first pressure threshold acts on the short message application icon, an instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
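  • The intensity-dependent rule above can be sketched as a small dispatch function. The threshold value, the normalized intensity scale, and the action names are hypothetical; the source only specifies that an intensity below the first pressure threshold views a short message and an intensity at or above it creates a new one.

```python
FIRST_PRESSURE_THRESHOLD = 0.5  # hypothetical value on a normalized 0..1 scale


def message_icon_action(intensity: float,
                        threshold: float = FIRST_PRESSURE_THRESHOLD) -> str:
    """Map a touch on the short message icon to an instruction by intensity."""
    if intensity < threshold:
        return "view_short_message"
    # Intensity greater than or equal to the first pressure threshold.
    return "create_short_message"


print(message_icon_action(0.2), message_icon_action(0.5))
```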
  • the gyro sensor 180B may be used to determine the movement posture of the terminal 100.
  • In some embodiments, the angular velocity of the terminal 100 about three axes (namely, the x, y, and z axes) may be determined by the gyro sensor 180B.
  • the gyro sensor 180B can be used for shooting anti-shake.
  • the gyro sensor 180B detects the shaking angle of the terminal 100, calculates the distance that the lens module needs to compensate based on the angle, and allows the lens to counteract the shaking of the terminal 100 through reverse movement to achieve anti-shake.
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenes.
  • the air pressure sensor 180C is used to measure air pressure.
  • the terminal 100 calculates the altitude by using the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
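  • The source does not state how altitude is derived from pressure. One common choice, assumed here, is the standard-atmosphere barometric approximation h = 44330 × (1 − (p / p0)^(1/5.255)) with p0 = 1013.25 hPa at sea level; the function name `pressure_to_altitude_m` is hypothetical.

```python
SEA_LEVEL_HPA = 1013.25  # standard-atmosphere sea-level pressure (assumed reference)


def pressure_to_altitude_m(pressure_hpa: float,
                           sea_level_hpa: float = SEA_LEVEL_HPA) -> float:
    """Approximate altitude in metres from barometric pressure in hPa."""
    if pressure_hpa <= 0:
        raise ValueError("pressure must be positive")
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))


print(round(pressure_to_altitude_m(900.0)))  # roughly 1 km
```

  • In practice p0 would be calibrated to local conditions, since weather changes shift sea-level pressure by tens of hPa.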
  • the magnetic sensor 180D includes a Hall sensor.
  • The terminal 100 may use the magnetic sensor 180D to detect the opening and closing of a flip holster.
  • The terminal 100 may also detect the opening and closing of a clamshell cover based on the magnetic sensor 180D.
  • Features such as automatic unlocking when the flip cover is opened are then set according to the detected open or closed state.
  • the acceleration sensor 180E can detect the magnitude of the acceleration of the terminal 100 in various directions (generally three axes). When the terminal 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to recognize the posture of the terminal, and be used in applications such as horizontal and vertical screen switching and pedometers.
  • the distance sensor 180F is used to measure the distance.
  • the terminal 100 can measure the distance by infrared or laser. In some embodiments, when shooting scenes, the terminal 100 may use the distance sensor 180F to measure distance to achieve fast focusing.
  • the proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the terminal 100 emits infrared light outward through the light emitting diode.
  • the terminal 100 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it may be determined that there is an object near the terminal 100. When insufficient reflected light is detected, the terminal 100 may determine that there is no object near the terminal 100.
  • the terminal 100 can use the proximity light sensor 180G to detect that the user is holding the terminal 100 close to the ear to talk, so as to automatically turn off the screen to save power.
  • The proximity light sensor 180G can also be used to automatically unlock and lock the screen in holster mode and pocket mode.
  • the ambient light sensor 180L is used to sense the brightness of ambient light.
  • the terminal 100 may adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the terminal 100 is in a pocket to prevent accidental touch.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • The terminal 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, application lock access, fingerprint-triggered photographing, fingerprint-based call answering, and the like.
  • the temperature sensor 180J is used to detect the temperature.
  • The terminal 100 executes a temperature processing policy using the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the terminal 100 reduces the performance of a processor located near the temperature sensor 180J to lower power consumption and implement thermal protection. In other embodiments, when the temperature is lower than another threshold, the terminal 100 heats the battery 142 to avoid an abnormal shutdown caused by low temperature. In some other embodiments, when the temperature is lower than yet another threshold, the terminal 100 boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by low temperature.
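  • The temperature processing policy can be sketched as follows. All three threshold values are hypothetical, and the source describes the three behaviours as separate embodiments; combining them in one function is an illustrative assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ThermalThresholds:
    # Hypothetical values in degrees Celsius; the source gives no numbers.
    throttle_above_c: float = 45.0
    heat_battery_below_c: float = 0.0
    boost_voltage_below_c: float = -10.0


def thermal_actions(temp_c: float,
                    th: ThermalThresholds = ThermalThresholds()) -> List[str]:
    """Return the protective actions triggered by one temperature reading."""
    actions = []
    if temp_c > th.throttle_above_c:
        # Reduce nearby processor performance to cut power and heat.
        actions.append("reduce_processor_performance")
    if temp_c < th.heat_battery_below_c:
        # Heat the battery to avoid a low-temperature shutdown.
        actions.append("heat_battery")
    if temp_c < th.boost_voltage_below_c:
        # Boost battery output voltage as a further low-temperature measure.
        actions.append("boost_battery_output_voltage")
    return actions


print(thermal_actions(50.0), thermal_actions(-20.0))
```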
  • The touch sensor 180K is also called a "touch panel".
  • The touch sensor 180K may be disposed on the display screen 194; the touch sensor 180K and the display screen 194 together form a touchscreen, also called a "touch control screen".
  • the touch sensor 180K is used to detect a touch operation acting on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • the visual output related to the touch operation can be provided through the display screen 194.
  • the touch sensor 180K may also be disposed on the surface of the terminal 100, which is different from the location where the display screen 194 is located.
  • the bone conduction sensor 180M can acquire vibration signals.
  • The bone conduction sensor 180M can acquire the vibration signal of the vibrating bone of the human vocal part.
  • the bone conduction sensor 180M can also contact the pulse of the human body and receive a blood pressure beating signal.
  • the bone conduction sensor 180M may also be provided in the earphone and combined into a bone conduction earphone.
  • The audio module 170 may parse out a voice signal from the vibration signal of the vibrating bone of the vocal part acquired by the bone conduction sensor 180M, to implement a voice function.
  • the application processor may analyze the heart rate information based on the blood pressure beating signal acquired by the bone conduction sensor 180M to implement the heart rate detection function.
  • the key 190 includes a power-on key, a volume key, and the like.
  • The key 190 may be a mechanical key or a touch key.
  • the terminal 100 may receive key input and generate key signal input related to user settings and function control of the terminal 100.
  • the motor 191 may generate a vibration prompt.
  • the motor 191 can be used for vibration notification of incoming calls and can also be used for touch vibration feedback.
  • touch operations applied to different applications may correspond to different vibration feedback effects.
  • The motor 191 can likewise produce correspondingly different vibration feedback effects.
  • Different application scenarios (for example, time reminders, message reception, alarm clocks, and games) may also correspond to different vibration feedback effects.
  • The touch vibration feedback effects can also be customized.
  • the indicator 192 may be an indicator light, which may be used to indicate a charging state, a power change, and may also be used to indicate a message, a missed call, a notification, and the like.
  • the SIM card interface 195 is used to connect a SIM card.
  • A SIM card can be inserted into or removed from the SIM card interface 195 to bring it into contact with, or separate it from, the terminal 100.
  • the terminal 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
  • the SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, etc.
  • Multiple cards can be inserted into the same SIM card interface 195 at the same time; the cards may be of the same type or of different types.
  • the SIM card interface 195 can also be compatible with different types of SIM cards.
  • the SIM card interface 195 can also be compatible with external memory cards.
  • the terminal 100 interacts with the network through the SIM card to realize functions such as call and data communication.
  • In some embodiments, the terminal 100 uses an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the terminal 100 and cannot be separated from the terminal 100.
  • A user 210 using a terminal 200 communicates with a user 310 using a terminal 300; that is, the terminal 200 and the terminal 300 perform voice communication.
  • Taking the terminal 200 as the terminal in this embodiment of the present application as an example, assume that the native language of the holder of the terminal 200 (the user 210) is English and the native language of the holder of the terminal 300 (the user 310) is Chinese.
  • The voice data collected by the terminal 200 may include uplink voice data (voice data uttered by the local user, the user 210) and downlink voice data (voice data uttered by the peer user, the user 310). The terminal 200 may control translation of the uplink voice data or the downlink voice data, and the translation result is displayed on the terminal 200 as text or played as speech.
  • The user 210 utters voice data 1, "Are you Mr. Wang?" (voice data 1 is uplink voice data). The terminal 200 used by the user 210 collects voice data 1 and uploads it to the translation server 400.
  • The translation server 400 translates voice data 1.
  • The translated voice data or text data is sent to the terminal 200, so that the terminal 200 plays the translated voice data (the Chinese rendering of "Are you Mr. Wang?") through its receiver.
  • The user 310 hears the translated voice data.
  • The user 310 replies with voice data 2, "No, I am not!"
  • The terminal 200 collects voice data 2 (voice data 2 is downlink voice data) and uploads it to the translation server 400.
  • After translating voice data 2, the translation server 400 sends the translated voice data or text data to the terminal 200, so that the terminal 200 plays the translated voice data "No, I am not!" through its receiver. The terminal 200 and the terminal 300 conduct the rest of the call following the same procedure.
  • FIG. 3 is a schematic diagram of a communication scenario to which another voice data processing method in an online translation process provided by an embodiment of the present application is applied.
  • The user 210 using the terminal 200 and the user 310 using the terminal 300 exchange voice information; that is, the terminal 200 and the terminal 300 perform voice communication.
  • Both the terminal 200 and the terminal 300 have the function of recognizing or playing the other party's native language.
  • Both the terminal 200 and the terminal 300 can control translation of their respective uplink voice data, and the actions performed by the terminal 200 and the terminal 300 during online translation are similar.
  • Taking the terminal 200 as an example, assume that the native language of the user 210 is English and the native language of the user 310 is Chinese.
  • The user 210 utters voice data 1, "Are you Mr. Wang?".
  • After collecting voice data 1, the terminal 200 used by the user 210 uploads it to the translation server 400. After translating voice data 1, the translation server 400 sends the translated voice data or text data to the terminal 200, so that the terminal 200 plays the translated voice data (the Chinese rendering of "Are you Mr. Wang?") through its receiver. After hearing the translated voice data, the user 310 replies with voice data 2, "No, I am not!" After collecting voice data 2, the terminal 300 uploads it to the translation server 400. After translating voice data 2, the translation server 400 sends the translated voice data or text data to the terminal 300, so that the terminal 300 plays the translated voice data "No, I am not!" through its receiver. The terminal 200 and the terminal 300 conduct the rest of the call following the same process.
  • Embodiments of the present application provide a voice data processing method in an online translation process, which can be applied to two terminals for voice communication.
  • the voice data processing in the online translation process may include S101-S104:
  • S101: The terminal obtains a first voice data packet.
  • The terminal may be the local device of the two terminals performing voice communication; the method can be applied to either terminal.
  • the first voice data packet includes voice data within a preset time period.
  • The microphone 170C of the terminal can acquire voice data.
  • the voice data may be uplink voice data or downlink voice data.
  • In this embodiment, uplink voice data and downlink voice data are collectively referred to as voice data.
  • The voice data includes the voice uttered by a user (the local user or the peer user), ambient noise around the terminal, and the like.
  • the microphone of the terminal may acquire voice data within a preset time period (for example, 1 minute or 40 seconds) to form a first voice data packet.
  • S102: The terminal determines whether valid voice data exists in the first voice data packet.
  • S103: If the terminal determines that there is no valid voice data in the first voice data packet, the terminal stops uploading voice data to the translation server.
  • When there is no valid voice data in the first voice data packet, the terminal does not need to upload voice data to the translation server, which saves the data traffic consumed by online translation.
  • S104: If the terminal determines that there is valid voice data in the first voice data packet, the terminal continues to upload voice data to the translation server.
  • When there is valid voice data in the first voice data packet, the terminal continues to upload voice data to the translation server, so that the translation server keeps translating the uploaded voice data and the voice communication proceeds smoothly.
  • After the terminal has stopped sending voice data to the translation server, if it obtains a new voice data packet (called a second voice data packet) and determines that valid voice data exists in it, the terminal restores the connection with the translation server and starts uploading the second voice data packet to the translation server.
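  • A minimal sketch of the S101-S104 flow, including resumption when a later (second) packet contains valid voice data. The `UploadController` class, the injected detector, and the list used as a stand-in for the translation server are all hypothetical; the source does not specify an API or a detection rule.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class UploadController:
    """Hypothetical sketch of S101-S104 plus resumption on a second packet."""
    has_valid_voice: Callable[[Sequence[float]], bool]  # S102 detector (injected)
    uploading: bool = True
    uploaded: List[Sequence[float]] = field(default_factory=list)

    def on_packet(self, packet: Sequence[float]) -> bool:
        """Handle one packet obtained in S101; return whether uploading is active."""
        if self.has_valid_voice(packet):
            # S104, or resumption after a stop: keep or restore uploading.
            self.uploading = True
            self.uploaded.append(packet)  # stand-in for sending to the server
        else:
            # S103: no valid voice data, so stop uploading to save traffic.
            self.uploading = False
        return self.uploading


# Usage with a toy amplitude detector (an assumption, not the patent's test).
ctrl = UploadController(lambda p: max(abs(x) for x in p) > 0.1)
print([ctrl.on_packet(p) for p in ([0.5, -0.4], [0.01, 0.0], [0.3, 0.2])])
```

  • Injecting the detector keeps the upload logic independent of how valid voice data is detected, so the attribute-value conditions described below can be plugged in later.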
  • the terminal may also determine whether to stop uploading voice data to the translation server based on the translation result of the first voice data packet returned by the translation server.
  • the voice data processing method in the online translation process provided by the embodiment of the present application may further include S105-S106:
  • S105: The terminal sends the first voice data packet to the translation server.
  • S106: The terminal receives the translation result of the first voice data packet from the translation server.
  • The execution order of S105-S106 relative to S102 is not limited: the terminal may execute S102 first and then S105-S106, execute S105-S106 first and then S102, or execute S102 and S105-S106 at the same time.
  • the terminal may determine whether to stop uploading voice data to the translation server through the following S107:
  • S107: If the terminal determines that the first voice data packet meets a first condition, the terminal stops uploading voice data to the translation server, where the first condition includes one or more of the following: no valid voice data exists in the first voice data packet; the translation result of the first voice data packet is empty.
  • That is, after determining that there is no valid voice data in the first voice data packet, the terminal may also use the translation result returned by the translation server to decide whether to stop uploading voice data to the translation server.
  • When there is no valid voice data in the first voice data packet and the translation result returned by the translation server is empty (that is, there is no translation result), this further confirms that the first voice data packet contains no valid voice data, which improves the accuracy of valid-voice detection. The terminal then no longer uploads voice data to the translation server, which saves the data traffic consumed by online translation and reduces the user's call cost.
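  • The first condition of S107 can be sketched as a predicate over the local detection result and the server's translation result. The helper name and the `require_both` switch are hypothetical; `require_both=True` models the combined check described above, and `False` models using either signal alone.

```python
from typing import Optional


def should_stop_upload(has_valid_voice: bool,
                       translation: Optional[str],
                       require_both: bool = True) -> bool:
    """Evaluate the S107 "first condition" (hypothetical helper).

    A translation of None or whitespace-only text is treated as empty.
    """
    no_valid_voice = not has_valid_voice
    empty_result = translation is None or translation.strip() == ""
    if require_both:
        # Both signals agree that the packet carries no speech.
        return no_valid_voice and empty_result
    return no_valid_voice or empty_result


print(should_stop_upload(False, ""), should_stop_upload(False, "hello"))
```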
  • In one implementation, the terminal stopping uploading voice data to the translation server includes S1031-S1032:
  • S1031: The terminal displays a first prompt box.
  • When the terminal determines that the first voice data packet does not satisfy the preset condition, the terminal displays a first prompt box on its display screen, and the first prompt box prompts the user whether to disconnect the connection between the terminal and the translation server.
  • S1032: The terminal receives the user's first operation on the first prompt box and, in response to the first operation, disconnects from the translation server.
  • the terminal may receive the user's first operation on the first prompt box, where the first operation is an operation that the user triggers the terminal to disconnect from the translation server.
  • the first operation may be any operation such as a user's click operation, double-click operation, or long-press operation on the first prompt box. Then, in response to the user's first operation on the first prompt box, the terminal disconnects the terminal from the translation server.
  • For example, the screen of the terminal 200 displays a first prompt box 500.
  • The prompt box 500 contains the text "Disconnect from the translation server?" and two buttons, "Yes" and "No". The user can tap "Yes" in the first prompt box so that the terminal 200 disconnects from the translation server in response to the user's operation, or tap "No" to refuse to disconnect from the translation server.
  • In another implementation, the terminal stopping uploading voice data to the translation server includes S1033:
  • S1033: The terminal maintains the connection with the translation server but stops sending voice data to the translation server.
  • When the terminal determines that there is no valid voice data in the first voice data packet, the terminal may keep its connection with the translation server while no longer uploading voice data to it. In this case, the terminal continues to obtain the user's voice data packets.
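  • The two stop strategies (S1031-S1032 and S1033) can be contrasted in a small sketch. `StopMode`, `TranslationSession`, and `stop_uploading` are hypothetical names, `user_confirms` stands in for the user tapping "Yes" in prompt box 500, and leaving the session unchanged when the user taps "No" is an assumption the source does not spell out.

```python
from dataclasses import dataclass
from enum import Enum


class StopMode(Enum):
    PROMPT_TO_DISCONNECT = "S1031-S1032"  # ask the user via a prompt box
    KEEP_CONNECTION = "S1033"              # stay connected, stop sending


@dataclass
class TranslationSession:
    connected: bool = True
    sending: bool = True


def stop_uploading(session: TranslationSession, mode: StopMode,
                   user_confirms: bool = False) -> TranslationSession:
    """Apply one of the two stop strategies (hypothetical helper)."""
    if mode is StopMode.KEEP_CONNECTION:
        session.sending = False          # S1033: connection kept, no uploads
    elif user_confirms:
        session.connected = False        # S1032: disconnect on "Yes"
        session.sending = False
    # On "No", this sketch leaves the session unchanged (an assumption).
    return session


print(stop_uploading(TranslationSession(), StopMode.KEEP_CONNECTION))
```

  • Keeping the connection (S1033) avoids reconnection latency when speech resumes, while disconnecting (S1032) frees the server-side session entirely.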
  • Embodiments of the present application provide a method for processing voice data in an online translation process.
  • The terminal can determine whether valid voice data exists in the first voice data packet; if there is no valid voice data in the first voice data packet, the terminal stops uploading voice data to the translation server. Because the terminal itself decides whether to stop uploading voice data to the translation server, the terminal becomes more intelligent, and when the first voice data packet contains no valid voice data, the terminal does not need to upload voice data to the translation server, which saves the data traffic consumed by online translation.
  • the above-mentioned voice data processing method in the online translation process is implemented at the application layer of the terminal, which has relatively low hardware performance requirements for the terminal and has better applicability.
  • the above terminal may determine whether there is valid voice data in the first voice data packet through S201-S203:
  • S201: The terminal determines whether the first attribute value of the first voice data packet meets a preset condition, where the preset condition includes one or more of the following: a discreteness condition, a continuity condition, and a compactness condition.
  • the first attribute value of the first voice data packet includes the first attribute value of each data frame of the first voice data packet.
  • Specifically, the terminal samples the first voice data packet at a certain frequency to obtain N frames of voice data (N is an integer greater than or equal to 1), and then extracts the first attribute value of each frame of voice data in the first voice data packet; that is, the first attribute value of the packet comprises N attribute values.
  • the first attribute value of the first voice data packet may be the signal-to-noise ratio of the voice data or the intensity of the voice data (such as the amplitude of the voice data).
  • taking one data frame in the first voice data packet (referred to as a first data frame) as an example, and taking the first attribute value to be the signal-to-noise ratio of the voice data, the first attribute value of each frame of voice data can be determined using formula (1), where:
  • $L_p$ is the first attribute value of the first data frame
  • $p_{rms}$ is the signal loudness of the first data frame
  • $p_{ref}$ is the noise intensity of the first data frame
  • $F_s$ is the sampling frequency of the first voice data packet.
  • applying formula (1) to each of the N data frames in the first voice data packet yields the N first attribute values of the first voice data packet.
  • the foregoing discreteness condition includes that the variance of the first attribute value of the first voice data packet is greater than the variance threshold.
  • the variance of the first attribute value of the first voice data packet can be calculated according to the following formula (2):
    $$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2 \qquad (2)$$
  • $\sigma^2$ is the variance of the first attribute value of the first voice data packet
  • $x_i$ is the first attribute value of the i-th data frame
  • $\bar{x}$ is the average value of the first attribute values of the N data frames.
  • when the variance of the first attribute value of the first voice data packet is greater than the variance threshold (that is, the discreteness condition is satisfied), the first attribute value is discrete, which indicates that there is valid voice data in the first voice data packet, that is, the user has spoken
  • when the variance of the first attribute value of the first voice data packet is less than or equal to the variance threshold (that is, the discreteness condition is not met), the first attribute value of the first voice data packet is not discrete, which further indicates that there is no valid voice data in the first voice data packet (that is, the user has not spoken).
  • when the continuity count of the first attribute value of the first voice data packet is greater than the continuity threshold (that is, the continuity condition is satisfied), the first attribute value is continuous, which further indicates that there is valid voice data in the first voice data packet, that is, the user has spoken
  • when the continuity count of the first attribute value is less than or equal to the continuity threshold (that is, the continuity condition is not met), the first attribute value is not continuous, which indicates that there is no valid voice data in the first voice data packet (that is, the user has not spoken)
  • when the compactness count of the first attribute value of the first voice data packet is greater than the compactness threshold (that is, the compactness condition is satisfied), the first attribute value is compact, which indicates that there is valid voice data in the first voice data packet (that is, the user has spoken)
  • when the compactness count of the first attribute value is less than or equal to the compactness threshold (that is, the compactness condition is not met), the first attribute value is not compact, which indicates that there is no valid voice data in the first voice data packet (that is, the user has not spoken)
  • the first attribute value of the first voice data packet does not satisfy the preset condition, and the terminal determines that there is no valid voice data in the first voice data packet.
  • the first attribute value of the first voice data packet not satisfying the preset condition means that the first attribute value does not satisfy one or more of the above three conditions; the specific cases are listed in Table 1
  • the terminal determines that there is no valid voice data in the first voice data packet, that is, the terminal determines that the user has not spoken, and the terminal subsequently stops uploading the voice data to the translation server.
  • when the terminal determines that there is no valid voice data in the first voice data packet, the terminal continues to obtain voice data but stops uploading it to the translation server. For example, after the terminal determines that the first attribute value of the first voice data packet does not satisfy the preset condition, when the microphone of the terminal obtains a second voice data packet, the terminal no longer uploads the second voice data packet to the translation server, which saves the terminal's traffic during online translation. It can be understood that the terminal still determines, according to S201, whether there is valid voice data in the second voice data packet.
  • when the first attribute value meets none of the discreteness, continuity, and compactness conditions, the terminal can determine more accurately that there is no valid voice data in the first voice data packet.
  • the first attribute value of the first voice data packet meets a preset condition, and then the terminal determines that there is valid voice data in the first voice data packet.
  • the first attribute value of the first voice data packet includes N values, N being an integer greater than or equal to 1, and the method by which the terminal determines the continuity count of the first attribute value of the first voice data packet may specifically include S301-S306:
  • the terminal determines whether the i-th first attribute value is greater than a threshold of the first attribute value.
  • if the i-th first attribute value is greater than the threshold of the first attribute value, the terminal increments the continuity count of the first attribute value by one.
  • if the i-th first attribute value is less than or equal to the threshold of the first attribute value, the terminal increments the discreteness count of the first attribute value by one.
  • the terminal determines whether the discreteness count of the first attribute value is less than or equal to the discreteness threshold.
  • when the discreteness count of the first attribute value is less than or equal to the discreteness threshold, i is increased by 1, that is, the terminal continues to determine whether the (i+1)-th first attribute value is greater than the threshold of the first attribute value (returning to S301).
  • when the discreteness count of the first attribute value is greater than the discreteness threshold, the terminal clears both the continuity count and the discreteness count of the first attribute value, and then re-determines the continuity count of the first attribute value starting from the next, (i+1)-th, first attribute value.
  • for example, suppose the first voice data packet has 100 first attribute values and the terminal determines that the first 15 are all greater than the threshold of the first attribute value; the current continuity count of the first attribute value is then 15. If the terminal determines that the 16th first attribute value is less than the threshold, the discreteness count of the first attribute value becomes 1. If the discreteness threshold is set to 8, then because the discreteness count is less than the discreteness threshold, the terminal continues to determine whether the 17th first attribute value is greater than the threshold; if it is, the continuity count of the first attribute value is updated to 16, and so on.
  • if the discreteness count of the first attribute value is subsequently updated to 10, it is greater than the discreteness threshold; in this case, the terminal clears both the continuity count and the discreteness count of the first attribute value and then, starting from the 27th first attribute value, re-determines the continuity count of the first attribute value of the first voice data packet using the foregoing method.
  • when the discreteness count of the first attribute value is greater than the discreteness threshold, the first i first attribute values are not continuous, which means that the voice data corresponding to them contains no valid voice data. Therefore, the continuity count and the discreteness count of the first attribute value are cleared, and the terminal then continues to determine whether the remaining N-i first attribute values are continuous.
  • introducing the discreteness count and the discreteness threshold of the first attribute value makes it possible to determine more accurately whether the first attribute value is continuous.
  • the terminal may also adjust one or more of the thresholds involved in the voice data processing method in the online translation process, for example, one or more of the threshold of the first attribute value, the variance threshold, the continuity threshold, the discreteness threshold, or the compactness threshold. Specifically, the terminal adjusts each threshold according to whether it has determined that the first attribute value of the first voice data packet satisfies the preset condition and according to the translation result returned by the translation server.
  • for example, suppose the terminal determines that the first attribute value of the first voice data packet does not satisfy the compactness condition, that is, the compactness count is less than or equal to the compactness threshold (so the terminal concludes that there is no valid voice data in the first voice data packet), while the translation result returned by the translation server is not empty (so there is in fact valid voice data in the first voice data packet). The terminal's wrong conclusion is probably caused by the compactness condition being too strict. The terminal can therefore reduce the compactness threshold so that the compactness count is greater than the adjusted compactness threshold, and the terminal then determines that the first attribute value of the first voice data packet satisfies the compactness condition.
  • conversely, suppose the terminal determines that the first attribute value of the first voice data packet satisfies the preset condition, that is, the compactness count is greater than the compactness threshold (so the terminal concludes that there is valid voice data in the first voice data packet), while the translation result returned by the translation server is empty (so there is no valid voice data in the first voice data packet). The terminal's wrong conclusion is probably caused by the compactness condition being too loose. The terminal can therefore increase the compactness threshold so that the compactness count is less than or equal to the adjusted compactness threshold, and the terminal then determines that the first attribute value of the first voice data packet does not satisfy the compactness condition.
  • the above-mentioned terminal includes a hardware structure and / or a software module corresponding to each function.
  • the embodiments of the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed by hardware or computer software driven hardware depends on the specific application and design constraints of the technical solution. Professional technicians can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the embodiments of the present application.
  • the above-mentioned terminals may be divided into function modules according to the above method examples.
  • each function module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above integrated modules can be implemented in the form of hardware or software function modules. It should be noted that the division of the modules in the embodiments of the present application is schematic, and is only a division of logical functions. In actual implementation, there may be another division manner.
  • FIG. 8 shows a possible structural schematic diagram of the terminal involved in the foregoing embodiment.
  • the terminal 1000 includes: a processing module 1001, a storage module 1002, a voice capture module 1003, a communication module 1004, and a display module 1005 .
  • the processing module 1001 is used to control the voice capture module 1003 to obtain the first voice data packet.
  • the processing module 1001 is used to support the terminal in executing S101 in the foregoing embodiment; the processing module 1001 is further used to stop uploading voice data to the translation server when it determines that there is no valid voice data in the first voice data packet, that is, to support the terminal in executing S103 in the foregoing embodiment.
  • the storage module 1002 may be used to buffer the voice data acquired through the voice capture module 1003.
  • the processing module 1001 is also used to determine whether there is valid voice data in the first voice data packet; for example, the processing module 1001 is used to support the terminal in executing S102 in the foregoing embodiment.
  • the foregoing processing module 1001 is specifically configured to determine whether the first attribute value of the first voice data packet meets a preset condition, the preset condition including one or more of the following: a discreteness condition, a continuity condition, and a compactness condition; if the first attribute value of the first voice data packet does not meet the preset condition, to determine that there is no valid voice data in the first voice data packet; or, if the first attribute value meets the preset condition, to determine that there is valid voice data in the first voice data packet.
  • the processing module 1001 is used to support the terminal to execute S201-S203 in the foregoing embodiment.
  • the processing module 1001 is also used to determine that there is valid voice data in the first voice data packet, and continues to upload the voice data to the translation server through the communication module 1004.
  • the processing module 1001 is used to support the terminal to execute S104 in the foregoing embodiment.
  • the processing module 1001 is further configured to send the first voice data packet to the translation server through the communication module 1004; and receive the translation result of the first voice data packet from the translation server.
  • the processing module 1001 is used to support the terminal in executing S105 and S106 in the foregoing embodiment.
  • the processing module 1001 is further configured to stop uploading voice data to the translation server when the first voice data packet meets a first condition.
  • the first condition includes one or more of the following: there is no valid voice data in the first voice data packet; the translation result of the first voice data packet is empty.
  • the processing module 1001 is used to support the terminal to execute S107 in the foregoing embodiment.
  • the above processing module 1001 can also be used to control the display module 1005 to display the first prompt box, to receive the user's first operation on the first prompt box displayed by the display module 1005, and, in response to the first operation, to disconnect from the translation server; for example, the processing module 1001 is used to support the terminal in executing S1031-S1032 in the above embodiment.
  • the above processing module 1001 is also used to maintain the connection with the translation server and stop sending voice data to the translation server, and the processing module 1001 is used to support the terminal to execute S1033 in the above embodiment.
  • the foregoing preset condition includes the discreteness condition, the continuity condition, and the compactness condition; the foregoing processing module 1001 is specifically configured to determine that there is no valid voice data in the first voice data packet when the first attribute value of the first voice data packet meets none of the discreteness condition, the continuity condition, and the compactness condition.
  • the terminal 1000 includes but is not limited to the unit modules listed above.
  • the terminal 1000 may further include a receiving module and a sending module.
  • the receiving module is used to receive data or instructions sent by other terminals.
  • the sending module is used to send data or instructions to other terminals.
  • the specific functions that can be achieved by the above functional units also include but are not limited to the functions corresponding to the method steps described in the above examples. For detailed descriptions of other units of the terminal 1000, refer to the detailed descriptions of the corresponding method steps; details are not repeated here.
  • the processing module 1001 may be a processor or a controller, for example, it may be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (application-specific integrated circuit, ASIC), field programmable gate array (FPGA) or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. It can implement or execute various exemplary logical blocks, modules, and circuits described in conjunction with the disclosure of the present application.
  • the processor may also be a combination of computing functions, for example, including one or more microprocessor combinations, a combination of DSP and microprocessor, and so on.
  • the communication module may be a transceiver, a transceiver circuit, or a communication interface.
  • the storage module 1002 may be a memory.
  • the processing module 1001 is a processor (the processor 110 shown in FIG. 1), the storage module 1002 is a memory (the internal memory 121 shown in FIG. 1), the voice capture module 1003 may include a microphone (the microphone 170C shown in FIG. 1), and the communication module 1004 may be the mobile communication module 150 or the wireless communication module 160 shown in FIG. 1; the communication module 1004 may be collectively referred to as a communication interface.
  • the display module 1005 is a touch screen (including the display screen 194 shown in FIG. 1, in which a display panel and a touch panel are integrated).
  • the terminal provided by the embodiment of the present application may be the terminal 100 shown in FIG. 1. Among them, the above processor, communication interface, touch screen, memory, microphone, etc. may be coupled together via a bus.
  • An embodiment of the present application further provides a computer storage medium that stores computer program code; when the processor executes the computer program code, the terminal performs the related method steps in any of FIG. 4, FIG. 5, or FIG. 7 to implement the method in the above embodiment.
  • An embodiment of the present application also provides a computer program product, which, when the computer program product runs on a computer, causes the computer to execute relevant method steps in any of the drawings of FIG. 4, FIG. 5, or FIG. 7 to implement the method.
  • the terminal 1000, the computer storage medium, and the computer program product provided in the embodiments of the present application are all used to perform the corresponding methods provided above; therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding methods provided above, which are not repeated here.
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the modules or units is only a division of logical functions.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may be one physical unit or multiple physical units, that is, may be located in one place, or may be distributed in multiple different places . Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or software function unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium.
  • the technical solutions of the embodiments of the present application essentially, or the part contributing to the existing technology, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the foregoing storage media include various media that can store program codes, such as a U disk, a mobile hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
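The threshold adjustment described above, in which the terminal compares its own decision with whether the translation server returned an empty result, can be sketched as follows. This is a minimal illustration only: the function name `adjust_compactness_threshold`, the fixed adjustment `step`, and the clamp at zero are assumptions, because the text states only the direction of each adjustment, not its size.

```python
from typing import Optional


def adjust_compactness_threshold(threshold: float,
                                 local_has_voice: bool,
                                 server_result: Optional[str],
                                 step: float) -> float:
    """Nudge the compactness threshold using the translation server's feedback.

    local_has_voice: the terminal's own decision (compactness count > threshold).
    server_result:   the translation result; None or "" is treated as empty.
    step:            hypothetical fixed adjustment size (an assumption).
    """
    server_has_voice = bool(server_result)
    if not local_has_voice and server_has_voice:
        # Terminal said "no voice" but the server translated something:
        # the condition was too strict, so lower the threshold.
        return max(0.0, threshold - step)
    if local_has_voice and not server_has_voice:
        # Terminal said "voice" but the server returned nothing:
        # the condition was too loose, so raise the threshold.
        return threshold + step
    # Both agree: leave the threshold unchanged.
    return threshold
```

The same pattern could be applied to the other thresholds listed above (first attribute value, variance, continuity, discreteness); the text does not say how far any threshold moves per correction.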

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)
  • Telephone Function (AREA)

Abstract

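Provided are a method and an apparatus for processing voice data in an online translation process. They are applied to voice communication between terminals (200, 300), relate to the field of communications technologies, and can reduce the data traffic consumed by online translation. The method includes: a terminal (200, 300) obtains a first voice data packet (S101), where the first voice data packet includes voice data within a preset time period; and when the terminal (200, 300) determines that there is no valid voice data in the first voice data packet, the terminal (200, 300) stops uploading voice data to a translation server (400) (S103).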

Description

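Method and apparatus for processing voice data in an online translation process
This application claims priority to Chinese Patent Application No. CN 201811199111.4, filed with the China National Intellectual Property Administration on October 15, 2018 and entitled "Method and apparatus for processing voice data in an online translation process", which is incorporated herein by reference in its entirety.
Technical Field
Embodiments of this application relate to the field of communications technologies, and in particular, to a method and an apparatus for processing voice data in an online translation process.
Background
When two users whose native languages differ talk to each other through terminals, online translation can translate each party's speech into the other party's native language, removing the language barrier between them.
Currently, during a call, when the voice data collected by terminal 1 includes invalid voice data (for example, audio data collected by the terminal while the user does not speak for a period of time during the call), the terminal still uploads that voice data to the translation server for translation. Because translation service providers usually charge by the traffic volume or duration of the voice data, uploading invalid voice data makes the terminal consume more traffic and increases the cost of the call.
Summary
Embodiments of this application provide a method and an apparatus for processing voice data in an online translation process, which can reduce the traffic consumed by online translation.
To achieve the foregoing objective, the following technical solutions are used in the embodiments of this application: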
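According to a first aspect, an embodiment of this application provides a method for processing voice data in an online translation process, applied to a scenario in which a terminal performs voice communication. The method includes: the terminal obtains a first voice data packet, where the first voice data packet includes voice data within a preset time period; and when the terminal determines that there is no valid voice data in the first voice data packet, the terminal stops uploading voice data to a translation server.
With the method provided in this embodiment, while two terminals are in voice communication, either terminal can, after obtaining a first voice data packet, determine whether there is valid voice data in the first voice data packet; if there is not, the terminal stops uploading voice data to the translation server. Because the terminal itself decides whether to stop uploading voice data to the translation server, the terminal becomes more intelligent; and when there is no valid voice data in the first voice data packet, the terminal no longer needs to upload voice data to the translation server, which reduces the traffic consumed by online translation.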
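In a possible design, the method by which the terminal determines whether there is valid voice data in the first voice data packet may include: the terminal determines, based on a first attribute value of the first voice data packet, whether the first voice data packet meets a preset condition, where the preset condition includes one or more of the following: a discreteness condition, a continuity condition, and a compactness condition. If the first attribute value of the first voice data packet does not meet the preset condition, the terminal determines that there is no valid voice data in the first voice data packet; if the first attribute value meets the preset condition, the terminal determines that there is valid voice data in the first voice data packet.
In a possible design, the method further includes: when the terminal determines that there is valid voice data in the first voice data packet, the terminal continues to upload voice data to the translation server, so that the translation server continues to translate the voice data uploaded by the terminal and the terminal's voice communication proceeds smoothly.
In a possible design, after the terminal stops sending voice data to the translation server, when the terminal obtains a new voice data packet (referred to as a second voice data packet) and determines that there is valid voice data in the second voice data packet, the terminal restores the connection to the translation server and starts uploading the second voice data packet to the translation server.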
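In a possible design, the method further includes: the terminal sends the first voice data packet to the translation server and receives the translation result of the first voice data packet from the translation server; if the terminal determines that the first voice data packet meets a first condition, the terminal stops uploading voice data to the translation server. The first condition includes one or more of the following: there is no valid voice data in the first voice data packet; the translation result of the first voice data packet is empty. In this embodiment, when the terminal determines that there is no valid voice data in the first voice data packet (that is, the first attribute value of the first voice data packet does not meet the preset condition), the terminal can also use the translation result returned by the translation server to decide whether to stop uploading voice data to the translation server.
In a possible design, when there is no valid voice data in the first voice data packet and the translation result returned by the translation server is empty (that is, there is no translation result), this further confirms that there is no valid voice data in the first voice data packet, which improves the accuracy of valid-voice detection. Moreover, the terminal no longer uploads voice data to the translation server, which reduces the traffic consumed by online translation and lowers the user's call cost.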
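In a possible design, the terminal's stopping of uploading voice data to the translation server may include: the terminal displays a first prompt box, receives a first operation performed by the user on the first prompt box, and, in response to the first operation, disconnects from the translation server. In this embodiment, when the terminal determines that there is no valid voice data in the first voice data packet, the terminal can ask the user whether to disconnect from the translation server, which improves user experience.
In a possible design, the terminal's stopping of uploading voice data to the translation server may include: the terminal keeps the connection with the translation server but stops sending voice data to the translation server. In this embodiment, when the terminal determines that there is no valid voice data in the first voice data packet, the terminal can keep the connection with the translation server while no longer uploading voice data to it. In this case, the terminal still continues to obtain voice data packets from the user and determines whether the first attribute value of each newly obtained voice data packet meets the preset condition.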
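The upload decisions in the designs above (stop when the first condition holds, keep the connection open, keep analyzing new packets, and resume when a later packet contains valid voice data) can be summarized as a small state machine. The sketch below follows the send-then-check flow (send a packet, receive its translation result, then apply the first condition). It assumes that valid-voice detection already yields a boolean per packet and that an empty string or `None` means an empty translation result; the class `UploadGate`, its methods, and the `require_both` flag (reflecting the "one or more of the following" wording of the first condition) are hypothetical.

```python
from typing import Optional


class UploadGate:
    """Tracks whether voice data packets are uploaded to the translation server.

    Assumption: the connection to the server stays open throughout; only
    uploading is paused or resumed.
    """

    def __init__(self, require_both: bool = True) -> None:
        # require_both=True: stop only if there is no valid voice data AND the
        # translation result is empty; False: either one alone is enough.
        self.require_both = require_both
        self.uploading = True

    def on_packet(self, has_valid_voice: bool) -> bool:
        """Return True if this newly obtained packet should be uploaded."""
        if not self.uploading and has_valid_voice:
            # A later packet contains valid voice data: resume uploading.
            self.uploading = True
        return self.uploading

    def on_translation(self, has_valid_voice: bool,
                       result: Optional[str]) -> None:
        """Apply the first condition after a translation result arrives."""
        no_voice = not has_valid_voice
        empty = not result
        met = (no_voice and empty) if self.require_both else (no_voice or empty)
        if met:
            # Pause uploading; later packets are still analyzed by on_packet.
            self.uploading = False
```

With `require_both=True`, a packet that the terminal judged silent but that still produced a translation does not pause uploading, which matches the stated goal of using the server's result to confirm the local decision.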
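In a possible design, the preset condition includes the discreteness condition, the continuity condition, and the compactness condition; if the first attribute value of the first voice data packet meets none of the discreteness condition, the continuity condition, and the compactness condition, the terminal determines that there is no valid voice data in the first voice data packet. In this embodiment, requiring that all three conditions fail lets the terminal determine more accurately that there is no valid voice data in the first voice data packet.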
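In a possible design, the first attribute value of the first voice data packet includes the signal-to-noise ratio of each data frame of the first voice data packet.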
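The exact form of formula (1) is not stated in this description. Purely as an illustration, the sketch below assumes the common level-in-decibels form $L_p = 20\log_{10}(p_{rms}/p_{ref})$, with $p_{rms}$ taken as the RMS amplitude of one frame and $p_{ref}$ as a noise-reference RMS that the caller supplies. The frame length in milliseconds, the handling of a trailing partial frame, and the function name `frame_levels_db` are also assumptions.

```python
import math
from typing import List, Sequence


def frame_levels_db(samples: Sequence[float], sample_rate: int,
                    frame_ms: float, noise_rms: float) -> List[float]:
    """Split a voice data packet into frames and return one level (dB) per frame.

    Assumed form: L_p = 20 * log10(p_rms / p_ref), where p_rms is the RMS of
    the frame and p_ref (noise_rms) is a noise-reference RMS.
    """
    if noise_rms <= 0:
        raise ValueError("noise_rms must be positive")
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    levels = []
    # Drop a trailing partial frame so every frame has the same length.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        p_rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        # Guard against log10(0) for an all-zero (digital silence) frame.
        p_rms = max(p_rms, 1e-12)
        levels.append(20.0 * math.log10(p_rms / noise_rms))
    return levels
```

For example, at 16 kHz with 10 ms frames, a frame whose RMS is ten times the noise reference yields about 20 dB and a frame at the noise reference yields about 0 dB; the resulting list plays the role of the N first attribute values.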
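In a possible design, the discreteness condition includes: the variance of the first attribute value of the first voice data packet is greater than a variance threshold. In this embodiment, when the variance of the first attribute value of the first voice data packet is greater than the variance threshold (that is, the discreteness condition is met), the first attribute value of the first voice data packet is discrete, which further indicates that there is valid voice data in the first voice data packet (that is, the user has spoken); when the variance is less than or equal to the variance threshold (that is, the discreteness condition is not met), the first attribute value of the first voice data packet is not discrete, which further indicates that there is no valid voice data in the first voice data packet (that is, the user has not spoken).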
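A minimal sketch of the discreteness condition, assuming the first attribute values are already available as a list of per-frame numbers and that the variance is the population variance (dividing by N), which is an assumption about the exact form of formula (2). The function name and the threshold used in the example are hypothetical.

```python
from statistics import pvariance
from typing import Sequence


def meets_discreteness(values: Sequence[float], variance_threshold: float) -> bool:
    """Discreteness condition: variance of the first attribute values > threshold.

    Uses the population variance, (1/N) * sum((x_i - mean)^2) (an assumption).
    """
    if len(values) == 0:
        # No frames, so nothing can vary: treat as not discrete.
        return False
    return pvariance(values) > variance_threshold


flat = [12.0, 12.0, 12.0, 12.0]   # steady background level
speech = [0.0, 20.0, 0.0, 20.0]   # level jumps between frames
print(meets_discreteness(flat, 50.0))    # False (variance 0.0)
print(meets_discreteness(speech, 50.0))  # True (variance 100.0)
```

The comparison is strict, so a variance exactly equal to the threshold does not satisfy the condition, matching "greater than the variance threshold" in the text.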
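In a possible design, the continuity condition includes: the continuity count of the first attribute value of the first voice data packet is greater than a continuity threshold, where the continuity threshold satisfies $T_c = \theta_c \times F_s$, $T_c$ being the continuity threshold, $\theta_c$ the continuity coefficient, and $F_s$ the sampling frequency of the first voice data packet. In this embodiment, when the continuity count of the first attribute value of the first voice data packet is greater than the continuity threshold (that is, the continuity condition is met), the first attribute value is continuous, which further indicates that there is valid voice data in the first voice data packet (that is, the user has spoken); when the continuity count is less than or equal to the continuity threshold (that is, the continuity condition is not met), the first attribute value is not continuous, which further indicates that there is no valid voice data in the first voice data packet (that is, the user has not spoken).
The first attribute value of the first voice data packet consists of N values, N being an integer greater than or equal to 1. The terminal determines the continuity count of the first attribute value of the first voice data packet as follows: the terminal determines whether the i-th first attribute value is greater than a threshold of the first attribute value, where 1 ≤ i ≤ N-1; if the i-th first attribute value is greater than the threshold, the terminal increments the continuity count of the first attribute value by 1; if the i-th first attribute value is less than or equal to the threshold, the discreteness count of the first attribute value is incremented by 1. While the discreteness count is less than or equal to a discreteness threshold, the terminal determines whether the (i+1)-th first attribute value is greater than the threshold of the first attribute value, where the discreteness threshold satisfies $T_d = \theta_d \times F_s$, $T_d$ being the discreteness threshold, $\theta_d$ the discreteness coefficient, and $F_s$ the sampling frequency of the first voice data packet. If the discreteness count of the first attribute value is greater than the discreteness threshold, the continuity count and the discreteness count of the first attribute value are both cleared, and the terminal re-determines the continuity count starting from the (i+1)-th first attribute value. In this embodiment, a discreteness count greater than the discreteness threshold means that the first i first attribute values are not continuous, that is, the voice data corresponding to the first i first attribute values contains no valid voice data; the continuity count and the discreteness count are therefore cleared, and the terminal goes on to determine whether the remaining N-i first attribute values are continuous. Introducing the discreteness count and the discreteness threshold of the first attribute value makes it possible to determine more accurately whether the first attribute value is continuous.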
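The continuity-count procedure above can be sketched as follows. Assumptions: the first attribute values arrive as a list; every value in the list is examined (the text's range 1 ≤ i ≤ N-1 is not reproduced); the returned count is whatever remains after the last value, since the text does not say whether the comparison with $T_c$ happens earlier; and the names `continuity_count` and `meets_continuity` are hypothetical.

```python
from typing import Sequence


def continuity_count(values: Sequence[float], value_threshold: float,
                     discrete_threshold: float) -> int:
    """Continuity count of per-frame first attribute values.

    A value above value_threshold adds 1 to the continuity count; any other
    value adds 1 to the discreteness count. When the discreteness count
    exceeds discrete_threshold, both counts are cleared and counting restarts
    from the next value. Assumption: the final count is returned.
    """
    continuity = 0
    discrete = 0
    for x in values:
        if x > value_threshold:
            continuity += 1
        else:
            discrete += 1
            if discrete > discrete_threshold:
                # Earlier values are treated as lacking continuity: start over.
                continuity = 0
                discrete = 0
    return continuity


def meets_continuity(values: Sequence[float], value_threshold: float,
                     theta_c: float, theta_d: float, sample_rate: float) -> bool:
    """Continuity condition with T_c = theta_c * F_s and T_d = theta_d * F_s."""
    t_c = theta_c * sample_rate
    t_d = theta_d * sample_rate
    return continuity_count(values, value_threshold, t_d) > t_c


# Worked example from the text: 15 values above the threshold, one below,
# then one above, with a discreteness threshold of 8.
print(continuity_count([30.0] * 15 + [5.0] + [30.0], 10.0, 8))  # 16
# Nine consecutive dips exceed the discreteness threshold and reset the counts.
print(continuity_count([30.0] * 15 + [5.0] * 9 + [30.0] * 3, 10.0, 8))  # 3
```

Checking the discreteness count only when it is incremented is equivalent to checking it after every value, because it changes only on that branch.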
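In a possible design, the compactness condition includes: the compactness count of the first attribute value of the first voice data packet is greater than a compactness threshold, where the compactness threshold satisfies $T_i = \theta_i \times N$, $T_i$ being the compactness threshold, $\theta_i$ the compactness coefficient, and N the number of first attribute values included in the first voice data packet; the compactness count of the first attribute value is the number of first attribute values that are greater than the threshold of the first attribute value. In this embodiment, when the compactness count of the first attribute value of the first voice data packet is greater than the compactness threshold (that is, the compactness condition is met), the first attribute value is compact, which further indicates that there is valid voice data in the first voice data packet (that is, the user has spoken); when the compactness count is less than or equal to the compactness threshold (that is, the compactness condition is not met), the first attribute value is not compact, which further indicates that there is no valid voice data in the first voice data packet (that is, the user has not spoken).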
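A sketch of the compactness condition, together with the combination rule stated earlier (no valid voice data only when none of the three conditions is met). The function names and the way the three results are passed as booleans are assumptions; the discreteness and continuity results would come from checks such as those described above.

```python
from typing import Sequence


def compactness_count(values: Sequence[float], value_threshold: float) -> int:
    """Number of first attribute values greater than value_threshold."""
    return sum(1 for x in values if x > value_threshold)


def meets_compactness(values: Sequence[float], value_threshold: float,
                      theta_i: float) -> bool:
    """Compactness condition with T_i = theta_i * N, N = len(values)."""
    t_i = theta_i * len(values)
    return compactness_count(values, value_threshold) > t_i


def has_valid_voice(discrete_ok: bool, continuity_ok: bool,
                    compact_ok: bool) -> bool:
    """No valid voice data only when all three conditions fail."""
    return discrete_ok or continuity_ok or compact_ok


vals = [30.0, 5.0, 30.0, 5.0, 30.0]
print(compactness_count(vals, 10.0))       # 3
print(meets_compactness(vals, 10.0, 0.5))  # True  (3 > 2.5)
print(meets_compactness(vals, 10.0, 0.8))  # False (3 > 4.0 fails)
```

Unlike the continuity count, the compactness count ignores the order of the values, so it measures how many frames are above the threshold rather than whether they form a run.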
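According to a second aspect, an embodiment of this application provides a terminal, including one or more processors, a memory, a communication interface, and a microphone. The memory and the communication interface are coupled to the processor; the microphone is configured to capture voice data; the memory is configured to store computer program code, and the computer program code includes computer instructions. When the processor executes the computer instructions, the processor is configured to control the microphone to obtain a first voice data packet, where the first voice data packet includes voice data within a preset time period; the processor is further configured to stop uploading voice data to a translation server when it determines that there is no valid voice data in the first voice data packet.
In a possible design, the processor is specifically configured to determine whether the first attribute value of the first voice data packet obtained by the microphone meets a preset condition, where the preset condition includes one or more of the following: a discreteness condition, a continuity condition, and a compactness condition; if the first attribute value of the first voice data packet does not meet the preset condition, to determine that there is no valid voice data in the first voice data packet; and if the first attribute value meets the preset condition, to determine that there is valid voice data in the first voice data packet.
In a possible design, the processor is further configured to, when it determines that there is valid voice data in the first voice data packet, continue to upload voice data to the translation server through the communication interface.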
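In a possible design, the processor is further configured to send the first voice data packet to the translation server through the communication interface and receive the translation result of the first voice data packet from the translation server; the processor is further configured to stop uploading voice data to the translation server when the first voice data packet meets a first condition, where the first condition includes one or more of the following: there is no valid voice data in the first voice data packet; the translation result of the first voice data packet is empty.
In a possible design, the terminal provided in this embodiment further includes a touch screen; the processor is further configured to control the touch screen to display a first prompt box, to receive a first operation performed by the user on the first prompt box displayed on the touch screen, and, in response to the first operation, to disconnect from the translation server.
In a possible design, the processor is further configured to keep the connection with the translation server and stop sending voice data to the translation server.
In a possible design, the preset condition includes the discreteness condition, the continuity condition, and the compactness condition; the processor is specifically configured to determine that there is no valid voice data in the first voice data packet when the first attribute value of the first voice data packet meets none of the discreteness condition, the continuity condition, and the compactness condition.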
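In a possible design, the first attribute value of the first voice data packet includes the signal-to-noise ratio of each data frame of the first voice data packet.
In a possible design, the discreteness condition includes: the variance of the first attribute value of the first voice data packet is greater than a variance threshold.
In a possible design, the continuity condition includes: the continuity count of the first attribute value of the first voice data packet is greater than a continuity threshold, where the continuity threshold satisfies $T_c = \theta_c \times F_s$, $T_c$ being the continuity threshold, $\theta_c$ the continuity coefficient, and $F_s$ the sampling frequency of the first voice data packet. The first attribute value of the first voice data packet includes N values, N being an integer greater than or equal to 1. The processor is further configured to determine whether the i-th first attribute value is greater than the threshold of the first attribute value, where 1 ≤ i ≤ N-1; if the i-th first attribute value is greater than the threshold, to increment the continuity count of the first attribute value by 1; if the i-th first attribute value is less than or equal to the threshold, the discreteness count of the first attribute value is incremented by 1. The processor is further configured to, when the discreteness count of the first attribute value is less than or equal to the discreteness threshold, determine whether the (i+1)-th first attribute value is greater than the threshold of the first attribute value, where the discreteness threshold satisfies $T_d = \theta_d \times F_s$, $T_d$ being the discreteness threshold, $\theta_d$ the discreteness coefficient, and $F_s$ the sampling frequency of the first voice data packet; and, when the discreteness count of the first attribute value is greater than the discreteness threshold, to clear both the continuity count and the discreteness count of the first attribute value and re-determine the continuity count of the first attribute value starting from the (i+1)-th first attribute value.
In a possible design, the compactness condition includes: the compactness count of the first attribute value of the first voice data packet is greater than a compactness threshold, where the compactness threshold satisfies $T_i = \theta_i \times N$, $T_i$ being the compactness threshold, $\theta_i$ the compactness coefficient, and N the number of first attribute values included in the first voice data packet; the compactness count of the first attribute value is the number of first attribute values that are greater than the threshold of the first attribute value.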
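According to a third aspect, an embodiment of this application provides a computer storage medium that includes computer instructions. When the computer instructions run on a terminal, the terminal is caused to perform the method for processing voice data in an online translation process according to the first aspect or any of its possible designs.
According to a fourth aspect, an embodiment of this application provides a computer program product. When the computer program product runs on a computer, the computer is caused to perform the method for processing voice data in an online translation process according to the first aspect or any of its possible designs.
For the technical effects of the terminal according to the second aspect and any of its designs, the computer storage medium according to the third aspect, and the computer program product according to the fourth aspect, refer to the technical effects of the first aspect and its different designs. Details are not described herein again.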
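Brief Description of Drawings
FIG. 1 is a schematic hardware diagram of a terminal according to an embodiment of this application;
FIG. 2 is a first schematic diagram of an example voice communication scenario according to an embodiment of this application;
FIG. 3 is a second schematic diagram of an example voice communication scenario according to an embodiment of this application;
FIG. 4 is a first schematic diagram of a method for processing voice data in an online translation process according to an embodiment of this application;
FIG. 5 is a second schematic diagram of a method for processing voice data in an online translation process according to an embodiment of this application;
FIG. 6 is a schematic diagram of an example display interface in the prior art;
FIG. 7 is a third schematic diagram of a method for processing voice data in an online translation process according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of a terminal according to an embodiment of this application.
Description of Embodiments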
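The term "and/or" in this specification describes only an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate the following three cases: only A exists, both A and B exist, and only B exists.
In the embodiments of this application, words such as "exemplary" or "for example" are used to give an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of this application should not be construed as being preferred over or more advantageous than other embodiments or designs. Rather, such words are intended to present related concepts in a concrete manner.
In the descriptions of the embodiments of this application, unless otherwise specified, "a plurality of" means two or more. For example, a plurality of processing units means two or more processing units, and a plurality of systems means two or more systems.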
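Embodiments of this application provide a method and an apparatus for processing voice data in an online translation process, applicable to voice communication between two terminals. Voice activity detection (VAD) technology is used to detect whether the voice data collected by a terminal contains valid voice data, and thereby to decide whether to upload voice data to the translation server. Specifically, after the terminal obtains voice data within a preset time period (hereinafter referred to as the first voice data packet), if the terminal determines that there is no valid voice data (that is, real voice data) in the first voice data packet, the terminal stops uploading voice data to the translation server. Specifically, the terminal may determine whether a first attribute value of the first voice data packet meets a preset condition; if the first attribute value does not meet the preset condition, there is no valid voice data in the first voice data packet, and the terminal stops uploading voice data to the translation server, that is, the terminal no longer uploads voice data to the translation server afterwards. The preset condition includes one or more of the following: a discreteness condition, a continuity condition, and a compactness condition.
Of course, when there is valid voice data in the first voice data packet obtained by the terminal, the terminal continues or starts uploading voice data to the translation server.
In summary, during voice communication between two terminals, either terminal stops uploading voice data to the translation server when it determines that a voice data packet it has obtained contains no valid voice data. This solution makes the terminal more intelligent and reduces the traffic consumed by online translation.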
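The terminal in the embodiments of this application may be a portable computer (such as a mobile phone), a notebook computer, a personal computer (PC), a wearable terminal (such as a smartwatch), a tablet computer, an augmented reality (AR)/virtual reality (VR) device, an in-vehicle computer, or the like. The specific form of the terminal is not particularly limited in the following embodiments.
Refer to FIG. 1, which is a schematic structural diagram of a terminal 100 according to an embodiment of this application. The terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, an optical proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure illustrated in this embodiment of this application does not constitute a specific limitation on the terminal 100. In some other embodiments of this application, the terminal 100 may include more or fewer components than shown, combine some components, split some components, or have a different component arrangement. The illustrated components may be implemented by hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent components or may be integrated into one or more processors.
The controller may be the nerve center and command center of the terminal 100. The controller can generate operation control signals based on instruction operation codes and timing signals, to control instruction fetching and instruction execution.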
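A memory may also be provided in the processor 110 to store instructions and data. In some embodiments, the memory in the processor 110 is a cache. The memory can hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from the memory. This avoids repeated access and reduces the waiting time of the processor 110, thereby improving system efficiency.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, and the like.
The I2C interface is a bidirectional synchronous serial bus that includes one serial data line (SDA) and one serial clock line (SCL). In some embodiments, the processor 110 may include multiple groups of I2C buses. The processor 110 may be separately coupled to the touch sensor 180K, a charger, a flash, the camera 193, and the like through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 communicates with the touch sensor 180K through the I2C bus interface to implement the touch function of the terminal 100.
The I2S interface may be used for audio communication. In some embodiments, the processor 110 may include multiple groups of I2S buses. The processor 110 may be coupled to the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the I2S interface to implement the function of answering calls through a Bluetooth headset.
The PCM interface may also be used for audio communication, to sample, quantize, and encode analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit an audio signal to the wireless communication module 160 through the PCM interface to implement the function of answering calls through a Bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.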
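The UART interface is a universal serial data bus used for asynchronous communication. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial and parallel forms. In some embodiments, the UART interface is typically used to connect the processor 110 to the wireless communication module 160. For example, the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the UART interface to implement the function of playing music through a Bluetooth headset.
The MIPI interface may be used to connect the processor 110 to peripheral components such as the display screen 194 and the camera 193. MIPI interfaces include a camera serial interface (CSI), a display serial interface (DSI), and the like. In some embodiments, the processor 110 communicates with the camera 193 through the CSI interface to implement the photographing function of the terminal 100, and the processor 110 communicates with the display screen 194 through the DSI interface to implement the display function of the terminal 100.
The GPIO interface can be configured by software. The GPIO interface may be configured to carry control signals or data signals. In some embodiments, the GPIO interface may be used to connect the processor 110 to the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, or the like.
The USB interface 130 is an interface that complies with the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the terminal 100, or to transfer data between the terminal 100 and peripheral devices. It may also be used to connect a headset and play audio through the headset. The interface may further be used to connect other terminals, such as AR devices.
It can be understood that the interface connection relationships between the modules illustrated in this embodiment of this application are merely illustrative and do not constitute a limitation on the structure of the terminal 100. In some other embodiments of this application, the terminal 100 may alternatively use an interface connection manner different from those in the foregoing embodiment, or a combination of multiple interface connection manners.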
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块140可以通过USB接口130接收有线充电器的充电输入。在一些无线充电的实施例中,充电管理模块140可以通过终端100的无线充电线圈接收无线充电输入。充电管理模块140为电池142充电的同时,还可以通过电源管理模块141为终端供电。
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,外部存储器,显示屏194,摄像头193,和无线通信模块160等供电。电源管理模块141还可以用于监测电池容量,电池循环次数,电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块141也可以设置于处理器110中。在另一些实施例中,电源管理模块141和充电管理模块140也可以设置于同一个器件中。
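The wireless communication function of the terminal 100 can be implemented through antenna 1, antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and other components.
Antenna 1 and antenna 2 transmit and receive electromagnetic wave signals. Each antenna in the terminal 100 can cover one or more communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization. For example, antenna 1 can be multiplexed as a diversity antenna for a wireless local area network. In some other embodiments, an antenna can be used in combination with a tuning switch.
The mobile communication module 150 can provide wireless communication solutions applied to the terminal 100, including 2G/3G/4G/5G. It may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and similar components. The module can receive electromagnetic waves through antenna 1, filter and amplify them, and pass them to the modem processor for demodulation. It can also amplify signals modulated by the modem processor and radiate them as electromagnetic waves through antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be provided in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 and at least some modules of the processor 110 may be provided in the same component.
The modem processor may include a modulator and a demodulator. The modulator modulates a low-frequency baseband signal to be sent into a medium- or high-frequency signal. The demodulator demodulates a received electromagnetic wave signal into a low-frequency baseband signal and passes it to the baseband processor. After processing by the baseband processor, the signal is passed to the application processor. The application processor then outputs sound through audio devices (not limited to the speaker 170A and the receiver 170B), or displays images or video through the display 194. In some embodiments, the modem processor may be an independent component. In some other embodiments, the modem processor may be independent of the processor 110 and provided in the same component as the mobile communication module 150 or other functional modules.
The wireless communication module 160 can provide wireless communication solutions applied to the terminal 100. These include wireless local area networks (WLAN) such as wireless fidelity (Wi-Fi) networks, Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR). The wireless communication module 160 may be one or more components integrating at least one communication processing module. It receives electromagnetic waves through antenna 2, frequency-modulates and filters the signals, and sends the processed signals to the processor 110. It can also receive signals to be sent from the processor 110, frequency-modulate and amplify them, and radiate them as electromagnetic waves through antenna 2.
In some embodiments, antenna 1 of the terminal 100 is coupled to the mobile communication module 150 and antenna 2 is coupled to the wireless communication module 160. This lets the terminal 100 communicate with networks and other devices through wireless communication technologies. These may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).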
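The terminal 100 implements the display function through the GPU, the display 194, the application processor, and other components. The GPU is a microprocessor for image processing that connects the display 194 and the application processor. It performs mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display 194 displays images, videos, and other content, and includes a display panel. The panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini-LED, Micro-LED, Micro-OLED, quantum dot light-emitting diodes (QLED), or similar technology. In some embodiments, the terminal 100 may include 1 or N displays 194, where N is a positive integer greater than 1.
The terminal 100 can implement the photographing function through the ISP, the camera 193, the video codec, the GPU, the display 194, the application processor, and other components.
The ISP processes data fed back by the camera 193. For example, when a photo is taken, the shutter opens and light passes through the lens onto the camera's photosensitive element. The element converts the optical signal into an electrical signal and passes it to the ISP, which converts it into a visible image. The ISP can also algorithmically optimize image noise, brightness, and skin tone, as well as the exposure, color temperature, and other parameters of the shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 captures still images or video. The lens forms an optical image of an object and projects it onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. It converts the optical signal into an electrical signal, which the ISP converts into a digital image signal. The ISP outputs the digital image signal to the DSP, which converts it into an image signal in a standard format such as RGB or YUV. In some embodiments, the terminal 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
The digital signal processor processes digital signals; besides digital image signals, it can process other digital signals. For example, when the terminal 100 performs frequency selection, the digital signal processor performs a Fourier transform or similar operation on the frequency energy.
The video codec compresses or decompresses digital video. The terminal 100 may support one or more video codecs. It can therefore play or record video in multiple encoding formats, for example moving picture experts group (MPEG)1, MPEG2, MPEG3, and MPEG4.
The NPU is a neural-network (NN) computing processor. Modeled on biological neural networks, for example the transfer patterns between neurons in the human brain, it processes input information quickly and can continuously self-learn. The NPU enables intelligent-cognition applications on the terminal 100, such as image recognition, face recognition, speech recognition, and text understanding.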
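The external memory interface 120 can connect an external memory card, such as a Micro SD card, to expand the storage capacity of the terminal 100. The card communicates with the processor 110 through the external memory interface 120 to store data, for example music and video files.
The internal memory 121 stores computer-executable program code, which includes instructions. The processor 110 runs these instructions to execute the functional applications and data processing of the terminal 100. The internal memory 121 may include a program storage area and a data storage area. The program storage area can store the operating system and the application programs required by at least one function (such as sound playback or image playback). The data storage area can store data created while the terminal 100 is used (such as audio data and a phone book). In addition, the internal memory 121 may include high-speed random access memory and non-volatile memory, such as at least one disk storage device, a flash memory device, or universal flash storage (UFS).
The terminal 100 implements audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the application processor, and other components.
The audio module 170 converts digital audio information into an analog audio output signal and converts analog audio input into a digital audio signal. It can also encode and decode audio signals. In some embodiments, the audio module 170, or some of its functional modules, may be provided in the processor 110.
The speaker 170A, also called a "loudspeaker", converts audio electrical signals into sound. The terminal 100 can play music or hands-free calls through the speaker 170A.
The receiver 170B, also called an "earpiece", converts audio electrical signals into sound. When the terminal 100 answers a call or plays a voice message, the user hears the voice by holding the receiver 170B close to the ear.
The microphone 170C, also called a "mic" or "mouthpiece", converts sound into electrical signals. When making a call or sending a voice message, the user speaks close to the microphone 170C to input the sound signal. The terminal 100 may be provided with at least one microphone 170C. In some other embodiments, the terminal 100 may have two microphones 170C, which can also reduce noise in addition to collecting sound signals. In still other embodiments, the terminal 100 may have three, four, or more microphones 170C. These collect sound signals, reduce noise, identify sound sources, and support directional recording.
The headset jack 170D connects a wired headset. It may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.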
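The pressure sensor 180A senses pressure and converts it into an electrical signal. In some embodiments, it may be provided on the display 194. Pressure sensors 180A come in many types, such as resistive, inductive, and capacitive. A capacitive pressure sensor may include at least two parallel plates made of conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes. The terminal 100 determines the pressure intensity from this change in capacitance. When a touch operation acts on the display 194, the terminal 100 detects its intensity through the pressure sensor 180A. The terminal 100 can also calculate the touch position from the sensor's detection signal. In some embodiments, touch operations on the same position but with different intensities may correspond to different operation instructions. For example, a touch on the SMS application icon with intensity below a first pressure threshold executes an instruction to view SMS messages. A touch on the same icon with intensity greater than or equal to the first pressure threshold executes an instruction to create a new SMS message.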
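The intensity-dependent behavior in the SMS-icon example is a single threshold comparison. The following minimal Python sketch shows it. The function name `sms_icon_action` and the returned labels are hypothetical; the source specifies only the comparison against the first pressure threshold.

```python
def sms_icon_action(touch_intensity: float, first_pressure_threshold: float) -> str:
    """Map the intensity of a touch on the SMS icon to an instruction.

    The labels "view_sms" and "new_sms" are hypothetical placeholders.
    """
    if touch_intensity < first_pressure_threshold:
        return "view_sms"   # lighter press: view SMS messages
    return "new_sms"        # press at or above the threshold: create a new SMS
```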
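The gyroscope sensor 180B can determine the motion posture of the terminal 100. In some embodiments, it measures the angular velocity of the terminal 100 around three axes (the x, y, and z axes). The gyroscope sensor 180B can be used for image stabilization during shooting. For example, when the shutter is pressed, it detects the angle of the shake of the terminal 100. From this angle it calculates the distance the lens module must compensate, and the lens moves in the opposite direction to cancel the shake. The gyroscope sensor 180B can also be used for navigation and motion-sensing games.
The barometric pressure sensor 180C measures air pressure. In some embodiments, the terminal 100 calculates altitude from the measured pressure to assist positioning and navigation.
The magnetic sensor 180D includes a Hall sensor. The terminal 100 can use it to detect the opening and closing of a flip leather case. In some embodiments, when the terminal 100 is a flip phone, it can detect the opening and closing of the flip cover through the magnetic sensor 180D. It can then set features such as automatic unlocking when the case or flip cover is opened.
The acceleration sensor 180E detects the magnitude of the acceleration of the terminal 100 in all directions (generally three axes). When the terminal 100 is stationary, it can detect the magnitude and direction of gravity. It can also recognize the terminal's posture for landscape/portrait switching, pedometers, and similar applications.
The distance sensor 180F measures distance, by infrared or laser. In some embodiments, in a shooting scene, the terminal 100 can use the distance sensor 180F to measure distance for fast focusing.
The proximity light sensor 180G may include a light-emitting diode (LED) and a light detector such as a photodiode. The LED may be an infrared LED. The terminal 100 emits infrared light through the LED and uses the photodiode to detect infrared light reflected from nearby objects. Sufficient reflected light indicates an object near the terminal 100; insufficient reflected light indicates none. The terminal 100 can use the proximity light sensor 180G to detect that the user is holding it to the ear during a call, and then turn off the screen to save power. The sensor can also be used for automatic unlocking and screen locking in leather-case mode and pocket mode.
The ambient light sensor 180L senses ambient light brightness. The terminal 100 can adapt the brightness of the display 194 to it. The sensor can also adjust white balance automatically when taking photos. Together with the proximity light sensor 180G, it can detect whether the terminal 100 is in a pocket, to prevent accidental touches.
The fingerprint sensor 180H collects fingerprints. The terminal 100 can use the collected fingerprint characteristics for fingerprint unlocking, application-lock access, fingerprint-triggered photographing, fingerprint-based call answering, and similar functions.
The temperature sensor 180J detects temperature. In some embodiments, the terminal 100 executes a temperature handling policy based on the detected temperature. For example, when the reported temperature exceeds a threshold, the terminal 100 lowers the performance of a processor near the temperature sensor 180J. This reduces power consumption and provides thermal protection. In some other embodiments, when the temperature is below another threshold, the terminal 100 heats the battery 142 to avoid an abnormal low-temperature shutdown. In still other embodiments, when the temperature is below yet another threshold, the terminal 100 boosts the output voltage of the battery 142 to avoid an abnormal low-temperature shutdown.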
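The temperature handling policy can be pictured as a set of threshold checks. The sketch below merges the three embodiments into one function for illustration only. Several elements are assumptions, because the source gives no concrete numbers and no ordering between the embodiments:

- the function name and the action labels;
- the threshold values in the example call;
- the rule that the "yet another" threshold is lower than the "another" threshold.

```python
from typing import List


def thermal_actions(temp_c: float, high_c: float, low_c: float,
                    very_low_c: float) -> List[str]:
    """Return the (hypothetical) actions triggered by one temperature reading."""
    actions = []
    if temp_c > high_c:
        # Over-temperature: throttle the processor near the sensor to cut power.
        actions.append("throttle_nearby_processor")
    if temp_c < low_c:
        # Cold: heat the battery to avoid a low-temperature shutdown.
        actions.append("heat_battery")
    if temp_c < very_low_c:
        # Very cold: boost the battery output voltage as a further safeguard.
        actions.append("boost_battery_voltage")
    return actions


# Assumed thresholds: 45 °C, 0 °C and -10 °C.
print(thermal_actions(-15.0, 45.0, 0.0, -10.0))
# prints: ['heat_battery', 'boost_battery_voltage']
```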
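The touch sensor 180K is also called a "touch panel". It may be provided on the display 194, and together they form a touchscreen, also called a "touch screen". The touch sensor 180K detects touch operations on or near it. It can pass a detected touch operation to the application processor to determine the touch event type. The display 194 can provide visual output related to the touch operation. In some other embodiments, the touch sensor 180K may be provided on a surface of the terminal 100 at a position different from that of the display 194.
The bone conduction sensor 180M can acquire vibration signals. In some embodiments, it can acquire the vibration signal of the bone that vibrates when a person speaks. It can also contact the human pulse and receive blood-pressure pulsation signals. In some embodiments, the bone conduction sensor 180M may be built into a headset to form a bone conduction headset. The audio module 170 can parse a voice signal from the vocal-bone vibration signal acquired by the bone conduction sensor 180M to implement a voice function. The application processor can parse heart-rate information from the blood-pressure pulsation signal it acquires, to implement heart-rate detection.
The keys 190 include a power key, volume keys, and others. They may be mechanical keys or touch keys. The terminal 100 can receive key input and generate key signal input related to its user settings and function control.
The motor 191 can generate vibration prompts, both for incoming calls and as touch vibration feedback. For example, touch operations in different applications (such as photographing and audio playback) may produce different vibration feedback effects. Touch operations on different areas of the display 194 may also produce different effects. Different scenarios (such as time reminders, received messages, alarms, and games) may likewise have different vibration feedback effects, and touch vibration feedback effects can also be customized.
The indicator 192 may be an indicator light. It can indicate the charging status and changes in battery level, as well as messages, missed calls, notifications, and other events.
The SIM card interface 195 connects a SIM card. A SIM card is brought into contact with or separated from the terminal 100 by inserting it into or removing it from the interface. The terminal 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 can support Nano SIM cards, Micro SIM cards, standard SIM cards, and others. Multiple cards, of the same or different types, can be inserted into the same interface at the same time. The interface is also compatible with different types of SIM cards and with external memory cards. The terminal 100 interacts with the network through the SIM card to implement calls, data communication, and similar functions. In some embodiments, the terminal 100 uses an eSIM, that is, an embedded SIM card that is built into the terminal 100 and cannot be separated from it.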
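FIG. 2 is a schematic diagram of an example communication scenario for the voice data processing method in an online translation process according to an embodiment of this application. A user 210 using a terminal 200 talks with a user 310 using a terminal 300; that is, the two terminals are in voice communication. Take the terminal 200 as the terminal in the embodiments of this application. Assume that the native language of its holder (user 210) is English and that of the holder of the terminal 300 (user 310) is Chinese. The voice data collected by the terminal 200 may include uplink voice data (spoken by the local user, user 210) and downlink voice data (spoken by the peer user, user 310). The terminal 200 can control the translation of the uplink or downlink voice data. It presents the result either as text on its display or as a voice broadcast.
Referring to FIG. 2, the user 210 says voice data 1, "Are you Mr Wang?" (uplink voice data). The terminal 200 collects voice data 1 and uploads it to a translation server 400. The translation server 400 translates it and sends the translated voice or text data back to the terminal 200. The terminal 200 then plays the translated speech "您是王先生吗?" ("Are you Mr. Wang?") through its receiver. After hearing it, the user 310 replies with voice data 2, "不,我不是!" ("No, I am not!"). Voice data 2 is downlink voice data; the terminal 200 likewise collects it and uploads it to the translation server 400. The server translates it and sends the translated voice or text data to the terminal 200, which plays "No, I am not!" through its receiver. The terminal 200 and the terminal 300 continue the rest of the call in the same way.
FIG. 3 is a schematic diagram of another communication scenario for the voice data processing method in an online translation process according to an embodiment of this application. The user 210 using the terminal 200 talks with the user 310 using the terminal 300. If both terminals can recognize or play the other party's native language, each terminal can control the translation of its own uplink voice data. The two terminals then perform similar actions during online translation. Take the terminal 200 as an example, and assume that the native language of the user 210 is English and that of the user 310 is Chinese. As in FIG. 2, the user 210 says voice data 1, "Are you Mr Wang?". The terminal 200 collects it and uploads it to the translation server 400. The server translates it and sends the translated voice or text data to the terminal 200, which plays "您是王先生吗?" ("Are you Mr. Wang?") through its receiver. After hearing it, the user 310 replies with voice data 2, "不,我不是!" ("No, I am not!"). This time the terminal 300 collects voice data 2 and uploads it to the translation server 400. The server translates it and sends the translated voice or text data to the terminal 300, which plays "No, I am not!" through its receiver. The terminal 200 and the terminal 300 continue the rest of the call in the same way.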
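An embodiment of this application provides a voice data processing method in an online translation process, which can be applied to voice communication between two terminals. As shown in FIG. 4, the method may include S101 to S104:
S101. The terminal obtains a first voice data packet.
The terminal is the local device of the two terminals in voice communication; the method can be applied to either terminal. The first voice data packet includes the voice data within a preset time period.
While the terminal (for example, the terminal 200) is in voice communication with another terminal (for example, the terminal 300), the microphone 170C (also called the "mic") of the terminal acquires voice data. The voice data may be uplink or downlink; for ease of description, the following embodiments refer to both simply as voice data. It includes the speech of a user (the local user or the peer user) as well as ambient noise around the terminal. In the embodiments of this application, the microphone acquires the voice data within a preset time period (for example, 1 minute or 40 seconds) to form the first voice data packet.
S102. The terminal determines whether the first voice data packet contains valid voice data.
S103. If the terminal determines that the first voice data packet contains no valid voice data, the terminal stops uploading voice data to the translation server.
In the embodiments of this application, when the first voice data packet contains no valid voice data, the terminal no longer needs to upload voice data to the translation server. This saves the data traffic consumed by online translation.
S104. If the terminal determines that the first voice data packet contains valid voice data, the terminal continues uploading voice data to the translation server.
In the embodiments of this application, when the first voice data packet contains valid voice data, the terminal continues uploading voice data to the translation server. The server therefore continues translating the uploaded voice data, and the terminal's voice communication proceeds smoothly.
Note that, in the embodiments of this application, the terminal may later obtain a new voice data packet (a second voice data packet) after it has stopped sending voice data to the translation server. If the terminal determines that the second voice data packet contains valid voice data, it restores its connection to the translation server and starts uploading the second voice data packet.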
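Steps S101 to S104 work like a small gate in front of the uploader, including the resumption of uploads when a later packet contains valid speech. The following minimal Python sketch assumes that the valid-voice check runs on every packet before that packet is uploaded. `UploadGate`, `has_valid_speech` and `upload` are hypothetical names. They stand in for the terminal's detector (S102) and its translation-server client, and reconnection is reduced to a flag.

```python
from typing import Callable, Sequence


class UploadGate:
    """Per-packet upload decision for S101-S104 (hypothetical helper)."""

    def __init__(self,
                 has_valid_speech: Callable[[Sequence[float]], bool],
                 upload: Callable[[Sequence[float]], None]) -> None:
        self.has_valid_speech = has_valid_speech  # stands in for the S102 check
        self.upload = upload                      # stands in for the server client
        self.uploading = True                     # uploading is on when the call starts

    def on_packet(self, packet: Sequence[float]) -> bool:
        """Handle one voice data packet; return True if it was uploaded."""
        if self.has_valid_speech(packet):
            # S104, or resuming after a stop: valid voice data, so upload it.
            self.uploading = True
            self.upload(packet)
            return True
        # S103: no valid voice data; stop uploading but keep capturing packets.
        self.uploading = False
        return False


# Toy usage (assumption): treat any |sample| > 0.5 as speech.
sent = []
gate = UploadGate(lambda p: max(abs(x) for x in p) > 0.5, sent.append)
print(gate.on_packet([0.9, 0.1]), gate.on_packet([0.0, 0.1]), gate.on_packet([0.7]))
# prints: True False True
```

Keeping the detector as an injected callable lets any of the valid-voice tests described in this document be plugged in without changing the gating logic.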
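Optionally, the terminal may also use the translation result of the first voice data packet returned by the translation server to decide whether to stop uploading voice data. As shown in FIG. 5, the method provided in the embodiments of this application may further include S105 and S106:
S105. The terminal sends the first voice data packet to the translation server.
S106. The terminal receives the translation result of the first voice data packet from the translation server.
Note that, in the embodiments of this application, the order of S105 to S106 relative to S102 is not limited. The terminal may perform S102 first and then S105 to S106, perform S105 to S106 first and then S102, or perform S102 and S105 to S106 at the same time.
In the embodiments of this application, the terminal may decide whether to stop uploading voice data to the translation server through the following S107:
S107. If the terminal determines that the first voice data packet satisfies a first condition, it stops uploading voice data to the translation server. The first condition includes one or more of the following: the first voice data packet contains no valid voice data; the translation result of the first voice data packet is empty.
In the embodiments of this application, when the terminal determines that the first voice data packet contains no valid voice data, it also checks the translation result returned by the translation server before deciding whether to stop uploading. In one implementation, the packet contains no valid voice data and the returned translation result is empty (there is no translation result). The empty result further confirms that the packet contains no valid voice data, which improves the accuracy of valid-voice detection. The terminal then stops uploading voice data to the translation server, which saves the traffic consumed by online translation and reduces the user's call charges.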
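The first condition of S107 can be evaluated from two signals: the local valid-voice decision and the server's translation result. The following minimal sketch assumes that an empty translation result arrives as `None` or as an all-whitespace string. `should_stop_upload` and its `require_both` switch are hypothetical names.

- With `require_both=True`, the sketch follows the implementation described above: it stops only when there is no local speech and the result is empty.
- With `require_both=False`, either signal alone stops the upload, which is the broader reading of "one or more of".

```python
from typing import Optional


def should_stop_upload(has_valid_speech: bool,
                       translation_result: Optional[str],
                       require_both: bool = True) -> bool:
    """Evaluate the first condition of S107 (hypothetical helper)."""
    no_speech = not has_valid_speech
    # Assumption: "empty" means no result at all, or only whitespace.
    empty_result = translation_result is None or not translation_result.strip()
    if require_both:
        # Implementation described above: both signals must agree.
        return no_speech and empty_result
    # Broader reading of "one or more of": either signal suffices.
    return no_speech or empty_result
```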
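Optionally, in S103, the terminal stops uploading voice data to the translation server through S1031 and S1032:
S1031. The terminal displays a first prompt box.
In the embodiments of this application, when the terminal determines that the first voice data packet does not satisfy the preset condition, it displays a first prompt box on its display. The box asks the user whether to disconnect the terminal from the translation server.
S1032. The terminal receives a first operation performed by the user on the first prompt box and, in response, disconnects from the translation server.
Specifically, the first operation is the user's action that triggers the terminal to disconnect from the translation server. It may be any one of a tap, a double tap, or a long press on the first prompt box. In response to the first operation, the terminal disconnects from the translation server.
For example, as shown in FIG. 6, the terminal 200 may determine during voice communication with the terminal 300 that the first voice data packet it obtained does not satisfy the preset condition. A first prompt box 500 then appears on the display of the terminal 200. The prompt box 500 contains the text "Disconnect from the translation server?" and two buttons, "Yes" and "No". If the user taps "Yes", the terminal 200 disconnects from the translation server; if the user taps "No", the connection is kept.
Optionally, in S103, the terminal stops uploading voice data to the translation server through S1033:
S1033. The terminal maintains its connection to the translation server and stops sending voice data to it.
In the embodiments of this application, when the terminal determines that the first voice data packet contains no valid voice data, it can keep its connection to the translation server while no longer uploading voice data. In this case, the terminal still continues to acquire the voice data packets produced by the user's speech.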
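The two ways of stopping the upload can be contrasted in a short sketch: S1031 to S1032 ask the user and then disconnect, while S1033 keeps the connection open. `TranslationLink`, `stop_uploading` and the `confirm` callback are hypothetical names. The source does not say what happens after the user taps "No", so in this sketch declining leaves the link unchanged, which is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TranslationLink:
    """Hypothetical state of the terminal's link to the translation server."""
    connected: bool = True
    sending: bool = True


def stop_uploading(link: TranslationLink,
                   confirm: Callable[[str], bool],
                   keep_connection: bool) -> TranslationLink:
    """Apply S1033 (keep_connection=True) or S1031-S1032 (keep_connection=False)."""
    if keep_connection:
        # S1033: stay connected, just stop sending voice data.
        link.sending = False
    elif confirm("Disconnect from the translation server?"):
        # S1031-S1032: the user tapped "Yes" in the first prompt box.
        link.sending = False
        link.connected = False
    # Assumption: if the user taps "No", the link is left as it was.
    return link
```

Keeping the connection open (S1033) avoids a reconnection delay when speech resumes, at the cost of holding an idle session.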
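The embodiments of this application provide a voice data processing method in an online translation process. During voice communication between two terminals, one terminal obtains a first voice data packet and determines whether it contains valid voice data. If the packet contains no valid voice data, the terminal stops uploading voice data to the translation server. Because the terminal decides for itself whether to stop uploading, it behaves more intelligently. When the packet contains no valid voice data, the terminal no longer needs to upload, which saves the traffic consumed by online translation.
Further, the method is implemented at the application layer of the terminal. It places relatively low demands on the terminal's hardware performance and therefore has better applicability.
Optionally, the terminal may determine whether the first voice data packet contains valid voice data through S201 to S203:
S201. The terminal determines whether the first attribute value of the first voice data packet satisfies a preset condition. The preset condition includes one or more of the following: a discreteness condition, a continuity condition, and a compactness condition.
The first attribute value of the first voice data packet consists of the first attribute values of the individual data frames of the packet.
In the embodiments of this application, after obtaining the first voice data packet, the terminal samples it at a certain sampling frequency to obtain N frames of voice data, where N is an integer greater than or equal to 1. It then extracts the first attribute value of each frame, so the first attribute value of the packet comprises N attribute values.
Specifically, the sampling frequency of the first voice data packet is F_s = N/S. Here F_s is the sampling frequency, N is the number of sampling points (that is, the number of data frames in the first voice data packet), and S is the sampling duration.
In the embodiments of this application, the first attribute value may be the signal-to-noise ratio of the voice data or the intensity of the voice data (for example, its amplitude). Take one data frame of the first voice data packet (the first data frame) as an example, with the signal-to-noise ratio as the first attribute value. The first attribute value of each frame can then be determined using the following formula (1):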
[Formula (1): equation image not reproduced in the source text]
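Here, L_p is the first attribute value of the first data frame, p_rms is the signal loudness of the first data frame, p_ref is the noise intensity of the first data frame, and F_s is the sampling frequency of the first voice data packet.
Applying formula (1) to each of the N data frames yields the N first attribute values of the first voice data packet.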
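The image of formula (1) is not reproduced, so the sketch below assumes the common level form L_p = 20·log10(p_rms / p_ref) dB. In this form, p_rms is the RMS amplitude of a frame and p_ref is a fixed noise reference. The source's formula also involves F_s and may differ. The sketch does three things:

- it splits a packet into equal frames;
- it returns one attribute value per frame;
- it computes F_s = N/S as defined above.

All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def frame_attribute_values(samples: Sequence[float], frame_len: int,
                           p_ref: float, duration_s: float
                           ) -> Tuple[List[float], float]:
    """Return (per-frame attribute values, F_s) for one voice data packet.

    Assumption: each value is 20*log10(p_rms / p_ref) in dB, where p_rms is
    the RMS amplitude of the frame and p_ref is a fixed noise reference.
    Trailing samples that do not fill a whole frame are ignored.
    """
    if frame_len <= 0 or p_ref <= 0 or duration_s <= 0:
        raise ValueError("frame_len, p_ref and duration_s must be positive")
    values = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        p_rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Floor p_rms so that an all-zero frame does not hit log10(0).
        values.append(20.0 * math.log10(max(p_rms, 1e-12) / p_ref))
    n = len(values)                  # N, the number of frames
    return values, n / duration_s    # F_s = N / S
```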
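In the embodiments of this application, the discreteness condition is that the variance of the first attribute values of the first voice data packet is greater than a variance threshold. The variance can be calculated using the following formula (2):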
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2 \qquad (2)$$
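Here, σ² is the variance of the first attribute values of the first voice data packet, x_i is the first attribute value of the i-th data frame, and μ is the mean of the first attribute values of the N data frames.
Specifically, when the variance of the first attribute values is greater than the variance threshold, the discreteness condition is satisfied. The values are then dispersed, which indicates that the first voice data packet contains valid voice data (the user is speaking). When the variance is less than or equal to the variance threshold, the discreteness condition is not satisfied. The values are not dispersed, which indicates that the packet contains no valid voice data (the user is not speaking).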
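The discreteness check reduces to comparing a variance with a threshold. The following minimal sketch uses Python's `statistics.pvariance`, which computes σ² = (1/N)·Σ(x_i - μ)². Two points are assumptions: that the source normalizes by N rather than N - 1, and the function name.

```python
from statistics import pvariance
from typing import Sequence


def satisfies_discreteness(values: Sequence[float], variance_threshold: float) -> bool:
    """Discreteness condition: population variance of the values > threshold."""
    if not values:
        return False  # no frames, so there is nothing to measure
    return pvariance(values) > variance_threshold
```

For example, per-frame values alternating between 0 and 20 have a variance of 100, so they pass a threshold of 50. A constant sequence has a variance of 0 and fails any positive threshold.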
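In the embodiments of this application, the continuity condition is that the continuity count of the first attribute values of the first voice data packet is greater than a continuity threshold. The continuity threshold satisfies T_c = θ_c × F_s, where T_c is the continuity threshold, θ_c is the continuity coefficient, and F_s is the sampling frequency of the first voice data packet.
Specifically, when the continuity count is greater than the continuity threshold, the continuity condition is satisfied. The first attribute values are then continuous, which indicates that the first voice data packet contains valid voice data (the user is speaking). When the continuity count is less than or equal to the continuity threshold, the condition is not satisfied. The values are not continuous, which indicates that the packet contains no valid voice data (the user is not speaking).
In the embodiments of this application, the compactness condition is that the compactness count of the first attribute values is greater than a compactness threshold. The compactness threshold satisfies T_i = θ_i × N, where T_i is the compactness threshold and θ_i is the compactness coefficient. The compactness count is the number of the N first attribute values that are greater than the first attribute value threshold.
Specifically, when the compactness count is greater than the compactness threshold, the compactness condition is satisfied. The first attribute values are then compact, which indicates that the first voice data packet contains valid voice data (the user is speaking). When the compactness count is less than or equal to the compactness threshold, the condition is not satisfied. The values are not compact, which indicates that the packet contains no valid voice data (the user is not speaking).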
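The compactness condition counts the attribute values above the first attribute value threshold and compares that count with T_i = θ_i × N. The following is a minimal sketch with hypothetical function names.

```python
from typing import Sequence


def compactness_count(values: Sequence[float], value_threshold: float) -> int:
    """Number of attribute values strictly greater than the value threshold."""
    return sum(1 for v in values if v > value_threshold)


def satisfies_compactness(values: Sequence[float], value_threshold: float,
                          theta_i: float) -> bool:
    """Compactness condition: compactness count > T_i, with T_i = theta_i * N."""
    t_i = theta_i * len(values)
    return compactness_count(values, value_threshold) > t_i
```

For example, three of the five values 12, 3, 15, 14 and 2 exceed a threshold of 10. With θ_i = 0.5, T_i = 2.5 and the condition holds; with θ_i = 0.8, T_i = 4 and it does not.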
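S202. If the first attribute value of the first voice data packet does not satisfy the preset condition, the terminal determines that the first voice data packet contains no valid voice data.
In the embodiments of this application, "not satisfying the preset condition" means failing one or more of the three conditions above, which covers the cases listed in Table 1. In these cases, the terminal determines that the first voice data packet contains no valid voice data, that is, the user is not speaking. The terminal then stops uploading voice data to the translation server.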
Table 1
[Table image not reproduced in the source text]
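With reference to Table 1, when the terminal determines that the first voice data packet contains no valid voice data, it continues acquiring voice data but stops uploading it to the translation server. For example, suppose the first attribute value of the first voice data packet does not satisfy the preset condition. When the microphone then acquires a second voice data packet, the terminal does not upload it to the translation server, which saves traffic during online translation. The terminal still uses S201 to determine whether the second voice data packet contains valid voice data.
In the embodiments of this application, when the first attribute value of the first voice data packet satisfies none of the discreteness, continuity, and compactness conditions, the terminal can determine more reliably that the packet contains no valid voice data.
S203. If the first attribute value of the first voice data packet satisfies the preset condition, the terminal determines that the first voice data packet contains valid voice data.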
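The S202/S203 decision combines the three condition results. The text allows two readings: "no valid voice data" when any condition fails, or, in the more reliable variant, only when all three fail. The following minimal sketch uses a hypothetical `strict_silence` switch to select between them.

```python
def has_valid_speech(discrete: bool, continuous: bool, compact: bool,
                     strict_silence: bool = True) -> bool:
    """Combine the three condition results into the S202/S203 decision."""
    results = (discrete, continuous, compact)
    if strict_silence:
        # "No valid voice data" only when all three conditions fail.
        return any(results)
    # "No valid voice data" as soon as any condition fails.
    return all(results)
```

The strict variant is more reluctant to declare silence, so it is less likely to stop uploading while the user is actually speaking.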
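Optionally, in the embodiments of this application, as shown in FIG. 7, the first voice data packet has N first attribute values, where N is an integer greater than or equal to 1. The terminal may determine the continuity count of the first attribute values through S301 to S306:
S301. The terminal determines whether the i-th first attribute value is greater than the first attribute value threshold.
Here, 1 ≤ i ≤ N-1.
S302. If the i-th first attribute value is greater than the first attribute value threshold, the terminal increments the continuity count of the first attribute values by 1.
S303. If the i-th first attribute value is less than or equal to the first attribute value threshold, the discreteness count of the first attribute values is incremented by 1.
S304. The terminal determines whether the discreteness count is less than or equal to a discreteness threshold.
S305. If the discreteness count is less than or equal to the discreteness threshold, set i = i + 1 and return to S301.
In the embodiments of this application, when the discreteness count is less than or equal to the discreteness threshold, i is incremented by 1. The terminal then checks whether the (i+1)-th first attribute value is greater than the first attribute value threshold (returning to S301).
S306. If the discreteness count is greater than the discreteness threshold, both the continuity count and the discreteness count are reset to zero.
In the embodiments of this application, when the discreteness count exceeds the discreteness threshold, the terminal resets both counts to zero. It then re-determines the continuity count starting from the next, that is, the (i+1)-th, first attribute value.
For example, assume that the first voice data packet includes 100 data frames, that is, 100 first attribute values, and that the discreteness threshold is set to 8. The counting then proceeds as follows:
- The first 15 values are all greater than the first attribute value threshold, so the continuity count is 15.
- The 16th value is less than the threshold, so the discreteness count becomes 1. This is below the discreteness threshold, so the terminal goes on to the 17th value.
- The 17th value is greater than the threshold, so the continuity count is updated to 16, and so on.
- The 18th to 26th values are all less than the threshold, so the discreteness count is updated to 10. This exceeds the discreteness threshold, so the terminal resets both counts to zero.
- Starting from the 27th value, the terminal re-determines the continuity count of the first attribute values of the first voice data packet in the same way.
In the embodiments of this application, when the discreteness count exceeds the discreteness threshold, the first i first attribute values are not continuous. This indicates that the voice data corresponding to them contains no valid voice data. The terminal therefore resets the continuity count and the discreteness count to zero and goes on to check whether the remaining N - i first attribute values are continuous. Introducing the discreteness count and the discreteness threshold makes the continuity determination more accurate.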
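S301 to S306 can be sketched as a single pass over the N attribute values. Two points are assumptions rather than statements of the source:

- All N values are scanned, whereas the text states 1 ≤ i ≤ N-1.
- The text does not say which count is compared with T_c after a reset, so the sketch keeps the largest continuity count reached.

Function names are hypothetical. Under this reading, a discreteness threshold of 8 triggers the reset as soon as the discreteness count reaches 9.

```python
from typing import Sequence


def continuity_count(values: Sequence[float], value_threshold: float,
                     discreteness_threshold: float) -> int:
    """Continuity count per S301-S306; returns the largest count reached."""
    cont = disc = best = 0
    for v in values:
        if v > value_threshold:
            cont += 1                      # S302
            best = max(best, cont)
        else:
            disc += 1                      # S303
            if disc > discreteness_threshold:
                cont = disc = 0            # S306: clear both, restart at next value
    return best


def satisfies_continuity(values: Sequence[float], value_threshold: float,
                         theta_c: float, theta_d: float, fs: float) -> bool:
    """Continuity condition: count > T_c = theta_c * F_s, using T_d = theta_d * F_s."""
    t_c = theta_c * fs
    t_d = theta_d * fs
    return continuity_count(values, value_threshold, t_d) > t_c
```

For example, consider 15 high values, 1 low, 1 high, 9 low and then 5 high, with a discreteness threshold of 8. The count peaks at 16 before the reset and restarts at 5 afterward, so the function returns 16.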
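Optionally, in the embodiments of this application, the terminal can also adjust one or more of the thresholds used by the method. Examples are the first attribute value threshold, the variance threshold, the continuity threshold, the discreteness threshold, and the compactness threshold. Specifically, the terminal adjusts the thresholds by comparing two results: its own determination of whether the first attribute value of the first voice data packet satisfies the preset condition, and the translation result returned by the translation server.
Take the compactness threshold as an example. Suppose the terminal finds that the first attribute value does not satisfy the compactness condition, that is, the compactness count is less than or equal to the compactness threshold, so the terminal concludes that the packet contains no valid voice data. Suppose also that the translation result returned by the server is not empty, which indicates that the packet does contain valid voice data. The compactness condition may then be too strict. The terminal can lower the compactness threshold so that the compactness count exceeds the adjusted threshold and the condition is satisfied.
Now suppose instead that the terminal finds that the first attribute value satisfies the compactness condition, that is, the compactness count is greater than the compactness threshold, so the terminal concludes that the packet contains valid voice data. Suppose also that the translation result returned by the server is empty, which indicates that the packet contains no valid voice data. The compactness condition may then be too loose. The terminal can raise the compactness threshold so that the compactness count is less than or equal to the adjusted threshold and the condition is not satisfied.
The other thresholds can be adjusted in the same way as the compactness threshold; they are not listed one by one in the embodiments of this application.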
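The feedback rule for the compactness threshold can be sketched as a small update on θ_i. With N fixed, adjusting θ_i adjusts T_i = θ_i × N. The step size and the clamping of θ_i to [0, 1] are assumptions; the source gives only the direction of each adjustment. The function name is hypothetical.

```python
def adapt_theta_i(theta_i: float, local_compact: bool, server_result_empty: bool,
                  step: float = 0.05) -> float:
    """Adjust the compactness coefficient using the server's translation result."""
    if not local_compact and not server_result_empty:
        # Locally rejected, but the server found speech: condition too strict.
        theta_i -= step
    elif local_compact and server_result_empty:
        # Locally accepted, but the server returned nothing: condition too loose.
        theta_i += step
    # Clamp (assumption): at theta_i = 1, T_i = N and the condition can never hold.
    return min(1.0, max(0.0, theta_i))
```

When the local decision and the server result agree, the coefficient is left unchanged.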
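It can be understood that, to implement the foregoing functions, the terminal includes corresponding hardware structures and/or software modules for performing each function. A person skilled in the art should readily appreciate that the example units and algorithm steps described with the embodiments disclosed herein can be implemented by hardware or by a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the technical solution. A person skilled in the art may implement the described functions differently for each particular application, but such implementations should not be considered beyond the scope of the embodiments of this application.
In the embodiments of this application, the terminal may be divided into functional modules based on the foregoing method examples. For example, each function may have its own functional module, or two or more functions may be integrated into one processing module. The integrated module may be implemented in hardware or as a software functional module. Note that the module division in the embodiments of this application is illustrative and is merely a logical function division; other divisions are possible in actual implementation.
For example, FIG. 8 shows a possible structure of the terminal in the foregoing embodiments. The terminal 1000 includes a processing module 1001, a storage module 1002, a voice capture module 1003, a communication module 1004, and a display module 1005. The processing module 1001 controls the voice capture module 1003 to obtain the first voice data packet; for example, it supports the terminal in performing S101. It also stops uploading voice data to the translation server when it determines that the first voice data packet contains no valid voice data; for example, it supports the terminal in performing S103. The storage module 1002 can buffer the voice data acquired through the voice capture module 1003. The processing module 1001 also determines whether the first voice data packet contains valid voice data; for example, it supports the terminal in performing S102.
Optionally, in the embodiments of this application, the processing module 1001 is specifically configured to determine whether the first attribute value of the first voice data packet satisfies the preset condition. The preset condition includes one or more of a discreteness condition, a continuity condition, and a compactness condition. If the first attribute value does not satisfy the preset condition, the module determines that the packet contains no valid voice data; if it does, the module determines that the packet contains valid voice data. For example, the processing module 1001 supports the terminal in performing S201 to S203.
When the processing module 1001 determines that the first voice data packet contains valid voice data, it continues uploading voice data to the translation server through the communication module 1004; for example, it supports the terminal in performing S104.
Optionally, the processing module 1001 also sends the first voice data packet to the translation server through the communication module 1004 and receives the packet's translation result from the server; for example, it supports the terminal in performing S105 and S106. The processing module 1001 also stops uploading voice data to the translation server when the first voice data packet satisfies the first condition. The first condition includes one or more of the following: the first voice data packet contains no valid voice data; the translation result of the first voice data packet is empty. For example, the processing module 1001 supports the terminal in performing S107.
The processing module 1001 can also control the display module 1005 to display the first prompt box and receive the user's first operation on it. In response to the first operation, it disconnects from the translation server; for example, it supports the terminal in performing S1031 and S1032.
The processing module 1001 can also keep the connection to the translation server while no longer sending voice data to it; for example, it supports the terminal in performing S1033.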
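Optionally, in the embodiments of this application, the preset condition includes the discreteness condition, the continuity condition, and the compactness condition. The processing module 1001 is specifically configured to determine that the first voice data packet contains no valid voice data when the packet satisfies none of the three conditions.
The processing module 1001 is also configured to determine the continuity count of the first attribute values of the first voice data packet as follows:
- Determine whether the i-th first attribute value is greater than the first attribute value threshold, where 1 ≤ i ≤ N-1.
- If it is, increment the continuity count by 1; if it is less than or equal to the threshold, increment the discreteness count by 1.
- When the discreteness count is less than or equal to the discreteness threshold, determine whether the (i+1)-th first attribute value is greater than the first attribute value threshold. The discreteness threshold satisfies T_d = θ_d × F_s, where T_d is the discreteness threshold, θ_d is the discreteness coefficient, and F_s is the sampling frequency of the first voice data packet.
- When the discreteness count is greater than the discreteness threshold, reset both counts to zero and re-determine the continuity count starting from the (i+1)-th first attribute value.
For example, the processing module 1001 supports the terminal in performing S301 to S306.
Of course, the terminal 1000 includes but is not limited to the unit modules listed above. For example, the terminal 1000 may also include a receiving module, which receives data or instructions from other terminals, and a sending module, which sends data or instructions to other terminals. The functions of these units also include but are not limited to those corresponding to the method steps in the foregoing examples. For detailed descriptions of the other units of the terminal 1000, refer to the corresponding method steps; details are not repeated here.
In the embodiments of this application, the processing module 1001 may be a processor or controller. Examples are a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or a field programmable gate array (FPGA). It may also be another programmable logic device, a transistor logic device, a hardware component, or any combination of these. It can implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor may also be a combination that performs computing functions, for example one or more microprocessors, or a DSP combined with a microprocessor. The communication module may be a transceiver, a transceiver circuit, a communication interface, or similar. The storage module 1002 may be a memory.
The terminal provided in the embodiments of this application may be the terminal 100 shown in FIG. 1. In that case, the modules map to the components of FIG. 1 as follows:
- The processing module 1001 is a processor (such as the processor 110).
- The storage module 1002 is a memory (such as the internal memory 121).
- The voice capture module 1003 includes a microphone (such as the microphone 170C).
- The communication module 1004 is the mobile communication module 150 or the wireless communication module 160, collectively called the communication interface.
- The display module 1005 is a touchscreen, including the display 194, in which a display panel and a touch panel are integrated.
The processor, communication interface, touchscreen, memory, microphone, and other components may be coupled together through a bus.
An embodiment of this application further provides a computer storage medium storing computer program code. When the processor executes the code, the terminal performs the related method steps in any one of FIG. 4, FIG. 5, or FIG. 7 to implement the methods in the foregoing embodiments.
An embodiment of this application further provides a computer program product. When the product runs on a computer, the computer performs the related method steps in any one of FIG. 4, FIG. 5, or FIG. 7 to implement the methods in the foregoing embodiments.
The terminal 1000, the computer storage medium, and the computer program product provided in the embodiments of this application all perform the corresponding methods described above. For the beneficial effects they achieve, refer to those of the corresponding methods; details are not repeated here.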
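From the description of the foregoing implementations, a person skilled in the art can clearly understand that the division into the functional modules above is only an example chosen for convenience and brevity. In actual applications, the functions may be allocated to different functional modules as required. That is, the internal structure of the apparatus may be divided into different functional modules to complete all or some of the functions described above.
In the several embodiments provided in this application, the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into modules or units is merely a logical function division, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another apparatus, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, apparatuses, or units. They may be electrical, mechanical, or of other forms.
Units described as separate components may or may not be physically separate. Components shown as units may be one or more physical units, located in one place or distributed across several places. Some or all of the units may be selected as needed to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit. Alternatively, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. On this understanding, the technical solutions of the embodiments of this application may be embodied as a software product. This covers the solutions in essence, the part that contributes to the prior art, or all or part of the solutions. The software product is stored in a storage medium and includes instructions that cause a device (such as a single-chip microcomputer or a chip) or a processor to perform all or some steps of the methods in the embodiments of this application. Such a storage medium is any medium that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, and the protection scope of this application is not limited to them. Any variation or replacement within the technical scope disclosed in this application shall fall within its protection scope. The protection scope of this application shall therefore be subject to the protection scope of the claims.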

Claims (24)

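  1. A voice data processing method in an online translation process, applied to a terminal performing voice communication, wherein the method comprises:
    obtaining, by the terminal, a first voice data packet, wherein the first voice data packet comprises voice data within a preset time period; and
    when the terminal determines that the first voice data packet contains no valid voice data, stopping, by the terminal, uploading voice data to a translation server.
  2. The method according to claim 1, wherein determining, by the terminal, whether the first voice data packet contains valid voice data comprises:
    determining, by the terminal, whether a first attribute value of the first voice data packet satisfies a preset condition, wherein the preset condition comprises one or more of the following: a discreteness condition, a continuity condition, and a compactness condition; and
    if the first attribute value of the first voice data packet does not satisfy the preset condition, determining, by the terminal, that the first voice data packet contains no valid voice data; or
    if the first attribute value of the first voice data packet satisfies the preset condition, determining, by the terminal, that the first voice data packet contains valid voice data.
  3. The method according to claim 1 or 2, wherein the method further comprises:
    determining, by the terminal, that the first voice data packet contains valid voice data; and
    continuing, by the terminal, to upload voice data to the translation server.
  4. The method according to any one of claims 1 to 3, wherein the method further comprises:
    sending, by the terminal, the first voice data packet to the translation server;
    receiving, by the terminal, a translation result of the first voice data packet from the translation server; and
    if the terminal determines that the first voice data packet satisfies a first condition, stopping, by the terminal, uploading voice data to the translation server, wherein the first condition comprises one or more of the following:
    the first voice data packet contains no valid voice data;
    the translation result of the first voice data packet is empty.
  5. The method according to any one of claims 1 to 4, wherein the stopping, by the terminal, uploading voice data to a translation server comprises:
    displaying, by the terminal, a first prompt box; and
    receiving, by the terminal, a first operation performed by a user on the first prompt box, and disconnecting from the translation server in response to the first operation.
  6. The method according to any one of claims 1 to 4, wherein the stopping, by the terminal, uploading voice data to a translation server comprises:
    maintaining, by the terminal, a connection to the translation server, and stopping, by the terminal, sending voice data to the translation server.
  7. The method according to any one of claims 2 to 6, wherein the preset condition comprises the discreteness condition, the continuity condition, and the compactness condition; and
    if the first attribute value satisfies none of the discreteness condition, the continuity condition, and the compactness condition, the terminal determines that the first voice data packet contains no valid voice data.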
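  8. The method according to any one of claims 2 to 7, wherein
    the first attribute value of the first voice data packet comprises a signal-to-noise ratio of each data frame of the first voice data packet.
  9. The method according to any one of claims 2 to 8, wherein
    the discreteness condition comprises: a variance of the first attribute values of the first voice data packet is greater than a variance threshold.
  10. The method according to any one of claims 2 to 8, wherein
    the continuity condition comprises: a continuity count of the first attribute values of the first voice data packet is greater than a continuity threshold, wherein the continuity threshold satisfies T_c = θ_c × F_s, T_c is the continuity threshold, θ_c is a continuity coefficient, and F_s is a sampling frequency of the first voice data packet;
    wherein the first voice data packet has N first attribute values, N is an integer greater than or equal to 1, and determining the continuity count of the first attribute values comprises:
    determining, by the terminal, whether an i-th first attribute value is greater than a first attribute value threshold, wherein 1 ≤ i ≤ N-1;
    if the i-th first attribute value is greater than the first attribute value threshold, incrementing, by the terminal, the continuity count of the first attribute values by 1;
    if the i-th first attribute value is less than or equal to the first attribute value threshold, incrementing a discreteness count of the first attribute values by 1;
    when the discreteness count is less than or equal to a discreteness threshold, determining whether an (i+1)-th first attribute value is greater than the first attribute value threshold, wherein the discreteness threshold satisfies T_d = θ_d × F_s, T_d is the discreteness threshold, θ_d is a discreteness coefficient, and F_s is the sampling frequency of the first voice data packet; and
    if the discreteness count is greater than the discreteness threshold, resetting the continuity count and the discreteness count to zero, and re-determining, by the terminal, the continuity count starting from the (i+1)-th first attribute value.
  11. The method according to any one of claims 2 to 8, wherein
    the compactness condition comprises: a compactness count of the first attribute values of the first voice data packet is greater than a compactness threshold, wherein the compactness threshold satisfies T_i = θ_i × N, T_i is the compactness threshold, θ_i is a compactness coefficient, N is the number of first attribute values in the first voice data packet, and the compactness count is the number of first attribute values greater than the first attribute value threshold.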
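  12. A terminal, wherein the terminal comprises one or more processors, a memory, a communication interface, and a microphone; the memory and the communication interface are coupled to the processor; the microphone is configured to capture voice data; the memory is configured to store computer program code comprising computer instructions; and when the processor executes the computer instructions,
    the processor is configured to control the microphone to obtain a first voice data packet, wherein the first voice data packet comprises voice data within a preset time period; and
    the processor is further configured to stop uploading voice data to a translation server upon determining that the first voice data packet contains no valid voice data.
  13. The terminal according to claim 12, wherein
    the processor is specifically configured to determine whether a first attribute value of the first voice data packet obtained by the microphone satisfies a preset condition, wherein the preset condition comprises one or more of the following: a discreteness condition, a continuity condition, and a compactness condition; if the first attribute value does not satisfy the preset condition, to determine that the first voice data packet contains no valid voice data; or, if the first attribute value satisfies the preset condition, to determine that the first voice data packet contains valid voice data.
  14. The terminal according to claim 12 or 13, wherein
    the processor is further configured to determine that the first voice data packet contains valid voice data, and to continue uploading voice data to the translation server through the communication interface.
  15. The terminal according to any one of claims 12 to 14, wherein
    the processor is further configured to send the first voice data packet to the translation server through the communication interface, and to receive a translation result of the first voice data packet from the translation server; and
    the processor is further configured to stop uploading voice data to the translation server upon determining that the first voice data packet satisfies a first condition, wherein the first condition comprises one or more of the following:
    the first voice data packet contains no valid voice data;
    the translation result of the first voice data packet is empty.
  16. The terminal according to any one of claims 12 to 15, wherein the terminal further comprises a touchscreen;
    the processor is further configured to control the touchscreen to display a first prompt box; and
    the processor is further configured to receive a first operation performed by a user on the first prompt box displayed on the touchscreen, and to disconnect from the translation server in response to the first operation.
  17. The terminal according to any one of claims 12 to 15, wherein
    the processor is further configured to maintain a connection to the translation server and to stop sending voice data to the translation server.
  18. The terminal according to any one of claims 13 to 17, wherein the preset condition comprises the discreteness condition, the continuity condition, and the compactness condition; and
    the processor is specifically configured to determine that the first voice data packet contains no valid voice data when the first attribute value satisfies none of the discreteness condition, the continuity condition, and the compactness condition.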
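  19. The terminal according to any one of claims 13 to 18, wherein
    the first attribute value of the first voice data packet comprises a signal-to-noise ratio of each data frame of the first voice data packet.
  20. The terminal according to any one of claims 13 to 19, wherein
    the discreteness condition comprises: a variance of the first attribute values of the first voice data packet is greater than a variance threshold.
  21. The terminal according to any one of claims 13 to 19, wherein
    the continuity condition comprises: a continuity count of the first attribute values of the first voice data packet is greater than a continuity threshold, wherein the continuity threshold satisfies T_c = θ_c × F_s, T_c is the continuity threshold, θ_c is a continuity coefficient, and F_s is a sampling frequency of the first voice data packet;
    wherein the first voice data packet has N first attribute values, and N is an integer greater than or equal to 1;
    the processor is further configured to determine whether an i-th first attribute value is greater than a first attribute value threshold, wherein 1 ≤ i ≤ N-1; if the i-th first attribute value is greater than the first attribute value threshold, to increment the continuity count of the first attribute values by 1; and if the i-th first attribute value is less than or equal to the first attribute value threshold, to increment a discreteness count of the first attribute values by 1; and
    the processor is further configured to: when the discreteness count is less than or equal to a discreteness threshold, determine whether an (i+1)-th first attribute value is greater than the first attribute value threshold, wherein the discreteness threshold satisfies T_d = θ_d × F_s, T_d is the discreteness threshold, θ_d is a discreteness coefficient, and F_s is the sampling frequency of the first voice data packet; and when the discreteness count is greater than the discreteness threshold, reset the continuity count and the discreteness count to zero and re-determine the continuity count starting from the (i+1)-th first attribute value.
  22. The terminal according to any one of claims 13 to 19, wherein
    the compactness condition comprises: a compactness count of the first attribute values of the first voice data packet is greater than a compactness threshold, wherein the compactness threshold satisfies T_i = θ_i × N, T_i is the compactness threshold, θ_i is a compactness coefficient, N is the number of first attribute values in the first voice data packet, and the compactness count is the number of first attribute values greater than the first attribute value threshold.
  23. A computer storage medium, wherein the computer storage medium comprises computer instructions which, when run on a terminal, cause the terminal to perform the voice data processing method in an online translation process according to any one of claims 1 to 11.
  24. A computer program product which, when run on a computer, causes the computer to perform the voice data processing method in an online translation process according to any one of claims 1 to 11.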
PCT/CN2019/110556 2018-10-15 2019-10-11 Voice data processing method and apparatus in online translation process WO2020078267A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811199111.4A CN109285563B (zh) 2018-10-15 2018-10-15 Voice data processing method and apparatus in online translation process
CN201811199111.4 2018-10-15

Publications (1)

Publication Number Publication Date
WO2020078267A1 (zh) 2020-04-23

Family

ID=65176569

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/110556 WO2020078267A1 (zh) 2018-10-15 2019-10-11 Voice data processing method and apparatus in online translation process

Country Status (2)

Country Link
CN (2) CN114999535A (zh)
WO (1) WO2020078267A1 (zh)

Also Published As

Publication number Publication date
CN109285563A (zh) 2019-01-29
CN114999535A (zh) 2022-09-02
CN109285563B (zh) 2022-05-06

Legal Events

Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19872775; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 19872775; Country of ref document: EP; Kind code of ref document: A1)