CN113053411B

CN113053411B - Voice data processing device, method, system and storage medium

Info

Publication number: CN113053411B
Application number: CN202010234648.0A
Authority: CN
Inventors: Yang Ligang (杨立钢); Gong Zhihui (龚智辉)
Original assignee: Shenzhen Ucloudlink New Technology Co Ltd
Current assignee: Shenzhen Ucloudlink New Technology Co Ltd
Priority date: 2020-03-30
Filing date: 2020-03-30
Publication date: 2024-01-16
Anticipated expiration: 2040-03-30
Also published as: CN113053411A

Abstract

The application relates to the technical field of computer communication and provides a voice data processing device comprising a first interface unit and a data processing unit that are communicatively coupled. The first interface unit is used for transmitting first voice data and second voice data between the data processing unit and a terminal device. The data processing unit performs at least one of the following operations: converting first voice data received through the first interface unit into second voice data and sending the second voice data back to the terminal device through the first interface unit; and converting second voice data received through the first interface unit into first voice data and sending the first voice data back to the terminal device through the first interface unit. The first voice data is input and output data of a call unit of the terminal device; the second voice data is input and output data of a voice processing program of the terminal device.

Description

Voice data processing device, method, system and storage medium

Technical Field

The application belongs to the technical field of computer communication, and particularly relates to a device, a method, a system and a storage medium for processing voice data.

Background

Some smartphone operating systems do not provide an application programming interface (Application Programming Interface, API) through which application programs (APPs) can process the voice signals of voice calls. An APP therefore cannot exercise deep control over a voice call and cannot fully process the call content.

Disclosure of Invention

Embodiments of the present application provide a device, a method, a system, and a storage medium for processing voice data, which can solve at least some of the above problems.

In a first aspect, an embodiment of the present application provides a voice data processing apparatus, including:

a first interface unit and a data processing unit,

the first interface unit is communicatively coupled with the data processing unit;

the first interface unit is used for transmitting first voice data and second voice data between the data processing unit and a terminal device;

the data processing unit is used for executing at least one of the following operations:

converting first voice data received through the first interface unit into second voice data, and sending the second voice data back to the terminal device through the first interface unit; and/or

converting second voice data received through the first interface unit into first voice data, and sending the first voice data back to the terminal device through the first interface unit;

the first voice data is input and output data of a call unit of the terminal device;

the second voice data is input and output data of a voice processing program of the terminal device.

It can be understood that the voice data processing device converts the first voice data sent by the call unit of the terminal device into second voice data and sends it back to the terminal device, where the voice processing program receives and processes it; or the voice data processing device converts the second voice data sent by the voice processing program of the terminal device into first voice data and sends it back to the terminal device, where the call unit receives and processes it. The voice processing program can thus take over calls on the terminal device, enabling comprehensive, in-depth processing of the voice data involved in a call.

In a second aspect, an embodiment of the present application provides a method for processing voice data, applied to a voice data processing device;

the voice data processing device comprises a first interface unit and a data processing unit, and the method comprises at least one of the following steps:

receiving, through the first interface unit, first voice data sent by a terminal device, converting the first voice data into second voice data by the data processing unit, and sending the second voice data back to the terminal device through the first interface unit; and/or

receiving, through the first interface unit, second voice data sent by the terminal device, converting the second voice data into first voice data by the data processing unit, and sending the first voice data back to the terminal device through the first interface unit.

In a third aspect, an embodiment of the present application provides a voice data processing apparatus, which is applied to a voice data processing device;

the voice data processing device comprises a first interface unit and a data processing unit, and the apparatus comprises at least one of the following modules:

a first data processing module, configured to receive, through the first interface unit, first voice data sent by the terminal device, convert the first voice data into second voice data by the data processing unit, and send the second voice data back to the terminal device through the first interface unit; and/or

a second data processing module, configured to receive, through the first interface unit, second voice data sent by the terminal device, convert the second voice data into first voice data by the data processing unit, and send the first voice data back to the terminal device through the first interface unit.

In a fourth aspect, an embodiment of the present application provides a terminal device, including:

the terminal device comprises a storage unit, a processing unit, a call unit, a second interface unit, and a voice processing program and a control program that are stored in the storage unit and can run on the processing unit;

the storage unit is communicatively coupled with the processing unit; the call unit is communicatively coupled with the processing unit; the call unit is communicatively coupled with the second interface unit;

the call unit is used for processing the first voice data; the voice processing program is used for processing the second voice data;

the processing unit, when executing the control program, performs at least one of the following operations:

sending the first voice data output by the call unit to the voice data processing device through the second interface unit, the voice data processing device being used for converting the first voice data into second voice data and sending the second voice data back to the terminal device; and receiving the second voice data through the second interface unit and using it as input data of the voice processing program; and/or

sending the second voice data output by the voice processing program to the voice data processing device through the second interface unit, the voice data processing device being used for converting the second voice data into first voice data and sending the first voice data back to the terminal device; and receiving the first voice data through the second interface unit and using it as input data of the call unit.

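In a fifth aspect, embodiments of the present application provide a method for processing voice data;

the method is applied to the terminal device, and the terminal device comprises a call unit, a second interface unit and a voice processing program;

the method comprises at least one of the following operations:

sending the first voice data output by the call unit to the voice data processing device through the second interface unit, the voice data processing device being used for converting the first voice data into second voice data and sending the second voice data back to the terminal device; and receiving the second voice data through the second interface unit and using it as input data of the voice processing program; and/or

sending the second voice data output by the voice processing program to the voice data processing device through the second interface unit, the voice data processing device being used for converting the second voice data into first voice data and sending the first voice data back to the terminal device; and receiving the first voice data through the second interface unit and using it as input data of the call unit.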

In a sixth aspect, embodiments of the present application provide an apparatus for processing voice data;

the apparatus comprises at least one of the following modules:

a third data processing module, configured to send the first voice data output by the call unit to the voice data processing device through the second interface unit, the voice data processing device being used for converting the first voice data into second voice data and sending the second voice data back to the terminal device; and to receive the second voice data through the second interface unit and use it as input data of the voice processing program; and/or

a fourth data processing module, configured to send the second voice data output by the voice processing program to the voice data processing device through the second interface unit, the voice data processing device being used for converting the second voice data into first voice data and sending the first voice data back to the terminal device; and to receive the first voice data through the second interface unit and use it as input data of the call unit.

In a seventh aspect, embodiments of the present application provide a system for processing voice data, including:

the voice data processing device described in the first aspect and the terminal device described in the fourth aspect.

In an eighth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method steps of the second aspect described above.

In a ninth aspect, embodiments of the present application provide a computer program product that, when run on an electronic device, causes the electronic device to carry out the method steps of the second aspect described above.

It will be appreciated that, for the advantages of the second to ninth aspects, reference may be made to the relevant description of the first aspect; they are not repeated here.

Drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; a person of ordinary skill in the art may derive other drawings from them without inventive effort.

FIG. 1 is a schematic diagram of a voice data processing system according to an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a terminal device according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a voice data processing system according to another embodiment of the present application;

FIG. 4 is a schematic diagram of a voice data processing system according to another embodiment of the present application;

FIG. 5 is a schematic diagram of a voice data processing system according to another embodiment of the present application;

FIG. 6 is a schematic diagram of a voice data processing system according to another embodiment of the present application;

FIG. 7 is a flowchart of a voice data processing method according to an embodiment of the present application;

FIG. 8 is a flowchart of a voice data processing method according to an embodiment of the present application;

FIG. 9 is a schematic diagram of a call assistant according to another embodiment of the present application;

FIG. 10 is a schematic diagram of a GoIP terminal according to an embodiment of the present application;

FIG. 11 is a schematic diagram of an automatic translator according to an embodiment of the present application;

FIG. 12 is a flowchart of a voice data processing method according to another embodiment of the present application;

FIG. 13 is a schematic diagram of a call background sound control device according to an embodiment of the present application;

FIG. 14 is a flowchart of a voice data processing method according to another embodiment of the present application.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.

As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".

In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.

Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.

Some smartphone operating systems do not provide an application programming interface (Application Programming Interface, API) through which applications (APPs) can process the voice signals of a voice call in both directions. For example, on some smartphones only the built-in phone APP supplied by the manufacturer for making calls can process the voice signals of a two-way voice call, and no API is provided for other APPs to do so. A voice processing APP therefore cannot exercise deep control over a voice call: it cannot directly receive and process the voice signals of the call, nor can it send its own voice signals directly to the remote device through the voice call service of the mobile communication network. In some application scenarios, however, the user needs to process the voice signals of a call as a whole to obtain a specific function, for example recognition processing, natural language processing, text-to-speech processing, audio encoding and decoding, voice over IP (Voice over Internet Protocol, VoIP) transmission, or recording and playback. The built-in phone APP provided by the handset manufacturer generally cannot provide these functions. To address these shortcomings, embodiments of the present application provide a voice data processing system, a voice data processing device, and a voice data processing method.

Fig. 1 shows a system 01 for processing voice data according to an embodiment of the present application, where the system includes: a voice data processing device 10 and a terminal device 20.

As shown in FIG. 1, the voice data processing device 10 includes a first interface unit 110 and a data processing unit 120;

the first interface unit 110 is communicatively coupled to the data processing unit 120;

the first interface unit may be a universal serial bus (Universal Serial Bus, USB) interface, including but not limited to a USB Type-C, USB Mini-B, USB 3.0 Micro-B, or Micro-USB B interface, or a Lightning interface; it may also be a combination of a USB Type-C interface and an analog audio (tip-ring-sleeve, TRS) interface, a combination of a Lightning interface and a TRS interface, or, more generally, a combination of a USB interface and a TRS interface.

The data processing unit comprises a processor subunit including a processor and a memory;

the processor may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.

In some embodiments, the memory may be an internal storage unit of the voice data processing device, such as a hard disk or internal memory of the voice data processing device. In other embodiments, the memory may be an external storage device of the voice data processing device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the voice data processing device. Further, the memory may include both an internal storage unit and an external storage device of the voice data processing device 10. The memory is used to store an operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory may also be used to temporarily store data that has been output or is to be output.

The first interface unit 110 is configured to transmit first voice data and second voice data between the data processing unit 120 and the terminal device 20;

the first voice data is input/output data of the call unit 230 of the terminal device 20;

the second voice data is input/output data of the voice processing program 221 of the terminal device 20.

In some embodiments, the first voice data is an analog signal; the second voice data is a digital signal carried by a universal serial bus protocol.

In some embodiments, the first voice data is a digital audio signal, e.g., an I2S (IIS) signal or a PCM signal; the second voice data is a digital signal carried by a universal serial bus protocol.

In some embodiments, the first voice data is a digital signal carried by a universal serial bus protocol; the second voice data is a digital signal carried by a universal serial bus protocol; the first voice data and the second voice data are distinguished through the identification information of the data packet carried by the universal serial bus.
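The embodiment above leaves the identification information unspecified. As a minimal sketch, assuming a hypothetical one-byte stream tag at the start of each USB data payload (standing in for whatever identifier a real design would use, for example an endpoint or interface number), the data processing unit could tell the two kinds of voice data apart as follows. The tag values and names are illustrative only.

```python
from dataclasses import dataclass
from enum import IntEnum


class StreamId(IntEnum):
    """Hypothetical stream tags; the embodiment does not define the values."""
    FIRST_VOICE = 0x01   # data exchanged with the call unit
    SECOND_VOICE = 0x02  # data exchanged with the voice processing program


@dataclass(frozen=True)
class VoicePacket:
    stream: StreamId
    payload: bytes


def classify(raw: bytes) -> VoicePacket:
    """Split a raw USB payload into its (assumed) one-byte tag and voice data."""
    if not raw:
        raise ValueError("empty packet")
    try:
        stream = StreamId(raw[0])
    except ValueError as exc:
        raise ValueError(f"unknown stream tag 0x{raw[0]:02x}") from exc
    return VoicePacket(stream, raw[1:])
```

Rejecting unknown tags, rather than guessing, keeps a corrupted packet from being routed to the wrong side of the call.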

The first voice data is data sent or received by the call unit of the terminal device and is used by the call unit for voice communication over the mobile communication network. In some embodiments it is analog audio data carried by an analog signal; in some embodiments it is digital audio data carried by a digital audio protocol data stream; in some embodiments it consists of universal serial bus data packets whose data payload is voice data.

The second voice data is data sent or received by the voice processing program of the terminal device, which processes it to complete the functions preset by the program, such as recognition processing, natural language processing, text-to-speech processing, audio encoding and decoding, VoIP transmission, or recording and playback. In some embodiments, the second voice data consists of universal serial bus data packets whose data payload is voice data; in some embodiments, the voice data is extracted from the second voice data by conversion according to the second protocol and is then processed by the voice processing program.

The data processing unit 120 is configured to perform at least one of the following operations:

converting the first voice data received through the first interface unit 110 into second voice data, and transmitting the second voice data back to the terminal device 20 through the first interface unit 110; and/or

converting the second voice data received through the first interface unit 110 into first voice data, and transmitting the first voice data back to the terminal device 20 through the first interface unit 110;
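The two operations of the data processing unit 120 share one pattern: data of one kind comes in through the first interface unit and data of the other kind goes back out through it. A minimal sketch, assuming the received data is already labeled as first or second voice data and that the actual conversion routines are supplied by the caller (the function and label names are hypothetical):

```python
from typing import Callable, Tuple

FIRST = "first"    # hypothetical label: call-unit format
SECOND = "second"  # hypothetical label: voice-processing-program format

Converter = Callable[[bytes], bytes]


def process(kind: str, data: bytes,
            first_to_second: Converter,
            second_to_first: Converter) -> Tuple[str, bytes]:
    """Convert voice data received on the first interface unit.

    Returns the kind and content of the data to send back to the
    terminal device through the same interface unit.
    """
    if kind == FIRST:
        return SECOND, first_to_second(data)
    if kind == SECOND:
        return FIRST, second_to_first(data)
    raise ValueError(f"unknown voice data kind: {kind!r}")
```

Passing the converters in as parameters keeps the routing logic independent of whether a given embodiment converts analog signals, digital audio streams, or USB packets.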

The first interface subunit includes an analog signal path of a universal serial bus interface, or a TRS interface.

It can be understood that, when the voice processing program of the terminal device has permission to send and receive the second voice data through the second interface unit of the terminal device, the voice data processing device converts the first voice data sent by the call unit of the terminal device into second voice data and sends it back to the terminal device, where the voice processing program receives and processes it; or the voice data processing device converts the second voice data sent by the voice processing program of the terminal device into first voice data and sends it back to the terminal device, where the call unit receives and processes it. The voice processing program can thus take over calls on the terminal device, enabling comprehensive, in-depth processing of the voice data involved in a call.

FIG. 2 shows a terminal device provided in an embodiment of the present application. As shown in FIG. 2, the terminal device 20 includes a storage unit 220, a processing unit 200, a call unit 230, a second interface unit 210, and a voice processing program 221 and a control program 222 that are stored in the storage unit 220 and can run on the processing unit 200. The call unit is communicatively coupled with the processing unit and is used for processing the first voice data. To make the working principle of the embodiments easier to understand, only some of the components of the terminal device 20 are shown in FIG. 1.

The storage unit 220 is communicatively coupled to the processing unit 200; the call unit 230 is communicatively coupled to the processing unit 200; the call unit 230 is communicatively coupled to the second interface unit 210;

the call unit 230 is configured to process the first voice data; the voice processing program 221 is configured to process the second voice data;

the processing unit 200, when executing the control program 222, performs at least one of the following operations:

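sending the first voice data output by the call unit 230 to the voice data processing device through the second interface unit 210, the voice data processing device being used for converting the first voice data into second voice data and sending the second voice data back to the terminal device; and receiving the second voice data through the second interface unit 210 and using it as input data of the voice processing program 221; and/or

sending the second voice data output by the voice processing program 221 to the voice data processing device through the second interface unit 210, the voice data processing device being used for converting the second voice data into first voice data and sending the first voice data back to the terminal device; and receiving the first voice data through the second interface unit 210 and using it as input data of the call unit 230.

The voice processing program may be an APP that implements the above functions on its own, or a combination of an APP and a functional module of the operating system of the terminal device.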
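The role of the control program 222 can be pictured as a router with one outgoing queue toward the voice data processing device and one incoming queue for each consumer. The sketch below is hypothetical: the class, method, and queue names are illustrative, the "first"/"second" labels stand in for however a real implementation tells the two kinds of voice data apart, and actual I/O through the second interface unit 210 is omitted.

```python
from collections import deque


class ControlProgram:
    """Hypothetical terminal-side router; all names are illustrative only."""

    def __init__(self) -> None:
        self.to_device: deque = deque()         # out via the second interface unit
        self.to_call_unit: deque = deque()      # input for the call unit
        self.to_voice_program: deque = deque()  # input for the voice processing program

    def from_call_unit(self, first_voice: bytes) -> None:
        # First voice data output by the call unit goes to the external device.
        self.to_device.append(("first", first_voice))

    def from_voice_program(self, second_voice: bytes) -> None:
        # Second voice data output by the voice program goes to the external device.
        self.to_device.append(("second", second_voice))

    def from_device(self, kind: str, data: bytes) -> None:
        # Data returned by the device has already been converted, so
        # second voice data feeds the voice program and first voice
        # data feeds the call unit.
        if kind == "second":
            self.to_voice_program.append(data)
        elif kind == "first":
            self.to_call_unit.append(data)
        else:
            raise ValueError(f"unknown kind: {kind!r}")
```

Note that the terminal never converts data itself in this model; it only forwards, which matches the division of labor in which all conversion happens in the voice data processing device.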

The call unit 230 is used for communicating with the mobile communication network: it converts voice data from the mobile communication network into first voice data and outputs it, or receives first voice data and transmits it to a remote device through the mobile communication network. The call unit 230 includes, but is not limited to, a functional unit composed of a modem module and an audio codec module; a person skilled in the art may implement the call unit 230 according to actual circumstances.

The terminal device includes, but is not limited to, a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, or an augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) device, provided that it can perform voice communication services through a mobile communication network and has an operating system on which voice processing software can run, the voice processing software having permission to send and receive the second voice data.

On the basis of the embodiment shown in FIG. 1, in the voice data processing device provided in some embodiments of the present application, as shown in FIG. 3, the first interface unit 110 of the voice data processing device 10 includes a first interface subunit 111 and a second interface subunit 112;

the first interface subunit 111 is configured to transmit first voice data between the data processing unit and the terminal device;

The second interface subunit 112 is configured to transmit second voice data between the data processing unit and the terminal device.

In some embodiments, the first voice data is an analog signal; the second voice data is a digital signal.

The first interface subunit 111 is connected to the fifth interface subunit 211 of the terminal device through an analog signal cable; the second interface subunit 112 is connected to a sixth interface subunit 212 of the terminal device via a digital signal cable.

In some embodiments, the first interface subunit and the fifth interface subunit are analog signal channels of a universal serial bus interface, including but not limited to the analog audio channel of a USB Type-C interface; the second interface subunit and the sixth interface subunit are digital signal channels of a universal serial bus interface, including but not limited to the digital signal channel of a USB Type-C interface.

In other embodiments, the first interface subunit and the fifth interface subunit are TRS interfaces, including but not limited to 3.5 mm TRS interfaces; the second interface subunit and the sixth interface subunit are universal serial bus interfaces, including but not limited to digital signal interfaces such as USB Type-C, USB Mini-B, USB 3.0 Micro-B, Micro-USB B, and Lightning interfaces.

On the basis of the embodiment shown in FIG. 3, as shown in FIG. 4, in some embodiments the data processing unit 120 of the voice data processing device 10 includes an analog data processing subunit 121 and a processor subunit 122;

the analog data processing subunit 121 is communicatively coupled to the processor subunit 122;

the analog data processing subunit is communicatively coupled with the first interface subunit;

the analog data processing subunit is configured to convert the first voice data received by the first interface subunit into a digital signal and send the digital signal to the processor subunit, or convert the digital signal output by the processor subunit into the first voice data and send the first voice data to the first interface subunit.

The analog data processing subunit comprises an analog-to-digital conversion module, a digital-to-analog conversion module and an audio codec module. The analog-to-digital conversion module is communicatively coupled with the first interface subunit and with the audio codec module; the digital-to-analog conversion module is likewise communicatively coupled with the first interface subunit and with the audio codec module; the audio codec module is communicatively coupled with the processor subunit. The analog-to-digital conversion module converts the first voice data into a digital signal; the digital-to-analog conversion module converts the digital signal output by the audio codec module into first voice data; the audio codec module converts between the encoded digital signal and PCM signals that the processor subunit can process.
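To illustrate the conversion between analog first voice data and the PCM signals handled by the processor subunit, the sketch below models analog samples as floats in [-1.0, 1.0] and assumes 16-bit little-endian PCM. Both are assumptions, since the embodiment does not fix a sample format, and a real analog data processing subunit performs this step in hardware.

```python
import struct
from typing import List


def adc_to_pcm16(samples: List[float]) -> bytes:
    """Quantize analog-like samples in [-1.0, 1.0] to little-endian 16-bit PCM."""
    values = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip out-of-range input, as a saturating ADC would
        values.append(int(round(s * 32767)))
    return struct.pack(f"<{len(values)}h", *values)


def pcm16_to_dac(pcm: bytes) -> List[float]:
    """Convert little-endian 16-bit PCM back to floats for a DAC."""
    count = len(pcm) // 2  # ignore a trailing odd byte, if any
    values = struct.unpack(f"<{count}h", pcm[:count * 2])
    return [v / 32767 for v in values]
```

The round trip is lossy only by the quantization step (about 3e-5 of full scale here), which is why voice data can pass through the device and back without audible degradation from this stage alone.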

The processor subunit includes a processor and a memory.

As shown in FIG. 5, in some embodiments the first interface unit of the voice data processing device 10 includes a third interface subunit;

the third interface subunit is connected to a seventh interface subunit of the second interface unit of the terminal device through a digital signal cable. The third interface subunit and the seventh interface subunit include universal serial bus interfaces, including but not limited to digital signal interfaces such as USB Type-C, USB Mini-B, USB 3.0 Micro-B, Micro-USB B, and Lightning interfaces.

The third interface subunit is configured to transmit first voice data between the data processing unit and the terminal device;

the third interface subunit is further configured to transmit second voice data between the data processing unit and the terminal device;

the first voice data and the second voice data are data carried by a universal serial bus protocol.

The voice data processing device performs at least one of the following operations through the third interface subunit:

receiving first voice data sent by the terminal device through the third interface subunit; extracting, by the data processing unit, first target data from the first voice data according to a first protocol, packaging the first target data into second voice data according to a second protocol, and sending the second voice data back to the terminal device through the third interface subunit; and/or

receiving second voice data sent by the terminal device through the third interface subunit; extracting, by the data processing unit, second target data from the second voice data according to the second protocol, packaging the second target data into first voice data according to the first protocol, and sending the first voice data back to the terminal device through the third interface subunit.
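The embodiment does not specify the first and second protocols. As a sketch only, assuming two hypothetical framings that each consist of a 2-byte marker, a big-endian 16-bit payload length, and the voice payload, the extract-and-repackage step over the single third interface subunit might look like this. The markers and frame layout are invented for illustration.

```python
import struct

# Hypothetical framings: 2-byte marker + big-endian 16-bit length + payload.
FIRST_MARKER = b"F1"
SECOND_MARKER = b"S2"


def pack(marker: bytes, payload: bytes) -> bytes:
    """Package voice data into a frame of the protocol identified by marker."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a 16-bit length field")
    return marker + struct.pack(">H", len(payload)) + payload


def unpack(marker: bytes, frame: bytes) -> bytes:
    """Extract the target (voice) data from a frame of the given protocol."""
    if len(frame) < 4 or frame[:2] != marker:
        raise ValueError("frame does not match the expected protocol")
    (length,) = struct.unpack(">H", frame[2:4])
    payload = frame[4:4 + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return payload


def first_to_second(frame: bytes) -> bytes:
    """First-protocol frame -> second-protocol frame."""
    return pack(SECOND_MARKER, unpack(FIRST_MARKER, frame))


def second_to_first(frame: bytes) -> bytes:
    """Second-protocol frame -> first-protocol frame."""
    return pack(FIRST_MARKER, unpack(SECOND_MARKER, frame))
```

Because the voice payload is copied unchanged, the two conversions are exact inverses of each other in this model.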

On the basis of the embodiment shown in FIG. 1, as shown in FIG. 6, in some embodiments the first interface unit of the voice data processing device 10 further comprises a fourth interface subunit;

the fourth interface subunit is used for transmitting data, specifically analog audio data, between the data processing unit and an audio device.

The audio device comprises at least one of the following: a voice input device and a monitoring device. The voice input device includes, but is not limited to, a microphone or similar device for picking up the user's speech. The monitoring device includes, but is not limited to, headphones, speakers, or similar devices for playing voice.

In an embodiment in which the voice data processing device mixes the currently processed first audio data in software, the fourth interface subunit is a TRS interface, and the analog processing subunit of the data processing unit is communicatively coupled to the TRS interface of the fourth interface subunit. In some embodiments, the first interface subunit includes a TRS interface, a digital-to-analog conversion module, an analog-to-digital conversion module and an audio codec module. The audio codec module receives a digital audio signal, such as a PCM signal, output by the data processing unit; the digital-to-analog conversion module and the analog-to-digital conversion module are each communicatively coupled to the audio codec module to perform digital-to-analog and analog-to-digital conversion respectively, and each is electrically connected to the TRS interface.

In an embodiment adopting a hardware mixing module, the fourth interface subunit includes a fourth interface module and a mixing module. The fourth interface module is communicatively coupled with the microphone and the audio player and transmits data between the mixing module and the audio device; the mixing module is communicatively coupled with the data processing unit and mixes the first audio data processed by the data processing unit. Mixing combines the voice signal sent by the APP with the voice data sent by the call module; in some embodiments, the signals are combined in time order to form a data stream of third voice data that the user can monitor.
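The patent does not state how the two signals are combined. A common approach, assumed here, is to add time-aligned 16-bit PCM samples one by one and clamp the sum to the int16 range so that loud passages saturate instead of wrapping around. The function name `mix_pcm16` is hypothetical.

```python
from itertools import zip_longest
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767


def mix_pcm16(app_voice: List[int], call_voice: List[int]) -> List[int]:
    """Mix two time-aligned 16-bit PCM streams sample by sample.

    The shorter stream is padded with silence (0), and each sum is
    clamped to the int16 range to avoid wrap-around distortion.
    """
    mixed = []
    for a, b in zip_longest(app_voice, call_voice, fillvalue=0):
        mixed.append(max(INT16_MIN, min(INT16_MAX, a + b)))
    return mixed


# Example: the second sample saturates at the int16 maximum.
print(mix_pcm16([1000, 30000, -30000], [500, 10000]))
```

Hard clipping is the simplest choice; a hardware mixer might instead scale each input (for example by 0.5) to trade loudness for headroom.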

Fig. 7 illustrates a method for voice processing according to an embodiment of the present application, which is applied to the voice data processing device 10 in the voice processing system 01 shown in fig. 1 and may be implemented by software and/or hardware of the voice data processing device. As shown in fig. 7, the method includes the step of performing at least one of steps S110 and S120. The specific implementation principle of each step is as follows:

S110, receiving first voice data sent by the terminal equipment through the first interface unit, converting the first voice data into second voice data by the data processing unit, and sending the second voice data back to the terminal equipment through the first interface unit.

S120, receiving second voice data sent by the terminal equipment through the first interface unit, converting the second voice data into first voice data by the data processing unit, and sending the first voice data back to the terminal equipment through the first interface unit.

In some embodiments, the conversion of first voice data into second voice data, and of second voice data into first voice data, may be performed in a store-and-forward manner: for example, the first voice data is stored in a second voice data buffer and the contents of that buffer are forwarded as second voice data; or the second voice data is stored in a first voice data buffer and the contents of that buffer are forwarded as first voice data.
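A minimal sketch of the store-and-forward variant, assuming chunks of raw bytes and bounded FIFO buffers. The class name and the drop-oldest policy on overflow are assumptions for illustration, not details from the source.

```python
from collections import deque
from typing import Deque, Optional


class StoreAndForward:
    """Hypothetical relay with one buffer per output direction.

    First voice data is stored in the second-voice-data buffer and later
    forwarded unchanged as second voice data; the reverse direction uses
    the first-voice-data buffer.
    """

    def __init__(self, max_chunks: int = 64) -> None:
        # Bounded deques drop the oldest chunk when full, favouring low
        # latency over completeness (an assumed policy).
        self.second_buffer: Deque[bytes] = deque(maxlen=max_chunks)
        self.first_buffer: Deque[bytes] = deque(maxlen=max_chunks)

    def store_first(self, chunk: bytes) -> None:
        """Store incoming first voice data for forwarding as second voice data."""
        self.second_buffer.append(chunk)

    def store_second(self, chunk: bytes) -> None:
        """Store incoming second voice data for forwarding as first voice data."""
        self.first_buffer.append(chunk)

    def forward_second(self) -> Optional[bytes]:
        """Return the oldest buffered chunk as second voice data, or None."""
        return self.second_buffer.popleft() if self.second_buffer else None

    def forward_first(self) -> Optional[bytes]:
        """Return the oldest buffered chunk as first voice data, or None."""
        return self.first_buffer.popleft() if self.first_buffer else None
```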

In some embodiments, the conversion of first voice data into second voice data, and of second voice data into first voice data, may be performed by protocol conversion. For example, the first interface unit includes a third interface subunit comprising a universal serial bus interface, including but not limited to USB Type-C, USB Mini-B, USB 3.0 Micro-B, Micro-USB B, Lightning and other digital signal interfaces. The first voice data and the second voice data are data carried by the universal serial bus protocol.

The data processing unit extracts first target data from the first voice data according to a first protocol, encapsulates it into second voice data according to a second protocol, and sends the second voice data back to the terminal device through the third interface subunit; and, upon receiving second voice data sent by the terminal device through the third interface subunit, the data processing unit extracts second target data from the second voice data according to the second protocol, encapsulates it into first voice data according to the first protocol, and sends the first voice data back to the terminal device through the third interface subunit.

In some embodiments, for example where the speech processing program 221 of the terminal device is a telephone recording program that needs to receive the voice data sent to the terminal device by the voice service, only step S110 needs to be performed; referring to fig. 8, the data flow is shown as data flow (1). The first voice data output by the call unit of the terminal device, namely the incoming call voice, is converted into second voice data by the voice data processing device and sent back to the terminal device, so that the recording program can receive the incoming call voice and recognize and record its content.

Fig. 8 is a data flow diagram of a method for voice data processing according to an embodiment of the present application.

In some embodiments, the speech processing program 221 of the terminal device is a telephone notification program that needs to dial the remote device and send it voice pre-stored in the program, or pre-stored text converted into voice, for example a voice notification of a verification code. In this case only step S120 needs to be executed; referring to fig. 8, the data flow is shown as data flow (2). The second voice data sent by the telephone notification program is converted into first voice data by the voice data processing device and sent back to the terminal device; after receiving the first voice data, the call unit of the terminal device sends it to the remote device through the mobile communication network.

In some embodiments, fig. 9 is a schematic diagram of a call assistant according to one embodiment of the present application. The speech processing program 221 employed by the terminal device 20 is a call assistant program that sends a prompt voice when the terminal device answers a call and records the voice sent by the remote device. The terminal device answers the call and establishes a call connection with the remote device, and the call assistant program sends the prompt voice, in the form of second voice data, to the voice data processing device through the second interface unit; the voice data processing device receives the second voice data through the first interface unit, converts it into first voice data, and sends the first voice data to the terminal device through the first interface unit; after receiving the first voice data through the second interface unit, the communication unit of the terminal device transmits the prompt voice in the first voice data to the remote device through the mobile communication network.

The communication unit of the terminal device receives the voice data of the remote device and outputs first voice data, which the terminal device sends to the voice data processing device through the second interface unit; the data processing unit of the voice data processing device receives the first voice data through the first interface unit, converts it into second voice data, and sends the second voice data back to the terminal device through the first interface unit; after receiving the second voice data through the second interface unit, the call assistant program of the terminal device extracts and stores the voice data it contains. This continues until, in response to the on-hook signal of the terminal device, the call assistant program stops receiving the second voice data transmitted by the voice data processing device.

Referring to fig. 8, data flows in the two directions shown as data flow (1) and data flow (2). In data flow (2), after the terminal device receives a call, the call assistant program sends a prompt voice, for example "hello, please leave a message", to the remote device. The call assistant encodes the prompt voice and splits it into USB data packets, i.e. second voice data, which are sent to the voice data processing device through the second interface unit. The voice data processing device receives the second voice data through the first interface unit, extracts the data in the packets according to the data type of the second voice data, converts it into a PCM (pulse code modulation) coded digital audio signal, i.e. first voice data, and sends it back to the terminal device through the first interface unit. After receiving this signal, the terminal device uses it as the input signal of the call unit, which transmits it to the far-end terminal. In some embodiments, the second voice data may instead be converted into an analog voice signal and sent to the call unit of the terminal device as the first voice data.

In data flow (1), the terminal device receives a voice signal from the far-end device, and the call unit processes it and outputs first voice data. The first voice data may be analog or digital signal data, depending on the specific implementation of the first interface unit and the second interface unit, as described in the embodiments above. The terminal device sends the first voice data to the voice data processing device through the second interface unit; the voice data processing device converts it into USB data packets that the call assistant program can obtain and process, i.e. second voice data, and sends them back to the terminal device through the first interface unit; the call assistant program receives the second voice data and extracts and records the voice data.
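The packetization step in data flow (2) can be sketched as follows, assuming the encoded prompt voice is little-endian 16-bit PCM and a maximum packet size of 64 bytes (the bulk-endpoint limit for full-speed USB; high-speed bulk endpoints allow 512). Both function names are hypothetical, and a real USB stack would also handle transfer scheduling and short-packet signalling.

```python
import struct
from typing import List


def split_into_packets(encoded: bytes, max_packet_size: int = 64) -> List[bytes]:
    """Split an encoded voice buffer into packets of at most max_packet_size bytes."""
    if max_packet_size <= 0:
        raise ValueError("max_packet_size must be positive")
    return [encoded[i:i + max_packet_size]
            for i in range(0, len(encoded), max_packet_size)]


def reassemble_pcm16(packets: List[bytes]) -> List[int]:
    """Join received packets and decode them as little-endian int16 PCM samples."""
    data = b"".join(packets)
    if len(data) % 2:
        raise ValueError("odd byte count: incomplete 16-bit sample")
    return list(struct.unpack(f"<{len(data) // 2}h", data))


# Usage: 100 samples -> 200 bytes -> packets of 64, 64, 64 and 8 bytes.
samples = list(range(-50, 50))
packets = split_into_packets(struct.pack("<100h", *samples))
print([len(p) for p in packets])
```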

In some embodiments, the voice data processing device is configured to control the flow direction of the data stream through a connection configuration rule, which is a switching parameter, or a switching parameter matrix. Specifically, the on-off control of the data stream is realized through the on-off control of the conversion of the first voice data into the second voice data or through the on-off control of the conversion of the second voice data into the first voice data. For example, referring to fig. 8, the opening or closing of the data stream (1) or the data stream (2) may be controlled by a connection configuration rule.
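One way to picture a switching parameter matrix is a boolean table indexed by source and destination, sketched below. The endpoint names are hypothetical: here data stream (1) corresponds to `call_unit -> voice_app`, data stream (2) to `voice_app -> call_unit`, and routes involving `microphone` or `speaker` stand in for the external audio device.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical endpoint names; the source only speaks of data streams.
SOURCES = ("call_unit", "voice_app", "microphone")
SINKS = ("voice_app", "call_unit", "speaker")


def _all_off() -> Dict[str, Dict[str, bool]]:
    # Every switch starts open: no data flows until configured.
    return {src: {dst: False for dst in SINKS} for src in SOURCES}


@dataclass
class ConnectionConfig:
    """Switching-parameter matrix: switches[src][dst] is True when data
    arriving from src may be forwarded to dst."""

    switches: Dict[str, Dict[str, bool]] = field(default_factory=_all_off)

    def set(self, src: str, dst: str, on: bool) -> None:
        if src not in SOURCES or dst not in SINKS:
            raise KeyError(f"unknown route {src!r} -> {dst!r}")
        self.switches[src][dst] = on

    def is_on(self, src: str, dst: str) -> bool:
        return self.switches[src][dst]


def route(config: ConnectionConfig, src: str, chunk: bytes) -> Dict[str, bytes]:
    """Return the chunk keyed by every destination whose switch is closed."""
    return {dst: chunk for dst in SINKS if config.is_on(src, dst)}
```

Opening or closing a single entry turns one data stream on or off without touching the conversion code itself.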

FIG. 10 is a schematic diagram of a GoIP terminal provided in one embodiment of the present application. The GoIP terminal is a telephone gateway that enables communication between the mobile communication network and a VoIP platform. As shown in fig. 10, the voice processing program 221 employed by the terminal device 20 is a VoIP access program, which sends VoIP voice data to and receives it from the VoIP platform 30 over a wireless LAN. After receiving VoIP voice data, the VoIP access program converts it into second voice data and sends the second voice data to the voice data processing device through the second interface unit; the voice data processing device receives the second voice data through the first interface unit, converts it into first voice data, and sends the first voice data to the terminal device through the first interface unit; after receiving the first voice data through the second interface unit, the communication unit of the terminal device transmits the voice in the first voice data to the remote device through the mobile communication network.

The communication unit of the terminal device receives the voice data of the remote device and outputs first voice data, which the terminal device sends to the voice data processing device through the second interface unit; the data processing unit of the voice data processing device receives the first voice data through the first interface unit, converts it into second voice data, and sends the second voice data back to the terminal device through the first interface unit; after receiving the second voice data through the second interface unit, the VoIP access program of the terminal device extracts the voice data it contains, converts it into VoIP voice data and sends it to the VoIP platform. Through these steps, the terminal device acts as a VoIP telephone gateway and can flexibly access VoIP voice services.

FIG. 11 is a schematic diagram of an automatic translator according to an embodiment of the present application. As shown in fig. 11, the voice processing program 221 employed by the terminal device 20 is a voice recognition recording program, which converts text information input by the user into second voice data and sends the second voice data to the voice data processing device through the second interface unit; the voice data processing device receives the second voice data through the first interface unit, converts it into first voice data, and sends the first voice data to the terminal device through the first interface unit; after receiving the first voice data through the second interface unit, the communication unit of the terminal device transmits the voice in the first voice data to the remote device through the mobile communication network.

The communication unit of the terminal equipment receives the voice data of the remote equipment and outputs first voice data, and the terminal equipment sends the first voice data to the voice data processing equipment through the second interface unit; the data processing unit of the voice data processing device receives the first voice data through the first interface unit, converts the first voice data into second voice data, and then sends the second voice data back to the terminal device through the first interface unit; after receiving the second voice data through the second interface unit, the voice recognition recording program of the terminal equipment recognizes the second voice data to obtain text information corresponding to the voice content.

Through these steps an automatic translator is realized, making it convenient for users who speak different languages to communicate by telephone in real time. In particular, a user with a speech impairment can use the automatic translator provided by the embodiments of the present application to carry out voice communication.

On the basis of the embodiment shown in fig. 7, the first interface unit includes: a first interface subunit and a second interface subunit;

receiving, through the first interface unit, first voice data sent by the terminal device, converting the first voice data into second voice data by the data processing unit, and sending the second voice data back to the terminal device through the first interface unit comprises:

receiving first voice data sent by a terminal device through the first interface subunit, converting the first voice data into second voice data by the data processing unit, and sending the second voice data back to the terminal device through the second interface subunit;

receiving, through the first interface unit, second voice data sent by the terminal device, converting the second voice data into first voice data by the data processing unit, and sending the first voice data back to the terminal device through the first interface unit comprises:

receiving second voice data sent by the terminal device through the second interface subunit, converting the second voice data into first voice data by the data processing unit, and sending the first voice data back to the terminal device through the first interface subunit.

It can be understood that the first voice data is an analog signal or a digital audio signal, for example a PCM signal, and the second voice data consists of universal serial bus protocol data packets. Because the device converts the analog or digital audio signal into universal serial bus protocol data packets and vice versa, the APP only needs permission to send and receive universal serial bus protocol data packets through the second interface unit in order to process the voice data of a voice call comprehensively and in depth, even when the operating system of the mobile terminal does not provide an API for this purpose.

On the basis of the embodiment shown in fig. 7, the first interface unit comprises a third interface subunit; the voice processing method comprises the following steps:

The data processing unit extracts first target data in the first voice data according to a first protocol, packages the first target data into second voice data according to a second protocol, and sends the second voice data back to the terminal equipment through the third interface subunit;

The first target data and the second target data include, but are not limited to, IIS (I2S) or PCM digital audio coded data.

In some embodiments, the first protocol and the second protocol may be a universal serial bus data transfer protocol, or other communication protocol that may transfer data over a universal serial bus.

In some embodiments, the first protocol is an IIS or PCM digital audio data transmission protocol, and the second protocol may be a universal serial bus data transfer protocol or another communication protocol capable of transferring data over a universal serial bus.

On the basis of the embodiment shown in fig. 7, in some embodiments, the first interface unit further comprises a fourth interface subunit; the method further comprises the steps of:

determining, according to the connection rule configuration, whether to perform mixing processing and to transmit third voice data through the fourth interface subunit.

The fourth interface subunit is configured to connect to an audio device, including but not limited to a speech transmitter and a listening device.

In some embodiments, the mixing process is an operation of mixing the first voice data by a data processing unit to obtain third voice data; the first voice data is the first voice data received by the data processing unit from the terminal equipment or the first voice data obtained by converting the second voice data by the data processing unit.

In some embodiments, the fourth interface subunit receives voice data input by the speech transmitting device, e.g. a microphone, i.e. the third voice data, and the analog processing subunit 121 converts it into a digital audio signal. According to the connection configuration rules, the digital audio signal is converted into first voice data and/or second voice data and added to data stream (1) and/or data stream (2).

In some embodiments, the voice data processing device is configured to control the flow direction of the data stream through a connection configuration rule, which is a switching parameter, or a switching parameter matrix.

Fig. 12 is a schematic diagram of a data flow of a method for processing voice data according to an embodiment of the present application. As shown in fig. 12, the connection rule configuration determines the on-off state of data stream (3). If the connection rule configuration turns data stream (3) on, data stream (1) or data stream (2) is mixed and data stream (3) is transmitted, enabling the external audio device. If the connection rule configuration turns data stream (3) off, data stream (1) or data stream (2) is not mixed, and the external audio device function is disabled.

Fig. 13 is a schematic diagram of a call background sound control device according to an embodiment of the present application. As shown in fig. 13, the voice processing program 221 employed by the terminal device 20 includes a voice processing subroutine 2211 and a background sound generating subroutine 2212.

The communication unit of the terminal device receives the voice data of the remote device and outputs first voice data, which the terminal device sends to the voice data processing device through the second interface unit; the data processing unit of the voice data processing device receives the first voice data through the first interface unit, converts it into second voice data, and sends the second voice data back to the terminal device through the first interface unit; after receiving the second voice data through the second interface unit, the voice processing subprogram of the terminal device performs preset processing on the voice data it contains. When converting the first voice data into second voice data, the voice data processing device also determines, according to the connection configuration rule, whether to output the first voice data to the audio playing device through the fourth interface subunit.

The voice processing subprogram acquires voice data. In some embodiments, the user inputs third voice data to the voice data processing device through a sound pickup device connected to the first interface unit; the voice data processing device determines, according to the connection rule configuration, whether to send the third voice data to the voice processing subprogram of the terminal device for processing, and if the connection rule configuration specifies that voice data from the sound pickup device is to be received, it converts the third voice data into second voice data and sends it to the voice processing subprogram for processing. In some embodiments, the voice processing subprogram of the terminal device obtains the voice data via VoIP or similar means according to a preset rule.

The voice processing subprogram determines whether to call the background sound generated by the background sound generating subprogram according to the user setting, wherein the background sound can be music background sound or natural environment background sound.

The voice processing subprogram mixes the second voice data with the background sound to generate mixed second voice data and sends the mixed second voice data to the voice data processing device through the second interface unit; the voice data processing device receives it through the first interface unit, converts it into first voice data, and sends the first voice data to the terminal device through the first interface unit; after receiving the first voice data through the second interface unit, the communication unit of the terminal device transmits the voice in the first voice data to the remote device through the mobile communication network. Through these steps, background sound is added to the user's voice.
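The background-sound step could look like the sketch below, assuming both signals are 16-bit PCM at the same sample rate, a background clip that loops when it is shorter than the speech, and an attenuation factor of 0.3 so the background stays beneath the voice. The function name and the gain value are hypothetical; the source leaves the mixing rule to the user settings.

```python
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767


def add_background(voice: List[int], background: List[int],
                   gain: float = 0.3) -> List[int]:
    """Mix a looping, attenuated background clip under 16-bit PCM speech."""
    if not background:
        return list(voice)  # no background selected: pass speech through
    out = []
    for i, v in enumerate(voice):
        b = background[i % len(background)]  # loop the background clip
        s = v + int(round(b * gain))         # attenuate, then add
        out.append(max(INT16_MIN, min(INT16_MAX, s)))  # saturate to int16
    return out


print(add_background([100, 200, 300, 32767], [1000, -1000], gain=0.5))
```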

Corresponding to the method for processing voice data shown in fig. 7, the apparatus for processing voice data provided in the embodiment of the present application includes:

applied to voice data processing equipment;

It will be appreciated that various implementations and combinations of implementations and advantageous effects thereof in the above embodiments are equally applicable to this embodiment, and will not be described here again.

Fig. 14 shows a method for speech processing according to an embodiment of the present application, which is applied to the terminal device 20 in the speech processing system 01 shown in fig. 1 and may be implemented by software and/or hardware of the terminal device. As shown in fig. 14, the method includes the step of executing at least one of steps S210 and S220. The specific implementation principle of each step is as follows:

the terminal equipment comprises a call unit, a second interface unit and a voice processing program;

S210, sending the first voice data output by the call unit to the voice data processing device through the second interface unit, the voice data processing device being used to convert the first voice data into second voice data and send the second voice data back to the terminal device; and receiving the second voice data through the second interface unit and using it as input data of the voice processing program; and/or,

S220, sending the second voice data output by the voice processing program to the voice data processing device through the second interface unit, the voice data processing device being used to convert the second voice data into first voice data and send the first voice data back to the terminal device; and receiving the first voice data through the second interface unit and using it as input data of the call unit.
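Seen from the terminal side, S210 and S220 are two mirror-image round trips through the external device. The sketch below collapses each "send through the second interface unit, then receive back" pair into one synchronous call on a hypothetical `VoiceDataDevice` interface; in practice these would be separate USB transfers. `TaggingDevice` is a test stand-in that only marks the direction rather than performing a real conversion.

```python
from typing import Callable, Protocol


class VoiceDataDevice(Protocol):
    """What the terminal needs from the external device (hypothetical)."""

    def convert_first_to_second(self, data: bytes) -> bytes: ...

    def convert_second_to_first(self, data: bytes) -> bytes: ...


def step_s210(call_unit_output: bytes, device: VoiceDataDevice,
              voice_app_input: Callable[[bytes], None]) -> None:
    """S210: call unit -> device -> voice processing program."""
    second_voice_data = device.convert_first_to_second(call_unit_output)
    voice_app_input(second_voice_data)


def step_s220(voice_app_output: bytes, device: VoiceDataDevice,
              call_unit_input: Callable[[bytes], None]) -> None:
    """S220: voice processing program -> device -> call unit."""
    first_voice_data = device.convert_second_to_first(voice_app_output)
    call_unit_input(first_voice_data)


class TaggingDevice:
    """Test stand-in: prefixes a direction tag instead of converting."""

    def convert_first_to_second(self, data: bytes) -> bytes:
        return b"S" + data

    def convert_second_to_first(self, data: bytes) -> bytes:
        return b"F" + data
```

Keeping the device behind an interface lets the voice processing program be exercised without hardware, which mirrors how the patent separates the terminal-side method from the device-side method.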

Corresponding to the method for processing voice data shown in fig. 14, the apparatus for processing voice data provided in the embodiment of the present application includes:

the device comprises at least one of the following modules:

It should be understood that the foregoing embodiments are all based on the same inventive concept, and thus, steps, modules, etc. of the embodiments may be replaced by combinations of steps, modules, etc. of the embodiments, which are not described herein.

It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not limit the implementation of the embodiments of the present application in any way.

It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein again.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.

Embodiments of the present application also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.

Embodiments of the present application provide a computer program product which, when run on a voice data processing device or a terminal device, causes the voice data processing device or the terminal device to perform the steps of the method embodiments described above.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, all or part of the flow of the methods in the above embodiments may be implemented by a computer program instructing related hardware; the computer program may be stored in a computer readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file or some intermediate form. The computer readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing device/terminal apparatus, a recording medium, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunications signals and software distribution media, such as a USB flash drive, removable hard disk, magnetic disk or optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer readable media may not include electrical carrier signals and telecommunications signals.

In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

The above embodiments are intended only to illustrate, not to limit, the technical solutions of the present application. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand the following points. The technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents. Such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included within the scope of the present application.

Claims

1. A voice data processing apparatus, comprising: a first interface unit and a data processing unit,

2. The device of claim 1, wherein the first interface unit comprises: a first interface subunit and a second interface subunit;

the first interface subunit is configured to transmit first voice data between the data processing unit and the terminal device;

the second interface subunit is configured to transmit second voice data between the data processing unit and the terminal device.

3. The apparatus of claim 2, wherein the data processing unit comprises: an analog data processing subunit and a processor subunit;

the analog data processing subunit is communicatively coupled with the processor subunit;

the analog data processing subunit is configured to convert the first voice data received by the first interface subunit into a digital signal, or convert the digital signal output by the processor subunit into the first voice data.

4. The device of claim 3, wherein the first interface subunit comprises a digital audio signal channel of a universal serial bus interface, an analog signal channel of a universal serial bus interface, or a TRS interface.

5. The device of claim 1, wherein the first interface unit comprises: a third interface subunit;

the third interface subunit is further configured to transmit second voice data between the data processing unit and the terminal device.

6. The device of claim 5, wherein the third interface subunit comprises a universal serial bus interface.

7. The apparatus of any of claims 1 to 6, wherein the first interface unit comprises a fourth interface subunit;

the fourth interface subunit is configured to transmit data between the data processing unit and the audio device;

the audio device includes at least one of a talker device and a listener device.

8. The apparatus of claim 7, wherein the fourth interface subunit comprises a fourth interface module and a mixing module;

the fourth interface module is communicatively coupled with the microphone and the audio player;

the mixing module is communicatively coupled with the data processing unit;

the fourth interface module is configured to transmit data between the mixing module and the audio device;

the mixing module is configured to perform mixing processing on the first voice data processed by the data processing unit.

9. A method of speech data processing, characterized by being applied to a speech data processing device;

10. The method of claim 9, wherein the first interface unit comprises: a first interface subunit and a second interface subunit;

11. The method of claim 9, wherein the first interface unit comprises a third interface subunit;

12. The method of claim 10 or 11, wherein the first interface unit further comprises a fourth interface subunit; the method further comprises the steps of:

and determining whether to perform audio mixing processing according to the connection rule configuration, and transmitting third voice data through the fourth interface subunit.

13. A terminal device, characterized by comprising a storage unit, a processing unit, a call unit, a second interface unit, and a speech processing program and a control program stored in the storage unit and operable on the processing unit;

14. A voice data processing method, characterized by being applied to a terminal device, wherein the terminal device comprises a call unit, a second interface unit, and a voice processing program;

the method comprises at least one of the following operations:

15. A voice data processing system, characterized by comprising the voice data processing apparatus according to any one of claims 1 to 8, or the terminal device according to claim 13.

16. A computer readable storage medium storing a computer program, which when executed by a processor implements the method of any one of claims 9 to 12, or the method of claim 14.
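Claims 8 and 12 describe a mixing module that mixes data processed by the data processing unit. Whether mixing takes place is decided by a connection rule configuration, and the result is transmitted as third voice data through the fourth interface subunit. Neither claim states the mixing arithmetic or the form of the configuration. The following Python sketch is illustrative only and shows one common way such a step could work. It makes several assumptions that do not come from the patent:

* the audio is 16-bit signed PCM;
* mixing is sample-wise addition with saturation;
* the connection rule configuration is reduced to a hypothetical `ConnectionRuleConfig` with a single `mix_enabled` flag;
* the second input stream (`other_pcm`) is an assumption, because the claims do not say what the processed data is mixed with.

All names in the sketch are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Range of a signed 16-bit PCM sample (assumed sample format).
INT16_MIN, INT16_MAX = -32768, 32767


@dataclass
class ConnectionRuleConfig:
    """Hypothetical stand-in for the 'connection rule configuration' of claim 12."""
    mix_enabled: bool


def mix_pcm16(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Add two 16-bit PCM streams sample by sample, clamping to the int16 range.

    The shorter stream is treated as if padded with silence (zeros).
    """
    n = max(len(a), len(b))
    mixed = []
    for i in range(n):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
        # Saturate instead of wrapping around, to avoid loud overflow clicks.
        mixed.append(max(INT16_MIN, min(INT16_MAX, s)))
    return mixed


def third_voice_data(processed_pcm: Sequence[int],
                     other_pcm: Sequence[int],
                     config: ConnectionRuleConfig) -> List[int]:
    """Produce the samples sent through the fourth interface subunit (assumed model).

    If the (hypothetical) rule enables mixing, the processed data is mixed with
    the other stream; otherwise the processed data is passed through unchanged.
    """
    if config.mix_enabled:
        return mix_pcm16(processed_pcm, other_pcm)
    return list(processed_pcm)


if __name__ == "__main__":
    print(mix_pcm16([1000, 30000, -30000], [500, 5000, -5000]))
```

For example, `mix_pcm16([1000, 30000, -30000], [500, 5000, -5000])` returns `[1500, 32767, -32768]`, because the second and third sums saturate at the int16 limits. Saturating addition is only one design option. Averaging the streams or applying gain before adding are common alternatives that avoid clipping at the cost of lower loudness.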