CN116456033A - Voice data playing method and device in call, terminal and outbound robot

Info

Publication number: CN116456033A
Application number: CN202310444812.4A
Authority: CN
Inventors: 左嘉琪
Original assignee: Beijing SoundAI Technology Co Ltd
Current assignee: Beijing SoundAI Technology Co Ltd
Priority date: 2023-04-23
Filing date: 2023-04-23
Publication date: 2023-07-18

Abstract

The application provides a method and device for playing voice data in a call, a terminal, and an outbound robot, and belongs to the technical field of computers. The method comprises: establishing a call connection with an outbound robot, wherein the outbound robot is used for playing voice data corresponding to a preset script; displaying a call interface of the outbound robot, wherein the call interface comprises a virtual keyboard, different virtual keys in the virtual keyboard correspond to different voice control instructions, and corresponding control prompt information is displayed at positions corresponding to a plurality of virtual keys of the virtual keyboard; and, during the call with the outbound robot, in response to a selection operation on any virtual key in the virtual keyboard, sending the voice control instruction corresponding to that virtual key to the outbound robot, so that the outbound robot adjusts the playing of the voice data according to the voice adjustment mode indicated by the voice control instruction. The scheme improves the call effect of the outbound robot.

Description

Voice data playing method and device in call, terminal and outbound robot

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for playing voice data in a call, a terminal, and an outbound robot.

Background

The outbound robot is an automated program that replaces manual batch dialing by using a human voice synthesized by AI (Artificial Intelligence). It can communicate naturally with a user over a multi-round dialogue, and is widely applied in scenarios such as telephone consultation, product recommendation, and business handling.

Currently, the outbound robot uses speech synthesis technology to synthesize a pre-written script into voice data, and plays the voice data to the user during the call so as to communicate with the user. Because the voice data is generated by speech synthesis, properties such as its speech rate and volume are fixed and cannot meet the different requirements of different users, so the call effect of the outbound robot is poor.

Disclosure of Invention

The embodiment of the application provides a voice data playing method, a device, a terminal and an outbound robot in a call, so that the playing mode of voice data can meet the requirements of a listening user, and the call effect of the outbound robot is improved. The technical scheme is as follows:

in one aspect, a method for playing voice data in a call is provided, where the method includes:

establishing a call connection with an outbound robot, wherein the outbound robot is used for playing voice data corresponding to a preset script;

Displaying a call interface of the outbound robot, wherein the call interface comprises a virtual keyboard, different virtual keys in the virtual keyboard correspond to different voice control instructions, corresponding control prompt information is displayed at positions corresponding to the virtual keys of the virtual keyboard, and the control prompt information is used for prompting a voice adjustment mode indicated by the voice control instructions corresponding to the virtual keys;

in the conversation process with the outbound robot, responding to the selected operation of any virtual key in the virtual keyboard, and sending a voice control instruction corresponding to the virtual key to the outbound robot so that the outbound robot adjusts the playing of the voice data according to the voice adjustment mode indicated by the voice control instruction.

In one possible implementation, the voice control instruction is a speech-rate adjustment instruction, a volume adjustment instruction, a progress adjustment instruction, a pause play instruction, or a continue play instruction.

In one possible implementation manner, the voice control instruction carries key information of a triggered virtual key in the virtual keyboard, different key information is used for indicating different voice adjustment modes, and the key information is used for indicating the outbound robot to adjust the playing of the voice data according to the voice adjustment modes.

On the other hand, a method for playing voice data in a call is provided, and the method comprises the following steps:

establishing a call connection with a call terminal, and playing voice data corresponding to a preset script to the call terminal;

receiving a voice control instruction sent by the call terminal in the call process of the call terminal, wherein the voice control instruction is triggered based on a virtual keyboard of a call interface of the call terminal, different virtual keys in the virtual keyboard are used for triggering different voice control instructions, corresponding control prompt information is displayed at positions corresponding to a plurality of virtual keys of the virtual keyboard, and the control prompt information is used for prompting a voice adjustment mode indicated by the voice control instruction corresponding to the virtual keys;

and adjusting the play of the voice data according to the voice adjustment mode indicated by the voice control instruction.

In one possible implementation, the voice control instruction is a speech-rate adjustment instruction; the step of adjusting the playing of the voice data according to the voice adjustment mode indicated by the voice control instruction comprises the following steps:

based on the speech-rate adjustment instruction, performing speech-rate adjustment processing on the voice data to be played to obtain processed voice data;

And sending the processed voice data to the call terminal.

In one possible implementation manner, the performing, based on the speech-rate adjustment instruction, speech-rate adjustment processing on the voice data to be played to obtain processed voice data includes:

when the speech-rate adjustment instruction is a speech-rate acceleration instruction, increasing the current speech rate by a first step length to obtain a first speech rate, and performing speech-rate acceleration processing on the voice data to be played based on the first speech rate to obtain voice data matched with the first speech rate; or

when the speech-rate adjustment instruction is a speech-rate slowing instruction, decreasing the current speech rate by the first step length to obtain a second speech rate, and performing speech-rate slowing processing on the voice data to be played based on the second speech rate to obtain voice data matched with the second speech rate.
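
The step-based update of the speech rate described above can be sketched as follows. This is only an illustration under stated assumptions: the step size, the rate bounds, the instruction names "faster" and "slower", and the function name are hypothetical, since the source only speaks of a "first step length". The actual time-stretching of the audio is not shown.

```python
from dataclasses import dataclass

# Hypothetical values: the source only names a "first step length".
RATE_STEP = 0.25
MIN_RATE, MAX_RATE = 0.5, 2.0


@dataclass
class RateState:
    rate: float = 1.0  # 1.0 stands for the default synthesized speech rate


def apply_rate_instruction(state: RateState, instruction: str) -> float:
    """Update and return the target speech rate for one instruction."""
    if instruction == "faster":
        target = state.rate + RATE_STEP  # "first speech rate"
    elif instruction == "slower":
        target = state.rate - RATE_STEP  # "second speech rate"
    else:
        raise ValueError(f"unknown speech-rate instruction: {instruction!r}")
    # Clamping is an added safeguard, not something the source requires.
    state.rate = max(MIN_RATE, min(MAX_RATE, target))
    return state.rate
```

The returned rate would then be passed to whatever synthesis or time-stretching component produces the voice data to be played.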

In one possible implementation, the voice control instruction is a volume adjustment instruction; the step of adjusting the playing of the voice data according to the voice adjustment mode indicated by the voice control instruction comprises the following steps:

based on the volume adjustment instruction, performing volume adjustment processing on the voice data to be played to obtain processed voice data;

And sending the processed voice data to the call terminal.

In one possible implementation manner, the performing, based on the volume adjustment instruction, volume adjustment processing on the voice data to be played to obtain processed voice data includes:

when the volume adjustment instruction is a volume increase instruction, increasing the current volume by a second step length to obtain a first volume, and performing volume increase processing on the voice data to be played based on the first volume to obtain voice data matched with the first volume; or

when the volume adjustment instruction is a volume reduction instruction, decreasing the current volume by the second step length to obtain a second volume, and performing volume reduction processing on the voice data to be played based on the second volume to obtain voice data matched with the second volume.
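
One way to picture the volume adjustment is as a gain applied to the audio samples before they are sent. The sketch below assumes 16-bit PCM samples and expresses volume as a percentage. The step of 10 percent, the bounds, the instruction names "up" and "down", and both function names are hypothetical and do not come from the source.

```python
from typing import List

# Hypothetical values: the source only names a "second step length".
VOLUME_STEP = 10          # percent
MIN_VOLUME, MAX_VOLUME = 0, 200
INT16_MIN, INT16_MAX = -32768, 32767


def next_volume(current: int, instruction: str) -> int:
    """Return the new volume (percent) after an 'up' or 'down' instruction."""
    if instruction == "up":
        target = current + VOLUME_STEP
    elif instruction == "down":
        target = current - VOLUME_STEP
    else:
        raise ValueError(f"unknown volume instruction: {instruction!r}")
    return max(MIN_VOLUME, min(MAX_VOLUME, target))


def scale_pcm16(samples: List[int], volume: int) -> List[int]:
    """Scale 16-bit PCM samples to `volume` percent, clipping to int16."""
    scaled = []
    for sample in samples:
        value = int(round(sample * volume / 100))
        scaled.append(max(INT16_MIN, min(INT16_MAX, value)))
    return scaled
```

Clipping keeps amplified samples inside the 16-bit range; a production system would more likely use a limiter to avoid audible distortion.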

In one possible implementation, the voice control instruction is a progress adjustment instruction; the step of adjusting the playing of the voice data according to the voice adjustment mode indicated by the voice control instruction comprises the following steps:

determining a first playing progress based on the progress adjustment instruction;

and starting from the first playing progress, continuing to play the voice data.

In one possible implementation manner, the determining, based on the progress adjustment instruction, a first playing progress includes:

when the progress adjustment instruction is a fast forward instruction, increasing the current playing progress of the voice data by a third step length to obtain the first playing progress; or

when the progress adjustment instruction is a rewind instruction, decreasing the current playing progress of the voice data by the third step length to obtain the first playing progress; or

when the progress adjustment instruction is a replay instruction, resetting the current playing progress of the voice data to zero to obtain the first playing progress.
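
The three progress adjustments above reduce to simple arithmetic on the playback position. The following sketch assumes progress is measured in milliseconds. The 5-second step, the instruction names, and the function name are hypothetical, since the source only refers to a "third step length".

```python
# Hypothetical value: the source only names a "third step length".
SEEK_STEP_MS = 5000


def first_playing_progress(current_ms: int, total_ms: int, instruction: str) -> int:
    """Compute the position from which playback should continue."""
    if instruction == "fast_forward":
        target = current_ms + SEEK_STEP_MS
    elif instruction == "rewind":
        target = current_ms - SEEK_STEP_MS
    elif instruction == "replay":
        target = 0
    else:
        raise ValueError(f"unknown progress instruction: {instruction!r}")
    # Keeping the result inside [0, total_ms] is an added safeguard.
    return max(0, min(total_ms, target))
```

Playback then continues from the returned position.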

In one possible implementation manner, the voice control instruction is a pause instruction, and the adjusting the playing of the voice data according to the voice adjustment manner indicated by the voice control instruction includes:

pausing the playing of the voice data, and resuming the playing of the voice data once a continue play instruction is received.
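
The pause and continue behavior can be modeled as a flag that gates delivery of audio frames. This is a minimal sketch: the class name, the frame representation, and the instruction strings "pause" and "continue" are hypothetical and are not taken from the source.

```python
class Playback:
    """Delivers frames in order; delivery stops while paused."""

    def __init__(self, frames):
        self.frames = list(frames)
        self.position = 0
        self.paused = False

    def handle(self, instruction: str) -> None:
        if instruction == "pause":
            self.paused = True
        elif instruction == "continue":
            self.paused = False

    def next_frame(self):
        """Return the next frame, or None while paused or at the end."""
        if self.paused or self.position >= len(self.frames):
            return None
        frame = self.frames[self.position]
        self.position += 1
        return frame
```

Because the position is preserved while paused, playback resumes from where it stopped.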

In a possible implementation manner, the voice control instruction carries key information of a triggered virtual key in the virtual keyboard;

before the playing of the voice data is adjusted according to the voice adjustment mode indicated by the voice control instruction, the method further comprises:

determining, in the correspondence between key information and voice adjustment modes, the voice adjustment mode corresponding to the key information carried by the voice control instruction as the voice adjustment mode indicated by the voice control instruction.
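
The correspondence between key information and voice adjustment modes can be pictured as a lookup table on the outbound robot side. The sketch below is illustrative only: the key assignments, the `Adjustment` enumeration, and the function name are hypothetical, because the source does not state which key triggers which mode.

```python
from enum import Enum
from typing import Optional


class Adjustment(Enum):
    RATE_UP = "rate_up"
    RATE_DOWN = "rate_down"
    VOLUME_UP = "volume_up"
    VOLUME_DOWN = "volume_down"
    FAST_FORWARD = "fast_forward"
    REWIND = "rewind"
    REPLAY = "replay"
    PAUSE = "pause"
    CONTINUE = "continue"


# Hypothetical correspondence between key information and adjustment modes.
KEY_TO_ADJUSTMENT = {
    "1": Adjustment.RATE_UP,
    "2": Adjustment.RATE_DOWN,
    "3": Adjustment.VOLUME_UP,
    "4": Adjustment.VOLUME_DOWN,
    "5": Adjustment.FAST_FORWARD,
    "6": Adjustment.REWIND,
    "7": Adjustment.REPLAY,
    "8": Adjustment.PAUSE,
    "9": Adjustment.CONTINUE,
}


def resolve_adjustment(key_info: str) -> Optional[Adjustment]:
    """Look up the adjustment mode for the key carried by an instruction."""
    return KEY_TO_ADJUSTMENT.get(key_info)
```

Returning `None` for an unmapped key lets the caller ignore key presses that carry no voice adjustment meaning.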

In one possible implementation manner, the preset script includes a control prompt sub-script and a service sub-script, and the playing the voice data corresponding to the preset script to the call terminal includes:

playing first voice sub-data corresponding to the control prompt sub-script to the call terminal;

and after the first voice sub-data is played, playing second voice sub-data corresponding to the service sub-script to the call terminal.

On the other hand, a voice data playing device in a call is provided, and the device comprises:

the connection establishment module is used for establishing a call connection with the outbound robot, and the outbound robot is used for playing voice data corresponding to a preset script;

the display module is used for displaying a call interface of the outbound robot, the call interface comprises a virtual keyboard, different virtual keys in the virtual keyboard correspond to different voice control instructions, corresponding control prompt information is displayed at positions corresponding to the virtual keys of the virtual keyboard, and the control prompt information is used for prompting a voice adjustment mode indicated by the voice control instructions corresponding to the virtual keys;

And the sending module is used for responding to the selected operation of any virtual key in the virtual keyboard in the conversation process of the outbound robot, and sending a voice control instruction corresponding to the virtual key to the outbound robot so that the outbound robot can adjust the playing of the voice data according to the voice adjustment mode indicated by the voice control instruction.

In another aspect, a voice data playing device in a call is provided, and the device comprises:

the connection establishment module is used for establishing call connection with the call terminal;

the playing module is used for playing the voice data corresponding to the preset script to the call terminal;

The receiving module is used for receiving a voice control instruction sent by the call terminal in the call process of the call terminal, wherein the voice control instruction is triggered based on a virtual keyboard of a call interface of the call terminal, different virtual keys in the virtual keyboard are used for triggering different voice control instructions, corresponding control prompt information is displayed at positions corresponding to the virtual keys of the virtual keyboard, and the control prompt information is used for prompting a voice adjustment mode indicated by the voice control instruction corresponding to the virtual keys;

the playing module is also used for adjusting the playing of the voice data according to the voice adjusting mode indicated by the voice control instruction.

In one possible implementation, the voice control instruction is a speech-rate adjustment instruction; the playing module comprises:

the processing unit is used for performing speech-rate adjustment processing on the voice data to be played based on the speech-rate adjustment instruction to obtain processed voice data;

and the sending unit is used for sending the processed voice data to the call terminal.

In one possible implementation manner, the speech-rate adjustment instruction is a speech-rate acceleration instruction, and the processing unit is used for increasing the current speech rate by a first step length to obtain a first speech rate, and performing speech-rate acceleration processing on the voice data to be played based on the first speech rate to obtain voice data matched with the first speech rate; or

the speech-rate adjustment instruction is a speech-rate slowing instruction, and the processing unit is used for decreasing the current speech rate by the first step length to obtain a second speech rate, and performing speech-rate slowing processing on the voice data to be played based on the second speech rate to obtain voice data matched with the second speech rate.

In one possible implementation, the voice control instruction is a volume adjustment instruction; the playing module comprises:

the processing unit is used for carrying out volume adjustment processing on the voice data to be played based on the volume adjustment instruction to obtain processed voice data;

In one possible implementation manner, the volume adjustment instruction is a volume increase instruction, and the processing unit is configured to increase the current volume by a second step length to obtain a first volume, and perform volume increase processing on the voice data to be played based on the first volume to obtain voice data matched with the first volume; or

the volume adjustment instruction is a volume reduction instruction, and the processing unit is configured to reduce the current volume by the second step length to obtain a second volume, and perform volume reduction processing on the voice data to be played based on the second volume to obtain voice data matched with the second volume.

In one possible implementation, the voice control instruction is a progress adjustment instruction; the playing module is used for determining a first playing progress based on the progress adjusting instruction; and starting from the first playing progress, continuing to play the voice data.

In one possible implementation manner, the progress adjustment instruction is a fast forward instruction, and the playing module is configured to increase the current playing progress of the voice data by a third step length to obtain the first playing progress; or

the progress adjustment instruction is a rewind instruction, and the playing module is configured to decrease the current playing progress of the voice data by the third step length to obtain the first playing progress; or

the progress adjustment instruction is a replay instruction, and the playing module is configured to reset the current playing progress of the voice data to zero to obtain the first playing progress.

In one possible implementation manner, the voice control instruction is a pause instruction, and the playing module is configured to pause playing the voice data until receiving a continue playing instruction, and continue playing the voice data.

In a possible implementation manner, the voice control instruction carries key information of a triggered virtual key in the virtual keyboard; the apparatus further comprises:

the determining module is used for determining, in the correspondence between key information and voice adjustment modes, the voice adjustment mode corresponding to the key information carried by the voice control instruction as the voice adjustment mode indicated by the voice control instruction.

In a possible implementation manner, the preset script includes a control prompt sub-script and a service sub-script, and the playing module is configured to play first voice sub-data corresponding to the control prompt sub-script to the call terminal; and after the first voice sub-data is played, play second voice sub-data corresponding to the service sub-script to the call terminal.

In another aspect, a call terminal is provided, where the call terminal includes a processor and a memory, where the memory stores at least one program code, and the at least one program code is loaded and executed by the processor, so as to implement the method for playing voice data in a call according to any one of the foregoing implementation manners.

In another aspect, a computer readable storage medium is provided, where at least one program code is stored, where the at least one program code is loaded and executed by a processor to implement a method for playing voice data in a call according to any of the above implementations.

In another aspect, a computer program product is provided, the computer program product comprising at least one program code loaded and executed by a processor to implement a method for playing voice data in a call as described in any of the above implementations.

The beneficial effects of the technical scheme provided by the embodiment of the application at least comprise:

the embodiment of the application provides a voice data playing method in a call, wherein a virtual keyboard is provided in a call interface of a call terminal in the call process with an outbound robot, the virtual keyboard not only comprises a plurality of virtual keys for triggering different voice control instructions, but also displays corresponding control prompt information at positions corresponding to the plurality of virtual keys, so that a user can accurately trigger corresponding voice control instructions according to the control prompt information to adjust the playing of voice data of the outbound robot, the mode of playing the voice data of the outbound robot can meet the requirements of the user, and the call effect of the outbound robot is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic illustration of an implementation environment provided by embodiments of the present application;

fig. 2 is a flowchart of a method for playing voice data in a call according to an embodiment of the present application;

fig. 3 is a flowchart of a method for playing voice data in a call according to an embodiment of the present application;

fig. 4 is a flowchart of a method for playing voice data in a call according to an embodiment of the present application;

FIG. 5 is a key information encoding scheme provided in an embodiment of the present application;

fig. 6 is a schematic structural diagram of a voice data playing device in a call according to an embodiment of the present application;

fig. 7 is a schematic structural diagram of a voice data playing device in a call according to an embodiment of the present application;

fig. 8 is a schematic structural diagram of a voice data playing device in a call according to an embodiment of the present application;

fig. 9 is a schematic structural diagram of a call terminal according to an embodiment of the present application;

fig. 10 is a schematic structural diagram of an outbound robot according to an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

The terms "first," "second," "third," and "fourth" and the like in the description and in the claims of this application and in the drawings, are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprising," "including," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.

The voice data playing method in a call provided by the embodiment of the application is executed by a call terminal and an outbound robot. While the call terminal and the outbound robot are in a call, a user can control, through the call terminal, how the outbound robot plays the voice data, for example by speeding up the speech rate or increasing the volume of the outbound robot, so that the call meets the user's needs and the call effect is improved.

Fig. 1 is a schematic diagram of an implementation environment provided in an embodiment of the present application, and as shown in fig. 1, the implementation environment includes a call terminal 101 and an outbound robot 102, where the call terminal 101 and the outbound robot 102 may establish a call connection.

The call terminal 101 may be any terminal with a call function, such as a mobile phone, a smart watch, or a tablet computer, and the embodiment of the present application does not limit the call terminal 101. The outbound robot 102 is a device for playing voice data corresponding to a preset script. In this embodiment, when the call terminal 101 establishes a call connection with the outbound robot 102, the outbound robot 102 may call the call terminal 101, or the call terminal 101 may call the outbound robot 102. That is, the outbound robot 102 may be either the calling party or the called party, which is not limited in the embodiment of the present application.

In some embodiments, the call terminal 101 may establish a call connection with the outbound robot 102, and in the process of the call between the call terminal 101 and the outbound robot, the user may send a voice control instruction to the outbound robot 102 through the call terminal 101, so as to control the outbound robot to speed up, slow down, increase the volume, decrease the volume, fast forward, reverse, pause, etc., so that the call process more meets the needs of the user, and the call effect of the outbound robot is improved.

Fig. 2 is a flowchart of a method for playing voice data in a call according to an embodiment of the present application, where an executing body is taken as a call terminal for example in the embodiment of the present application. Referring to fig. 2, the method includes:

201. The call terminal establishes a call connection with an outbound robot, and the outbound robot is used for playing voice data corresponding to a preset script.

202. The call terminal displays a call interface of the outbound robot, wherein the call interface comprises a virtual keyboard, different virtual keys in the virtual keyboard correspond to different voice control instructions, corresponding control prompt information is displayed at positions corresponding to the virtual keys of the virtual keyboard, and the control prompt information is used for prompting the voice adjustment mode indicated by the voice control instruction corresponding to each virtual key.

203. In the process of communicating with the outbound robot, the communication terminal responds to the selected operation of any virtual key in the virtual keyboard and sends a voice control instruction corresponding to the virtual key to the outbound robot so that the outbound robot adjusts the playing of voice data according to the voice adjustment mode indicated by the voice control instruction.
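
On the call terminal side, steps 202 and 203 amount to pairing each virtual key with its control prompt information and sending the corresponding instruction when the key is selected. The sketch below is illustrative only. The key layout, the prompt texts, the instruction strings, and the `send` callback are hypothetical, and the transport used to reach the outbound robot is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class VirtualKey:
    key: str          # label of the virtual key
    prompt: str       # control prompt information shown at the key's position
    instruction: str  # voice control instruction sent when the key is selected


# Hypothetical keyboard layout; the source does not fix any assignment.
KEYBOARD: Dict[str, VirtualKey] = {
    k.key: k
    for k in (
        VirtualKey("1", "Faster", "rate_up"),
        VirtualKey("2", "Slower", "rate_down"),
        VirtualKey("3", "Louder", "volume_up"),
        VirtualKey("4", "Quieter", "volume_down"),
    )
}


def on_key_selected(key: str, send: Callable[[str], None]) -> bool:
    """Send the instruction bound to `key`; return False for unbound keys."""
    virtual_key = KEYBOARD.get(key)
    if virtual_key is None:
        return False
    send(virtual_key.instruction)
    return True
```

For example, calling `on_key_selected("1", sent.append)` with an empty list `sent` leaves `sent` equal to `["rate_up"]`.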

According to the voice data playing method in the call, in the call process with the outbound robot, the virtual keyboard is provided in the call interface of the call terminal, the virtual keyboard not only comprises a plurality of virtual keys used for triggering different voice control instructions, but also displays corresponding control prompt information at positions corresponding to the plurality of virtual keys, so that a user can accurately trigger corresponding voice control instructions according to the control prompt information, the playing of voice data of the outbound robot is adjusted, the mode of playing the voice data of the outbound robot can meet the requirements of the user, and the call effect of the outbound robot is improved.

In one possible implementation, the voice control instruction is a speech-rate adjustment instruction, a volume adjustment instruction, a progress adjustment instruction, a pause play instruction, or a continue play instruction.

In one possible implementation manner, the voice control instruction carries key information of a triggered virtual key in the virtual keyboard, different key information is used for indicating different voice adjustment modes, and key information is used for indicating the outbound robot to adjust the playing of voice data according to the voice adjustment modes.

Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described herein in detail.

Fig. 3 is a flowchart of a method for playing voice data in a call according to an embodiment of the present application, described by taking the outbound robot as the execution body as an example. Referring to fig. 3, the method includes:

301. The outbound robot establishes a call connection with the call terminal and plays voice data corresponding to the preset script to the call terminal.

302. During the call with the call terminal, the outbound robot receives a voice control instruction sent by the call terminal, wherein the voice control instruction is triggered based on a virtual keyboard of a call interface of the call terminal, different virtual keys in the virtual keyboard are used for triggering different voice control instructions, corresponding control prompt information is displayed at positions corresponding to a plurality of virtual keys of the virtual keyboard, and the control prompt information is used for prompting the voice adjustment mode indicated by the voice control instruction corresponding to each virtual key.

303. The outbound robot adjusts the playing of the voice data according to the voice adjustment mode indicated by the voice control instruction.

According to the voice data playing method in the call, the virtual keyboard is provided in the call interface of the call terminal, the virtual keyboard not only comprises a plurality of virtual keys used for triggering different voice control instructions, but also displays corresponding control prompt information at the positions corresponding to the plurality of virtual keys, so that a user can accurately trigger corresponding voice control instructions according to the control prompt information, the playing of voice data of the outbound robot is adjusted, the mode of playing the voice data of the outbound robot can meet the requirements of the user, and the call effect of the outbound robot is improved.

In one possible implementation, the voice control instruction is a speech-rate adjustment instruction; adjusting the playing of the voice data according to the voice adjustment mode indicated by the voice control instruction comprises:

based on the speech-rate adjustment instruction, performing speech-rate adjustment processing on the voice data to be played to obtain processed voice data;

and sending the processed voice data to the call terminal.

In one possible implementation manner, performing speech-rate adjustment processing on the voice data to be played based on the speech-rate adjustment instruction to obtain processed voice data includes:

when the speech-rate adjustment instruction is a speech-rate slowing instruction, decreasing the current speech rate by a first step length to obtain a second speech rate, and performing speech-rate slowing processing on the voice data to be played based on the second speech rate to obtain voice data matched with the second speech rate.

In one possible implementation, the voice control instruction is a volume adjustment instruction; adjusting the playing of the voice data according to the voice adjustment mode indicated by the voice control instruction, comprising:

and sending the processed voice data to the call terminal.

In one possible implementation manner, based on the volume adjustment instruction, performing volume adjustment processing on the voice data to be played to obtain processed voice data, where the processing includes:

The volume adjustment instruction is a volume reduction instruction, the current volume is reduced by a second step length to obtain a second volume, and the volume reduction processing is performed on the voice data to be played based on the second volume to obtain voice data matched with the second volume.

In one possible implementation, the voice control instruction is a progress adjustment instruction; adjusting the playing of the voice data according to the voice adjustment mode indicated by the voice control instruction, comprising:

In one possible implementation, determining the first playing progress based on the progress adjustment instruction includes:

when the progress adjustment instruction is a fast forward instruction, increasing the current playing progress of the voice data by a third step length to obtain a first playing progress; or

when the progress adjustment instruction is a rewind instruction, decreasing the current playing progress of the voice data by the third step length to obtain a first playing progress; or

when the progress adjustment instruction is a replay instruction, resetting the current playing progress of the voice data to zero to obtain a first playing progress.

In one possible implementation manner, the voice control instruction is a pause instruction, and the playing of the voice data is adjusted according to the voice adjustment mode indicated by the voice control instruction, including:

In one possible implementation, the voice control instruction carries key information of a triggered virtual key in the virtual keyboard;

before the playing of the voice data is adjusted according to the voice adjustment mode indicated by the voice control instruction, the method further comprises the following steps:

In one possible implementation manner, the preset speaking operation includes a control prompt sub-speaking operation and a service sub-speaking operation, and playing voice data corresponding to the preset speaking operation to the call terminal, including:

Fig. 4 is a flowchart of a method for playing voice data in a call according to an embodiment of the present application. The embodiment is described by way of example with the call terminal and the outbound robot as the interacting entities. Referring to Fig. 4, the method includes:

401. The outbound robot establishes a call connection with the call terminal and plays voice data corresponding to the preset script to the call terminal.

The call terminal can be any terminal with a call function, such as a mobile phone, a smart watch, or a tablet computer. The outbound robot is a device capable of automatically dialing and answering calls. After the call is connected, the outbound robot automatically plays the voice data corresponding to the preset script.

In this embodiment of the present application, the call connection may be established either by the call terminal calling the outbound robot or by the outbound robot calling the call terminal, which is not limited in this embodiment of the present application.

The outbound robot is a robot for playing voice data corresponding to a preset script, and the preset script can be determined based on the actual application requirements of the outbound robot. For example, if the outbound robot is a product promotion robot, it plays voice data corresponding to a product promotion script. As another example, if the outbound robot is a membership handling robot, it plays voice data corresponding to a membership handling script.

402. The call terminal displays a call interface for the call with the outbound robot, wherein the call interface comprises a virtual keyboard, different virtual keys in the virtual keyboard correspond to different voice control instructions, corresponding control prompt information is displayed at the positions corresponding to a plurality of virtual keys of the virtual keyboard, and the control prompt information is used for prompting the voice adjustment mode indicated by the voice control instruction corresponding to each virtual key.

In the embodiment of the application, the user sends a voice control instruction from the call terminal by triggering a virtual key in the virtual keyboard. The virtual keyboard can be a numeric virtual keyboard or an alphabetic virtual keyboard provided by the call interface, which is not limited in the embodiment of the present application. In some embodiments, the virtual keyboard in the call interface is collapsed by default and is expanded and displayed after a keyboard expansion instruction is received. In other embodiments, the virtual keyboard in the call interface is expanded by default. The embodiment of the application does not limit the state of the virtual keyboard in the call interface.

A plurality of different voice control instructions may be triggered through the virtual keyboard. In some embodiments, the voice control instruction is a speech rate adjustment instruction, a volume adjustment instruction, a progress adjustment instruction, a pause instruction, or a continue-play instruction; the voice control instruction may also be a tone adjustment instruction, which is not limited in this embodiment. Different virtual keys are used for triggering different voice control instructions. For example, when the user taps the virtual key "1", the call terminal sends a speech rate slowing instruction to the outbound robot, and when the user taps the virtual key "8", the call terminal sends a volume reduction instruction to the outbound robot.

In order to enable the user to accurately trigger the intended voice control instruction, the call terminal displays corresponding control prompt information at the positions corresponding to a plurality of virtual keys of the virtual keyboard, wherein the control prompt information is used for prompting the voice adjustment mode indicated by the voice control instruction corresponding to each virtual key. For example, the virtual key "1" is used to trigger the speech rate slowing instruction; since the voice adjustment mode indicated by this instruction is slowing the speech rate, "speech rate ↓" can be displayed on the virtual key "1".

It should be noted that, in the embodiment of the present application, the control prompt information may be an icon, a text, or an icon+text, which is not limited in the embodiment of the present application.

The embodiment above describes prompting the user by displaying the control prompt information at the positions corresponding to the virtual keys only as an example. In another embodiment, so that the user of the call terminal can accurately send voice control instructions and thereby accurately adjust the playing mode of the outbound robot, the outbound robot may first play the control prompt information after the call connection with the call terminal is established.

Optionally, the preset script includes a control prompt sub-script and a business sub-script. The outbound robot playing the voice data corresponding to the preset script to the call terminal includes: the outbound robot plays first voice sub-data corresponding to the control prompt sub-script to the call terminal and, after the first voice sub-data has been played, plays second voice sub-data corresponding to the business sub-script to the call terminal. The control prompt sub-script prompts the user how to control the outbound robot to adjust the playing of the voice data. The business sub-script is the script corresponding to the function of the outbound robot. For example, if the outbound robot is a membership handling robot, the business sub-script is a membership handling script.

In some embodiments, when the outbound robot plays voice data corresponding to a preset voice operation to the call terminal, the voice data is played according to a preset speech speed and a preset voice volume.

403. During the call with the outbound robot, the call terminal, in response to a selection operation on any virtual key in the virtual keyboard, sends the voice control instruction corresponding to that virtual key to the outbound robot, so that the outbound robot adjusts the playing of the voice data according to the voice adjustment mode indicated by the voice control instruction.

The voice control instruction indicates a voice adjustment mode, so that the outbound robot adjusts the playing of the voice data accordingly. For example, during a call between the call terminal and the outbound robot, the user of the call terminal has an urgent matter that takes 2 minutes to handle. If the call is interrupted, then after the call connection with the outbound robot is re-established, the outbound robot plays the voice data again from the beginning, and the user has to listen to the first half of the voice data a second time. With the scheme provided by the embodiment of the application, however, the user can send a pause instruction to the outbound robot through the call terminal so that the outbound robot pauses playing the voice data, and, after handling the urgent matter, send a continue-play instruction through the call terminal so that the outbound robot continues playing the voice data.

As another example, if the user of the call terminal does not hear a sentence clearly during the call with the outbound robot, then with the scheme provided by the embodiment of the application the user can send a rewind instruction to the outbound robot through the call terminal, so that the outbound robot plays the previous sentence again.

In the embodiment of the application, the user sends the voice control instruction on the call terminal by triggering the virtual key. Optionally, the voice control instruction carries key information of a triggered virtual key in the virtual keyboard, different key information is used for indicating different voice adjustment modes, and the key information is used for indicating the outbound robot to adjust the playing of voice data according to the voice adjustment modes.

In some embodiments, the key information of the triggered virtual key may be encoded and transmitted using DTMF (Dual-Tone Multi-Frequency) signals. DTMF encodes keypad symbols as analog signals. As shown in Fig. 5, the encoding scheme uses a total of 8 analog frequencies to encode 16 symbols, each symbol being uniquely determined by one high-group frequency and one low-group frequency.
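Fig. 5 is not reproduced in this text. As an illustration only, the following Python sketch uses the standard DTMF keypad layout (four low-group and four high-group frequencies), which is assumed here to match the table in Fig. 5. The names `dtmf_frequencies` and `dtmf_symbol` are hypothetical.

```python
# Standard DTMF layout (assumed to match Fig. 5): each keypad row shares a
# low-group frequency and each keypad column shares a high-group frequency (Hz).
LOW_FREQS = (697, 770, 852, 941)
HIGH_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = (
    "123A",
    "456B",
    "789C",
    "*0#D",
)


def dtmf_frequencies(symbol: str) -> tuple[int, int]:
    """Return the (low, high) frequency pair in Hz for one DTMF symbol."""
    if len(symbol) == 1:
        for row, keys in enumerate(KEYPAD):
            col = keys.find(symbol)
            if col != -1:
                return LOW_FREQS[row], HIGH_FREQS[col]
    raise ValueError(f"not a DTMF symbol: {symbol!r}")


def dtmf_symbol(low: int, high: int) -> str:
    """Inverse lookup: recover the symbol from its frequency pair."""
    return KEYPAD[LOW_FREQS.index(low)][HIGH_FREQS.index(high)]
```

For instance, the key "5" sits in the second row and second column, so it maps to the pair 770 Hz and 1336 Hz.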

Optionally, the DTMF signal is sent in RTP (Real-time Transport Protocol) packets, and the PT (payload type) field in the RTP packet header indicates whether the packet is a key-press data packet, that is, whether it is a voice control instruction. For example, when the outbound robot receives an RTP packet whose header has PT=126, the RTP packet is a voice control instruction, the content stored in the RTP packet is the key information, and the virtual key pressed by the user can be determined by looking up the table shown in Fig. 5.

Optionally, to guard against packet loss, multiple RTP packets may be generated for the same key signal, all carrying the same timestamp. When the outbound robot receives these RTP packets, it can de-duplicate them according to the timestamps in the RTP packets.
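The payload-type check and the timestamp-based de-duplication can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: it assumes the fixed 12-byte RTP header, assumes the first payload byte is an RFC 4733 style event code (0 to 9 for digits, 10 for "*", 11 for "#", 12 to 15 for A to D), and ignores RTP header extensions. The names `parse_key_packet` and `KeyDeduplicator` are hypothetical.

```python
import struct

KEY_EVENT_PT = 126                  # payload type used in the example above
EVENT_SYMBOLS = "0123456789*#ABCD"  # assumed event-code layout (RFC 4733 style)


def parse_key_packet(packet: bytes):
    """Return (timestamp, key) for a key-press packet, or None otherwise."""
    if len(packet) < 13:
        return None
    # Fixed RTP header: V/P/X/CC byte, M/PT byte, sequence, timestamp, SSRC.
    first, second, _seq, timestamp, _ssrc = struct.unpack("!BBHII", packet[:12])
    if first >> 6 != 2:                 # RTP version must be 2
        return None
    if second & 0x7F != KEY_EVENT_PT:   # low 7 bits carry the payload type
        return None
    offset = 12 + 4 * (first & 0x0F)    # skip any CSRC identifiers
    if len(packet) <= offset:
        return None
    event = packet[offset]
    if event >= len(EVENT_SYMBOLS):
        return None
    return timestamp, EVENT_SYMBOLS[event]


class KeyDeduplicator:
    """Reports a key once per timestamp and drops redundant copies."""

    def __init__(self):
        self._last_timestamp = None

    def accept(self, packet: bytes):
        parsed = parse_key_packet(packet)
        if parsed is None:
            return None
        timestamp, key = parsed
        if timestamp == self._last_timestamp:
            return None                 # repeat of the same key press
        self._last_timestamp = timestamp
        return key
```

Remembering only the most recent timestamp is enough when the redundant copies of one key press arrive back to back; a receiver that expects reordering across key presses would need to remember a short window of timestamps instead.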

404. The outbound robot receives the voice control instruction sent by the call terminal.

405. The outbound robot adjusts the playing of the voice data according to the voice adjustment mode indicated by the voice control instruction.

In one possible implementation, the voice control instruction is triggered through the virtual keyboard of the call interface of the call terminal, different virtual keys in the virtual keyboard are used for triggering different voice control instructions, and the voice control instruction carries the key information of the triggered virtual key. Before the outbound robot adjusts the playing of the voice data according to the voice adjustment mode indicated by the voice control instruction, the method further includes: determining, from the correspondence between key information and voice adjustment modes, the voice adjustment mode corresponding to the key information carried by the voice control instruction as the voice adjustment mode indicated by the voice control instruction.

For example, the correspondence between key information and voice adjustment modes is as follows:

1 - reduce the speech rate

3 - increase the speech rate

4 - rewind

6 - fast forward

2 - increase the volume

8 - reduce the volume

5 - pause

0 - replay

When the key information carried by the voice control instruction is "3", the outbound robot can determine that the voice adjustment mode is increasing the speech rate. When the key information carried by the voice control instruction is "0", the outbound robot can determine that the voice adjustment mode is replaying. After determining the voice adjustment mode, the outbound robot continues playing the voice data according to the determined voice adjustment mode.

In other words, when the key information carried by the voice control instruction is "1", the voice control instruction is a speech rate slowing instruction; when the key information is "3", the voice control instruction is a speech rate accelerating instruction; when the key information is "4", the voice control instruction is a rewind instruction; when the key information is "6", the voice control instruction is a fast forward instruction; when the key information is "2", the voice control instruction is a volume increase instruction; when the key information is "8", the voice control instruction is a volume reduction instruction; when the key information is "5", the voice control instruction is a pause instruction; and when the key information is "0", the voice control instruction is a replay instruction.
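The lookup in the correspondence between key information and voice adjustment modes can be sketched as a simple dictionary, following the example correspondence table above. The string labels and the name `resolve_adjustment` are illustrative assumptions, not identifiers from the embodiment.

```python
# Key information -> voice adjustment mode, per the example correspondence table.
# The labels on the right are hypothetical names chosen for this sketch.
KEY_TO_ADJUSTMENT = {
    "1": "slow_down",
    "3": "speed_up",
    "4": "rewind",
    "6": "fast_forward",
    "2": "volume_up",
    "8": "volume_down",
    "5": "pause",
    "0": "replay",
}


def resolve_adjustment(key_info: str):
    """Return the voice adjustment mode for a key, or None if the key is unmapped."""
    return KEY_TO_ADJUSTMENT.get(key_info)
```

Returning `None` for unmapped keys such as "7" or "9" lets the caller ignore them without interrupting playback, which is one reasonable choice among several.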

Next, how the outbound robot continues playing the voice data according to different voice control instructions is described by way of example:

In one possible implementation, the voice control instruction is a speech rate adjustment instruction, which is used for adjusting the playing rate of the voice data. The outbound robot adjusting the playing of the voice data according to the voice adjustment mode indicated by the voice control instruction includes: the outbound robot performs speech rate adjustment processing on the voice data to be played based on the speech rate adjustment instruction to obtain processed voice data, and sends the processed voice data to the call terminal.

In some embodiments, after receiving the speech rate adjustment instruction, the outbound robot adjusts the current speech rate by a certain step length. The outbound robot performing speech rate adjustment processing on the voice data to be played based on the speech rate adjustment instruction, to obtain processed voice data, includes: the speech rate adjustment instruction is a speech rate accelerating instruction, and the outbound robot increases the current speech rate by a first step length to obtain a first speech rate and speeds up the voice data to be played based on the first speech rate to obtain voice data matching the first speech rate; or, the speech rate adjustment instruction is a speech rate slowing instruction, and the outbound robot reduces the current speech rate by the first step length to obtain a second speech rate and slows down the voice data to be played based on the second speech rate to obtain voice data matching the second speech rate.

The first step length may be any step length, which is not limited in the embodiment of the present application. Optionally, the first step length is set empirically. For example, when the first step length is 0.25, the speech rate adjustment instruction is a speech rate accelerating instruction, and the current speech rate is 1, the current speech rate is increased by the first step length to obtain a first speech rate of 1.25, and the outbound robot continues playing the voice data at 1.25x speed.
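The step-based computation of the target speech rate can be sketched as below. The lower and upper bounds are assumptions added for this sketch (the embodiment does not specify limits), and the actual time-stretching of the audio is outside its scope. The name `adjust_rate` is hypothetical.

```python
def adjust_rate(current: float, faster: bool, step: float = 0.25,
                lowest: float = 0.5, highest: float = 2.0) -> float:
    """Move the speech rate one step up or down.

    `lowest` and `highest` are assumed bounds, not values from the embodiment;
    they keep repeated key presses from driving the rate to zero or below.
    """
    target = current + step if faster else current - step
    return min(highest, max(lowest, target))


# The example from the text: rate 1 with a first step length of 0.25.
first_rate = adjust_rate(1.0, faster=True)    # 1.25
second_rate = adjust_rate(1.0, faster=False)  # 0.75
```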

In some embodiments, the outbound robot is configured with a plurality of speech rates arranged in order of magnitude, and after receiving the speech rate adjustment instruction, it can select an appropriate speech rate from them as the adjusted speech rate. Optionally, the outbound robot performing speech rate adjustment processing on the voice data to be played based on the speech rate adjustment instruction, to obtain processed voice data, includes: the speech rate adjustment instruction is a speech rate accelerating instruction, and the outbound robot selects, from the plurality of speech rates, a first speech rate that is adjacent to and greater than the current speech rate, and speeds up the voice data to be played based on the first speech rate to obtain voice data matching the first speech rate; or, the speech rate adjustment instruction is a speech rate slowing instruction, and the outbound robot selects, from the plurality of speech rates, a second speech rate that is adjacent to and smaller than the current speech rate, and slows down the voice data to be played based on the second speech rate to obtain voice data matching the second speech rate.
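Selecting the adjacent larger or smaller value from a sorted list of preset levels can be sketched with a binary search. The concrete rate values and the name `adjacent_level` are assumptions for illustration; staying at the current level when no neighbour exists is also an assumed policy.

```python
from bisect import bisect_left, bisect_right

# Hypothetical preset speech rates, sorted in ascending order.
RATES = (0.5, 0.75, 1.0, 1.25, 1.5, 2.0)


def adjacent_level(levels, current, up: bool):
    """Return the nearest level above (up=True) or below (up=False) `current`.

    If there is no such level, the current value is kept (assumed policy).
    """
    if up:
        i = bisect_right(levels, current)   # first level strictly greater
        return levels[i] if i < len(levels) else current
    i = bisect_left(levels, current)        # levels[:i] are strictly smaller
    return levels[i - 1] if i > 0 else current
```

Because the helper only relies on the levels being sorted, it would work equally well for a list of preset volumes.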

In one possible implementation, the voice control instruction is a volume adjustment instruction, and the volume adjustment instruction is used for adjusting the volume of the voice data. The outbound robot adjusts the playing of voice data according to the voice adjustment mode indicated by the voice control instruction, and comprises the following steps: and based on the volume adjustment instruction, performing volume adjustment processing on the voice data to be played to obtain processed voice data, and sending the processed voice data to the call terminal.

In some embodiments, the outbound robot performs volume adjustment processing on voice data to be played based on a volume adjustment instruction, to obtain processed voice data, including: the volume adjustment instruction is a volume increase instruction, the current volume is increased by a second step length to obtain a first volume, and volume increase processing is carried out on voice data to be played based on the first volume to obtain voice data matched with the first volume; or the volume adjustment instruction is a volume reduction instruction, the current volume is reduced by a second step length to obtain a second volume, and the volume reduction processing is performed on the voice data to be played based on the second volume to obtain the voice data matched with the second volume.

The second step may be any step, which is not limited in the embodiment of the present application. Optionally, the second step size is empirically set.
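One way to picture the volume adjustment is as a step change in a volume percentage followed by scaling of the audio samples. The sketch below assumes 16-bit signed PCM samples, a volume expressed as an integer percentage, and bounds of 0 to 200 percent; none of these details come from the embodiment, and the names `adjust_volume` and `apply_gain` are hypothetical.

```python
def adjust_volume(current: int, louder: bool, step: int = 10,
                  lowest: int = 0, highest: int = 200) -> int:
    """Move the volume (in percent) one step up or down within assumed bounds."""
    target = current + step if louder else current - step
    return min(highest, max(lowest, target))


def apply_gain(samples: list[int], volume_percent: int) -> list[int]:
    """Scale 16-bit PCM samples by the volume, clipping to the valid range."""
    gain = volume_percent / 100
    return [max(-32768, min(32767, round(s * gain))) for s in samples]
```

Clipping keeps amplified samples inside the 16-bit range; without it, a volume increase could make loud samples wrap around and distort badly.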

In some embodiments, the outbound robot is configured with a plurality of volumes arranged in order of magnitude, and after receiving the volume adjustment instruction, it can select an appropriate volume from them as the adjusted volume. Optionally, the outbound robot performing volume adjustment processing on the voice data to be played based on the volume adjustment instruction, to obtain processed voice data, includes: the volume adjustment instruction is a volume increase instruction, and the outbound robot selects, from the plurality of volumes, a first volume that is adjacent to and greater than the current volume, and performs volume increase processing on the voice data to be played based on the first volume to obtain voice data matching the first volume; or, the volume adjustment instruction is a volume reduction instruction, and the outbound robot selects, from the plurality of volumes, a second volume that is adjacent to and smaller than the current volume, and performs volume reduction processing on the voice data to be played based on the second volume to obtain voice data matching the second volume.

In one possible implementation, the voice control instruction is a progress adjustment instruction. The outbound robot adjusts the playing of voice data according to the voice adjustment mode indicated by the voice control instruction, and comprises the following steps: the outbound robot determines a first playing progress based on the progress adjustment instruction; and continuing to play the voice data from the first playing progress.

In some embodiments, the outbound robot determines a first play progress based on the progress adjustment instruction, comprising: the progress adjusting instruction is a fast forward instruction, and the current playing progress of the voice data is increased by a third step length to obtain a first playing progress; or the progress adjusting instruction is a reversing instruction, and the current playing progress of the voice data is reduced by a third step length to obtain a first playing progress; or, the progress adjusting instruction is a replay instruction, and the current playing progress of the voice data is cleared to obtain a first playing progress.

The third step may be any step, which is not limited in the embodiment of the present application. Optionally, the third step size is empirically determined.
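The three progress adjustments can be sketched as a single function that returns the first playing progress. The millisecond unit, the 5-second default step, the clamping to the length of the voice data, and the instruction labels are all assumptions made for this sketch; `first_progress` is a hypothetical name.

```python
def first_progress(instruction: str, current_ms: int, total_ms: int,
                   step_ms: int = 5000) -> int:
    """Return the playing progress (ms) to resume from after an adjustment."""
    if instruction == "fast_forward":
        # Increase by the third step length, without running past the end.
        return min(total_ms, current_ms + step_ms)
    if instruction == "rewind":
        # Reduce by the third step length, without going before the start.
        return max(0, current_ms - step_ms)
    if instruction == "replay":
        # Clearing the progress restarts playback from the beginning.
        return 0
    raise ValueError(f"unknown progress adjustment: {instruction!r}")
```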

In some embodiments, a plurality of nodes are set in the voice data of the outbound robot, and each node may be the beginning of a sentence. After receiving the progress adjustment instruction, the outbound robot can adjust the playing progress to the previous node or the next node, so that playback resumes at the beginning of a sentence, the user can follow what the sentence says, and the user experience is improved. Optionally, the outbound robot determining the first playing progress based on the progress adjustment instruction includes: the progress adjustment instruction is a fast forward instruction, and the outbound robot determines, from the plurality of nodes, the next node relative to the current playing progress and takes the playing progress corresponding to the next node as the first playing progress; or, the progress adjustment instruction is a rewind instruction, and the outbound robot determines, from the plurality of nodes, the previous node relative to the current playing progress and takes the playing progress corresponding to the previous node as the first playing progress.
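The node-based variant can be sketched as a search over sorted sentence-start positions. This sketch assumes the nodes are millisecond offsets in ascending order with the first node at 0, and it interprets "previous node" as the nearest node strictly before the current progress; the embodiment does not pin down either detail. The name `node_progress` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def node_progress(nodes, current_ms: int, forward: bool) -> int:
    """Return the node to resume from: the next one or the previous one.

    `nodes` must be a non-empty ascending sequence of sentence-start offsets.
    """
    if forward:
        i = bisect_right(nodes, current_ms)   # first node after the position
        return nodes[i] if i < len(nodes) else current_ms
    i = bisect_left(nodes, current_ms) - 1    # last node before the position
    return nodes[max(i, 0)]


# Hypothetical sentence starts for a short piece of voice data.
SENTENCE_STARTS = (0, 4000, 9500, 15000)
```

Under this interpretation, a rewind issued in the middle of a sentence returns to the start of that sentence, and a second rewind issued right at a sentence start moves back one more sentence.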

In one possible implementation, the voice control instruction is a pause instruction. The outbound robot adjusting the playing of the voice data according to the voice adjustment mode indicated by the voice control instruction includes: the outbound robot pauses playing the voice data and, upon receiving a continue-play instruction, continues playing the voice data.
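The pause and continue behaviour amounts to a small playback state that stops handing out audio while paused and resumes from the same position afterwards. The class below is a sketch under assumptions: the voice data is modelled as a list of frames, and the instruction labels "pause" and "continue" are illustrative (the text does not state which key triggers the continue-play instruction).

```python
class Playback:
    """Minimal playback state: position is preserved across a pause."""

    def __init__(self, frames):
        self.frames = list(frames)
        self.position = 0
        self.paused = False

    def handle(self, instruction: str) -> None:
        if instruction == "pause":
            self.paused = True
        elif instruction == "continue":
            self.paused = False

    def next_frame(self):
        """Return the next frame to send, or None while paused or finished."""
        if self.paused or self.position >= len(self.frames):
            return None
        frame = self.frames[self.position]
        self.position += 1
        return frame
```

Because the position is left untouched during the pause, the user hears the voice data continue from where it stopped rather than from the beginning.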

Therefore, when the user is busy, the user can either trigger the speech rate accelerating instruction to get through the call quickly or pause the call first. The user can thus freely adjust the call according to his or her own needs, which improves the call effect.

Fig. 6 is a schematic structural diagram of a voice data playing device in a call according to an embodiment of the present application, and as shown in fig. 6, the device includes:

the connection establishment module 601 is configured to establish a call connection with an outbound robot, where the outbound robot is configured to play voice data corresponding to a preset voice operation;

the display module 602 is configured to display a call interface for the call with the outbound robot, where the call interface includes a virtual keyboard, different virtual keys in the virtual keyboard correspond to different voice control instructions, and corresponding control prompt information is displayed at the positions corresponding to a plurality of virtual keys in the virtual keyboard, where the control prompt information is used to prompt the voice adjustment mode indicated by the voice control instruction corresponding to each virtual key;

the sending module 603 is configured to, during a call with the outbound robot and in response to a selection operation on any virtual key in the virtual keyboard, send the voice control instruction corresponding to that virtual key to the outbound robot, so that the outbound robot adjusts the playing of the voice data according to the voice adjustment mode indicated by the voice control instruction.

It should be noted that: in the voice data playing device in a call provided in the above embodiment, only the division of the above functional modules is used for illustration, and in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the computer device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the voice data playing device in the call provided in the above embodiment and the voice data playing method in the call belong to the same concept, and the specific implementation process is detailed in the method embodiment, which is not described herein again.

Fig. 7 is a schematic structural diagram of a voice data playing device in a call according to an embodiment of the present application, where, as shown in fig. 7, the device includes:

a connection establishment module 701, configured to establish a call connection with a call terminal;

the playing module 702 is configured to play voice data corresponding to a preset script to the call terminal;

the receiving module 703 is configured to receive a voice control instruction sent by the call terminal during a call with the call terminal, where the voice control instruction is triggered based on a virtual keyboard of a call interface of the call terminal, different virtual keys in the virtual keyboard are used to trigger different voice control instructions, and corresponding control prompt information is displayed at positions corresponding to multiple virtual keys in the virtual keyboard, where the control prompt information is used to prompt a voice adjustment mode indicated by the voice control instruction corresponding to the virtual keys;

the playing module 702 is further configured to adjust playing of the voice data according to the voice adjustment mode indicated by the voice control instruction.

As shown in Fig. 8, in one possible implementation, the voice control instruction is a speech rate adjustment instruction; the playing module 702 includes:

the processing unit 7021, configured to perform speech rate adjustment processing on the voice data to be played based on the speech rate adjustment instruction, so as to obtain processed voice data;

a transmitting unit 7022 for transmitting the processed voice data to the call terminal.

In one possible implementation, the speech rate adjustment instruction is a speech rate accelerating instruction, and the processing unit 7021 is configured to increase the current speech rate by a first step length to obtain a first speech rate, and to speed up the voice data to be played based on the first speech rate to obtain voice data matching the first speech rate; or,

the speech rate adjustment instruction is a speech rate slowing instruction, and the processing unit 7021 is configured to reduce the current speech rate by the first step length to obtain a second speech rate, and to slow down the voice data to be played based on the second speech rate to obtain voice data matching the second speech rate.

In one possible implementation, the voice control instruction is a volume adjustment instruction; a play module 702, comprising:

a processing unit 7021, configured to perform volume adjustment processing on the voice data to be played based on the volume adjustment instruction, to obtain processed voice data;

In one possible implementation, the volume adjustment instruction is a volume increase instruction, and the processing unit 7021 is configured to increase the current volume by a second step length to obtain a first volume, and to perform volume increase processing on the voice data to be played based on the first volume to obtain voice data matching the first volume; or,

the volume adjustment instruction is a volume reduction instruction, and the processing unit 7021 is configured to reduce the current volume by the second step length to obtain a second volume, and to perform volume reduction processing on the voice data to be played based on the second volume to obtain voice data matching the second volume.

In one possible implementation, the voice control instruction is a progress adjustment instruction; a playing module 702, configured to determine a first playing progress based on the progress adjustment instruction; and starting from the first playing progress, continuing to play the voice data.

In one possible implementation, the progress adjustment instruction is a fast forward instruction, and the playing module 702 is configured to increase the current playing progress of the voice data by a third step length to obtain the first playing progress; or,

the progress adjustment instruction is a rewind instruction, and the playing module 702 is configured to reduce the current playing progress of the voice data by the third step length to obtain the first playing progress; or,

the progress adjustment instruction is a replay instruction, and the playing module 702 is configured to clear the current playing progress of the voice data to obtain the first playing progress.

In one possible implementation, the voice control instruction is a pause instruction, and the playing module 702 is configured to pause playing the voice data and, upon receiving a continue-play instruction, continue playing the voice data.

In one possible implementation manner, the voice control instruction is triggered based on a virtual keyboard of a call interface of the call terminal, different virtual keys in the virtual keyboard are used for triggering different voice control instructions, and the voice control instructions carry key information of the triggered virtual keys in the virtual keyboard; the apparatus further comprises:

the determining module 704, configured to determine, from the correspondence between key information and voice adjustment modes, the voice adjustment mode corresponding to the key information carried by the voice control instruction as the voice adjustment mode indicated by the voice control instruction.

In one possible implementation, the preset script includes a control prompt sub-script and a business sub-script, and the playing module 702 is configured to play first voice sub-data corresponding to the control prompt sub-script to the call terminal and, after the first voice sub-data has been played, play second voice sub-data corresponding to the business sub-script to the call terminal.

It should be noted that: the voice data playing device in a call provided in the above embodiment only uses the division of the above functional modules to illustrate when playing voice data, and in practical application, the above functional allocation may be completed by different functional modules according to needs, that is, the internal structure of the computer device is divided into different functional modules to complete all or part of the functions described above. In addition, the voice data playing device in the call provided in the above embodiment and the voice data playing method in the call belong to the same concept, and the specific implementation process is detailed in the method embodiment, which is not described herein again.

Fig. 9 is a block diagram of a call terminal 900 according to an embodiment of the present application. The call terminal 900 includes: a processor 901 and a memory 902.

The processor 901 may include one or more processing cores, for example a 4-core processor or an 8-core processor. The processor 901 may be implemented in at least one hardware form among a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 901 may also include a main processor and a coprocessor: the main processor, also referred to as a CPU (Central Processing Unit), processes data in the awake state; the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 901 may integrate a GPU (Graphics Processing Unit), which renders and draws the content to be displayed on the display screen. In some embodiments, the processor 901 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.

The memory 902 may include one or more computer-readable storage media, which may be non-transitory. The memory 902 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 902 is configured to store at least one program code for execution by processor 901 to implement the method for playing voice data in a call provided by the method embodiments in the present application.

In some embodiments, the terminal 900 may optionally further include a peripheral interface 903 and at least one peripheral. The processor 901, the memory 902, and the peripheral interface 903 may be connected by a bus or signal line. Each peripheral may be connected to the peripheral interface 903 via a bus, a signal line, or a circuit board. Specifically, the peripherals include at least one of radio frequency circuitry 904, a display 905, a camera 906, audio circuitry 907, a positioning component 908, and a power supply 909.

The peripheral interface 903 may be used to connect at least one I/O (Input/Output) related peripheral to the processor 901 and the memory 902. In some embodiments, the processor 901, the memory 902, and the peripheral interface 903 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 901, the memory 902, and the peripheral interface 903 may be implemented on separate chips or circuit boards, which is not limited in this embodiment.

The display 905 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 905 is a touch display, the display 905 is also able to capture touch signals on or above its surface. A touch signal may be input to the processor 901 as a control signal for processing. In this case, the display 905 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 905, disposed on the front panel of the terminal 900; in other embodiments, there may be at least two displays 905, disposed on different surfaces of the terminal 900 or arranged in a folding design; in still other embodiments, the display 905 may be a flexible display disposed on a curved surface or a folding surface of the terminal 900. The display 905 may even be arranged in a non-rectangular irregular shape, that is, as an irregularly shaped screen. The display 905 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).

The power supply 909 is used to supply power to the various components in the terminal 900. The power supply 909 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power supply 909 includes a rechargeable battery, the rechargeable battery can support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.

Those skilled in the art will appreciate that the structure shown in fig. 9 is not limiting and that more or fewer components than shown may be included or certain components may be combined or a different arrangement of components may be employed.

Fig. 10 is a schematic structural diagram of an outbound robot according to an embodiment of the present application. The outbound robot 1000 may vary considerably depending on its configuration or performance, and may include one or more processors (Central Processing Units, CPUs) 1001 and one or more memories 1002, where at least one program code is stored in the memories 1002 and is loaded and executed by the processors 1001 to implement the methods provided in the above method embodiments. Of course, the outbound robot may also have a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing the functions of the device, which are not described herein.

The outbound robot 1000 is configured to perform the steps performed by the outbound robot in the method embodiments described above.

The embodiment of the application also provides a computer readable storage medium, wherein at least one program code is stored in the computer readable storage medium, and the at least one program code is loaded and executed by a processor to realize the voice data playing method in the call according to any one of the above implementation modes.

The embodiment of the application also provides a computer program product, which comprises at least one program code, and the at least one program code is loaded and executed by a processor to realize the voice data playing method in the call according to any one of the above implementation manners.

In some embodiments, the computer programs related to the embodiments of the present application may be deployed to be executed on one computer device, on multiple computer devices located at one site, or on multiple computer devices distributed across multiple sites and interconnected by a communication network; the multiple computer devices distributed across multiple sites and interconnected by a communication network may constitute a blockchain system.

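The foregoing is merely illustrative of optional embodiments of the present application and is not intended to limit the present application; any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.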

Claims

1. A voice data playing method in a call, characterized in that the method comprises the following steps:

2. The method of claim 1, wherein the voice control instruction is a speech adjustment instruction, a volume adjustment instruction, a progress adjustment instruction, a pause playing instruction, or a continue playing instruction.

3. The method according to claim 1 or 2, wherein the voice control instruction carries key information of the triggered virtual key in the virtual keyboard, different key information indicates different voice adjustment modes, and the key information is used for instructing the outbound robot to adjust the playing of the voice data according to the corresponding voice adjustment mode.

4. A voice data playing method in a call, characterized in that the method comprises the following steps:

5. The method of claim 4, wherein the voice control instruction is a speech adjustment instruction; the step of adjusting the playing of the voice data according to the voice adjustment mode indicated by the voice control instruction comprises the following steps:

And sending the processed voice data to the call terminal.

6. The method of claim 5, wherein the performing, based on the speech adjustment instruction, speech adjustment processing on the voice data to be played to obtain processed voice data includes:

7. The method of claim 4, wherein the voice control instruction is a volume adjustment instruction; the step of adjusting the playing of the voice data according to the voice adjustment mode indicated by the voice control instruction comprises the following steps:

And sending the processed voice data to the call terminal.

8. The method of claim 7, wherein the performing volume adjustment processing on the voice data to be played based on the volume adjustment instruction to obtain the processed voice data comprises:

9. The method of claim 4, wherein the voice control instruction is a progress adjustment instruction; the step of adjusting the playing of the voice data according to the voice adjustment mode indicated by the voice control instruction comprises the following steps:

10. The method of claim 9, wherein the determining a first progress of playback based on the progress adjustment instruction comprises:

11. The method of claim 4, wherein the voice control instruction is a pause instruction, and wherein the adjusting the playing of the voice data according to the voice adjustment mode indicated by the voice control instruction comprises:

12. The method of claim 4, wherein the voice control instruction carries key information of a triggered virtual key in the virtual keyboard;

13. The method of claim 4, wherein the preset phone operation includes a control prompt sub-phone operation and a service sub-phone operation, and the playing the voice data corresponding to the preset phone operation to the call terminal includes:

14. A voice data playback apparatus in a call, the apparatus comprising:

15. A voice data playback apparatus in a call, the apparatus comprising:

16. A call terminal comprising a processor and a memory, wherein the memory stores at least one program code, and wherein the at least one program code is loaded and executed by the processor to implement the method for playing voice data in a call as claimed in any one of claims 1 to 3.

17. An outbound robot comprising a processor and a memory, wherein the memory has at least one program code stored therein, and the at least one program code is loaded and executed by the processor to implement the method for playing voice data in a call as claimed in any one of claims 4 to 13.

18. A computer readable storage medium, wherein the computer readable storage medium has at least one program code stored therein, the at least one program code is loaded and executed by a processor to implement the method for playing voice data in a call as claimed in any one of claims 1 to 3; or, to implement the method for playing voice data in a call as claimed in any one of claims 4 to 13.