CN114093350A - Voice data acquisition method, voice call device and computer equipment - Google Patents

Voice data acquisition method, voice call device and computer equipment Download PDF

Info

Publication number
CN114093350A
CN114093350A CN202010779592.7A
Authority
CN
China
Prior art keywords
voice data
sandbox environment
voice
function
application process
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010779592.7A
Other languages
Chinese (zh)
Inventor
王赐烺
杨卫
魏雪
黄耿星
姜宏维
于博睿
杨广东
周凌林
田申
刘行
赖晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010779592.7A priority Critical patent/CN114093350A/en
Publication of CN114093350A publication Critical patent/CN114093350A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/28 Constructional details of speech recognition systems
    • G10L 15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/28 Constructional details of speech recognition systems
    • G10L 15/34 Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/30 Network architectures or network communication protocols for network security for supporting lawful interception, monitoring or retaining of communications or communication related information
    • H04L 63/306 Network architectures or network communication protocols for network security for supporting lawful interception, monitoring or retaining of communications or communication related information intercepting packet switched data communications, e.g. Web, Internet or IMS communications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Technology Law (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a voice data acquisition method, a voice call device, and computer equipment, belonging to the field of computer technology. In the method and device, voice data is received through a sandbox environment hooked to an audio processing interface. When the application process of the target application program attempts to obtain the voice data through the audio processing interface, a hook function is triggered to intercept the voice data acquisition operation, and the application process obtains the voice data directly from the sandbox environment. This shortens the transmission path of the voice data from the terminal to the application process, reduces the delay with which the application process obtains and forwards the voice data, and improves the user experience of the target application program.

Description

Voice data acquisition method, voice call device and computer equipment
Technical Field
The present application relates to the field of computer technology, and in particular, to a voice data acquisition method, a voice call device, and computer equipment.
Background
Cloud gaming is a game mode based on cloud computing technology: the game process runs on a cloud server, which reduces the game's demands on terminal performance. In some cloud games, users can talk to each other by voice. In such a scenario, the game process needs to quickly acquire the voice data sent by a user and then forward the received voice data to other users.
At present, to enable a cloud game process to acquire voice data input by a user, a virtual microphone must be simulated in the cloud game server. In the acquisition process, the user terminal sends the voice data to the server, the server sends the voice data to a virtual machine loaded with the virtual microphone, and the virtual machine stores the voice data in the virtual microphone. When the game process reads the voice data, it must first call an audio processing interface, the audio processing interface must call a kernel driver, and the kernel driver then reads the voice data from the virtual microphone.
In this acquisition process, the transmission path of the voice data is too long. As a result, the game process acquires the voice data with a large delay and forwards it with a large delay, and the user's experience of voice calls during the game is poor.
Disclosure of Invention
The embodiments of the present application provide a voice data acquisition method, a voice call device, and computer equipment, which can shorten the transmission path of voice data from a terminal to an application process in a server, reduce the delay with which the application process acquires the voice data, and improve the user experience of an application program. The technical solution is as follows:
in one aspect, a method for acquiring voice data is provided, and the method includes:
receiving voice data sent by a terminal through a sandbox environment, wherein the sandbox environment is associated with an audio processing interface of a target application program through a hook function;
responding to a call instruction of a voice data acquisition function of the audio processing interface, and operating the voice data acquisition function, wherein the voice data acquisition function is used for triggering voice acquisition operation;
and in response to the sandbox environment intercepting the voice acquisition operation through the hook function, sending the voice data to an application process of the target application program through the sandbox environment, wherein the application process is used for forwarding the voice data.
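As an illustrative sketch only (not the patent's actual implementation; all class and function names here are hypothetical), the three steps above can be modeled as a sandbox buffer whose hook short-circuits the application's acquisition call:

```python
# Hypothetical sketch of the claimed flow: the sandbox receives voice data
# from the terminal, and a hook intercepts the application's acquisition
# operation so the data is served directly from the sandbox buffer.

class Sandbox:
    """Simulated sandbox environment holding voice data from the terminal."""

    def __init__(self):
        self.buffer = []

    def receive_from_terminal(self, voice_data):
        self.buffer.append(voice_data)

    def intercept_acquisition(self):
        # The hook redirects the acquisition operation here: the
        # application process reads directly from the sandbox.
        data, self.buffer = self.buffer, []
        return data


class AudioProcessingInterface:
    """Simulated audio processing interface whose acquisition function is hooked."""

    def __init__(self, sandbox):
        self.sandbox = sandbox

    def get_voice_data(self):
        # Without the hook this would go through a kernel driver and a
        # virtual microphone; the hook short-circuits that longer path.
        return self.sandbox.intercept_acquisition()


sandbox = Sandbox()
api = AudioProcessingInterface(sandbox)
sandbox.receive_from_terminal(b"\x01\x02")   # terminal sends voice data
frames = api.get_voice_data()                # application process acquires it
print(frames)  # [b'\x01\x02']
```

The design point the claim hinges on is that `get_voice_data` never touches a kernel driver: once intercepted, the only hop is sandbox to application process.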
In one aspect, a voice call method is provided, and the method includes:
responding to a recording starting instruction sent by the sandbox environment, and starting a voice call function;
responding to the triggering operation of a voice input control in an operation interface of a target application program, and acquiring voice data;
and sending the voice data to the sandbox environment, and sending the voice data to an application process of the target application program through the sandbox environment, wherein the application process is used for forwarding the voice data.
In one possible implementation manner, the starting the voice call function in response to the recording start instruction sent by the sandbox environment includes:
receiving a recording starting instruction sent by the sandbox environment, wherein the recording starting instruction is generated based on a recording starting operation triggered by a recording starting function in the application process intercepted by the sandbox environment;
and responding to the recording starting instruction, and sending a device starting instruction to the local audio input device, wherein the device starting instruction is used for instructing the audio input device to start recording.
In one possible implementation, after sending the voice data to the sandbox environment and sending the voice data to the application process of the target application program through the sandbox environment, the method further includes:
receiving a recording stopping instruction sent by the sandbox environment, wherein the recording stopping instruction is generated based on a recording stopping operation triggered by a recording stopping function in the application process intercepted by the sandbox environment;
and responding to the recording stopping instruction, and sending a device closing instruction to the local audio input device, wherein the device closing instruction is used for instructing the audio input device to stop recording.
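A minimal sketch of the terminal-side behavior described above (all names and the instruction format are hypothetical assumptions, not the patent's protocol): on a recording start or stop instruction from the sandbox environment, the terminal toggles its local audio input device.

```python
# Hypothetical sketch: the terminal reacts to sandbox instructions by
# sending start/close commands to its local audio input device.

class LocalAudioInputDevice:
    def __init__(self):
        self.recording = False

class Terminal:
    def __init__(self):
        self.device = LocalAudioInputDevice()

    def on_sandbox_instruction(self, instruction):
        if instruction == "start_recording":
            self.device.recording = True    # device start instruction
        elif instruction == "stop_recording":
            self.device.recording = False   # device close instruction

t = Terminal()
t.on_sandbox_instruction("start_recording")
print(t.device.recording)  # True
t.on_sandbox_instruction("stop_recording")
print(t.device.recording)  # False
```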
In one aspect, a voice data acquisition apparatus is provided, the apparatus including:
the receiving module is used for receiving the voice data sent by the terminal through a sandbox environment, and the sandbox environment is associated with the audio processing interface of the target application program through a hook function;
the function calling module is used for responding to a calling instruction of a voice data acquisition function of the audio processing interface and operating the voice data acquisition function, and the voice data acquisition function is used for triggering voice acquisition operation;
and the sending module is used for, in response to the sandbox environment intercepting the voice acquisition operation through the hook function, sending the voice data to an application process of the target application program through the sandbox environment, wherein the application process is used for forwarding the voice data.
In one possible implementation, the function call module is to:
responding to a call instruction of an equipment initialization function of the audio processing interface, and running the equipment initialization function, wherein the equipment initialization function is used for triggering initialization operation of virtual audio input equipment and carries a first sampling parameter corresponding to the target application program;
the device also comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for responding to the sandbox environment to intercept the initialization operation through the hook function, and acquiring the first sampling parameter and a second sampling parameter corresponding to the terminal through the sandbox environment;
the sending module is configured to send a matching result of the first sampling parameter and the second sampling parameter to an application process of the target application program through the sandbox environment.
In one possible implementation, the sending module is configured to:
sending device compatibility information to the application process through the sandbox environment in response to the first sampling parameter being the same as the second sampling parameter;
in response to the first sampling parameter being different from the second sampling parameter, sending device incompatibility information to the application process through the sandbox environment.
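As a hedged sketch of this match (the patent does not specify the format of the sampling parameters; sampling rate, channel count, and bit depth are assumed here for illustration):

```python
# Hypothetical sketch: compare the application's sampling parameters
# (first) with the terminal's (second) and report compatibility to the
# application process.

def match_sampling_parameters(first, second):
    """Return device compatibility/incompatibility info."""
    compatible = (first == second)
    return {"compatible": compatible, "expected": first, "actual": second}

app_params = {"rate_hz": 48000, "channels": 2, "bits": 16}
term_params = {"rate_hz": 44100, "channels": 2, "bits": 16}
result = match_sampling_parameters(app_params, term_params)
print(result["compatible"])  # False: device incompatibility info is sent
```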
In one possible implementation, the apparatus further includes:
the sampling module is used for responding to the difference between the first sampling parameter and the second sampling parameter, and sampling the voice data through the sandbox environment based on the first sampling parameter;
and the storage module is used for storing the sampled voice data through the sandbox environment.
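The patent does not name a resampling algorithm; as one illustrative possibility only, a nearest-neighbor resample from the terminal's rate (second parameter) to the application's rate (first parameter) before the sandbox stores the data could be sketched as:

```python
# Hypothetical nearest-neighbor resampling sketch (not the patent's method):
# convert samples captured at the terminal's rate to the application's rate.

def resample(samples, src_rate, dst_rate):
    if src_rate == dst_rate:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    # Pick the nearest source sample for each output position.
    return [samples[min(len(samples) - 1, int(i * src_rate / dst_rate))]
            for i in range(n_out)]

captured = [0, 10, 20, 30, 40, 50]        # 6 samples at a toy 6 kHz rate
stored = resample(captured, 6000, 3000)   # downsample to a toy 3 kHz rate
print(stored)  # [0, 20, 40]
```

A production implementation would use a proper low-pass filtering resampler; this sketch only illustrates where the conversion sits in the flow.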
In one possible implementation, the apparatus further includes:
the monitoring module is used for monitoring the sandbox environment;
the function calling module is used for calling the voice acquisition function of the audio processing interface in response to the fact that the sandbox environment is monitored to receive the voice data.
In one possible implementation manner, the function call module is configured to execute a recording stopping function in response to a call instruction to the recording stopping function of the audio processing interface, where the recording stopping function is configured to trigger a closing operation of the virtual audio input device;
the sending module is used for, in response to the sandbox environment intercepting the closing operation through the hook function, sending a recording stopping instruction to the terminal through the sandbox environment, wherein the recording stopping instruction is used for instructing the terminal to close the local audio input device.
In one possible implementation, the sending module is configured to:
in response to the fact that the number of audio frames included in the voice data is larger than the reference number, the voice data is segmented through the sandbox environment to obtain at least two voice fragments;
and sending the at least two voice fragments to the application process through the sandbox environment.
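The segmentation step above can be sketched as follows (a toy example; the reference number and the frame representation are assumptions):

```python
# Hypothetical sketch: if the voice data contains more audio frames than a
# reference number, the sandbox splits it into at least two segments before
# sending them to the application process.

def segment_frames(frames, reference_count):
    if len(frames) <= reference_count:
        return [frames]
    return [frames[i:i + reference_count]
            for i in range(0, len(frames), reference_count)]

frames = list(range(5))
print(segment_frames(frames, 2))  # [[0, 1], [2, 3], [4]]
```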
In one possible implementation, the target application is a cloud game running in a server.
In one aspect, a voice call apparatus is provided, the apparatus including:
the function starting module is used for responding to a recording starting instruction sent by the sandbox environment and starting a voice call function;
the acquisition module is used for responding to the triggering operation of the voice input control in the operation interface of the target application program and acquiring voice data;
and the sending module is used for sending the voice data to the sandbox environment, sending the voice data to the application process of the target application program through the sandbox environment, and forwarding the voice data by the application process.
In one possible implementation, the apparatus further includes:
the display module is used for displaying first prompt information on the operation interface, and the first prompt information is used for prompting a user to perform voice input; and displaying second prompt information on the operation interface, wherein the second prompt information is used for indicating that the user is performing voice input.
In one possible implementation, the sending module is further configured to:
responding to the triggering operation of a voice input control in an operation interface of the target application program, and sending a first message to an application process of the target application program, wherein the first message is used for indicating the application process of the target application program to acquire voice data;
and sending a second message to the application process of the target application program in response to the completion of the voice input, wherein the second message is used for indicating the application process of the target application program to stop voice data acquisition.
In one possible implementation, the function starting module is configured to:
receiving a recording starting instruction sent by the sandbox environment, wherein the recording starting instruction is generated based on a recording starting operation triggered by a recording starting function in the application process intercepted by the sandbox environment;
and responding to the recording starting instruction, and sending a device starting instruction to the local audio input device, wherein the device starting instruction is used for instructing the audio input device to start recording.
In one possible implementation, the apparatus further includes:
the function closing module is used for receiving a recording stopping instruction sent by the sandbox environment, and the recording stopping instruction is generated based on a recording stopping operation triggered by a recording stopping function in the application process intercepted by the sandbox environment; and responding to the recording stopping instruction, and sending a device closing instruction to the local audio input device, wherein the device closing instruction is used for instructing the audio input device to stop recording.
In one aspect, a computer device is provided and includes one or more processors and one or more memories, where at least one program code is stored in the one or more memories and loaded into and executed by the one or more processors to implement the operations performed by the voice data acquisition method or the operations performed by the voice call method.
In one aspect, a computer-readable storage medium is provided, in which at least one program code is stored, the at least one program code being loaded and executed by a processor to implement operations performed by the voice data acquisition method or operations performed by the voice call method.
In one aspect, a computer program product is provided that includes at least one program code stored in a computer readable storage medium. The processor of the computer device reads the at least one program code from the computer-readable storage medium, and the processor executes the at least one program code, so that the computer device implements the operations performed by the voice data acquisition method or the operations performed by the voice call method.
According to the technical solution provided by the embodiments of the present application, voice data is received through a sandbox environment hooked to the audio processing interface. When the application process of the target application program obtains the voice data through the audio processing interface, the hook function is triggered to intercept the operation, and the application process obtains the voice data directly from the sandbox environment. This shortens the transmission path of the voice data from the terminal to the application process, reduces the delay with which the application process obtains the voice data, reduces the delay with which the application process forwards the voice data, and improves the user experience of the target application program.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram of an implementation environment of a voice data acquisition method according to an embodiment of the present application;
fig. 2 is a flowchart of a voice data obtaining method according to an embodiment of the present application;
fig. 3 is a detailed flowchart of a voice data obtaining method according to an embodiment of the present application;
fig. 4 is a schematic diagram of a cloud game interface provided in an embodiment of the present application;
FIG. 5 is a schematic view of another cloud game interface provided by an embodiment of the present application;
fig. 6 is a schematic diagram of a voice data transmission process according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a voice data acquiring apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a voice call apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a terminal according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions and advantages of the present application clearer, the following will describe embodiments of the present application in further detail with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in this application are used to distinguish between items that are identical or similar in function; it should be understood that "first," "second," and "nth" imply no logical or temporal dependency and no limitation on number or order of execution.
In order to facilitate understanding of the technical processes of the present application, some terms referred to in the embodiments of the present application are explained below:
cloud gaming (Cloud gaming): also called game on demand (gaming) is an online game technology based on cloud computing technology. Cloud game technology enables light-end devices (thin clients) with relatively limited graphics processing and data computing capabilities to run high-quality games. In a cloud game scene, a game is not operated at a player game terminal, but is operated in a cloud server, the game scene is rendered into a video and audio stream by the cloud server, and the video and audio stream is transmitted to the player game terminal through a network. The player game terminal does not need to have strong graphic operation and data processing capacity, and only needs to have basic streaming media playing capacity and capacity of acquiring player input instructions and sending the instructions to the cloud server.
Sandbox: a container for simulating a system environment. A sandbox environment is deployed on a physical machine to provide isolation of resources (files, registries, processes, peripherals, sound cards, etc.); in the embodiments of the present application, the sandbox may be used to simulate a virtual audio input device.
Hook: hook technology is a technique for changing the execution result of a program. When a computer device handles a specific system event, the hook program registered for that event receives a notification from the system as the event's message is processed, so the hook program can respond to the system event first. Hook technology can be applied, for example, in the Windows operating system.
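As a language-agnostic toy illustration of this idea (not the Windows hook API; all names are hypothetical), a hook can be modeled as a wrapper that sees the call before, and may replace, the original handler:

```python
# Toy illustration of hooking: the hook is notified first and may change
# the outcome, mirroring how a system-level hook intercepts a message
# before the target program processes it.

def install_hook(original, hook):
    def hooked(*args, **kwargs):
        replaced = hook(*args, **kwargs)   # hook responds first
        if replaced is not None:
            return replaced                # hook changed the result
        return original(*args, **kwargs)   # otherwise fall through
    return hooked

def read_microphone():
    return "data from kernel driver"

def sandbox_hook():
    return "data from sandbox"

read_microphone = install_hook(read_microphone, sandbox_hook)
print(read_microphone())  # data from sandbox
```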
Fig. 1 is a schematic diagram of an implementation environment of a voice data acquisition method according to an embodiment of the present application, and referring to fig. 1, the implementation environment includes a terminal 101 and a server 102.
The terminal 101 is a user-side device; a client, which may be a target application program supporting a cloud game, is installed and runs in the terminal 101. The terminal 101 may be a smartphone, a tablet computer, a notebook computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, a desktop computer, or the like, which is not limited in the embodiments of the present application.
The server 102 is configured to provide background services for the client run by the terminal 101, for example, to support running the cloud game. A sandbox environment is deployed in the server 102 and may be used to simulate a virtual audio input device that provides audio input services for cloud games. The server 102 may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a CDN (Content Delivery Network), and big data and artificial intelligence platforms.
The terminal 101 and the server 102 may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiments of the present application. It should be noted that the cloud game and the sandbox environment may run on the same server or on different servers, which is likewise not limited. The embodiments of the present application take the case where the cloud game and the sandbox environment run on the same server as an example.
Those skilled in the art will appreciate that there may be more or fewer terminals; for example, there may be only one terminal, or tens, hundreds, or more. The embodiments of the present application do not limit the number of terminals or the device types in the implementation environment.
The embodiments of the present application provide a voice data acquisition method that can be applied to various types of application programs and combined with various application scenarios. For example, with the technical solution provided by the embodiments of the present application, the voice data sent by the user is received and stored in the sandbox environment, and the sandbox environment then sends the voice data directly to the cloud game process. This shortens the transmission path of the voice data from the terminal to the cloud game process, reduces the delay with which the cloud game process obtains the voice data, and reduces the delay of voice calls in the cloud game.
Fig. 2 is a flowchart of a voice data acquisition method according to an embodiment of the present application. The method may be applied to the server; in the embodiments of the present application, the server is used as the execution subject to describe the voice data acquisition method. Referring to fig. 2, this embodiment may specifically include the following steps:
201. The server receives the voice data sent by the terminal through a sandbox environment, the sandbox environment being associated with the audio processing interface of the target application program through a hook function.
The target application program runs in the server and may be any type of application program. The embodiments of the present application take a cloud game as an example of the target application program. When the cloud game runs, the virtual scene of the game is rendered in the server, and the rendered game screen and other multimedia data are pushed to the terminal in real time. The terminal can display a web page carrying the cloud game or run a client of the cloud game, and play the received multimedia data on the web page or the client interface. While the cloud game runs, the user operates based on the multimedia data played on the web page or the client interface; the terminal generates a user operation instruction in response to the user operation and sends it to the cloud game process in the server, and the cloud game process performs data processing based on the user operation instruction. For example, when the terminal detects the user's movement operation on a virtual object, it generates a virtual object movement instruction and sends it to the cloud game process in the server; the cloud game process moves the virtual object in response to the instruction, renders the game screen in real time during the movement, and sends the game screen to the terminal for display.
In the embodiments of the present application, the hook function is used to hook the sandbox environment to the audio processing interface (Core Audio API) of the cloud game, that is, to associate the two by applying hook technology. The sandbox environment may be used to simulate a virtual audio input device, such as a virtual microphone. When the application process of the target application program needs to acquire the user's voice data, a virtual audio input device is simulated in the server through the sandbox environment, and the voice data is input to the application process of the target application program.
In the embodiment of the application, after a user inputs voice data into a terminal through audio input equipment such as a microphone, the terminal can directly send the voice data to a sandbox environment, and the voice data is stored by the sandbox environment.
202. The server responds to a call instruction of a voice data acquisition function of the audio processing interface and operates the voice data acquisition function, and the voice data acquisition function is used for triggering voice acquisition operation.
The audio processing interface may provide access to a plurality of external devices for audio data processing, and the voice data acquisition function may be used to access the virtual audio input device and read voice data from it.
203. In response to the sandbox environment intercepting the voice acquisition operation through the hook function, the server sends the voice data to an application process of the target application program through the sandbox environment, the application process being used to forward the voice data.
In the embodiments of the present application, after the sandbox environment is hooked to the audio processing interface, it can observe the function calls made on the audio processing interface. When the voice data acquisition function is called and triggers the voice data acquisition operation, the sandbox environment can intercept that operation, so that the application process acquires the voice data directly from the sandbox environment.
According to the above technical solution, the voice data is received through the sandbox environment hooked to the audio processing interface. When the application process of the target application program needs to acquire voice data, that is, when it calls the audio processing interface to execute the voice acquisition operation, the hook function is triggered to intercept the operation, and the application process obtains the voice data directly from the sandbox environment. This shortens the transmission path of the voice data from the terminal to the application process, reduces the delay with which the application process acquires and forwards the voice data, accordingly reduces the delay of voice calls in the application program, and improves the user experience of the application program.
The foregoing embodiment briefly introduces one implementation of the present application. Fig. 3 is a detailed flowchart of the voice data acquisition method provided in the embodiment of the present application, which can be applied in the implementation environment shown in fig. 1; the method is described below with reference to fig. 1 and fig. 3.
301. The terminal sends the device access information to the application process of the target application program running in the server.
Wherein the target application is a cloud game running in a server.
In the embodiment of the application, when the terminal detects that an audio input device such as a microphone is connected, the terminal sends the device access information to the application process of the target application program to indicate that a user of the target application program needs to perform voice input.
302. In response to the device access information, the application process in the server performs device activation and device initialization.
In this embodiment of the application, the sandbox environment may hook the audio processing interface through the hook function, so that it can observe the function calls made on the audio processing interface. When a function related to the audio input device is called, that is, when the target of the called function is the audio input device, for example a device activation function (Activate) or a device initialization function (Initialize), the sandbox environment may intercept the operations triggered by these functions.
In one possible implementation, in response to the device access information, the application process of the target application program run by the server calls the device activation function of the audio processing interface to activate the virtual audio input device. In response to the device activation function triggering a device activation operation, the sandbox environment in the server intercepts the device activation operation and sends an activation result to the application process, that is, the sandbox environment informs the application process that device activation succeeded.
In one possible implementation, after device activation succeeds, the application process of the target application program may call the device initialization function of the audio processing interface, and the server runs the device initialization function in response to the call instruction for that function. The device initialization function is used to trigger an initialization operation on the virtual audio input device and carries a first sampling parameter corresponding to the target application program. In this embodiment of the present application, when the sandbox environment detects that the device initialization function is called, it may intercept the initialization operation; in response to the sandbox environment in the server intercepting the initialization operation through the hook function, the sandbox environment sends an initialization result to the application process. The initialization result may indicate whether the terminal is compatible with the target application program, and may be determined based on the first sampling parameter corresponding to the target application program and a second sampling parameter corresponding to the terminal. For example, in response to the sandbox environment intercepting the initialization operation through the hook function, the sandbox environment acquires the first sampling parameter and the second sampling parameter corresponding to the terminal, and sends the matching result of the two parameters to the application process of the target application program.
Specifically, in response to the first sampling parameter being the same as the second sampling parameter, the sandbox environment sends device compatibility information to the application process; in response to the first sampling parameter being different from the second sampling parameter, the sandbox environment sends device incompatibility information to the application process. In one possible implementation, if the two parameters differ, the sandbox environment may further send a prompt message indicating voice input failure to the terminal, so as to notify the user that the voice data currently collected by the terminal cannot be recognized by the application process of the target application program. In the embodiment of the application, device compatibility is determined in advance, before the voice data is acquired from the sandbox environment, which avoids the situation in which the application process acquires voice data that cannot be recognized or played because the sampling parameters of the devices are incompatible.
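The compatibility decision made during initialization can be sketched as follows. The concrete fields of a sampling parameter (rate, channel count, bit depth) are assumptions for illustration; the patent only states that the first and second sampling parameters are compared for equality:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SamplingParams:
    sample_rate: int   # Hz
    channels: int
    bit_depth: int

def initialization_result(first: SamplingParams, second: SamplingParams) -> str:
    """Message the sandbox sends to the application process after the
    intercepted initialization operation: compatible iff the application's
    (first) and terminal's (second) sampling parameters match."""
    if first == second:
        return "device_compatible"
    return "device_incompatible"

game_params = SamplingParams(48000, 2, 16)      # first sampling parameter
phone_params = SamplingParams(44100, 2, 16)     # second sampling parameter
print(initialization_result(game_params, game_params))   # device_compatible
print(initialization_result(game_params, phone_params))  # device_incompatible
```

When the result is incompatible, the sandbox can fall back to the resampling mechanism described later rather than failing the call outright.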
In one possible implementation, the application process of the target application program may also query a ServiceManager (service management process) for the audio input service to determine whether voice input is currently possible. For example, the application process in the server calls a service acquisition function (GetService) of the audio processing interface; in response to the service acquisition function triggering an operation of acquiring the audio input service, the sandbox environment in the server intercepts that operation and sends a service acquisition result to the application process. When the application process receives information from the sandbox environment that the service was acquired successfully, it determines that voice input is currently possible.
303. In response to device activation and device initialization succeeding, the application process of the target application program in the server calls a start recording function of the audio processing interface; in response to the start recording function triggering a start recording operation, the sandbox environment intercepts the start recording operation and sends a start recording instruction to the terminal.
Taking the target application program as a cloud game as an example: in one possible implementation, users can communicate by voice in real time while the cloud game runs. In such a scenario, the voice call function can be enabled as soon as the cloud game starts running, that is, the start recording function (Start) can be called immediately after the cloud game starts running and the device is successfully activated and initialized. In another possible implementation, users may communicate by voice in a non-real-time manner, for example a user sends a voice message only when the need for voice communication arises. In this scenario, the application process may call the start recording function when it detects that the user triggers the voice input control; of course, the start recording function may also be called when the cloud game starts running, which is not limited in the embodiment of the present application.
In one possible implementation, when the sandbox environment detects that the application process calls the start recording function, it intercepts the call, generates a start recording instruction based on the intercepted start recording operation, and sends the instruction to the terminal to instruct the terminal to turn on the audio input device and capture the voice data input by the user.
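How an intercepted Start (and, later in step 305, Stop) call can be turned into an instruction for the terminal can be sketched as follows. The class, method names, and instruction strings are invented for this sketch; they are not from the patent:

```python
class SandboxRecorderHook:
    """Intercepts the application's Start/Stop recording calls and, instead
    of touching real hardware, queues instructions for the terminal."""

    def __init__(self):
        self.instructions_to_terminal = []

    def on_start_recording(self):
        # Interception of the start recording function (Start): generate a
        # start recording instruction for the terminal.
        self.instructions_to_terminal.append("start_recording")

    def on_stop_recording(self):
        # Interception of the stop recording function (Stop): generate a
        # stop recording instruction for the terminal.
        self.instructions_to_terminal.append("stop_recording")

hook = SandboxRecorderHook()
hook.on_start_recording()   # application process called Start
hook.on_stop_recording()    # application process called Stop
print(hook.instructions_to_terminal)  # ['start_recording', 'stop_recording']
```

The terminal, on receiving these instructions, turns its local audio input device on or off, as steps 304 and 305 describe.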
304. In response to the start recording instruction sent by the sandbox environment, the terminal enables the voice call function, collects the voice data input by the user, and sends the voice data to the sandbox environment, which then sends the voice data to the application process.
In one possible implementation, in response to the start recording instruction, the terminal sends a device-on instruction to the local audio input device, where the device-on instruction instructs the audio input device to start recording. The terminal collects the voice data input by the user through the audio input device and sends it to the sandbox environment; the sandbox environment stores the voice data, and the application process of the target application program obtains the voice data directly from the sandbox environment.
The embodiment of the present application covers two scenarios, real-time voice input and non-real-time voice input; the process of transmitting voice data from the terminal to the application process is described below for each. In a real-time voice input scenario, after the terminal enables the voice call function, it collects the user's voice data in real time and sends it to the sandbox environment in real time. In one possible implementation, the application process monitors the sandbox environment and, in response to detecting that the sandbox environment has received voice data, calls the voice acquisition function of the audio processing interface to obtain it. Optionally, the application process of the target application program may instead read voice data from the sandbox environment on a reference period, that is, call the voice acquisition function once per reference period. The reference period may be set by a developer and is not limited in this embodiment of the application. It should be noted that the embodiment of the present application does not limit which of these methods is used to obtain the voice data from the sandbox environment.
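The reference-period read strategy just described can be sketched as a simple polling loop. The queue, function names, and period value are assumptions for illustration:

```python
import time
from collections import deque

sandbox_queue = deque()  # voice data the terminal has sent to the sandbox

def terminal_send(frame):
    """Terminal side: push a captured voice frame into the sandbox."""
    sandbox_queue.append(frame)

def poll_sandbox(reference_period_s, max_polls):
    """Application-process side: every reference period, drain whatever
    voice data the sandbox currently holds."""
    received = []
    for _ in range(max_polls):
        while sandbox_queue:
            received.append(sandbox_queue.popleft())
        time.sleep(reference_period_s)
    return received

terminal_send("frame-1")
terminal_send("frame-2")
print(poll_sandbox(reference_period_s=0.01, max_polls=2))  # ['frame-1', 'frame-2']
```

The event-driven alternative (reacting as soon as the sandbox receives data) avoids the polling latency of up to one reference period, at the cost of the sandbox having to notify the application process.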
In a non-real-time voice input scenario, when the terminal detects that the user is inputting voice, it sends a first message to the application process, where the first message prompts the application process of the target application program to acquire voice data. In one possible implementation, the terminal displays an operation interface of the target application program on which a voice input control is shown, and in response to detecting that the user triggers the voice input control, the terminal sends the first message to the application process of the target application program. Of course, the terminal may also be triggered to send the first message in other ways, for example on detecting that the user operates a target shortcut key or performs a first target gesture, which is not limited in this embodiment of the application. The target shortcut key and the first target gesture may be set by a developer, which is not limited in the embodiment of the present application. In one possible implementation, when the user is not performing voice input, for example the voice input control has not been triggered, the terminal displays first prompt information on the operation interface, where the first prompt information prompts the user to perform voice input; while the user is performing voice input, the terminal displays second prompt information on the operation interface, where the second prompt information indicates that the user is performing voice input.

Taking the target application program as a cloud game as an example, fig. 4 is a schematic diagram of a cloud game interface provided in an embodiment of the present application. As shown in fig. 4 (a), when the user is not performing voice input, a first region 401 in the operation interface of the cloud game may display the first prompt information "hold the target shortcut key to speak", where the first region is any region in the operation interface; in the embodiment of the present application, the lower-left region of the operation interface is taken as an example. As shown in fig. 4 (b), when the user inputs voice, the first region 401 may display the second prompt information "character A is speaking", where character A is the virtual character currently used by the user. Fig. 5 is a schematic diagram of another cloud game interface provided in an embodiment of the present application. As shown in fig. 5 (a), when the user is not performing voice input and the terminal has not turned on the audio input device, a voice input icon 501 in the operation interface of the cloud game is displayed in a first state, for example in white; as shown in fig. 5 (b), when the user performs voice input, the voice input icon 501 is displayed in a second state, for example in yellow.
In the embodiment of the application, in response to the first message sent by the terminal, the application process of the target application program calls the voice acquisition function of the audio processing interface to obtain the voice data. In one possible implementation, in response to a call instruction for the voice data acquisition function of the audio processing interface, the server runs the voice data acquisition function, which triggers the voice acquisition operation. When the sandbox environment detects that the voice data acquisition function is called, it can intercept the operation; in response to the hook function intercepting the voice acquisition operation, the voice data is sent to the application process of the target application program through the sandbox environment, and the application process then forwards the voice data to other users. For example, when the target application program is a cloud game, the other users are users participating in the same game.
305. In response to the acquisition of the voice data being completed, the application process of the target application program in the server calls a stop recording function of the audio processing interface; in response to the stop recording function triggering a stop recording operation, the sandbox environment intercepts the stop recording operation and sends a stop recording instruction to the terminal.
In the embodiment of the application, in response to the stop recording instruction, the terminal sends a device-off instruction to the local audio input device, where the device-off instruction instructs the audio input device to stop recording. Step 305 is described below for the two scenarios of real-time and non-real-time voice input. In one possible implementation, in a real-time voice input scenario, when the target application program finishes running, the application process may determine that the user no longer needs to perform voice input and may call the stop recording function (Stop). In one possible implementation, in a non-real-time voice input scenario, the terminal sends a second message to the application process in response to the voice input being completed, where the second message notifies the application process that the voice input is complete, and the application process of the target application program calls the stop recording function based on the second message. For example, the terminal may send the second message to the application process when it detects that the user releases the voice input control on the operation interface, releases the target shortcut key, or performs a second target gesture. It should be noted that the embodiment of the present application does not limit which manner is used to trigger the terminal to send the second message. The target shortcut key and the second target gesture may be set by a developer, which is not limited in the embodiment of the present application.
In one possible implementation, in response to a call instruction for the stop recording function of the audio processing interface, the server runs the stop recording function, which triggers a shutdown operation on the virtual audio input device. In response to the sandbox environment intercepting the shutdown operation through the hook function, the server sends a stop recording instruction to the terminal through the sandbox environment to instruct the terminal to turn off the local audio input device.
According to the above technical solution, the voice data is received through the sandbox environment hooked to the audio processing interface. When the application process of the target application program needs to acquire voice data, that is, when it calls the audio processing interface to execute the voice acquisition operation, the hook function is triggered to intercept the operation, and the application process obtains the voice data directly from the sandbox environment. This shortens the transmission path of the voice data from the terminal to the application process, reduces the delay with which the application process acquires and forwards the voice data, accordingly reduces the delay of voice calls in the application program, and improves the user experience of the application program.
Fig. 6 is a schematic diagram of a voice data transmission process provided in an embodiment of the present application; the target application program is a cloud game, and the voice data acquisition method is described with reference to fig. 6. Referring to fig. 6, the terminal feeds voice data collected by an audio input device such as a microphone 601 into a cloud game client 602, and the cloud game client sends the voice data to the sandbox environment 603. When the cloud game process acquires voice data, it calls an Audio API 604 (Application Programming Interface), which in turn calls the Core Audio API 605; the voice data in the sandbox environment 603 is obtained through at least one function of the Core Audio API and returned to the Audio API 604 and then to the cloud game process. In the embodiment of the application, based on the sandbox environment, voice input is simulated directly in the cloud game server, which reduces system calls and caching steps when the cloud game process acquires voice data, effectively reduces the delay of voice data from the client to the cloud game process, reduces the delay of voice calls in the cloud game, and improves the user experience of the cloud game.
In the above process, before obtaining the voice data from the sandbox environment, the application process needs to call the device initialization function to determine whether its sampling parameter is compatible with that of the terminal. If they are compatible, the application process may obtain the voice data from the sandbox environment directly; if not, the sandbox environment may resample the voice data so that the application process of the target application program can recognize and play it. That is, in response to the first sampling parameter being different from the second sampling parameter, the voice data is resampled through the sandbox environment based on the first sampling parameter, and the resampled voice data is stored through the sandbox environment. When the application process of the target application program calls the voice data acquisition function, the sandbox environment intercepts the call and sends the resampled voice data to the application process.
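The patent does not specify the resampling algorithm. A minimal sketch, assuming linear interpolation over sample-rate mismatch only (channel count and bit depth handling omitted), might look like:

```python
def resample(frames, src_rate, dst_rate):
    """Resample a list of audio samples from src_rate to dst_rate by
    linear interpolation (one simple choice of algorithm)."""
    if src_rate == dst_rate:
        return list(frames)
    n_out = max(1, round(len(frames) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        # Fractional source position for output sample i.
        pos = i * (len(frames) - 1) / max(1, n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, len(frames) - 1)
        frac = pos - lo
        out.append(frames[lo] * (1 - frac) + frames[hi] * frac)
    return out

# 10 ms of 44.1 kHz terminal audio (441 samples) resampled to the
# application's 48 kHz (480 samples).
chunk = [0.0] * 441
print(len(resample(chunk, 44100, 48000)))  # 480
```

In the patent's flow this conversion would run inside the sandbox environment, using the first sampling parameter as the target, before the resampled data is stored and handed to the application process.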
In the embodiment of the application, a resampling mechanism is provided in the sandbox environment: when the sampling parameter of the terminal is incompatible with the sampling parameter of the target application program, the data is resampled without requiring the user to replace the terminal device currently in use. This effectively avoids failures in obtaining voice data caused by the terminal device and ensures a good user experience.
In the embodiment of the application, the sandbox environment is set up in the cloud game server to shorten the transmission path of the voice data from the terminal to the application process, thereby reducing the delay with which the application process acquires the voice data and the delay of forwarding it in a voice communication scenario. In one possible implementation, when the amount of voice data stored in the sandbox environment is large, the sandbox environment may first segment the voice data and then send the resulting voice fragments to the application process of the target application program. That is, in response to the number of audio frames contained in the voice data being greater than a reference number, the voice data is segmented through the sandbox environment to obtain at least two voice fragments, and the at least two voice fragments are sent to the application process through the sandbox environment. The reference number may be set by a developer and is not limited in the embodiments of the present application. In one possible implementation, the segmentation may instead be performed by the application process of the target application program: after the application process obtains the voice data from the sandbox environment, if the data volume is large, the application process may segment the voice data into voice fragments and then forward the fragments to other users. In the embodiment of the application, segmenting the voice data can improve its transmission efficiency.
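The segmentation rule (split only when the frame count exceeds the reference number) can be sketched as follows; the fragment size used for splitting is an assumption, since the patent specifies only the threshold for triggering segmentation:

```python
def segment(frames, reference_number):
    """If frames holds more audio frames than reference_number, split it
    into fragments of at most reference_number frames each; otherwise
    return it as a single fragment."""
    if len(frames) <= reference_number:
        return [frames]
    return [frames[i:i + reference_number]
            for i in range(0, len(frames), reference_number)]

frames = list(range(10))
print(segment(frames, 4))   # three fragments: 4 + 4 + 2 frames
print(segment(frames, 16))  # below the threshold: a single fragment
```

Either the sandbox environment or the application process can apply this rule, matching the two placements of segmentation described above.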
All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.
Fig. 7 is a schematic structural diagram of a speech data acquisition apparatus according to an embodiment of the present application, and referring to fig. 7, the apparatus includes:
a receiving module 701, configured to receive voice data sent by a terminal through a sandbox environment, where the sandbox environment is associated with an audio processing interface of a target application through a hook function;
a function calling module 702, configured to run a voice data obtaining function of the audio processing interface in response to a call instruction of the voice data obtaining function, where the voice data obtaining function is used to trigger a voice obtaining operation;
a sending module 703, configured to, in response to the sandbox environment intercepting the voice obtaining operation through the hook function, send the voice data to an application process of the target application program through the sandbox environment, where the application process is configured to forward the voice data.
In one possible implementation, the function call module 702 is configured to:
responding to a call instruction of an equipment initialization function of the audio processing interface, and running the equipment initialization function, wherein the equipment initialization function is used for triggering initialization operation of virtual audio input equipment and carries a first sampling parameter corresponding to the target application program;
the apparatus further includes an acquisition module, configured to, in response to the sandbox environment intercepting the initialization operation through the hook function, acquire through the sandbox environment the first sampling parameter and a second sampling parameter corresponding to the terminal;
the sending module 703 is configured to send the matching result of the first sampling parameter and the second sampling parameter to the application process of the target application program through the sandbox environment.
In one possible implementation, the sending module 703 is configured to:
sending device compatibility information to the application process through the sandbox environment in response to the first sampling parameter being the same as the second sampling parameter;
in response to the first sampling parameter being different from the second sampling parameter, sending device incompatibility information to the application process through the sandbox environment.
In one possible implementation, the apparatus further includes:
the sampling module is used for responding to the difference between the first sampling parameter and the second sampling parameter, and sampling the voice data through the sandbox environment based on the first sampling parameter;
and the storage module is used for storing the sampled voice data through the sandbox environment.
In one possible implementation, the apparatus further includes:
the monitoring module is used for monitoring the sandbox environment;
the function calling module 702 is configured to, in response to monitoring that the sandbox environment receives the voice data, call a voice obtaining function of the audio processing interface.
In one possible implementation manner, the function calling module 702 is configured to execute, in response to a call instruction to a stop recording function of the audio processing interface, the stop recording function, where the stop recording function is configured to trigger a closing operation on a virtual audio input device;
the sending module 703 is configured to send, in response to the sandbox environment intercepting the closing operation through the hook function, a recording stop instruction to the terminal through the sandbox environment, where the recording stop instruction is used to instruct the terminal to close a local audio input device.
In one possible implementation, the sending module 703 is configured to:
in response to the number of audio frames included in the voice data being greater than the reference number, segmenting the voice data through the sandbox environment to obtain at least two voice fragments;
and sending the at least two voice fragments to the application process through the sandbox environment.
In one possible implementation, the target application is a cloud game running in a server.
According to the apparatus provided by the embodiment of the application, the voice data is received through the sandbox environment hooked to the audio processing interface. When the application process of the target application program needs to acquire voice data, that is, when it calls the audio processing interface to execute the voice acquisition operation, the hook function is triggered to intercept the operation, so that the application process obtains the voice data directly from the sandbox environment. This shortens the transmission path of the voice data from the terminal to the application process, reduces the delay with which the application process acquires and forwards the voice data, and improves the user experience of the application program.
It should be noted that: in the voice data acquiring apparatus provided in the above embodiment, when acquiring voice data, only the division of the above functional modules is taken as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the above described functions. In addition, the voice data acquisition apparatus and the voice data acquisition method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.
Fig. 8 is a schematic structural diagram of a voice call apparatus according to an embodiment of the present application, and referring to fig. 8, the apparatus includes:
a function starting module 801, configured to start a voice call function in response to a recording start instruction sent by a sandbox environment;
an obtaining module 802, configured to obtain voice data in response to a trigger operation on a voice input control in an operation interface of a target application;
a sending module 803, configured to send the voice data to the sandbox environment, and send the voice data to an application process of the target application program through the sandbox environment, where the application process is configured to forward the voice data.
In one possible implementation, the apparatus further includes:
the display module is used for displaying first prompt information on the operation interface, and the first prompt information is used for prompting a user to perform voice input; and displaying second prompt information on the operation interface, wherein the second prompt information is used for indicating that the user is performing voice input.
In one possible implementation, the sending module 803 is further configured to:
responding to the triggering operation of a voice input control in an operation interface of the target application program, and sending a first message to an application process of the target application program, wherein the first message is used for indicating the application process of the target application program to acquire voice data;
and sending a second message to the application process of the target application program in response to the completion of the voice input, wherein the second message is used for indicating the application process of the target application program to stop voice data acquisition.
In one possible implementation, the function starting module 801 is configured to:
receiving a recording starting instruction sent by the sandbox environment, wherein the recording starting instruction is generated based on a recording starting operation triggered by a recording starting function in the application process intercepted by the sandbox environment;
and responding to the recording starting instruction, and sending a device starting instruction to the local audio input device, wherein the device starting instruction is used for instructing the audio input device to start recording.
In one possible implementation, the apparatus further includes:
the function closing module is used for receiving a recording stopping instruction sent by the sandbox environment, and the recording stopping instruction is generated based on a recording stopping operation triggered by a recording stopping function in the application process intercepted by the sandbox environment; and responding to the recording stopping instruction, and sending a device closing instruction to the local audio input device, wherein the device closing instruction is used for instructing the audio input device to stop recording.
According to the apparatus provided by this embodiment of the application, after the voice call function is started, the voice data input by the user is sent directly to the sandbox environment in the server, and the application process obtains the voice data directly from the sandbox environment. This shortens the transmission path of the voice data from the terminal to the application process, reduces the delay with which the application process obtains the voice data, and accordingly reduces the voice call delay in the application program and improves the user experience of the application program.
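The hook-based interception that makes this shortened path possible can be sketched in miniature. This is a minimal illustration, assuming invented names throughout: the sandbox environment replaces the audio interface's capture function so that the application process receives voice data forwarded from the terminal instead of attempting to read a local audio device that the server may not have.

```python
# Minimal sketch of the hook idea: the sandbox environment intercepts the
# voice acquisition operation and serves voice data it received from the
# terminal. Every name here is a hypothetical illustration.
class AudioInterface:
    def capture(self):
        # Original behavior: try to read a local audio device,
        # which a server typically does not have.
        raise RuntimeError("no local audio input device in the server")

class SandboxEnvironment:
    def __init__(self):
        self.buffer = []          # voice data received from the terminal

    def receive_from_terminal(self, voice_data):
        self.buffer.append(voice_data)

    def install_hook(self, interface):
        # Hook: replace the capture function so calls are intercepted
        # and answered from the sandbox's buffer instead.
        def hooked_capture():
            return self.buffer.pop(0) if self.buffer else b""
        interface.capture = hooked_capture

sandbox = SandboxEnvironment()
iface = AudioInterface()
sandbox.install_hook(iface)
sandbox.receive_from_terminal(b"voice-frame-1")
frame = iface.capture()   # the application now gets the terminal's data
```

In a real system the hook would be installed at the level of the audio processing interface (for example by rewriting a function pointer or import entry), but the effect is the same: the application process's unmodified capture call transparently returns terminal-supplied voice data.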
It should be noted that the voice call apparatus provided in the above embodiment is described using the above division of functional modules only by way of example; in practical applications, the functions may be assigned to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the voice call apparatus and the voice call method provided by the above embodiments belong to the same concept; their specific implementation processes are detailed in the method embodiments and are not described again here.
The computer device provided by the above technical solution may be implemented as a terminal or a server. For example, fig. 9 is a schematic structural diagram of a terminal provided in an embodiment of the present application. The terminal 900 may be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 900 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, terminal 900 includes: one or more processors 901 and one or more memories 902.
Processor 901 may include one or more processing cores, such as a 4-core or 8-core processor. The processor 901 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 901 may also include a main processor and a coprocessor: the main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 901 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 901 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 902 may include one or more computer-readable storage media, which may be non-transitory. The memory 902 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 902 is used to store at least one program code, which is executed by the processor 901 to implement the voice data acquisition method and the voice call method provided by the method embodiments in the present application.
In some embodiments, terminal 900 can also optionally include: a peripheral interface 903 and at least one peripheral. The processor 901, memory 902, and peripheral interface 903 may be connected by buses or signal lines. Various peripheral devices may be connected to the peripheral interface 903 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 904, a display screen 905, a camera assembly 906, an audio circuit 907, a positioning assembly 908, and a power supply 909.
The peripheral interface 903 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 901 and the memory 902. In some embodiments, the processor 901, memory 902, and peripheral interface 903 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 901, the memory 902 and the peripheral interface 903 may be implemented on a separate chip or circuit board, which is not limited by this embodiment.
The radio frequency circuit 904 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 904 communicates with communication networks and other communication devices via electromagnetic signals, converting an electrical signal into an electromagnetic signal for transmission, or converting a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 904 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 904 may communicate with other terminals via at least one wireless communication protocol, including but not limited to: metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 904 may also include NFC (Near Field Communication) related circuits, which is not limited in this application.
The display screen 905 is used to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display screen 905 is a touch display screen, it also has the ability to capture touch signals on or over its surface; the touch signal may be input to the processor 901 as a control signal for processing. At this point, the display screen 905 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 905, disposed on the front panel of the terminal 900; in other embodiments, there may be at least two display screens 905, each disposed on a different surface of the terminal 900 or forming a foldable design; in still other embodiments, the display screen 905 may be a flexible display disposed on a curved or folded surface of the terminal 900. The display screen 905 may even be arranged in a non-rectangular irregular shape, that is, a shaped screen. The display screen 905 may be an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode) display, or the like.
The camera assembly 906 is used to capture images or video. Optionally, camera assembly 906 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 906 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
Audio circuit 907 may include a microphone and a speaker. The microphone is used for collecting sound waves of the user and the environment, converting them into electrical signals, and inputting the electrical signals to the processor 901 for processing, or to the radio frequency circuit 904 to implement voice communication. For stereo acquisition or noise reduction purposes, there may be multiple microphones, disposed at different locations of the terminal 900. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 901 or the radio frequency circuit 904 into sound waves. The speaker may be a conventional membrane speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can not only convert an electrical signal into sound waves audible to humans, but can also convert an electrical signal into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, audio circuit 907 may also include a headphone jack.
The positioning component 908 is used to locate the current geographic location of the terminal 900 for navigation or LBS (Location Based Service). The positioning component 908 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
Power supply 909 is used to supply power to the various components in terminal 900. The power source 909 may be an alternating current source, a direct current source, a disposable battery, or a rechargeable battery. When the power source 909 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging, and may also support fast-charge technology.
In some embodiments, terminal 900 can also include one or more sensors 910. The one or more sensors 910 include, but are not limited to: acceleration sensor 911, gyro sensor 912, pressure sensor 913, fingerprint sensor 914, optical sensor 915, and proximity sensor 916.
The acceleration sensor 911 can detect the magnitude of acceleration in three coordinate axes of the coordinate system established with the terminal 900. For example, the acceleration sensor 911 may be used to detect the components of the gravitational acceleration in three coordinate axes. The processor 901 can control the display screen 905 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 911. The acceleration sensor 911 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 912 may detect a body direction and a rotation angle of the terminal 900, and the gyro sensor 912 may cooperate with the acceleration sensor 911 to acquire a 3D motion of the user on the terminal 900. The processor 901 can implement the following functions according to the data collected by the gyro sensor 912: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensor 913 may be disposed on a side bezel of the terminal 900 and/or underneath the display 905. When the pressure sensor 913 is disposed on the side frame of the terminal 900, the user's holding signal of the terminal 900 may be detected, and the processor 901 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 913. When the pressure sensor 913 is disposed at a lower layer of the display screen 905, the processor 901 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 905. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 914 is used for collecting a fingerprint of the user, and the processor 901 identifies the user according to the fingerprint collected by the fingerprint sensor 914, or the fingerprint sensor 914 identifies the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, processor 901 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings, etc. The fingerprint sensor 914 may be disposed on the front, back, or side of the terminal 900. When a physical key or vendor Logo is provided on the terminal 900, the fingerprint sensor 914 may be integrated with the physical key or vendor Logo.
The optical sensor 915 is used to collect ambient light intensity. In one embodiment, the processor 901 may control the display brightness of the display screen 905 based on the ambient light intensity collected by the optical sensor 915. Specifically, when the ambient light intensity is high, the display brightness of the display screen 905 is increased; when the ambient light intensity is low, the display brightness of the display screen 905 is reduced. In another embodiment, the processor 901 can also dynamically adjust the shooting parameters of the camera assembly 906 according to the ambient light intensity collected by the optical sensor 915.
Proximity sensor 916, also known as a distance sensor, is typically disposed on the front panel of terminal 900 and is used to collect the distance between the user and the front face of the terminal 900. In one embodiment, when the proximity sensor 916 detects that the distance between the user and the front face of the terminal 900 gradually decreases, the processor 901 controls the display screen 905 to switch from the screen-on state to the screen-off state; when the proximity sensor 916 detects that the distance gradually increases, the processor 901 controls the display screen 905 to switch from the screen-off state back to the screen-on state.
Those skilled in the art will appreciate that the configuration shown in fig. 9 does not constitute a limitation of terminal 900, and may include more or fewer components than those shown, or may combine certain components, or may employ a different arrangement of components.
Fig. 10 is a schematic structural diagram of a server according to an embodiment of the present application. The server 1000 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 1001 and one or more memories 1002, where the one or more memories 1002 store at least one program code that is loaded and executed by the one or more processors 1001 to implement the methods provided by the foregoing method embodiments. Of course, the server 1000 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and may further include other components for implementing device functions, which are not described here again.
In an exemplary embodiment, there is also provided a computer-readable storage medium, such as a memory, including at least one program code, which is executable by a processor to perform the voice data acquisition method or the voice call method in the above-described embodiments. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product is also provided that includes at least one program code stored in a computer readable storage medium. The processor of the computer device reads the at least one program code from the computer-readable storage medium, and the processor executes the at least one program code, so that the computer device implements the operations performed by the voice data acquisition method or the operations performed by the voice call method.
It will be understood by those skilled in the art that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware, where the program code is stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (16)

1. A method for acquiring voice data, the method comprising:
receiving voice data sent by a terminal through a sandbox environment, wherein the sandbox environment is associated with an audio processing interface of a target application program through a hook function;
responding to a call instruction of a voice data acquisition function of the audio processing interface, and operating the voice data acquisition function, wherein the voice data acquisition function is used for triggering voice acquisition operation;
and responding to the sandbox environment to intercept the voice acquisition operation through the hook function, and sending the voice data to an application process of the target application program through the sandbox environment, wherein the application process is used for forwarding the voice data.
2. The method of claim 1, wherein before running the voice data acquisition function in response to the call instruction to the voice data acquisition function of the audio processing interface, the method further comprises:
responding to a call instruction of an equipment initialization function of the audio processing interface, and running the equipment initialization function, wherein the equipment initialization function is used for triggering initialization operation of virtual audio input equipment and carries a first sampling parameter corresponding to the target application program;
responding to the sandbox environment to intercept the initialization operation through the hook function, and acquiring the first sampling parameter and a second sampling parameter corresponding to the terminal through the sandbox environment;
and sending the matching result of the first sampling parameter and the second sampling parameter to the application process of the target application program through the sandbox environment.
3. The method of claim 2, wherein sending the matching result of the first sampling parameter and the second sampling parameter to the application process of the target application program through the sandbox environment comprises:
sending device compatibility information to the application process through the sandbox environment in response to the first sampling parameter being the same as the second sampling parameter;
sending, by the sandbox environment, device incompatibility information to the application process in response to the first sampling parameter being different from the second sampling parameter.
4. The method according to claim 2, wherein in response to the sandbox environment intercepting the initialization operation through the hook function, after the first sampling parameter and the second sampling parameter corresponding to the terminal are obtained through the sandbox environment, the method further comprises:
in response to the first sampling parameter being different from the second sampling parameter, sampling, by the sandbox environment, the voice data based on the first sampling parameter;
and storing the sampled voice data through the sandbox environment.
5. The method of claim 1, wherein before running the voice data acquisition function in response to the call instruction to the voice data acquisition function of the audio processing interface, the method further comprises:
monitoring the sandbox environment;
and calling the voice data acquisition function of the audio processing interface in response to monitoring that the sandbox environment receives the voice data.
6. The method according to any one of claims 1 to 5, wherein in response to the sandbox environment intercepting the voice acquisition operation through the hook function, after sending the voice data to the application process of the target application program through the sandbox environment, the method further comprises:
responding to a call instruction of a recording stopping function of the audio processing interface, and running the recording stopping function, wherein the recording stopping function is used for triggering the closing operation of the virtual audio input equipment;
and responding to the sandbox environment to intercept the closing operation through the hook function, and sending a recording stopping instruction to the terminal through the sandbox environment, wherein the recording stopping instruction is used for indicating the terminal to close local audio input equipment.
7. The method according to any one of claims 1 to 5, wherein the sending the voice data to the application process of the target application program through the sandbox environment comprises:
in response to the number of audio frames included in the voice data being larger than a reference number, segmenting the voice data through the sandbox environment to obtain at least two voice fragments;
and sending the at least two voice fragments to the application process through the sandbox environment.
8. The method according to any one of claims 1 to 5, wherein the target application is a cloud game running in a server.
9. A voice call method, comprising:
responding to a recording starting instruction sent by the sandbox environment, and starting a voice call function;
responding to the triggering operation of a voice input control in an operation interface of a target application program, and acquiring voice data;
and sending the voice data to the sandbox environment, and sending the voice data to an application process of the target application program through the sandbox environment, wherein the application process is used for forwarding the voice data.
10. The method of claim 9, wherein before the voice data is obtained in response to the triggering operation of the voice input control in the operation interface of the target application program, the method further comprises:
displaying first prompt information on the operation interface, wherein the first prompt information is used for prompting a user to perform voice input;
after responding to the triggering operation of the voice input control in the operation interface of the target application program, the method further comprises the following steps:
and displaying second prompt information on the operation interface, wherein the second prompt information is used for indicating that the user is carrying out voice input.
11. The method of claim 9, wherein after the voice call function is turned on in response to a start recording instruction sent by the sandbox environment, the method further comprises:
responding to triggering operation of a voice input control in an operation interface of the target application program, and sending a first message to an application process of the target application program, wherein the first message is used for indicating the application process of the target application program to acquire voice data;
and responding to the completion of the voice input, and sending a second message to the application process of the target application program, wherein the second message is used for indicating the application process of the target application program to stop voice data acquisition.
12. The method of claim 9, wherein the opening the voice call function in response to the start recording command sent by the sandbox environment comprises:
receiving a recording starting instruction sent by the sandbox environment, wherein the recording starting instruction is generated based on a recording starting operation triggered by a recording starting function in the application process intercepted by the sandbox environment;
and responding to the recording starting instruction, and sending a device starting instruction to local audio input equipment, wherein the device starting instruction is used for indicating the audio input equipment to start recording.
13. The method of claim 9, wherein after sending the voice data to the sandbox environment and sending the voice data to the application process of the target application through the sandbox environment, the method further comprises:
receiving a recording stopping instruction sent by the sandbox environment, wherein the recording stopping instruction is generated based on a recording stopping operation triggered by a recording stopping function in the application process intercepted by the sandbox environment;
and responding to the recording stopping instruction, and sending an equipment closing instruction to local audio input equipment, wherein the equipment closing instruction is used for indicating the audio input equipment to stop recording.
14. A voice data acquisition apparatus, characterized in that the apparatus comprises:
the receiving module is used for receiving the voice data sent by the terminal through a sandbox environment, and the sandbox environment is associated with an audio processing interface of a target application program through a hook function;
the function calling module is used for responding to a calling instruction of a voice data acquisition function of the audio processing interface and operating the voice data acquisition function, and the voice data acquisition function is used for triggering voice acquisition operation;
and the sending module is used for responding to the sandbox environment to intercept the voice acquisition operation through the hook function, and sending the voice data to an application process of the target application program through the sandbox environment, wherein the application process is used for forwarding the voice data.
15. A voice call apparatus, comprising:
the function starting module is used for responding to a recording starting instruction sent by the sandbox environment and starting a voice call function;
the acquisition module is used for responding to the triggering operation of the voice input control in the operation interface of the target application program and acquiring voice data;
and the sending module is used for sending the voice data to the sandbox environment, sending the voice data to an application process of the target application program through the sandbox environment, and forwarding the voice data by the application process.
16. A computer device comprising one or more processors and one or more memories having stored therein at least one program code, the at least one program code being loaded into and executed by the one or more processors to perform operations performed by the voice data acquisition method of any one of claims 1 to 8 or the voice call method of claim 9 or 13.
CN202010779592.7A 2020-08-05 2020-08-05 Voice data acquisition method, voice call device and computer equipment Pending CN114093350A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010779592.7A CN114093350A (en) 2020-08-05 2020-08-05 Voice data acquisition method, voice call device and computer equipment


Publications (1)

Publication Number Publication Date
CN114093350A 2022-02-25

Family

ID=80295209

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010779592.7A Pending CN114093350A (en) 2020-08-05 2020-08-05 Voice data acquisition method, voice call device and computer equipment

Country Status (1)

Country Link
CN (1) CN114093350A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102638453A (en) * 2012-03-13 2012-08-15 广州华多网络科技有限公司 Voice data kernel forwarding method based on Linux system server
CN102918819A (en) * 2011-06-03 2013-02-06 华为技术有限公司 Method, apparatus and system for online application processing
CN106708580A (en) * 2016-12-27 2017-05-24 北京奇虎科技有限公司 Free-installation application program operating method, system and intelligent terminal
CN108989901A (en) * 2018-08-07 2018-12-11 北京奇虎科技有限公司 Method for processing video frequency, client and terminal
CN111324409A (en) * 2020-02-14 2020-06-23 腾讯科技(深圳)有限公司 Artificial intelligence-based interaction method and related device



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination