WO2023098332A1 - Audio processing method, apparatus, device, medium, and program product - Google Patents

Audio processing method, apparatus, device, medium, and program product

Info

Publication number
WO2023098332A1
Authority
WO
WIPO (PCT)
Prior art keywords
virtual object
audio
information
game
target
Application number
PCT/CN2022/126681
Other languages
English (en)
French (fr)
Inventor
曹木勇
Original Assignee
Tencent Technology (Shenzhen) Company Limited
Application filed by Tencent Technology (Shenzhen) Company Limited
Publication of WO2023098332A1
Priority to US18/223,711 (published as US20230364513A1)

Classifications

    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63F: CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00: Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/20: Input arrangements for video game devices
    • A63F13/21: Input arrangements for video game devices characterised by their sensors, purposes or types
    • A63F13/215: Input arrangements for video game devices characterised by their sensors, purposes or types comprising means for detecting acoustic signals, e.g. using a microphone
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S7/00: Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30: Control circuits for electronic adaptation of the sound field
    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63F: CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00: Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/40: Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment
    • A63F13/42: Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle
    • A63F13/424: Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle involving acoustic input signals, e.g. by using the results of pitch or rhythm extraction or voice recognition
    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63F: CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00: Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/50: Controlling the output signals based on the game progress
    • A63F13/54: Controlling the output signals based on the game progress involving acoustic signals, e.g. for simulating revolutions per minute [RPM] dependent engine sounds in a driving game or reverberation against a virtual wall
    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63F: CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00: Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60: Methods for processing data by generating or executing the game program
    • A63F2300/6063: Methods for processing data by generating or executing the game program for sound processing

Definitions

  • The present application relates to the field of computer technology, specifically to the field of artificial intelligence, and in particular to an audio processing method, an audio processing apparatus, an audio processing device, a computer-readable storage medium, and a computer program product.
  • The game scene supports collecting a game user's voice: the game user's voice audio is obtained and transmitted to other game users in the game, so as to realize voice communication among multiple game users in the game scene.
  • An embodiment of the present application provides an audio processing method, the method comprising: acquiring the voice audio of a first game user and the spatial position information of a first virtual object manipulated by the first game user in a game scene; converting the voice audio of the first game user to obtain target audio matching the character attribute of the first virtual object; and sending the target audio and the spatial position information of the first virtual object to a second game user.
  • An embodiment of the present application further provides another audio processing method, the method comprising:
  • receiving the target audio of the first game user and the spatial position information of the first virtual object, the first virtual object being the virtual object manipulated by the first game user in the game scene, and the target audio being audio that is obtained by converting the voice audio of the first game user and that matches the character attribute of the first virtual object; and
  • playing the target audio according to the spatial position information of the first virtual object, wherein the first virtual object and a second virtual object are in the same game scene, and the second virtual object is a virtual object manipulated by a second game user in the game scene.
  • An embodiment of the present application provides an audio processing device, which includes:
  • An acquisition unit configured to acquire the voice audio of the first game user and the spatial position information of the first virtual object manipulated by the first game user in the game scene;
  • a processing unit configured to convert the voice audio of the first game user to obtain target audio matching the character attribute of the first virtual object;
  • the processing unit is further configured to send the target audio and the spatial position information of the first virtual object to the second game user, so that the second game user plays the target audio according to the spatial position information of the first virtual object, wherein the second virtual object manipulated by the second game user is in the same game scene as the first virtual object.
  • An embodiment of the present application further provides an audio processing device, which includes:
  • a receiving unit configured to receive the target audio of the first game user and the spatial position information of the first virtual object, the first virtual object being a virtual object manipulated by the first game user in the game scene, and the target audio being audio that is obtained by converting the voice audio of the first game user and that matches the character attribute of the first virtual object; and
  • a processing unit configured to play the target audio according to the spatial position information of the first virtual object, wherein the first virtual object and the second virtual object are in the same game scene, and the second virtual object is the virtual object manipulated by the second game user in the game scene.
  • An embodiment of the present application provides an audio processing device, and the audio processing device includes: a processor and a computer-readable storage medium. A computer program is stored in the computer-readable storage medium; when the computer program is executed by the processor, the above audio processing method is implemented.
  • An embodiment of the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and the computer program is adapted to be loaded by a processor to perform the above audio processing method.
  • An embodiment of the present application further provides a computer program product or computer program, where the computer program product or computer program includes computer instructions stored in a computer-readable storage medium.
  • A processor of an audio processing device reads the computer instructions from the computer-readable storage medium; when the computer instructions are executed by the processor, the above audio processing method is implemented.
  • FIG. 1 shows a schematic architecture diagram of an audio processing system provided by an exemplary embodiment of the present application
  • FIG. 2 shows a schematic flowchart of an audio processing method provided by an exemplary embodiment of the present application
  • FIG. 3 shows a schematic flowchart of analog-to-digital conversion provided by an exemplary embodiment of the present application
  • FIG. 4 shows a schematic diagram of prompting to turn on a microphone provided by an exemplary embodiment of the present application
  • FIG. 5 shows a schematic diagram of a game scene in which the target point is a camera provided by an exemplary embodiment of the present application
  • FIG. 6 shows a schematic diagram of converting a time-domain signal into a frequency-domain signal according to an exemplary embodiment of the present application
  • FIG. 7a shows a schematic diagram of transmitting the target audio and the spatial position information of a first virtual object over two different data channels provided by an exemplary embodiment of the present application
  • FIG. 7b shows a schematic diagram of transmitting the target audio and the spatial position information of a first virtual object over the same data channel provided by an exemplary embodiment of the present application
  • FIG. 8 shows a schematic flowchart of an audio processing method provided by an exemplary embodiment of the present application
  • FIG. 9 shows a schematic diagram of a mapping relationship between distance information and volume information provided by an exemplary embodiment of the present application
  • FIG. 10 shows a schematic flowchart of an audio processing method provided by an exemplary embodiment of the present application
  • FIG. 11a shows a schematic flowchart of a method for performing audio processing by a source terminal according to an exemplary embodiment of the present application
  • FIG. 11b shows a schematic flowchart of a method for performing audio processing by a cloud forwarding server provided by an exemplary embodiment of the present application
  • FIG. 11c shows a schematic flowchart of a method for performing audio processing by a target terminal provided by an exemplary embodiment of the present application
  • FIG. 12 shows a schematic structural diagram of an audio processing device provided by an exemplary embodiment of the present application
  • FIG. 13 shows a schematic structural diagram of an audio processing device provided by an exemplary embodiment of the present application
  • FIG. 14 shows a schematic structural diagram of an audio processing device provided by an exemplary embodiment of the present application
  • Relevant game scenarios provide users with a relatively simple and direct voice audio processing mode, that is, after a game user's voice audio is encoded, it is directly transmitted to other game users for playback.
  • This makes the sound effect flat when the voice audio is played, unable to reflect the three-dimensional spatial relationship between the game characters controlled by multiple game users; moreover, since the timbre of the voice audio is similar to the game user's voice in the real world, the voice audio lacks privacy.
  • Embodiments of the present application provide an audio processing method, apparatus, device, medium, and program product, which can improve the three-dimensional sense of voice audio in game scenes and enhance the privacy of voice audio.
  • The audio processing system includes multiple terminals (such as terminal 101, terminal 102, ...) and servers (such as server 103, server 104, and server 105); the embodiment of the present application does not limit the number of terminals and servers.
  • Terminals may include but are not limited to: smart phones (such as Android phones, iOS phones, etc.), tablet computers, portable personal computers, mobile Internet devices (MID), smart TVs, vehicle-mounted devices, head-mounted devices, and other audio processing devices with touch screens.
  • An application program (which may be referred to as an application for short, such as a game application, a social application, a video application, a web application, a game applet deployed in any application, etc.) may run on the terminal.
  • Servers may include, but are not limited to: data processing servers, web servers, application servers, cloud servers, and other devices with complex computing capabilities.
  • the server may be a background server of any application, and is used for interacting with a terminal running the application to provide computing and application service support for the application.
  • the server may be an independent physical server, or a server cluster or a distributed system composed of multiple physical servers.
  • the terminal and the server may be directly or indirectly connected by wire or wirelessly, and this embodiment of the present application does not limit the connection method between the terminal and the server.
  • the so-called game scene may refer to a three-dimensional space scene provided by the target game and supporting one or more game players (or called game users) to play.
  • The game scenes provided by the target game may include: a scene where a virtual object (that is, a character controlled by a player in the target game) drives a vehicle (such as a car, a boat, etc.), a scene where a virtual object shoots a gun, a scene where a virtual object parachutes, and so on.
  • the target game may include but not limited to: client game, web game, applet game, cloud game, arcade game, remote control game and so on.
  • The so-called audio refers to all sounds that humans can hear; audio is widely used in various fields due to its advantages of high synchronization and strong interactivity, for example in the field of games. For example, assuming that a game scene includes game user 1 and game user 2, the voice audio of game user 1 can be collected and sent to game user 2, so as to realize information communication between multiple game users in the game scene.
  • The general principle of the audio processing solution proposed in the embodiment of the present application may include: if the voice audio of a first game user (for example, any game user) in the game scene is obtained, the voice audio is converted so that the target audio obtained after the conversion matches the character attributes of the first virtual object; this not only ensures that the target audio can accurately convey the content that the first game user wants to express, but also adjusts the timbre of the target audio to a timbre matching the character attributes of the first virtual object, which avoids exposing the real voice of the first game user and improves the privacy and interest of the voice.
  • The spatial position information of the first virtual object manipulated by the first game user in the game scene can also be obtained, so that when the target audio is played based on the spatial position information of the first virtual object, the spatial position of the first virtual object in the game scene can be represented, providing the second game user (any game user other than the first game user among the game users participating in the target game) with a more realistic three-dimensional sense of space.
  • The audio processing solution may be jointly executed by the source terminal used by the first game user, the target terminal used by the second game user, and the server; or jointly executed by the target application (for example, any application) running in the source terminal used by the first game user, the target application running in the target terminal used by the second game user, and the background server corresponding to the target application. For convenience of illustration, the following takes the case where the source terminal, the target terminal, and the server jointly execute the audio processing scheme as an example.
  • the source terminal used by the first game user may be terminal 101
  • the target terminal used by the second game user may be terminal 102
  • The server may be a cloud server, and may specifically include: a cloud configuration server 103, a cloud signaling server 104, and a cloud data transmission server 105; the three kinds of cloud servers are briefly introduced below:
  • The cloud configuration server 103 can provide configuration services for the target game, specifically configuration resources for running the target game. For example, when the first game user uses the terminal 101 to open the target game, the terminal 101 sends a data configuration request to the cloud configuration server 103; the data configuration request is used to request the cloud configuration server 103 to return the configuration resources needed to initialize the target game, so that the terminal 101 initializes the target game based on the configuration resources.
  • the cloud signaling server 104 is used to implement communication connections between multiple game users (or multiple terminals used by multiple game users) participating in the target game.
  • Status updates (such as updates of the network status of each terminal) can be implemented through the cloud signaling server; for example, game user 1, game user 2, and game user 3 participate in the same game scene. If it is detected that the terminal used by game user 1 is disconnected from the cloud signaling server 104, that is, game user 1 goes offline, the cloud signaling server 104 sends a notification message to game user 2 and game user 3 to notify them that game user 1 has gone offline.
  • the cloud data transmission server 105 is used to implement data forwarding between multiple game users (or multiple terminals used by multiple game users) participating in the target game.
  • For example, the cloud data transmission server 105 can forward the target audio of the first game user sent by the terminal 101 to the terminal 102.
  • the above is just a brief introduction to the three cloud servers, and the three cloud servers will be further introduced in combination with specific embodiments later.
  • The number of second game users in the same game scene as the first game user can be at least two; since the audio processing flow between any second game user and the first game user is the same, the following takes one second game user as an example to introduce the audio processing solution.
  • The cloud configuration server 103, cloud signaling server 104, and cloud data transmission server 105 mentioned above are independent cloud servers, and the terminal can interact with any one or more of the three cloud servers as required.
  • the embodiment of the present application may also involve other types of cloud servers, and the embodiment of the present application does not limit the type and quantity of cloud servers.
  • the embodiment of the present application proposes a more detailed audio processing method, and the audio processing method proposed in the embodiment of the present application will be described in detail below with reference to the accompanying drawings.
  • FIG. 2 shows a schematic flowchart of an audio processing method provided by an exemplary embodiment of the present application; the embodiment of the present application takes the audio processing method being executed by the above-mentioned source terminal as an example for illustration. The audio processing method may include but is not limited to steps S201-S204:
  • S201: Acquire the voice audio of the first game user.
  • The voice audio of the first game user refers to: the digital signal obtained by performing sound collection processing on the analog signal captured by a microphone; the analog signal captured by the microphone is obtained by the microphone collecting the sound of the physical environment where the first game user is located.
  • The microphone may be deployed in the source terminal used by the first game user, or the microphone may be an external device connected to the source terminal. Specifically, when the microphone is in the on state, the microphone can collect the sound in the physical environment of the first game user to obtain an analog signal; sound collection processing is then performed on the collected analog signal to convert the analog signal into a digital signal that can be processed by the device.
  • An analog signal, also known as a continuous signal, represents signals and information by a continuously changing physical quantity; for example, the amplitude, frequency, or phase of the signal changes continuously over time.
  • A digital signal, also called a discrete signal, is, in contrast to an analog signal, a signal whose values are discrete and discontinuous.
  • The digital signal is obtained by performing sound collection processing on the analog signal; specifically, it can be generated by sampling, quantizing, and encoding the analog signal using pulse code modulation (PCM).
  • The following briefly introduces the process of converting an analog signal into a digital signal with reference to the flow diagram of analog-to-digital conversion shown in FIG. 3. As shown in FIG. 3, first, the continuously changing analog signal is sampled to obtain discrete sampling values; sampling refers to the process of periodically scanning the analog signal and changing the time-continuous signal into a time-discrete signal. Secondly, the discrete sampling values obtained by sampling are quantized.
  • Quantization refers to the process of discretizing the instantaneous values obtained by sampling, that is, using a set of specified levels and expressing each instantaneous value with the closest level value, usually in binary. Finally, the quantized values are encoded to obtain a digital signal.
  • Encoding uses a set of binary codes to mark each quantized value of a fixed level. It should be understood that the waveform of the analog signal and the values of the horizontal and vertical coordinates shown in FIG. 3 are merely exemplary.
  • Through the above analog-to-digital conversion process, the analog signal can be converted into a digital signal that can be processed by the source terminal, that is, the voice audio of the first game user is obtained.
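  • As an illustrative aid (not part of the original disclosure), the following Python sketch simulates the three PCM steps described above: sampling, quantization, and encoding. The sampling rate, bit depth, and test tone are assumed values chosen for the example.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed sampling rate (Hz)
BIT_DEPTH = 16       # assumed number of sampling bits

def pcm_encode(duration_s: float = 0.01, tone_hz: float = 440.0) -> bytes:
    """Simulate PCM: sample a continuous tone, quantize it, encode it to bytes."""
    # Sampling: periodically scan the "analog" signal at discrete instants.
    t = np.arange(0, duration_s, 1.0 / SAMPLE_RATE)
    analog = np.sin(2 * np.pi * tone_hz * t)  # stand-in for the microphone signal

    # Quantization: express each instantaneous value with the closest of 2^16 levels.
    max_level = 2 ** (BIT_DEPTH - 1) - 1
    quantized = np.round(analog * max_level).astype(np.int16)

    # Encoding: represent each quantized level as fixed-width binary.
    return quantized.tobytes()

digital_signal = pcm_encode()
print(f"{len(digital_signal)} bytes of PCM data")  # 160 samples * 2 bytes = 320
```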
  • The embodiment of the present application also supports prompting the first game user to turn on the microphone. For example, a prompt message is output on the display screen of the source terminal; the prompt message is used to prompt the first game user to turn on the microphone so as to collect the voice audio of the first game user, and may be, for example, "Please turn on the microphone", and so on.
  • When the first game user successfully logs in to the target game using identity information (such as a game account, password, fingerprint information, face information, etc.), if it is detected that the microphone deployed on the source terminal is not turned on, a prompt message (prompt message 401 as shown in FIG. 4) is output on the display screen, so that the first game user can perform the operation of turning on the microphone after seeing the prompt message. After the turn-on operation is detected, the steps of initializing the relevant parameters of the microphone are performed, such as setting the acquisition rate of the microphone (also known as the sampling frequency, that is, the number of samples collected per unit time), the number of channels (that is, the number of sound wave streams generated each time sound is collected), and the number of sampling bits (that is, the number of bits used for each sampling point), so as to start the microphone.
  • S202: Obtain the spatial position information of the first virtual object manipulated by the first game user in the game scene.
  • the first virtual object is a game character controlled by the first game user in the game scene, and the actions performed by the game character (such as shooting, jumping, running, etc.) are all controlled by the first game user.
  • The spatial position information of the first virtual object in the game scene can be used to represent the three-dimensional position information of the first virtual object in the game scene, namely (X, Y, Z), where X, Y, and Z correspond to the distances in three directions, in meters (or centimeters, kilometers, and other units); the spatial position information in the game scene can be restored to position information in the real world according to a certain proportion.
  • The spatial position information of the first virtual object may include two types. In one implementation manner, the spatial position information of the first virtual object may include: the target coordinates of the first virtual object determined based on the coordinate origin in the game scene. In another implementation manner, the spatial position information of the first virtual object may include: the target distance information and orientation information between the first virtual object and the second virtual object in the game scene.
  • the second virtual object here is a game character controlled by the second game user and in the same game scene as the first virtual object.
  • The first virtual object and the second virtual object being in the same game scene may mean that the first game user who manipulates the first virtual object and the second game user who manipulates the second virtual object enter the same game scene in the target game (or, simply understood, enter the same game room).
  • The game scene often includes multiple frames of game screens, and the first virtual object and the second virtual object in the same game scene are not necessarily displayed simultaneously in every frame of the game scene; that is, depending on how the game is played, one frame of the game scene may include only the first virtual object or only the second virtual object, but the first virtual object and the second virtual object are still in the same game scene.
  • the spatial position information of the first virtual object includes: target coordinates of the first virtual object determined based on the coordinate origin in the game scene.
  • The implementation of obtaining the spatial position information of the first virtual object manipulated by the first game user in the game scene may include: first determining the target point in the game scene as the coordinate origin; then establishing a spatial coordinate system according to the coordinate origin; finally, generating the target coordinates of the first virtual object based on the spatial coordinate system.
  • The target point in the game scene may include: a camera (also called a video camera) or a light source point; the camera in the game scene is similar to human eyes and can be used to observe the game scene; the light source point is used to illuminate the game scene, so that physical shadows can be generated in the game scene, increasing the realism and three-dimensional sense of the game scene.
  • the camera and the light source point in the game scene may be located at the same or different positions, which is not limited in this embodiment of the present application.
  • An exemplary game scene in which the target point is a camera can be seen in FIG. 5. As shown in FIG. 5, a spatial coordinate system is established with the camera as the coordinate origin, and the target coordinates (i.e., the spatial position information) of the first virtual object are, for example, (2, 10, 0). It is not difficult to understand that, depending on the position of the camera in the game scene, or on the direction of the spatial coordinate system established based on the camera, the spatial position information of the first virtual object in the game scene differs; the embodiment of the present application does not limit the specific value of the spatial position information of the first virtual object.
  • the spatial position information of the first virtual object includes: target distance information and orientation information between the first virtual object and the second virtual object.
  • The implementation of obtaining the spatial position information of the first virtual object manipulated by the first game user in the game scene may include: first obtaining the first position information of the first virtual object in the game scene and the second position information of the second virtual object in the game scene; then performing distance calculation on the first position information and the second position information to obtain the target distance information between the first virtual object and the second virtual object; and then performing orientation calculation on the first position information and the second position information to obtain the orientation information between the first virtual object and the second virtual object.
  • The first position information of the first virtual object in the game scene may refer to the target coordinates (or first coordinates) of the first virtual object in the game scene determined based on the coordinate origin as mentioned in the foregoing embodiments; similarly, the second position information of the second virtual object in the game scene may refer to the second coordinates of the second virtual object in the game scene determined based on the coordinate origin.
  • The first virtual object or the second virtual object can also be directly used as the target point to establish the spatial coordinate system; for example, when the first virtual object is used as the target point to establish the spatial coordinate system, the first coordinates of the first virtual object are (0, 0, 0) by default, and only the second coordinates of the second virtual object in the spatial coordinate system need to be calculated. To a certain extent, this can reduce the amount of calculation for the spatial position information and improve data processing efficiency.
  • For example, performing distance calculation on the first coordinates and the second coordinates can obtain the target distance information between the first virtual object and the second virtual object, which is, for example, about 11.7.
  • the embodiment of the present application does not limit the specific implementation of the distance calculation between the first virtual object and the second virtual object.
  • The orientation information between the first virtual object and the second virtual object can be obtained by performing orientation calculation on the first coordinates and the second coordinates; for example: in the x-axis direction, the first virtual object is closer to the coordinate origin than the second virtual object; in the y-axis direction, the first virtual object is farther from the coordinate origin than the second virtual object; and in the z-axis direction, the first virtual object is closer to the origin than the second virtual object.
  • The embodiment of the present application introduces the frontal orientation of the second virtual object to express the orientation information between the first virtual object and the second virtual object.
  • For example, if the front of the second virtual object faces the positive direction of the y-axis, the orientation information between the first virtual object and the second virtual object can be expressed as: the first virtual object is located at the upper left of the second virtual object, about 30° from the side.
  • If the front of the second virtual object faces in another direction, the orientation information between the first virtual object and the second virtual object can be expressed differently; for example, if the front of the second virtual object faces the negative direction of the x-axis, the orientation information between the first virtual object and the second virtual object may be expressed as: the first virtual object is located about 60° above and to the right of the second virtual object.
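  • To make the distance and orientation calculations concrete, here is a small Python sketch (not from the original disclosure; the coordinates are assumed example values chosen so the distance comes out to about 11.7, as in the text, and the facing direction is assumed to be the positive y-axis):

```python
import math

def spatial_relation(first_xyz, second_xyz):
    """Compute the target distance and a horizontal azimuth between two objects."""
    dx, dy, dz = (a - b for a, b in zip(first_xyz, second_xyz))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)  # Euclidean distance
    # Azimuth of the first object as seen from the second object, measured
    # from the second object's assumed facing direction (positive y-axis);
    # negative values are to the left.
    azimuth_deg = math.degrees(math.atan2(dx, dy))
    return distance, azimuth_deg

# Assumed coordinates: first object at (2, 10, 0), second object at (8, 0, 0).
distance, azimuth = spatial_relation((2.0, 10.0, 0.0), (8.0, 0.0, 0.0))
print(f"distance = {distance:.1f}, azimuth = {azimuth:.0f} degrees")  # 11.7, -31
```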
  • S203: Perform conversion processing on the voice audio of the first game user to obtain target audio matching the character attribute of the first virtual object.
  • Since the voice audio of the first game user is obtained by performing sound collection processing on the first game user's voice captured by the microphone, the timbre of the voice audio is similar to the timbre of the first game user's real voice. If the voice audio of the first game user is played directly, the second game user is likely to recognize the real identity of the first game user based on the timbre of the voice audio, resulting in exposure of the first game user's real identity.
  • Therefore, the embodiment of the present application supports converting the voice audio of the first game user, so that the timbre of the target audio obtained by the conversion differs from the timbre of the voice audio; this ensures that the second game user cannot recognize the real identity of the first game user based on the target audio, and improves the privacy and fun of the voice.
  • The step of converting the voice audio of the first game user may include but is not limited to steps s11-s13, wherein:
  • s11: Perform a first transform process on the voice audio of the first game user, and extract the frequency domain information of the voice audio of the first game user.
  • The sounds produced in the natural environment are formed by the combination (or superposition) of a series of vibrations with different frequencies and amplitudes emitted by the sounding object (or simply the sounding body, such as the first game user).
  • The sound produced by the vibration with the lowest frequency among the multiple vibrations is called the fundamental tone.
  • The fundamental tone is often the sound produced by the overall vibration of the sounding object; it determines the pitch of the sound and is used to express the main content of the sound.
  • The sounds other than the sound produced by the vibration with the lowest frequency are called overtones.
  • Overtones are often the sounds produced by the vibration of parts of the sounding object, and they determine the timbre of the sound (such as an immature timbre, a deep timbre, a rough timbre, etc.).
  • The voice audio of the first game user is a time-domain signal formed by the superposition of the fundamental tone and the overtones corresponding to at least one frequency.
  • The waveform of the time-domain signal reflected on the coordinate axes is continuous over time; the abscissa of the coordinate axes is time, and the ordinate is the change of the signal.
  • The frequency domain information includes: fundamental tone domain information obtained by frequency-converting the fundamental tone in the voice audio, and overtone domain information obtained by frequency-converting the overtones in the voice audio.
  • The first transform processing mentioned above refers to Fourier transform processing (or simply Fourier transform); the Fourier transform is a technique for converting a signal into its frequencies, that is, a transform method for converting from the time domain to the frequency domain.
  • An exemplary schematic diagram of transforming a time-domain signal into a frequency-domain signal can be seen in FIG. 6. As shown in FIG. 6, each frequency in the waveform of the time-domain signal is separated out; by mapping each frequency value onto the abscissa and the amplitude value corresponding to that frequency onto the ordinate, the frequency-domain signal corresponding to the time-domain signal can be obtained.
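  • The first transform (s11) can be sketched with a discrete Fourier transform, as in the following Python example (illustrative only, not part of the original disclosure; the sampling rate and test frame are assumed):

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed sampling rate (Hz)

def to_frequency_domain(frame: np.ndarray):
    """First transform: real FFT of one audio frame (time domain to frequency domain)."""
    spectrum = np.fft.rfft(frame)  # complex amplitude for each frequency bin
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / SAMPLE_RATE)
    return freqs, spectrum

# Assumed test frame: a 200 Hz fundamental plus a weaker 400 Hz overtone.
t = np.arange(1024) / SAMPLE_RATE
frame = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 400 * t)

freqs, spectrum = to_frequency_domain(frame)
peak_hz = freqs[np.argmax(np.abs(spectrum))]
print(f"strongest component near {peak_hz:.0f} Hz (the fundamental)")
```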
  • s12: Modify the overtone domain information according to the character attribute of the first virtual object to obtain modified overtone domain information.
  • As described above, the frequency domain information of the voice audio includes fundamental tone domain information and overtone domain information; the fundamental tone domain information determines the content that the first game user wants to express, and the overtone domain information determines the timbre of the first game user's voice.
  • To change the timbre without changing the content, the overtone domain information may be modified according to the character attribute of the first virtual object.
  • The specific implementation process may include: first obtaining the audio configuration information corresponding to the character attribute of the first virtual object, the audio configuration information including overtone configuration information; then modifying the overtone domain information according to the overtone configuration information to obtain the modified overtone domain information.
  • The audio configuration information can be used to modify the overtone domain information (for example, the overtone frequency segments among the frequency segments corresponding to the frequency domain information).
  • The modification here may refer to gaining (e.g., amplifying the amplitude values in the overtone domain information) or attenuating (e.g., reducing the amplitude values in the overtone domain information) the overtone domain information, to obtain the modified overtone domain information.
  • The character attribute of the first virtual object may include but is not limited to: an age attribute, a gender attribute, an appearance attribute, etc.; virtual objects with different character attributes correspond to voices with different timbres.
  • The audio configuration information corresponding to the character attribute of the first virtual object is determined according to the character attribute of the first virtual object; for example, the audio configuration information 1 corresponding to character attribute 1 of the first virtual object is different from the audio configuration information 2 corresponding to character attribute 2 that includes "60 years old, female"; for instance, the timbre represented by the audio configuration information 1 is more tender and crisp than the timbre represented by the audio configuration information 2.
  • In one implementation, the audio configuration information for different character attributes is set in advance by business personnel; in this case, the audio configuration information corresponding to the first virtual object can be determined according to the character attributes selected or configured by the first game user.
  • In another implementation, the audio configuration information corresponding to the character attribute of the first virtual object can be generated according to the game scene; in this implementation, after the voice audio of multiple game players in the game scene is modified, the timbre of the modified voice audio is identical.
  • In either case, the timbre indicated by the modified overtone domain information differs from the timbre of the first game user's real voice, improving the privacy of the voice.
  • When the audio configuration information is determined according to the character configuration of the first virtual object, the modified overtone domain information differs across the corresponding audio configuration information, which gives the voices of multiple game users different timbres; to a certain extent, this can realize the uniqueness of game voices in the game scene, improve the interest of the target game, and thereby increase game user stickiness.
  • s13: Fuse the fundamental tone domain information and the modified overtone domain information, and perform a second transform process on the fused frequency domain information to obtain target audio that matches the character attribute of the first virtual object.
  • Since the fused frequency domain information is still in the frequency domain, the embodiment of the present application further performs a second transform process on the fused frequency domain information, so that the frequency domain information is transformed into target audio in the time domain.
  • The second transform processing is an inverse Fourier transform, and the inverse Fourier transform transforms a frequency-domain signal into a time-domain signal.
  • The processing of the inverse Fourier transform is similar to that of the aforementioned Fourier transform and will not be described in detail in this embodiment of the present application.
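  • Putting steps s11-s13 together, the following Python sketch shows one possible shape of the whole conversion (an illustration under assumptions, not the patent's implementation: the boundary between fundamental and overtone bins is a fixed cutoff here, whereas a real system would track the pitch per frame, and the overtone gain stands in for the overtone configuration information):

```python
import numpy as np

SAMPLE_RATE = 16000          # assumed sampling rate (Hz)
FUNDAMENTAL_CUTOFF_HZ = 300  # assumed split between fundamental and overtone bins

def convert_timbre(frame: np.ndarray, overtone_gain: float) -> np.ndarray:
    """Sketch of s11-s13: FFT, modify the overtone bins, inverse FFT."""
    spectrum = np.fft.rfft(frame)                  # s11: first transform
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / SAMPLE_RATE)

    overtone_bins = freqs >= FUNDAMENTAL_CUTOFF_HZ
    spectrum[overtone_bins] *= overtone_gain       # s12: gain/attenuate overtones

    return np.fft.irfft(spectrum, n=len(frame))    # s13: second transform

# Assumed usage: amplify the overtones by 1.8x to brighten the timbre; the
# fundamental bins, which carry the spoken content, are left untouched.
t = np.arange(1024) / SAMPLE_RATE
voice_frame = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 400 * t)
target_frame = convert_timbre(voice_frame, overtone_gain=1.8)
```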
  • Through steps s11-s13, the voice audio of the first game user can be converted to obtain target audio with a changed timbre; that is, the conversion process changes the timbre of the voice audio: the voice audio, whose timbre is the same as the first game user's real voice, is transformed into target audio whose timbre matches the character attribute of the first virtual object. On the premise of accurately conveying the content that the first game user wants to express, changing the timbre of the transmitted voice makes it difficult for the second game user participating in the same game scene as the first game user to perceive the real identity of the first game user, which improves the interest of the target game and increases game user stickiness.
  • S204: Send the target audio and the spatial position information of the first virtual object to the second game user.
  • The embodiment of the present application supports sending the target audio and the spatial position information of the first virtual object to the second game user, where the second virtual object manipulated by the second game user is in the same game scene as the first virtual object manipulated by the first game user.
  • In this way, the second game user can play the target audio according to the spatial position information of the first virtual object; specifically, the target audio is played based on the target distance information and the orientation information between the first virtual object and the second virtual object.
  • For example, when the spatial position information of the first virtual object indicates that the distance between the first virtual object and the second virtual object is relatively short, the target audio is played at a higher volume, so that the second game user understands that the distance between the first virtual object and the second virtual object is short; conversely, when the distance between the first virtual object and the second virtual object is relatively large, the target audio is played at a lower volume, so that the second game user understands that the distance between the first virtual object and the second virtual object is far.
  • For another example, when the spatial position information of the first virtual object indicates that the first virtual object is located directly behind the second virtual object (or in another direction), the second game user perceives the sound source as being directly behind when the target audio is played; this gives the second game user a relatively three-dimensional auditory experience and improves the authenticity of the game scene.
  • The embodiment of the present application supports using mutually independent data channels to separately send the target audio and the spatial position information of the first virtual object to the second game user; alternatively, the same data channel is used to send the target audio and the spatial position information of the first virtual object to the second game user.
  • The two transmission methods are introduced below, where:
  • Method 1: First, the target audio is encoded to generate a first audio data packet, and the first audio data packet is sent to the second game user through the first data channel.
  • The encoding here is not the same as the encoding in the aforementioned pulse code modulation; the encoding here uses a compression algorithm to compress the target audio and reduce the space it occupies, which can improve the efficiency and speed of data transmission and reduce the energy consumption of data transmission.
  • A compression algorithm is an algorithm for data compression, often also called signal coding in the electronics and communication fields.
  • Compression algorithms may include but are not limited to: dictionary algorithms, fixed bit length packing, run-length encoding (RLE), etc.; a minimal sketch of one such algorithm follows.
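  • As one concrete illustration of the compression step (an assumed minimal sketch, not necessarily the encoder the patent uses), run-length encoding replaces repeated bytes with (count, value) pairs:

```python
def rle_encode(data: bytes) -> bytes:
    """Minimal run-length encoding: emit (run length, byte value) pairs."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1  # extend the run, capped at 255 so it fits in one byte
        out += bytes((run, data[i]))
        i += run
    return bytes(out)

# Silence (long runs of identical bytes) compresses well under RLE.
packet = rle_encode(b"\x00" * 100 + b"\x7f\x7f\x05")
print(len(packet))  # 6 bytes: (100, 0x00), (2, 0x7f), (1, 0x05)
```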
  • Secondly, the second data channel is used to send the spatial position information of the first virtual object to the second game user; specifically, a second audio data packet is generated based on the spatial position information of the first virtual object, and the second audio data packet is sent through the second data channel.
  • The second audio data packet generated based on the spatial position information can also be encoded, and the encoded second audio data packet is then sent through the second data channel.
  • The first data channel and the second data channel are different.
  • An exemplary schematic diagram of using two different data channels to respectively transmit the target audio and the spatial position information of the first virtual object can be seen in FIG. 7a. As shown in FIG. 7a, the terminal 101 controlled by the first game user can use the first data channel to send the first audio data packet to the cloud data transmission server 105, so that the cloud data transmission server 105 uses the first data channel to forward the first audio data packet to the terminal 102 controlled by the second game user; the terminal 101 controlled by the first game user uses the second data channel to send the second audio data packet to the cloud data transmission server 105, so that the cloud data transmission server 105 uses the second data channel to forward the second audio data packet to the terminal 102 controlled by the second game user.
  • It should be noted that the embodiment of the present application does not limit the order of sending the target audio and the spatial position information of the first virtual object. That is, the first data channel can be used to send the target audio to the second game user first, and then the second data channel used to send the spatial position information of the first virtual object; or the second data channel can be used to send the spatial position information of the first virtual object first, and then the first data channel used to send the target audio; or the first data channel can be used to send the target audio and, at the same time, the second data channel used to send the spatial position information of the first virtual object to the second game user.
  • Method 2: The same data channel is used to send the target audio and the spatial position information of the first virtual object to the second game user. First, the target audio is encoded to generate the first audio data packet.
  • Then, the spatial position information of the first virtual object is appended to the first audio data packet, which may include: appending the spatial position information of the first virtual object to the end or the beginning of the first audio data packet.
  • Finally, the first audio data packet with the appended spatial position information of the first virtual object is sent to the second game user.
  • An exemplary schematic diagram of sending the first audio data packet carrying the spatial position information of the first virtual object to the second game user can be seen in FIG. 7b.
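  • A possible packet layout for Method 2 can be sketched as follows (the field order, types, and trailer format are assumptions for illustration; the patent only specifies that the spatial position information is appended to the beginning or end of the first audio data packet):

```python
import struct

def pack_with_position(encoded_audio: bytes, xyz: tuple) -> bytes:
    """Append the first virtual object's coordinates to the end of the packet.

    Assumed layout: encoded audio payload, then three float32 coordinates,
    then a 2-byte payload length so the receiver can split the packet again.
    """
    x, y, z = xyz
    return encoded_audio + struct.pack("<fffH", x, y, z, len(encoded_audio))

def unpack_with_position(packet: bytes):
    """Recover the encoded audio and the spatial position information."""
    x, y, z, length = struct.unpack("<fffH", packet[-14:])  # 3*4 + 2 = 14 bytes
    return packet[:length], (x, y, z)

packet = pack_with_position(b"encoded-target-audio", (2.0, 10.0, 0.0))
audio, position = unpack_with_position(packet)
```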
  • In summary, the voice audio of the first game user can be converted so that the converted target audio matches the character attributes of the first virtual object; this ensures that the target audio can accurately convey the content the first game user wants to express, while adjusting the timbre of the target audio to a timbre matching the character attribute of the first virtual object avoids exposing the real voice of the first game user and improves the privacy and interest of the voice.
  • In addition, the spatial position information of the first virtual object in the game scene can be obtained, so that when the target audio is played based on the spatial position information of the first virtual object, the spatial position of the first virtual object in the game scene can be represented, providing the second game user with a more realistic three-dimensional sense of space.
  • FIG. 8 shows a schematic flowchart of an audio processing method provided by an exemplary embodiment of the present application; the embodiment of the present application takes the audio processing method being executed by the above-mentioned target terminal as an example for illustration. The audio processing method may include but is not limited to steps S801-S802:
  • S801: Receive the target audio of the first game user and the spatial position information of the first virtual object.
  • The first virtual object is a virtual object manipulated by the first game user in the game scene; the target audio is obtained by converting the voice audio of the first game user and matches the character attribute of the first virtual object.
  • For the specific implementation of converting the voice audio of the first game user to obtain the target audio, refer to the relevant description of step S203 in the embodiment shown in FIG. 2; details are not repeated here.
  • As mentioned above, the first game user can use independent data channels to send the target audio and the spatial position information of the first virtual object, or use the same data channel to send both.
  • When the first game user uses the first data channel to send the target audio and the second data channel to send the spatial position information of the first virtual object, the second game user receives the target audio through the first data channel and receives the spatial position information of the first virtual object through the second data channel; similarly, when the first game user uses the same data channel to send the spatial position information of the first virtual object and the target audio, the second game user receives both through the same data channel.
  • S802: Play the target audio according to the spatial position information of the first virtual object.
  • Specifically, the audio playback information between the first virtual object and the second virtual object is determined based on the spatial position information of the first virtual object; the audio playback information includes audio volume information and audio orientation information. The target audio is then played according to the audio playback information.
  • The audio volume information contained in the audio playback information is determined according to the target distance information between the first virtual object and the second virtual object in the game scene, and is used to indicate the volume at which the target audio is played; the unit of the audio volume information may be decibels, for example, 100 decibels.
  • The audio orientation information contained in the audio playback information is determined according to the orientation information between the first virtual object and the second virtual object in the game scene, and is used to indicate the direction of the sound source when the target audio is played; the orientation information may include the orientation angles of the first game virtual object and the second game virtual object in the game scene, for example, the first game virtual object is located 30° above and to the left of the second game virtual object.
  • the audio playback information includes audio volume information.
  • the implementation of determining the audio volume information based on the spatial position information of the first virtual object may include:
  • target distance information between the first virtual object and the second virtual object is obtained based on the spatial position information of the first virtual object.
  • Depending on the content contained in the spatial position information of the first virtual object, the manner of determining the target distance information differs. For example, when the spatial position information of the first virtual object includes the target coordinates of the first virtual object determined based on the coordinate origin in the game scene, the second coordinates of the second virtual object in the game scene can be determined first, and the target distance information between the first virtual object and the second virtual object then calculated according to the target coordinates of the first virtual object and the second coordinates of the second virtual object; for the method of determining the second coordinates of the second virtual object in the game scene, refer to the related description in the foregoing embodiment shown in FIG. 2, which is not repeated here.
  • When the spatial position information of the first virtual object includes the target distance information between the first virtual object and the second virtual object, the target distance information between the first virtual object and the second virtual object can be obtained directly from the spatial position information.
  • Secondly, the mapping relationship between different distance information and volume information is obtained. It can be understood that different distance information between the first virtual object and the second virtual object maps to different volume information; in this way, the volume of the target audio heard by the second game user also differs.
  • For example, when the distance information indicates that the distance between the first virtual object and the second virtual object is 2 meters, the volume information that has a mapping relationship with the distance information can be 100 decibels (see FIG. 9); when the distance information indicates that the distance between the first virtual object and the second virtual object is 10 meters, the volume information mapped to the distance information may be 20 decibels (see FIG. 9).
  • It should be noted that FIG. 9 is only an exemplary mapping relationship between distance information and volume information; in practice, the mapping relationship between distance information and volume information may be different from that shown in FIG. 9, and the embodiment of the present application does not limit the mapping relationship between distance information and volume information.
  • Finally, the audio volume information between the first virtual object and the second virtual object is determined according to the mapping relationship and the target distance information. For example, assuming that the target distance information indicates that the distance between the first virtual object and the second virtual object is 6 meters, the target distance information is matched against each piece of distance information in the mapping relationship shown in FIG. 9, and the volume information corresponding to 6 meters is found to be about 33.3 decibels; 33.3 decibels is therefore used as the audio volume information between the first virtual object and the second virtual object.
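  • The three sample points in the text (2 m at 100 decibels, 10 m at 20 decibels, 6 m at about 33.3 decibels) are consistent with a volume inversely proportional to distance, so one way to sketch the mapping is as follows (the constant and the clamp bounds are assumptions; FIG. 9 may define a different curve):

```python
def volume_for_distance(distance_m: float, k: float = 200.0,
                        min_db: float = 0.0, max_db: float = 100.0) -> float:
    """Map target distance to playback volume, assuming volume = k / distance.

    k = 200 reproduces the sample points in the text: 2 m -> 100 dB,
    6 m -> about 33.3 dB, 10 m -> 20 dB.
    """
    return max(min_db, min(max_db, k / distance_m))

print(volume_for_distance(6.0))  # about 33.3
```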
  • the audio playback information includes audio orientation information.
  • Specifically, the spatial position information of the first virtual object may include: the target coordinates of the first virtual object determined in the game scene based on the coordinate origin, or the orientation information between the first virtual object and the second virtual object. Depending on the content contained in the spatial position information of the first virtual object, the manner of determining the audio orientation information differs.
  • When the spatial position information of the first virtual object includes the target coordinates of the first virtual object determined in the game scene based on the coordinate origin, the second coordinates of the second virtual object can be determined in the game scene first, and the audio orientation information between the first virtual object and the second virtual object is then calculated according to the target coordinates of the first virtual object and the second coordinates of the second virtual object. For the implementation of determining the second coordinates of the second virtual object in the game scene, and for the implementation of determining the audio orientation information according to the second coordinates of the second virtual object and the target coordinates of the first virtual object, refer to the relevant descriptions in the embodiment shown in FIG. 2 above; details are not repeated here.
  • When the spatial position information of the first virtual object includes the orientation information between the first virtual object and the second virtual object, the orientation information can be obtained directly from the spatial position information and used as the audio orientation information.
  • After determining the audio volume information and the audio orientation information, the embodiment of the present application plays the target audio according to the audio volume information and the audio orientation information, so that the played target audio can reflect the distance and direction between the first virtual object and the second virtual object in the game scene.
  • Depending on the playback device, the implementation of playing the target audio according to the audio volume information and the audio orientation information differs. The following takes as examples the case in which the physical environment of the second game user contains multiple speakers, and the case in which the target terminal held by the second game user can call a target acoustic function, where:
  • When the physical environment of the second game user contains multiple speakers, the multiple speakers can be adjusted first so that, when playing the target audio, the adjusted speakers can reflect the direction between the first virtual object and the second virtual object; the target audio is then played according to the audio volume information through the adjusted speakers.
  • the adjustment of the multiple speakers may include: adjusting the placement position, playback mode, or power of the multiple speakers; the embodiment of the present application does not limit the specific adjustment method.
  • In this way, the distance between the first virtual object and the second virtual object can be reflected when the target audio is played according to the audio volume information, and the direction or orientation between the first virtual object and the second virtual object can be reflected when the target audio is played through the adjusted speakers, so that the sound produced by the multiple speakers forms a surround-sound effect.
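  • As one simple illustration of steering playback toward a speaker pair, the sketch below applies a constant-power pan law to derive per-speaker gains from an azimuth angle; the pan law and the angle convention are illustrative assumptions rather than an adjustment method specified by the patent:

```python
import math

def speaker_gains(azimuth_deg: float) -> tuple[float, float]:
    """Constant-power panning between a left/right speaker pair.

    azimuth_deg: direction of the first virtual object relative to the
    second virtual object; 0 = straight ahead, negative = to the left,
    positive = to the right (clamped to +/- 90 degrees here).
    Returns (left_gain, right_gain) with left^2 + right^2 == 1.
    """
    az = max(-90.0, min(90.0, azimuth_deg))
    # Map [-90, +90] degrees onto [0, pi/2] for the sine/cosine pan law.
    theta = (az + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)

left, right = speaker_gains(-30.0)   # source ahead-left: left gain is larger
```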
  • When the target terminal held by the second game user can call a target acoustic function, the target acoustic function can be called to filter the target audio to obtain filtered target audio, so that the human ear can perceive in which direction of the second virtual object the first virtual object is located in the game scene; the filtered target audio is then played according to the audio volume information.
  • In this way, the distance between the first virtual object and the second virtual object can be perceived according to the audio volume information, and the direction between the first virtual object and the second virtual object can be perceived according to the filtered target audio.
  • In one implementation, the target acoustic function may include a Head-Related Transfer Function (HRTF), also known as the anatomical transfer function (ATF); in this case, the sound localization mode may be referred to as the HRTF mode. HRTF uses interaural time delay (ITD), interaural amplitude difference (IAD), and pinna frequency-response cues to process the target audio in real time, so that the processed target audio produces a stereo sound effect; when the sound of the processed target audio reaches the pinna, ear canal, and tympanic membrane of the human ear, the listener perceives a surround-sound sensation.
  • It should be noted that the embodiment of the present application does not limit the target acoustic function to the head-related transfer function; for ease of explanation, the embodiment of the present application introduces the processing of the target audio by taking the head-related transfer function as an example.
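  • To make the ITD and IAD cues concrete, the rough sketch below delays and attenuates the far-ear channel of a mono signal. It is not a full HRTF (there is no pinna or frequency filtering), and the head width, sample rate, and attenuation factor are illustrative assumptions:

```python
import numpy as np

SAMPLE_RATE = 48_000          # samples per second (assumed)
HEAD_WIDTH_M = 0.18           # approximate distance between the ears (assumed)
SPEED_OF_SOUND = 343.0        # meters per second

def render_itd_iad(mono: np.ndarray, azimuth_deg: float) -> np.ndarray:
    """Very rough binaural rendering using only ITD and IAD cues."""
    az = np.radians(np.clip(azimuth_deg, -90.0, 90.0))
    # Interaural time delay: sound reaches the far ear slightly later.
    delay = int(round(abs(np.sin(az)) * HEAD_WIDTH_M / SPEED_OF_SOUND * SAMPLE_RATE))
    # Interaural amplitude difference: the far ear hears it slightly softer.
    far_gain = 1.0 - 0.3 * abs(np.sin(az))
    near = mono
    far = np.concatenate([np.zeros(delay), mono])[: len(mono)] * far_gain
    # Negative azimuth = source on the left, so the left ear is the near ear.
    return np.stack([near, far] if az < 0 else [far, near], axis=1)
```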
  • In summary, the embodiment of the present application can play the target audio according to the audio volume information and the audio orientation information, so that the played target audio can reflect three-dimensional spatial characteristics and provide a three-dimensional auditory experience, further improving the immersive gaming experience of game users.
  • In addition, the voice audio of the first game user is converted so that the converted target audio matches the character attribute of the first virtual object, which ensures that the target audio can accurately convey the content the first game user wants to express; meanwhile, by adjusting the timbre of the target audio to a timbre matching the character attribute of the first virtual object, exposure of the first game user's real voice is avoided, improving the privacy and interest of the voice.
  • The embodiments shown in FIG. 2 and FIG. 8 above respectively introduce in detail the implementations of the audio processing method performed by the source terminal and by the target terminal. The following describes, with reference to FIG. 10, the audio processing method jointly performed by the source terminal, the target terminal, and the servers (such as cloud servers); the audio processing method may include, but is not limited to, steps S1001-S1018:
  • S1001: The source terminal sends a data configuration request to the cloud configuration server.
  • S1002: The source terminal receives the configuration information returned by the cloud configuration server in response to the data configuration request.
  • S1003: The source terminal initializes the target game according to the configuration information.
  • the source terminal sends a data configuration request to the cloud configuration server.
  • a game user-related configuration information may include the configuration resources required to run the target game, and the configuration resources include: configuration parameters (such as the resolution of the game screen, the system framework of the target game, game data, etc.), and the configuration parameters of the first game user in the target game. Character attributes (such as gender, age, etc.) of the first virtual object manipulated in the game scene.
  • the source terminal may initialize the target game based on the configuration resource (eg, load the configuration resource for running the target game), so that the first game user successfully starts the target game.
  • S1004: The source terminal detects a trigger event of entering a game scene of the target game.
  • S1005: The source terminal sends a state change request to the cloud signaling server.
  • S1006: The source terminal receives the feedback result returned by the cloud signaling server.
  • The trigger event that triggers entry into the game scene of the target game may include: an event generated when a trigger operation is performed on the entrance to the game scene in the first game interface of the target game, an event generated when a game invitation from a second game user is received, an event generated when a voice signal for controlling entry into the game scene is received, and so on. When a trigger event is generated, it means that the first game user intends to enter the game scene.
  • At this time, the source terminal can generate a state change request based on the trigger event; the state change request carries the scene identifier of the game scene (such as a desert scene, a competitive scene, a rainforest scene, etc.) and is used to request the cloud signaling server to record the relevant game data of the game scene played by the first game user (such as the game start time, the scene identifier of the game scene, and information about the second game users in the same game scene).
  • the cloud signaling server can return a feedback result to the source terminal in response to the state change request sent by the source terminal.
  • The feedback result can include: whether entry into the game scene succeeds or fails, and the object data (such as nickname, historical game record, and game level) of the second virtual object manipulated by the second game user who is in the same game scene as the first virtual object.
  • S1007: The target terminal sends a data configuration request to the cloud configuration server.
  • S1008: The target terminal receives the configuration information returned by the cloud configuration server in response to the data configuration request.
  • S1009: The target terminal initializes the target game according to the configuration information.
  • For the specific implementation of steps S1007-S1009, reference may be made to the relevant description of the specific implementation shown in steps S1001-S1003; the difference is that the specific implementation shown in steps S1007-S1009 is executed by the target terminal, while that shown in steps S1001-S1003 is executed by the source terminal. The specific implementation shown in steps S1007-S1009 is not repeated here.
  • S1010: The target terminal detects a trigger event of entering a game scene of the target game.
  • S1011: The target terminal sends a state change request to the cloud signaling server.
  • S1012: The target terminal receives the feedback result returned by the cloud signaling server.
  • For the specific implementation of steps S1010-S1012, reference may be made to the relevant description of the specific implementation shown in steps S1004-S1006; the difference is that the specific implementation shown in steps S1010-S1012 is executed by the target terminal, while that shown in steps S1004-S1006 is executed by the source terminal. The specific implementation shown in steps S1010-S1012 is not repeated here.
  • S1013: The source terminal acquires the voice audio of the first game user, and acquires the spatial position information of the first virtual object manipulated by the first game user in the game scene.
  • For the specific implementation of step S1013, reference may be made to the relevant description of the specific implementation shown in step S201 in the embodiment shown in FIG. 2, which is not repeated here.
  • It should be noted that, before step S1013 is performed, the embodiment of the present application also supports the first game user selecting the first virtual object in the target game, so that the first game user can subsequently manipulate the first virtual object to play the target game. Selecting the first virtual object can be understood as selecting or setting the role attributes of the first virtual object, for example, selecting the gender of the first virtual object as female, setting the age of the first virtual object as 20 years old, and so on.
  • In addition, this embodiment of the present application also supports notifying the first game user to turn on the microphone; for the specific implementation of turning on the microphone, refer to the relevant description in the aforementioned step S201, which is not repeated here.
  • S1014: The source terminal converts the voice audio of the first game user to obtain target audio that matches the character attribute of the first virtual object.
  • For the specific implementation of step S1014, reference may be made to the relevant description of the specific implementation shown in step S202 in the embodiment shown in FIG. 2, and details are not repeated here.
  • It should be noted that, before the voice audio of the first game user is converted, performing sound pre-processing on the voice audio of the first game user to obtain pre-processed voice audio is also supported. Compared with the voice audio before pre-processing, the pre-processed voice audio has had the interference signals and noise signals generated by the environment or the circuitry filtered out, so the audio quality is higher and the clarity of the voice audio is improved. The voice pre-processing may include, but is not limited to, processing methods such as echo cancellation, noise reduction, and voice activity detection; the embodiment of the present application does not limit the specific implementation of the voice pre-processing.
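  • As one hedged illustration of the voice activity detection mentioned above, the sketch below classifies fixed-size PCM frames by their mean energy and drops the silent ones; the threshold and frame handling are simplifying assumptions, and production systems typically use far more robust detectors:

```python
import numpy as np

def frame_is_speech(frame: np.ndarray, threshold: float = 1e-4) -> bool:
    """Energy-based voice activity detection for one PCM frame.

    frame: float samples in [-1.0, 1.0]. A frame whose mean energy falls
    below the threshold is treated as silence and can be skipped before
    the conversion processing, reducing bandwidth and background noise.
    """
    return float(np.mean(frame ** 2)) >= threshold

def drop_silence(frames: list[np.ndarray]) -> list[np.ndarray]:
    """Keep only the frames classified as speech."""
    return [f for f in frames if frame_is_speech(f)]
```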
  • S1015: The source terminal sends the target audio and the spatial position information of the first virtual object to the cloud data forwarding server.
  • For the specific implementation of step S1015, reference may be made to the relevant description of the specific implementation shown in step S203 in the embodiment shown in FIG. 2, which is not repeated here.
  • S1016: The cloud data forwarding server sends the target audio and the spatial position information of the first virtual object to the target terminal.
  • S1017-S1018: The target terminal receives the target audio and the spatial position information of the first virtual object forwarded by the cloud data forwarding server, and plays the target audio according to the spatial position information of the first virtual object.
  • For the specific implementation of steps S1016-S1018, reference may be made to the relevant descriptions of the specific implementations shown in steps S801-S802 in the embodiment shown in FIG. 8, and details are not repeated here.
  • It should be noted that, when multiple voice audios are collected, the source terminal can convert each of the collected voice audios to generate target audio corresponding to each voice audio, encode each target audio to obtain a first audio data packet corresponding to each voice audio, and send each first audio data packet and the spatial position information corresponding to each voice audio to the cloud data forwarding server. Correspondingly, the target terminal can receive multiple first audio data packets and the corresponding spatial position information.
  • In this case, the embodiment of the present application supports buffering and sorting the multiple first audio data packets and the corresponding spatial position information after they are received. The so-called buffering and sorting refers to sorting and storing the received signals (such as the multiple first audio data packets and the corresponding spatial position information) according to the order in which they were generated at the source terminal, so that when the target audio in the multiple first audio data packets is subsequently played in the buffered and sorted order, the content the first game user wants to express can be accurately delivered.
  • the source terminal sends the first audio data packet generated after encoding the target audio to the cloud data forwarding server, so that the cloud data forwarding server forwards the first audio data packet to the target terminal; after receiving the first audio data packet forwarded by the cloud data forwarding server, the target terminal will also decode the first audio data packet to obtain the target audio.
  • Decoding is the process of decompressing the first audio data packet using a decompression algorithm to restore the target audio; encoding and decoding correspond to each other, that is, the target terminal needs to use the decompression algorithm corresponding to the compression algorithm used by the source terminal to decompress the first audio data packet.
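  • A minimal sketch of such a buffer-and-sort stage is shown below; the packet layout and the heap keyed on a per-packet sequence number are illustrative assumptions about how the order of generation at the source terminal might be tracked, not structures specified by the patent:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class AudioPacket:
    seq: int                                   # generation order at the source
    payload: bytes = field(compare=False)      # encoded target audio
    position: tuple = field(compare=False)     # (x, y, z) of the first object

class JitterBuffer:
    """Buffer-and-sort: store packets and release them in source order."""

    def __init__(self) -> None:
        self._heap: list[AudioPacket] = []
        self._next_seq = 0

    def push(self, packet: AudioPacket) -> None:
        heapq.heappush(self._heap, packet)     # packets may arrive out of order

    def pop_ready(self) -> list[AudioPacket]:
        """Return the consecutive run of packets that is ready to play."""
        ready = []
        while self._heap and self._heap[0].seq == self._next_seq:
            ready.append(heapq.heappop(self._heap))
            self._next_seq += 1
        return ready
```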
  • The foregoing steps S1001-S1018 show the flow of the audio processing method jointly executed by the source terminal, the target terminal, and the servers (such as the cloud configuration server, the cloud signaling server, and the cloud data forwarding server). To clarify which device executes which steps, the execution subject of each step is described below with reference to FIG. 11a, FIG. 11b, and FIG. 11c, where:
  • The source terminal can execute steps S1001-S1006 and the specific implementation shown in steps S1013-S1015; a schematic flowchart of the source terminal executing steps S1001-S1006 and steps S1013-S1015 can be seen in FIG. 11a.
  • As shown in FIG. 11a, the process of the source terminal executing the audio processing method includes: start (e.g., open the target game) → initialize the target game → set the first virtual object → turn on the microphone → obtain the voice audio of the first game user → sound pre-processing → first transformation processing (such as Fourier transformation) → overtone modification (i.e., modification of the overtone frequency-domain information corresponding to the voice audio) → second transformation processing (such as inverse Fourier transformation) → encoding → obtain the spatial position information of the first virtual object → send the target audio and the spatial position information to the cloud data forwarding server → end (e.g., release system hardware and software resources and exit the target game).
  • It should be noted that, in the flow shown in FIG. 11a, overtone modification may also be skipped for the voice audio of the first game user; in that case a three-dimensional auditory sensation can still be experienced when the voice audio is played, but the timbre of the voice audio remains similar to the real voice of the first game user.
  • The cloud servers include the cloud configuration server, the cloud signaling server, and the cloud data forwarding server; different cloud servers play different roles in the audio processing method. For example, the cloud data forwarding server is used to implement data forwarding between the source terminal and the target terminal.
  • The flow of the cloud data forwarding server executing the audio processing method is described below with reference to FIG. 11b. As shown in FIG. 11b, the flow includes: start → receive data (e.g., cyclically receive the audio and spatial position information sent by the source terminal) → forward data (e.g., forward the received data to the target terminal, as in step S1016) → end (e.g., release system hardware and software resources and exit the target game).
  • The target terminal can execute steps S1007-S1012 and the specific implementation shown in steps S1017-S1018; a schematic flowchart of the target terminal executing steps S1007-S1012 and steps S1017-S1018 can be seen in FIG. 11c.
  • the process of the target terminal executing the audio processing method includes: start ⁇ initialize the target game ⁇ buffer and sort ⁇ decode ⁇ obtain the spatial position information of the first virtual object ⁇ play the target audio according to the spatial position information ⁇ end.
  • It should be noted that the embodiment of the present application does not limit the execution order of the above steps. For example, steps S1001-S1003 may be executed first to realize the initialization of the target game by the source terminal, and then steps S1007-S1009 may be executed to realize the initialization of the target game by the target terminal; or steps S1001-S1003 and steps S1007-S1009 may be executed at the same time; or steps S1007-S1009 may be executed first to realize the initialization of the target game by the target terminal, and then steps S1001-S1003 may be executed to realize the initialization of the target game by the source terminal.
  • steps S1001-S1018 are only part of the process steps of the audio processing method. In actual application scenarios, the audio processing method may also include other steps. The embodiment of the present application does not limit the specific implementation steps of the audio processing method.
  • In summary, the voice audio of the first game user can be converted so that the converted target audio matches the character attributes of the first virtual object, which ensures that the target audio can accurately convey the content the first game user wants to express; meanwhile, by adjusting the timbre of the target audio to match the character attribute of the first virtual object, exposure of the first game user's real voice is avoided, improving the privacy and interest of the voice. In addition, the spatial position information of the first virtual object in the game scene can be obtained, so that when the target audio is played based on the spatial position information of the first virtual object, the three-dimensional position information of the first virtual object in the game scene can be represented, providing a more realistic three-dimensional sense of space.
  • Based on the related description of the above audio processing method embodiments, FIG. 12 shows a schematic structural diagram of an audio processing apparatus provided by an exemplary embodiment of the present application. The audio processing apparatus may be a computer program (including program code) running in the source terminal; the audio processing apparatus may be used to perform some or all of the steps in the method embodiments shown in FIG. 2 and FIG. 10. The audio processing apparatus includes the following units:
  • An acquisition unit 1201 configured to acquire the voice and audio of the first game user, and acquire the spatial position information of the first virtual object manipulated by the first game user in the game scene;
  • the processing unit 1202 is configured to convert the voice audio of the first game user to obtain target audio matching the character attribute of the first virtual object;
  • the processing unit 1202 is further configured to send the target audio and the spatial position information of the first virtual object to the second game user, so that the second game user plays the target audio according to the spatial position information of the first virtual object, where the second virtual object controlled by the second game user is in the same game scene as the first virtual object.
  • In one implementation, when converting the voice audio of the first game user to obtain target audio that matches the character attribute of the first virtual object, the processing unit 1202 is specifically configured to:
  • perform first transformation processing on the voice audio of the first game user and extract the frequency-domain information of the voice audio of the first game user, the frequency-domain information including fundamental frequency-domain information and overtone frequency-domain information;
  • modify the overtone frequency-domain information according to the role attribute of the first virtual object to obtain modified overtone frequency-domain information;
  • fuse the fundamental frequency-domain information and the modified overtone frequency-domain information, and perform second transformation processing on the fused frequency-domain information to obtain target audio that matches the role attribute of the first virtual object.
  • In one implementation, when modifying the overtone frequency-domain information according to the role attribute of the first virtual object to obtain the modified overtone frequency-domain information, the processing unit 1202 is specifically configured to:
  • acquire audio configuration information corresponding to the role attribute of the first virtual object, the audio configuration information including overtone configuration information, where the audio configuration information is determined according to the role attribute of the first virtual object, or the audio configuration information is generated according to the game scene;
  • modify the overtone frequency-domain information according to the overtone configuration information to obtain the modified overtone frequency-domain information.
  • In one implementation, the spatial position information of the first virtual object includes the target coordinates of the first virtual object determined in the game scene based on the coordinate origin; when acquiring the spatial position information of the first virtual object manipulated by the first game user in the game scene, the processing unit 1202 is specifically configured to:
  • determine a target point in the game scene as the coordinate origin, the target point in the game scene including a camera or a light source point;
  • establish a space coordinate system according to the coordinate origin, and generate the target coordinates of the first virtual object based on the space coordinate system.
  • In one implementation, the spatial position information of the first virtual object includes target distance information and orientation information between the first virtual object and the second virtual object; when acquiring the spatial position information of the first virtual object manipulated by the first game user in the game scene, the processing unit 1202 is specifically configured to:
  • acquire first position information of the first virtual object in the game scene, and second position information of the second virtual object in the game scene;
  • perform a distance calculation on the first position information and the second position information to obtain the target distance information between the first virtual object and the second virtual object;
  • perform an orientation calculation on the first position information and the second position information to obtain the orientation information between the first virtual object and the second virtual object.
  • In one implementation, when sending the target audio and the spatial position information of the first virtual object to the second game user, the processing unit 1202 is specifically configured to:
  • encode the target audio to generate a first audio data packet, and send the first audio data packet to the second game user using a first data channel;
  • send the spatial position information of the first virtual object to the second game user using a second data channel, where the first data channel is different from the second data channel.
  • In one implementation, when sending the target audio and the spatial position information of the first virtual object to the second game user, the processing unit 1202 is specifically configured to: encode the target audio to generate a first audio data packet; attach the spatial position information of the first virtual object to the first audio data packet; and send the first audio data packet with the attached spatial position information to the second game user.
  • According to another embodiment of the present application, the units in the audio processing apparatus shown in FIG. 12 can be separately or wholly combined into one or several other units, or one (or some) of the units can be further split into multiple functionally smaller units; this can achieve the same operations without affecting the realization of the technical effects of the embodiments of the present application. The above units are divided based on logical functions; in practical applications, the function of one unit may also be realized by multiple units, or the functions of multiple units may be realized by one unit. In other embodiments of the present application, the audio processing apparatus may also include other units; in practical applications, these functions may also be implemented with the assistance of other units and may be implemented cooperatively by multiple units.
  • According to another embodiment of the present application, the audio processing apparatus may be constructed by running a computer program (including program code) on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access memory (RAM), and a read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, loaded into the above computing device via the computer-readable recording medium, and run therein.
  • In the embodiment of the present application, the processing unit 1202 can be used to convert the voice audio of the first game user so that the converted target audio matches the character attribute of the first virtual object; this ensures that the target audio accurately conveys the content the first game user wants to express while, by adjusting the timbre of the target audio to a timbre matching the character attribute of the first virtual object, avoiding exposure of the first game user's real voice and improving the privacy and interest of the voice. In addition, the processing unit 1202 can be used to obtain the spatial position information of the first virtual object in the game scene, so that when the target audio is played based on the spatial position information of the first virtual object, the three-dimensional position information of the first virtual object in the game scene can be represented, providing a more realistic three-dimensional sense of space.
  • Based on the related description of the above audio processing method embodiments, FIG. 13 shows a schematic structural diagram of an audio processing apparatus provided by an exemplary embodiment of the present application. The audio processing apparatus may be a computer program (including program code) running in the target terminal; the audio processing apparatus may be used to perform some or all of the steps in the method embodiments shown in FIG. 8 and FIG. 10. The audio processing apparatus includes the following units:
  • the receiving unit 1301 is configured to receive the target audio of the first game user and the spatial position information of the first virtual object, where the first virtual object is the virtual object manipulated by the first game user in the game scene, and the target audio is audio obtained by converting the voice audio of the first game user and matching the role attribute of the first virtual object;
  • the processing unit 1302 is configured to play the target audio according to the spatial position information of the first virtual object, where the first virtual object and the second virtual object are in the same game scene, and the second virtual object is a virtual object manipulated by the second game user in the game scene.
  • In one implementation, when playing the target audio according to the spatial position information of the first virtual object, the processing unit 1302 is specifically configured to:
  • In one implementation, when determining the audio playback information between the first virtual object and the second virtual object in the game scene based on the spatial position information of the first virtual object, the processing unit 1302 is specifically configured to:
  • In one implementation, the spatial position information of the first virtual object includes the target coordinates of the first virtual object determined in the game scene based on the coordinate origin; when determining the audio playback information between the first virtual object and the second virtual object in the game scene based on the spatial position information of the first virtual object, the processing unit 1302 is specifically configured to:
  • According to another embodiment of the present application, the units in the audio processing apparatus shown in FIG. 13 can be separately or wholly combined into one or several other units, or one (or some) of the units can be further split into multiple functionally smaller units; this can achieve the same operations without affecting the realization of the technical effects of the embodiments of the present application. The above units are divided based on logical functions; in practical applications, the function of one unit may also be realized by multiple units, or the functions of multiple units may be realized by one unit. In other embodiments of the present application, the audio processing apparatus may also include other units; in practical applications, these functions may also be implemented with the assistance of other units and may be implemented cooperatively by multiple units.
  • According to another embodiment of the present application, the audio processing apparatus may be constructed by running a computer program (including program code) on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access memory (RAM), and a read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, loaded into the above computing device via the computer-readable recording medium, and run therein.
  • In the embodiment of the present application, the processing unit 1302 can be used to convert the voice audio of the first game user so that the converted target audio matches the character attribute of the first virtual object; this ensures that the target audio accurately conveys the content the first game user wants to express while, by adjusting the timbre of the target audio to a timbre matching the character attribute of the first virtual object, avoiding exposure of the first game user's real voice and improving the privacy and interest of the voice. In addition, the processing unit 1302 can be used to obtain the spatial position information of the first virtual object in the game scene, so that when the target audio is played based on the spatial position information of the first virtual object, the three-dimensional position information of the first virtual object in the game scene can be represented, providing a more realistic three-dimensional sense of space.
  • Fig. 14 shows a schematic structural diagram of an audio processing device provided by an exemplary embodiment of the present application.
  • the audio processing device includes a processor 1401 , a communication interface 1402 and a computer-readable storage medium 1403 .
  • the processor 1401, the communication interface 1402, and the computer-readable storage medium 1403 may be connected through a bus or in other ways.
  • the communication interface 1402 is used for receiving and sending data.
  • The computer-readable storage medium 1403 may be stored in the memory of the audio processing device; the computer-readable storage medium 1403 is used to store a computer program, the computer program includes program instructions, and the processor 1401 is configured to execute the program instructions stored in the computer-readable storage medium 1403.
  • The processor 1401 (or central processing unit (CPU)) is the computing core and control core of the audio processing device; it is adapted to implement one or more instructions, and specifically adapted to load and execute one or more instructions so as to realize the corresponding method flow or corresponding function.
  • the embodiment of the present application also provides a computer-readable storage medium (Memory).
  • the computer-readable storage medium is a memory device in an audio processing device, and is used to store programs and data. It can be understood that the computer-readable storage medium here may include a built-in storage medium in the audio processing device, and of course may also include an extended storage medium supported by the audio processing device.
  • The computer-readable storage medium provides storage space, and the storage space stores the processing system of the audio processing device. Moreover, one or more instructions suitable for being loaded and executed by the processor 1401 are also stored in the storage space; these instructions may be one or more computer programs (including program code). The computer-readable storage medium here can be a high-speed RAM memory, or a non-volatile memory, such as at least one disk memory; in some embodiments, it can also be at least one computer-readable storage medium located remotely from the aforementioned processor.
  • One or more instructions are stored in the computer-readable storage medium; the processor 1401 loads and executes the one or more instructions stored in the computer-readable storage medium to implement the corresponding steps in the above audio processing method embodiments. In a specific implementation, the one or more instructions in the computer-readable storage medium are loaded by the processor 1401 to perform the following steps: acquire the voice audio of the first game user, and acquire the spatial position information of the first virtual object manipulated by the first game user in the game scene; convert the voice audio of the first game user to obtain target audio matching the character attribute of the first virtual object; and send the target audio and the spatial position information of the first virtual object to the second game user, so that the second game user plays the target audio according to the spatial position information of the first virtual object, where the second virtual object manipulated by the second game user is in the same game scene as the first virtual object.
  • In one implementation, when the one or more instructions in the computer-readable storage medium are loaded by the processor 1401 and executed to convert the voice audio of the first game user to obtain target audio that matches the character attribute of the first virtual object, the following steps are specifically performed:
  • perform first transformation processing on the voice audio of the first game user and extract the frequency-domain information of the voice audio of the first game user, the frequency-domain information including fundamental frequency-domain information and overtone frequency-domain information;
  • modify the overtone frequency-domain information according to the role attribute of the first virtual object to obtain modified overtone frequency-domain information;
  • fuse the fundamental frequency-domain information and the modified overtone frequency-domain information, and perform second transformation processing on the fused frequency-domain information to obtain target audio that matches the role attribute of the first virtual object.
  • In one implementation, when the one or more instructions in the computer-readable storage medium are loaded by the processor 1401 and executed to modify the overtone frequency-domain information according to the role attribute of the first virtual object to obtain the modified overtone frequency-domain information, the following steps are specifically performed:
  • acquire audio configuration information corresponding to the role attribute of the first virtual object, the audio configuration information including overtone configuration information, where the audio configuration information is determined according to the role attribute of the first virtual object, or the audio configuration information is generated according to the game scene;
  • modify the overtone frequency-domain information according to the overtone configuration information to obtain the modified overtone frequency-domain information.
  • In one implementation, the spatial position information of the first virtual object includes the target coordinates of the first virtual object determined in the game scene based on the coordinate origin; when the one or more instructions in the computer-readable storage medium are loaded by the processor 1401 and executed to acquire the spatial position information of the first virtual object manipulated by the first game user in the game scene, the following steps are specifically performed:
  • determine a target point in the game scene as the coordinate origin, the target point in the game scene including a camera or a light source point;
  • establish a space coordinate system according to the coordinate origin, and generate the target coordinates of the first virtual object based on the space coordinate system.
  • In one implementation, the spatial position information of the first virtual object includes target distance information and orientation information between the first virtual object and the second virtual object; when the one or more instructions in the computer-readable storage medium are loaded by the processor 1401 and executed to acquire the spatial position information of the first virtual object manipulated by the first game user in the game scene, the following steps are specifically performed:
  • acquire first position information of the first virtual object in the game scene, and second position information of the second virtual object in the game scene;
  • perform a distance calculation on the first position information and the second position information to obtain the target distance information between the first virtual object and the second virtual object;
  • perform an orientation calculation on the first position information and the second position information to obtain the orientation information between the first virtual object and the second virtual object.
  • In one implementation, when the one or more instructions in the computer-readable storage medium are loaded by the processor 1401 and executed to send the target audio and the spatial position information of the first virtual object to the second game user, the following steps are specifically performed:
  • encode the target audio to generate a first audio data packet, and send the first audio data packet to the second game user using a first data channel;
  • send the spatial position information of the first virtual object to the second game user using a second data channel, where the first data channel is different from the second data channel.
  • In one implementation, when the one or more instructions in the computer-readable storage medium are loaded by the processor 1401 and executed to send the target audio and the spatial position information of the first virtual object to the second game user, the following steps are specifically performed: encode the target audio to generate a first audio data packet; attach the spatial position information of the first virtual object to the first audio data packet; and send the first audio data packet with the attached spatial position information to the second game user.
  • In another embodiment, one or more instructions are stored in the computer-readable storage medium; the processor 1401 loads and executes the one or more instructions stored in the computer-readable storage medium to implement the corresponding steps in the above audio processing method embodiments. In a specific implementation, the one or more instructions in the computer-readable storage medium are loaded by the processor 1401 to perform the following steps:
  • receive the target audio of the first game user and the spatial position information of the first virtual object, where the first virtual object is the virtual object manipulated by the first game user in the game scene, and the target audio is audio obtained by converting the voice audio of the first game user and matching the character attribute of the first virtual object;
  • play the target audio according to the spatial position information of the first virtual object, where the first virtual object and the second virtual object are in the same game scene, and the second virtual object is a virtual object manipulated by the second game user in the game scene.
  • In one implementation, when the one or more instructions in the computer-readable storage medium are loaded by the processor 1401 and executed to play the target audio according to the spatial position information of the first virtual object, the following steps are specifically performed:
  • In one implementation, when the one or more instructions in the computer-readable storage medium are loaded by the processor 1401 and executed to determine, based on the spatial position information of the first virtual object, the audio playback information between the first virtual object and the second virtual object in the game scene, the following steps are specifically performed:
  • In one implementation, the spatial position information of the first virtual object includes the target coordinates of the first virtual object determined in the game scene based on the coordinate origin; when the audio playback information between the first virtual object and the second virtual object in the game scene is determined based on the spatial position information of the first virtual object, the following steps are specifically performed:
  • In the embodiment of the present application, the processor 1401 can convert the voice audio of the first game user so that the converted target audio matches the character attribute of the first virtual object; this ensures that the target audio accurately conveys the content the first game user wants to express while, by adjusting the timbre of the target audio to match the character attribute of the first virtual object, avoiding exposure of the first game user's real voice and improving the privacy and interest of the voice. In addition, the processor 1401 can obtain the spatial position information of the first virtual object in the game scene, so that when the target audio is played based on the spatial position information of the first virtual object, the three-dimensional position information of the first virtual object in the game scene can be represented, providing a more realistic three-dimensional sense of space.
  • the embodiment of the present application also provides a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the audio processing device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the audio processing device executes the above audio processing method.
  • In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, it may be implemented in whole or in part in the form of a computer program product. A computer program product includes one or more computer instructions; when the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part.
  • The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. Computer instructions may be stored in a computer-readable storage medium or transmitted via a computer-readable storage medium. Computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (e.g., infrared, radio, or microwave) means.
  • A computer-readable storage medium may be any available medium that can be accessed by a computer, or a data processing device, such as a server or data center, integrating one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)), etc.


Abstract

The embodiments of the present application disclose an audio processing method, apparatus, device, medium, and program product. The method includes: acquiring the voice audio of a first game user and the spatial position information of a first virtual object manipulated by the first game user in a game scene; converting the voice audio of the first game user to obtain target audio matching the character attribute of the first virtual object; and sending the target audio and the spatial position information of the first virtual object to a second game user, so that the second game user plays the target audio according to the spatial position information of the first virtual object, where a second virtual object manipulated by the second game user is in the same game scene as the first virtual object.

Description

An audio processing method, apparatus, device, medium, and program product
This application claims priority to Chinese Patent Application No. 202111460896.8, filed with the China Patent Office on December 1, 2021 and entitled "An audio processing method, apparatus, device, medium, and program product", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer technology, specifically to the field of artificial intelligence, and in particular to an audio processing method, an audio processing apparatus, an audio processing device, a computer-readable storage medium, and a computer program product.
Background
Many application scenarios (such as live-streaming scenarios, game scenarios, and video-conferencing scenarios) involve sound processing. For example, a game scenario supports collecting the sound of a game user to obtain the game user's voice audio, and transmitting the voice audio to other game users in the game, enabling voice communication among multiple game users in the game scenario.
Summary
An embodiment of the present application provides an audio processing method, the method including:
acquiring the voice audio of a first game user and the spatial position information of a first virtual object manipulated by the first game user in a game scene;
converting the voice audio of the first game user to obtain target audio matching the character attribute of the first virtual object;
sending the target audio and the spatial position information of the first virtual object to a second game user, so that the second game user plays the target audio according to the spatial position information of the first virtual object, where a second virtual object manipulated by the second game user is in the same game scene as the first virtual object.
An embodiment of the present application provides an audio processing method, the method including:
receiving the target audio of a first game user and the spatial position information of a first virtual object, the first virtual object being a virtual object manipulated by the first game user in a game scene, and the target audio being audio obtained by converting the voice audio of the first game user and matching the character attribute of the first virtual object;
playing the target audio according to the spatial position information of the first virtual object, where the first virtual object and a second virtual object are in the same game scene, and the second virtual object is a virtual object manipulated by a second game user in the game scene.
An embodiment of the present application provides an audio processing apparatus, the apparatus including:
an acquisition unit, configured to acquire the voice audio of a first game user and the spatial position information of a first virtual object manipulated by the first game user in a game scene;
a processing unit, configured to convert the voice audio of the first game user to obtain target audio matching the character attribute of the first virtual object;
the processing unit being further configured to send the target audio and the spatial position information of the first virtual object to a second game user, so that the second game user plays the target audio according to the spatial position information of the first virtual object, where a second virtual object manipulated by the second game user is in the same game scene as the first virtual object.
An embodiment of the present application further provides an audio processing apparatus, the apparatus including:
a receiving unit, configured to receive the target audio of a first game user and the spatial position information of a first virtual object, the first virtual object being a virtual object manipulated by the first game user in a game scene, and the target audio being audio obtained by converting the voice audio of the first game user and matching the character attribute of the first virtual object;
a processing unit, configured to play the target audio according to the spatial position information of the first virtual object, where the first virtual object and a second virtual object are in the same game scene, and the second virtual object is a virtual object manipulated by a second game user in the game scene.
An embodiment of the present application provides an audio processing device, the audio processing device including:
a processor, configured to load and execute a computer program; and
a computer-readable storage medium storing a computer program that, when executed by the processor, implements the above audio processing method.
An embodiment of the present application provides a computer-readable storage medium storing a computer program adapted to be loaded by a processor to execute the above audio processing method.
An embodiment of the present application further provides a computer program product or computer program, the computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of an audio processing device reads the computer instructions from the computer-readable storage medium, and when executed by the processor, the computer instructions implement the above audio processing method.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Evidently, the accompanying drawings in the following description show some embodiments of the present application, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative effort.
FIG. 1 is a schematic architectural diagram of an audio processing system provided by an exemplary embodiment of the present application;
FIG. 2 is a schematic flowchart of an audio processing method provided by an exemplary embodiment of the present application;
FIG. 3 is a schematic flowchart of analog-to-digital conversion provided by an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram of a prompt to turn on a microphone provided by an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram of a game scene in which the target point is a camera, provided by an exemplary embodiment of the present application;
FIG. 6 is a schematic diagram of transforming a time-domain signal into a frequency-domain signal provided by an exemplary embodiment of the present application;
FIG. 7a is a schematic diagram of transmitting the target audio and the spatial position information of the first virtual object over two different data channels, provided by an exemplary embodiment of the present application;
FIG. 7b is a schematic diagram of transmitting the target audio and the spatial position information of the first virtual object over the same data channel, provided by an exemplary embodiment of the present application;
FIG. 8 is a schematic flowchart of an audio processing method provided by an exemplary embodiment of the present application;
FIG. 9 is a schematic diagram of a mapping relationship between distance information and volume information provided by an exemplary embodiment of the present application;
FIG. 10 is a schematic flowchart of an audio processing method provided by an exemplary embodiment of the present application;
FIG. 11a is a schematic flowchart of a source terminal executing an audio processing method provided by an exemplary embodiment of the present application;
FIG. 11b is a schematic flowchart of a cloud forwarding server executing an audio processing method provided by an exemplary embodiment of the present application;
FIG. 11c is a schematic flowchart of a target terminal executing an audio processing method provided by an exemplary embodiment of the present application;
FIG. 12 is a schematic structural diagram of an audio processing apparatus provided by an exemplary embodiment of the present application;
FIG. 13 is a schematic structural diagram of an audio processing apparatus provided by an exemplary embodiment of the present application;
FIG. 14 is a schematic structural diagram of an audio processing device provided by an exemplary embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application. Evidently, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
However, practice has found that related game scenarios provide users with a relatively simple and direct voice-audio processing mode: after a game user's voice audio is encoded, it is directly transmitted to other game users for playback. As a result, the sound effect presented when the voice audio is played is flat and cannot reflect the three-dimensional spatial relationship among the game characters manipulated by multiple game users; moreover, the timbre of the voice audio is similar to the game user's voice in the real world, so the voice audio in the game scenario lacks privacy.
Embodiments of the present application provide an audio processing method, apparatus, device, medium, and program product, which can improve the three-dimensional spatial sense of voice audio in a game scene and enhance the privacy of the voice audio.
The embodiments of the present application involve an audio processing system suitable for the audio processing method provided by the embodiments of the present application; a schematic architectural diagram of the audio processing system may be as shown in FIG. 1. The audio processing system includes multiple terminals (such as terminal 101, terminal 102, ...) and servers (such as server 103, server 104, and server 105); the embodiments of the present application do not limit the number of terminals and servers. The terminals may include, but are not limited to, smartphones (such as Android phones and iOS phones), tablet computers, portable personal computers, mobile Internet devices (MID), smart TVs, in-vehicle devices, head-mounted devices, and other audio processing devices capable of touch-screen interaction. A terminal can run applications (referred to as apps for short, such as game applications, social applications, video applications, web applications, game mini-programs deployed in any application, and so on). The servers may include, but are not limited to, data processing servers, web servers, application servers, cloud servers, and other devices with complex computing capabilities. A server may be the backend server of any application, used to interact with the terminal running that application so as to provide computing and application-service support for the application. The server may be an independent physical server, or a server cluster or distributed system composed of multiple physical servers. The terminal and the server may be directly or indirectly connected for communication in a wired or wireless manner; the embodiments of the present application do not limit the connection manner between the terminal and the server.
Based on the above audio processing system, an audio processing solution based on game scenes is proposed. A game scene refers to a three-dimensional spatial scene provided by a target game that supports play by one or more game players (or game users). For example, game scenes provided by the target game may include: a scene in which a virtual object (i.e., the character manipulated by a game player in the target game) drives a vehicle (such as a car or a boat), a scene in which a virtual object shoots with a firearm, a scene in which a virtual object parachutes, and so on. The target game may include, but is not limited to, client games, web games, mini-program games, cloud games, arcade games, remote-control games, and the like. Audio refers to all sounds audible to humans; thanks to advantages such as high synchronicity and strong interactivity, audio is widely used in various fields, for example, the game field. For instance, assuming a game scene includes game user 1 and game user 2, the voice audio of game user 1 can be collected and sent to game user 2 to realize information exchange among multiple game users in the game scene.
In a specific implementation, the general principle of the audio processing solution proposed by the embodiments of the present application may include: if the voice audio of a first game user (e.g., any game user) in the game scene is acquired, the voice audio is converted so that the target audio obtained after conversion matches the character attribute of the first virtual object. This not only ensures that the target audio can accurately convey the content the first game user wants to express, but also, by adjusting the timbre of the target audio to a timbre matching the character attribute of the first virtual object, avoids exposing the first game user's real voice and improves the privacy and interest of the voice. In addition, the spatial position information of the first virtual object manipulated by the first game user in the game scene can be acquired, so that when the target audio is played based on the spatial position information of the first virtual object, the spatial position of the first virtual object in the game scene can be represented, providing a second game user (any game user other than the first game user among the game users participating in the target game) with a more realistic three-dimensional sense of space.
The audio processing solution may be jointly executed by the source terminal used by the first game user, the target terminal used by the second game user, and the servers; or jointly executed by a target application (e.g., any application) running in the source terminal used by the first game user, the target application running in the target terminal used by the second game user, and the backend server corresponding to the target application. For ease of description, the following takes joint execution of the audio processing solution by the source terminal, the target terminal, and the servers as an example. In the audio processing system shown in FIG. 1, the source terminal used by the first game user may be terminal 101, the target terminal used by the second game user may be terminal 102, and the servers may be cloud servers; the cloud servers may specifically include: a cloud configuration server 103, a cloud signaling server 104, and a cloud data transmission server 105. The three cloud servers are briefly introduced below:
(1) The cloud configuration server 103 can provide configuration services for the target game, specifically providing configuration resources for running the target game. For example, when the first game user opens the target game using terminal 101, terminal 101 sends a data configuration request to the cloud configuration server 103; the data configuration request is used to request the cloud configuration server 103 to return the configuration resources required to initialize the target game, so that terminal 101 can initialize the target game based on the configuration resources. (2) The cloud signaling server 104 is used to implement communication connections among the multiple game users participating in the target game (or the multiple terminals used by them). Specifically, when the communication connections among the multiple game users participating in the target game change, status updates (such as updates to the network status of each terminal) can be implemented through the cloud signaling server. For example, game user 1, game user 2, and game user 3 participate in the same game scene; if it is detected that the terminal used by game user 1 has disconnected from the cloud signaling server 104 (for example, game user 1 goes offline), the cloud signaling server 104 sends a notification message to game user 2 and game user 3 to notify them that game user 1 has gone offline. (3) The cloud data transmission server 105 is used to implement data forwarding among the multiple game users participating in the target game (or the multiple terminals used by them). For example, the cloud data transmission server 105 may be used to forward the target audio of the first game user sent by terminal 101 to terminal 102. The above is only a brief introduction to the three cloud servers; they are further introduced in the specific embodiments below.
It should be noted that the number of second game users in the same game scene as the first game user may be at least two; since the audio processing flow between any second game user and the first game user is the same, the audio processing solution is introduced below by taking one second game user as an example. In addition, the cloud configuration server 103, the cloud signaling server 104, and the cloud data forwarding server 105 mentioned above are mutually independent cloud servers, and a terminal may interact with any one or more of the three cloud servers as needed. Of course, according to the needs of actual applications, the embodiments of the present application may also involve other types of cloud servers; the embodiments of the present application do not limit the type and number of cloud servers.
Based on the audio processing solution described above, the embodiments of the present application propose a more detailed audio processing method, which is described in detail below with reference to the accompanying drawings.
FIG. 2 is a schematic flowchart of an audio processing method provided by an exemplary embodiment of the present application. The embodiment of the present application is described by taking execution of the audio processing method by the aforementioned source terminal as an example; the audio processing method may include, but is not limited to, steps S201-S204:
S201: Acquire the voice audio of the first game user.
The voice audio of the first game user refers to a digital signal obtained by performing sound-collection processing on the analog signal captured by a microphone; the analog signal captured by the microphone is obtained by the microphone collecting the sound of the physical environment in which the first game user is located. The microphone may be deployed in the source terminal used by the first game user, or may be a device externally connected to the source terminal. Specifically, when the microphone is on, it collects sound from the physical environment in which the first game user is located to obtain an analog signal; sound-collection processing is then performed on the collected analog signal to convert it into a digital signal that can be transmitted by the device. An analog signal, also called a continuous signal, represents signal and information as a continuously varying physical quantity; for example, the amplitude, frequency, or phase of the signal varies continuously with time. A digital signal, also called a discrete signal, is, in contrast to an analog signal, a signal whose values are discrete and discontinuous.
As described above, the digital signal is obtained by performing sound-collection processing on the analog signal; specifically, it can be produced by sampling, quantizing, and encoding the analog signal using pulse code modulation (PCM). The process of converting an analog signal into a digital signal is briefly introduced below with reference to the schematic flowchart of analog-to-digital conversion shown in FIG. 3. As shown in FIG. 3, first, the continuously varying analog signal is sampled to obtain discrete sample values; sampling refers to periodically scanning the analog signal to turn a time-continuous signal into a time-discrete signal. Second, the discrete sample values obtained by sampling are quantized; quantization refers to discretizing the instantaneous values obtained by sampling, that is, representing each instantaneous value with the closest level from a prescribed set of levels, usually in binary. Finally, the quantized values are encoded to obtain the digital signal; encoding means identifying each quantized value of fixed level with a group of binary codes. It should be understood that the waveform of the analog signal shown in FIG. 3 and the values on the horizontal and vertical axes are exemplary; in other application scenarios the waveform of the analog signal and the axis values may change adaptively, which is noted here.
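To make the sample-quantize-encode pipeline of FIG. 3 concrete, the following toy sketch runs the three PCM stages over a synthetic "analog" tone; the frame rate, decimation step, and bit depth are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

def pcm_encode(analog: np.ndarray, step: int, bits: int = 16) -> np.ndarray:
    """Toy PCM pipeline over a densely sampled stand-in for an analog signal."""
    sampled = analog[::step]                 # 1) sampling: periodic scan in time
    levels = 2 ** (bits - 1) - 1             # 2) quantization: prescribed levels
    quantized = np.round(sampled * levels)   #    snap each value to the nearest level
    return quantized.astype(np.int16)        # 3) encoding: fixed-width binary words

t = np.linspace(0.0, 1.0, 288_000, endpoint=False)
analog = np.sin(2 * np.pi * 440.0 * t)       # a 440 Hz tone standing in for speech
digital = pcm_encode(analog, step=6)         # about 48 000 samples per second
```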
After the analog-to-digital conversion shown in FIG. 3 is performed on the analog signal collected by the microphone in the physical environment of the first game user, the analog signal can be converted into a digital signal that can be processed by the source terminal; that is, the voice audio of the first game user is acquired. It should be noted that, when the microphone is off, the embodiment of the present application also supports prompting the first game user to turn on the microphone. For example, a prompt message is output on the display screen of the source terminal to prompt the first game user to turn on the microphone so that the first game user's voice audio can be collected; for another example, a prompt voice is output, and its content may be "Please turn on the microphone"; and so on. Taking a microphone deployed in the source terminal as an example, when the first game user successfully logs in to the target game using identity credentials (such as a game account, password, fingerprint information, or facial information), if it is detected that the microphone is not turned on, a prompt message (such as the prompt message 401 shown in FIG. 4) is output on the display screen of the source terminal so that the first game user can turn on the microphone after seeing it. In response to the first game user's operation of turning on the microphone, the source terminal performs the step of initializing the relevant parameters of the microphone, such as setting the microphone's collection rate (also called the sampling frequency, i.e., the number of samples collected per unit time), the number of channels (i.e., the number of sound waveforms generated each time sound is collected), and the number of sampling bits (i.e., the number of bits used for each sample point), so as to start the microphone.
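As an illustration of the initialization parameters named above (sampling frequency, number of channels, sampling bits), the following sketch opens a capture stream with the third-party PyAudio library; the parameter values and the choice of library are assumptions made for illustration only:

```python
import pyaudio  # third-party audio I/O library, used here purely as an example

SAMPLE_RATE = 48_000            # sampling frequency: samples per second
CHANNELS = 1                    # waveforms generated per capture
SAMPLE_FORMAT = pyaudio.paInt16 # 16 bits per sample point

pa = pyaudio.PyAudio()
stream = pa.open(format=SAMPLE_FORMAT,
                 channels=CHANNELS,
                 rate=SAMPLE_RATE,
                 input=True,                # a capture (microphone) stream
                 frames_per_buffer=1024)

chunk = stream.read(1024)       # raw PCM bytes of the first game user's voice
stream.stop_stream()
stream.close()
pa.terminate()
```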
S202: Acquire the spatial position information of the first virtual object manipulated by the first game user in the game scene.
The first virtual object is the game character manipulated by the first game user in the game scene; the actions performed by the game character (such as shooting, jumping, and running) are all controlled by the first game user, and the game character may include a human or an animal. The spatial position information of the first virtual object in the game scene can represent the three-dimensional position information of the first virtual object in the game scene, namely (X, Y, Z), where X, Y, and Z correspond to distances in three directions, in meters (or other units such as centimeters or kilometers); the spatial position information in the game scene can be restored to position information in the real world at a certain scale.
In the embodiments of the present application, the spatial position information of the first virtual object may take two forms. In one implementation, the spatial position information of the first virtual object may include the target coordinates of the first virtual object determined in the game scene based on a coordinate origin. In another implementation, the spatial position information of the first virtual object may include target distance information and orientation information between the first virtual object and the second virtual object in the game scene. The second virtual object here is a game character manipulated by the second game user that is in the same game scene as the first virtual object. The first virtual object and the second virtual object being in the same game scene may mean that the first game user manipulating the first virtual object and the second game user manipulating the second virtual object enter the same game scene in the target game (or, simply put, enter the same game room). It is worth noting that a game scene often contains multiple frames of game images, and the first virtual object and the second virtual object in the same game scene are not necessarily displayed simultaneously in every frame; that is, depending on how the game is played, a frame of the game scene may contain only the first virtual object or only the second virtual object, but the two virtual objects are still in the same game scene.
The two ways of determining the spatial position information of the first virtual object given above are described separately below, where:
In one implementation, the spatial position information of the first virtual object includes the target coordinates of the first virtual object determined in the game scene based on the coordinate origin. In this implementation, acquiring the spatial position information of the first virtual object manipulated by the first game user in the game scene may include: first determining a target point in the game scene as the coordinate origin; then establishing a space coordinate system according to the coordinate origin; and finally generating the target coordinates of the first virtual object based on the space coordinate system. The target point in the game scene may include a camera or a light source point. The camera in the game scene is similar to the human eye and can be used to observe the game scene; the light source point in the game scene is similar to a lamp and is used to illuminate the game scene so that physical shadows can be produced, increasing the realism and three-dimensionality of the game scene. The camera and the light source point in the game scene may be located at the same or different positions, which is not limited by the embodiments of the present application.
An exemplary game scene in which the target point is a camera can be seen in FIG. 5. As shown in FIG. 5, the game scene 501 contains a first virtual object 502 and a second virtual object 503. Assuming the camera in the game scene 501 is located to the right on the horizontal ground, a space coordinate system 504 is established based on the camera; then, from the positional relationship between the first virtual object 502 and the space coordinate system 504 shown in the game scene 501, exemplary target coordinates (i.e., spatial position information) of the first virtual object of (2, 10, 0) can be obtained. It is easy to understand that, depending on where the camera is placed in the game scene, or on the orientation of the space coordinate system established based on the camera, the spatial position information of the first virtual object in the game scene will differ; the embodiments of the present application do not limit the specific values of the spatial position information of the first virtual object.
In another implementation, the spatial position information of the first virtual object includes target distance information and orientation information between the first virtual object and the second virtual object. In this implementation, acquiring the spatial position information of the first virtual object manipulated by the first game user in the game scene may include: first acquiring first position information of the first virtual object in the game scene and second position information of the second virtual object in the game scene; then performing a distance calculation on the first position information and the second position information to obtain the target distance information between the first virtual object and the second virtual object; and performing an orientation calculation on the first position information and the second position information to obtain the orientation information between the first virtual object and the second virtual object. The first position information of the first virtual object in the game scene may refer to the target coordinates (or first coordinates) of the first virtual object determined in the game scene based on the coordinate origin as mentioned in the foregoing embodiment; similarly, the second position information of the second virtual object in the game scene may refer to the second coordinates of the second virtual object determined in the game scene based on the coordinate origin.
It can be understood that, when the spatial position information of the first virtual object consists of the target distance information and orientation information between the first virtual object and the second virtual object, the first virtual object or the second virtual object can also be used directly as the target point to establish the space coordinate system. For example, when the first virtual object is used as the target point, the first coordinates of the first virtual object default to (0, 0, 0), so only the second coordinates of the second virtual object in the space coordinate system need to be calculated; to a certain extent this reduces the amount of computation for the spatial position information and improves data processing efficiency.
Continuing with the game scene shown in FIG. 5 as an example, in the space coordinate system 504 established with the camera as the coordinate origin, assume the calculated second coordinates (i.e., second position information) of the second virtual object are (8, 0, 1), and the first coordinates (i.e., first position information) of the first virtual object are (2, 10, 0). Performing a distance calculation on the first coordinates and the second coordinates yields target distance information between the first virtual object and the second virtual object of approximately 11.7. The embodiments of the present application do not limit the specific implementation of this distance calculation; for example, the differences between the corresponding coordinate values of the first coordinates and the second coordinates can be calculated, and the square root of the sum of the squares of the three differences taken, to obtain the target distance information between the first virtual object and the second virtual object. Similarly, performing an orientation calculation on the first coordinates and the second coordinates yields orientation information between the first virtual object and the second virtual object roughly as follows: in the x-axis direction the first virtual object is closer to the coordinate origin than the second virtual object; in the y-axis direction the first virtual object is farther from the coordinate origin than the second virtual object; and in the z-axis direction the first virtual object is closer to the origin than the second virtual object.
To better understand the orientation information of the first virtual object and the second virtual object, the embodiments of the present application introduce the frontal facing direction of the second virtual object to express that orientation information. As shown in FIG. 5, the second virtual object faces the positive y-axis direction; the orientation information between the first virtual object and the second virtual object can then be expressed as: the first virtual object is located about 30° to the front-left of the second virtual object. Of course, depending on the frontal facing direction of the second virtual object, the orientation information between the first virtual object and the second virtual object may be expressed differently; for example, if the second virtual object faces the negative x-axis direction, the orientation information may be expressed as: the first virtual object is located about 60° to the front-right of the second virtual object.
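The distance and orientation calculations in the FIG. 5 example can be checked with a short sketch. The helper below is one reasonable way to compute them (the square root of the summed squared coordinate differences, plus a horizontal bearing relative to the second object's facing direction); it is an illustrative assumption rather than the patent's prescribed formula, and it reproduces the values of about 11.7 and about 30° quoted above:

```python
import math

def distance_and_bearing(first, second, facing=(0.0, 1.0)):
    """Distance and horizontal bearing of `first` as seen by `second`.

    first, second: (x, y, z) coordinates in the space coordinate system;
    facing: unit vector of the second object's facing in the x-y plane
    (defaults to the +y facing used in the FIG. 5 discussion).
    Returns (distance, angle_deg); a positive angle means the first object
    is to the left of the facing direction.
    """
    dx, dy, dz = (f - s for f, s in zip(first, second))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    fx, fy = facing
    forward = dx * fx + dy * fy               # component along the facing
    left = -dx * fy + dy * fx                 # component to the left of it
    angle_deg = math.degrees(math.atan2(left, forward))
    return distance, angle_deg

# Values from the example: first (2, 10, 0), second (8, 0, 1), facing +y.
d, a = distance_and_bearing((2, 10, 0), (8, 0, 1))
print(round(d, 1), round(a))   # 11.7  31  (front-left, about 30 degrees)
```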
S203: Convert the voice audio of the first game user to obtain target audio matching the character attribute of the first virtual object.
As described in step S201 above, the voice audio of the first game user is obtained by performing sound-collection processing on the first game user's voice collected by the microphone, and the timbre of this voice audio is similar to the timbre of the first game user's real voice. If the first game user's voice audio were played directly, the second game user could very likely identify the first game user's real identity from the timbre of the voice audio, exposing the first game user's real identity. For this reason, the embodiments of the present application support converting the first game user's voice audio; the timbre of the target audio obtained by conversion differs from that of the voice audio, which ensures that the second game user cannot identify the first game user's real identity from the target audio, improving the privacy and interest of the voice.
In a specific implementation, the step of converting the voice audio of the first game user may include, but is not limited to, steps s11-s13, where:
s11: Perform first transformation processing on the voice audio of the first game user to extract the frequency-domain information of the voice audio of the first game user.
It should be noted that sounds produced in the natural environment (or physical environment) are all composites (or superpositions) of a series of vibrations of different frequencies and amplitudes emitted by a sound-producing object (such as the first game user). Among the multiple vibrations, the tone produced by the vibration with the lowest frequency is called the fundamental; the fundamental is usually the sound produced by the vibration of the sound-producing object as a whole, determines the pitch of the sound, and expresses the main content of the sound. The tones other than the one produced by the lowest-frequency vibration are called overtones; overtones are usually the sounds produced by partial vibrations of the sound-producing object and determine the timbre of the sound (for example, a childlike timbre, a deep timbre, a rough timbre, etc.).
From the above description of sound, the voice audio of the first game user is a time-domain signal formed by superimposing the fundamental and overtones at one or more frequencies. On a coordinate axis, the waveform of this time-domain signal is a signal that varies continuously with time, with time on the horizontal axis and signal variation on the vertical axis. When first transformation processing is performed on the voice audio, it is in essence performed on the waveform of the voice audio: each frequency in the waveform is separated out and expanded along the vertical axis, with frequency on the horizontal axis, to obtain the frequency-domain information (or frequency-domain signal) corresponding to the voice audio. The frequency-domain information includes: fundamental frequency-domain information obtained by frequency transformation of the fundamental in the voice audio, and overtone frequency-domain information obtained by frequency transformation of the overtones in the voice audio. The first transformation processing described above refers to Fourier transform processing (or simply the Fourier transform); the Fourier transform is a technique for converting a signal into frequencies, that is, a method of transforming from the time domain to the frequency domain. An exemplary schematic diagram of transforming a time-domain signal into a frequency-domain signal can be seen in FIG. 6; as shown in FIG. 6, each frequency in the waveform of the time-domain signal is separated out, the value of each frequency is mapped to the horizontal axis, and the amplitude corresponding to each frequency is mapped to the vertical axis, yielding the frequency-domain signal corresponding to the time-domain signal.
s12: Modify the overtone frequency-domain information according to the character attribute of the first virtual object to obtain modified overtone frequency-domain information.
As described in step s11, the frequency-domain information of the voice audio includes fundamental frequency-domain information and overtone frequency-domain information; the fundamental frequency-domain information determines the content the first game user wants to express, while the overtone frequency-domain information determines the timbre of the first game user's voice. Considering that the game scene must both ensure the accurate transmission of the content the first game user wants to express and improve the privacy of the first game user's voice, the embodiments of the present application support modifying the overtone frequency-domain information so that the timbre indicated by the modified overtone frequency-domain information differs from the timbre of the first game user's real voice while the content the first game user wants to express is still correctly conveyed.
Specifically, the overtone frequency-domain information can be modified according to the character attribute of the first virtual object. The specific implementation may include: first acquiring the audio configuration information corresponding to the character attribute of the first virtual object, the audio configuration information including overtone configuration information; and then modifying the overtone frequency-domain information according to the overtone configuration information to obtain the modified overtone frequency-domain information. That is, after the audio configuration information corresponding to the character attribute information of the first virtual object is acquired, the audio configuration information can be used to modify the overtone frequency-domain information (such as the overtone frequency band within the frequency band corresponding to the frequency-domain information); modification here may mean applying gain to the overtone frequency-domain information (such as amplifying its amplitude values) or attenuating it (such as reducing its amplitude values) to obtain the modified overtone frequency-domain information.
The character attributes of the first virtual object may include, but are not limited to, an age attribute, a gender attribute, an appearance attribute, and the like; virtual objects with different character attributes correspond to sounds with different timbres. The audio configuration information corresponding to the character attribute of the first virtual object is determined according to the character attribute of the first virtual object. For example, audio configuration information 1 for character attribute 1 of "12 years old, female" differs from audio configuration information 2 for character attribute 2 of "60 years old, female"; in terms of timbre, the timbre expressed by audio configuration information 1 is more childlike and crisp than that expressed by audio configuration information 2. The audio configuration information for different character attributes is set in advance by operations personnel; when the first game user selects or configures the character attribute of the first virtual object, the audio configuration information corresponding to the first virtual object can be determined according to the character attribute the first game user selects or configures. In addition, the audio configuration information corresponding to the character attribute of the first virtual object may also be generated according to the game scene; in this implementation, after the voice audio of multiple game players in the game scene is modified, the timbre of the modified voice audio is the same.
In summary, whether the audio configuration information is determined according to the character configuration of the first virtual object or generated according to the game scene, after the overtone frequency-domain information is modified using the audio configuration information, the timbre indicated by the modified overtone frequency-domain information differs from the timbre of the first game user's real voice, improving the privacy of the voice. Moreover, in the implementation in which the audio configuration information is determined according to the character configuration of the first virtual object, since the character attributes of the multiple virtual objects manipulated by the multiple game players participating in the game scene differ, the overtone frequency-domain information modified according to the audio configuration information corresponding to different character information also differs. This gives the voices of multiple game users different timbres, achieving to a certain extent the uniqueness of game voices in the game scene, increasing the interest of the target game, and in turn improving game-user stickiness.
s13: Fuse the fundamental frequency-domain information and the modified overtone frequency-domain information, and perform second transformation processing on the fused frequency-domain information to obtain target audio matching the character attribute of the first virtual object.
Since the fundamental frequency-domain information determines the content the first game user wants to express, the fused frequency-domain information obtained by fusing the fundamental frequency-domain information and the modified overtone frequency-domain information can not only accurately express the content the first game user wants to express but also change the timbre of the first game user's voice, improving the privacy of the target game. After the fused frequency-domain information is obtained, the embodiments of the present application further perform second transformation processing on it so that the frequency-domain information is transformed into the corresponding target audio in the time domain. Corresponding to the first transformation processing mentioned above, the second transformation processing here is the inverse Fourier transform, which transforms a frequency-domain signal into a time-domain signal. The processing procedure of the inverse Fourier transform is similar to that of the Fourier transform mentioned above and is not described in detail in the embodiments of the present application.
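A minimal sketch of the s11-s13 pipeline, assuming NumPy's real FFT and a crude fixed split between the fundamental band and the overtone band (a real system would derive the split and the modification from the overtone configuration information and the tracked pitch), might look as follows:

```python
import numpy as np

def convert_voice(voice: np.ndarray, overtone_gain: float) -> np.ndarray:
    """Sketch of s11-s13: FFT -> modify the overtone band -> inverse FFT.

    voice: mono float samples; overtone_gain: gain or attenuation applied to
    the overtone band, standing in for the overtone configuration information.
    The fixed split index below is a simplifying assumption.
    """
    spectrum = np.fft.rfft(voice)             # s11: first transformation (FFT)
    split = len(spectrum) // 8                # assumed fundamental/overtone split
    fundamental = spectrum[:split]            # carries the spoken content
    overtones = spectrum[split:] * overtone_gain        # s12: change the timbre
    merged = np.concatenate([fundamental, overtones])   # s13: fuse both bands
    return np.fft.irfft(merged, n=len(voice))           # inverse FFT to time domain
```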
通过上述步骤s11-s13所示的具体实现过程,可将第一游戏用户的语音音频进行转换处理,得到音色变化后的目标音频,即转换处理是为了改变语音音频的音色;这样就将与第一游戏用户的真实声音的音色一致的语音音频,变换为音色与第一虚拟对象的角色属性相匹配的目标音频;在准确传递第一游戏用户欲表达的内容的前提下,改变了传递的声音的音色,使得与第一游戏用户参与同一游戏场景的第二游戏用户,不容易察觉第一游戏用户的真实身份,提升目标游戏的趣味性,提高游戏用户的粘性。
S204:将目标音频和第一虚拟对象的空间位置信息发送至第二游戏用户。
本申请实施例支持将目标音频和第一虚拟对象的空间位置信息发送至第二游戏用户,第二游戏用户所操控的第二虚拟对象与第一游戏用户所操控的第一虚拟对象处于同一场景中。这样第二游戏用户在接收到目标音频和第一虚拟对象的空间位置信息后,可根据第一虚拟对象的空间位置信息对目标音频进行播放,具体是根据第一虚拟对象和第二虚拟对象之间的目标距离信息和方位信息对目标音频进行播放。例如,第一虚拟对象的空间位置信息指示:第一虚拟对象与第二虚拟对象之间的距离较近时,播放目标音频时的音量较大,以使第二游戏用户了解到第一虚拟对象与第二虚拟对象之间的距离较近;反之,当第一虚拟对象和第二虚拟对象之间的距离较大时,播放目标音频时的音量较小,以使第二游戏用户了解到第一虚拟对象与第二虚拟对象之间的距离较远。再如,第一虚拟对象的空间位置信息指示:第一虚拟对象位于第二虚拟对象的正后方(或其他方向),则播放目标音频时,第二游戏用户感受到的声音来源是正后方,这使得第二游戏用户能够感受到较为立体的听觉感受,提高游戏场景的真实性。
The embodiments of this application support sending the target audio and the spatial position information of the first virtual object to the second game user either independently over two mutually independent data channels, or together over the same data channel. These two transmission modes are described below:
1) Sending the target audio and the spatial position information of the first virtual object independently over mutually independent data channels. In a specific implementation: first, the target audio is encoded to generate a first audio data packet. The encoding here differs from the encoding in the pulse-code modulation mentioned above; here it means compressing the target audio with a compression algorithm to reduce the space the target audio occupies, which improves data-transmission efficiency and speed and reduces transmission energy consumption. A compression algorithm is a data-compression algorithm, often called signal coding in the electronics and communications field; it covers compression and restoration (or encoding and decoding), and the compression may include, but is not limited to, dictionary algorithms, Fixed Bit Length Packing, run-length encoding (RLE), and so on. Next, the encoded first audio data packet is sent to the second game user over a first data channel. Finally, the spatial position information of the first virtual object is sent to the second game user over a second data channel; specifically, a second audio data packet is generated based on the spatial position information of the first virtual object and sent to the second game user. The content of this second audio data packet may be, for example, "pos:x=5;y=6;z=7", indicating that the x, y, and z coordinates of the first virtual object in the game scene are 5, 6, and 7 meters, respectively. Of course, if the spatial position information of the first virtual object is large or contains redundancy, the second audio data packet generated from it may also be encoded before being sent over the second data channel. The first data channel and the second data channel are different channels.
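A minimal sketch of this dual-channel mode is given below; zlib stands in for the audio codec, the position string follows the "pos:x=5;y=6;z=7" example, and the channel objects themselves are omitted, since the embodiments do not prescribe a particular transport:

```python
import zlib

def make_first_audio_packet(pcm_bytes: bytes) -> bytes:
    # Compress the target audio to reduce its size before sending.
    return zlib.compress(pcm_bytes)

def make_position_packet(x, y, z) -> bytes:
    # Serialize the spatial position information for the second channel.
    return f"pos:x={x};y={y};z={z}".encode()

audio_payload = make_first_audio_packet(b"\x00\x01" * 160)  # -> first data channel
position_payload = make_position_packet(5, 6, 7)            # -> second data channel
```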
FIG. 7a shows an exemplary schematic of transmitting the target audio and the spatial position information of the first virtual object over two different data channels. As shown in FIG. 7a, the terminal 101 operated by the first game user can send the first audio data packet over the first data channel to the cloud data forwarding server 105, so that the cloud data forwarding server 105 forwards the first audio data packet over the first data channel to the terminal 102 operated by the second game user; likewise, the terminal 101 sends the second audio data packet over the second data channel to the cloud data forwarding server 105, which forwards it over the second data channel to the terminal 102.
It should be noted that the embodiments of this application do not restrict the order in which the target audio and the spatial position information of the first virtual object are sent. That is, the target audio may be sent to the second game user over the first data channel before the spatial position information is sent over the second data channel; or the spatial position information may be sent over the second data channel first and the target audio over the first data channel afterwards; or the target audio and the spatial position information may be sent at the same time over their respective channels.
2) Sending the target audio and the spatial position information of the first virtual object to the second game user over the same data channel. In a specific implementation: first, the target audio is encoded to generate a first audio data packet; for the specific encoding, see the description of mode 1) above, which is not repeated here. Next, the spatial position information of the first virtual object is appended to the first audio data packet, for example at its tail or head. When the spatial position information is appended at the tail, the content of the first audio data packet with the appended position information may be "[voice_data][type=pos;len=12;x=5;y=6;z=7]", indicating that a position field of type "pos", 12 bytes long, with value "x=5;y=6;z=7", is appended after the first audio data packet. Finally, the first audio data packet with the appended spatial position information of the first virtual object is sent to the second game user. FIG. 7b shows an exemplary schematic of sending the first audio data packet with the appended spatial position information of the first virtual object to the second game user.
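The sketch below illustrates appending a typed position field to the tail of the audio packet; reading len=12 as three 4-byte floats is an interpretation made for this example, and the exact byte layout is not mandated by the embodiments:

```python
import struct

def append_position(audio_packet: bytes, x: float, y: float, z: float) -> bytes:
    value = struct.pack("<3f", x, y, z)                    # 12-byte position value
    trailer = b"pos" + struct.pack("<H", len(value)) + value
    return audio_packet + trailer                          # packet tail carries pos

packet = append_position(b"voice_data", 5.0, 6.0, 7.0)

# Receiver side: the last 12 bytes are the position value.
x, y, z = struct.unpack("<3f", packet[-12:])
print(x, y, z)  # -> 5.0 6.0 7.0
```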
In the embodiments of this application, the voice audio of the first game user can be converted so that the resulting target audio matches the role attribute of the first virtual object. While ensuring that the target audio accurately conveys the content the first game user wants to express, adjusting the timbre of the target audio to match the role attribute of the first virtual object avoids exposing the first game user's real voice, improving voice privacy and enjoyment. In addition, the spatial position information of the first virtual object in the game scene can be acquired, so that playing the target audio based on that spatial position information conveys the spatial position of the first virtual object in the game scene and gives the second game user a more realistic sense of three-dimensional space.
FIG. 8 shows a schematic flowchart of an audio processing method provided by an exemplary embodiment of this application; the embodiment is described with the method executed by the aforementioned target terminal as an example. The audio processing method may include, but is not limited to, steps S801-S802:
S801: Receive the target audio of the first game user and the spatial position information of the first virtual object.
The first virtual object is the virtual object controlled by the first game user in the game scene; the target audio is audio obtained by converting the voice audio of the first game user and matches the role attribute of the first virtual object. For the specific implementation of converting the voice audio of the first game user into the target audio, see the description of step S202 in the embodiment shown in FIG. 2, which is not repeated here.
As described in step S204 of the embodiment shown in FIG. 2, the first game user may send the target audio and the spatial position information of the first virtual object over independent data channels, or over the same data channel. When the first game user sends the target audio over the first data channel and the spatial position information over the second data channel, the second game user receives the target audio through the first data channel and the spatial position information through the second data channel; likewise, when the first game user sends the spatial position information and the target audio over the same data channel, the second game user receives both through that same data channel.
S802: Play the target audio according to the spatial position information of the first virtual object.
In a specific implementation, audio playback information between the first virtual object and the second virtual object is determined based on the spatial position information of the first virtual object, the audio playback information including audio volume information and audio orientation information; the target audio is then played according to the audio playback information. The audio volume information contained in the audio playback information is determined from the target distance information between the first and second virtual objects in the game scene and indicates the volume at which the target audio is played; its unit may be the decibel, e.g. an audio volume of 100 dB. The audio orientation information contained in the audio playback information is determined from the orientation information between the first and second virtual objects in the game scene and indicates the direction from which the sound appears to come when the target audio is played; it may include the bearing angle between the first and second virtual objects in the game scene, e.g. the first virtual object is located 30° to the upper left of the second virtual object.
The ways of determining the audio volume information and the audio orientation information are described below:
1) The audio playback information includes audio volume information. Determining the audio volume information based on the spatial position information of the first virtual object may proceed as follows:
First, the target distance information between the first virtual object and the second virtual object is obtained based on the spatial position information of the first virtual object. The way the target distance information is determined depends on what the spatial position information contains. For example, when the spatial position information includes the target coordinates of the first virtual object determined relative to the coordinate origin in the game scene, the second coordinates of the second virtual object can first be determined in the game scene, and the target distance information between the two objects then computed from the target coordinates of the first virtual object and the second coordinates of the second virtual object; for how the second coordinates of the second virtual object are determined in the game scene, see the relevant description in the embodiment shown in FIG. 2, which is not repeated here. As another example, when the spatial position information already includes the target distance information between the first and second virtual objects, that target distance information can be read from the spatial position information directly.
Next, a mapping between different distance information and volume information is acquired. It will be appreciated that different distance information between the first and second virtual objects maps to different volume information, so the volume of the target audio heard by the second game user varies accordingly. For example, when the distance information indicates that the first and second virtual objects are 2 meters apart, the volume information mapped to that distance information may be 100 dB (see FIG. 9); when the distance information indicates 10 meters, the mapped volume information may be 20 dB (see FIG. 9); the higher the decibel value, the louder the audio the second game user hears. It should be noted that FIG. 9 gives only one exemplary mapping between distance information and volume information; in practical application scenarios the mapping may differ from that shown in FIG. 9 depending on the sounding body and the sound-propagation medium, and the embodiments of this application do not limit the mapping between distance information and volume information.
Finally, the audio volume information between the first virtual object and the second virtual object is determined from the mapping and the target distance information. For example, if the target distance information indicates a distance of 6 meters between the two objects, matching the target distance information against the distance entries of the mapping shown in FIG. 9 gives a volume of approximately 33.3 dB for 6 meters, and 33.3 dB is taken as the audio volume information between the first virtual object and the second virtual object.
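The sample points quoted above (2 m -> 100 dB, 10 m -> 20 dB, 6 m -> approximately 33.3 dB) are all consistent with a volume inversely proportional to distance; the sketch below assumes that relationship, with the constant 200 derived from those sample points, purely as one possible mapping:

```python
def volume_db(distance_m: float) -> float:
    # Inverse-distance mapping calibrated on the sample points above.
    return 200.0 / distance_m

print(volume_db(2))            # -> 100.0
print(volume_db(10))           # -> 20.0
print(round(volume_db(6), 1))  # -> 33.3
```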
2) The audio playback information includes audio orientation information. As described above, the spatial position information of the first virtual object may include the target coordinates of the first virtual object determined relative to the coordinate origin in the game scene, or the orientation information between the first virtual object and the second virtual object; the way the audio orientation information is determined therefore depends on what the spatial position information contains. For example, when the spatial position information includes the target coordinates of the first virtual object, the second coordinates of the second virtual object can first be determined in the game scene, and the audio orientation information between the two objects then computed from the target coordinates of the first virtual object and the second coordinates of the second virtual object; for how the second coordinates of the second virtual object are determined in the game scene, and how the audio orientation information is determined from the second coordinates and the target coordinates, see the relevant description in the embodiment shown in FIG. 2, which is not repeated here. As another example, when the spatial position information already includes the orientation information between the first and second virtual objects, that orientation information can be read directly from the spatial position information and used as the audio orientation information.
Based on implementations 1) and 2) above, after the audio volume information and audio orientation information between the first and second virtual objects are obtained, the embodiments of this application play the target audio according to them, so that the played target audio reflects the distance and direction between the first and second virtual objects in the game scene. How the target audio is played according to the audio volume information and audio orientation information depends on the equipment in the physical environment of the second game user. The following gives two examples of playing the target audio according to the audio volume information and audio orientation information: the physical environment of the second game user containing multiple loudspeakers, and the target terminal held by the second game user being able to call a target acoustic function.
In one implementation, suppose the physical environment of the second game user contains multiple loudspeakers. The loudspeakers can first be adjusted so that, when the adjusted loudspeakers play the target audio, they convey the direction between the first virtual object and the second virtual object; the target audio is then played through the adjusted loudspeakers according to the audio volume information. Adjusting the loudspeakers may include adjusting their placement, playback mode, power, and so on; the embodiments of this application do not limit the specific adjustment. On this basis, playing the target audio according to the audio volume information conveys the distance between the first and second virtual objects, and playing it through the adjusted loudspeakers conveys the direction or orientation between them, so that the sound produced by the multiple loudspeakers forms a surround stereo effect.
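For the common case of two loudspeakers, one way such an adjustment could be realized is constant-power amplitude panning, sketched below under the same sign convention as before (positive azimuth to the left); the names and the panning law are illustrative assumptions rather than the embodiments' prescribed method:

```python
import numpy as np

def pan_stereo(mono, azimuth_deg, gain=1.0):
    # Map an azimuth in [-90, 90] degrees to a pan angle in [0, 90] degrees.
    theta = np.radians((azimuth_deg + 90.0) / 2.0)
    left = gain * np.sin(theta) * mono    # louder left as azimuth grows
    right = gain * np.cos(theta) * mono   # louder right as azimuth shrinks
    return left, right                    # sin^2 + cos^2 = 1: constant power
```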
In another implementation, if a sound-localization mode (e.g. an HRTF mode) is enabled on the target terminal used by the second game user, a target acoustic function can first be called to filter the target audio, obtaining filtered target audio; when the filtered target audio is played, the human ear can perceive which direction the first virtual object lies in relative to the second virtual object in the game scene. The filtered target audio is then played according to the audio volume information, so the listener perceives the distance between the first and second virtual objects from the audio volume information and their direction from the filtered target audio. The target acoustic function may include a head-related transfer function (Head Related Transfer Functions, HRTF), in which case the sound-localization mode refers to the HRTF mode. HRTF, also known as ATF (anatomical transfer function), is a sound-localization algorithm; as a set of filters, it exploits cues such as the interaural time difference (ITD), the interaural amplitude difference (IAD), and pinna frequency response to process the target audio in real time, so that the processed target audio produces a spatial effect: when the processed sound reaches the pinna, ear canal, and eardrum, the listener perceives a surround effect. Filtering the target audio with a head-related transfer function may include: taking the audio orientation information as the input to the head-related transfer function to obtain a new head-related transfer function, and then filtering the target audio with that new head-related transfer function to obtain the filtered target audio. It should be noted that the embodiments of this application do not restrict the target acoustic function to the head-related transfer function; for ease of description, the head-related transfer function is merely used as an example to introduce how the target audio is processed.
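At its core, HRTF-style filtering convolves the mono target audio with a pair of head-related impulse responses measured for the source direction. The sketch below assumes hrir_left and hrir_right are equal-length arrays already selected from a measured HRIR data set according to the audio orientation information; real systems load such data sets rather than computing them:

```python
import numpy as np

def apply_hrtf(mono, hrir_left, hrir_right, gain=1.0):
    # Filter the target audio for each ear with the impulse response
    # corresponding to the audio orientation information.
    left = np.convolve(mono, hrir_left) * gain
    right = np.convolve(mono, hrir_right) * gain
    return np.stack([left, right])  # two-channel output for headphone playback
```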
In summary, the embodiments of this application can play the target audio according to the audio volume information and the audio orientation information, so that the played target audio exhibits three-dimensional spatial characteristics, provides a stereophonic listening experience, and thereby improves the game user's immersive experience. Furthermore, converting the voice audio of the first game user so that the resulting target audio matches the role attribute of the first virtual object conveys the content the first game user wants to express accurately while adjusting the timbre of the target audio to match that role attribute, avoiding exposure of the first game user's real voice and improving voice privacy and enjoyment.
The embodiments shown in FIG. 2 and FIG. 8 above describe in detail how the source terminal and the target terminal, respectively, execute the audio processing method. The overall flow of the audio processing solution is described below with reference to FIG. 10, taking as an example the method being executed jointly by the source terminal, the target terminal, and servers (e.g. cloud servers); the audio processing method may include, but is not limited to, steps S1001-S1018:
S1001: The source terminal sends a data configuration request to the cloud configuration server.
S1002: The source terminal receives the configuration information returned by the cloud configuration server in response to the data configuration request.
S1003: The source terminal initializes the target game according to the configuration information.
In steps S1001-S1003, when the first game user opens and uses the target game through the source terminal, the source terminal sends a data configuration request to the cloud configuration server, asking the cloud configuration server to look up and return the configuration information related to the target game and the first game user. The configuration information may include the configuration resources needed to run the target game, including configuration parameters (e.g. the resolution of the game screen, the system framework of the target game, game data, etc.) and the role attribute (e.g. gender, age, etc.) of the first virtual object controlled by the first game user in the game scene of the target game. After receiving the configuration resources returned by the cloud configuration server in response to the data configuration request, the source terminal can initialize the target game based on the configuration resources (e.g. load the configuration resources needed to run the target game), so that the first game user successfully starts the target game.
S1004: The source terminal detects a trigger event for entering the game scene of the target game.
S1005: The source terminal sends a state-change request to the cloud signaling server.
S1006: The source terminal receives the feedback result returned by the cloud signaling server.
In steps S1004-S1006, the trigger event for entering the game scene of the target game may include: an event produced by a trigger operation on the entry to the game scene in the first game interface of the target game, an event produced on receiving a game invitation from the second game user, an event produced on receiving a voice signal commanding entry into the game scene, and so on. When the trigger event occurs, it indicates that the first game user wants to enter the game scene; the source terminal can then generate a state-change request based on the trigger event. The state-change request carries the scene identifier of the game scene (e.g. desert scene, arena scene, snow scene, rainforest scene, etc.) and asks the cloud signaling server to record the game data related to the game scene the first game user is playing (e.g. the game start time, the scene identifier of the game scene, information about second game users in the same game scene, etc.). In response to the state-change request sent by the source terminal, the cloud signaling server can return a feedback result to the source terminal, which may include: whether entry into the game scene succeeded or failed, and object data (e.g. nickname, historical game records, game level, etc.) of the second virtual object controlled by a second game user in the same game scene as the first virtual object.
S1007: The target terminal sends a data configuration request to the cloud configuration server.
S1008: The target terminal receives the configuration information returned by the cloud configuration server in response to the data configuration request.
S1009: The target terminal initializes the target game according to the configuration information.
It should be noted that for the specific implementation of steps S1007-S1009, see the description of steps S1001-S1003; the only difference is that steps S1007-S1009 are executed by the target terminal, whereas steps S1001-S1003 are executed by the source terminal. The specific implementation of steps S1007-S1009 is therefore not repeated here.
S1010: The target terminal detects a trigger event for entering the game scene of the target game.
S1011: The target terminal sends a state-change request to the cloud signaling server.
S1012: The target terminal receives the feedback result returned by the cloud signaling server.
It should be noted that for the specific implementation of steps S1010-S1012, see the description of steps S1004-S1006; the only difference is that steps S1010-S1012 are executed by the target terminal, whereas steps S1004-S1006 are executed by the source terminal. The specific implementation of steps S1010-S1012 is therefore not repeated here.
S1013: The source terminal acquires the voice audio of the first game user and the spatial position information of the first virtual object controlled by the first game user in the game scene.
It should be noted that for the specific implementation of step S1013, see the description of step S201 in the embodiment shown in FIG. 2, which is not repeated here.
In addition, if the first game user is logging into the target game for the first time, the embodiments of this application also support the first game user selecting the first virtual object in the target game before the voice audio is acquired, so that the first game user can subsequently control the first virtual object to play the target game. Selecting the first virtual object can be understood as selecting or setting the role attribute of the first virtual object, e.g. selecting a female role attribute for the first virtual object, setting the age of the first virtual object to 20, and so on. Furthermore, if the microphone is detected to be off before the voice audio of the first game user is acquired, the embodiments of this application also support notifying the first game user to turn on the microphone; for the specific process of turning on the microphone, see the description of step S201 above, which is not repeated here.
S1014: The source terminal converts the voice audio of the first game user to obtain target audio matching the role attribute of the first virtual object.
It should be noted that for the specific implementation of step S1014, see the description of step S202 in the embodiment shown in FIG. 2, which is not repeated here.
Before converting the voice audio of the first game user, the embodiments of this application also support voice preprocessing of the voice audio, obtaining preprocessed voice audio. Compared with the audio before preprocessing, the preprocessed voice audio has the interference and noise signals produced by the environment or the circuitry filtered out, giving higher audio quality and clearer speech. The voice preprocessing may include, but is not limited to, echo cancellation, noise reduction, voice activity detection, and similar methods; the embodiments of this application do not limit the specific implementation of the voice preprocessing.
S1015: The source terminal sends the target audio and the spatial position information of the first virtual object to the cloud data forwarding server.
It should be noted that for the specific implementation of step S1015, see the description of step S203 in the embodiment shown in FIG. 2, which is not repeated here.
S1016: The cloud data forwarding server sends the target audio and the spatial position information of the first virtual object to the target terminal.
S1017: The target terminal receives the target audio and the spatial position information of the first virtual object forwarded by the cloud data forwarding server.
S1018: Play the target audio according to the spatial position information of the first virtual object.
It should be noted that for the specific implementation of steps S1016-S1018, see the description of steps S801-S802 in the embodiment shown in FIG. 8, which is not repeated here.
It will be appreciated that the source terminal often captures more than one segment of the first game user's voice audio. The source terminal can therefore convert the multiple captured segments, generating the target audio corresponding to each segment, encode each target audio to obtain the first audio data packet corresponding to each segment, and send the first audio data packets and the spatial position information corresponding to each segment to the cloud forwarding server. The target terminal may thus receive multiple first audio data packets and the corresponding spatial position information. So that the target terminal can convey the first game user's voice audio completely and accurately, the embodiments of this application support buffering and ordering the received first audio data packets and corresponding spatial position information. Buffering and ordering means storing the received signals (e.g. the multiple first audio data packets and the corresponding spatial position information) sorted in the order in which the source terminal produced them, so that when the target audio in the packets is later played in that buffered order, the content the first game user wants to express is conveyed accurately.
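One simple realization of this buffering and ordering is a priority queue keyed by a sequence number stamped by the source terminal; the names below are assumptions for illustration only:

```python
import heapq

class PacketBuffer:
    # Holds packets that may arrive out of order and releases them
    # in the order the source terminal produced them.
    def __init__(self):
        self._heap = []

    def push(self, seq: int, packet: bytes):
        heapq.heappush(self._heap, (seq, packet))

    def pop_in_order(self):
        while self._heap:
            yield heapq.heappop(self._heap)[1]

buf = PacketBuffer()
for seq, pkt in [(2, b"second"), (1, b"first"), (3, b"third")]:
    buf.push(seq, pkt)
print(list(buf.pop_in_order()))  # -> [b'first', b'second', b'third']
```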
In addition, as described in relation to step S202 above, the source terminal sends the first audio data packet generated by encoding the target audio to the cloud data forwarding server, so that the cloud data forwarding server forwards the first audio data packet to the target terminal; after receiving the forwarded first audio data packet, the target terminal must therefore decode it to obtain the target audio. Decoding is the process of decompressing the first audio data packet with a decompression algorithm to recover the target audio; encoding and decoding correspond to each other, i.e. the target terminal must decompress the first audio data packet with the decompression algorithm matching the compression algorithm used by the source terminal.
Steps S1001-S1018 above show the flow in which the source terminal, the target terminal, and the servers (e.g. the cloud configuration server, the cloud signaling server, and the cloud data forwarding server) jointly execute the audio processing method. The executing entity of each step is given below with reference to FIG. 11a, FIG. 11b, and FIG. 11c:
The source terminal can execute the specific implementation shown in steps S1001-S1006 and steps S1013-S1015; a schematic flowchart of the source terminal executing these steps is shown in FIG. 11a. As shown in FIG. 11a, the flow of the source terminal executing the audio processing method includes: start (e.g. open the target game) → initialize the target game → set the first virtual object → turn on the microphone → acquire the voice audio of the first game user → voice preprocessing → first transform (e.g. Fourier transform) → overtone characterization (e.g. modification of the overtone frequency-domain information corresponding to the voice audio) → second transform (e.g. inverse Fourier transform) → encoding → acquire the spatial position information of the first virtual object → send the target audio and the spatial position information to the cloud data forwarding server → end (e.g. release system software and hardware resources, exit the target game). For the specific implementation of this flow, see the description of the embodiments shown in FIG. 2 or FIG. 10, which is not repeated here. Note also that not every step of this flow has to be executed in a specific implementation; for example, the overtone characterization of the first game user's voice audio may be skipped, in which case playback still provides a stereophonic listening experience, but the timbre of the voice audio remains similar to the first game user's real voice.
The cloud servers include the cloud configuration server, the cloud signaling server, and the cloud data forwarding server, and the different cloud servers play different roles in the audio processing method; for example, the cloud data forwarding server implements data forwarding between the source terminal and the target terminal. FIG. 11b gives the flow of the cloud data forwarding server executing the audio processing method; as shown in FIG. 11b, this flow includes: start → initialize the system (e.g. respond to system-resource requests and prepare to receive data such as voice audio) → receive data (e.g. receive, in a loop, the voice audio and spatial position information sent by the source terminal) → forward data (e.g. forward the received data to the target terminal, as in step S1016) → end (e.g. release system software and hardware resources, exit the target game). For the specific implementation of this flow, see the description of the embodiments shown in FIG. 2, FIG. 8, or FIG. 10, which is not repeated here.
The target terminal can execute the specific implementation shown in steps S1007-S1012 and steps S1017-S1018; a schematic flowchart of the target terminal executing these steps is shown in FIG. 11c. As shown in FIG. 11c, the flow of the target terminal executing the audio processing method includes: start → initialize the target game → buffer and order → decode → acquire the spatial position information of the first virtual object → play the target audio according to the spatial position information → end. For the specific implementation of this flow, see the description of the embodiments shown in FIG. 8 or FIG. 10, which is not repeated here.
It is worth noting that the embodiments of this application do not restrict the order in which the steps of S1001-S1018 are executed; for example, steps S1001-S1003 may be executed first so that the source terminal initializes the target game and steps S1007-S1009 afterwards so that the target terminal initializes the target game; or steps S1001-S1003 and steps S1007-S1009 may be executed at the same time; or steps S1007-S1009 may be executed before steps S1001-S1003. Moreover, steps S1001-S1018 give only part of the flow of the audio processing method; in practical application scenarios the audio processing method may include other steps, and the embodiments of this application do not limit its specific implementation steps.
In the embodiments of this application, on the one hand, the voice audio of the first game user can be converted so that the resulting target audio matches the role attribute of the first virtual object; while ensuring that the target audio accurately conveys the content the first game user wants to express, adjusting the timbre of the target audio to match the role attribute of the first virtual object avoids exposing the first game user's real voice, improving voice privacy and enjoyment. On the other hand, the spatial position information of the first virtual object in the game scene can be acquired, so that playing the target audio based on that spatial position information conveys the three-dimensional position of the first virtual object in the game scene and provides a more realistic sense of space.
The methods of the embodiments of this application are described in detail above; to facilitate better implementation of these methods, the corresponding apparatuses of the embodiments of this application are provided below.
FIG. 12 shows a schematic structural diagram of an audio processing apparatus provided by an exemplary embodiment of this application. The audio processing apparatus may be a computer program (including program code) running in the source terminal and may be used to execute some or all of the steps of the method embodiments shown in FIG. 2 and FIG. 10. The audio processing apparatus includes the following units:
an acquisition unit 1201, configured to acquire the voice audio of a first game user and the spatial position information of a first virtual object controlled by the first game user in a game scene;
a processing unit 1202, configured to convert the voice audio of the first game user to obtain target audio matching the role attribute of the first virtual object;
the processing unit 1202 being further configured to send the target audio and the spatial position information of the first virtual object to a second game user, so that the second game user plays the target audio according to the spatial position information of the first virtual object, wherein the second virtual object controlled by the second game user is in the same game scene as the first virtual object.
In one implementation, when converting the voice audio of the first game user to obtain target audio matching the role attribute of the first virtual object, the processing unit 1202 is specifically configured to:
apply a first transform to the voice audio of the first game user to extract the frequency-domain information of the voice audio, the frequency-domain information including fundamental frequency-domain information and overtone frequency-domain information;
modify the overtone frequency-domain information according to the role attribute of the first virtual object, obtaining modified overtone frequency-domain information;
fuse the fundamental frequency-domain information with the modified overtone frequency-domain information, and apply a second transform to the fused frequency-domain information, obtaining target audio matching the role attribute of the first virtual object.
In one implementation, when modifying the overtone frequency-domain information according to the role attribute of the first virtual object to obtain modified overtone frequency-domain information, the processing unit 1202 is specifically configured to:
acquire audio configuration information corresponding to the role attribute of the first virtual object, the audio configuration information including overtone configuration information, the audio configuration information being determined according to the role attribute of the first virtual object, or generated according to the game scene;
modify the overtone frequency-domain information according to the overtone configuration information, obtaining the modified overtone frequency-domain information.
In one implementation, the spatial position information of the first virtual object includes the target coordinates of the first virtual object determined relative to a coordinate origin in the game scene; when acquiring the spatial position information of the first virtual object controlled by the first game user in the game scene, the processing unit 1202 is specifically configured to:
determine a target point in the game scene as the coordinate origin, the target point in the game scene including a camera or a light source point;
establish a spatial coordinate system based on the coordinate origin, and generate the target coordinates of the first virtual object based on the spatial coordinate system.
In one implementation, the spatial position information of the first virtual object includes the target distance information and orientation information between the first virtual object and the second virtual object; when acquiring the spatial position information of the first virtual object controlled by the first game user in the game scene, the processing unit 1202 is specifically configured to:
acquire first position information of the first virtual object in the game scene and second position information of the second virtual object in the game scene;
perform a distance operation on the first position information and the second position information, obtaining the target distance information between the first virtual object and the second virtual object; and
perform an orientation operation on the first position information and the second position information, obtaining the orientation information between the first virtual object and the second virtual object.
In one implementation, when sending the target audio and the spatial position information of the first virtual object to the second game user, the processing unit 1202 is specifically configured to:
encode the target audio to generate a first audio data packet, and send the first audio data packet to the second game user over a first data channel; and
send the spatial position information of the first virtual object to the second game user over a second data channel;
wherein the first data channel and the second data channel are different.
In one implementation, when sending the target audio and the spatial position information of the first virtual object to the second game user, the processing unit 1202 is specifically configured to:
encode the target audio to generate a first audio data packet;
append the spatial position information of the first virtual object to the first audio data packet;
send the first audio data packet with the appended spatial position information of the first virtual object to the second game user.
According to an embodiment of this application, the units of the audio processing apparatus shown in FIG. 12 may be separately or wholly combined into one or several other units, or one or more of them may be further split into multiple functionally smaller units; this can achieve the same operations without affecting the technical effects of the embodiments of this application. The above units are divided based on logical functions; in practical applications, the function of one unit may be implemented by multiple units, or the functions of multiple units by one unit. In other embodiments of this application, the audio processing apparatus may also include other units; in practical applications these functions may be implemented with the assistance of other units and may be implemented by multiple units in cooperation. According to another embodiment of this application, the audio processing apparatus shown in FIG. 12 can be constructed, and the audio processing method of the embodiments of this application implemented, by running a computer program (including program code) capable of executing the steps of the corresponding methods shown in FIG. 2 and FIG. 10 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, loaded into the above computing device through the computer-readable recording medium, and run therein.
In the embodiments of this application, on the one hand, the processing unit 1202 can convert the voice audio of the first game user so that the resulting target audio matches the role attribute of the first virtual object; while ensuring that the target audio accurately conveys the content the first game user wants to express, adjusting the timbre of the target audio to match that role attribute avoids exposing the first game user's real voice, improving voice privacy and enjoyment. On the other hand, the spatial position information of the first virtual object in the game scene can be acquired, so that playing the target audio based on that information conveys the three-dimensional position of the first virtual object in the game scene and provides a more realistic sense of space.
FIG. 13 shows a schematic structural diagram of an audio processing apparatus provided by an exemplary embodiment of this application. The audio processing apparatus may be a computer program (including program code) running in the target terminal and may be used to execute some or all of the steps of the method embodiments shown in FIG. 8 and FIG. 10. The audio processing apparatus includes the following units:
a receiving unit 1301, configured to receive the target audio of a first game user and the spatial position information of a first virtual object, the first virtual object being the virtual object controlled by the first game user in a game scene, the target audio being audio obtained by converting the voice audio of the first game user and matching the role attribute of the first virtual object;
a processing unit 1302, configured to play the target audio according to the spatial position information of the first virtual object, wherein the first virtual object and a second virtual object are in the same game scene, the second virtual object being the virtual object controlled by a second game user in the game scene.
In one implementation, when playing the target audio according to the spatial position information of the first virtual object, the processing unit 1302 is specifically configured to:
determine, based on the spatial position information of the first virtual object, the audio playback information between the first virtual object and the second virtual object, the audio playback information including audio volume information and audio orientation information;
play the target audio according to the audio playback information.
In one implementation, when determining, based on the spatial position information of the first virtual object, the audio playback information between the first virtual object and the second virtual object in the game scene, the processing unit 1302 is specifically configured to:
obtain, based on the spatial position information of the first virtual object, the target distance information between the first virtual object and the second virtual object;
acquire a mapping between different distance information and volume information;
determine the audio volume information between the first virtual object and the second virtual object according to the mapping and the target distance information;
determine the audio playback information according to the audio volume information.
In one implementation, the spatial position information of the first virtual object includes the target coordinates of the first virtual object determined relative to a coordinate origin in the game scene; when determining, based on the spatial position information of the first virtual object, the audio playback information between the first virtual object and the second virtual object in the game scene, the processing unit 1302 is specifically configured to:
acquire the reference coordinates of the second virtual object in the game scene;
compute over the target coordinates and the reference coordinates, obtaining the audio orientation information between the first virtual object and the second virtual object;
determine the audio playback information according to the audio orientation information.
According to an embodiment of this application, the units of the audio processing apparatus shown in FIG. 13 may be separately or wholly combined into one or several other units, or one or more of them may be further split into multiple functionally smaller units; this can achieve the same operations without affecting the technical effects of the embodiments of this application. The above units are divided based on logical functions; in practical applications, the function of one unit may be implemented by multiple units, or the functions of multiple units by one unit. In other embodiments of this application, the audio processing apparatus may also include other units; in practical applications these functions may be implemented with the assistance of other units and may be implemented by multiple units in cooperation. According to another embodiment of this application, the audio processing apparatus shown in FIG. 13 can be constructed, and the audio processing method of the embodiments of this application implemented, by running a computer program (including program code) capable of executing the steps of the corresponding methods shown in FIG. 8 and FIG. 10 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, loaded into the above computing device through the computer-readable recording medium, and run therein.
In the embodiments of this application, on the one hand, the target audio received by the receiving unit 1301 has been converted from the voice audio of the first game user so that it matches the role attribute of the first virtual object; while ensuring that the target audio accurately conveys the content the first game user wants to express, adjusting the timbre of the target audio to match that role attribute avoids exposing the first game user's real voice, improving voice privacy and enjoyment. On the other hand, the processing unit 1302 can use the spatial position information of the first virtual object in the game scene, so that playing the target audio based on that information conveys the three-dimensional position of the first virtual object in the game scene and provides a more realistic sense of space.
FIG. 14 shows a schematic structural diagram of an audio processing device provided by an exemplary embodiment of this application. Referring to FIG. 14, the audio processing device includes a processor 1401, a communication interface 1402, and a computer-readable storage medium 1403, which may be connected by a bus or in other ways. The communication interface 1402 is configured to receive and send data. The computer-readable storage medium 1403 may be stored in the memory of the audio processing device; the computer-readable storage medium 1403 is configured to store a computer program comprising program instructions, and the processor 1401 is configured to execute the program instructions stored in the computer-readable storage medium 1403. The processor 1401 (or CPU, Central Processing Unit) is the computing core and control core of the audio processing device; it is suitable for implementing one or more instructions, and specifically for loading and executing one or more instructions to implement the corresponding method flows or corresponding functions.
The embodiments of this application also provide a computer-readable storage medium (memory), which is a memory device in the audio processing device, used to store programs and data. It will be appreciated that the computer-readable storage medium here may include both a built-in storage medium of the audio processing device and, of course, an extended storage medium supported by the audio processing device. The computer-readable storage medium provides storage space that stores the processing system of the audio processing device. One or more instructions suitable for being loaded and executed by the processor 1401, which may be one or more computer programs (including program code), are also stored in this storage space. It should be noted that the computer-readable storage medium here may be a high-speed RAM memory or a non-volatile memory, e.g. at least one disk memory; in some embodiments it may also be at least one computer-readable storage medium located remotely from the aforementioned processor.
In one embodiment, one or more instructions are stored in the computer-readable storage medium; the processor 1401 loads and executes the one or more instructions stored in the computer-readable storage medium to implement the corresponding steps of the audio processing method embodiments above. In a specific implementation, the one or more instructions in the computer-readable storage medium are loaded by the processor 1401 to execute the following steps:
acquiring the voice audio of a first game user and the spatial position information of a first virtual object controlled by the first game user in a game scene;
converting the voice audio of the first game user to obtain target audio matching the role attribute of the first virtual object;
sending the target audio and the spatial position information of the first virtual object to a second game user, so that the second game user plays the target audio according to the spatial position information of the first virtual object, wherein the second virtual object controlled by the second game user is in the same game scene as the first virtual object.
In one implementation, when the one or more instructions in the computer-readable storage medium are loaded by the processor 1401 to convert the voice audio of the first game user to obtain target audio matching the role attribute of the first virtual object, the following steps are specifically executed:
applying a first transform to the voice audio of the first game user to extract the frequency-domain information of the voice audio, the frequency-domain information including fundamental frequency-domain information and overtone frequency-domain information;
modifying the overtone frequency-domain information according to the role attribute of the first virtual object, obtaining modified overtone frequency-domain information;
fusing the fundamental frequency-domain information with the modified overtone frequency-domain information, and applying a second transform to the fused frequency-domain information, obtaining target audio matching the role attribute of the first virtual object.
In one implementation, when the one or more instructions in the computer-readable storage medium are loaded by the processor 1401 to modify the overtone frequency-domain information according to the role attribute of the first virtual object, obtaining modified overtone frequency-domain information, the following steps are specifically executed:
acquiring audio configuration information corresponding to the role attribute of the first virtual object, the audio configuration information including overtone configuration information, the audio configuration information being determined according to the role attribute of the first virtual object, or generated according to the game scene;
modifying the overtone frequency-domain information according to the overtone configuration information, obtaining the modified overtone frequency-domain information.
In one implementation, the spatial position information of the first virtual object includes the target coordinates of the first virtual object determined relative to a coordinate origin in the game scene; when the one or more instructions in the computer-readable storage medium are loaded by the processor 1401 to acquire the spatial position information of the first virtual object controlled by the first game user in the game scene, the following steps are specifically executed:
determining a target point in the game scene as the coordinate origin, the target point in the game scene including a camera or a light source point;
establishing a spatial coordinate system based on the coordinate origin, and generating the target coordinates of the first virtual object based on the spatial coordinate system.
In one implementation, the spatial position information of the first virtual object includes the target distance information and orientation information between the first virtual object and the second virtual object; when the one or more instructions in the computer-readable storage medium are loaded by the processor 1401 to acquire the spatial position information of the first virtual object controlled by the first game user in the game scene, the following steps are specifically executed:
acquiring first position information of the first virtual object in the game scene and second position information of the second virtual object in the game scene;
performing a distance operation on the first position information and the second position information, obtaining the target distance information between the first virtual object and the second virtual object; and
performing an orientation operation on the first position information and the second position information, obtaining the orientation information between the first virtual object and the second virtual object.
In one implementation, when the one or more instructions in the computer-readable storage medium are loaded by the processor 1401 to send the target audio and the spatial position information of the first virtual object to the second game user, the following steps are specifically executed:
encoding the target audio to generate a first audio data packet, and sending the first audio data packet to the second game user over a first data channel; and
sending the spatial position information of the first virtual object to the second game user over a second data channel;
wherein the first data channel and the second data channel are different.
In one implementation, when the one or more instructions in the computer-readable storage medium are loaded by the processor 1401 to send the target audio and the spatial position information of the first virtual object to the second game user, the following steps are specifically executed:
encoding the target audio to generate a first audio data packet;
appending the spatial position information of the first virtual object to the first audio data packet;
sending the first audio data packet with the appended spatial position information of the first virtual object to the second game user.
In another embodiment, one or more instructions are stored in the computer-readable storage medium; the processor 1401 loads and executes the one or more instructions stored in the computer-readable storage medium to implement the corresponding steps of the audio processing method embodiments above. In a specific implementation, the one or more instructions in the computer-readable storage medium are loaded by the processor 1401 to execute the following steps:
receiving the target audio of a first game user and the spatial position information of a first virtual object, the first virtual object being the virtual object controlled by the first game user in a game scene, the target audio being audio obtained by converting the voice audio of the first game user and matching the role attribute of the first virtual object;
playing the target audio according to the spatial position information of the first virtual object, wherein the first virtual object and a second virtual object are in the same game scene, the second virtual object being the virtual object controlled by a second game user in the game scene.
In one implementation, when the one or more instructions in the computer-readable storage medium are loaded by the processor 1401 to play the target audio according to the spatial position information of the first virtual object, the following steps are specifically executed:
determining, based on the spatial position information of the first virtual object, the audio playback information between the first virtual object and the second virtual object, the audio playback information including audio volume information and audio orientation information;
playing the target audio according to the audio playback information.
In one implementation, when the one or more instructions in the computer-readable storage medium are loaded by the processor 1401 to determine, based on the spatial position information of the first virtual object, the audio playback information between the first virtual object and the second virtual object in the game scene, the following steps are specifically executed:
obtaining, based on the spatial position information of the first virtual object, the target distance information between the first virtual object and the second virtual object;
acquiring a mapping between different distance information and volume information;
determining the audio volume information between the first virtual object and the second virtual object according to the mapping and the target distance information;
determining the audio playback information according to the audio volume information.
In one implementation, the spatial position information of the first virtual object includes the target coordinates of the first virtual object determined relative to a coordinate origin in the game scene; when the one or more instructions in the computer-readable storage medium are loaded by the processor 1401 to determine, based on the spatial position information of the first virtual object, the audio playback information between the first virtual object and the second virtual object in the game scene, the following steps are specifically executed:
acquiring the reference coordinates of the second virtual object in the game scene;
computing over the target coordinates and the reference coordinates, obtaining the audio orientation information between the first virtual object and the second virtual object;
determining the audio playback information according to the audio orientation information.
In the embodiments of this application, on the one hand, the processor 1401 can convert the voice audio of the first game user so that the resulting target audio matches the role attribute of the first virtual object; while ensuring that the target audio accurately conveys the content the first game user wants to express, adjusting the timbre of the target audio to match that role attribute avoids exposing the first game user's real voice, improving voice privacy and enjoyment. On the other hand, the processor 1401 can acquire the spatial position information of the first virtual object in the game scene, so that playing the target audio based on that information conveys the three-dimensional position of the first virtual object in the game scene and provides a more realistic sense of space.
The embodiments of this application also provide a computer program product or computer program, the computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of an audio processing device reads the computer instructions from the computer-readable storage medium and executes them, causing the audio processing device to execute the audio processing method described above.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed in this application can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
In the above embodiments, the implementation may be wholly or partly in software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions according to the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted through a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g. coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g. infrared, radio, microwave, etc.). The computer-readable storage medium may be any usable medium accessible to a computer, or a data processing device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (e.g. a floppy disk, hard disk, or magnetic tape), an optical medium (e.g. a DVD), or a semiconductor medium (e.g. a solid state disk (Solid State Disk, SSD)), etc.
The above are only specific embodiments of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (16)

  1. An audio processing method, performed by a first terminal, comprising:
    acquiring voice audio of a first game user and spatial position information of a first virtual object controlled by the first game user in a game scene;
    converting the voice audio of the first game user to obtain target audio matching a role attribute of the first virtual object;
    sending the target audio and the spatial position information of the first virtual object to a second game user, so that the second game user plays the target audio according to the spatial position information of the first virtual object, wherein a second virtual object controlled by the second game user is in the same game scene as the first virtual object.
  2. The method according to claim 1, wherein the converting the voice audio of the first game user to obtain target audio matching a role attribute of the first virtual object comprises:
    applying a first transform to the voice audio of the first game user to extract frequency-domain information of the voice audio, the frequency-domain information comprising fundamental frequency-domain information and overtone frequency-domain information;
    modifying the overtone frequency-domain information according to the role attribute of the first virtual object to obtain modified overtone frequency-domain information;
    fusing the fundamental frequency-domain information with the modified overtone frequency-domain information, and applying a second transform to the fused frequency-domain information to obtain the target audio matching the role attribute of the first virtual object.
  3. The method according to claim 2, wherein the modifying the overtone frequency-domain information according to the role attribute of the first virtual object to obtain modified overtone frequency-domain information comprises:
    acquiring audio configuration information corresponding to the role attribute of the first virtual object, the audio configuration information comprising overtone configuration information, wherein the audio configuration information is determined according to the role attribute of the first virtual object, or the audio configuration information is generated according to the game scene;
    modifying the overtone frequency-domain information according to the overtone configuration information to obtain the modified overtone frequency-domain information.
  4. The method according to claim 1, wherein the spatial position information of the first virtual object comprises target coordinates of the first virtual object determined relative to a coordinate origin in the game scene; and the acquiring spatial position information of the first virtual object controlled by the first game user in the game scene comprises:
    determining a target point in the game scene as the coordinate origin, the target point in the game scene comprising a camera or a light source point;
    establishing a spatial coordinate system based on the coordinate origin, and generating the target coordinates of the first virtual object based on the spatial coordinate system.
  5. The method according to claim 1, wherein the spatial position information of the first virtual object comprises target distance information and orientation information between the first virtual object and the second virtual object; and the acquiring spatial position information of the first virtual object controlled by the first game user in the game scene comprises:
    acquiring first position information of the first virtual object in the game scene and second position information of the second virtual object in the game scene;
    performing a distance operation on the first position information and the second position information to obtain the target distance information between the first virtual object and the second virtual object; and
    performing an orientation operation on the first position information and the second position information to obtain the orientation information between the first virtual object and the second virtual object.
  6. The method according to any one of claims 1 to 5, wherein the sending the target audio and the spatial position information of the first virtual object to a second game user comprises:
    encoding the target audio to generate a first audio data packet, and sending the first audio data packet to the second game user over a first data channel; and
    sending the spatial position information of the first virtual object to the second game user over a second data channel;
    wherein the first data channel and the second data channel are different.
  7. The method according to any one of claims 1 to 5, wherein the sending the target audio and the spatial position information of the first virtual object to a second game user comprises:
    encoding the target audio to generate a first audio data packet;
    appending the spatial position information of the first virtual object to the first audio data packet;
    sending the first audio data packet with the appended spatial position information of the first virtual object to the second game user.
  8. An audio processing method, performed by a second terminal, comprising:
    receiving target audio of a first game user and spatial position information of a first virtual object, the first virtual object being a virtual object controlled by the first game user in a game scene, the target audio being audio obtained by converting voice audio of the first game user and matching a role attribute of the first virtual object;
    playing the target audio according to the spatial position information of the first virtual object, wherein the first virtual object and a second virtual object are in the same game scene, the second virtual object being a virtual object controlled by a second game user in the game scene.
  9. The method according to claim 8, wherein the playing the target audio according to the spatial position information of the first virtual object comprises:
    determining, based on the spatial position information of the first virtual object, audio playback information between the first virtual object and the second virtual object, the audio playback information comprising audio volume information and audio orientation information;
    playing the target audio according to the audio playback information.
  10. The method according to claim 9, wherein the determining, based on the spatial position information of the first virtual object, audio playback information between the first virtual object and the second virtual object in the game scene comprises:
    obtaining, based on the spatial position information of the first virtual object, target distance information between the first virtual object and the second virtual object;
    acquiring a mapping between different distance information and volume information;
    determining the audio volume information between the first virtual object and the second virtual object according to the mapping and the target distance information;
    determining the audio playback information according to the audio volume information.
  11. The method according to claim 9, wherein the spatial position information of the first virtual object comprises target coordinates of the first virtual object determined relative to a coordinate origin in the game scene; and
    the determining, based on the spatial position information of the first virtual object, audio playback information between the first virtual object and the second virtual object in the game scene comprises:
    acquiring reference coordinates of the second virtual object in the game scene;
    computing over the target coordinates and the reference coordinates to obtain audio orientation information between the first virtual object and the second virtual object;
    determining the audio playback information according to the audio orientation information.
  12. An audio processing apparatus, comprising:
    an acquisition unit, configured to acquire voice audio of a first game user and spatial position information of a first virtual object controlled by the first game user in a game scene;
    a processing unit, configured to convert the voice audio of the first game user to obtain target audio matching a role attribute of the first virtual object;
    the processing unit being further configured to send the target audio and the spatial position information of the first virtual object to a second game user, so that the second game user plays the target audio according to the spatial position information of the first virtual object, wherein a second virtual object controlled by the second game user is in the same game scene as the first virtual object.
  13. An audio processing apparatus, comprising:
    a receiving unit, configured to receive target audio of a first game user and spatial position information of a first virtual object, the first virtual object being a virtual object controlled by the first game user in a game scene, the target audio being audio obtained by converting voice audio of the first game user and matching a role attribute of the first virtual object;
    a processing unit, configured to play the target audio according to the spatial position information of the first virtual object, wherein the first virtual object and a second virtual object are in the same game scene, the second virtual object being a virtual object controlled by a second game user in the game scene.
  14. An audio processing device, comprising:
    a processor, adapted to execute a computer program;
    a computer-readable storage medium having a computer program stored therein, the computer program, when executed by the processor, implementing the audio processing method according to any one of claims 1 to 7, or implementing the audio processing method according to any one of claims 8 to 11.
  15. A computer-readable storage medium storing a computer program, the computer program being adapted to be loaded by a processor to execute the audio processing method according to any one of claims 1 to 7, or to execute the audio processing method according to any one of claims 8 to 11.
  16. A computer program product comprising computer instructions, the computer instructions, when executed by a processor, implementing the audio processing method according to any one of claims 1 to 7, or implementing the audio processing method according to any one of claims 8 to 11.
PCT/CN2022/126681 2021-12-01 2022-10-21 Audio processing method and apparatus, device, medium, and program product WO2023098332A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/223,711 US20230364513A1 (en) 2021-12-01 2023-07-19 Audio processing method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111460896.8A CN114143700B (zh) 2021-12-01 2021-12-01 Audio processing method and apparatus, device, medium, and program product
CN202111460896.8 2021-12-01

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/223,711 Continuation US20230364513A1 (en) 2021-12-01 2023-07-19 Audio processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2023098332A1 true WO2023098332A1 (zh) 2023-06-08

Family

ID=80387087

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/126681 2021-12-01 2022-10-21 Audio processing method and apparatus, device, medium, and program product WO2023098332A1 (zh)

Country Status (3)

Country Link
US (1) US20230364513A1 (zh)
CN (1) CN114143700B (zh)
WO (1) WO2023098332A1 (zh)

Families Citing this family (4)

Publication number Priority date Publication date Assignee Title
CN114143700B (zh) 2021-12-01 2023-01-10 腾讯科技(深圳)有限公司 Audio processing method and apparatus, device, medium, and program product
CN114452647A (zh) 2022-03-23 2022-05-10 北京字节跳动网络技术有限公司 Game data generation method and apparatus, and interaction method and apparatus
CN115134655B (zh) 2022-06-28 2023-08-11 中国平安人寿保险股份有限公司 Video generation method and apparatus, electronic device, and computer-readable storage medium
CN115430156A (zh) 2022-08-16 2022-12-06 中国联合网络通信集团有限公司 Call method during a game, call apparatus, and calling user terminal

Citations (4)

Publication number Priority date Publication date Assignee Title
US20020105521A1 (en) * 2000-12-26 2002-08-08 Kurzweil Raymond C. Virtual reality presentation
CN107998658A (zh) 2017-12-01 2018-05-08 苏州蜗牛数字科技股份有限公司 System and method for implementing 3D character lip-synced voice chat in VR games
US20200142665A1 (en) * 2018-11-07 2020-05-07 Nvidia Corporation Application of geometric acoustics for immersive virtual reality (vr)
CN114143700A (zh) 2021-12-01 2022-03-04 腾讯科技(深圳)有限公司 Audio processing method and apparatus, device, medium, and program product

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN1767445A (zh) 2004-10-25 2006-05-03 任东海 Voice communication system for online games
US11144998B2 (en) * 2018-09-20 2021-10-12 The Toronto-Dominion Bank Dynamic provisioning of data exchanges based on detected relationships within processed image data
CN110070879A (zh) 2019-05-13 2019-07-30 吴小军 Method for producing intelligent expressions and sound-sensing games based on voice-changing technology
CN112316427B (zh) 2020-11-05 2022-06-10 腾讯科技(深圳)有限公司 Voice playback method and apparatus, computer device, and storage medium
CN113350802A (zh) 2021-06-16 2021-09-07 网易(杭州)网络有限公司 Voice communication method and apparatus in a game, terminal, and storage medium

Also Published As

Publication number Publication date
CN114143700A (zh) 2022-03-04
CN114143700B (zh) 2023-01-10
US20230364513A1 (en) 2023-11-16

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22900137

Country of ref document: EP

Kind code of ref document: A1