CN108668024B - Voice processing method and terminal


Info

Publication number
CN108668024B
Authority
CN
China
Prior art keywords
voice information, virtual character, target, personalized, information
Prior art date
2018-05-07
Legal status
Active
Application number
CN201810425867.XA
Other languages
Chinese (zh)
Other versions
CN108668024A (en)
Inventor
陈立 (Chen Li)
Current Assignee
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date
2018-05-07
Filing date
2018-05-07
Publication date
2021-01-08
2018-05-07: Application filed by Vivo Mobile Communication Co Ltd; priority to CN201810425867.XA
2018-10-16: Publication of CN108668024A
2021-01-08: Application granted; publication of CN108668024B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M 1/00: Substation equipment, e.g. for use by subscribers
    • H04M 1/72: Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M 1/724: User interfaces specially adapted for cordless or mobile telephones
    • H04M 1/72403: User interfaces with means for local support of applications that increase the functionality
    • H04M 1/7243: User interfaces with interactive means for internal management of messages
    • H04M 1/72433: User interfaces with interactive means for internal management of messages for voice messaging, e.g. dictaphones
    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63F: CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F 13/00: Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F 13/50: Controlling the output signals based on the game progress
    • A63F 13/54: Controlling the output signals based on the game progress involving acoustic signals, e.g. for simulating revolutions per minute [RPM] dependent engine sounds in a driving game or reverberation against a virtual wall

Abstract

The embodiments of the invention provide a voice processing method and a terminal, relating to the field of communication technology. They aim to solve the problem that, when users speak in a virtual scene with their original voices, other users face a cumbersome process of matching each voice to the corresponding virtual character, which degrades the experience. The method comprises the following steps: upon receiving voice information from a user, acquiring personalized voice information of a target virtual character, wherein the target virtual character is the virtual character selected by the user; processing the voice information and the personalized voice information to obtain target voice information; and outputting the target voice information. The method can improve the user's experience of the terminal.

Description

Voice processing method and terminal
Technical Field
The embodiments of the invention relate to the field of communication technology, and in particular to a voice processing method and a terminal.
Background
As the performance of terminals such as mobile phones has improved dramatically, mobile games have evolved from casual games with simple scenes into competitive and heavyweight titles, such as large MMO (Massively Multiplayer Online) games, that pursue richer operations and deeper game experiences. Compared with PC games, text input in a mobile game is less convenient and cannot meet players' need for real-time communication, so voice chat has become an important way for players in multiplayer online games to coordinate play and seek tactical cooperation.
Currently, voice chat in mobile games transmits the user's original voice. During a match, a player must therefore recognize a teammate's voice and associate it with that teammate's game character. This makes the process of identifying other players cumbersome and, in turn, degrades the user's experience of the terminal.
Disclosure of Invention
The embodiments of the invention provide a voice processing method and a terminal, aiming to solve the problem that, when users speak in a virtual scene with their original voices, other users face a cumbersome process of matching each voice to the corresponding virtual character, which degrades the experience.
In a first aspect, an embodiment of the present invention provides a voice processing method, applied to a terminal, comprising:
upon receiving voice information from a user, acquiring personalized voice information of a target virtual character, wherein the target virtual character is the virtual character selected by the user;
processing the voice information and the personalized voice information to obtain target voice information;
and outputting the target voice information.
In a second aspect, an embodiment of the present invention further provides a terminal, comprising:
a first acquisition module, configured to acquire personalized voice information of a target virtual character upon receiving voice information from a user, wherein the target virtual character is the virtual character selected by the user;
a second acquisition module, configured to process the voice information and the personalized voice information to obtain target voice information;
and an output module, configured to output the target voice information.
In a third aspect, an embodiment of the present invention further provides a terminal, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the voice processing method according to the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the voice processing method according to the first aspect.
Thus, in the embodiments of the invention, upon receiving the user's voice information, the terminal obtains target voice information from the personalized voice information of the user-selected target virtual character together with the user's own voice information, and outputs it. Because the target voice information incorporates the personalized voice information of the target virtual character, other users can quickly and accurately distinguish different users by reference to their target virtual characters. The identification process is simplified, its accuracy is improved, and the user's experience of the terminal is improved in turn.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a first flowchart of a voice processing method according to an embodiment of the present invention;
FIG. 2 is a second flowchart of a voice processing method according to an embodiment of the present invention;
FIG. 3 is a flowchart of setting personalized voice information for a virtual character according to an embodiment of the present invention;
FIG. 4 is a first structural diagram of a terminal according to an embodiment of the present invention;
FIG. 5 is a second structural diagram of a terminal according to an embodiment of the present invention;
FIG. 6 is a third structural diagram of a terminal according to an embodiment of the present invention;
FIG. 7 is a fourth structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, FIG. 1 is a flowchart of a voice processing method according to an embodiment of the present invention. As shown in FIG. 1, the method comprises the following steps:
step 101, obtaining personalized voice information of a target virtual character under the condition of receiving voice information of a user, wherein the target virtual character is a virtual character selected by the user.
In the embodiment of the invention, when the user starts a voice input function such as voice chat, voice information can be input. Accordingly, the terminal may receive voice information of the user. The target virtual character may refer to a character in the game. In practical application, personalized voice information can be set for the virtual roles, namely, a certain virtual role can be represented by the personalized voice information. The personalized voice information may include intonation, language category (mandarin, dialect, etc.).
The user may pre-select a virtual character, which is referred to herein as a target virtual character.
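As a concrete illustration of step 101, the following Python sketch models personalized voice information as a small record looked up in an in-memory registry. This is a minimal sketch under stated assumptions: the PersonalizedVoice fields, the registry, and the character name "ArcherQueen" are illustrative inventions, not details taken from the patent.
```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class PersonalizedVoice:
    """Illustrative container for one character's personalized voice information."""
    intonation: str                   # e.g. "soft", "booming" (assumed values)
    language_category: str            # e.g. "Mandarin" or a dialect
    clip_store: Optional[str] = None  # where the character's voice data is stored

# Hypothetical in-memory registry mapping virtual characters to their voices.
voice_registry: Dict[str, PersonalizedVoice] = {
    "ArcherQueen": PersonalizedVoice("soft", "Mandarin", "/voices/archer_queen"),
}

def get_personalized_voice(target_character: str) -> Optional[PersonalizedVoice]:
    """Step 101: fetch the personalized voice of the user-selected character."""
    return voice_registry.get(target_character)
```
A real terminal would persist this registry rather than keep it in memory, but the lookup shape is the same.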
Step 102, processing the voice information and the personalized voice information to obtain target voice information.
The same target virtual character may be selected by one or more users. When the target virtual character is selected by multiple users, the voice information and the personalized voice information are synthesized to obtain the target voice information; when the target virtual character is selected by a single user, the voice information is converted using the personalized voice information to obtain the target voice information.
The voice information and the personalized voice information may be synthesized using any voice synthesis method in the prior art. Converting the voice information using the personalized voice information essentially maps the user's voice information onto the personalized voice information. For example, if the user says the word "withdraw" and the personalized voice information also contains the word "withdraw", the user's "withdraw" is matched to the "withdraw" in the personalized voice information, and the personalized "withdraw" is output. Both branches are sketched below.
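A minimal Python sketch of the two branches of step 102. It assumes the user's speech has already been recognized into words and that the personalized voice information stores one audio clip per word (the lexicon); both are assumptions for illustration, and the concatenation in the multi-user branch only stands in for whichever prior-art voice-synthesis method is actually used.
```python
from typing import Dict, List

def convert_voice(user_words: List[str],
                  lexicon: Dict[str, bytes]) -> List[bytes]:
    """Conversion branch: replace each recognized word with the matching clip
    from the personalized voice, as in the "withdraw" example above."""
    return [lexicon[w] for w in user_words if w in lexicon]

def process_voice(user_audio: bytes,
                  user_words: List[str],
                  lexicon: Dict[str, bytes],
                  selected_by_many: bool) -> List[bytes]:
    """Step 102: obtain the target voice information."""
    if selected_by_many:
        # Synthesis branch: keep the user's own audio alongside the character
        # voice so users sharing one character stay distinguishable. Simple
        # concatenation is a placeholder for a real voice-synthesis method.
        return [user_audio] + convert_voice(user_words, lexicon)
    # Single-user branch: pure conversion into the character's personalized voice.
    return convert_voice(user_words, lexicon)
```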
Step 103, outputting the target voice information.
After the target voice information is obtained, it can be output, for example by playing it.
In the embodiment of the present invention, the method may be applied to a terminal such as a mobile phone, a tablet personal computer, a laptop computer, a PDA (Personal Digital Assistant), an MID (Mobile Internet Device), or a wearable device.
In the embodiments of the invention, upon receiving the user's voice information, the terminal obtains target voice information from the personalized voice information of the user-selected target virtual character together with the user's own voice information, and outputs it. Because the target voice information incorporates the personalized voice information of the target virtual character, other users can quickly and accurately distinguish different users by reference to their target virtual characters. The identification process is simplified, its accuracy is improved, and the user's experience of the terminal is improved in turn.
Referring to FIG. 2, FIG. 2 is a second flowchart of a voice processing method according to an embodiment of the present invention. As shown in FIG. 2, the method comprises the following steps:
step 201, setting personalized voice information for the virtual character.
In this step, a virtual character voice template may first be obtained, and then corrected using the voice correction parameters corresponding to the virtual character, so as to obtain the personalized voice information of the virtual character.
The voice template of the virtual character can be selected from a voice library. The voice correction parameters corresponding to the virtual character comprise one or more of the following: the gender corresponding to the virtual character, the age corresponding to the virtual character, the occupation corresponding to the virtual character, and the grade corresponding to the virtual character. Further, to make the personalized voice information fit the character better, the voice correction parameters may also include information such as the virtual character's clothing, home town, and personality.
Referring to FIG. 3, when setting personalized voice information for virtual characters, one virtual character is selected first. It is then determined whether the selected virtual character has official dubbing, i.e., system-default personalized voice information. If so, the system-default personalized voice information is used as the personalized voice information of the selected character; if not, the following setup process may be performed.
A voice template is selected from the voice library and corrected according to the gender information of the virtual character, the age information corresponding to the virtual character, the occupation information corresponding to the virtual character, the grade information corresponding to the virtual character, and so on. The correction may combine one or more parameters, and the order in which the parameters are applied may be adjusted arbitrarily, as the sketch below illustrates.
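A hedged Python sketch of the correction step follows. The specific pitch and energy rules are invented placeholders (the description only requires that each parameter adjust the template, in any order), and the template fields and default parameter values are likewise assumptions.
```python
from dataclasses import dataclass

@dataclass
class CorrectionParams:
    gender: str = "female"   # all default values here are illustrative
    age: int = 22
    occupation: str = "mage"
    grade: int = 30

def correct_template(template: dict, p: CorrectionParams) -> dict:
    """Correct a voice-library template with the character's parameters.

    Each rule below is a stand-in: the patent only says the template is
    corrected by one or more parameters, applied in any order.
    """
    out = dict(template)
    out["pitch"] = out["pitch"] * (1.15 if p.gender == "female" else 0.9)
    out["pitch"] *= 1.2 - min(p.age, 60) / 150            # younger -> higher pitch
    out["timbre_tag"] = p.occupation                      # occupation flavors the timbre
    out["energy"] = out["energy"] * (1 + p.grade / 100)   # higher grade -> fuller voice
    return out

# Usage (illustrative template fields):
personalized = correct_template({"pitch": 220.0, "energy": 1.0}, CorrectionParams())
```
Because each rule only rescales or tags the template, the parameters can indeed be applied in any order, matching the arbitrary correction order described above.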
After the personalized voice information is set for a virtual character, a correspondence between the virtual character and its personalized voice information may be recorded, for example marking in the correspondence whether a given virtual character has personalized voice information set, the storage location of that personalized voice information, and so on, as the small table sketched below illustrates.
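A minimal sketch of such a correspondence table, assuming a plain Python dictionary; the character names and storage paths are hypothetical.
```python
# For each virtual character: a flag saying whether personalized voice
# information is set and, if so, where it is stored.
correspondence = {
    "ArcherQueen": {"has_personalized_voice": True,
                    "storage_location": "/voices/archer_queen"},
    "Swordsman":   {"has_personalized_voice": False,
                    "storage_location": None},
}
```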
The personalized voice information set in this way can further improve the user's gaming experience and, in turn, the user's experience of the terminal.
Step 202, after the user starts voice chat and selects a target virtual character, determining whether the target virtual character has personalized voice information.
For example, when a user starts a game, the user may select a virtual character, i.e., the target virtual character. Based on the setup in step 201, it can be determined whether personalized voice information exists for the target virtual character. If so, step 203 is performed; otherwise, the voice input by the user can be output directly. This step improves the efficiency of voice output.
Step 203, acquiring the personalized voice information of the target virtual character.
Step 204, determining whether the target virtual character is selected by multiple users.
If yes, step 205 is performed; otherwise, step 206 is performed.
Step 205, synthesizing the voice information input by the user with the personalized voice information of the target virtual character to obtain the target voice information.
Step 206, converting the voice information input by the user using the personalized voice information of the target virtual character to obtain the target voice information.
Step 207, outputting the target voice information. The whole flow is sketched below.
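Tying steps 202 through 207 together, the following Python sketch reuses convert_voice and the correspondence table from the earlier sketches; load_lexicon is a hypothetical loader, and the concatenation in the multi-user branch again only stands in for a real voice-synthesis method.
```python
def load_lexicon(storage_location: str) -> dict:
    """Hypothetical loader: real code would read per-word clips from storage."""
    return {}

def handle_voice_chat(user_audio: bytes, user_words: list,
                      target_character: str, users_selecting: int):
    """End-to-end sketch of steps 202-207 in FIG. 2."""
    entry = correspondence.get(target_character)
    if entry is None or not entry["has_personalized_voice"]:
        return user_audio                             # step 202 fallback: output raw voice
    lexicon = load_lexicon(entry["storage_location"])  # step 203
    if users_selecting > 1:                           # steps 204-205: synthesize
        return [user_audio] + convert_voice(user_words, lexicon)
    return convert_voice(user_words, lexicon)         # step 206; step 207 outputs the result
```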
In the embodiments of the invention, upon receiving the user's voice information, the terminal obtains target voice information from the personalized voice information of the user-selected target virtual character together with the user's own voice information, and outputs it. Because the target voice information incorporates the personalized voice information of the target virtual character, other users can quickly and accurately distinguish different users by reference to their target virtual characters. The identification process is simplified, its accuracy is improved, and the user's experience of the terminal is improved in turn.
Referring to FIG. 4, FIG. 4 is a structural diagram of a terminal according to an embodiment of the present invention. As shown in FIG. 4, the terminal 400 comprises:
a first obtaining module 401, configured to obtain personalized voice information of a target virtual character upon receiving voice information of a user, wherein the target virtual character is the virtual character selected by the user; a second obtaining module 402, configured to process the voice information and the personalized voice information to obtain target voice information; and an output module 403, configured to output the target voice information.
Optionally, the second obtaining module 402 comprises: a synthesis submodule, configured to synthesize the voice information and the personalized voice information to obtain the target voice information when the target virtual character is selected by multiple users; and a conversion submodule, configured to convert the voice information using the personalized voice information to obtain the target voice information when the target virtual character is selected by a single user.
Optionally, as shown in FIG. 5, the terminal 400 may further comprise: a determining module 404, configured to determine whether the target virtual character has personalized voice information. In that case, the first obtaining module 401 is specifically configured to obtain the personalized voice information of the target virtual character when it is determined that the target virtual character has personalized voice information.
Optionally, as shown in FIG. 6, the terminal 400 may further comprise: a setting module 405, configured to set personalized voice information for a virtual character.
Optionally, the setting module 405 comprises: an obtaining submodule, configured to obtain a virtual character voice template; and a correction submodule, configured to correct the virtual character voice template using the voice correction parameters corresponding to the virtual character to obtain the personalized voice information of the virtual character, wherein the voice correction parameters corresponding to the virtual character comprise one or more of the following: the gender corresponding to the virtual character, the age corresponding to the virtual character, the occupation corresponding to the virtual character, and the grade corresponding to the virtual character.
The terminal 400 can implement each process implemented by the terminal in the method embodiments of FIG. 1 to FIG. 3; to avoid repetition, the details are not described again here.
In the embodiments of the invention, upon receiving the user's voice information, the terminal obtains target voice information from the personalized voice information of the user-selected target virtual character together with the user's own voice information, and outputs it. Because the target voice information incorporates the personalized voice information of the target virtual character, other users can quickly and accurately distinguish different users by reference to their target virtual characters. The identification process is simplified, its accuracy is improved, and the user's experience of the terminal is improved in turn.
FIG. 7 is a schematic diagram of the hardware structure of a terminal implementing various embodiments of the present invention. The terminal 700 includes, but is not limited to: a radio frequency unit 701, a network module 702, an audio output unit 703, an input unit 704, a sensor 705, a display unit 706, a user input unit 707, an interface unit 708, a memory 709, a processor 710, a power supply 711, and the like. Those skilled in the art will appreciate that the terminal structure shown in FIG. 7 is not limiting; the terminal may include more or fewer components than shown, combine some components, or arrange the components differently. In the embodiment of the present invention, the terminal includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like.
The processor 710 is configured to: upon receiving voice information of a user, obtain personalized voice information of a target virtual character, wherein the target virtual character is the virtual character selected by the user; process the voice information and the personalized voice information to obtain target voice information; and output the target voice information.
In the embodiments of the invention, upon receiving the user's voice information, the terminal obtains target voice information from the personalized voice information of the user-selected target virtual character together with the user's own voice information, and outputs it. Because the target voice information incorporates the personalized voice information of the target virtual character, other users can quickly and accurately distinguish different users by reference to their target virtual characters. The identification process is simplified, its accuracy is improved, and the user's experience of the terminal is improved in turn.
Further, the processor 710 is configured to synthesize the voice information and the personalized voice information to obtain the target voice information when the target virtual character is selected by multiple users, and to convert the voice information using the personalized voice information to obtain the target voice information when the target virtual character is selected by a single user.
The processor 710 is configured to determine whether the target virtual character has personalized voice information, and to obtain the personalized voice information of the target virtual character when it is determined to exist.
The processor 710 is configured to set personalized voice information for a virtual character.
The processor 710 is configured to obtain a virtual character voice template and to correct the virtual character voice template using the voice correction parameters corresponding to the virtual character, so as to obtain the personalized voice information of the virtual character, wherein the voice correction parameters corresponding to the virtual character comprise one or more of the following: gender information corresponding to the virtual character, age information corresponding to the virtual character, occupation information corresponding to the virtual character, and grade information corresponding to the virtual character.
It should be understood that, in the embodiment of the present invention, the radio frequency unit 701 may be used for receiving and sending signals during message transmission and reception or during a call. Specifically, it receives downlink data from a base station and forwards it to the processor 710 for processing, and it transmits uplink data to the base station. In general, the radio frequency unit 701 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 701 may also communicate with a network and other devices through a wireless communication system.
The terminal provides wireless broadband internet access to the user via the network module 702, such as assisting the user in sending and receiving e-mails, browsing web pages, and accessing streaming media.
The audio output unit 703 may convert audio data received by the radio frequency unit 701 or the network module 702, or stored in the memory 709, into an audio signal and output it as sound. The audio output unit 703 may also provide audio output related to a specific function performed by the terminal 700 (e.g., a call signal reception sound or a message reception sound). The audio output unit 703 includes a speaker, a buzzer, a receiver, and the like.
The input unit 704 is used to receive audio or video signals. The input unit 704 may include a GPU (Graphics Processing Unit) 7041 and a microphone 7042. The graphics processor 7041 processes image data of still pictures or video obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 706, stored in the memory 709 (or another storage medium), or transmitted via the radio frequency unit 701 or the network module 702. The microphone 7042 can receive sound and process it into audio data. In a phone call mode, the processed audio data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 701.
The terminal 700 also includes at least one sensor 705, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor that can adjust the brightness of the display panel 7061 according to the brightness of ambient light, and a proximity sensor that can turn off the display panel 7061 and/or a backlight when the terminal 700 is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), detect the magnitude and direction of gravity when stationary, and can be used to identify the terminal posture (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration identification related functions (such as pedometer, tapping), and the like; the sensors 705 may also include fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc., which are not described in detail herein.
The display unit 706 is used to display information input by the user or information provided to the user. The Display unit 706 may include a Display panel 7061, and the Display panel 7061 may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like.
The user input unit 707 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the terminal. Specifically, the user input unit 707 includes a touch panel 7071 and other input devices 7072. The touch panel 7071, also referred to as a touch screen, may collect touch operations by a user on or near it (e.g., operations performed on or near the touch panel 7071 with a finger, a stylus, or any other suitable object or attachment). The touch panel 7071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position the user touches and the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 710, and receives and executes commands from the processor 710. In addition, the touch panel 7071 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 7071, the user input unit 707 may include other input devices 7072, which may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick; these are not described in detail here.
Further, the touch panel 7071 may be overlaid on the display panel 7061, and when the touch panel 7071 detects a touch operation on or near the touch panel 7071, the touch operation is transmitted to the processor 710 to determine the type of the touch event, and then the processor 710 provides a corresponding visual output on the display panel 7061 according to the type of the touch event. Although the touch panel 7071 and the display panel 7061 are shown in fig. 7 as two separate components to implement the input and output functions of the terminal, in some embodiments, the touch panel 7071 and the display panel 7061 may be integrated to implement the input and output functions of the terminal, which is not limited herein.
The interface unit 708 is an interface for connecting an external device to the terminal 700. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 708 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the terminal 700 or may be used to transmit data between the terminal 700 and the external device.
The memory 709 may be used to store software programs as well as various data. The memory 709 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created according to the use of the terminal (such as audio data and a phonebook). Further, the memory 709 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The processor 710 is the control center of the terminal. It connects the various parts of the entire terminal using various interfaces and lines, and performs the various functions of the terminal and processes data by running or executing software programs and/or modules stored in the memory 709 and calling data stored in the memory 709, thereby monitoring the terminal as a whole. The processor 710 may include one or more processing units; preferably, the processor 710 may integrate an application processor, which mainly handles the operating system, user interfaces, application programs, and the like, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 710.
The terminal 700 may also include a power supply 711 (e.g., a battery) for providing power to the various components, and preferably, the power supply 711 may be logically coupled to the processor 710 via a power management system, such that functions of managing charging, discharging, and power consumption are performed via the power management system.
In addition, the terminal 700 includes some functional modules that are not shown, and are not described in detail herein.
Preferably, an embodiment of the present invention further provides a terminal, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor. When executed by the processor, the computer program implements each process of the foregoing voice processing method embodiments and can achieve the same technical effect; to avoid repetition, the details are not described again here.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements each process of the foregoing voice processing method embodiments and can achieve the same technical effect; to avoid repetition, the details are not described again here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (6)

1. A voice processing method applied to a terminal, characterized by comprising the following steps:
upon receiving voice information of a user, acquiring personalized voice information of a target virtual character, wherein the target virtual character is the virtual character selected by the user;
processing the voice information and the personalized voice information to obtain target voice information;
outputting the target voice information;
wherein the processing the voice information and the personalized voice information to obtain target voice information comprises:
synthesizing the voice information and the personalized voice information to obtain the target voice information when the target virtual character is selected by a plurality of users;
converting the voice information using the personalized voice information to obtain the target voice information when the target virtual character is selected by one user;
wherein before the acquiring of the personalized voice information of the target virtual character, the method further comprises:
determining whether the target virtual character has personalized voice information after the user starts voice chat and selects the target virtual character;
and wherein the acquiring of the personalized voice information of the target virtual character comprises:
acquiring the personalized voice information of the target virtual character when it is determined that the target virtual character has personalized voice information;
and outputting the voice input by the user directly when it is determined that the target virtual character has no personalized voice information.
2. The method according to claim 1, wherein before the acquiring of the personalized voice information of the target virtual character, the method further comprises:
setting personalized voice information for the virtual character.
3. The method according to claim 2, wherein the setting personalized voice information for the virtual character comprises:
acquiring a virtual character voice template;
correcting the virtual character voice template using voice correction parameters corresponding to the virtual character, so as to obtain the personalized voice information of the virtual character;
wherein the voice correction parameters corresponding to the virtual character comprise one or more of the following:
gender information corresponding to the virtual character, age information corresponding to the virtual character, occupation information corresponding to the virtual character, and grade information corresponding to the virtual character.
4. A terminal, comprising:
a first acquisition module, configured to acquire personalized voice information of a target virtual character upon receiving voice information of a user, wherein the target virtual character is the virtual character selected by the user;
a second acquisition module, configured to process the voice information and the personalized voice information to obtain target voice information;
and an output module, configured to output the target voice information;
wherein the second acquisition module comprises:
a synthesis submodule, configured to synthesize the voice information and the personalized voice information to obtain the target voice information when the target virtual character is selected by a plurality of users;
and a conversion submodule, configured to convert the voice information using the personalized voice information to obtain the target voice information when the target virtual character is selected by one user;
the terminal further comprising a determining module, configured to determine, after the user starts voice chat and selects the target virtual character, whether the target virtual character has personalized voice information;
wherein the first acquisition module is specifically configured to acquire the personalized voice information of the target virtual character when it is determined that the target virtual character has personalized voice information;
and the voice input by the user is output directly when it is determined that the target virtual character has no personalized voice information.
5. The terminal according to claim 4, further comprising:
a setting module, configured to set personalized voice information for the virtual character.
6. The terminal according to claim 5, wherein the setting module comprises:
an obtaining submodule, configured to obtain a virtual character voice template;
and a correction submodule, configured to correct the virtual character voice template using the voice correction parameters corresponding to the virtual character, so as to obtain the personalized voice information of the virtual character;
wherein the voice correction parameters corresponding to the virtual character comprise one or more of the following:
gender information corresponding to the virtual character, age information corresponding to the virtual character, occupation information corresponding to the virtual character, and grade information corresponding to the virtual character.
CN201810425867.XA, filed 2018-05-07 (priority 2018-05-07): Voice processing method and terminal. Status: Active. Granted as CN108668024B.

Priority Applications (1)

Application number: CN201810425867.XA; priority date: 2018-05-07; filing date: 2018-05-07; title: Voice processing method and terminal

Applications Claiming Priority (1)

Application number: CN201810425867.XA; priority date: 2018-05-07; filing date: 2018-05-07; title: Voice processing method and terminal

Publications (2)

CN108668024A: published 2018-10-16
CN108668024B: published 2021-01-08

Family

ID=63778709

Family Applications (1)

CN201810425867.XA: Voice processing method and terminal, Active, granted as CN108668024B

Country Status (1)

CN: CN108668024B

Families Citing this family (5)

* Cited by examiner, † Cited by third party
CN109350961A * (Nubia Technology Co Ltd, priority 2018-10-26, published 2019-02-19): Content processing method, terminal and computer readable storage medium
CN109453526B * (Nubia Technology Co Ltd, priority 2018-10-26, published 2023-07-21): Sound processing method, terminal and computer readable storage medium
CN109949809B * (Vivo Mobile Communication Co Ltd, priority 2019-03-27, published 2021-07-06): Voice control method and terminal equipment
CN110465080A * (Vivo Mobile Communication Co Ltd, priority 2019-07-25, published 2019-11-19): Vibration control method, apparatus, mobile terminal and computer readable storage medium
CN111803936A * (NetEase (Hangzhou) Network Co Ltd, priority 2020-07-16, published 2020-10-23): Voice communication method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
CN102089804A * (NTT Docomo Inc, priority 2008-07-11, published 2011-06-08): Voice synthesis model generation device, voice synthesis model generation system, communication terminal device and method for generating voice synthesis model
CN104718007A * (Disney Enterprises Inc, priority 2012-10-04, published 2015-06-17): Interactive objects for immersive environment
CN107909995A * (Beijing Xiaomi Mobile Software Co Ltd, priority 2017-11-16, published 2018-04-13): Voice interactive method and device

Family Cites Families (9)

* Cited by examiner, † Cited by third party
US20110224000A1 * (James Toga, priority 2010-01-17, published 2011-09-15): Voice-based entertainment activity in a networked enviorment
CN102479506A * (Shengle Information Technology (Shanghai) Co Ltd, priority 2010-11-23, published 2012-05-30): Speech synthesis system for online game and implementation method thereof
CN102324231A * (Beijing Jietong Huasheng Speech Technology Co Ltd, priority 2011-08-29, published 2012-01-18): Game dialogue voice synthesizing method and system
EP2610724B1 * (Tata Consultancy Services Limited, priority 2011-12-27, published 2022-01-05): A system and method for online user assistance
CN102693729B * (Beijing Aoxintong Technology Development Co Ltd, priority 2012-05-15, published 2014-09-03): Customized voice reading method, system, and terminal possessing the system
US9067135B2 * (Voyetra Turtle Beach, Inc., priority 2013-10-07, published 2015-06-30): Method and system for dynamic control of game audio based on audio analysis
CN105425953B * (Xiaotiancai Technology Co Ltd, priority 2015-11-02, published 2018-07-17): Method and system of human-computer interaction
CN107564510A * (Baidu Online Network Technology (Beijing) Co Ltd, priority 2017-08-23, published 2018-01-09): Voice virtual role management method, device, server and storage medium
CN107959882B * (Guangdong Xiaotiancai Technology Co Ltd, priority 2017-12-12, published 2019-12-13): Voice conversion method, device, terminal and medium based on video watching record


Also Published As

CN108668024A: published 2018-10-16

Similar Documents

Publication number and title
CN108668024B (en) Voice processing method and terminal
CN110740259B (en) Video processing method and electronic equipment
CN107730261B (en) Resource transfer method and related equipment
CN108347529B (en) Audio playing method and mobile terminal
CN110174993B (en) Display control method, terminal equipment and computer readable storage medium
CN108279948B (en) Application program starting method and mobile terminal
CN107908765B (en) Game resource processing method, mobile terminal and server
CN109412932B (en) Screen capturing method and terminal
CN110097872B (en) Audio processing method and electronic equipment
CN111147919A (en) Play adjustment method, electronic equipment and computer readable storage medium
CN111026305A (en) Audio processing method and electronic equipment
CN110808019A (en) Song generation method and electronic equipment
CN111163449B (en) Application sharing method, first electronic device and computer-readable storage medium
CN109949809B (en) Voice control method and terminal equipment
CN111158624A (en) Application sharing method, electronic equipment and computer readable storage medium
CN108491143B (en) Object movement control method and mobile terminal
CN108108338B (en) Lyric processing method, lyric display method, server and mobile terminal
CN107765954B (en) Application icon updating method, mobile terminal and server
CN109453526B (en) Sound processing method, terminal and computer readable storage medium
CN108418961B (en) Audio playing method and mobile terminal
CN110928616A (en) Shortcut icon management method and electronic equipment
CN111416955B (en) Video call method and electronic equipment
CN111338598B (en) Message processing method and electronic equipment
CN111258531B (en) Audio processing method and electronic equipment
CN107645604B (en) Call processing method and mobile terminal

Legal Events

Code: Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant