CN111939558A

CN111939558A - Method and system for driving virtual character action by real-time voice

Info

Publication number: CN111939558A
Application number: CN202010836241.5A
Authority: CN
Inventors: 不公告发明人
Original assignee: Beijing Zhongke Shenzhi Technology Co ltd
Current assignee: Beijing Zhongke Shenzhi Technology Co ltd
Priority date: 2020-08-19
Filing date: 2020-08-19
Publication date: 2020-11-17

Abstract

The invention discloses a method and a system for driving virtual character actions by real-time voice, wherein the method comprises the following steps: establishing a virtual character action scene by using a Unity engine (but not limited to a Unity game engine, all real-time game engines support, such as a Unreal game engine and the like); adding corresponding variable conditions for different actions executed by the virtual character; integrating a voice interface into a Unity engine; acquiring voice data; uploading the acquired voice data to a voice recognition system through a voice interface, and outputting a voice recognition result after the voice recognition system performs content recognition on the voice data; the Unity engine receives the voice recognition result through the voice interface and matches the action variable conditions of the virtual character according to the voice recognition result; and the Unity engine drives the virtual character to execute corresponding action according to the matched variable condition. The invention directly drives the virtual character to act in a voice control mode, simplifies the operation process of the virtual character, reduces the limb interaction in reality and ensures that the control mode of the virtual character is simpler and more convenient.

Description

Method and system for driving virtual character action by real-time voice

Technical Field

The invention relates to the technical field of motion simulation and animation games, in particular to a method and a system for driving virtual character actions by real-time voice.

Background

VR (Virtual Reality) Virtual Reality technology, also known as smart technology, is a new practical technology developed in the 20 th century. The virtual reality technology comprises a computer, electronic information and simulation technology, and the basic realization mode is that the computer simulates a virtual environment so as to provide people with environmental immersion.

Along with the development of virtual reality technology, people no longer satisfy as the viewer, and people more hope to participate in the VR scene of viewing, the VR scene interactive mode that is comparatively general at present is that the user immerses in the VR scene with first visual angle through wearing the VR helmet, then utilizes operating handle to carry out gesture transform, action realization such as object snatchs and the interaction with the VR scene. However, the existing interaction mode is established on the basis of limb movement or manual operation, the operation on the virtual character is not simple enough, and the virtual character cannot be directly driven to move in a real-time voice driving mode.

Disclosure of Invention

The invention aims to provide a method and a system for driving a virtual character to act by real-time voice.

In order to achieve the purpose, the invention adopts the following technical scheme:

a method for driving virtual character actions by real-time voice is provided, which comprises the following steps:

establishing a virtual character action scene by using a Unity engine (but not limited to a Unity game engine, all real-time game engines support, such as a Unreal game engine and the like);

adding corresponding variable conditions for the virtual character to execute different actions;

integrating a voice interface into a Unity engine;

acquiring voice data;

uploading the acquired voice data to a voice recognition system through the voice interface, and outputting a voice recognition result after the voice recognition system performs content recognition on the voice data;

the Unity engine receives the voice recognition result through the voice interface and matches the action variable conditions of the virtual character according to the voice recognition result;

and the Unity engine drives the virtual character to execute corresponding actions according to the matched variable conditions.

As a preferred aspect of the present invention, the voice interface integrated in the Unity engine is provided by a third party voice platform.

As a preferred scheme of the present invention, the voice interface provided by the third-party voice platform includes but is not limited to a REST API voice interface provided by the Baidu AI open platform or an Android SDK interface provided by google corporation.

As a preferred aspect of the present invention, the speech recognition system performs content recognition on the speech data through a speech recognition model, and the speech recognition model is trained through a RBM restricted boltzmann machine stochastic neural network.

As a preferable aspect of the present invention, the method of driving the virtual character motion is expressed by the following formula (1):

in the formula (1), the first and second groups,

representing a kinematic deformation of a skeletal model of a virtual character;

to represent joints j on a virtual character skeleton model₁A dual quaternion of the motion attitude of (a);

w₁is a joint j₁The weight of (c);

representing joints j on a virtual character skeletal model_nA dual quaternion of the motion attitude of (a);

w_nis a joint j_nThe weight of (c).

As a preferred aspect of the present invention, the dual quaternion expressing the posture of the joint motion is expressed by the following formula (2):

in the above formula, the first and second carbon atoms are,

is a dual quaternion representing the joint pose on the virtual character skeleton model;

s₀a rotation axis for articulation;

θ₀is the angle of rotation of the joint movement;

is a dual operator;

stranslation of the joint along the rotation axis;

s＝r×s₀and r is the center of rotation of the joint.

The invention also provides a system for driving the virtual character to act by real-time voice, which can realize the method for driving the virtual character to act by real-time voice, and the system comprises:

the virtual character action scene establishing module is used for providing designers with a Unity engine (but not limited to a Unity game engine, all real-time game engines support, such as a Unreal game engine) to establish a virtual character action scene;

the virtual character action condition setting module is connected with the virtual character action scene establishing module and is used for providing the designer with variable conditions for adding corresponding different actions to the virtual character;

the voice interface integration module is used for providing designers with voice interfaces integrated into the Unity engine;

the voice data acquisition module is used for automatically acquiring and storing externally input voice data;

the voice data uploading module is connected with the voice data acquisition module and used for uploading the acquired voice data to a voice recognition system, and the voice recognition system performs content recognition on the voice data and then outputs a voice recognition result;

the voice recognition result receiving module is connected with the voice interface integration module and used for receiving the voice recognition result through the voice interface integrated in the Unity engine;

the variable condition matching module is respectively connected with the virtual character action condition setting module and the voice recognition result receiving module and is used for automatically matching the variable conditions of the virtual character actions according to the voice recognition result;

and the virtual character driving module is respectively connected with the variable condition matching module and the virtual character action scene establishing module and is used for generating a driving signal according to the matched variable condition and driving the virtual character to execute corresponding action.

As a preferred solution of the present invention, the voice interface integrated in the Unity engine is provided by a third-party voice platform, and the voice interface provided by the third-party voice platform includes, but is not limited to, a REST API voice interface provided by a Baidu AI open platform or an Android SDK interface provided by google corporation.

The invention directly drives the virtual character to act in a voice control mode, simplifies the operation process of the virtual character, reduces the limb interaction in reality and ensures that the control mode of the virtual character is simpler and more convenient.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.

FIG. 1 is a flowchart illustrating a method for real-time voice-driven virtual character movement according to an embodiment of the present invention;

fig. 2 is a schematic structural diagram of a system for real-time voice-driven virtual character movement according to an embodiment of the present invention.

Detailed Description

The technical scheme of the invention is further explained by the specific implementation mode in combination with the attached drawings.

Wherein the showings are for the purpose of illustration only and are shown by way of illustration only and not in actual form, and are not to be construed as limiting the present patent; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.

The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if the terms "upper", "lower", "left", "right", "inner", "outer", etc. are used for indicating the orientation or positional relationship based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not indicated or implied that the referred device or element must have a specific orientation, be constructed in a specific orientation and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes and are not to be construed as limitations of the present patent, and the specific meanings of the terms may be understood by those skilled in the art according to specific situations.

In the description of the present invention, unless otherwise explicitly specified or limited, the term "connected" or the like, if appearing to indicate a connection relationship between the components, is to be understood broadly, for example, as being fixed or detachable or integral; can be mechanically or electrically connected; they may be directly connected or indirectly connected through intervening media, or may be connected through one or more other components or may be in an interactive relationship with one another. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.

The method for driving the virtual character to act by the real-time voice provided by an embodiment of the present invention, as shown in fig. 1, includes:

step S1, a Unity engine (but not limited to a Unity game engine, all real-time game engines support, such as a Unreal game engine) is used for establishing a virtual character action scene; the virtual character action scene means that a virtual character is drawn through the Unity engine, and then the virtual character is rendered to execute corresponding actions, such as raising hands, running and the like;

step S2, corresponding variable conditions are added for the virtual character to execute different actions; the variable condition mentioned here refers to model variable parameters required for driving the virtual character to perform different actions, such as "running" of the virtual character, so that each joint point on the skeletal model of the virtual character needs to move according to a desired "running" state, and the movement of the joint points is controlled by corresponding joint variable parameters, such as a variable parameter of a first joint point (such as a knee joint) in the skeletal model in the running state is a rotation angle of 5 °, a translation distance of not more than 10cm, and the like. The rotation angle and the translation distance of the knee joint are variable parameters for controlling the knee joint to enter the running state, namely, the variable conditions.

Step S3, the speech interface is integrated into the Unity engine, and the Unity engine itself does not have a speech interface at present, so the Unity engine itself does not have a function of driving the action of the avatar by real-time speech. The voice interface provided by the third party speech recognition platform may be integrated into the Unity engine through an interface provided by the Unity engine.

Step S4, voice data is obtained;

step S5, the obtained voice data are uploaded to a voice recognition system through a voice interface, and the voice recognition system carries out content recognition on the voice data and then outputs a voice recognition result;

step S6, the Unity engine receives the voice recognition result through the voice interface and matches the action variable conditions of the virtual character according to the voice recognition result;

in step S7, the Unity engine drives the virtual character to execute corresponding actions according to the matched variable conditions.

In view of the high cost and difficulty of independently developing the voice recognition system, the function of recognizing the voice content is realized by a third-party voice platform. The technical innovation point of the invention is that a voice interface provided by a third-party voice platform is integrated into a Unity engine, after the voice content is recognized by the third-party voice platform, the recognized voice content is matched with the variable condition for driving the action of the virtual character, and after the matching is successful, the virtual character is driven to execute the corresponding action. The variable condition for driving the virtual character to act has a matching relation with the voice content, for example, the recognized voice content is 'running', and the variable condition for driving the virtual character 'running' is 'driving strategy one', so that when the recognized voice content is 'running', the real-time voice driving virtual character system provided by the invention generates a signal for driving running, and then the system drives the virtual character 'running' according to the preset 'running' variable condition.

In the above technical solution, the method for integrating the semantic interface provided by the third-party speech platform into the Unity engine may be implemented by software programming, and a specific integration process is not described herein.

At present, the speech recognition technology is mature and there are companies that can provide open interfaces, such as Baidu corporation, Google corporation, etc. The user can upload voice data through an REST API voice interface provided by the Baidu AI open platform, and the Baidu AI open platform identifies the voice data and then outputs a voice content identification result and feeds the voice content identification result back to the user. The REST API supports three languages of Mandarin, Cantonese and English, needs to upload a complete recording file, has the time length of no more than 60s, and supports three file formats of pcm (uncompressed), wav (uncompressed, pcm coding) and amr (compressed format) for uploading.

Before calling a voice recognition interface through REST API in the Unity script, authentication is required to be obtained according to an authentication mechanism. And recording voice after acquiring the correct token, and converting the voice into byte stream data encoded by base 64. And then packaging parameters such as the voice format, the sampling rate, the number of channels, the token and the like in a json data format, uploading the parameters through a POST request, and obtaining a feedback identification result. In addition, the recorded data can be placed in an http body, the data type of the request header is defined, and then the website with the request header is accessed to obtain the identification result. The voice recognition results obtained by the two data uploading modes are consistent. The text of the voice recognition result can be extracted through json analysis.

The accuracy of recognizing the voice content by the Baidu AI open platform is high, but the Baidu AI open platform can perform voice recognition only after the voice data uploaded through the REST API voice interface needs to be uploaded integrally with the recorded voice data, so that the response speed of driving the virtual character to act is influenced to a certain extent. Therefore, in order to solve the problem, the invention provides another voice recognition scheme, voice data is uploaded through an Android SDK interface provided by google, and voice content is recognized by a voice recognition service provided by google. The integration of Unity and Android SDK requires additional knowledge of Android engineering development to provide support, so the integration difficulty is high.

The Android SDK adopts a streaming protocol to identify voice data, and adopts a mode of processing and feeding back at the same time to identify voice. Compared with the full platform support of REST API, the Android SDK only supports the Android platform. And the Android SDK can not be directly called, and developers need to establish Android library engineering to write self-defining classes and call event classes of the SDK, so that the call of the voice recognition function interface is realized. And after the Android library is customized, integrating the voice recognition technology into a Unity engine through a communication mechanism between Unity and Android.

The Android SDK interface supports uploading and voice recognition, and can improve the efficiency of voice recognition, but the Android SDK also has the defects that the Android SDK cannot be directly used, a developer needs to call a required database, and the requirement on a programmer is higher.

In recent years, deep learning technology has been developed rapidly, and some scholars begin to research methods for recognizing speech content by using deep learning models, but deep learning models cannot be integrated into a Unity engine, and virtual characters cannot be directly driven to move by real-time speech in the Unity engine. But the recognized voice content can be converted into a corresponding driving instruction and sent to the Unity engine, the driving instruction is matched with the variable condition for driving the action of the virtual character, and the Unity engine can drive the virtual character to execute the corresponding action after receiving the driving instruction. Although the driving process is complex, a voice interface does not need to be integrated in the Unity engine, and the method has certain application value. Therefore, as a preferred solution, the speech recognition system provided by the present invention performs content recognition on speech data through a speech recognition model, and more preferably, the speech recognition model is obtained through random neural training of RBM limited boltzmann machine. The training process for the speech recognition model is not described here.

As shown in fig. 2, the present invention further provides a system for real-time voice-driven virtual character movement, including:

the virtual character action scene establishing module 1 is used for providing designers with virtual character action scenes established by a Unity engine;

the virtual character action condition setting module 2 is connected with the virtual character action scene establishing module and is used for providing the designer with variable conditions for adding corresponding actions for the virtual character;

the voice interface integration module 3 is used for providing designers with voice interfaces integrated into the Unity engine;

the voice data acquisition module 4 is used for automatically acquiring and storing externally input voice data; the voice data acquisition module can be a microphone;

the voice data uploading module 5 is connected with the voice data acquiring module 4 and is used for uploading the acquired voice data to a voice recognition system 100, and the voice recognition system 100 performs content recognition on the voice data and then outputs a voice recognition result;

the voice recognition result receiving module 6 is connected with the voice interface integration module 3 and used for receiving the voice recognition result through a voice interface integrated in the Unity engine;

the variable condition matching module 7 is respectively connected with the virtual character action condition setting module 2 and the voice recognition result receiving module 6 and is used for automatically matching the variable conditions of the virtual character actions according to the voice recognition result;

and the virtual character driving module 8 is respectively connected with the variable condition matching module 7 and the virtual character action scene and establishing module 1, and is used for generating a driving signal according to the matched variable condition and driving the virtual character to execute a corresponding action.

Preferably, the voice interface integrated in the Unity engine is provided by a third-party voice platform, and the voice interface of the third-party voice platform comprises a REST API voice interface provided by a Baidu AI open platform or an Android SDK interface provided by google corporation.

In order to ensure the fidelity of the action of the virtual character, the invention also provides a virtual character driving method, which is expressed by the following formula (1):

in the formula (1), the first and second groups,

w₁is a joint j₁The weight of (c);

w_nis a joint j_nThe weight of (c).

In the present embodiment, the dual quaternion representing the joint movement posture is expressed by the following formula (2):

in the above formula, the first and second carbon atoms are,

s₀a rotation axis for articulation;

θ₀is the angle of rotation of the joint movement;

is a dual operator;

stranslation of the joint along the rotation axis;

s＝r×s₀and r is the center of rotation of the joint.

In summary, the present invention

It should be understood that the above-described embodiments are merely preferred embodiments of the invention and the technical principles applied thereto. It will be understood by those skilled in the art that various modifications, equivalents, changes, and the like can be made to the present invention. However, such variations are within the scope of the invention as long as they do not depart from the spirit of the invention. In addition, certain terms used in the specification and claims of the present application are not limiting, but are used merely for convenience of description.

Claims

1. A method for driving virtual character actions by real-time voice is characterized by comprising the following steps:

integrating a voice interface into a Unity engine;

acquiring voice data;

2. A method for real-time voice-driven virtual character actions according to claim 1, wherein the voice interface integrated in the Unity engine (but not limited to Unity game engine, all real-time game engines support, such as a urea game engine, etc.) is provided by a third party voice platform.

3. The method of claim 2, wherein the voice interface provided by any third party voice platform includes, but is not limited to, the REST API voice interface provided by Baidu AI open platform or the Android SDK interface provided by Google.

4. The method of claim 1, wherein the speech recognition system performs content recognition on the speech data through a speech recognition model, and the speech recognition model is trained through a stochastic neural network (stochastic neural network) of Restricted Boltzmann Machine (RBM).

5. The method for driving virtual character motion by real-time voice according to claim 1, is characterized in that the method for driving the virtual character motion is expressed by the following formula (1):

in the formula (1), the first and second groups,

w₁is a joint j₁The weight of (c);

w_nis a joint j_nThe weight of (c).

6. The method of real-time voice-driven virtual character movement according to claim 5, wherein the dual quaternion representing the joint movement posture is expressed by the following formula (2):

in the above formula, the first and second carbon atoms are,

s₀a rotation axis for articulation;

θ₀is the angle of rotation of the joint movement;

is a dual operator;

stranslation of the joint along the rotation axis;

s＝r×s₀and r is the center of rotation of the joint.

7. A system for driving virtual character actions by real-time voice, which can realize the method as claimed in any one of claims 1 to 6, is characterized by comprising:

a voice recognition result receiving module, connected to the voice interface integration module, for receiving the voice recognition result through the voice interface integrated in the Unity engine;

8. The system of claim 7, wherein the voice interface integrated into the Unity engine is provided by a third party voice platform, the voice interface provided by a third party voice platform including, but not limited to, the REST API voice interface provided by the Baidu AI open platform or the Android SDK interface provided by Google.