CN115565518B - Method for processing player dubbing in interactive game and related device


Info

Publication number
CN115565518B
Authority
CN
China
Prior art keywords
target
player
voice
machine
sentence
Prior art date
Legal status
Active
Application number
CN202211512932.5A
Other languages
Chinese (zh)
Other versions
CN115565518A (en)
Inventor
罗正
李进峰
Current Assignee
Shenzhen Renma Interactive Technology Co Ltd
Original Assignee
Shenzhen Renma Interactive Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Renma Interactive Technology Co Ltd
Priority to CN202211512932.5A
Publication of CN115565518A
Application granted
Publication of CN115565518B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G10L13/02 - Methods for producing synthetic speech; Speech synthesisers
    • G10L13/027 - Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
    • G10L2013/021 - Overlap-add techniques
    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 - Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/50 - Controlling the output signals based on the game progress
    • A63F13/54 - Controlling the output signals based on the game progress involving acoustic signals, e.g. for simulating revolutions per minute [RPM] dependent engine sounds in a driving game or reverberation against a virtual wall

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Embodiments of the present application disclose a method and related device for processing player dubbing in an interactive game. In the method, a server calls a human-machine dialog engine to conduct human-machine interaction with a player using a terminal device according to a target human-machine dialog script of a target interactive game; during the human-machine interaction, the server sends a target machine output sentence to the terminal device, where the target machine output sentence is used to guide the player to input a target sentence; the server receives, from the terminal device, a target player voice corresponding to the target sentence that the player input according to the target machine output sentence, together with a target correspondence between the target player voice and the target sentence; the server searches the target human-machine dialog script of the target interactive game for all second machine output contents whose content is the same as the target sentence; and, according to the target correspondence, the server updates the second output mode configuration corresponding to all of those second machine output contents to the target player voice. The method and device help improve the player's sense of immersion in the interactive game.

Description

Method for processing player dubbing in interactive game and related device
Technical Field
The application belongs to the technical field of general data processing in the Internet industry, and in particular relates to a method and related device for processing player dubbing in an interactive game.
Background
There are two main ways of dubbing an interactive game. The first is to convert the scenario text of the interactive game into audio output using text-to-speech (TTS) synthesis. The second is to export the scenario text of the interactive game, split out the lines of each character, and arrange manual dubbing for them so that each line is linked to its recorded dubbing; when the interactive game reaches a given piece of scenario text, the dubbing corresponding to that content is matched and output through the device. With either approach, all players hear the same dubbing, there is little differentiation between players, and the player's sense of participation is low.
Disclosure of Invention
The embodiments of the present application provide a method and related device for processing player dubbing in an interactive game, which enable a player to participate in the plot interaction using his or her own voice and help improve the player's sense of immersion in the interactive game.
In a first aspect, an embodiment of the present application provides a method for processing player dubbing in an interactive game, applied to a server in a voice interaction system, where the voice interaction system includes the server and a terminal device through which the player performs voice interaction. The server is deployed with a human-machine dialog engine whose human-machine dialog logic is given by a target human-machine dialog script of a target interactive game. The target human-machine dialog script includes first machine output content for advancing the plot and second machine output content for representing the parsed semantics of sentences input by the player; the first machine output content includes first text content and a first output mode configuration, and the second machine output content includes second text content and a second output mode configuration. The first output mode configuration and the second output mode configuration refer to the speaking mode configuration used when the first text content or the second text content is output through the terminal device, the speaking mode configuration including parameter configurations of timbre, pitch, and volume. The method includes:
calling the human-machine dialog engine to conduct human-machine interaction with the player of the terminal device according to the target human-machine dialog script of the target interactive game;
during the human-machine interaction, sending a target machine output sentence to the terminal device, where the target machine output sentence is used to guide the player to input the target sentence through the terminal device;
receiving, from the terminal device, a target player voice corresponding to the target sentence that the player input according to the target machine output sentence;
acquiring, from the terminal device, a target correspondence between the target player voice and the target sentence;
searching the target human-machine dialog script of the target interactive game for all second machine output contents whose content is the same as the target sentence;
and updating, according to the target correspondence, the original voices corresponding to all of the second machine output contents whose content is the same as the target sentence to the target player voice.
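For concreteness, the following is a minimal sketch of this first-aspect flow in Python. The data layout (a list of dict entries with "text" and "output_mode" keys) and all names are illustrative assumptions, not part of the claimed method.

```python
def update_player_dubbing(script, target_sentence, player_voice):
    """Search-and-update pass over the target human-machine dialog script.

    `script` is assumed to be a list of second machine output entries, each a
    dict with 'text' (second text content) and 'output_mode' (second output
    mode configuration) keys; these names are illustrative only.
    """
    # Find every second machine output content identical to the target sentence.
    matches = [entry for entry in script if entry["text"] == target_sentence]
    # Per the target correspondence, replace each entry's output mode
    # configuration with the player's own recording.
    for entry in matches:
        entry["output_mode"] = {"type": "player_recording", "audio": player_voice}
    return len(matches)  # number of occurrences re-dubbed

# Example: two occurrences of the same skill name are re-dubbed at once.
script = [{"text": "First Form: Sky-Lifting Palm", "output_mode": {"type": "tts"}},
          {"text": "First Form: Sky-Lifting Palm", "output_mode": {"type": "tts"}}]
update_player_dubbing(script, "First Form: Sky-Lifting Palm", b"...pcm...")
```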
In a second aspect, an embodiment of the present application provides a method for processing player dubbing in an interactive game, applied to the terminal device through which the player performs voice interaction in a voice interaction system, where the voice interaction system includes a server and the terminal device. The server is deployed with a human-machine dialog engine whose human-machine dialog logic is given by a target human-machine dialog script of a target interactive game. The target human-machine dialog script includes first machine output content for advancing the plot and second machine output content for representing the parsed semantics of sentences input by the player; the first machine output content includes first text content and a first output mode configuration, and the second machine output content includes second text content and a second output mode configuration. The first output mode configuration and the second output mode configuration refer to the speaking mode configuration used when the first text content or the second text content is output through the terminal device, the speaking mode configuration including parameter configurations of timbre, pitch, and volume. The method includes:
receiving and playing a target machine output sentence from the server, wherein the target machine output sentence is used for guiding a player to input a target sentence through the terminal equipment;
acquiring target player voice input by a player according to the target sentence;
comparing the target player voice with the target sentence to obtain a second similarity;
judging whether the second similarity reaches a second preset threshold value or not;
if so, establishing a target corresponding relation between the target player voice and the target sentence, and sending the target player voice and the target corresponding relation to the server;
if not, the voice of the target player is obtained again.
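A minimal sketch of this second-aspect check follows, assuming the terminal has some speech-recognition output available and using a simple character-level ratio for the second similarity; the threshold value and all names are assumptions, as the patent leaves them configurable.

```python
import difflib

SECOND_PRESET_THRESHOLD = 0.85  # assumed value; the patent does not fix it


def check_and_submit(audio: bytes, recognized_text: str, target_sentence: str):
    """Compare the recognized player voice with the target sentence.

    `recognized_text` is assumed to come from whatever speech-recognition
    model the terminal device uses; the patent does not prescribe one.
    """
    # Second similarity: character-level ratio between the two texts.
    second_similarity = difflib.SequenceMatcher(
        None, recognized_text, target_sentence).ratio()
    if second_similarity >= SECOND_PRESET_THRESHOLD:
        # Establish the target correspondence and hand both to the server.
        correspondence = {"sentence": target_sentence, "voice": audio}
        return ("send_to_server", correspondence)
    # Otherwise the terminal re-acquires the target player voice.
    return ("reacquire", None)
```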
In a third aspect, an embodiment of the present application provides an apparatus for processing player dubbing in an interactive game, applied to a server in a voice interaction system, where the voice interaction system includes the server and a terminal device through which the player performs voice interaction. The server is deployed with a human-machine dialog engine whose human-machine dialog logic is given by a target human-machine dialog script of a target interactive game. The target human-machine dialog script includes first machine output content for advancing the plot and second machine output content for representing the parsed semantics of the player's input sentences; the first machine output content includes first text content and a first output mode configuration, and the second machine output content includes second text content and a second output mode configuration. The first output mode configuration and the second output mode configuration refer to the speaking mode configuration used when the first text content or the second text content is output through the terminal device, the speaking mode configuration including parameter configurations of timbre, pitch, and volume. The apparatus includes:
a calling unit, configured to call the human-machine dialog engine to conduct human-machine interaction with the player of the terminal device according to the target human-machine dialog script of the target interactive game;
the sending unit is used for sending a target machine output statement to the terminal equipment in the process of the man-machine interaction, and the target machine output statement is used for guiding a player to input the target statement through the terminal equipment;
a receiving unit, configured to receive, from the terminal device, a target player voice corresponding to the target sentence that the player input according to the target machine output sentence;
the receiving unit is further configured to acquire, from the terminal device, a target correspondence between the target player voice and the target sentence;
an updating unit, configured to search the target human-machine dialog script of the target interactive game for all second machine output contents whose content is the same as the target sentence;
the updating unit is further configured to update, according to the target correspondence, the original voices corresponding to all of the second machine output contents whose content is the same as the target sentence to the target player voice.
In a fourth aspect, embodiments of the present application provide a server comprising a processor, a memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps in the first aspect of embodiments of the present application.
In a fifth aspect, an embodiment of the present application provides a computer storage medium storing a computer program for electronic data exchange, where the computer program causes a computer to perform some or all of the steps described in the first aspect of the embodiments of the present application.
It can be seen that, in this embodiment, the server calls the human-machine dialog engine to conduct human-machine interaction with the player of the terminal device according to the target human-machine dialog script of the target interactive game; during the human-machine interaction, the server sends a target machine output sentence to the terminal device, where the target machine output sentence is used to guide the player to input a target sentence through the terminal device; after the server receives, from the terminal device, the target player voice corresponding to the target sentence that the player input according to the target machine output sentence, together with the target correspondence between the target player voice and the target sentence, the server searches the target human-machine dialog script of the target interactive game for all second machine output contents whose content is the same as the target sentence, and updates, according to the target correspondence, the second output mode configuration corresponding to all of those second machine output contents to the target player voice. Therefore, as the target interactive game proceeds, the server can play the target player voice corresponding to the target sentence through the terminal device in place of the original second output content corresponding to the target sentence. This enables the player to participate in the plot interaction using his or her own voice, improves the player's participation and immersion in the interactive game, and improves the player's game experience.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a block diagram of an exemplary speech interaction system according to an embodiment of the present disclosure;
fig. 2 is a diagram illustrating an example of an electronic device according to an embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a processing method for player dubbing in an interactive game applied to a server according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating a processing method for player dubbing in an interactive game applied to a terminal device according to an embodiment of the present application;
FIG. 5 is an application scenario of an interactive game provided in an embodiment of the present application;
FIG. 6 is an application scenario of another interactive game provided in the embodiments of the present application;
FIG. 7 is a block diagram illustrating functional units of a processing apparatus for dubbing by a player in an interactive game according to an embodiment of the present disclosure;
fig. 8 is a block diagram illustrating functional units of another processing apparatus for dubbing by a player in an interactive game according to an embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present application better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The embodiments of the present application will be described below with reference to the drawings.
The technical solution of the present application can be applied to the voice interaction system 10 shown in fig. 1. The example voice interaction system 10 includes a server 100 and a terminal device 200, and the server 100 can establish a communication connection with the terminal device 200. In practical applications, the voice interaction system 10 includes at least one terminal device 200 that establishes a communication connection with the server 100.
The server 100 may be a server, a server cluster composed of several servers, or a cloud computing service center. The terminal device 200 may include, but is not limited to, various handheld devices, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to a wireless modem, and various forms of User Equipment (UE), mobile Station (MS), terminal device (terminal), and the like, which are not limited in this regard.
The terminal device 200 logs in a game platform of the voice interaction system 10, where the game platform may be a voice interaction novel platform, a voice interaction game platform, an audio-visual game platform, or a comprehensive platform of voice interaction novel, voice interaction game, audio-visual game, and the like. The game platform may include a plurality of voice game scripts, voice interaction novel scripts, audiovisual game scripts, and the like, which are not described herein in detail.
The server 100 includes a human-machine dialog engine for providing the voice interaction service of an interactive game to the player through the terminal device 200. The human-machine dialog logic of the human-machine dialog engine is given by a target human-machine dialog script of the target interactive game, and the target human-machine dialog script includes first machine output content for advancing the plot and second machine output content for representing the parsed semantics of sentences input by the player. The first machine output content includes first text content and a first output mode configuration, and the second machine output content includes second text content and a second output mode configuration. The first output mode configuration and the second output mode configuration refer to the speaking mode configuration used when the first text content or the second text content is output through the terminal device, the speaking mode configuration including parameter configurations of timbre, pitch, and volume.
Specifically, the target human-machine dialog script includes a plurality of human-machine dialog scenarios, and a single human-machine dialog scenario includes the first machine output content and the second machine output content for representing the parsed semantics of the player's input sentence. The first machine output content includes output machine sentences (e.g., narration and the like) and output script text content sentences (e.g., character dialog content in the script and the like), and each output machine sentence and each output script text content sentence includes its corresponding first text content and first output mode configuration, which are not detailed here.
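As one concrete, hypothetical reading of this structure, the script content could be modeled as follows; field names such as `timbre` and `pitch` are assumptions standing in for the speaking mode parameters described above, not definitions from the patent.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SpeakingMode:
    """Output mode configuration: how the terminal speaks a text content."""
    timbre: str = "default"
    pitch: float = 1.0
    volume: float = 1.0
    recorded_audio: Optional[bytes] = None  # set once a player recording replaces TTS

@dataclass
class MachineOutput:
    """First or second machine output content: text plus its output mode."""
    text_content: str
    output_mode: SpeakingMode = field(default_factory=SpeakingMode)

@dataclass
class DialogScenario:
    """One human-machine dialog scenario within the target script."""
    first_outputs: list   # plot-advancing sentences (narration, character lines)
    second_outputs: list  # replies keyed to parsed semantics of player input
```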
Through the voice interaction system 10, the server 100 can parse the player's semantics from the player's input sentence and, according to those semantics, match the target player voice that fits the current plot scene, so that the target player voice is played to the player through the terminal device 200. The player can thus participate in the plot interaction using his or her own voice, which can improve the player's participation and immersion in the interactive game and improve the player's game experience.
Fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may be the server 100 or the terminal device 200 in the voice interaction system 10. The electronic device may comprise a processor 110, a memory 120, a communication interface 130, and one or more programs 121, where the one or more programs 121 are stored in the memory 120 and configured to be executed by the processor 110, and the one or more programs 121 comprise instructions for performing any of the steps of the method embodiments described below. The communication interface 130 is used to support communication between the server 100 and other devices. The processor 110 may be, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or execute the various illustrative logical blocks, units, and circuits described in connection with the disclosure of the embodiments of the application. The processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The memory 120 may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DRRAM).
In a specific implementation, the processor 110 is configured to execute any of the steps performed by the server in the following method embodiments, and when data transmission such as sending is performed, it optionally calls the communication interface 130 to complete the corresponding operation.
It should be noted that the structural diagram of the server 100 is only an example, and the number of the specifically included devices may be more or less, and is not limited herein.
Referring to fig. 3, fig. 3 is a schematic flowchart of a method for processing player dubbing in an interactive game, which may be applied to the server 100 in the voice interaction system shown in fig. 1, where the voice interaction system includes the server and a terminal device through which the player performs voice interaction. The server is deployed with a human-machine dialog engine whose human-machine dialog logic is given by a target human-machine dialog script of a target interactive game. The target human-machine dialog script includes first machine output content for advancing the plot and second machine output content for representing the parsed semantics of sentences input by the player; the first machine output content includes first text content and a first output mode configuration, and the second machine output content includes second text content and a second output mode configuration. The first output mode configuration and the second output mode configuration refer to the speaking mode configuration used when the first text content or the second text content is output through the terminal device, the speaking mode configuration including parameter configurations of timbre, pitch, and volume. As shown in fig. 3, the method for processing player dubbing in the interactive game includes:
step S110, the man-machine conversation engine is called to carry out man-machine interaction with the player of the terminal equipment according to the target man-machine conversation script of the target interactive game.
The target interactive game can be an interactive novel, a voice game or an audio-visual game.
The human-machine dialog engine can be used to perform at least one of the following operations: sending the first machine output content, the second machine output content, or the stored target player voice to the terminal device; receiving input information from the terminal device, where the input information may be voice input information or text input information entered by the player through the terminal device, or a user operation performed by the player on the game platform; and parsing the input information and performing operations corresponding to the player's demands and intentions obtained from the parsing. It is understood that the operations the human-machine dialog engine can perform include, but are not limited to, the above operations, which are not detailed here.
And step S120, in the process of the man-machine interaction, sending a target machine output statement to the terminal equipment.
Wherein the target machine output sentence is used for guiding the player to input the target sentence through the terminal device. Specifically, the target machine output sentence is used to guide the player to input the voice information of the text content corresponding to the target sentence through the terminal device. That is, the terminal device may guide the player to input a voice according to the target machine output sentence, thereby collecting the target player voice.
The target machine output sentence is one of the first machine output contents. The first text content of the target machine output sentence includes the text content corresponding to the target sentence.
In the plot development process of the target interactive game, the text content corresponding to the target sentence appears at least twice. The target sentence which appears at least twice may appear in the same human-computer dialog scenario or may appear in different human-computer dialog scenarios.
In a specific implementation, the target sentence may be a skill name, a character name, or other content in the target human-machine dialog script of the target interactive game, which is not limited here. Because the server collects the target player voice through the terminal device by means of the target machine output sentence, it does not need to interrupt the player with a separate inquiry, which improves the continuity of the plot development and the user experience.
For example, during the human-machine interaction of a target interactive game, a scene of learning a skill may appear; assume the skill name is "First Form: Sky-Lifting Palm". After the player expresses the intent to learn the skill and has learned it, the server sends a target machine output sentence to the terminal device to remind the player to recite the skill name, and gives an explanation of the skill after the player recites it. For example, after the player learns the skill, the target machine output sentence played by the terminal device is: "You have learned the ultimate technique First Form: Sky-Lifting Palm. You can now say 'use the First Form: Sky-Lifting Palm' to use this skill." The player is thus guided to input the voice information of the corresponding target sentence, namely "First Form: Sky-Lifting Palm", so that the player's voice for the target sentence is collected. Further, if the voice information input by the player (i.e., the target player voice) is "use the First Form: Sky-Lifting Palm", the terminal device may also play an explanation of the skill, for example: "Facing the giant tripod beside you, you use the First Form: Sky-Lifting Palm, and the giant tripod cracks apart with a roar; testing shows that the Sky-Lifting Palm has destructive force of the highest order." Explaining the skill further improves the plausibility of the plot and the user experience.
In one possible example, the first text content of the target machine output sentence may further include a content confirmation sentence for the target sentence, the content confirmation sentence being used to confirm to the player whether the content of the target sentence needs to be changed. If the change is needed, the player can input corresponding content in a voice input or character input mode according to the requirement, the server updates the content into the content of a new target sentence, and sends a target machine output sentence for guiding the player to input the updated target sentence through the terminal equipment to the terminal equipment, so that the voice of the target player is collected. Therefore, the player can modify the content corresponding to the target sentence according to personal requirements, the playability of the game can be improved, and the game experience of the player can be further improved.
For example, the target machine output sentence may be: "You have learned the ultimate technique First Form: Sky-Lifting Palm. You can now say 'use the First Form: Sky-Lifting Palm' to use this skill; or, you can now say 'change skill name' to change the name of the First Form: Sky-Lifting Palm." Here "First Form: Sky-Lifting Palm" is the target sentence, and "or, you can now say 'change skill name' to change the name of the First Form: Sky-Lifting Palm" is the content confirmation sentence. If the player now inputs "change the skill name", the server sends an input prompt to the terminal device to prompt the player to input the updated target sentence. For example, the input prompt sent by the server is: "You can now enter the new name of the skill." Based on the input prompt, the player may change the skill name from "First Form: Sky-Lifting Palm" to "Sky-Lifting Palm". After the skill name is updated, the server can send the updated target machine output sentence to the terminal device: "You can now say 'use the Sky-Lifting Palm' to use the ultimate technique's first form, Sky-Lifting Palm."
Further, if the game platform is a platform that hosts multiple works, such as a voice interactive novel platform, the server may profile the player according to the content of all target sentences the player has used, so as to determine the player's preferences and recommend works matching those preferences, further improving the user experience.
Step S130, receiving, from the terminal device, a target player voice corresponding to the target sentence that the player input according to the target machine output sentence.
Specifically, the server may receive, from the terminal device, the target player voice, which is the voice information corresponding to the target sentence that the player input through the terminal device according to the target machine output sentence.
Specifically, the target player voice may include only the text content corresponding to the target sentence, or may include the text content corresponding to the target sentence together with other content. The other content may be any content, which is not limited here. For example, if the target sentence is "First Form: Sky-Lifting Palm", the content of the target player voice may be "First Form: Sky-Lifting Palm"; alternatively, the content of the target player voice may be "use the First Form: Sky-Lifting Palm", where "use" is the other content.
In a specific implementation, when the target player voice includes the target sentence and other content, the server may extract from the target player voice the part identical to the content of the target sentence and use that part as the target voice for subsequent use. For example, if the content of the target player voice is "use the First Form: Sky-Lifting Palm", the server may extract the "First Form: Sky-Lifting Palm" part of the voice and use it as the target voice in the subsequent process.
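One way such an extraction could be realized is sketched below, assuming the recognizer returns token-level timestamps (which the patent does not prescribe) and that `audio` is an indexable array of samples; all names are illustrative.

```python
def extract_target_segment(words, target_text, audio, sample_rate=16000):
    """Cut out the span of `audio` whose transcript equals `target_text`.

    `words` is assumed to be a list of (token, start_sec, end_sec) tuples
    produced by a speech recognizer with token-level timestamps.
    """
    transcript = "".join(token for token, _, _ in words)
    pos = transcript.find(target_text)
    if pos < 0:
        return None  # the target sentence was not found in the recording
    covered, start_t, end_t = 0, None, None
    for token, start, end in words:
        # The first token that reaches `pos` marks the start of the target span.
        if start_t is None and covered + len(token) > pos:
            start_t = start
        covered += len(token)
        if covered >= pos + len(target_text):
            end_t = end
            break
    return audio[int(start_t * sample_rate):int(end_t * sample_rate)]
```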
Step S140, acquiring, from the terminal device, a target correspondence between the target player voice and the target sentence.
The target correspondence refers to the mapping relationship between the target player voice and the target sentence. Specifically, when only part of the target player voice matches the target sentence, the mapping relationship is the target correspondence between the extracted target voice and the target sentence.
In a specific implementation, the target correspondence may be established by the terminal device. Specifically, when the target correspondence is established by the terminal device, the terminal device needs to determine the content of the target sentence. As for determining the content of the target sentence by the terminal device, the server may set a first mark for the target sentence in the target machine output sentence sent to the terminal device, and the terminal device may recognize the first mark to confirm the content of the target sentence; each target sentence may be given a distinct first mark to distinguish the content of the respective target sentences. Alternatively, the server may send the target machine output sentence to the terminal device and separately send the target sentence to the terminal device, so that the terminal device determines the content of the target sentence, which reduces the recognition work of the terminal device. After the terminal device collects the target player voice, it can establish the target correspondence between the target sentence and the target player voice, and send the target player voice and the target correspondence to the server.
Alternatively, in a specific implementation, the target correspondence may be established by the server. Specifically, the terminal device may send the collected target player voice to the server, and the server may recognize the target sentence in the target machine output sentence and establish the target correspondence between the target sentence and the target player voice. As for recognizing the target sentence in the target machine output sentence, the server may first obtain the target machine output sentence and then determine the target sentence by recognizing the first mark of the corresponding target sentence in the target machine output sentence. Alternatively, the server may parse the semantics of the target machine output sentence and determine the target sentence according to the semantic parsing result. This can be set according to requirements.
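A sketch of the terminal-side variant follows, with a made-up marker syntax; the patent only requires "a first mark", not this particular tag format, so everything here is an assumption.

```python
import re

# Hypothetical marker syntax: the server wraps each target sentence in a
# <target id=...>...</target> tag inside the target machine output sentence.
FIRST_MARK = re.compile(r"<target id=(\w+)>(.*?)</target>")

def build_correspondence(machine_sentence: str, player_voice: bytes) -> dict:
    """Recognize the first mark and map the target sentence to the recording."""
    marks = {m.group(1): m.group(2) for m in FIRST_MARK.finditer(machine_sentence)}
    # Simplest case: one guiding sentence carries exactly one target sentence.
    (mark_id, sentence), = marks.items()
    return {"mark": mark_id, "sentence": sentence, "voice": player_voice}

# Example guiding sentence with an embedded (marked) target sentence.
guide = "You can now say 'use the <target id=s1>First Form: Sky-Lifting Palm</target>'."
mapping = build_correspondence(guide, b"...recording...")
```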
Step S150, searching all the second machine output contents that are the same as the contents of the target sentence in the target man-machine dialog script of the target interactive game.
Specifically, the server stores the target human-machine dialog script of the target interactive game in text format. When the server receives the target player voice and the target correspondence, it can search the text of the target human-machine dialog script for content identical to the text content of the target sentence and determine that content as the content to be replaced.
Step S160, according to the target correspondence, updating the second output mode configuration corresponding to all the second machine output contents that are the same as the contents of the target sentence into the target player voice.
In a specific implementation, the speaking mode configuration of the first output content and the second output content may be synthesized speech or manual dubbing. For synthesized speech, the parameter configurations of timbre, pitch, volume, and the like in the speaking mode configuration corresponding to the first output content or the second output content may adopt preset parameter configurations, where the preset parameters may be parameters written by developers on the server side, or parameters generated by adjustment according to the player's settings.
In a specific implementation, in updating the second output mode configuration to the target player voice, the server may directly use the collected target player voice to replace the original second output mode configuration, so that the terminal device later plays the player's recording directly when playing the second output content; this keeps timbre, volume, and intonation consistent while simplifying the server's processing task. Alternatively, the server may extract the player's timbre, volume, and pitch from the collected target player voice, readjust the parameters to synthesize a voice matching them, and update the second output mode configuration accordingly.
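These two update strategies read as follows in sketch form; `estimate_prosody` stands in for whatever analysis the server actually uses and is purely an assumption.

```python
def apply_player_voice(entry: dict, player_voice: bytes, strategy: str = "replace"):
    """Update one second output mode configuration with the player's voice.

    'replace': store the raw recording so the terminal plays it back directly.
    Anything else: estimate timbre/volume/pitch from the recording and
    re-synthesize a matching voice instead.
    """
    if strategy == "replace":
        entry["output_mode"] = {"type": "player_recording", "audio": player_voice}
    else:
        params = estimate_prosody(player_voice)  # hypothetical analysis step
        entry["output_mode"] = {"type": "tts", **params}

def estimate_prosody(audio: bytes) -> dict:
    """Placeholder: extract timbre/volume/pitch parameters from a recording."""
    raise NotImplementedError  # the patent does not prescribe an algorithm
```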
It can be seen that, in this embodiment, the server calls the human-machine dialog engine to conduct human-machine interaction with the player of the terminal device according to the target human-machine dialog script of the target interactive game; during the human-machine interaction, the server sends a target machine output sentence to the terminal device, where the target machine output sentence is used to guide the player to input a target sentence through the terminal device; after the server receives, from the terminal device, the target player voice corresponding to the target sentence that the player input according to the target machine output sentence, together with the target correspondence between the target player voice and the target sentence, the server searches the target human-machine dialog script of the target interactive game for all second machine output contents whose content is the same as the target sentence, and updates, according to the target correspondence, the second output mode configuration corresponding to all of those second machine output contents to the target player voice. Therefore, as the target interactive game proceeds, the server can play the target player voice corresponding to the target sentence through the terminal device in place of the original second output content corresponding to the target sentence. This enables the player to participate in the plot interaction using his or her own voice, improves the player's participation and immersion in the interactive game, and improves the player's game experience.
In one possible example, after obtaining the target correspondence between the target player voice and the target sentence from the terminal device, the method includes: comparing the target player voice with the target sentence to obtain a first similarity; judging whether the first similarity reaches a first preset threshold value or not; if so, determining that the target corresponding relation between the target player voice and the target sentence is established; if not, determining that the target corresponding relation between the voice of the target player and the target sentence is not established, and sending a re-input sentence to the terminal equipment, wherein the re-input sentence is used for guiding the player to re-input the target sentence through the terminal equipment.
The first similarity is the similarity between the content corresponding to the target player voice and the content corresponding to the target sentence.
The first preset threshold is a requirement standard for similarity between the content corresponding to the voice of the target player and the content corresponding to the target sentence, and a specific numerical value of the first preset threshold can be set according to requirements without further limitation.
In specific implementation, the server can convert the voice of the target player into a text through the voice recognition model, and then compare the content of the text with the text content of the target sentence, so as to obtain the first similarity through comparison. If the first similarity is greater than or equal to the first preset threshold, it indicates that the target player voice input by the player corresponds to the target sentence, that is, the target player voice is suitable for the scene of the dialogue in which the target sentence is located, so that it can be determined that the established target correspondence is accurate. If the first similarity is smaller than the first preset threshold, it is indicated that the target player voice input by the player does not correspond to the target sentence, that is, the target player voice cannot be used in the scene of the dialogue scenario where the target sentence is located, that is, the established target corresponding relationship is inaccurate.
Further, when the first similarity does not reach the first preset threshold, the content of the re-input sentence that the server sends to the terminal device may be the same as or different from the target machine output sentence. For example, when the target machine output sentence is "You have learned the ultimate technique First Form: Sky-Lifting Palm. You can now say 'use the First Form: Sky-Lifting Palm' to use this skill", the re-input sentence may be identical to it, that is: "You have learned the ultimate technique First Form: Sky-Lifting Palm. You can now say 'use the First Form: Sky-Lifting Palm' to use this skill." Alternatively, the re-input sentence may differ from the target machine output sentence and simply instruct the user to re-input the content; for example, the re-input sentence may be: "You can now say 'use the First Form: Sky-Lifting Palm' to use this skill."
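A minimal sketch of this server-side validation, reusing a character-level ratio for the first similarity; the threshold value and the re-input wording are assumptions.

```python
import difflib

FIRST_PRESET_THRESHOLD = 0.85  # assumed value, set per requirements


def validate_target_correspondence(recognized_text: str, target_sentence: str):
    """Check the first similarity; return the re-input sentence to send to the
    terminal device when the correspondence does not hold (wording illustrative)."""
    first_similarity = difflib.SequenceMatcher(
        None, recognized_text, target_sentence).ratio()
    if first_similarity >= FIRST_PRESET_THRESHOLD:
        return None  # correspondence holds; nothing further to send
    # Correspondence does not hold: guide the player to input the sentence again.
    return f"You can now say 'use the {target_sentence}' to use this skill."
```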
As can be seen, in this example, the server can determine whether the target correspondence holds by comparing the similarity between the target player voice and the target sentence, and thereby decide whether the target player voice can be used to update the second output mode configuration. This helps improve the accuracy of using the target player voice in the target interactive game, ensures consistency of the content, and improves the player's experience.
In one possible example, the target sentence that appears first in the second machine output contents of the target human-machine dialog script of the target interactive game is defined as first target content, and target sentences that appear subsequently are defined as second target content; sending the target machine output sentence to the terminal device then includes: sending the target machine output sentence to the terminal device when it is detected that the second machine output content is the first target content.
Specifically, in the target human-machine dialog script of the target interactive game, the target sentence appears at least twice across all the second machine output contents. Taking the plot development process as a time axis, the occurrences are presented in order along that axis; the target sentence that appears first is the first target content, and target sentences that appear subsequently are the second target content.
In a specific implementation, a second mark may be set for the first target content. When the server parses the player's input information and the second output mode configuration corresponding to the player's demand intent carries the second mark, that is, when the server detects that the second machine output content is the first target content, the server may send the target machine output sentence to the terminal device.
Further, the server may use the target player voice received from the terminal device in response to the target machine output sentence to update all subsequent second target content, so that the player hears his or her own voice participating in the interaction as the plot develops, improving the player's experience. Alternatively, the server may use the received target player voice to update all corresponding target sentences, that is, both the first target content and the second target content are updated to the target player voice, so that the player hears his or her own voice both during the subsequent development of the plot and when restarting the target interactive game, further improving the player's experience.
In a specific implementation, when it is detected that the second machine output content is the second target content, the target player voice is sent to the terminal device. For example, during the human-machine interaction, after step S160, if the server receives voice input information from the player and, by parsing it, learns that the player's demand intent is intent A representing the semantics of the target sentence, the human-machine dialog engine may match the target sentence corresponding to intent A and the target player voice corresponding to that target sentence, and play the target player voice through the terminal device.
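In sketch form, this playback step could look as follows; all callables here are assumptions standing in for the dialog engine's actual interfaces, which the patent does not define.

```python
def on_player_input(parse_intent, sentence_for_intent, voice_store, terminal, text):
    """Play the stored player recording when the parsed intent maps to a
    target sentence; otherwise fall back to the original output mode."""
    intent = parse_intent(text)                   # e.g. "counterattack" -> intent A
    target_sentence = sentence_for_intent(intent)  # e.g. the skill name
    recording = voice_store.get(target_sentence)
    if recording is not None:
        terminal.play(recording)        # second target content: player's own voice
    else:
        terminal.play_tts(target_sentence)  # original second output mode
```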
For example, referring to fig. 5 or fig. 6, when the server plays, through the terminal device, "You run along the road and meet character A, who launches an attack at you. You can now use the First Form: Sky-Lifting Palm, or kneel down and beg him", the player can input voice input information according to his or her choice, for example "counterattack". After parsing the voice input information, the server determines that the player's demand intent is the intent representing the target sentence "First Form: Sky-Lifting Palm"; the server then matches the target player voice corresponding to the target sentence "First Form: Sky-Lifting Palm" and plays it through the terminal device. With further reference to fig. 5, fig. 5 shows a game platform interface for an interactive novel, which includes a plot area displaying part of the content of the target human-machine dialog script, so that the player can read and understand the foregoing content and respond to it. The game platform interface also includes a player input area; the player may input information by activating the player input area, or learn the current input through it. With further reference to fig. 6, fig. 6 shows a game platform interface for an audiovisual game, which may display an image of the current scene of the audiovisual game to engage the user more intuitively. In a specific implementation, the dialog box in the game platform interface may be hidden, and the player can learn the current progress of the game directly through voice playback.
Therefore, in this example, when the target sentence first appears in the target human-machine dialog script of the target interactive game, the server collects the target player voice and updates the second output mode configuration, so that the player can participate in the interaction using his or her own voice throughout the whole game, which improves the player's experience.
In one possible example, after updating the second output mode configuration corresponding to all the second machine output contents identical to the contents of the target sentence to the target player voice, the method includes: when the second machine output content is detected to be second target content, sending the target player voice to the terminal equipment; receiving feedback information from a terminal device, wherein the feedback information is used for representing a feedback operation performed by a player aiming at the voice of the target player; and updating the target player voice according to the feedback information.
Specifically, the feedback operation includes a confirmation operation and a change operation.
In the specific implementation, when a player plays a target player voice through the terminal equipment, if the player is satisfied with the effect of the target player voice, the confirmation information can be input through the terminal equipment; if the player is not satisfied with the effect of the target player's voice, the modification information may be input through the terminal device. The player may input the modification information by means of voice input, or the player may input the modification information by means of triggering a display screen of the terminal device, which will not be described in detail herein.
Specifically, if the feedback information is confirmation information, the feedback operation is null, that is, the server does not process the feedback information, and still uses the current target player voice. If the feedback information is the change information, the feedback operation is as follows: the player's voice is re-captured to update the current target player's voice, or a second output mode configuration prior to the current target player's voice update is used, or the player may re-adjust the parameters to generate a new speaking mode configuration that is different from the previous second output mode configuration.
In a specific implementation, when the feedback information is a change operation and the feedback operation is to re-collect the voice of the player to update the current target player voice, in the aspect of updating the target player voice according to the feedback information, the server may perform a voice collection operation for the target sentence according to the feedback information to obtain a player update voice, and update the target player voice corresponding to all the second machine output contents that are the same as the content of the target sentence into the player update voice. The voice collecting operation for the target sentence can refer to the above step of collecting the voice of the target player, and is not detailed here. Therefore, the player can record and use the satisfied voice for interaction, the interaction effect is further improved, and the participation of the player is improved.
In another implementation, when the feedback information is a change operation and the feedback operation is a second output mode configuration before the current target player voice is updated, the server may update, according to the feedback information, the target player voices corresponding to all the second machine output contents that are the same as the contents of the target sentence to the second output mode configuration in updating the target player voice according to the feedback information. This is advantageous for meeting the needs of some players who do not want to participate in the interaction using their own voice.
Alternatively, in a specific implementation, when the feedback information is a change operation and the feedback operation is that the player readjusts the parameters to generate a new speaking mode configuration different from the previous second output mode configuration, the player can set parameters such as timbre, pitch, and volume for the new speaking mode configuration to generate a new playing voice, which replaces the currently used target player voice. This meets the needs of those players who want the second machine output content corresponding to the target sentence adjusted to a voice they are satisfied with, further improving the player's experience.
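The three change paths above can be sketched as a single dispatcher; `recollect_voice` and `synthesize` are hypothetical callables standing in for the server's actual collection and synthesis steps.

```python
def handle_feedback(feedback: dict, entry: dict, previous_mode: dict,
                    recollect_voice, synthesize):
    """Dispatch a player's feedback on the currently used target player voice."""
    if feedback.get("type") == "confirm":
        return entry  # keep the current target player voice unchanged
    action = feedback.get("action")
    if action == "re-record":
        # Re-collect the player's voice and use it as the player update voice.
        entry["output_mode"] = {"type": "player_recording",
                                "audio": recollect_voice()}
    elif action == "revert":
        # Fall back to the second output mode configuration before the update.
        entry["output_mode"] = previous_mode
    elif action == "re-synthesize":
        # Generate a new speaking mode configuration from player-set parameters.
        entry["output_mode"] = synthesize(feedback.get("params", {}))
    return entry
```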
In a specific implementation, after the server plays the first second target content through the terminal device, the server may send inquiry information to the player to ask whether the player wants to change the currently used target player voice. The first second target content refers to the second target content that appears first, taking the plot development as the time axis. Sending the inquiry information after the first second target content is played allows the player to confirm or update the currently used target player voice in time, ensuring the player's experience.
Therefore, in this example, the server executes the feedback operation according to the feedback information, and can update the currently used target player voice according to the player requirement, so that it is ensured that the player can interact with the player by using the favorite sound in the subsequent plot development process, and the player experience is further improved.
Referring to fig. 4, fig. 4 is a schematic flowchart of another processing method for dubbing by a player in an interactive game, which may be applied to a terminal device 200 used by a player for voice interaction in the voice interaction system 10 shown in fig. 1, where the voice interaction system includes a server and the terminal device, the server is deployed with a human-computer dialogue engine, human-computer dialogue logic of the human-computer dialogue engine is given by a target human-computer dialogue script of a target interactive game, the target human-computer dialogue script includes first machine output content for promoting scenario development and second machine output content for representing the parsed semantics of sentences input by the player, the first machine output content includes first text content and a first output mode configuration, the second machine output content includes second text content and a second output mode configuration, the first output mode configuration and the second output mode configuration refer to the speaking mode configuration used when the first text content or the second text content is output through the terminal device, and the speaking mode configuration includes parameter configurations of timbre, intonation and volume; the method includes the following steps:
Step S210, receiving and playing a target machine output sentence from the server.
Wherein the target machine output sentence is used for guiding the player to input the target sentence through the terminal device. For the specific contents of the target machine output sentence and the target sentence, reference may be made to the above description, which is not repeated here.
Step S220, obtaining the target player voice input by the player according to the target sentence.
In a specific implementation, the terminal device may enable its voice collection function after playing the target machine output sentence, so as to collect and store the target player voice.
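As one possible capture backend on the terminal side, the sketch below uses the python-sounddevice library; the fixed recording window and sample rate are assumptions, and a production client would more likely use voice activity detection to decide when to stop recording.

```python
# Records a short utterance from the default microphone; duration and sample
# rate are illustrative choices, not values prescribed by the embodiment.
import sounddevice as sd

def capture_target_player_voice(duration_s: float = 5.0, sample_rate: int = 16000):
    frames = int(duration_s * sample_rate)
    recording = sd.rec(frames, samplerate=sample_rate, channels=1)
    sd.wait()          # block until the capture buffer is full
    return recording   # numpy array of PCM samples
```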
Step S230, comparing the target player voice with the target sentence to obtain a second similarity.
The second similarity is the similarity between the content corresponding to the target player voice and the content corresponding to the target sentence.
In a specific implementation, the terminal device may convert the target player voice into text through a voice recognition model, and then compare the content of the text with the text content of the target sentence, so as to obtain the second similarity through the comparison.
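The embodiment does not prescribe a similarity measure, so the sketch below uses a simple character-level ratio from the Python standard library as a stand-in; the recognized text is assumed to come from whatever voice recognition model the terminal deploys.

```python
# Character-level similarity between the recognized text and the target
# sentence; any text-similarity measure satisfying the description would do.
from difflib import SequenceMatcher

def second_similarity(recognized_text: str, target_sentence: str) -> float:
    return SequenceMatcher(None, recognized_text, target_sentence).ratio()

# Example: second_similarity("open the door", "open the door") == 1.0
```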
Step S240, determining whether the second similarity reaches a second preset threshold.
The second preset threshold is the required standard for the similarity between the content corresponding to the target player voice and the content corresponding to the target sentence; its specific numerical value can be set as required and is not limited here. The second preset threshold may be the same as the first preset threshold, so as to ensure the consistency of the comparison results.
Step S250, if yes, establishing a target corresponding relationship between the target player voice and the target statement, and sending the target player voice and the target corresponding relationship to the server.
In a specific implementation, if the second similarity is greater than or equal to the second preset threshold, this indicates that the target player voice input by the player corresponds to the target sentence, that is, the target player voice can be used in the dialogue scene where the target sentence is located. The terminal device can therefore establish the target corresponding relationship according to the comparison result and send the target player voice and the target corresponding relationship to the server, so that the server updates the second output mode configuration.
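A hedged sketch of establishing and uploading the target corresponding relationship follows; the endpoint path and JSON field names are invented for illustration and do not appear in the embodiment.

```python
# Uploads the target player voice together with the target corresponding
# relationship; URL and payload fields are hypothetical.
import base64
import requests

def upload_target_correspondence(server_url: str, target_sentence: str,
                                 voice_pcm: bytes, similarity: float) -> None:
    payload = {
        "target_sentence": target_sentence,
        "target_player_voice": base64.b64encode(voice_pcm).decode("ascii"),
        "second_similarity": similarity,  # lets the server re-verify if needed
    }
    requests.post(f"{server_url}/player-voice", json=payload, timeout=10)
```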
Further, when the target corresponding relationship is established by the terminal device, the server does not need to establish the corresponding relationship again.
Further, after receiving the target player voice and the target corresponding relationship, the server may compare the target player voice with the target sentence again to obtain a first similarity, and judge whether the first similarity reaches a first preset threshold, thereby improving the accuracy of the content corresponding to the target player voice.
Step S260, if not, re-acquiring the target player voice.
Specifically, if the second similarity is smaller than the second preset threshold, this indicates that the target player voice input by the player does not correspond to the target sentence, that is, the target player voice cannot be used in the dialogue scene where the target sentence is located, meaning that the currently collected target player voice is unavailable; therefore, the target player voice needs to be collected again.
It can be seen that, in this embodiment, the terminal device not only allows the player to perform voice interaction with the server, but can also compare the collected target player voice with the target sentence to obtain the second similarity and judge whether the second similarity reaches the second preset threshold, thereby ensuring the accuracy of the collected target player voice. Because the accuracy of the target player voice is verified by the terminal device, erroneous transmission and processing of the target player voice can be reduced, which improves the efficiency of human-computer interaction between the player and the server. Moreover, establishing the target corresponding relationship on the terminal side can not only reduce the processing pressure on the server but also improve the updating efficiency of the server.
The present application may divide the server into functional units according to the above method examples; for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in the embodiments of the present application is schematic and is only a division by logical function; other division manners are possible in actual implementation.
Further, the terminal device may retain the stored target player voice and target corresponding relationship after sending them to the server. In this way, after the server updates the second output mode configuration to the target player voice, if the terminal device is required to play the target player voice in the subsequent plot progress, the server only needs to send a target player voice calling instruction to the terminal device instead of transmitting the target player voice itself, which can improve the efficiency of human-computer interaction. The target player voice calling instruction is used for instructing the terminal device to play the target player voice.
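A terminal-side cache along these lines could be sketched as follows; the message type and field names of the calling instruction are assumptions.

```python
# Keeping recordings locally lets the server send a short calling instruction
# instead of retransmitting audio; the instruction format is hypothetical.
from typing import Optional

voice_cache: dict = {}  # target sentence -> locally stored recording (bytes)

def on_server_message(message: dict) -> Optional[bytes]:
    if message.get("type") == "play_target_player_voice":
        # Return the cached recording for local playback, if present.
        return voice_cache.get(message.get("target_sentence"))
    return None
```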
Fig. 7 is a block diagram illustrating functional units of a processing device for dubbing by a player in an interactive game according to an embodiment of the present disclosure. The processing device 30 for dubbing by a player in an interactive game can be applied to the server 100 in the voice interaction system shown in fig. 1, where the voice interaction system includes the server 100 and a terminal device 200 used by a player for voice interaction, the server 100 is deployed with a human-computer dialogue engine, human-computer dialogue logic of the human-computer dialogue engine is given by a target human-computer dialogue script of a target interactive game, the target human-computer dialogue script includes first machine output content for promoting scenario development and second machine output content for representing the parsed semantics of sentences input by the player, the first machine output content includes first text content and a first output mode configuration, the second machine output content includes second text content and a second output mode configuration, the first output mode configuration and the second output mode configuration refer to the speaking mode configuration used when the first text content or the second text content is output through the terminal device, and the speaking mode configuration includes parameter configurations of timbre, intonation and volume; the processing device 30 for dubbing by the player in the interactive game includes:
a calling unit 310, configured to call the human-computer dialogue engine to perform human-computer interaction with the player of the terminal device according to the target human-computer dialogue script of the target interactive game;
a sending unit 320, configured to send a target machine output sentence to the terminal device in the process of the human-computer interaction, where the target machine output sentence is used to guide the player to input a target sentence through the terminal device;
a receiving unit 330, configured to receive, from the player of the terminal device, a target player voice corresponding to the target sentence input according to the target machine output sentence;
the receiving unit 330 is further configured to obtain a target corresponding relationship between the target player voice from the terminal device and the target sentence;
an updating unit 340, configured to search, in a target human-machine dialog script of the target interactive game, all the second machine output contents that are the same as the contents of the target sentence;
the updating unit 340 is further configured to update, according to the target correspondence, the second output mode configuration corresponding to all the second machine output contents that are the same as the contents of the target sentence into the target player voice.
In a possible example, the processing device 30 for player dubbing in the interactive game further includes a comparison and determination unit, and the comparison and determination unit is specifically configured to: after the target corresponding relation between the target player voice and the target sentence from the terminal equipment is obtained, comparing the target player voice and the target sentence to obtain a first similarity; judging whether the first similarity reaches a first preset threshold value or not; if so, determining that the target corresponding relation between the target player voice and the target sentence is established; and if not, determining that the target corresponding relation between the voice of the target player and the target sentence is not established, and sending a re-input sentence to the terminal equipment, wherein the re-input sentence is used for guiding the player to re-input the target sentence through the terminal equipment.
In a possible example, among the second machine output contents in the target human-computer dialogue script of the target interactive game, the target sentence occurring for the first time is defined as first target content and the target sentence occurring subsequently is defined as second target content; in the aspect of sending the target machine output sentence to the terminal device, the sending unit is specifically configured to: send the target machine output sentence to the terminal device when detecting that the second machine output content is the first target content.
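For illustration, the first/second target content distinction can be sketched as a single pass over the script in plot order; the list-of-strings representation of the second machine output contents is an assumption.

```python
# Labels each second machine output content matching the target sentence as
# "first" (first occurrence in plot order) or "second" (any later occurrence).
def classify_target_contents(second_output_texts: list,
                             target_sentence: str) -> list:
    labels = []
    seen = False
    for text in second_output_texts:
        if text == target_sentence:
            labels.append("first" if not seen else "second")
            seen = True
        else:
            labels.append(None)
    return labels
```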
In a possible example, the processing apparatus 30 for player dubbing in the interactive game further includes a feedback processing unit, which is specifically configured to: after the second output mode configurations corresponding to all the second machine output contents that are the same as the content of the target sentence are updated to the target player voice, send the target player voice to the terminal device when detecting that the second machine output content is the second target content; receive feedback information from the terminal device, where the feedback information is used for representing a feedback operation performed by the player with respect to the target player voice; and update the target player voice according to the feedback information.
In one possible example, in the aspect of updating the target player voice according to the feedback information, the feedback processing unit is specifically configured to: perform, according to the feedback information, a voice collection operation for the target sentence to obtain a player update voice, and update the target player voices corresponding to all the second machine output contents that are the same as the content of the target sentence to the player update voice.
In one possible example, in the aspect of updating the target player voice according to the feedback information, the feedback processing unit is specifically configured to: update, according to the feedback information, the target player voices corresponding to all the second machine output contents that are the same as the content of the target sentence back to the second output mode configuration.
In the case of using an integrated unit, fig. 8 shows a block diagram of the functional units of a processing device 40 for player dubbing in an interactive game provided by an embodiment of the present application. In fig. 8, the processing device 40 for player dubbing in an interactive game includes a processing module 420 and a communication module 410. The processing module 420 is configured to control and manage the actions of the processing device 30 for player dubbing in the interactive game, such as the steps performed by the calling unit 310, the sending unit 320, the receiving unit 330 and the updating unit 340, and/or other processes for performing the techniques described herein. The communication module 410 is configured to support interaction between the processing device 40 for player dubbing in the interactive game and other devices. As shown in fig. 8, the processing device 40 for player dubbing in the interactive game may further include a storage module 430, and the storage module 430 is configured to store the program codes and data of the processing device 30 for player dubbing in the interactive game.
The processing module 420 may be a processor or a controller, for example, a Central Processing Unit (CPU), a general-purpose processor, a Digital Signal Processor (DSP), an ASIC, an FPGA or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or execute the various illustrative logical blocks, modules and circuits described in connection with the disclosure of the embodiments of the present application. The processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module 410 may be a transceiver, an RF circuit, a communication interface, or the like. The storage module 430 may be a memory.
For all relevant content of each scenario involved in the above method embodiments, reference may be made to the functional descriptions of the corresponding functional units, which are not repeated here. The above processing device for player dubbing in an interactive game can execute the steps executed by the server in the processing method for player dubbing in an interactive game shown in fig. 2.
An embodiment of the present application further provides another processing apparatus for dubbing by a player in an interactive game, which can be applied to a terminal device used by a player for voice interaction in a voice interaction system, where the voice interaction system includes a server and the terminal device, the server is deployed with a human-computer dialogue engine, human-computer dialogue logic of the human-computer dialogue engine is given by a target human-computer dialogue script of a target interactive game, the target human-computer dialogue script includes first machine output content for promoting scenario development and second machine output content for representing the parsed semantics of sentences input by the player, the first machine output content includes first text content and a first output mode configuration, the second machine output content includes second text content and a second output mode configuration, the first output mode configuration and the second output mode configuration refer to the speaking mode configuration used when the first text content or the second text content is output through the terminal device, and the speaking mode configuration includes parameter configurations of timbre, intonation and volume; the apparatus includes:
a second receiving unit for receiving and playing a target machine output sentence from the server, the target machine output sentence being used for guiding a player to input a target sentence through the terminal device;
a second acquisition unit configured to acquire a target player voice input by a player according to the target sentence;
the second comparison unit is used for comparing the target player voice with the target sentence to obtain a second similarity;
a second judging unit, configured to judge whether the second similarity reaches a second preset threshold;
if so, establishing a target corresponding relation between the target player voice and the target sentence, and sending the target player voice and the target corresponding relation to the server;
and if not, the voice of the target player is obtained again.
Embodiments of the present application also provide a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, the computer program enables a computer to execute part or all of the steps of any one of the methods described in the above method embodiments, and the computer includes a server.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art will recognize that the embodiments described in this specification are preferred embodiments and that acts or modules referred to are not necessarily required for this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units described above is only a division by logical function, and other divisions are possible in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory includes various media capable of storing program codes, such as a USB flash disk, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing related hardware, where the program may be stored in a computer-readable memory, and the memory may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, the specific implementation manner and the application scope may be changed, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A processing method for dubbing by a player in an interactive game is characterized in that the processing method is applied to a server in a voice interactive system, the voice interactive system comprises the server and terminal equipment for voice interaction of the player, wherein the server is provided with a man-machine conversation engine, man-machine conversation logic of the man-machine conversation engine is given by a target man-machine conversation scenario of a target interactive game, the target man-machine conversation scenario comprises first machine output content for promoting scenario development and second machine output content for representing analysis semantics of input sentences of the player, the first machine output content comprises first text content and a first output mode configuration, the second machine output content comprises second text content and a second output mode configuration, the first output mode configuration and the second output mode configuration refer to speaking mode configuration when the first text content or the second text content is output through the terminal equipment, and the speaking mode configuration comprises parameter configuration of timbre, intonation and volume; the method comprises the following steps:
calling the human-computer conversation engine to carry out human-computer interaction with the player of the terminal equipment according to the target human-computer conversation script of the target interactive game;
in the process of the man-machine interaction, sending a target machine output statement to the terminal equipment, wherein the target machine output statement is used for guiding a player to input a target statement through the terminal equipment;
receiving a target player voice corresponding to the target sentence input by the player from the terminal equipment according to the target machine output sentence;
acquiring a target corresponding relation between the target player voice and the target sentence from the terminal equipment;
searching all second machine output contents which are the same as the contents of the target sentences in a target man-machine conversation scenario of the target interactive game;
and according to the target corresponding relation, updating the second output mode configuration corresponding to all the second machine output contents which are the same as the contents of the target sentence into the target player voice.
2. The method of claim 1, wherein said obtaining a target correspondence of the target player speech from the terminal device to the target statement comprises:
comparing the target player voice with the target sentence to obtain a first similarity;
judging whether the first similarity reaches a first preset threshold value or not;
if so, determining that the target corresponding relation between the target player voice and the target sentence is established;
and if not, determining that the target corresponding relation between the voice of the target player and the target sentence is not established, and sending a re-input sentence to the terminal equipment, wherein the re-input sentence is used for guiding the player to re-input the target sentence through the terminal equipment.
3. The method of claim 1, wherein, in the second machine output content in the target man-machine conversation scenario of the target interactive game, the target sentence occurring for the first time is defined as first target content and the target sentence occurring subsequently is defined as second target content; the sending the target machine output sentence to the terminal device comprises:
and when the second machine output content is detected to be the first target content, sending a target machine output statement to the terminal equipment.
4. The method of claim 3, wherein after updating the second output modality configuration corresponding to all of the second machine output content that is the same as the content of the target sentence to the target player voice, comprising:
when the second machine output content is detected to be second target content, sending the target player voice to the terminal equipment;
receiving feedback information from a terminal device, wherein the feedback information is used for representing a feedback operation performed by a player aiming at the voice of the target player;
and updating the target player voice according to the feedback information.
5. The method of claim 4, wherein said updating said target player speech based on said feedback information comprises:
and according to the feedback information, executing voice collection operation aiming at the target sentence to obtain player update voice, and updating the target player voice corresponding to all the second machine output contents which are the same as the contents of the target sentence into the player update voice.
6. The method of claim 4, wherein said updating said target player speech based on said feedback information comprises:
and updating the target player voices corresponding to all the second machine output contents which are the same as the contents of the target sentences into the second output mode configuration according to the feedback information.
7. A processing method for dubbing of a player in an interactive game is characterized in that the processing method is applied to a terminal device for voice interaction of the player in a voice interaction system, the voice interaction system comprises a server and the terminal device, wherein the server is provided with a man-machine conversation engine, man-machine conversation logic of the man-machine conversation engine is given by a target man-machine conversation scenario of a target interactive game, the target man-machine conversation scenario comprises first machine output content for promoting scenario development and second machine output content for representing analysis semantics of input sentences of the player, the first machine output content comprises first text content and a first output mode configuration, the second machine output content comprises second text content and a second output mode configuration, the first output mode configuration and the second output mode configuration refer to speaking mode configuration when the first text content or the second text content is output through the terminal device, and the speaking mode configuration comprises parameter configuration of timbre, intonation and volume; the method comprises the following steps:
receiving and playing a target machine output sentence from the server, wherein the target machine output sentence is used for guiding a player to input a target sentence through the terminal equipment;
acquiring target player voice input by a player according to the target sentence;
comparing the target player voice with the target sentence to obtain a second similarity;
judging whether the second similarity reaches a second preset threshold value or not;
if so, establishing a target corresponding relation between the target player voice and the target sentence, and sending the target player voice and the target corresponding relation to the server;
and if not, the voice of the target player is obtained again.
8. A processing device for dubbing by a player in an interactive game is characterized in that the processing device is applied to a server in a voice interactive system, the voice interactive system comprises the server and terminal equipment for voice interaction of the player, wherein the server is provided with a man-machine conversation engine, man-machine conversation logic of the man-machine conversation engine is given by a target man-machine conversation scenario of a target interactive game, the target man-machine conversation scenario comprises first machine output content for promoting scenario development and second machine output content for representing analysis semantics of input sentences of the player, the first machine output content comprises first text content and a first output mode configuration, the second machine output content comprises second text content and a second output mode configuration, the first output mode configuration and the second output mode configuration refer to speaking mode configuration when the first text content or the second text content is output through the terminal equipment, and the speaking mode configuration comprises parameter configuration of timbre, intonation and volume; the device comprises:
the calling unit is used for calling the human-computer conversation engine to carry out human-computer interaction with the player of the terminal equipment according to the target human-computer conversation script of the target interactive game;
the sending unit is used for sending a target machine output statement to the terminal equipment in the process of the man-machine interaction, and the target machine output statement is used for guiding a player to input the target statement through the terminal equipment;
a receiving unit configured to receive a target player voice corresponding to the target sentence input from the player of the terminal device according to the target machine output sentence;
the receiving unit is further configured to obtain a target correspondence between the target player voice from the terminal device and the target sentence;
the updating unit is used for searching all second machine output contents which are the same as the contents of the target sentences in a target man-machine conversation scenario of the target interactive game;
the updating unit is further configured to update, according to the target correspondence, the second output mode configuration corresponding to all the second machine output contents that are the same as the contents of the target sentence into the target player voice.
9. A server, comprising a processor, memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs including instructions for performing the steps in the method of any of claims 1-7.
10. A computer-readable storage medium, characterized in that a computer program for electronic data exchange is stored, wherein the computer program causes a computer to perform the steps in the method according to any of claims 1-7.
CN202211512932.5A 2022-11-30 2022-11-30 Method for processing player dubbing in interactive game and related device Active CN115565518B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211512932.5A CN115565518B (en) 2022-11-30 2022-11-30 Method for processing player dubbing in interactive game and related device


Publications (2)

Publication Number Publication Date
CN115565518A CN115565518A (en) 2023-01-03
CN115565518B true CN115565518B (en) 2023-03-24

Family

ID=84770184

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211512932.5A Active CN115565518B (en) 2022-11-30 2022-11-30 Method for processing player dubbing in interactive game and related device

Country Status (1)

Country Link
CN (1) CN115565518B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116089728B (en) * 2023-03-23 2023-06-20 深圳市人马互动科技有限公司 Method and related device for generating voice interaction novel for children

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106792013A (en) * 2016-11-29 2017-05-31 青岛海尔多媒体有限公司 A kind of method, the TV interactive for television broadcast sounds
CN109618223A (en) * 2019-01-28 2019-04-12 北京易捷胜科技有限公司 A kind of sound replacement method
CN113938513A (en) * 2021-07-12 2022-01-14 海南元游信息技术有限公司 Interaction method, device and equipment based on virtual game object
CN114025186A (en) * 2021-10-28 2022-02-08 广州方硅信息技术有限公司 Virtual voice interaction method and device in live broadcast room and computer equipment
CN114011087A (en) * 2021-10-12 2022-02-08 北京天图万境科技有限公司 Interaction system and distribution system for script killer
CN115410601A (en) * 2022-11-01 2022-11-29 深圳市人马互动科技有限公司 Voice interaction method based on scene recognition in man-machine conversation scene

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7103089B2 (en) * 2018-09-06 2022-07-20 トヨタ自動車株式会社 Voice dialogue device, voice dialogue method and voice dialogue program




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant