CN117292682A - Virtual world-oriented voice interaction system and method - Google Patents

Virtual world-oriented voice interaction system and method

Info

Publication number
CN117292682A
Authority
CN
China
Prior art keywords
module
intention
data
resource
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311253812.2A
Other languages
Chinese (zh)
Inventor
李光宇
梁培力
郭培智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhixing Qidian Suzhou Computer Technology Co ltd
Original Assignee
Zhixing Qidian Suzhou Computer Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhixing Qidian Suzhou Computer Technology Co ltd filed Critical Zhixing Qidian Suzhou Computer Technology Co ltd
Priority to CN202311253812.2A priority Critical patent/CN117292682A/en
Publication of CN117292682A publication Critical patent/CN117292682A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26 Speech to text systems
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L2015/088 Word spotting
    • G10L2015/223 Execution procedure of a spoken command
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to a virtual world-oriented voice interaction system and method. The system comprises a client and a server, wherein the client is configured with a voice input module and a resource loading rendering module, and the server is configured with a voice transcription module, an intention recognition module and a vector database retrieval module; the voice input module is used for acquiring a voice instruction of a user; the voice transcription module is used for obtaining a readable text instruction; the intention recognition module is used for acquiring intention data of the user; the vector database retrieval module is used for obtaining an optimal retrieval result and corresponding resource link data; the resource loading rendering module is used for downloading the resource file, parsing and processing it, and loading and rendering it onto the target object so as to create and present a virtual character and a virtual scene that conform to the user's intention. The invention provides a personalized interaction experience, allowing the user to interact with the virtual world in an immersive manner and increasing the interest and appeal of the interaction.

Description

Virtual world-oriented voice interaction system and method
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a virtual world-oriented voice interaction system and a virtual world-oriented voice interaction method.
Background
With the development of the economy and technology of modern society, people's entertainment demands and expectations are ever higher. Traditional media such as movies and shows are static and linear, limiting the freedom and sense of participation in the user's entertainment experience. At the same time, conventional input devices also restrict the viewer's connection to the virtual world, resulting in a lack of naturalness and deep interaction in the entertainment experience. Conventional speech recognition systems typically require predefined commands and phrases, making it difficult to accurately understand the user's natural language instructions, especially when complex scene changes are involved. Virtual reality and augmented reality technologies have evolved to provide a more immersive visual experience for the viewer, yet their interactivity remains limited by the input hardware.
Disclosure of Invention
The invention therefore aims to solve the technical problem of the low degree of intelligence of voice interaction systems in the prior art by introducing technologies such as speech transcription, a distributed large language model, a vector database, and resource loading and rendering.
In order to solve the above technical problems, the present invention provides a virtual world-oriented voice interaction system, which comprises a client and a server, wherein the client is configured with a voice input module and a resource loading rendering module, and the server is configured with a voice transcription module, an intention recognition module and a vector database retrieval module;
the voice input module is used for acquiring a voice instruction of a user;
the voice transcription module is used for identifying and analyzing the voice command to obtain a readable text command;
the intention recognition module is used for understanding and processing the readable text instruction so as to acquire intention data of a user;
the vector database retrieval module is used for vectorizing the intention data, taking the vectorized intention data as input, retrieving in a vector database, calculating the similarity between vectors, obtaining an optimal retrieval result and corresponding resource link data according to the similarity, and storing;
the resource loading rendering module is used for downloading a resource file through the resource link data according to the intention data and the optimal search result, analyzing and processing the resource file, loading and rendering the resource file onto a corresponding user intention target object in the virtual world so as to create and present a virtual role and a virtual scene which accord with the intention of a user;
the vector database comprises a behavior action database, a resource material database and a text content database, wherein the behavior action database is used for storing text description vectors of various behavior actions and the corresponding resource link data, the resource material database is used for storing text description vectors of character models, object models and scene models and the corresponding resource link data, and the text content database is used for storing text description vectors of storylines, character information and dialogue data and the corresponding resource link data.
In one embodiment of the present invention, the intention data is key value pair data, the key includes at least one of a target entity, a target entity type, a target entity feature, an intention, a place, and a time, and the value is a text entry corresponding to the key.
In one embodiment of the invention, the vector database retrieval module comprises: a calculation and retrieval module and a storage module; the computing and searching module is used for vectorizing the intention data through an embedding technology, taking the vectorized intention data as input, searching in the vector database by utilizing a searching algorithm, calculating the similarity between vectors, obtaining an optimal searching result and corresponding resource link data according to the similarity, and the storage module is used for storing the optimal searching result and the corresponding resource link data and sending the optimal searching result and the resource link data to the resource loading and rendering module.
In one embodiment of the present invention, the resource loading rendering module includes: the system comprises an action processing sub-module, a material processing sub-module and a word processing sub-module; the action processing sub-module, the material processing sub-module and the word processing sub-module are all used for downloading corresponding material files, analyzing and processing the material files, and loading and rendering the material files on corresponding user intention target objects in the virtual world.
Based on the same inventive concept, the invention also provides a virtual world-oriented voice interaction method, which comprises the following steps:
s1: acquiring a voice instruction of a user;
s2: identifying and analyzing the voice instruction to obtain a readable text instruction;
s3: understanding and processing the readable text instruction to acquire intention data of the user;
s4: vectorizing the intention data, taking vectorized intention data as input, searching in a vector database, calculating the similarity between vectors, obtaining an optimal search result and corresponding resource link data according to the similarity, and storing;
s5: downloading a resource file through the resource link data according to the intention data and the optimal search result, analyzing and processing the resource file, loading and rendering the resource file on a corresponding user intention target object in the virtual world so as to create and present a virtual role and a virtual scene which accord with the intention of a user;
wherein the vector database comprises a behavior action database, a resource material database and a text content database, the behavior action database is used for storing text description vectors of various behavior actions and the corresponding resource link data, the resource material database is used for storing text description vectors of character models, object models and scene models and the corresponding resource link data, and the text content database is used for storing text description vectors of storylines, character information and dialogue data and the corresponding resource link data.
In one embodiment of the present invention, in S3, the specific method for acquiring the intention data of the user is: analyzing the readable text instruction through a large language model and inferring the intention data of the user.
In one embodiment of the present invention, the intention data is key value pair data, the key includes at least one of a target entity, a target entity type, a target entity feature, an intention, a place, and a time, and the value is a text entry corresponding to the key.
In one embodiment of the present invention, in S4, the specific method for obtaining the optimal retrieval result and the corresponding resource link data is: extracting the text entries of the values, splicing the text entries, converting them into vector text through an embedding technology, taking the vector text as input, searching in the vector database by using a retrieval algorithm, and sorting in descending order of similarity to obtain the optimal retrieval result and the corresponding resource link data.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes any virtual world-oriented voice interaction method when executing the computer program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the virtual world oriented voice interaction methods.
Compared with the prior art, the technical scheme of the invention has the following advantages:
1. The semantics of the user's voice instruction are fully understood through a large language model based on deep learning and natural language processing technology, which improves the accuracy of inferring the user's intention and interaction requirements, generates replies or data in the corresponding format, and provides a basis for the subsequent accurate execution of the user's instruction.
2. The user's historical interaction data and personal preference data are retrieved through the vector database, a personalized intelligent preloading function is realized in the resource loading and rendering stage, and related resources are loaded in advance according to the user's anticipated actions and possible interaction paths, thereby reducing waiting time and increasing the continuity and fluency of the interaction.
Drawings
In order that the invention may be more readily understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof that are illustrated in the appended drawings, in which
FIG. 1 is a diagram of a virtual world oriented voice interactive system architecture in an embodiment of the present invention;
FIG. 2 is a block diagram of the vector database retrieval module depicted in FIG. 1;
FIG. 3 is a flowchart of a virtual world oriented voice interaction method implementation in an embodiment of the present invention;
Description of reference numerals: 100. a client; 101. a voice input module; 102. a resource loading rendering module; 1021. an action processing sub-module; 1022. a material processing sub-module; 1023. a word processing sub-module; 200. a server; 201. a voice transcription module; 202. an intention recognition module; 203. a vector database retrieval module; 301. a calculation and retrieval module; 302. a storage module.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and specific examples, which are not intended to be limiting, so that those skilled in the art will better understand the invention and practice it.
Example 1
This embodiment provides a virtual world-oriented voice interaction system, which, as shown in fig. 1, comprises: a client 100 and a server 200, wherein the client 100 is configured with a voice input module 101 and a resource loading rendering module 102, and the server 200 is configured with a voice transcription module 201, an intention recognition module 202 and a vector database retrieval module 203;
the voice input module 101 is configured to obtain a voice command of a user;
the voice transcription module 201 is configured to recognize and parse the voice command to obtain a readable text command;
the intention recognition module 202 is configured to understand and process the readable text instruction to obtain intention data of a user;
the vector database retrieval module 203 is configured to vectorize the intent data, take the vectorized intent data as input, retrieve in a vector database, calculate the similarity between vectors, obtain an optimal retrieval result and the corresponding resource link data according to the similarity, and store them;
the resource loading rendering module 102 is configured to download a resource file according to the intention data and the optimal search result through the resource link data, parse and process the resource file, load and render the resource file onto a corresponding user intention target object in the virtual world, so as to create and present a virtual character and a virtual scene that conform to the intention of the user;
the vector database comprises a behavior action database, a resource material database and a text content database, wherein the behavior action database is used for storing text description vectors of various behavior actions and the corresponding resource link data, the resource material database is used for storing text description vectors of character models, object models and scene models and the corresponding resource link data, and the text content database is used for storing text description vectors of storylines, character information and dialogue data and the corresponding resource link data.
In the above technical solution, the voice interaction system uses speech transcription technology to replace the traditional input mode of typed text or uploaded pictures with the user's voice input. The text obtained by transcribing the user's speech is then fed to a large language model, which understands the semantic relations and infers the user's intention; data items that closely match the user's intention are retrieved from the vector database, the corresponding resource files are downloaded, and the resource files are loaded and rendered onto the user's intended target object in the virtual world so as to create and present virtual characters and virtual scenes that conform to the user's intention. In this way, the user can interact freely with the virtual world and change it in real time, which improves the interest and appeal of the interaction.
The client 100 includes, but is not limited to, a terminal device with a display screen, such as a smart phone, a tablet computer, or a VR head-mounted display; the server 200 is a network server with high computing power that supports network communication.
As shown in fig. 1, a user expresses voice instructions conveying personal wishes and interaction requirements at the client 100 through the voice input module 101; such instructions include, but are not limited to, interacting with a virtual character, influencing or changing a virtual scene, and the like.
After the voice input module receives the user's voice, the client 100 performs a series of processing steps such as noise reduction and splicing on the voice data through a voice processing algorithm, and then sends the processed voice data to the server 200. After receiving the voice data, the server 200 converts it into natural language text data through the voice transcription module 201. The voice transcription module 201 adopts deep learning algorithms and speech signal processing technology; it can recognize and transcribe various voice inputs, converting them into corresponding text expressions by analyzing information such as the frequency spectrum, sound intensity, and speech characteristics of the speech signal.
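By way of illustration only, the following minimal Python sketch shows one possible form of such client-side preprocessing: a simple energy-based noise gate over fixed-size frames, followed by splicing of the retained frames. The frame length, the threshold and the use of NumPy are assumptions of this sketch and are not the specific voice processing algorithm prescribed here.

    # Illustrative client-side preprocessing sketch (assumed algorithm): drop
    # near-silent frames and splice the remaining speech frames before upload.
    import numpy as np

    def preprocess_audio(samples: np.ndarray, frame_len: int = 512,
                         energy_threshold: float = 1e-4) -> np.ndarray:
        frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
        # Keep only frames whose mean energy exceeds the noise-gate threshold.
        voiced = [f for f in frames if np.mean(f.astype(np.float64) ** 2) > energy_threshold]
        return np.concatenate(voiced) if voiced else np.zeros(0, dtype=samples.dtype)

    # Example: one second of faint noise followed by a louder "speech" burst.
    noise = np.random.normal(0.0, 0.001, 16000)
    speech = np.random.normal(0.0, 0.1, 16000)
    cleaned = preprocess_audio(np.concatenate([noise, speech]).astype(np.float32))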
When the natural language text content is input to the intent recognition module 202 of the server 200, the module utilizes the understanding and reasoning capabilities of the large language model to analyze the semantics and context information of the text and recognize the specific intent of the user's voice content. In the intent recognition process, the large language model performs semantic analysis on the text, tries to understand the user's intention and requirements, identifies information such as keywords, phrases and sentence structures in the text, and infers and judges according to the context so as to acquire the intention data of the user.
The intention data is key-value pair data; the key comprises at least one of a target entity, a target entity type, a target entity characteristic, an intention, a place and a time, and the value is a text entry corresponding to the key. For example, performing entity recognition and intention recognition on the text content entered by a user, "I want to lay a tablecloth with a flower pattern on the table", yields the following key-value data result: target entity: table; target entity type: material; target entity characteristics: tablecloth with flower pattern; intention: replace the tablecloth on the table; place: none; time: none.
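Purely for illustration, the key-value intention data of this example could be represented as the following Python dictionary; the field names and the helper that splices the values into a query string are assumptions of this sketch, not a format mandated by the invention.

    # Hypothetical representation of the key-value intention data for
    # "I want to lay a tablecloth with a flower pattern on the table".
    intent = {
        "target_entity": "table",
        "target_entity_type": "material",
        "target_entity_features": "tablecloth with flower pattern",
        "intention": "replace the tablecloth on the table",
        "place": None,   # not mentioned in the instruction
        "time": None,    # not mentioned in the instruction
    }

    def intent_to_query_text(intent: dict) -> str:
        # Splice the non-empty value entries into one text string; this spliced
        # text is what later gets embedded and used as the vector database query.
        return " ".join(str(v) for v in intent.values() if v)

    query_text = intent_to_query_text(intent)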
When the user intention is obtained, the intention is input to the vector database retrieval module 203 of the server 200. As shown in fig. 2, the vector database retrieval module 203 includes a calculation and retrieval module 301 and a storage module 302. The calculation and retrieval module 301 is configured to vectorize the intent data by using an embedding technique, take the vectorized intent data as input, search in the vector database by using a retrieval algorithm, calculate the similarity between vectors, and obtain an optimal retrieval result and the corresponding resource link data according to the similarity; the storage module 302 is configured to store the optimal retrieval result and the corresponding resource link data and send them to the resource loading rendering module 102. In particular, the retrieval algorithm includes, but is not limited to, a nearest neighbor search algorithm, an approximate nearest neighbor search algorithm, and a hierarchical navigable small world (HNSW) search algorithm.
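The retrieval step can be pictured with the brute-force sketch below, which ranks stored description vectors by cosine similarity in descending order. The toy bag-of-words embedding, the sample database entries and the resource links are illustrative assumptions; a production system would use a trained embedding model together with an approximate nearest neighbor or HNSW index as noted above.

    # Illustrative brute-force retrieval sketch with toy bag-of-words vectors
    # and made-up resource links (not a real embedding model or ANN index).
    import numpy as np

    material_db = [
        {"desc": "tablecloth with flower pattern", "link": "https://example.invalid/res/floral_cloth"},
        {"desc": "plain white tablecloth",         "link": "https://example.invalid/res/plain_cloth"},
        {"desc": "wooden dining table model",      "link": "https://example.invalid/res/table_model"},
    ]
    VOCAB = sorted({w for item in material_db for w in item["desc"].lower().split()})

    def embed(text: str) -> np.ndarray:
        # Toy stand-in for an embedding model: bag-of-words counts, unit-normalised.
        tokens = text.lower().split()
        vec = np.array([float(tokens.count(w)) for w in VOCAB])
        norm = np.linalg.norm(vec)
        return vec / norm if norm else vec

    for item in material_db:
        item["vec"] = embed(item["desc"])

    def retrieve(query_text: str, top_k: int = 2):
        q = embed(query_text)
        # Vectors are unit-normalised, so the dot product is the cosine similarity.
        scored = [(float(q @ item["vec"]), item) for item in material_db]
        scored.sort(key=lambda s: s[0], reverse=True)  # descending similarity
        return scored[:top_k]

    results = retrieve("table tablecloth with flower pattern")  # floral cloth ranks first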
Specifically, when retrieving from the vector database, the system considers the user's historical interaction data and personal preference data, including previous virtual scene selections, character actions, records of environment interactions, character styles, game difficulty preferences, and the like, so as to further personalize the user's virtual world experience.
Continuing the above example, where the text content input by the user is "I want to lay a tablecloth with a flower pattern on the table", the calculation and retrieval module 301 splices keyword text such as "table" and "tablecloth with flower pattern" as the initial text input, converts this text input into a vector representation, and performs the retrieval calculation to obtain the resource links of the materials that best match the spliced text input. The resource link data is temporarily stored in the storage module 302; after all retrieval calculations are completed, the several resource data items with the highest matching degree (which may be action resources, material resources or text resources) among those temporarily stored in the storage module 302 are sent to the client 100 in the form of a data stream.
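The temporary storage and streaming behaviour of the storage module 302 can be loosely sketched as follows; the buffer class, its top_k parameter and the example similarity values are illustrative assumptions rather than components specified by the invention.

    # Hypothetical sketch of the temporary result buffer: collect
    # (similarity, resource link) pairs from all retrieval calculations,
    # then stream only the best-matching entries to the client.
    from typing import Iterator, Tuple

    class ResultBuffer:
        def __init__(self) -> None:
            self._items: list = []

        def add(self, similarity: float, resource_link: str) -> None:
            self._items.append((similarity, resource_link))

        def stream_best(self, top_k: int = 3) -> Iterator[Tuple[float, str]]:
            # Yield the top_k highest-similarity entries one at a time,
            # mimicking the data stream sent to the client.
            for item in sorted(self._items, key=lambda x: x[0], reverse=True)[:top_k]:
                yield item

    buf = ResultBuffer()
    buf.add(0.89, "https://example.invalid/res/floral_cloth")
    buf.add(0.26, "https://example.invalid/res/plain_cloth")
    for similarity, link in buf.stream_best(top_k=1):
        pass  # the client would download and render each streamed resource here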
After receiving the data returned by the server 200, the client 100 inputs the data to the resource loading rendering module 102 shown in fig. 1, which completes the process of loading and rendering resources on the client 100. According to the user's intention information, the module matches and acquires the corresponding data, which may be behavior action data, resource material data or text material data, and renders it on the specified target entity object.
The resource loading rendering module 102 includes: an action processing sub-module 1021, a material processing sub-module 1022, and a word processing sub-module 1023. The action processing sub-module 1021, the material processing sub-module 1022 and the word processing sub-module 1023 are all configured to download corresponding material files, parse and process the material files, and load and render the material files on corresponding user intention target objects in the virtual world.
Specifically, for behavioral action data, the client 100 downloads corresponding action files including, but not limited to, running, jumping, punching, etc. actions through the action processing submodule 1021. When the resource file is downloaded, the action processing submodule 1021 analyzes and processes the action resource file according to the user intention information, and finally renders the action resource file to a corresponding user intention target object. Roles in the virtual world are controlled and animated according to the user's voice instructions, thereby providing an immersive user experience.
For resource material data, the client 100 downloads corresponding material files including, but not limited to, information of geometric shape, texture mapping information, material properties, etc. of the model through the material processing sub-module 1022. According to the user intention information, the material processing sub-module 1022 analyzes and processes the material files, loads and renders the material files onto corresponding user intention target objects in the virtual world, so as to create and present roles, scenes and models of the virtual world which meet the user requirements.
For text material data, the client 100 downloads, through the word processing sub-module 1023, corresponding text files including, but not limited to, storylines, character information, and dialogue data that can be used to construct scenes, characters, and plots in the virtual world. According to the user intention information, the word processing sub-module 1023 parses and processes the text data and loads and applies it to the corresponding user intention target object in the virtual world, including but not limited to changing the atmosphere of the scene, adjusting the behavior and emotion of characters, guiding the direction of the plot, and so on.
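To illustrate how the three sub-modules might be selected once a resource arrives at the client, a small dispatch sketch follows; the resource-type labels, handler functions and target-object dictionary are assumed names used only for this example.

    # Hypothetical dispatch of downloaded resources to the action, material and
    # text processing sub-modules; all identifiers here are illustrative.
    def apply_action(target: dict, resource: dict) -> None:
        target.setdefault("animations", []).append(resource["name"])   # e.g. run, jump

    def apply_material(target: dict, resource: dict) -> None:
        target["material"] = resource["name"]                           # e.g. texture, mesh

    def apply_text(target: dict, resource: dict) -> None:
        target.setdefault("dialogue", []).append(resource["name"])      # e.g. storyline, lines

    HANDLERS = {"action": apply_action, "material": apply_material, "text": apply_text}

    def render_resource(target: dict, resource: dict) -> None:
        HANDLERS[resource["type"]](target, resource)

    table = {"id": "table"}
    render_resource(table, {"type": "material", "name": "tablecloth with flower pattern"})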
The user terminal device can locally cache the downloaded resource materials of the virtual scene and the virtual character so as to improve the response speed of the subsequent virtual world interaction. The resource loading rendering module 102 may dynamically adjust the resource loading and rendering policies according to the user's device type and performance level to achieve smooth virtual world presentation under various hardware conditions.
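The local caching mentioned above could take a form like the following sketch, a small least-recently-used cache keyed by resource link; the size limit and the downloader callback are assumptions made for illustration, not the caching scheme prescribed by the invention.

    # Illustrative local resource cache keyed by resource link (assumed design).
    from collections import OrderedDict

    class ResourceCache:
        def __init__(self, max_entries: int = 128) -> None:
            self._store: OrderedDict = OrderedDict()
            self._max_entries = max_entries

        def get_or_download(self, resource_link: str, downloader) -> bytes:
            if resource_link in self._store:
                self._store.move_to_end(resource_link)   # refresh LRU position
                return self._store[resource_link]
            data = downloader(resource_link)              # fetch only on a cache miss
            self._store[resource_link] = data
            if len(self._store) > self._max_entries:
                self._store.popitem(last=False)           # evict least recently used
            return data

    cache = ResourceCache()
    blob = cache.get_or_download("https://example.invalid/res/floral_cloth",
                                 downloader=lambda link: b"...resource bytes...")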
In the course of resource rendering, the graphics rendering techniques used by the client 100 include, but are not limited to, 3D rendering, shadow effects, and material rendering, to provide a more realistic presentation of virtual scenes. In addition, the user can adjust the rendering settings according to the performance of the terminal device and personal preference so as to obtain the best visual effect.
Taking "i want to lay a table with flower pattern on this table" as an example, on the client 100, the material processing sub-module 1022 processes the data item with user intention and resource link obtained from the server 200, and downloads the corresponding material file "table with flower pattern", including texture mapping information, material attribute, and the like of the table with flower pattern. The material processing sub-module 1022 parses and processes the material file, loads it and renders it on the target object "table" of the user intention information. Through the steps, the interactive requirement that the user sees that the table is paved with the dining table cloth with the flower patterns in the virtual world is met.
Example two
The invention also provides a virtual world-oriented voice interaction method, which uses the voice interaction system described in the first embodiment to interact with the virtual world and, as shown in fig. 3, comprises the following steps:
s1: acquiring a voice instruction of a user;
s2: identifying and analyzing the voice instruction to obtain a readable text instruction;
s3: understanding and processing the readable text instruction to acquire intention data of the user;
s4: vectorizing the intention data, taking vectorized intention data as input, searching in a vector database, calculating the similarity between vectors, obtaining an optimal search result and corresponding resource link data according to the similarity, and storing;
s5: downloading a resource file through the resource link data according to the intention data and the optimal search result, analyzing and processing the resource file, loading and rendering the resource file on a corresponding user intention target object in the virtual world so as to create and present a virtual role and a virtual scene which accord with the intention of a user;
wherein the vector database comprises a behavior action database, a resource material database and a text content database, the behavior action database is used for storing text description vectors of various behavior actions and the corresponding resource link data, the resource material database is used for storing text description vectors of character models, object models and scene models and the corresponding resource link data, and the text content database is used for storing text description vectors of storylines, character information and dialogue data and the corresponding resource link data.
In this embodiment, in S3, the specific method for acquiring the intention data of the user is as follows: analyzing the readable text instruction through a large language model and inferring the intention data of the user.
In this embodiment, the intention data is key value pair data, and the key includes at least one of a target entity, a target entity type, a target entity feature, an intention, a place, and a time, and the value is a text entry corresponding to the key.
In this embodiment, in S4, the specific method for obtaining the optimal retrieval result and the corresponding resource link data is as follows: extracting the text entries of the values, splicing the text entries, converting them into vector text through an embedding technology, taking the vector text as input, searching in the vector database by using a retrieval algorithm, and sorting in descending order of similarity to obtain the optimal retrieval result and the corresponding resource link data.
Example III
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor implements the virtual world-oriented voice interaction method of the second embodiment when executing the computer program.
Example IV
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the virtual world oriented voice interaction method of any of the second embodiments.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It is apparent that the above examples are given by way of illustration only and are not limiting of the embodiments. Other variations and modifications of the present invention will be apparent to those of ordinary skill in the art in light of the foregoing description. It is not necessary here nor is it exhaustive of all embodiments. And obvious variations or modifications thereof are contemplated as falling within the scope of the present invention.

Claims (10)

1. A virtual world oriented voice interaction system, comprising:
a client and a server, wherein the client is configured with a voice input module and a resource loading rendering module, and the server is configured with a voice transcription module, an intention recognition module and a vector database retrieval module;
the voice input module is used for acquiring a voice instruction of a user;
the voice transcription module is used for identifying and analyzing the voice command to obtain a readable text command;
the intention recognition module is used for understanding and processing the readable text instruction so as to acquire intention data of a user;
the vector database retrieval module is used for vectorizing the intention data, taking the vectorized intention data as input, retrieving in a vector database, calculating the similarity between vectors, obtaining an optimal retrieval result and corresponding resource link data according to the similarity, and storing;
the resource loading rendering module is used for downloading a resource file through the resource link data according to the intention data and the optimal search result, analyzing and processing the resource file, loading and rendering the resource file onto a corresponding user intention target object in the virtual world so as to create and present a virtual role and a virtual scene which accord with the intention of a user;
the vector database comprises a behavior action database, a resource material database and a text content database, wherein the behavior action database is used for storing text description vectors of various behavior actions and the corresponding resource link data, the resource material database is used for storing text description vectors of character models, object models and scene models and the corresponding resource link data, and the text content database is used for storing text description vectors of storylines, character information and dialogue data and the corresponding resource link data.
2. The virtual world oriented voice interaction system of claim 1, wherein: the intention data is key value pair data, the key comprises at least one of a target entity, a target entity type, a target entity characteristic, an intention, a place and a time, and the value is a text entry corresponding to the key.
3. The virtual world oriented voice interaction system of claim 1, wherein: the vector database retrieval module comprises: a calculation and retrieval module and a storage module; the computing and searching module is used for vectorizing the intention data through an embedding technology, taking the vectorized intention data as input, searching in the vector database by utilizing a searching algorithm, calculating the similarity between vectors, obtaining an optimal searching result and corresponding resource link data according to the similarity, and the storage module is used for storing the optimal searching result and the corresponding resource link data and sending the optimal searching result and the resource link data to the resource loading and rendering module.
4. The virtual world oriented voice interaction system of claim 1, wherein: the resource loading rendering module comprises: the system comprises an action processing sub-module, a material processing sub-module and a word processing sub-module; the action processing sub-module, the material processing sub-module and the word processing sub-module are all used for downloading corresponding material files, analyzing and processing the material files, and loading and rendering the material files on corresponding user intention target objects in the virtual world.
5. A virtual world-oriented voice interaction method for interacting by using the virtual world-oriented voice interaction system according to any one of claims 1 to 4, comprising the steps of:
s1: acquiring a voice instruction of a user;
s2: identifying and analyzing the voice instruction to obtain a readable text instruction;
s3: understanding and processing the readable text instruction to acquire intention data of the user;
s4: vectorizing the intention data, taking vectorized intention data as input, searching in a vector database, calculating the similarity between vectors, obtaining an optimal search result and corresponding resource link data according to the similarity, and storing;
s5: downloading a resource file through the resource link data according to the intention data and the optimal search result, analyzing and processing the resource file, loading and rendering the resource file on a corresponding user intention target object in the virtual world so as to create and present a virtual role and a virtual scene which accord with the intention of a user;
wherein the vector database comprises a behavior action database, a resource material database and a text content database, the behavior action database is used for storing text description vectors of various behavior actions and the corresponding resource link data, the resource material database is used for storing text description vectors of character models, object models and scene models and the corresponding resource link data, and the text content database is used for storing text description vectors of storylines, character information and dialogue data and the corresponding resource link data.
6. The virtual world oriented voice interaction method of claim 5, wherein: in S3, the specific method for acquiring the intention data of the user is as follows: analyzing the readable text instruction through a large language model and inferring the intention data of the user.
7. The virtual world oriented voice interaction method according to claim 5 or 6, wherein: the intention data is key value pair data, the key comprises at least one of a target entity, a target entity type, a target entity characteristic, an intention, a place and a time, and the value is a text entry corresponding to the key.
8. The virtual world oriented voice interaction method of claim 7, wherein: in S4, the specific method for obtaining the optimal retrieval result and the corresponding resource link data comprises the following steps: extracting the text entries of the values, splicing the text entries, converting them into vector text through an embedding technology, taking the vector text as input, searching in the vector database by using a retrieval algorithm, and sorting in descending order of similarity to obtain the optimal retrieval result and the corresponding resource link data.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the virtual world oriented voice interaction method of any one of claims 5-8 when the computer program is executed.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the virtual world oriented voice interaction method of any of claims 5-8.
CN202311253812.2A 2023-09-26 2023-09-26 Virtual world-oriented voice interaction system and method Pending CN117292682A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311253812.2A CN117292682A (en) 2023-09-26 2023-09-26 Virtual world-oriented voice interaction system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311253812.2A CN117292682A (en) 2023-09-26 2023-09-26 Virtual world-oriented voice interaction system and method

Publications (1)

Publication Number Publication Date
CN117292682A true CN117292682A (en) 2023-12-26

Family

ID=89244001

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311253812.2A Pending CN117292682A (en) 2023-09-26 2023-09-26 Virtual world-oriented voice interaction system and method

Country Status (1)

Country Link
CN (1) CN117292682A (en)

Similar Documents

Publication Publication Date Title
US20230049135A1 (en) Deep learning-based video editing method, related device, and storage medium
US10762678B2 (en) Representing an immersive content feed using extended reality based on relevancy
CN110609955B (en) Video recommendation method and related equipment
JP2021168139A (en) Method, device, apparatus and medium for man-machine interactions
CN106227792B (en) Method and apparatus for pushed information
CN111414506B (en) Emotion processing method and device based on artificial intelligence, electronic equipment and storage medium
CN114880441B (en) Visual content generation method, device, system, equipment and medium
WO2023197979A1 (en) Data processing method and apparatus, and computer device and storage medium
US9639633B2 (en) Providing information services related to multimodal inputs
CN114895817B (en) Interactive information processing method, network model training method and device
CN111626049A (en) Title correction method and device for multimedia information, electronic equipment and storage medium
CN112085120B (en) Multimedia data processing method and device, electronic equipment and storage medium
CN114419205B (en) Driving method of virtual digital person and training method of pose acquisition model
CN116958342A (en) Method for generating actions of virtual image, method and device for constructing action library
CN114187405B (en) Method, apparatus, medium and product for determining avatar
CN114969282A (en) Intelligent interaction method based on rich media knowledge graph multi-modal emotion analysis model
CN117132690A (en) Image generation method and related device
CN116737883A (en) Man-machine interaction method, device, equipment and storage medium
CN117292682A (en) Virtual world-oriented voice interaction system and method
US11830526B2 (en) Selecting supplemental audio segments based on video analysis
CN115965791A (en) Image generation method and device and electronic equipment
CN110555207A (en) Sentence recognition method, sentence recognition device, machine equipment and computer-readable storage medium
CN114529635A (en) Image generation method, device, storage medium and equipment
CN112328751A (en) Method and device for processing text
CN117575894B (en) Image generation method, device, electronic equipment and computer readable storage medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination