CN113850898A - Scene rendering method and device, storage medium and electronic equipment - Google Patents

Scene rendering method and device, storage medium and electronic equipment

Info

Publication number
CN113850898A
CN113850898A (application CN202111212378.4A)
Authority
CN
China
Prior art keywords
scene
rendering
semantic
text
picture sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111212378.4A
Other languages
Chinese (zh)
Inventor
常向月
王雨辰
穆少垒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Zhuiyi Technology Co Ltd
Original Assignee
Shenzhen Zhuiyi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Zhuiyi Technology Co Ltd filed Critical Shenzhen Zhuiyi Technology Co Ltd
Priority to CN202111212378.4A priority Critical patent/CN113850898A/en
Publication of CN113850898A publication Critical patent/CN113850898A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3343 Query execution using phonetics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Computer Graphics (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention provides a scene rendering method and apparatus, a storage medium and an electronic device. The method includes: receiving a rendering instruction sent by a user through a client, and acquiring the question text and response text corresponding to the rendering instruction; performing semantic scene analysis on the question text and the response text to determine the semantic scene corresponding to the rendering instruction; and acquiring a scene picture sequence corresponding to the semantic scene, generating a scene rendering data stream based on the scene picture sequence, and sending the scene rendering data stream to the client, so that the client renders the response scene by applying the scene rendering data stream. Because the semantic scene is determined by analyzing the question text and the response text, and the scene rendering data stream is generated from the scene picture sequence corresponding to that scene, no neural network needs to be trained for each scene; this reduces the time and money invested and thus the cost of scene rendering.

Description

Scene rendering method and device, storage medium and electronic equipment
Technical Field
The invention relates to the technical field of human-computer interaction, in particular to a scene rendering method and device, a storage medium and electronic equipment.
Background
With the development of computer technology, digital people are applied in more and more scenarios. When a digital person provides services for customers, rendering the scenes involved in the customer's interaction with the digital person can give the customer an immersive experience, and thereby better application experience and service.
At present, 2D digital people are rendered with neural networks. To render scenes for customers with an immersive effect, a neural network must be trained for each scene. Because scenes are numerous in both number and kind, and training a neural network takes a great deal of time and money, rendering scenes with current methods requires a large investment.
Disclosure of Invention
In view of this, the present invention provides a scene rendering method and apparatus, a storage medium and an electronic device. The method renders a scene using a scene rendering data stream generated from a scene picture sequence, without training a dedicated neural network for each scene, thereby reducing the time and cost required for scene rendering.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
the first aspect of the present invention discloses a scene rendering method, including:
receiving a rendering instruction sent by a user through a client, and acquiring a question text and a response text corresponding to the rendering instruction;
performing semantic scene analysis on the question text and the response text, and determining a semantic scene corresponding to the rendering instruction;
acquiring a scene picture sequence corresponding to the semantic scene, and generating a scene rendering data stream based on the scene picture sequence;
and sending the scene rendering data stream to the client, so that the client renders a response scene by applying the scene rendering data stream.
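As a minimal illustration only, the four steps above can be sketched end to end in Python. Every function name, dictionary key and scene label below is a hypothetical stand-in for the disclosed components, and the keyword table replaces the real semantic analysis:

```python
# Hypothetical sketch of the claimed pipeline; all names and the
# keyword-to-scene table are illustrative assumptions.

def parse_instruction(instruction):
    # Step 1: recover the question text and a (stubbed) response text.
    question = instruction["question"]
    answer = "Here is the answer to: " + question
    return question, answer

def analyze_semantic_scene(question, answer):
    # Step 2: semantic scene analysis via simple keyword matching,
    # falling back to a default scene.
    keyword_to_scene = {"weather": "sunny-sky", "olympics": "stadium"}
    text = (question + " " + answer).lower()
    for keyword, scene in keyword_to_scene.items():
        if keyword in text:
            return scene
    return "default"

def scene_picture_sequence(scene):
    # Step 3: look up the picture sequence for the chosen scene.
    return [scene + "-frame-" + str(i) + ".png" for i in range(3)]

def build_render_stream(instruction):
    # Steps 1-3 chained; the "rendering data stream" is modelled as a
    # dict that step 4 would send to the client.
    question, answer = parse_instruction(instruction)
    scene = analyze_semantic_scene(question, answer)
    return {"scene": scene, "frames": scene_picture_sequence(scene)}
```

The point of the sketch is the data flow: instruction to texts, texts to semantic scene, scene to picture sequence, sequence to the stream sent to the client.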
Optionally, the acquiring the question text and the response text corresponding to the rendering instruction includes:
parsing the rendering instruction, acquiring question data in the rendering instruction, and determining the question text based on the question data;
and processing the question text to obtain response audio data and the response text corresponding to the question text.
Optionally, in the method, the determining the question text based on the question data includes:
determining a data format of the question data;
and processing the question data according to the data format to obtain the question text.
Optionally, in the method, performing semantic scene analysis on the question text and the response text to determine a semantic scene corresponding to the rendering instruction includes:
extracting semantic scene keywords from the question text and the response text, and determining whether a scene personalization requirement exists based on the semantic scene keywords;
if the scene individuation requirement is determined to exist, an individuation scene corresponding to the semantic scene keywords is determined, and the individuation scene is determined to be the semantic scene corresponding to the rendering instruction;
and if the scene individuation requirement does not exist, taking a preset default scene as a semantic scene corresponding to the rendering instruction.
In the foregoing method, optionally, the generating a scene rendering data stream based on the scene picture sequence includes:
determining a sequence of digital human rendering pictures based on the response text;
and applying a preset silent video generation server to perform data synthesis processing on the scene picture sequence, the response audio data and the digital human rendering picture sequence to obtain a scene rendering data stream.
Optionally, in the method, the obtaining a scene picture sequence corresponding to the semantic scene includes:
when the personalized scene is determined to be a semantic scene corresponding to the rendering instruction, determining whether a personalized picture sequence corresponding to the personalized scene exists in a scene background picture storage server or not;
if it is determined that a personalized picture sequence corresponding to the personalized scene exists in the scene background picture storage server, taking the personalized picture sequence corresponding to the personalized scene as the scene picture sequence corresponding to the semantic scene;
if it is determined that no personalized picture sequence corresponding to the personalized scene exists in the scene background picture storage server, taking a preset default picture sequence as the scene picture sequence corresponding to the semantic scene;
and when a preset default scene is taken as a semantic scene corresponding to the rendering instruction, taking the default picture sequence as a scene picture sequence corresponding to the semantic scene.
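Read purely as an illustration, the fallback chain in the claim above might look like the following sketch, in which the "scene background picture storage server" is modelled as a dictionary; every name and file path here is a hypothetical stand-in, not part of the disclosure:

```python
# Illustrative fallback chain for choosing a scene picture sequence.
# The picture storage server is modelled as a plain dict; DEFAULT_SEQUENCE
# stands in for the preset default picture sequence.
DEFAULT_SEQUENCE = ["default-0.png", "default-1.png"]

def pick_picture_sequence(semantic_scene, picture_store):
    if semantic_scene == "default":
        # Default scene: always use the preset default picture sequence.
        return DEFAULT_SEQUENCE
    # Personalized scene: use its stored sequence when present,
    # otherwise fall back to the default sequence.
    return picture_store.get(semantic_scene, DEFAULT_SEQUENCE)
```

The dictionary `get` with a default mirrors the two branches of the claim: a stored personalized sequence wins, and a missing one falls back to the default.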
Optionally, in the method, the sending the scene rendering data stream to the client includes:
and sending the scene rendering data stream to the client by applying a preset streaming media server.
A second aspect of the present invention discloses a scene rendering apparatus, including:
the system comprises an acquisition unit, a processing unit and a display unit, wherein the acquisition unit is used for receiving a rendering instruction sent by a user through a client and acquiring a question text and a response text corresponding to the rendering instruction;
a semantic scene analysis unit, configured to perform semantic scene analysis on the question text and the response text, and determine a semantic scene corresponding to the rendering instruction;
a generating unit, configured to acquire a scene picture sequence corresponding to the semantic scene, and generate a scene rendering data stream based on the scene picture sequence;
and the rendering unit is used for sending the scene rendering data stream to the client so that the client can render a response scene by applying the scene rendering data stream.
The above apparatus, optionally, the obtaining unit includes:
the analysis subunit is used for analyzing the rendering instruction, acquiring problem data in the rendering instruction and determining a problem text based on the problem data;
and the processing subunit is used for processing the question text to obtain response audio data and a response text corresponding to the question text.
The above apparatus, optionally, the parsing subunit includes:
the determining module is used for determining the data format of the problem data;
and the processing module is used for processing the question data according to the data format to obtain a question text.
The above apparatus, optionally, the semantic scene analyzing unit includes:
the extraction subunit is used for extracting semantic scene keywords from the question text and the response text and determining whether a scene individuation requirement exists or not based on the semantic scene keywords;
the first determining subunit is used for determining an individualized scene corresponding to the semantic scene keyword if the scene individualized requirement is determined to exist, and determining the individualized scene as the semantic scene corresponding to the rendering instruction;
and the second determining subunit is used for taking a preset default scene as a semantic scene corresponding to the rendering instruction if the scene individuation requirement does not exist.
The above apparatus, optionally, the generating unit includes:
a third determining subunit, configured to determine a sequence of digital human rendering pictures based on the response text;
and the synthesis subunit is used for applying a preset silent video generation server to perform data synthesis processing on the scene picture sequence, the response audio data and the digital human rendering picture sequence to obtain a scene rendering data stream.
The above apparatus, optionally, the generating unit includes:
a fourth determining subunit, configured to determine, when the personalized scene is determined to be a semantic scene corresponding to the rendering instruction, whether a personalized picture sequence corresponding to the personalized scene exists in a scene background picture storage server;
a fifth determining subunit, configured to, if it is determined that a personalized picture sequence corresponding to the personalized scene exists in the scene background picture storage server, take the personalized picture sequence corresponding to the personalized scene as the scene picture sequence corresponding to the semantic scene;
a sixth determining subunit, configured to, if it is determined that no personalized picture sequence corresponding to the personalized scene exists in the scene background picture storage server, take the preset default picture sequence as the scene picture sequence corresponding to the semantic scene;
and the seventh determining subunit is configured to, when a preset default scene is used as the semantic scene corresponding to the rendering instruction, use the default picture sequence as a scene picture sequence corresponding to the semantic scene.
The above apparatus, optionally, the rendering unit includes:
and the sending subunit is configured to send the scene rendering data stream to the client by using a preset streaming media server.
A third aspect of the present invention discloses a storage medium, which includes stored instructions, wherein when the instructions are executed, a device on which the storage medium is located is controlled to execute the scene rendering method as described above.
A fourth aspect of the present invention discloses an electronic device comprising one or more processors, a memory, and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by the one or more processors to perform the scene rendering method as described above.
Compared with the prior art, the invention has the following advantages:
the invention provides a scene rendering method and device, a storage medium and electronic equipment, wherein the method comprises the following steps: acquiring a rendering instruction sent by a user through a client, and acquiring a question text and a response text corresponding to the rendering instruction; performing semantic scene analysis on the question text and the response text, and determining a semantic scene corresponding to the voice; the method comprises the steps of obtaining a scene picture sequence corresponding to a semantic scene, generating a scene rendering data stream based on the scene picture sequence, and sending the scene rendering data stream to a client, so that the client renders a response scene of a digital person when the digital person replies to a user based on the scene rendering data stream. According to the method and the device, after the semantic scene is determined according to the question text and the response text of the voice carrying the question information, the rendering data stream for rendering the response scene is generated based on the scene picture sequence corresponding to the semantic scene, the process does not need to train a neural network corresponding to the scene, a convenient scene rendering mode is provided, and the cost for rendering the scene is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is an environment application diagram of a scene rendering method according to an embodiment of the present invention;
fig. 2 is a flowchart of a method for rendering a scene according to an embodiment of the present invention;
fig. 3 is a flowchart of another method of a scene rendering method according to an embodiment of the present invention;
fig. 4 is a flowchart of another method of a scene rendering method according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a system architecture of a scene rendering method according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a scene rendering apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but possibly also other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises it.
In order to better understand a scene rendering method and apparatus, a storage medium, and an electronic device provided in the embodiments of the present application, an application environment for the embodiments of the present application is described below.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating an application environment suitable for the embodiment of the present application. The scene rendering method provided by the embodiment of the present application may be applied to the scene rendering system 100 shown in fig. 1. The scene rendering system 100 includes a terminal device 101 and a server 102, and the server 102 is connected to the terminal device 101 in communication. The server 102 may be a conventional server or a cloud server, and is not limited herein.
The terminal device 101 may be any electronic device that has a display screen, a data processing module, a camera, audio input/output functions and the like, and supports data input, including but not limited to smartphones, tablet computers, laptop computers, desktop computers, self-service terminals and wearable electronic devices. Data input may be voice input through a voice module provided on the device, character input through a character input module, and so on.
Specifically, the user may register a user account with the server 102 through the client application and communicate with the server 102 under that account. For example, the user logs in to the account in the client application and inputs information, which may be text or voice. After receiving the user's input, the client application sends it to the server 102; the server 102 receives, processes and stores the information, and may also return corresponding output information to the terminal device 101.
In some embodiments, the client application may provide customer service to the user, communicating and interacting with the user through a digital person and rendering the corresponding scene during that interaction. Specifically, the client application receives the user's input, sends it to the scene rendering system for processing, receives the scene rendering data fed back by the system, and renders the scene of the digital-person interaction for the customer based on that data. A digital person is a software program based on visual graphics which, when executed, presents to the user a robot form that simulates biological behavior or thought. The digital person may simulate a real person, for example one built from the appearance of the user or someone else, or it may simulate an animated character, such as an animal or cartoon figure.
In some embodiments, the terminal device 101 sends the customer's interaction information to the server 102, and the server 102 processes it to generate the scene rendering data for the digital person's reply to the customer. The terminal device 101 receives the scene rendering data and, based on it, renders the scene of the digital person answering the customer on its display screen or another connected image output device; the scene may be the page background shown while the digital person answers. As another implementation, while that scene is rendered, the audio corresponding to the simulated digital person may be played through the speaker of the terminal device 101 or another connected audio output device, and the text or graphics of the reply may be shown on the display screen, realizing multimodal interaction with the user through image, voice, text and so on.
In some embodiments, the device for processing the data to be recognized may also be disposed on the terminal device 101, so that the terminal device 101 may implement interaction with the user without relying on the server 102 to establish communication, and in this case, the scene rendering system 100 may only include the terminal device 101.
The above application environments are only examples for convenience of understanding, and it is to be understood that the embodiments of the present application are not limited to the above application environments.
The following describes in detail a scene rendering method and apparatus, a storage medium, and an electronic device according to embodiments of the present invention with specific embodiments.
Referring to fig. 2, an embodiment of the present invention provides a scene rendering method. The method can be applied to the scene rendering system 100, which supports rendering the response scene of a digital person, and specifically includes the following steps S201 to S204:
s201, receiving a rendering instruction sent by a user through a client, and acquiring a question text and a response text corresponding to the rendering instruction.
A user inputs a rendering instruction to the client, and the client sends it to the scene rendering system. The rendering instruction carries the data with which the user interacts with the digital person; the scene rendering system processes the rendering instruction to obtain the question text and the response text.
Furthermore, the user can input the rendering instruction to the client in various ways: by voice, by text, or by selecting an option on the client's operation interface.
Referring to fig. 3, for obtaining the question text and the response text corresponding to the rendering instruction provided in the embodiment of the present invention, the following description is specifically provided:
s301, analyzing the rendering instruction, acquiring problem data in the rendering instruction, and determining a problem text based on the problem data.
The rendering instruction is the carrier of the question data, which is obtained by parsing the instruction. The question data is the data with which the user interacts with the digital person, such as a question the user asks the digital person or a sticker the user sends to it. The question data may be in any of several formats, including text, voice and pictures; supporting multiple data formats broadens the application scenarios the method can serve and makes it more generally applicable.
When processing the question data, the operation corresponding to its data format is executed. When the question data is in text format, it is directly taken as the question text corresponding to the rendering instruction. Text-format question data may be text the user typed on the client's operation page, or text generated directly after the user selects an option there; for example, after the user selects a weather-query option on the client's operation page, the text corresponding to that option is generated.
As another example, when the question data is in voice format, it is converted into the question text by a speech-to-text technique, which may be a speech recognition program, algorithm or model. Processing of question data is not limited to these examples: for any other data format, the operation corresponding to that format is executed and the question text is determined from the question data.
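The format dispatch described above can be sketched as follows. The recognizer here is a stub standing in for a real speech-recognition model; the function names and the transcript format are assumptions for illustration only:

```python
# Hedged sketch of format-dependent question handling (S301);
# the ASR step is a placeholder, not a real speech-recognition API.

def stub_speech_to_text(audio_bytes):
    # Stand-in for a speech recognition program, algorithm or model.
    return "<transcript of " + str(len(audio_bytes)) + " bytes>"

def question_text_from_data(question_data, data_format):
    if data_format == "text":
        return question_data                # already the question text
    if data_format == "voice":
        return stub_speech_to_text(question_data)
    raise ValueError("unsupported data format: " + data_format)
```

Adding another format only requires another branch mapping that format to its own conversion step.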
S302, the question text is processed, and response audio data and a response text corresponding to the question text are obtained.
The question text is processed by a preset response module, which outputs the response audio data and response text corresponding to the question text. Specifically, the response module uses a preset answering robot to process the question text, and the robot outputs the response text; the response module then converts the response text into response audio data using a text-to-speech technique, which may be implemented by a dedicated text-to-speech algorithm or model.
It should be noted that the answering robot is a model built on a deep neural network: it produces response data from the question text, and the response module processes that data into response audio data.
Furthermore, the answering robot is put into use once it has been trained successfully, and can continue to learn on its own afterwards, making its output increasingly accurate.
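The S302 flow, in which the answering robot produces the response text and a text-to-speech step then produces the response audio, can be sketched with both models stubbed out; the canned-answer table and the byte-string "audio" are assumptions, not the patented models:

```python
# Sketch of S302 with both models stubbed; a real system would call a
# trained answering robot and a TTS engine at these two points.

def answering_robot(question_text):
    # Stub for the deep-neural-network answering robot.
    canned = {"how is the weather": "It is sunny today."}
    return canned.get(question_text.lower().rstrip("?"),
                      "Sorry, I do not know.")

def text_to_speech(response_text):
    # Placeholder TTS: returns bytes standing in for synthesized audio.
    return response_text.encode("utf-8")

def respond(question_text):
    # Question text -> response text -> response audio data.
    response_text = answering_robot(question_text)
    return response_text, text_to_speech(response_text)
```

The pair returned by `respond` corresponds to the response text and response audio data that the later synthesis step consumes.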
S202, semantic scene analysis is carried out on the question text and the response text, and a semantic scene corresponding to the rendering instruction is determined.
Referring to fig. 4, a flowchart of a method for determining a semantic scene corresponding to a rendering instruction according to an embodiment of the present invention is specifically described as follows:
s401, semantic scene keywords are extracted from the question text and the response text.
A preset keyword extraction algorithm is used to extract semantic scene keywords, such as "weather", "cloudy" or "Olympic Games", from the question text and the response text.
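The patent does not name the extraction algorithm. As one hedged possibility, the keywords could simply be the tokens of the two texts that appear in a scene lexicon; the lexicon below is a hand-written illustration, not the preset algorithm of the disclosure:

```python
import re

# Illustrative scene lexicon; a real system would apply a preset keyword
# extraction algorithm instead of this hand-written set.
SCENE_LEXICON = {"weather", "cloudy", "sunny", "olympics"}

def extract_scene_keywords(question_text, response_text):
    # Tokenize both texts, then keep lexicon words in first-seen order.
    tokens = re.findall(r"[a-z]+", (question_text + " " + response_text).lower())
    keywords, seen = [], set()
    for token in tokens:
        if token in SCENE_LEXICON and token not in seen:
            seen.add(token)
            keywords.append(token)
    return keywords
```

Deduplicating while preserving order keeps the keyword list stable for the downstream scene decision.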
S402, determining whether a scene personalized demand exists based on the semantic scene keywords; if the scene individuation requirement is determined to exist, executing S403; if it is determined that the scene personalization requirement does not exist, S404 is performed.
It should be noted that a scene personalization requirement means that a personalized scene, rather than the system's default scene, needs to be rendered.
If a scene personalization requirement exists, a personalized scene, related to the semantics of the response text and question text, needs to be rendered when the digital person replies. If no such requirement exists, the digital person does not need a personalized scene for its reply; in other words, the semantics of the response text and question text indicate that no personalized scene needs to be rendered.
S403, a personalized scene corresponding to the semantic scene keywords is determined, and the personalized scene is determined as the semantic scene corresponding to the rendering instruction.
When it is determined that a scene personalization requirement exists, the personalized scene can be determined from the semantic scene keywords; for example, when the semantic scene keywords are "weather" and "sunny", the personalized scene can be determined to be a scene in which the rendered weather is sunny.
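The branch in S402–S404 can be sketched as a keyword-to-scene lookup. The mapping table and scene names below are invented examples for illustration; the patent does not prescribe how keywords map to scenes:

```python
# Illustrative sketch of S402-S404: a personalization requirement exists
# when the extracted keywords cover a known keyword set; otherwise the
# preset default scene is used. The mapping table is an invented example.
KEYWORD_SETS_TO_SCENE = {
    frozenset({"weather", "sunny"}): "sunny_weather_scene",
    frozenset({"olympic games"}): "stadium_scene",
}
DEFAULT_SCENE = "default_scene"

def determine_semantic_scene(keywords: set[str]) -> str:
    """Return the personalized scene for the keywords, or the default scene (S404)."""
    for keyword_set, scene in KEYWORD_SETS_TO_SCENE.items():
        if keyword_set <= keywords:  # every keyword in the set was extracted
            return scene
    return DEFAULT_SCENE
```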
S404, taking the preset default scene as a semantic scene corresponding to the rendering instruction.
Preferably, the default scene is the scene used by default in the system.
In the method provided by the embodiment of the present invention, when the semantic scene is determined, semantic scene keywords are extracted from the question text and the response text, and the semantic scene is determined based on those keywords. This allows the scene related to the question text and the response text to be determined accurately, so that when the digital person replies to a client, an appropriate scene is displayed, providing the client with an immersive service.
S203, a scene picture sequence corresponding to the semantic scene is obtained, and a scene rendering data stream is generated based on the scene picture sequence.
The specific process of acquiring the scene picture sequence corresponding to the semantic scene is as follows:
when the personalized scene is determined to be a semantic scene corresponding to the rendering instruction, determining whether a personalized picture sequence corresponding to the personalized scene exists in a scene background picture storage server or not;
if the scene background picture storage server is determined to have the personalized picture sequence corresponding to the personalized scene, taking the preset personalized picture sequence corresponding to the personalized scene as the scene picture sequence corresponding to the semantic scene;
if it is determined that a personalized picture sequence corresponding to the personalized scene does not exist in the scene background picture storage server, taking a preset default picture sequence as the scene picture sequence corresponding to the semantic scene;
and when a preset default scene is taken as a semantic scene corresponding to the rendering instruction, taking the default picture sequence as a scene picture sequence corresponding to the semantic scene.
It should be noted that the scene background picture storage server stores the personalized picture sequences of a plurality of personalized scenes and the default picture sequence of the default scene; each personalized picture sequence includes at least one background picture. Preferably, the default picture sequence may be an empty sequence or a sequence including a plurality of background pictures.
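The lookup-with-fallback described above can be sketched as follows, with an in-memory dict standing in for the scene background picture storage server (an assumption made for brevity; the patent describes a separate server):

```python
# Illustrative sketch of the picture-sequence lookup in S203: use the
# personalized sequence if the store has one for the semantic scene,
# otherwise fall back to the preset default sequence. A plain dict stands
# in for the scene background picture storage server.
DEFAULT_PICTURE_SEQUENCE: list[str] = ["default_background.png"]

def get_scene_picture_sequence(semantic_scene: str,
                               picture_store: dict[str, list[str]]) -> list[str]:
    """Return the stored sequence for the scene, or the default sequence."""
    return picture_store.get(semantic_scene, DEFAULT_PICTURE_SEQUENCE)
```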
When the scene rendering data stream is generated, a digital human rendering picture sequence is determined based on the response text, and a preset silent video generation server is applied to perform data synthesis processing on the scene picture sequence, the response audio data, and the digital human rendering picture sequence to obtain the scene rendering data stream. Specifically, the digital human rendering picture sequence consists of pictures for rendering the digital human, including but not limited to pictures for rendering the digital human's expression, action, and mouth shape.
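The synthesis step can be sketched as pairing each digital human frame with a background frame and attaching the response audio. The dict-based frame and stream structures below are assumptions, since the patent does not define a data format for the stream:

```python
# Illustrative sketch of the silent-video synthesis: cycle the (possibly
# shorter, possibly empty) scene picture sequence against the digital human
# picture sequence and attach the response audio. The dict-based "stream"
# format is an assumption; the patent does not define one.
from itertools import cycle, islice

def synthesize_scene_rendering_stream(scene_pictures: list[str],
                                      human_pictures: list[str],
                                      response_audio: bytes) -> dict:
    if scene_pictures:
        backgrounds = list(islice(cycle(scene_pictures), len(human_pictures)))
    else:  # empty default sequence: render the digital human with no background
        backgrounds = [None] * len(human_pictures)
    frames = [{"background": bg, "digital_human": dh}
              for bg, dh in zip(backgrounds, human_pictures)]
    return {"frames": frames, "audio": response_audio}
```

A production silent video generation server would encode actual video; this sketch only shows how the three inputs are combined frame by frame.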
S204, sending the scene rendering data stream to the client, and enabling the client to apply the scene rendering data stream to render a response scene.
In the method provided by the embodiment of the present invention, the streaming media server is applied to send the scene rendering data stream to the client, so that the client can render the response scene by applying the stream. The response scene rendered by the client is the scene shown while the digital person interacts with the user; specifically, it is the page background displayed at the client during that interaction. Preferably, the response scene may be a dynamic scene or a static scene. Optionally, the client renders the digital person while rendering the response scene and plays the response voice, so that an immersive scene can be provided to the user during the interaction, thereby improving the user experience.
In the method provided by the embodiment of the present invention, a rendering instruction sent by a user through a client is received, and a question text and a response text corresponding to the rendering instruction are obtained; semantic scene analysis is performed on the question text and the response text, and a semantic scene corresponding to the rendering instruction is determined; a scene picture sequence corresponding to the semantic scene is obtained, a scene rendering data stream is generated based on the scene picture sequence, and the scene rendering data stream is sent to the client, so that the client applies the stream to render a response scene. Because the semantic scene is determined by analyzing the question text and the response text, and the scene rendering data stream is generated from the corresponding scene picture sequence, the client can render the response scene directly from the stream. This provides a simple scene rendering method in which no neural network needs to be trained for each individual scene, reducing the investment of time and cost and thus the cost required for scene rendering.
Referring to fig. 5, a schematic diagram of a system architecture supporting the above method according to an embodiment of the present invention is specifically described as follows:
The system architecture comprises a client 501, a gateway 502, a central control module 503, a response module 504, a picture inference server 505, a scene semantic analysis server 506, a scene background picture storage server 507, a silent video generation server 508, and a streaming media server 509; the workflow of the architecture and the work performed by each device are described below in a specific scenario embodiment.
The client sends a rendering instruction to the gateway, and the gateway forwards it to the central control module. The response module receives the rendering instruction from the central control module, parses it, obtains the question data it carries, determines the corresponding question text, processes the question text to obtain a response text and response audio data, and sends the question text, response text, and response audio data back to the central control module. The central control module sends the response text to the picture inference server, which processes it to obtain a digital human rendering picture sequence and feeds the digital human rendering data back to the central control module. The scene semantic analysis server receives the question text and response text from the central control module, determines a semantic scene based on them, and sends the determined scene information to the scene background picture storage server; the storage server determines a scene picture sequence based on that information and returns it to the scene semantic analysis server, which forwards it to the central control module. The central control module then sends the response audio data, the digital human rendering picture sequence, and the scene picture sequence to the silent video generation server, which synthesizes them into a scene rendering data stream. The silent video generation server sends the scene rendering data stream to the streaming media server, the streaming media server sends it to the client, and the client renders the response scene using the scene rendering data stream.
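The workflow above can be condensed into a single control flow, with each server replaced by a plain function. Every implementation detail below is an invented stand-in for illustration; none of this logic is specified by the patent:

```python
# Illustrative end-to-end sketch of the fig. 5 workflow. Each function is a
# trivial, invented stand-in for the corresponding server/module.
def answer(question_text: str) -> tuple[str, bytes]:
    """Stand-in for the response module (response robot + speech synthesis)."""
    response_text = f"Answer to: {question_text}"
    return response_text, response_text.encode("utf-8")  # text + fake audio

def analyze_semantic_scene(question_text: str, response_text: str) -> str:
    """Stand-in for the scene semantic analysis server."""
    combined = f"{question_text} {response_text}".lower()
    return "sunny_weather_scene" if "weather" in combined else "default_scene"

def fetch_scene_pictures(scene: str) -> list[str]:
    """Stand-in for the scene background picture storage server."""
    return {"sunny_weather_scene": ["sun_01.png"]}.get(scene, ["default.png"])

def handle_rendering_instruction(question_text: str) -> dict:
    """Central-control flow: answer, analyze scene, fetch pictures, package."""
    response_text, audio = answer(question_text)
    scene = analyze_semantic_scene(question_text, response_text)
    pictures = fetch_scene_pictures(scene)
    return {"response_text": response_text, "audio": audio,
            "scene": scene, "scene_pictures": pictures}
```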
For other contents in the workflow described in the system architecture diagram shown in fig. 5, reference may be made to corresponding contents described in the above-provided embodiments of the present invention, and details are not repeated here.
The system architecture diagram shown in fig. 5 is only one example, and the system architecture diagram to which the present invention can be applied is not limited to the above-described system architecture diagram.
Corresponding to fig. 1, the present invention further provides a scene rendering apparatus that supports implementation of the method shown in fig. 1 in practice. The apparatus may be disposed in a scene rendering system, and the scene rendering system may consist of an intelligent terminal or a distributed computing environment.
Referring to fig. 6, a schematic structural diagram of a scene rendering apparatus according to an embodiment of the present invention is specifically described as follows:
an obtaining unit 601, configured to receive a rendering instruction sent by a user through a client, and obtain a question text and a response text corresponding to the rendering instruction;
a semantic scene analysis unit 602, configured to perform semantic scene analysis on the question text and the response text, and determine a semantic scene corresponding to the rendering instruction;
a generating unit 603, configured to acquire a scene picture sequence corresponding to the semantic scene, and generate a scene rendering data stream based on the scene picture sequence;
a rendering unit 604, configured to send the scene rendering data stream to the client, so that the client renders a response scene by applying the scene rendering data stream.
In the apparatus provided by the embodiment of the present invention, a rendering instruction sent by a user through a client is received, and a question text and a response text corresponding to the rendering instruction are obtained; semantic scene analysis is performed on the question text and the response text, and a semantic scene corresponding to the rendering instruction is determined; a scene picture sequence corresponding to the semantic scene is obtained, a scene rendering data stream is generated based on the scene picture sequence, and the scene rendering data stream is sent to the client, so that the client applies the stream to render a response scene. Because the semantic scene is determined by analyzing the question text and the response text, and the scene rendering data stream is generated from the corresponding scene picture sequence, the client can render the response scene directly from the stream. This provides a simple scene rendering method in which no neural network needs to be trained for each individual scene, reducing the investment of time and cost and thus the cost required for scene rendering.
In the apparatus provided in the embodiment of the present invention, the obtaining unit 601 may include:
a parsing subunit, used for parsing the rendering instruction, acquiring question data in the rendering instruction, and determining a question text based on the question data;
and a processing subunit, used for processing the question text to obtain response audio data and a response text corresponding to the question text.
In the apparatus provided in the embodiment of the present invention, the parsing subunit may include:
a determining module, used for determining the data format of the question data;
and a processing module, used for processing the question data according to the data format to obtain the question text.
In the apparatus provided in the embodiment of the present invention, the semantic scene analysis unit 602 may include:
an extraction subunit, used for extracting semantic scene keywords from the question text and the response text and determining, based on the semantic scene keywords, whether a scene personalization requirement exists;
a first determining subunit, used for determining, if the scene personalization requirement is determined to exist, a personalized scene corresponding to the semantic scene keywords, and determining the personalized scene as the semantic scene corresponding to the rendering instruction;
and a second determining subunit, used for taking a preset default scene as the semantic scene corresponding to the rendering instruction if the scene personalization requirement does not exist.
In the apparatus provided in the embodiment of the present invention, the generating unit 603 may include:
a third determining subunit, configured to determine a sequence of digital human rendering pictures based on the response text;
and the synthesis subunit is used for applying a preset silent video generation server to perform data synthesis processing on the scene picture sequence, the response audio data and the digital human rendering picture sequence to obtain a scene rendering data stream.
In the apparatus provided in the embodiment of the present invention, the generating unit 603 may further include:
a fourth determining subunit, configured to determine, when the personalized scene is determined to be a semantic scene corresponding to the rendering instruction, whether a personalized picture sequence corresponding to the personalized scene exists in a scene background picture storage server;
a fifth determining subunit, configured to, if it is determined that a personalized picture sequence corresponding to the personalized scene exists in the scene background picture storage server, take the preset personalized picture sequence corresponding to the personalized scene as the scene picture sequence corresponding to the semantic scene;
a sixth determining subunit, configured to, if it is determined that a personalized picture sequence corresponding to the personalized scene does not exist in the scene background picture storage server, take the preset default picture sequence as the scene picture sequence corresponding to the semantic scene;
and the seventh determining subunit is configured to, when a preset default scene is used as the semantic scene corresponding to the rendering instruction, use the default picture sequence as a scene picture sequence corresponding to the semantic scene.
In the apparatus provided in the embodiment of the present invention, the rendering unit 604 may include:
and the sending subunit is configured to send the scene rendering data stream to the client by using a preset streaming media server.
An embodiment of the present invention further provides a storage medium comprising stored instructions, wherein when the instructions are executed, the device on which the storage medium resides is controlled to perform the scene rendering method described above.
The structure of the electronic device is shown in fig. 7 and specifically includes a memory 701 storing one or more instructions 702, and one or more processors 703 configured to execute the one or more instructions 702 to perform the scene rendering method described above.
The specific implementation processes of the above embodiments, and derivatives thereof, fall within the scope of the present invention.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, the system or system embodiments are substantially similar to the method embodiments and therefore are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described system and system embodiments are only illustrative, wherein the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of scene rendering, comprising:
receiving a rendering instruction sent by a user through a client, and acquiring a question text and a response text corresponding to the rendering instruction;
performing semantic scene analysis on the question text and the response text, and determining a semantic scene corresponding to the rendering instruction;
acquiring a scene picture sequence corresponding to the semantic scene, and generating a scene rendering data stream based on the scene picture sequence;
and sending the scene rendering data stream to the client, so that the client renders a response scene by applying the scene rendering data stream.
2. The method of claim 1, wherein the obtaining of the question text and the response text corresponding to the rendering instruction comprises:
analyzing the rendering instruction, acquiring question data in the rendering instruction, and determining a question text based on the question data;
and processing the question text to obtain response audio data and a response text corresponding to the question text.
3. The method of claim 2, wherein determining a question text based on the question data comprises:
determining a data format of the question data;
and processing the question data according to the data format to obtain the question text.
4. The method of claim 1, wherein performing semantic scene analysis on the question text and the response text to determine a semantic scene corresponding to the rendering instruction comprises:
extracting semantic scene keywords from the question text and the response text, and determining whether a scene personalization requirement exists based on the semantic scene keywords;
if the scene personalization requirement is determined to exist, determining a personalized scene corresponding to the semantic scene keywords, and determining the personalized scene as the semantic scene corresponding to the rendering instruction;
and if the scene personalization requirement does not exist, taking a preset default scene as the semantic scene corresponding to the rendering instruction.
5. The method of claim 2, wherein generating a scene rendering data stream based on the sequence of scene pictures comprises:
determining a sequence of digital human rendering pictures based on the response text;
and applying a preset silent video generation server to perform data synthesis processing on the scene picture sequence, the response audio data and the digital human rendering picture sequence to obtain a scene rendering data stream.
6. The method of claim 4, wherein the obtaining the sequence of scene pictures corresponding to the semantic scene comprises:
when the personalized scene is determined to be a semantic scene corresponding to the rendering instruction, determining whether a personalized picture sequence corresponding to the personalized scene exists in a scene background picture storage server or not;
if the scene background picture storage server is determined to have the personalized picture sequence corresponding to the personalized scene, taking the preset personalized picture sequence corresponding to the personalized scene as the scene picture sequence corresponding to the semantic scene;
if it is determined that a personalized picture sequence corresponding to the personalized scene does not exist in the scene background picture storage server, taking a preset default picture sequence as the scene picture sequence corresponding to the semantic scene;
and when a preset default scene is taken as a semantic scene corresponding to the rendering instruction, taking the default picture sequence as a scene picture sequence corresponding to the semantic scene.
7. The method of claim 1, wherein sending the scene rendering data stream to the client comprises:
and sending the scene rendering data stream to the client by applying a preset streaming media server.
8. A scene rendering apparatus, comprising:
the system comprises an acquisition unit, a processing unit and a display unit, wherein the acquisition unit is used for receiving a rendering instruction sent by a user through a client and acquiring a question text and a response text corresponding to the rendering instruction;
a semantic scene analysis unit, configured to perform semantic scene analysis on the question text and the response text, and determine a semantic scene corresponding to the rendering instruction;
a generating unit, configured to acquire a scene picture sequence corresponding to the semantic scene, and generate a scene rendering data stream based on the scene picture sequence;
and the rendering unit is used for sending the scene rendering data stream to the client so that the client can render a response scene by applying the scene rendering data stream.
9. A storage medium comprising stored instructions, wherein the instructions, when executed, control a device on which the storage medium resides to perform a scene rendering method according to any one of claims 1 to 7.
10. An electronic device comprising a memory, and one or more instructions stored in the memory and configured to be executed by one or more processors to perform a scene rendering method according to any one of claims 1-7.
CN202111212378.4A 2021-10-18 2021-10-18 Scene rendering method and device, storage medium and electronic equipment Pending CN113850898A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111212378.4A CN113850898A (en) 2021-10-18 2021-10-18 Scene rendering method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN113850898A true CN113850898A (en) 2021-12-28

Family

ID=78978761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111212378.4A Pending CN113850898A (en) 2021-10-18 2021-10-18 Scene rendering method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113850898A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114449310A (en) * 2022-02-15 2022-05-06 平安科技(深圳)有限公司 Video editing method and device, computer equipment and storage medium
CN114860358A (en) * 2022-03-31 2022-08-05 北京达佳互联信息技术有限公司 Object processing method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106095403A (en) * 2016-05-30 2016-11-09 努比亚技术有限公司 The exhibiting device of chat message and method
CN112529992A (en) * 2019-08-30 2021-03-19 阿里巴巴集团控股有限公司 Dialogue processing method, device, equipment and storage medium of virtual image

Similar Documents

Publication Publication Date Title
US11158102B2 (en) Method and apparatus for processing information
US11151765B2 (en) Method and apparatus for generating information
CN110400251A (en) Method for processing video frequency, device, terminal device and storage medium
EP3889912B1 (en) Method and apparatus for generating video
CN108022586A (en) Method and apparatus for controlling the page
CN107294837A (en) Engaged in the dialogue interactive method and system using virtual robot
CN117669605A (en) Parsing electronic conversations for presentation in alternative interfaces
WO2022170848A1 (en) Human-computer interaction method, apparatus and system, electronic device and computer medium
CN109885277A (en) Human-computer interaction device, mthods, systems and devices
CN110674398A (en) Virtual character interaction method and device, terminal equipment and storage medium
CN114495927A (en) Multi-modal interactive virtual digital person generation method and device, storage medium and terminal
CN112364144A (en) Interaction method, device, equipment and computer readable medium
CN114974253A (en) Natural language interpretation method and device based on character image and storage medium
CN104820662A (en) Service server device
CN114064943A (en) Conference management method, conference management device, storage medium and electronic equipment
CN113850898A (en) Scene rendering method and device, storage medium and electronic equipment
KR102441456B1 (en) Method and system for mimicking tone and style of real person
CN110288683B (en) Method and device for generating information
CN107783650A (en) A kind of man-machine interaction method and device based on virtual robot
CN113157241A (en) Interaction equipment, interaction device and interaction system
CN107608718A (en) Information processing method and device
CN110111793A (en) Processing method, device, storage medium and the electronic device of audio-frequency information
CN113742473A (en) Digital virtual human interaction system and calculation transmission optimization method thereof
CN113961680A (en) Human-computer interaction based session processing method and device, medium and electronic equipment
CN113312928A (en) Text translation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination