CN116489203A - Virtual reality content making system based on artificial intelligence image recognition - Google Patents
- Publication number
- CN116489203A (Application No. CN202310505577.7A)
- Authority
- CN
- China
- Prior art keywords
- module
- unit
- output end
- communication connection
- virtual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/131—Protocols for games, networked simulations or virtual reality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4038—Image mosaicing, e.g. composing plane images from plane sub-images
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a virtual reality content production system based on artificial intelligence image recognition, comprising a 360-degree panoramic video production unit, a three-dimensional model generation unit, an animation special effect production unit, a virtual character generation unit, a voice recognition unit, a script generation unit and a content export terminal, wherein the output end of the 360-degree panoramic video production unit is in communication connection with the input end of the three-dimensional model generation unit. The beneficial effects of the invention are: by adding the 360-degree panoramic video production unit, the virtual reality image is stitched from multiple cameras, its display effect is guaranteed by adjusting according to each camera's image, and the realism of the display is improved; and by adding the virtual character generation unit, virtual character figures are established in the virtual reality and cooperate and interact with the user in the virtual reality.
Description
Technical Field
The invention relates to the technical field of virtual reality content production, and in particular to a virtual reality content production system based on artificial intelligence image recognition.
Background
Virtual reality is a virtual world simulated by computer technology; the virtual world may be modeled on something that actually exists or be entirely imagined. Virtual reality must give the user a sense of realism by simulating vision, hearing, and even touch and smell. Virtual reality production falls into two cases: live-action shooting and 3D-modeled scene production, where 3D-modeled scene production in turn covers both scenes that can be walked through in VR and scenes that cannot. However, when an existing virtual reality content production system interacts with a person in virtual reality, it is inconvenient for collecting the person's dialogue information, it cannot quickly change the virtual reality content according to that dialogue information, the adaptability of its content production effect is poor, the corners of the visual screen combined from the stitched images are prone to incomplete display, and the virtual reality experience is relatively poor.
Disclosure of Invention
The invention aims to provide a virtual reality content production system based on artificial intelligence image recognition, so as to solve the problems, raised in the background art, that the existing virtual reality content production system is inconvenient for collecting a person's dialogue information when interacting with that person in virtual reality, cannot quickly change the virtual reality content according to the person's dialogue information, adapts poorly in its content production effect, is prone to incomplete corner display in the video screen combined from the stitched images, and delivers a relatively poor virtual reality experience.
In order to achieve the above purpose, the present invention provides the following technical solution: a virtual reality content production system based on artificial intelligence image recognition comprises a 360-degree panoramic video production unit, a three-dimensional model generation unit, an animation special effect production unit, a virtual character generation unit, a voice recognition unit, a script generation unit and a content export terminal, wherein the output end of the 360-degree panoramic video production unit is in communication connection with the input end of the three-dimensional model generation unit, the output end of the three-dimensional model generation unit is in communication connection with the input end of the animation special effect production unit, the output end of the animation special effect production unit is in communication connection with the input end of the virtual character generation unit, the output end of the virtual character generation unit is in communication connection with the input end of the voice recognition unit, the output end of the voice recognition unit is in communication connection with the input end of the script generation unit, and the output end of the script generation unit is in communication connection with the input end of the content export terminal;
the 360-degree panoramic video production unit is used for automatically identifying the images of the multiple cameras, performing image stitching processing, and generating the 360-degree panoramic video;
the three-dimensional model generation unit is used for constructing three-dimensional virtual models from the collected 2D images and videos;
the animation special effect production unit is used for producing the animation special effects in the video content according to AI prediction and character-animation driving;
the virtual character generation unit is used for generating virtual character figures in the virtual content and establishing three-dimensional images of the characters;
the voice recognition unit is used for extracting features from the audio input, the collected voice enabling the virtual character to understand and execute the user's commands;
the script generation unit is used for receiving a brief of the script and automatically generating, from that brief, the script and the scenes it requires;
the content export terminal is used for exporting the produced virtual content and generating a preview version.
As a preferred embodiment of the present invention: the 360-degree panoramic video production unit comprises a panoramic image AI stitching module, a panoramic image correction shooting module and a video blur elimination module, wherein the output end of the panoramic image AI stitching module is in communication connection with the input end of the panoramic image correction shooting module, and the output end of the panoramic image correction shooting module is in communication connection with the input end of the video blur elimination module;
the panoramic image AI stitching module is used for stitching the images shot by the multiple cameras through AI technology, thereby acquiring a panoramic image and generating the 360-degree panoramic video;
the panoramic image correction shooting module is used for performing stitching-correction processing on the panoramic images shot by the cameras;
the video blur elimination module is used for analyzing the motion information in the video sequence through an AI algorithm, stabilizing the jitter of the pictures shot by the camera, and eliminating jitter blur.
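The stitching step above can be illustrated with a minimal sketch: blending the overlapping region of two horizontally adjacent camera images with linear feathering. This is a simplified stand-in for the AI-based matching and correction the modules describe; the toy grayscale strips and overlap width are assumptions for illustration only.

```python
import numpy as np

def feather_blend(left, right, overlap):
    """Blend two horizontally adjacent images whose last/first
    `overlap` columns cover the same scene region."""
    # Linear weights: 1 -> 0 across the overlap for the left image.
    w = np.linspace(1.0, 0.0, overlap)[None, :]              # shape (1, overlap)
    blended = left[:, -overlap:] * w + right[:, :overlap] * (1.0 - w)
    return np.hstack([left[:, :-overlap], blended, right[:, overlap:]])

left = np.full((2, 4), 100.0)    # toy grayscale strip from camera A
right = np.full((2, 4), 200.0)   # toy grayscale strip from camera B
pano = feather_blend(left, right, overlap=2)
print(pano.shape)                # (2, 6): 4 + 4 - 2 overlapping columns
```

A real panorama would first align the strips (e.g. via a homography estimated from matched features) before blending; here the alignment is assumed.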
As a preferred embodiment of the present invention: the three-dimensional model generation unit comprises an AIGC construction module, a 3D model construction module and a VR equipment adaptation module, wherein the output end of the AIGC construction module is in communication connection with the input end of the 3D model construction module, and the output end of the 3D model construction module is in communication connection with the input end of the VR equipment adaptation module;
the AIGC construction module is used for manufacturing and processing the virtual content image based on AIGC technology;
the 3D model construction module is used for constructing a 3D model, based on AIGC, from 2D images and from the images in videos;
the VR equipment adaptation module is used for simplifying the 3D model according to the different types of VR equipment and optimizing the corners of the model.
As a preferred embodiment of the present invention: the animation special effect production unit comprises a driven character animation production module, a physical special effect automatic production module and a scene interactive animation production module, wherein the output end of the driven character animation production module is in communication connection with the input end of the physical special effect automatic production module, and the output end of the physical special effect automatic production module is in communication connection with the input end of the scene interactive animation production module;
the driven character animation production module is used for performing animation production processing on the 3D model according to AI prediction and character-animation driving;
the physical special effect automatic production module is used for simulating real-world physical phenomena with a physics engine while rendering the visual effects of the various physical special effects with graphics-shader technology;
the scene interactive animation production module is used for automatically identifying the user's operating habits and requirements and producing the scenes and interactive animations.
As a preferred embodiment of the present invention: the voice recognition unit comprises a voice receiving and transmitting module, a natural language processing module, a natural deep learning module and a user command execution module, wherein the output end of the voice receiving and transmitting module is in communication connection with the input end of the natural language processing module, the output end of the natural language processing module is in communication connection with the input end of the natural deep learning module, and the output end of the natural deep learning module is in communication connection with the input end of the user command execution module;
the voice receiving and transmitting module is used for collecting and transmitting the voice information of the dialogue;
the natural language processing module is used for performing natural-language recognition processing on the voice data;
the natural deep learning module is used for recognizing and understanding the collected user voice in real time through deep learning together with the natural language processing module;
the user command execution module is used for executing processing after recognizing the voice command spoken by the user.
As a preferred embodiment of the present invention: the script generation unit comprises a script brief description providing module and a script scene AI generation module, wherein the output end of the script brief description providing module is in communication connection with the input end of the script scene AI generation module;
the script brief description providing module is used for receiving the brief description of the script provided by the user;
the script scene AI generation module is used for automatically generating and arranging the complete script and scenes by utilizing the natural language processing module and a knowledge graph.
As a preferred embodiment of the present invention: the three-dimensional model generating unit is connected with a visual effect adjusting module in a bidirectional manner;
the visual effect adjustment module is used for automatically adjusting texture mapping and illumination calculation of the established three-dimensional virtual model.
As a preferred embodiment of the present invention: the voice recognition unit is bidirectionally connected with a virtual character expression generation unit, which records the voice recognition result of the voice data sent by the user and, using the generated dialogue information, automatically generates the facial expression and mouth-shape animation reactions of the virtual character.
As a preferred embodiment of the present invention: the script generation unit is bidirectionally connected with a script visual effect adjustment unit, and the script visual effect adjustment unit is used for adjusting the script and the visual effect according to the theme and the style appointed by the user.
As a preferred embodiment of the present invention: the output end of the content export terminal is electrically connected with a content preview unit, which is used for sending the generated preview version to the user so that the user can check the effect on a platform and perform sharing and collaboration operations.
Compared with the prior art, the invention has the following beneficial effects: by stitching the virtual reality image from multiple cameras and adjusting according to each camera's image, the invention guarantees the display effect of the virtual reality image and improves the realism of the display; by adding the virtual character generation unit, it establishes virtual character figures in the virtual reality and lets them cooperate and interact with the user; and by adding the script generation unit, it automatically establishes the scenes and character information required in the script from a brief description, which improves the efficiency of virtual reality content production and saves time.
Drawings
FIG. 1 is a block diagram of a system of the present invention;
FIG. 2 is a block diagram of the 360-degree panoramic video production unit system of the present invention;
FIG. 3 is a block diagram of a speech recognition unit system according to the present invention.
Description of the embodiments
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1 to 3, the present invention provides a technical solution: a virtual reality content production system based on artificial intelligence image recognition comprises a 360-degree panoramic video production unit, a three-dimensional model generation unit, an animation special effect production unit, a virtual character generation unit, a voice recognition unit, a script generation unit and a content export terminal, wherein the output end of the 360-degree panoramic video production unit is in communication connection with the input end of the three-dimensional model generation unit, the output end of the three-dimensional model generation unit is in communication connection with the input end of the animation special effect production unit, the output end of the animation special effect production unit is in communication connection with the input end of the virtual character generation unit, the output end of the virtual character generation unit is in communication connection with the input end of the voice recognition unit, the output end of the voice recognition unit is in communication connection with the input end of the script generation unit, and the output end of the script generation unit is in communication connection with the input end of the content export terminal;
the 360-degree panoramic video production unit is used for automatically identifying the images of the multiple cameras, performing image stitching processing, and generating the 360-degree panoramic video; the processing comprises feature extraction and matching, photometric correction, distortion correction and optical-flow estimation techniques, so as to realize the functions of panoramic video stitching, video stabilization and motion blur elimination;
the three-dimensional model generation unit is used for constructing three-dimensional virtual models from the collected 2D images and videos; an AI algorithm evaluates the complexity and performance of the model and automatically simplifies and optimizes it to suit VR hardware of different performance levels, including operations such as reducing the vertex count and lowering the texture quality, with the optimization carried out on the premise of preserving the visual effect;
the animation special effect production unit is used for producing the animation special effects in the video content according to AI prediction and character-animation driving; the predicted motion trajectories and keyframe data of the characters are optimized and adjusted through deep learning and time-sequence analysis, making the character animation smoother and more natural, and the AI automatically generates various physical special effects, including cloth, water flow, smoke and flame, helping creators reproduce real-world effects more quickly;
the virtual character generation unit is used for generating virtual character figures in the virtual content and establishing three-dimensional images of the characters;
the voice recognition unit is used for extracting features from the audio input, the collected voice enabling the virtual character to understand and execute the user's commands;
the script generation unit is used for receiving a brief of the script and automatically generating, from that brief, the script and the scenes it requires;
the content export terminal is used for exporting the produced virtual content and quickly generating a preview version, so that the user can conveniently view the effect on a device or platform; sharing and collaboration with other users are supported to improve work efficiency.
The 360-degree panoramic video production unit comprises a panoramic image AI stitching module, a panoramic image correction shooting module and a video blur elimination module, wherein the output end of the panoramic image AI stitching module is in communication connection with the input end of the panoramic image correction shooting module, and the output end of the panoramic image correction shooting module is in communication connection with the input end of the video blur elimination module;
the panoramic image AI stitching module is used for stitching the images shot by the multiple cameras through AI technology, thereby acquiring a panoramic image and generating the 360-degree panoramic video;
the panoramic image correction shooting module is used for performing stitching-correction processing on the panoramic images shot by the cameras;
the video blur elimination module is used for analyzing the motion information in the video sequence through an AI algorithm, stabilizing the jitter of the pictures shot by the camera and eliminating jitter blur; this comprises feature extraction and matching, photometric correction, distortion correction and optical-flow estimation techniques, so as to realize the functions of panoramic video stitching, video stabilization and motion blur elimination.
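The stabilization step can be sketched as smoothing the estimated camera trajectory and then shifting each frame by the difference between the smoothed and raw paths. This sketch uses a centered moving average as a simple stand-in for the Kalman filtering named later in the description; the 1-D trajectory values are synthetic.

```python
import numpy as np

def smooth_trajectory(positions, radius=2):
    """Smooth a 1-D camera trajectory with a centered moving average.
    The per-frame correction (smoothed - raw) would then be applied
    as a compensating image shift to remove jitter."""
    padded = np.pad(positions, radius, mode="edge")      # repeat endpoints
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)  # uniform window
    return np.convolve(padded, kernel, mode="valid")     # same length out

raw = np.array([0.0, 2.0, -1.0, 3.0, 0.0, 2.0, -1.0, 3.0])  # jittery path
smoothed = smooth_trajectory(raw)
print(smoothed.shape)                                    # (8,)
```

A production stabilizer would estimate the trajectory per-frame from optical flow or feature tracks, then filter each motion component (translation, rotation) this way.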
The three-dimensional model generation unit comprises an AIGC construction module, a 3D model construction module and a VR equipment adaptation module, wherein the output end of the AIGC construction module is in communication connection with the input end of the 3D model construction module, and the output end of the 3D model construction module is in communication connection with the input end of the VR equipment adaptation module;
the AIGC construction module is used for manufacturing and processing the virtual content image based on AIGC technology;
the 3D model construction module is used for constructing a 3D model, based on AIGC, from 2D images and from the images in videos;
the VR equipment adaptation module is used for simplifying and optimizing the 3D model, including its corners, according to the different types of VR equipment; an AI algorithm evaluates the complexity and performance of the model and automatically simplifies and optimizes it to suit VR hardware of different performance levels, including operations such as reducing the vertex count and lowering the texture quality, with the optimization carried out on the premise of preserving the visual effect.
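The vertex-count reduction can be illustrated with vertex-clustering decimation, a much simpler technique than the quadric-error-metrics simplification the description mentions: vertices are snapped to a uniform grid and merged per cell. The toy vertex coordinates and cell size are assumptions for illustration.

```python
import numpy as np

def cluster_decimate(vertices, cell=1.0):
    """Vertex-clustering decimation: snap vertices to a uniform grid and
    keep one representative per occupied cell (a crude stand-in for
    quadric-error-metrics simplification)."""
    keys = np.floor(vertices / cell).astype(int)         # grid cell per vertex
    _, idx = np.unique(keys, axis=0, return_index=True)  # first hit per cell
    return vertices[np.sort(idx)]

verts = np.array([[0.1, 0.10, 0.00],
                  [0.2, 0.15, 0.05],   # same cell as the first vertex
                  [1.6, 0.10, 0.00],
                  [1.7, 0.20, 0.10]])  # same cell as the third vertex
lod = cluster_decimate(verts, cell=1.0)
print(len(verts), "->", len(lod))      # 4 -> 2
```

Shrinking `cell` yields finer levels of detail; a VR adaptation module could pick the cell size per device performance tier.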
The animation special effect production unit comprises a driven character animation production module, a physical special effect automatic production module and a scene interactive animation production module, wherein the output end of the driven character animation production module is in communication connection with the input end of the physical special effect automatic production module, and the output end of the physical special effect automatic production module is in communication connection with the input end of the scene interactive animation production module;
the driven character animation production module is used for performing animation production processing on the 3D model according to AI prediction and character-animation driving;
the physical special effect automatic production module is used for simulating real-world physical phenomena with a physics engine while rendering the visual effects of the various physical special effects with graphics-shader technology;
the scene interactive animation production module is used for automatically identifying the user's operating habits and requirements and producing the scenes and interactive animations; the motion trajectories and keyframe data of the characters are predicted, optimized and adjusted through deep learning and time-sequence analysis, making the character animation smoother and more natural, and the AI automatically generates various physical special effects, including cloth, water flow, smoke and flame, helping creators reproduce real-world effects more quickly.
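Once key poses are predicted, the in-between frames come from keyframe interpolation. The sketch below uses plain linear interpolation on a single scalar joint channel; the elbow-angle keyframes are hypothetical and a real pipeline would interpolate full poses, often with splines or quaternion slerp.

```python
def interpolate_keyframes(keyframes, t):
    """Linearly interpolate a scalar joint value between sorted
    (time, value) keyframes; clamps outside the keyframe range."""
    if t <= keyframes[0][0]:
        return keyframes[0][1]
    if t >= keyframes[-1][0]:
        return keyframes[-1][1]
    for (t0, v0), (t1, v1) in zip(keyframes, keyframes[1:]):
        if t0 <= t <= t1:
            alpha = (t - t0) / (t1 - t0)   # fraction of the way to the next key
            return v0 + alpha * (v1 - v0)

# Hypothetical elbow-angle keyframes (time in seconds, angle in degrees).
elbow = [(0.0, 0.0), (1.0, 90.0), (2.0, 45.0)]
print(interpolate_keyframes(elbow, 0.5))   # 45.0
print(interpolate_keyframes(elbow, 1.5))   # 67.5
```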
The voice recognition unit comprises a voice receiving and transmitting module, a natural language processing module, a natural deep learning module and a user command execution module, wherein the output end of the voice receiving and transmitting module is in communication connection with the input end of the natural language processing module, the output end of the natural language processing module is in communication connection with the input end of the natural deep learning module, and the output end of the natural deep learning module is in communication connection with the input end of the user command execution module;
the voice receiving and transmitting module is used for collecting and transmitting the voice information of the dialogue;
the natural language processing module is used for performing natural-language recognition processing on the voice data;
the natural deep learning module is used for recognizing and understanding the collected user voice in real time through deep learning together with the natural language processing module;
the user command execution module is used for executing processing after recognizing the voice command spoken by the user.
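The command-execution step can be sketched as intent matching: recognized text is compared against registered keyword intents and the first matching handler runs. The commands and handlers below are hypothetical, and the recognized text would in practice come from the deep-learning recognition pipeline described above.

```python
def make_command_executor(handlers):
    """Return a function that matches recognized text against registered
    keyword intents and executes the first matching handler."""
    def execute(recognized_text):
        text = recognized_text.lower()
        for keyword, handler in handlers.items():
            if keyword in text:
                return handler()
        return "unrecognized command"
    return execute

# Hypothetical VR commands a virtual character could carry out.
execute = make_command_executor({
    "open door": lambda: "door opened",
    "turn on light": lambda: "light on",
})
print(execute("Please open door now"))   # door opened
print(execute("dance"))                  # unrecognized command
```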
The script generation unit comprises a script brief description providing module and a script scene AI generation module, wherein the output end of the script brief description providing module is in communication connection with the input end of the script scene AI generation module;
the script brief description providing module is used for receiving the brief description of the script provided by the user;
the script scene AI generation module is used for automatically generating and arranging the complete script and scenes by utilizing the natural language processing module and a knowledge graph.
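The brief-to-script expansion can be sketched with keyword-driven templates: brief keywords are matched against a tiny "knowledge base" of scene templates. This is a deliberately simple stand-in for the NLP-plus-knowledge-graph generation the module describes; the template table and example brief are invented for illustration.

```python
def generate_script(brief, scene_templates):
    """Expand a one-line scenario brief into scene descriptions by
    matching brief keywords against scene templates."""
    scenes = [tpl.format(brief=brief)
              for kw, tpl in scene_templates.items() if kw in brief.lower()]
    # Fall back to a generic establishing shot when nothing matches.
    return scenes or ["Scene 1: establishing shot. ({brief})".format(brief=brief)]

templates = {   # hypothetical knowledge-base entries
    "forest": "Exterior, forest: dense trees, morning fog. ({brief})",
    "castle": "Interior, castle hall: torchlight, stone walls. ({brief})",
}
script = generate_script("A knight crosses a forest to reach a castle", templates)
print(len(script))   # 2 matching scenes
```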
The three-dimensional model generating unit is connected with a visual effect adjusting module in a bidirectional manner;
the visual effect adjusting module is used for automatically adjusting texture mapping and illumination calculation of the established three-dimensional virtual model.
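The illumination-calculation part of this adjustment can be illustrated with the standard Lambertian diffuse term, intensity = albedo × max(0, N · L). The normal and light vectors are assumed to be unit length; this is a textbook shading formula, not the module's actual algorithm.

```python
import numpy as np

def lambert_diffuse(normal, light_dir, albedo=1.0):
    """Lambertian diffuse intensity: albedo * max(0, N . L).
    Both vectors are assumed unit length; back-facing light gives 0."""
    return albedo * max(0.0, float(np.dot(normal, light_dir)))

n = np.array([0.0, 1.0, 0.0])                          # surface facing up
l = np.array([0.0, 1.0, 0.0])                          # light directly overhead
print(lambert_diffuse(n, l))                           # 1.0
print(lambert_diffuse(n, np.array([0.0, -1.0, 0.0])))  # 0.0 (backlit)
```

An automatic adjuster could, for example, tune `albedo` per material or re-aim `light_dir` until such per-vertex intensities match a target look.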
The voice recognition unit is bidirectionally connected with a virtual character expression generation unit, which records the voice data sent by the user and, according to the voice recognition result and the generated dialogue information, automatically generates vivid facial expressions, mouth-shape animations and corresponding reactions for the virtual character.
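The mouth-shape step is commonly implemented as a phoneme-to-viseme lookup: each recognized phoneme maps to a mouth pose, and the resulting track drives the character's mouth. The phoneme symbols and viseme names below are illustrative; production systems use larger phoneme sets and blend between poses.

```python
# Illustrative phoneme -> viseme (mouth pose) table.
PHONEME_TO_VISEME = {
    "AA": "open", "AE": "open",
    "B": "closed", "M": "closed", "P": "closed",
    "F": "teeth_on_lip", "V": "teeth_on_lip",
    "UW": "rounded", "OW": "rounded",
}

def mouth_track(phonemes):
    """Turn a phoneme sequence into a mouth-pose track,
    falling back to 'neutral' for unmapped phonemes."""
    return [PHONEME_TO_VISEME.get(p, "neutral") for p in phonemes]

print(mouth_track(["B", "AA", "T"]))   # ['closed', 'open', 'neutral']
```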
The script generation unit is bidirectionally connected with a script visual effect adjustment unit, and the script visual effect adjustment unit is used for adjusting the script and the visual effect according to the theme and the style appointed by the user.
The output end of the content export terminal is electrically connected with a content preview unit, which is used for sending the generated preview version to the user and quickly generating previews, so that the user can conveniently view the effect on a device or platform and perform sharing and collaboration operations; sharing and collaboration with other users are supported to improve work efficiency.
Specifically, when in use, real-time panoramic video stitching proceeds as follows: the system adopts a deep learning model (such as a convolutional neural network) to extract image features and computes the best match between adjacent images by minimizing an energy function; geometric distortion is handled by a distortion correction algorithm that models and corrects the lens properties of the cameras, and motion vectors in the scene are computed using optical flow estimation and feature point matching. Based on this motion information, a Kalman filtering algorithm is used for video stabilization, while a motion blur model and an exposure-based deblurring algorithm eliminate motion blur. Scene depth information is recovered from the footage using structured light and binocular stereo methods; the extracted depth information is combined with the image features to construct dense point cloud data, shape reconstruction is performed on the point cloud to generate a 3D model, and the model is optimized with a mesh simplification algorithm (Quadric Error Metrics) to reduce the number of vertices and triangles, while texture maps are downsampled and compressed to reduce the rendering load. A deep learning model trained on existing animation data generates new keyframe animation data; when training the model, an external animation database and user-defined data sets are used. Physics engines (Nvidia PhysX and Havok) simulate real-world physical phenomena, and graphics shader technology renders the visual effects of the various physical special effects. For speech, a deep learning model (a hybrid DNN-HMM model with LSTM) extracts features from the audio input, and a Hidden Markov Model (HMM) computes the probability of each phoneme node to produce the corresponding text, which is then processed by natural language processing. From this, the system generates the script, scenes and character self-descriptions through the model and adjusts the theme and style: a style transfer algorithm (CycleGAN) automatically restyles images, models and animations, and keyword analysis combined with sentiment analysis adjusts the script and character dialogue. Based on the target platform and device selected by the user, different rendering parameters, texture resolutions and model optimization levels are set, and multithreading and asynchronous computation accelerate the export process. Network transmission technologies (WebSocket, WebRTC) provide cross-platform, cross-device real-time preview, and cloud storage supports sharing and collaboration between the user and team members.
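The video stabilization step described above, in which per-frame motion is estimated and then smoothed with a Kalman filter, can be illustrated with a minimal one-dimensional sketch. The patent names Kalman filtering but not a state model; a simple constant-position model is assumed here:

```python
import numpy as np

def smooth_trajectory(positions, process_var=1e-3, meas_var=0.25):
    """Smooth a 1-D camera trajectory with a constant-position Kalman filter.

    positions: per-frame camera offset estimates (e.g. from optical flow).
    Returns the filtered trajectory; subtracting it from the raw one yields
    the per-frame stabilizing correction.
    """
    x, p = positions[0], 1.0          # state estimate and its variance
    out = []
    for z in positions:
        p = p + process_var           # predict: variance grows between frames
        k = p / (p + meas_var)        # Kalman gain
        x = x + k * (z - x)           # update with the new measurement
        p = (1.0 - k) * p
        out.append(x)
    return np.array(out)

# jittery trajectory: a smooth pan plus measurement noise
rng = np.random.default_rng(0)
raw = np.linspace(0.0, 10.0, 200) + rng.normal(0.0, 0.5, 200)
smoothed = smooth_trajectory(raw)
corrections = raw - smoothed          # shifts to apply to each frame
```

Subtracting the smoothed trajectory from the raw one gives the per-frame shift that removes jitter while preserving the intentional camera motion.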
In the description of the present invention, it should be understood that the terms "coaxial," "bottom," "one end," "top," "middle," "another end," "upper," "one side," "top," "inner," "front," "center," "two ends," etc. indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention.
Furthermore, the terms "first," "second," "third," "fourth," and the like are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated, whereby features defining "first," "second," "third," "fourth" may explicitly or implicitly include at least one such feature.
In the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "configured," "connected," "secured," "screwed," and the like are to be construed broadly and may be, for example, fixedly connected, detachably connected, or integrally formed; mechanically or electrically connected; directly connected or indirectly connected through an intermediary; or an internal communication or interaction between two elements. Unless explicitly defined otherwise, the specific meanings of the above terms in this application will be understood by those of ordinary skill in the art in view of the specific circumstances.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (10)
1. The virtual reality content making system based on artificial intelligence image recognition is characterized by comprising a 360-degree panoramic video making unit, a three-dimensional model generating unit, an animation special effect making unit, a virtual character generating unit, a voice recognition unit, a script generating unit and a content export terminal, wherein the output end of the 360-degree panoramic video making unit is in communication connection with the input end of the three-dimensional model generating unit, the output end of the three-dimensional model generating unit is in communication connection with the input end of the animation special effect making unit, the output end of the animation special effect making unit is in communication connection with the input end of the virtual character generating unit, the output end of the virtual character generating unit is in communication connection with the input end of the voice recognition unit, the output end of the voice recognition unit is in communication connection with the input end of the script generating unit, and the output end of the script generating unit is in communication connection with the input end of the content export terminal;
the 360-degree panoramic video production unit is used for automatically identifying images of a plurality of cameras and performing image stitching processing, and generating and processing 360-degree panoramic video;
the three-dimensional model generating unit is used for constructing and processing a three-dimensional virtual model from the collected 2D images and videos;
the animation special effect making unit is used for making and processing animation special effects in the video content according to AI prediction and AI-driven character animation;
the virtual character generating unit is used for generating virtual character figures in the virtual content and establishing three-dimensional images of the characters;
the voice recognition unit is used for extracting features from the audio input so that, based on the collected speech, the virtual character can understand and execute the user's commands;
the script generating unit is used for receiving a brief description of the plot and automatically generating the script and the required scenes from that brief description;
the content export terminal is used for exporting the produced virtual content and generating the preview version.
2. The virtual reality content creating system based on artificial intelligence image recognition of claim 1, wherein: the 360-degree panoramic video production unit comprises a panoramic image AI splicing module, a panoramic image correction shooting module and a video blurring elimination module, wherein the output end of the panoramic image AI splicing module is in communication connection with the input end of the panoramic image correction shooting module, and the output end of the panoramic image correction shooting module is in communication connection with the input end of the video blurring elimination module;
the panoramic image AI stitching module is used for stitching images shot by a plurality of cameras through an AI technology, so that panoramic images are acquired, and 360-degree panoramic video is generated;
the panoramic image correction shooting module is used for carrying out splicing correction processing on panoramic images shot by the cameras;
the video blurring elimination module is used for analyzing motion information in a video sequence through an AI algorithm, stabilizing jitter of pictures shot by a camera and eliminating jitter blurring.
3. The virtual reality content making system based on artificial intelligence image recognition of claim 2, wherein: the three-dimensional model generation unit comprises an AIGC construction module, a 3D model construction module and a VR equipment adaptation module, wherein the output end of the AIGC construction module is in communication connection with the input end of the 3D model construction module, and the output end of the 3D model construction module is in communication connection with the input end of the VR equipment adaptation module;
the AIGC construction module is used for manufacturing and processing the virtual content image based on AIGC technology;
the 3D model building module is used for building a 3D model from the 2D image and the image in the video based on AIGC;
the VR equipment adaptation module is used for simplifying the 3D model according to VR equipment of different types and optimizing corners of the model.
4. The virtual reality content making system based on artificial intelligence image recognition of claim 3, wherein: the animation special effect making unit comprises a driving character animation production module, a physical special effect automatic generation module and a scene interactive animation generation module, wherein the output end of the driving character animation production module is in communication connection with the input end of the physical special effect automatic generation module, and the output end of the physical special effect automatic generation module is in communication connection with the input end of the scene interactive animation generation module;
the driving character animation production module is used for performing animation production processing on the 3D model according to AI prediction and character animation driving;
the physical special effect automatic generation module is used for simulating physical phenomena of the real world with a physics engine while rendering the visual effects of the various physical special effects with graphics shader technology;
the scene interactive animation generation module is used for automatically identifying the user's operating habits and requirements and producing scenes and interactive animations.
5. The virtual reality content creating system based on artificial intelligence image recognition of claim 4, wherein: the voice recognition unit comprises a voice receiving and transmitting module, a natural language processing module, a natural deep learning module and a user command execution module, wherein the output end of the voice receiving and transmitting module is in communication connection with the input end of the natural language processing module, the output end of the natural language processing module is in communication connection with the input end of the natural deep learning module, and the output end of the natural deep learning module is in communication connection with the input end of the user command execution module;
the voice receiving and transmitting module is used for collecting and transmitting the voice information of the dialogue;
the natural language processing module is used for performing natural language recognition processing on the voice data;
the natural deep learning module is used for recognizing and understanding the collected user speech in real time through deep learning together with the natural language processing module;
the user command execution module is used for executing the corresponding processing after recognizing the voice command spoken by the user.
6. The virtual reality content creating system based on artificial intelligence image recognition of claim 5, wherein: the script generating unit comprises a script brief providing module and a script and scene AI generation module, wherein the output end of the script brief providing module is in communication connection with the input end of the script and scene AI generation module;
the script brief providing module is used for receiving the brief description of the script provided by the user;
the script and scene AI generation module is used for automatically generating and setting the complete script and scenes by utilizing the natural language processing module and a knowledge graph.
7. The virtual reality content creating system based on artificial intelligence image recognition of claim 6, wherein: the three-dimensional model generating unit is connected with a visual effect adjusting module in a bidirectional manner;
the visual effect adjustment module is used for automatically adjusting texture mapping and illumination calculation of the established three-dimensional virtual model.
8. The virtual reality content creating system based on artificial intelligence image recognition of claim 7, wherein: the voice recognition unit is bidirectionally connected with a virtual character expression generation unit, and the virtual character expression generation unit is used for recording the voice recognition results of the voice data sent by the user and automatically generating the virtual character's facial expression and mouth-shape animation responses from the generated dialogue information.
9. The virtual reality content creating system based on artificial intelligence image recognition of claim 8, wherein: the script generation unit is bidirectionally connected with a script visual effect adjustment unit, and the script visual effect adjustment unit is used for adjusting the script and the visual effect according to the theme and the style appointed by the user.
10. The virtual reality content creating system based on artificial intelligence image recognition of claim 9, wherein: the output end of the content export terminal is electrically connected with a content preview unit, and the content preview unit is used for sending the generated preview version to the user so that the effect can be viewed on a platform and user sharing and collaboration operations can be performed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310505577.7A CN116489203A (en) | 2023-05-08 | 2023-05-08 | Virtual reality content making system based on artificial intelligence image recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116489203A true CN116489203A (en) | 2023-07-25 |
Family
ID=87213778
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310505577.7A Pending CN116489203A (en) | 2023-05-08 | 2023-05-08 | Virtual reality content making system based on artificial intelligence image recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116489203A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118230135A (en) * | 2024-05-23 | 2024-06-21 | 北京大学 | Artificial intelligence system based on space-time information pair |
CN118230135B (en) * | 2024-05-23 | 2024-08-09 | 北京大学 | Artificial intelligence system based on space-time information pair |
CN118368524A (en) * | 2024-06-17 | 2024-07-19 | 深圳市联合光学技术有限公司 | Multi-camera view field switching system and method thereof |
CN118368524B (en) * | 2024-06-17 | 2024-08-16 | 深圳市联合光学技术有限公司 | Multi-camera view field switching system and method thereof |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112562433B (en) | Working method of 5G strong interaction remote delivery teaching system based on holographic terminal | |
KR102503413B1 (en) | Animation interaction method, device, equipment and storage medium | |
CN116489203A (en) | Virtual reality content making system based on artificial intelligence image recognition | |
CN110465097B (en) | Character vertical drawing display method and device in game, electronic equipment and storage medium | |
CN113099204B (en) | Remote live-action augmented reality method based on VR head-mounted display equipment | |
CN111640173B (en) | Cloud rendering method and system for home roaming animation based on specific path | |
CN113709543B (en) | Video processing method and device based on virtual reality, electronic equipment and medium | |
US11354774B2 (en) | Facial model mapping with a neural network trained on varying levels of detail of facial scans | |
CN113781613A (en) | Expression driving method and system and computer equipment | |
CN110401810A (en) | Processing method, device, system, electronic equipment and the storage medium of virtual screen | |
WO2023217138A1 (en) | Parameter configuration method and apparatus, device, storage medium and product | |
CN112562056A (en) | Control method, device, medium and equipment for virtual light in virtual studio | |
CA3139657C (en) | Apparatus for multi-angle screen coverage analysis | |
CN118071898A (en) | Three-dimensional face generation and driving method based on Gaussian splatter method | |
CN116958344A (en) | Animation generation method and device for virtual image, computer equipment and storage medium | |
CN106651759A (en) | VR (Virtual Reality) scene optimization method and device based on fixed position camera | |
CN116863043A (en) | Face dynamic capture driving method and device, electronic equipment and readable storage medium | |
CN116863105A (en) | Method and related device for projecting three-dimensional image of human body in real physical scene | |
CN116095353A (en) | Live broadcast method and device based on volume video, electronic equipment and storage medium | |
CN116016837A (en) | Immersive virtual network conference method and device | |
WO2023076648A1 (en) | Extraction of user representation from video stream to a virtual environment | |
CN115984452A (en) | Head three-dimensional reconstruction method and equipment | |
CN114445529A (en) | Human face image animation method and system based on motion and voice characteristics | |
CN113823133A (en) | Data exchange system combining virtual reality technology and educational training | |
KR20060040118A (en) | Method and appartus for producing customized three dimensional animation and system for distributing thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20230725