CN117635830A - Construction method, system, server and storage medium of meta-universe scene - Google Patents
- Publication number
- CN117635830A (application number CN202311595119.3A)
- Authority
- CN
- China
- Prior art keywords
- scene
- pictures
- live
- file
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a construction method, system, server and storage medium for a meta-universe scene. The method comprises: constructing a three-dimensional space model of the meta-universe scene corresponding to a live-action scene from a plurality of multi-angle live-action scene pictures uploaded by a user; receiving an original sound file and a recorded file uploaded by the user, wherein the recorded file is a file recorded by the user while the original sound file is played in the live-action scene; performing sound rendering processing on the original sound file in the three-dimensional space model; and performing spectrum analysis and comparison on the recorded file and the rendered audio file, and judging from the comparison result whether the constructed three-dimensional space model is accurate. While faithfully reproducing reality, the invention greatly reduces the dependence on UI designers, and after the three-dimensional space model of the meta-universe scene is created, sound-space verification of the model is performed according to the audio files, ensuring the accuracy of the constructed three-dimensional space model.
Description
Technical Field
The embodiments of the invention relate to the technical field of three-dimensional modeling, and in particular to a construction method, system, server and storage medium for a meta-universe scene.
Background
In the prior art, the creation of a meta-universe scene relies mainly on a user interface (UI) designer to provide and draw materials, and the scene is mostly constructed speculatively, lacking correspondence with reality.
Disclosure of Invention
The embodiments of the invention provide a construction method, system, server and storage medium for a meta-universe scene, to solve the problem that most existing meta-universe scenes are constructed speculatively and lack correspondence with reality.
In order to solve the technical problems, the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a method for constructing a meta-universe scene, including:
constructing a three-dimensional space model of a meta-universe scene corresponding to a live-action scene according to a plurality of multi-angle live-action scene pictures uploaded by a user;
receiving an original sound file and a recorded file uploaded by a user, wherein the recorded file is a file recorded by the user when the original sound file is played in the live-action scene;
performing sound rendering processing on the original sound file in the three-dimensional space model to obtain a rendered audio file;
and carrying out spectrum analysis and comparison on the recorded file and the rendered audio file, and judging whether the constructed three-dimensional space model is accurate or not according to the comparison result.
Optionally, the live-action scene picture includes at least one of the following: a picture taken by the user, a picture uploaded by the user from a local album, and a picture selected by the user from pictures in a picture library that match the live-action scene, wherein the pictures in the picture library include at least one of the following: pictures acquired by web crawlers and pictures uploaded by users.
Optionally, the method further comprises:
constructing a picture library, wherein the constructing the picture library comprises:
performing rough classification on target pictures acquired by a web crawler and/or uploaded by a user, wherein the rough classification comprises the following steps: labeling a scene major class for part of the target pictures in the target pictures; extracting the digital characteristics of the target pictures of the marked scene major categories and the target pictures to be classified by using a neural network model, calculating the distance between the target pictures to be classified and the target pictures of each type of marked scene major categories, and determining the scene major categories to which the target pictures to be classified belong according to the distance;
performing fine classification on the roughly classified target pictures, wherein the fine classification comprises the following steps: performing object recognition on a first target picture and a second target picture belonging to a target scene major class to obtain their object lists, and extracting the regularly-shaped objects in the first target picture and the second target picture; if the ratio of the number of objects in the intersection of the object lists of the first target picture and the second target picture to the number of objects in their union is greater than or equal to a first preset proportion, determining that the first target picture and the second target picture belong to the same scene subclass; if the intersection is empty, determining that the first target picture and the second target picture do not belong to the same scene subclass; and if the ratio of the intersection to the union is smaller than the first preset proportion, comparing the similarity of the regularly-shaped objects in the intersection, and if the similarity of the regularly-shaped objects in the intersection is greater than or equal to a second preset proportion, determining that the first target picture and the second target picture belong to the same scene subclass.
Optionally, the method further comprises:
acquiring a related text in a webpage where a picture acquired by a web crawler is located;
and analyzing the text, and determining scene classification of the pictures acquired by the web crawlers according to an analysis result.
Optionally, the constructing the three-dimensional space model of the meta-universe scene corresponding to the live-action scene according to the multiple multi-angle live-action scene pictures uploaded by the user includes:
performing object recognition on the multiple multi-angle live-action scene pictures to obtain an object list of each live-action scene picture;
merging object lists of the live-action scene pictures under the same scene classification to obtain an object list set;
determining size data of objects in each of the live-action scene pictures under the same scene classification;
determining target size data of each object in the object list set according to all size data of each object in the object list set in different live-action scene pictures;
and constructing a three-dimensional space model of the meta-universe scene corresponding to the real scene according to the target size data of each object in the multi-angle real scene pictures.
Optionally, constructing a three-dimensional space model of a meta-universe scene corresponding to the live-action scene according to target size data of each object in the multiple multi-angle live-action scene pictures, including:
acquiring the outer outline and the inner outline of an object in the live-action scene picture;
acquiring depth information of an object in the live-action scene picture;
and constructing a three-dimensional space model of the metauniverse scene corresponding to the live-action scene according to the target size data, the outer contour, the inner contour and the depth information of the object in the live-action scene picture.
Optionally, performing sound rendering processing on the original sound file in the three-dimensional space model to obtain a rendered audio file, including:
determining the space coordinates of a sound source corresponding to the original sound file and a listener corresponding to the recorded file in the three-dimensional space model, wherein the space coordinates of the sound source correspond to the position of a user when the original sound file is played in the live-action scene, and the space coordinates of the listener correspond to the position of the user when the recorded file is recorded in the live-action scene;
and carrying out sound rendering processing on the original sound file in the three-dimensional space model according to the space coordinates of the sound source corresponding to the original sound file and the listener corresponding to the recorded file in the three-dimensional space model to obtain a rendered audio file.
In a second aspect, an embodiment of the present invention provides a system for constructing a meta-cosmic scene, including:
the first construction module is used for constructing a three-dimensional space model of a meta-universe scene corresponding to the live-action scene according to a plurality of multi-angle live-action scene pictures uploaded by a user;
the receiving module is used for receiving an original sound file and a recorded file uploaded by a user, wherein the recorded file is a file recorded by the user when the original sound file is played in the live-action scene;
the rendering module is used for performing sound rendering processing on the original sound file in the three-dimensional space model to obtain a rendered audio file;
and the verification module is used for carrying out spectrum analysis and comparison on the recorded file and the rendered audio file, and judging whether the constructed three-dimensional space model is accurate or not according to the comparison result.
In a third aspect, an embodiment of the present invention provides a server, including: a processor, a memory, and a program stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the method of constructing a metauniverse scene as described in the first aspect above.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method for constructing a metauniverse scene as described in the first aspect above.
In the embodiment of the invention, the three-dimensional space model of the meta-universe scene corresponding to the live-action scene is constructed from a plurality of multi-angle live-action scene pictures uploaded by the user, which greatly reduces the dependence on UI designers while faithfully reproducing reality; and after the three-dimensional space model of the meta-universe scene is created, sound-space verification of the model is performed according to the audio files, ensuring the accuracy of the constructed three-dimensional space model.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a schematic flow chart of a method for constructing a meta-universe according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a user interface for a user to upload live-action scene pictures in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of a user interface for saving a constructed three-dimensional space model in accordance with an embodiment of the present invention;
FIGS. 4 and 5 are schematic diagrams of four-dimensional light field functions according to embodiments of the present invention;
FIG. 6 is a schematic diagram of a user interface for automatically fine-tuning a three-dimensional spatial model via audio data uploaded by a user in accordance with an embodiment of the present invention;
FIG. 7 is a schematic diagram of a construction system of a meta-universe according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, an embodiment of the present invention provides a method for constructing a meta-universe scene, including:
step 11: constructing a three-dimensional space model of a meta-universe scene corresponding to a live-action scene according to a plurality of multi-angle live-action scene pictures uploaded by a user;
in the embodiment of the invention, the multiple multi-angle live-action scene pictures are pictures in the same live-action scene space.
In an embodiment of the present invention, the angles may include at least one of: a front view, a left side view, a right side view, and the like.
Step 12: receiving an original sound file and a recorded file uploaded by a user, wherein the recorded file is a file recorded by the user when the original sound file is played in the live-action scene;
step 13: performing sound rendering processing on the original sound file in the three-dimensional space model to obtain a rendered audio file;
step 14: and carrying out spectrum analysis and comparison on the recorded file and the rendered audio file, and judging whether the constructed three-dimensional space model is accurate or not according to the comparison result.
In the embodiment of the invention, the three-dimensional space model of the meta-universe scene corresponding to the live-action scene is constructed from a plurality of multi-angle live-action scene pictures uploaded by the user, which greatly reduces the dependence on UI designers while faithfully reproducing reality; and after the three-dimensional space model of the meta-universe scene is created, sound-space verification of the model is performed according to the audio files, ensuring the accuracy of the constructed three-dimensional space model.
In the embodiment of the present invention, as shown in fig. 2, a user interface may be provided for a user to upload a live-action scene picture. The user may perform the following operations at the user interface:
1. The user may enter a scene name in the text entry box, such as a certain palace or a certain gate, or another custom scene name such as "my ideal castle", or, more specifically, a study room in a certain palace, etc.
2. And uploading the live-action scene picture by the user.
In an embodiment of the present invention, the live-action scene picture includes at least one of the following: a picture taken by the user, a picture uploaded by the user from a local album, and a picture selected by the user from pictures in the picture library that match the live-action scene.
That is, the user can upload a live-action scene picture by taking a photograph or browsing a local album. If the user chooses to use the picture library, the system searches the picture library for pictures matching the scene name entered in the first step; optionally, each picture in the picture library carries a scene classification label indicating the real scene to which the picture belongs. If pictures matching the scene name entered by the user are found in the picture library, they can be displayed in a browsing area on the right side of the user interface shown in fig. 2 for the user to select by clicking or dragging, and the user can browse more pictures by sliding and the like. If the scene name is a custom name such as "my ideal castle", there may be no matching picture in the library, in which case a prompt such as "no matching picture found" may be shown in the browsing area.
In the embodiment of the present invention, the angles of the live-action scene may include at least one of the following: a front view, a left side view, a right side view, and the like. For modeling accuracy, the number of pictures for each angle may be prescribed, for example at least two pictures per angle. In addition to photographs of the prescribed angles, the user may upload photographs of other angles.
3. And the user clicks a confirmation submitting button to upload the live-action scene picture to the server.
Optionally, before uploading to the server, simple data verification may also be performed, e.g., data verification includes at least one of: whether the number of pictures meets the standard, whether the scene name is input, and the like.
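As an illustration of such pre-upload checks, the following is a minimal sketch; the field names and the minimum-picture constant are assumptions for illustration rather than part of the embodiment.

```python
# Hypothetical pre-upload validation; names and constants are illustrative only.
MIN_PICTURES_PER_ANGLE = 2  # e.g. "at least two pictures per angle", as suggested above

def validate_submission(scene_name, pictures_by_angle):
    """Return a list of problems found before the pictures are uploaded."""
    errors = []
    if not scene_name or not scene_name.strip():
        errors.append("scene name is missing")
    for angle, pictures in pictures_by_angle.items():
        if len(pictures) < MIN_PICTURES_PER_ANGLE:
            errors.append(f"angle '{angle}' has fewer than {MIN_PICTURES_PER_ANGLE} pictures")
    return errors
```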
The method for constructing the picture library according to the embodiment of the invention is described below.
In an embodiment of the present invention, the pictures in the picture library include at least one of the following: pictures acquired by web crawlers and pictures uploaded by users.
When pictures are acquired through web crawlers, the scene can be set first, web picture crawling is performed according to the scene, the acquired pictures are downloaded into a specific local resource folder, and scene classification labels are set for the pictures.
In the embodiment of the present invention, referring to fig. 3, on the user interface for saving the constructed three-dimensional space model, the user may be asked whether to authorize sharing the pictures into the picture library. If the user does not click the authorization, then after the user clicks the completion button, the pictures uploaded on the user interface shown in fig. 2 are treated as temporary data and deleted in the background; if the user clicks the authorization, the pictures are uploaded to the server and, after subsequent processing (such as setting scene classification labels), added into the proprietary picture library.
In the embodiment of the invention, the pictures acquired by the web crawlers also need to be scene-classified and given scene classification labels; of course, for accuracy, the pictures uploaded by users may likewise be scene-classified and given scene classification labels.
In the embodiment of the invention, the construction of the picture library comprises the following steps:
1. performing rough classification on target pictures acquired by a web crawler and/or uploaded by a user, wherein the rough classification comprises the following steps:
1.1, labeling a scene major class for some of the target pictures; optionally, when manually labeling these pictures with scene major classes, it should be ensured that each scene major class contains no fewer than a preset number of pictures, for example no fewer than 5. The scene major classes may be, for example, five classes: the exterior of a certain palace, the buildings of the certain palace, the interior of the certain palace, others of the certain palace, and erroneous pictures.
1.2, extracting the digital characteristics of a target picture of a marked scene major class and a target picture to be classified by using a neural network model, calculating the distance between the target picture to be classified and the target picture of each marked scene major class, and determining the scene major class to which the target picture to be classified belongs according to the distance;
In the embodiment of the present invention, the distance may be the Euclidean distance, calculated as d(x, μ_i) = sqrt( Σ_j (x_j − μ_{i,j})² ), for i = 1, …, n,
where x is the digital feature of the target picture to be classified, μ_i is the mean of the digital features of the manually labeled pictures of the i-th class (the mean is taken because each class contains more than one picture), and n is the number of manually labeled scene major classes.
The target picture to be classified is assigned to the manually labeled scene major class whose mean feature vector is nearest to the digital feature of the picture.
Taking the five scene classes above as an example, after classification is completed, the picture resources under the "erroneous pictures" class can be deleted, and the remaining four classes of pictures proceed to subsequent processing.
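To illustrate the coarse classification above, the following is a minimal sketch of nearest-class-mean assignment over the extracted digital features; the feature extractor itself (the neural network model) is assumed to be given, and all names are illustrative.

```python
import numpy as np

def coarse_classify(features_to_classify, labeled_features):
    """Assign each unlabeled feature vector to the scene major class whose mean
    feature vector is nearest in Euclidean distance.

    features_to_classify: (m, d) array, digital features of pictures to classify.
    labeled_features: dict mapping scene major class name -> (k_i, d) array of
                      features of manually labeled pictures (e.g. k_i >= 5).
    """
    class_names = list(labeled_features)
    # Mean digital feature per manually labeled scene major class.
    class_means = np.stack([labeled_features[name].mean(axis=0) for name in class_names])

    labels = []
    for x in features_to_classify:
        distances = np.linalg.norm(class_means - x, axis=1)  # d(x, mu_i) for every class
        labels.append(class_names[int(np.argmin(distances))])
    return labels
```

Pictures that land in the "erroneous pictures" class can then be discarded before the fine classification.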
2. And carrying out fine classification on the roughly classified target pictures.
The fine classification includes:
2.1, carrying out object recognition on a first target picture and a second target picture belonging to a target scene major class to obtain an object list, and extracting regular-shaped objects in the first target picture and the second target picture;
optionally, the regular shape comprises at least one of: rectangular, circular, triangular.
2.2 if the ratio of the number of objects in the intersection of the object lists of the first target picture and the second target picture to the union of the object lists of the first target picture and the second target picture is greater than or equal to a first preset ratio (e.g. 40%), determining that the first target picture and the second target picture belong to the same scene subclass;
if the intersection is empty, determining that the first target picture and the second target picture do not belong to the same scene subclass;
and if the ratio of the intersection to the union is smaller than the first preset proportion, comparing the similarity of the regularly-shaped objects in the intersection, and if the similarity of the regularly-shaped objects in the intersection is greater than or equal to a second preset proportion (e.g. 60%), determining that the first target picture and the second target picture belong to the same scene subclass.
Taking a scene major class of a certain internal environment as an example, the scene minor class obtained after the fine classification can include: restaurants, study rooms, bedrooms, kitchens, toilets, and the like.
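The pairwise subclass decision described above can be sketched as follows; the 40% and 60% thresholds are taken from the examples in the text, and the routine assumes that a high similarity of the shared regular-shaped objects indicates the same subclass.

```python
def same_scene_subclass(objects_a, objects_b, regular_shape_similarity,
                        first_ratio=0.4, second_ratio=0.6):
    """Decide whether two pictures of the same scene major class belong to the
    same scene subclass, based on their recognized object lists.

    objects_a, objects_b: sets of object names recognized in the two pictures.
    regular_shape_similarity: callable taking the shared objects and returning a
        similarity in [0, 1] for the regular-shaped objects among them.
    """
    intersection = objects_a & objects_b
    union = objects_a | objects_b

    if not intersection:                                  # no shared objects at all
        return False
    if len(intersection) / len(union) >= first_ratio:     # object lists overlap enough
        return True
    # Small overlap: fall back to comparing the shared regular-shaped objects.
    return regular_shape_similarity(intersection) >= second_ratio
```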
In the embodiment of the present invention, optionally, the method for constructing a meta-universe scene further includes: acquiring the related text in the webpage where a picture acquired by the web crawler is located; and analyzing the text (for example by word segmentation) and determining the scene classification of the picture according to the analysis result. This back-tracing approach is suitable for the case where the scene classification of a picture cannot be determined by fine classification. For example, assuming that the picture ImgA is crawled from website h1, the text in the tagged content (such as <p> or <div>) immediately before and after the <img> tag of ImgA can be extracted and recorded as ImgA_h1_T0, and word segmentation is performed on this text to obtain the nouns in it. If a scene classification appears in the noun list, the label of the picture is found by analyzing the grammatical structure of the sentence; if not, the noun list is analyzed directly and a scene classification label is assigned by combining the context. If the scene classification labels of pictures ImgA and ImgB are the same, the two pictures can be correctly determined to be of the same type even if their object lists do not overlap.
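A minimal sketch of this back-tracing step is shown below; it assumes BeautifulSoup for parsing (the embodiment does not name a parser) and replaces the word-segmentation and grammatical analysis with a simple label lookup in the surrounding text.

```python
from bs4 import BeautifulSoup

def scene_labels_from_page(html, image_url, known_scene_labels):
    """Candidate scene labels for an image taken from the text around its <img> tag."""
    soup = BeautifulSoup(html, "html.parser")
    img = soup.find("img", src=image_url)
    if img is None:
        return []

    # Text of the nearest <p>/<div> blocks before and after the <img> tag
    # (recorded as ImgA_h1_T0 in the example above).
    nearby = []
    for tag in (img.find_previous(["p", "div"]), img.find_next(["p", "div"])):
        if tag is not None:
            nearby.append(tag.get_text(" ", strip=True))
    context = " ".join(nearby)

    # A full implementation would segment words and analyze the sentence structure;
    # here we only check which known scene classification labels occur in the text.
    return [label for label in known_scene_labels if label in context]
```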
In the embodiment of the present invention, optionally, constructing a three-dimensional space model of a meta-universe scene corresponding to a live-action scene according to a plurality of multi-angle live-action scene pictures uploaded by a user includes:
step 111: performing object recognition on the multiple multi-angle live-action scene pictures to obtain an object list of each live-action scene picture;
step 112: merging object lists of the live-action scene pictures under the same scene classification to obtain an object list set;
that is, the pictures belonging to the same scene classification are regarded as a whole, each picture is at a different angle of the whole, and different angle details and data are provided for drawing the whole.
The object list set for a certain scene classification can be expressed as ObjList = ObjList_1 ∪ ObjList_2 ∪ … ∪ ObjList_n,
where ObjList_i is the object list of the i-th picture under the scene classification (e.g., an "exterior wall" scene subclass under a certain building scene major class, or a "restaurant" scene subclass under the internal environment scene major class), and n is the total number of pictures under the scene classification.
Step 113: determining size data of objects in each of the live-action scene pictures under the same scene classification; the size data may include length, width, and height.
Assuming that the object list set ObjList = {a, b, c, d, e, f} and that the picture ImgA contains only the three objects {b, e, f}, then for the picture ImgA only the dimensions (length, width, height) of the three objects {b, e, f} are determined.
Step 114: determining target size data of each object in the object list set according to all size data of each object in the object list set in different live-action scene pictures;
in the embodiment of the invention, the adjacency list in the data structure can be used for storing the size data of the object.
For example, for an object a in the object list set of a certain scene classification, ka (1 ≤ ka ≤ n) groups of size data can be obtained after step 113 (n is the total number of pictures under the scene classification); similarly, for an object b in the object list set of the scene classification, kb (1 ≤ kb ≤ n) groups of size data can be obtained, and ka has no quantitative relation with kb.
In the embodiment of the present invention, determining the target size data of each object in the object list set may be performed as follows: normalize all size data of the target object in the object list set, compute the mode and the mean of the normalized size data, and when the proportion of samples equal to the mode exceeds a preset threshold, directly use the mode as the normalized size data of the object; otherwise, use the average of the mode and the mean as the normalized size data of the object.
For example, the ka groups of size data of object a are each normalized (for example, the width is unified to 100), completing the unification of scales; although the same object may appear at different sizes in different pictures, its aspect ratios remain similar. The mode and the mean of the length (or height, or width) over the normalized ka groups are computed; when the proportion of samples equal to the mode exceeds 50%, the mode is directly used as the normalized length (or height, or width) of the object; otherwise, the average of the mode and the mean is used.
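The normalization and mode/mean fusion can be sketched as follows; the width-to-100 normalization and the 50% share are taken from the example above, while rounding before taking the mode is an added assumption so that a mode of real-valued measurements is meaningful.

```python
from collections import Counter

def normalize_sizes(size_groups, target_width=100.0):
    """Rescale each (length, width, height) observation so that width == 100."""
    normalized = []
    for length, width, height in size_groups:
        scale = target_width / width
        normalized.append((length * scale, target_width, height * scale))
    return normalized

def fuse_dimension(values, mode_share_threshold=0.5, precision=1):
    """Fuse one dimension of an object observed in several normalized pictures."""
    rounded = [round(v, precision) for v in values]      # bin values so a mode exists
    mode_value, mode_count = Counter(rounded).most_common(1)[0]
    if mode_count / len(rounded) > mode_share_threshold:
        return mode_value                                # the mode clearly dominates
    mean_value = sum(rounded) / len(rounded)
    return (mode_value + mean_value) / 2                 # otherwise average mode and mean
```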
Step 115: and constructing a three-dimensional space model of the meta-universe scene corresponding to the real scene according to the target size data of each object in the multi-angle real scene pictures.
Because the live-action scene is usually a real building such as a famous venue, part of its real data can be obtained, and by comparing this partial real data with the obtained target size data (estimated size data) of the objects, the estimated size data can be corrected to obtain the real size data of the objects. Assuming the estimated size data of object a is (La, 100, Ha) and that of object b is (Lb, 100, Hb), the real size data of object b can be derived proportionally even when only the real size data of object a is known.
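A sketch of this correction is given below, assuming one object whose real height is known; the proportional-scaling rule is an interpretation of "mapping" the real size data from object a to object b.

```python
def scale_to_real_sizes(estimated_sizes, reference_object, reference_real_height):
    """Derive real sizes for all objects from one object with a known real height.

    estimated_sizes: dict mapping object name -> (length, width, height) on the
                     normalized scale, e.g. {"a": (La, 100.0, Ha), "b": (Lb, 100.0, Hb)}.
    """
    estimated_height = estimated_sizes[reference_object][2]
    scale = reference_real_height / estimated_height      # common scale factor
    return {name: tuple(dim * scale for dim in dims)
            for name, dims in estimated_sizes.items()}
```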
In the embodiment of the present invention, optionally, constructing a three-dimensional space model of the meta-universe scene corresponding to the live-action scene according to the target size data of each object in the multiple multi-angle live-action scene pictures includes:
step 116: acquiring the outer outline and the inner outline of an object in the live-action scene picture;
Acquiring the outer contour of an object means tracing the same object (which appears in multiple pictures) under the same scene classification (the fine classification obtained earlier, i.e. the scene subclass). Optionally, the recognition frame from object recognition is enlarged by a preset multiple (for example, 1.3 times) and the picture is cropped accordingly (the multiple can be adjusted according to the actual situation, but it must be greater than 1 so that the frame covers a sufficient area), giving a main image of the object that contains almost no other objects; image segmentation is then performed with a neural-network segmentation (unseg) model to obtain the outer contour of the object on the two-dimensional plane.
The inner contour of an object may be acquired as follows: following the idea of active contour models and edge detection algorithms, a curve is first defined and an energy function is derived from the picture data; minimizing the energy function drives the curve to change, and edge points are detected with first- or second-order derivatives by exploiting the discontinuity of pixel values between adjacent regions; the curve gradually approaches the target edge until it is found, yielding the inner contour of the object on the two-dimensional plane.
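The contour extraction can be sketched with conventional OpenCV operations as below; the neural-network segmentation model is assumed to be given (only its binary mask is used here), and the Canny edge detector stands in for the active-contour/edge-detection step described above.

```python
import cv2
import numpy as np

def crop_around_box(image, box, factor=1.3):
    """Crop the picture around a recognition box enlarged by `factor` (must be > 1)."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    half_w, half_h = w * factor / 2.0, h * factor / 2.0
    x0, y0 = int(max(cx - half_w, 0)), int(max(cy - half_h, 0))
    x1, y1 = int(min(cx + half_w, image.shape[1])), int(min(cy + half_h, image.shape[0]))
    return image[y0:y1, x0:x1]

def outer_contour(segmentation_mask):
    """Outer contour of the object: the largest external contour of its binary mask
    (OpenCV 4 return convention assumed)."""
    contours, _ = cv2.findContours(segmentation_mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea) if contours else None

def inner_edges(gray_crop, low=50, high=150):
    """Edge map inside the crop, standing in for the active-contour refinement."""
    return cv2.Canny(gray_crop, low, high)
```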
Step 117: acquiring depth information of an object in the live-action scene picture;
Alternatively, an epipolar plane image (EPI), i.e. a two-dimensional slice, may be obtained using a four-dimensional light field function, which includes a spatial dimension and an angular dimension; an object point in the scene appears in the EPI as an inclined straight line whose inclination is proportional to the distance from the object point to the camera. Accordingly, the depth information of the object can be acquired from the slope of the line corresponding to the object point.
Referring to figs. 4 and 5, the principle of the four-dimensional light field function is as follows: a light ray passes through two points on two parallel planes, with coordinates (u, v) and (s, t), the distance between the two planes being F; the two-dimensional angular information of the ray is given by the position coordinates of the two points, so the ray can be represented by a four-dimensional light field function LF(s, t, u, v). The relationship between the light field function LF(s, t, u, v) defined by the lens plane and the sensor plane and the light field function LF'(s, t, u, v) defined by the lens plane and the refocusing plane can be obtained by a geometric transformation, the propagation of the four-dimensional light field in space corresponding to a shear transformation. According to the four-dimensional light field function, the illumination produced at a point on the (s, t) plane by the radiance of rays from the (u, v) plane can be expressed as an integral over the ray radiance, from which a two-dimensional slice expression can be obtained.
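A highly simplified sketch of recovering depth from the slope of the line an object point traces in the EPI is given below; how the slope maps to metric distance depends on the plane separation F and the sampling geometry, so the calibration constant is purely illustrative.

```python
import numpy as np

def epi_line_slope(view_positions, image_positions):
    """Least-squares slope of the straight line that one object point traces in an
    (s, u) EPI slice, given its image position in each sampled view."""
    slope, _intercept = np.polyfit(view_positions, image_positions, deg=1)
    return slope

def depth_from_slope(slope, calibration_constant):
    """Per the description above, the inclination of the EPI line is taken to be
    proportional to the object-to-camera distance; the constant must be calibrated."""
    return calibration_constant * slope
```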
Step 118: and constructing a three-dimensional space model of the metauniverse scene corresponding to the live-action scene according to the target size data, the outer contour, the inner contour and the depth information of the object in the live-action scene picture.
In the embodiment of the present invention, optionally, performing sound rendering processing on the original sound file in the three-dimensional space model to obtain a rendered audio file, including:
determining the space coordinates of a sound source corresponding to the original sound file and a listener corresponding to the recorded file in the three-dimensional space model, wherein the space coordinates of the sound source correspond to the position of a user when the original sound file is played in the live-action scene, and the space coordinates of the listener correspond to the position of the user when the recorded file is recorded in the live-action scene;
and carrying out sound rendering processing on the original sound file in the three-dimensional space model according to the space coordinates of the sound source corresponding to the original sound file and the listener corresponding to the recorded file in the three-dimensional space model to obtain a rendered audio file.
Referring to fig. 6, fig. 6 is a schematic diagram of a user interface for automatically fine-tuning the three-dimensional space model using audio data uploaded by the user. On this user interface, the user can place the sound source (the small sun in fig. 6) and the listener (the smiling face in fig. 6) into the three-dimensional space model by dragging their icons. During dragging, the (X, Y, Z) coordinates on the right are updated in real time. Besides dragging, the user may also enter specific values for (X, Y, Z) on the right to determine the spatial coordinates of the sound source and the listener within the three-dimensional space model. The spatial coordinates of the sound source correspond to the position where the original sound file was played in the live-action scene, and the spatial coordinates of the listener correspond to the position where the recorded file was recorded in the live-action scene. In addition, during dragging, it can be checked whether the distance between the two points set by the user exceeds a minimum-distance threshold; if it does not, the user is prompted to reset the positions. The user then submits the recorded file and the original sound file by browsing and uploading, and clicks the confirm-optimization button to verify the three-dimensional space model.
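The embodiment does not specify the acoustic renderer; the following is a minimal direct-path sketch (propagation delay plus 1/r attenuation only), ignoring the reflections and occlusion that a full room-acoustics renderer would derive from the three-dimensional space model.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air

def render_direct_path(original, sample_rate, source_pos, listener_pos):
    """Rough direct-path rendering of the original sound at the listener position."""
    distance = float(np.linalg.norm(np.asarray(source_pos, dtype=float)
                                    - np.asarray(listener_pos, dtype=float)))
    delay_samples = int(round(distance / SPEED_OF_SOUND * sample_rate))
    gain = 1.0 / max(distance, 1.0)      # clamp so the gain stays bounded near the source
    rendered = np.zeros(len(original) + delay_samples, dtype=float)
    rendered[delay_samples:] = np.asarray(original, dtype=float) * gain
    return rendered
```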
In the embodiment of the present invention, optionally, multiple groups of positions may be used for playing the original sound file and recording the recorded file in the live-action scene.
The following example is illustrative. First, several groups of two-point positions in the scene are taken as acquisition points for the recorded file, including but not limited to the following groups: <boundary, opposite boundary>, <boundary, center point>. Next, the original sound file is played at one of the two points and a microphone is placed at the other point to record, giving the recorded file. In the embodiment of the present invention, the length of the recorded file may be set as required, for example between 30 seconds and 1 minute of audio data. The acquired recorded file is subjected to spectrum analysis and sampling and recorded as the data Table. In the constructed three-dimensional space model, the spatial coordinates of the sound source corresponding to the original sound file and of the listener corresponding to the recorded file are determined, where the spatial coordinates of the sound source correspond to the position where the original sound file was played in the live-action scene and the spatial coordinates of the listener correspond to the position where the recorded file was recorded; after the spatial coordinates are determined, sound rendering processing is performed on the original sound file. The data generated by the rendering is subjected to spectrum analysis and sampling and recorded as the data Test. The similarity of the data Table and the data Test is then compared: if they are very close, the three-dimensional space model is judged to have been constructed accurately; if they differ greatly, the three-dimensional space model has not been constructed accurately. In this way, the accuracy of the construction of the three-dimensional space model can be effectively ensured.
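The spectrum comparison can be sketched as below; the cosine similarity of magnitude spectra and the 0.9 acceptance threshold are illustrative choices, since the embodiment only states that the two spectra are analyzed, sampled and compared for similarity.

```python
import numpy as np

def spectral_similarity(recorded, rendered):
    """Cosine similarity between the magnitude spectra of the two signals
    (the data Table and the data Test), truncated to a common length."""
    n = min(len(recorded), len(rendered))
    spec_a = np.abs(np.fft.rfft(np.asarray(recorded[:n], dtype=float)))
    spec_b = np.abs(np.fft.rfft(np.asarray(rendered[:n], dtype=float)))
    denom = float(np.linalg.norm(spec_a) * np.linalg.norm(spec_b))
    return float(spec_a @ spec_b) / denom if denom > 0.0 else 0.0

def model_is_accurate(recorded, rendered, threshold=0.9):
    """Accept the three-dimensional space model when the spectra are close enough."""
    return spectral_similarity(recorded, rendered) >= threshold
```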
In the embodiment of the invention, the three-dimensional space model constructed by checking the audio data can be automatically adjusted according to the analysis result by analyzing the difference between the recorded file and the rendered audio file.
In the embodiment of the invention, after the three-dimensional space model is constructed, the user can also manually adjust it: for example, a user interface is provided on which the three-dimensional space model is displayed, and after the user selects an object to be adjusted in the model, its length can be increased or decreased by operations such as sliding a slider; parameters such as color and position may be adjusted as well.
Referring to fig. 7, an embodiment of the present invention provides a construction system 70 for a meta-universe scene, including:
the first construction module 71 is configured to construct a three-dimensional space model of a meta-universe scene corresponding to a live-action scene according to a plurality of multi-angle live-action scene pictures uploaded by a user;
the receiving module 72 is configured to receive an original sound file and a recorded file uploaded by a user, wherein the recorded file is a file recorded by the user when the original sound file is played in the live-action scene;
a rendering module 73, configured to perform sound rendering processing on the original sound file in the three-dimensional space model, so as to obtain a rendered audio file;
and the verification module 74 is used for carrying out spectrum analysis and comparison on the recorded file and the rendered audio file, and judging whether the constructed three-dimensional space model is accurate or not according to the comparison result.
Optionally, the live-action scene picture includes at least one of the following: a picture taken by the user, a picture uploaded by the user from a local album, and a picture selected by the user from pictures in a picture library that match the live-action scene, wherein the pictures in the picture library include at least one of the following: pictures acquired by web crawlers and pictures uploaded by users.
Optionally, the building system 70 further includes:
the second construction module is configured to construct a picture library, where the constructing the picture library includes:
performing rough classification on target pictures acquired by a web crawler and/or uploaded by a user, wherein the rough classification comprises the following steps: labeling a scene major class for part of the target pictures in the target pictures; extracting the digital characteristics of the target pictures of the marked scene major categories and the target pictures to be classified by using a neural network model, calculating the distance between the target pictures to be classified and the target pictures of each type of marked scene major categories, and determining the scene major categories to which the target pictures to be classified belong according to the distance;
performing fine classification on the roughly classified target pictures, wherein the fine classification comprises the following steps: performing object recognition on a first target picture and a second target picture belonging to a target scene major class to obtain their object lists, and extracting the regularly-shaped objects in the first target picture and the second target picture; if the ratio of the number of objects in the intersection of the object lists of the first target picture and the second target picture to the number of objects in their union is greater than or equal to a first preset proportion, determining that the first target picture and the second target picture belong to the same scene subclass; if the intersection is empty, determining that the first target picture and the second target picture do not belong to the same scene subclass; and if the ratio of the intersection to the union is smaller than the first preset proportion, comparing the similarity of the regularly-shaped objects in the intersection, and if the similarity of the regularly-shaped objects in the intersection is greater than or equal to a second preset proportion, determining that the first target picture and the second target picture belong to the same scene subclass.
Optionally, the building system 70 further includes:
the acquisition module is used for acquiring related texts in the webpage where the pictures acquired by the web crawler are located;
and the determining module is used for analyzing the text and determining scene classification of the pictures acquired by the web crawlers according to the analysis result.
Optionally, the first building module 71 is configured to perform object recognition on the multiple multi-angle live-action scene pictures to obtain an object list of each of the live-action scene pictures; merging object lists of the live-action scene pictures under the same scene classification to obtain an object list set; determining size data of objects in each of the live-action scene pictures under the same scene classification; determining target size data of each object in the object list set according to all size data of each object in the object list set in different live-action scene pictures; and constructing a three-dimensional space model of the meta-universe scene corresponding to the real scene according to the target size data of each object in the multi-angle real scene pictures.
Optionally, the first building module 71 is configured to obtain an outer contour and an inner contour of an object in the live-action scene picture; acquiring depth information of an object in the live-action scene picture; and constructing a three-dimensional space model of the metauniverse scene corresponding to the live-action scene according to the target size data, the outer contour, the inner contour and the depth information of the object in the live-action scene picture.
Optionally, the rendering module 73 is configured to determine a spatial coordinate of a sound source corresponding to the original sound file and a listener corresponding to the recording file in the three-dimensional space model, where the spatial coordinate of the sound source corresponds to a position of a user when the original sound file is played in the live-action scene, and the spatial coordinate of the listener corresponds to a position of the user when the recording file is recorded in the live-action scene; and carrying out sound rendering processing on the original sound file in the three-dimensional space model according to the space coordinates of the sound source corresponding to the original sound file and the listener corresponding to the recorded file in the three-dimensional space model to obtain a rendered audio file.
Referring to fig. 8, the embodiment of the present invention further provides a server 80, which includes a processor 81, a memory 82, and a computer program stored in the memory 82 and capable of running on the processor 81; when executed by the processor 81, the computer program implements each process of the above embodiment of the method for constructing a meta-universe scene and can achieve the same technical effects, which are not repeated here to avoid repetition.
The embodiment of the invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements each process of the above embodiment of the method for constructing a meta-universe scene and can achieve the same technical effects, which are not repeated here to avoid repetition. The computer-readable storage medium may be, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present invention and the scope of the claims, which are to be protected by the present invention.
Claims (10)
1. The construction method of the meta-universe scene is characterized by comprising the following steps of:
constructing a three-dimensional space model of a meta-universe scene corresponding to a live-action scene according to a plurality of multi-angle live-action scene pictures uploaded by a user;
receiving an original sound file and a recorded file uploaded by a user, wherein the recorded file is a file recorded by the user when the original sound file is played in the live-action scene;
performing sound rendering processing on the original sound file in the three-dimensional space model to obtain a rendered audio file;
and carrying out spectrum analysis and comparison on the recorded file and the rendered audio file, and judging whether the constructed three-dimensional space model is accurate or not according to the comparison result.
2. The method of claim 1, wherein the live-action scene picture comprises at least one of the following: a picture taken by the user, a picture uploaded by the user from a local album, and a picture selected by the user from pictures in a picture library that match the live-action scene, wherein the pictures in the picture library comprise at least one of the following: pictures acquired by web crawlers and pictures uploaded by users.
3. The method as recited in claim 2, further comprising:
constructing a picture library, wherein the constructing the picture library comprises:
performing rough classification on target pictures acquired by a web crawler and/or uploaded by a user, wherein the rough classification comprises the following steps: labeling a scene major class for part of the target pictures in the target pictures; extracting the digital characteristics of the target pictures of the marked scene major categories and the target pictures to be classified by using a neural network model, calculating the distance between the target pictures to be classified and the target pictures of each type of marked scene major categories, and determining the scene major categories to which the target pictures to be classified belong according to the distance;
performing fine classification on the roughly classified target pictures, wherein the fine classification comprises the following steps: performing object recognition on a first target picture and a second target picture belonging to a target scene major class to obtain their object lists, and extracting the regularly-shaped objects in the first target picture and the second target picture; if the ratio of the number of objects in the intersection of the object lists of the first target picture and the second target picture to the number of objects in their union is greater than or equal to a first preset proportion, determining that the first target picture and the second target picture belong to the same scene subclass; if the intersection is empty, determining that the first target picture and the second target picture do not belong to the same scene subclass; and if the ratio of the intersection to the union is smaller than the first preset proportion, comparing the similarity of the regularly-shaped objects in the intersection, and if the similarity of the regularly-shaped objects in the intersection is greater than or equal to a second preset proportion, determining that the first target picture and the second target picture belong to the same scene subclass.
4. A method according to claim 3, further comprising:
acquiring a related text in a webpage where a picture acquired by a web crawler is located;
and analyzing the text, and determining scene classification of the pictures acquired by the web crawlers according to an analysis result.
5. The method of claim 1, wherein constructing a three-dimensional spatial model of a metauniverse scene corresponding to the live-action scene from a plurality of multi-angle live-action scene pictures uploaded by a user comprises:
performing object recognition on the multiple multi-angle live-action scene pictures to obtain an object list of each live-action scene picture;
merging object lists of the live-action scene pictures under the same scene classification to obtain an object list set;
determining size data of objects in each of the live-action scene pictures under the same scene classification;
determining target size data of each object in the object list set according to all size data of each object in the object list set in different live-action scene pictures;
and constructing a three-dimensional space model of the meta-universe scene corresponding to the live-action scene according to the target size data of each object in the multi-angle live-action scene pictures.
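As a sketch of the size-aggregation step in claim 5 above, the target size of each object can be derived from its per-picture size estimates; taking the per-dimension median is an assumption made here for illustration, as the claim only requires determining one target size from the size data in different pictures:

```python
# Merge per-picture object lists into an object list set and aggregate the
# per-picture size estimates of each object into a single target size.
from statistics import median
from collections import defaultdict

def merge_object_sizes(per_picture_sizes):
    """per_picture_sizes: list of {object_name: (width, height, depth) in metres}."""
    collected = defaultdict(list)
    for picture in per_picture_sizes:
        for name, size in picture.items():
            collected[name].append(size)
    # The object list set is the union of all per-picture object lists.
    return {
        name: tuple(median(dim) for dim in zip(*sizes))
        for name, sizes in collected.items()
    }

pictures = [
    {"table": (1.60, 0.75, 0.80), "chair": (0.45, 0.90, 0.45)},
    {"table": (1.55, 0.74, 0.82), "window": (1.20, 1.40, 0.05)},
]
print(merge_object_sizes(pictures))
```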
6. The method of claim 5, wherein constructing a three-dimensional spatial model of a metauniverse scene corresponding to the live-action scene from target size data of each object in the plurality of multi-angle live-action scene pictures comprises:
acquiring the outer outline and the inner outline of an object in the live-action scene picture;
acquiring depth information of an object in the live-action scene picture;
and constructing a three-dimensional space model of the metauniverse scene corresponding to the live-action scene according to the target size data, the outer contour, the inner contour and the depth information of the object in the live-action scene picture.
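A rough sketch of how the inputs named in claim 6 above could be obtained with OpenCV (version 4 or later is assumed): the outer and inner contours come from a binary object mask, and the depth is summarised from a depth map produced by any monocular depth estimator; the mask source and the depth model are assumptions, not part of the claim:

```python
import cv2
import numpy as np

def extract_contours(object_mask):
    """object_mask: uint8 binary mask of one object in the live-action scene picture."""
    contours, hierarchy = cv2.findContours(
        object_mask, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
    if hierarchy is None:
        return [], []
    outer, inner = [], []
    for contour, h in zip(contours, hierarchy[0]):
        # With RETR_CCOMP, h[3] == -1 marks a top-level (outer) contour;
        # anything else is a hole, i.e. an inner contour.
        (outer if h[3] == -1 else inner).append(contour)
    return outer, inner

def object_depth(depth_map, object_mask):
    """Median depth of the masked pixels; depth_map may come from any estimator."""
    return float(np.median(depth_map[object_mask > 0]))
```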
7. The method of claim 1, wherein performing sound rendering processing on the original sound file in the three-dimensional space model to obtain a rendered audio file comprises:
determining the space coordinates of a sound source corresponding to the original sound file and a listener corresponding to the recorded file in the three-dimensional space model, wherein the space coordinates of the sound source correspond to the position of a user when the original sound file is played in the live-action scene, and the space coordinates of the listener correspond to the position of the user when the recorded file is recorded in the live-action scene;
and carrying out sound rendering processing on the original sound file in the three-dimensional space model according to the space coordinates of the sound source corresponding to the original sound file and the listener corresponding to the recorded file in the three-dimensional space model to obtain a rendered audio file.
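A minimal sketch of the rendering step in claim 7 above is given below: the sound source and the listener are placed at their space coordinates and the source signal is attenuated and delayed according to their distance. A full implementation would also model the room geometry (reflections, reverberation) of the three-dimensional space model; the coordinates and signal handling here are illustrative assumptions:

```python
# Distance-based attenuation plus propagation delay between the sound source
# position and the listener position inside the three-dimensional space model.
import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second

def render_mono(dry_signal, sample_rate, source_pos, listener_pos):
    source = np.asarray(source_pos, dtype=float)
    listener = np.asarray(listener_pos, dtype=float)
    distance = np.linalg.norm(source - listener)
    gain = 1.0 / max(distance, 1.0)                        # inverse-distance attenuation
    delay_samples = int(round(distance / SPEED_OF_SOUND * sample_rate))
    rendered = np.zeros(len(dry_signal) + delay_samples)
    rendered[delay_samples:] = gain * dry_signal           # delayed, attenuated copy
    return rendered

# e.g. source on a stage, listener three metres away at the same height
out = render_mono(np.random.randn(48000), 48000, (0.0, 0.0, 1.5), (3.0, 0.0, 1.5))
```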
8. A system for constructing a meta-universe scene, comprising:
the first construction module is used for constructing a three-dimensional space model of a meta-universe scene corresponding to the live-action scene according to a plurality of multi-angle live-action scene pictures uploaded by a user;
the receiving module is used for receiving an original sound file and a recorded file uploaded by a user, wherein the recorded file is a file recorded by the user when the original sound file is played in the live-action scene;
the rendering module is used for performing sound rendering processing on the original sound file in the three-dimensional space model to obtain a rendered audio file;
and the verification module is used for performing spectrum analysis on the recorded file and the rendered audio file, comparing the two, and determining whether the constructed three-dimensional space model is accurate according to the comparison result.
9. A server, comprising: a processor, a memory and a program stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the method of constructing a metauniverse scene as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the method of constructing a metauniverse scene according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311595119.3A CN117635830A (en) | 2023-11-27 | 2023-11-27 | Construction method, system, server and storage medium of meta-universe scene |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311595119.3A CN117635830A (en) | 2023-11-27 | 2023-11-27 | Construction method, system, server and storage medium of meta-universe scene |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117635830A (en) | 2024-03-01
Family
ID=90026402
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311595119.3A Pending CN117635830A (en) | 2023-11-27 | 2023-11-27 | Construction method, system, server and storage medium of meta-universe scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117635830A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118267718A (en) * | 2024-06-04 | 2024-07-02 | 四川物通科技有限公司 | Multi-scene operation management method and system based on meta universe |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10007867B2 (en) | Systems and methods for identifying entities directly from imagery | |
CN107833213B (en) | Weak supervision object detection method based on false-true value self-adaptive method | |
CN109582880B (en) | Interest point information processing method, device, terminal and storage medium | |
US9129191B2 (en) | Semantic object selection | |
US9129192B2 (en) | Semantic object proposal generation and validation | |
KR102406150B1 (en) | Method for creating obstruction detection model using deep learning image recognition and apparatus thereof | |
CN111062871A (en) | Image processing method and device, computer equipment and readable storage medium | |
US11704357B2 (en) | Shape-based graphics search | |
US20190114780A1 (en) | Systems and methods for detection of significant and attractive components in digital images | |
US10417833B2 (en) | Automatic 3D camera alignment and object arrangment to match a 2D background image | |
CN117635830A (en) | Construction method, system, server and storage medium of meta-universe scene | |
Wu et al. | Image completion with multi-image based on entropy reduction | |
US20230281350A1 (en) | A Computer Implemented Method of Generating a Parametric Structural Design Model | |
Li et al. | A method based on an adaptive radius cylinder model for detecting pole-like objects in mobile laser scanning data | |
CN114359590A (en) | NFT image work infringement detection method and device and computer storage medium | |
Slade et al. | Automatic semantic and geometric enrichment of CityGML building models using HOG-based template matching | |
Xiao et al. | Coupling point cloud completion and surface connectivity relation inference for 3D modeling of indoor building environments | |
CN111783561A (en) | Picture examination result correction method, electronic equipment and related products | |
CN112015937B (en) | Picture geographic positioning method and system | |
CN116415020A (en) | Image retrieval method, device, electronic equipment and storage medium | |
CN112132845B (en) | Method, device, electronic equipment and readable medium for singulating three-dimensional model | |
CN113128604A (en) | Page element identification method and device, electronic equipment and storage medium | |
CN111062388B (en) | Advertisement character recognition method, system, medium and equipment based on deep learning | |
KR20220036772A (en) | Personal record integrated management service connecting to repository | |
US9230366B1 (en) | Identification of dynamic objects based on depth data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |