CN117635830A - Construction method, system, server and storage medium of meta-universe scene - Google Patents
- Publication number
- CN117635830A (application number CN202311595119.3A)
- Authority
- CN
- China
- Prior art keywords
- scene
- pictures
- live
- file
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a construction method, system, server and storage medium for a meta-universe scene. The method comprises: constructing a three-dimensional space model of the meta-universe scene corresponding to a live-action scene from a plurality of multi-angle live-action scene pictures uploaded by a user; receiving an original sound file and a recorded file uploaded by the user, wherein the recorded file is a file recorded by the user while the original sound file is played in the live-action scene; performing sound rendering processing on the original sound file in the three-dimensional space model; and performing spectrum analysis and comparison on the recorded file and the rendered audio file, and judging from the comparison result whether the constructed three-dimensional space model is accurate. While faithfully reproducing reality, the invention greatly reduces the dependence on UI designers, and after the three-dimensional space model of the meta-universe scene is created, sound-space verification of the model is performed according to the audio files, ensuring the accuracy of the constructed three-dimensional space model.
Description
Technical Field
The embodiments of the invention relate to the technical field of three-dimensional modeling, and in particular to a construction method, system, server and storage medium for a meta-universe scene.
Background
In the prior art, the creation of a meta-universe scene relies mainly on a user interface (UI) designer to provide and draw materials, and the scene is mostly constructed speculatively, lacking correspondence with reality.
Disclosure of Invention
The embodiments of the invention provide a construction method, system, server and storage medium for a meta-universe scene, to solve the problem that most existing meta-universe scenes are constructed speculatively and lack correspondence with reality.
In order to solve the technical problems, the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a method for constructing a meta-universe scene, including:
constructing a three-dimensional space model of a meta-universe scene corresponding to a live-action scene according to a plurality of multi-angle live-action scene pictures uploaded by a user;
receiving an original sound file and a recorded file uploaded by a user, wherein the recorded file is a file recorded by the user when the original sound file is played in the live-action scene;
performing sound rendering processing on the original sound file in the three-dimensional space model to obtain a rendered audio file;
and carrying out spectrum analysis and comparison on the recorded file and the rendered audio file, and judging whether the constructed three-dimensional space model is accurate or not according to the comparison result.
Optionally, the live-action scene picture includes at least one of the following: a picture taken by the user, a picture uploaded by the user from a local album, and a picture selected by the user from pictures in a picture library that match the live-action scene, wherein the pictures in the picture library include at least one of the following: pictures acquired by web crawlers and pictures uploaded by users.
Optionally, the method further comprises:
constructing a picture library, wherein the constructing the picture library comprises:
performing rough classification on target pictures acquired by a web crawler and/or uploaded by a user, wherein the rough classification comprises the following steps: labeling a scene major class for part of the target pictures in the target pictures; extracting the digital characteristics of the target pictures of the marked scene major categories and the target pictures to be classified by using a neural network model, calculating the distance between the target pictures to be classified and the target pictures of each type of marked scene major categories, and determining the scene major categories to which the target pictures to be classified belong according to the distance;
performing fine classification on the roughly classified target pictures, wherein the fine classification comprises the following steps: performing object recognition on a first target picture and a second target picture belonging to a target scene major class to obtain their object lists, and extracting the regularly-shaped objects in the first target picture and the second target picture; if the ratio of the number of objects in the intersection of the object lists of the first target picture and the second target picture to the number of objects in their union is greater than or equal to a first preset proportion, determining that the first target picture and the second target picture belong to the same scene subclass; if the intersection is empty, determining that the first target picture and the second target picture do not belong to the same scene subclass; and if the ratio of the intersection to the union is smaller than the first preset proportion, comparing the similarity of the regularly-shaped objects in the intersection, and if the similarity of the regularly-shaped objects in the intersection is greater than or equal to a second preset proportion, determining that the first target picture and the second target picture belong to the same scene subclass.
Optionally, the method further comprises:
acquiring a related text in a webpage where a picture acquired by a web crawler is located;
and analyzing the text, and determining scene classification of the pictures acquired by the web crawlers according to an analysis result.
Optionally, the constructing the three-dimensional space model of the meta-universe scene corresponding to the live-action scene according to the multiple multi-angle live-action scene pictures uploaded by the user includes:
performing object recognition on the multiple multi-angle live-action scene pictures to obtain an object list of each live-action scene picture;
merging object lists of the live-action scene pictures under the same scene classification to obtain an object list set;
determining size data of objects in each of the live-action scene pictures under the same scene classification;
determining target size data of each object in the object list set according to all size data of each object in the object list set in different live-action scene pictures;
and constructing a three-dimensional space model of the meta-universe scene corresponding to the real scene according to the target size data of each object in the multi-angle real scene pictures.
Optionally, constructing a three-dimensional space model of a meta-universe scene corresponding to the live-action scene according to target size data of each object in the multiple multi-angle live-action scene pictures, including:
acquiring the outer outline and the inner outline of an object in the live-action scene picture;
acquiring depth information of an object in the live-action scene picture;
and constructing a three-dimensional space model of the metauniverse scene corresponding to the live-action scene according to the target size data, the outer contour, the inner contour and the depth information of the object in the live-action scene picture.
Optionally, performing sound rendering processing on the original sound file in the three-dimensional space model to obtain a rendered audio file, including:
determining the space coordinates of a sound source corresponding to the original sound file and a listener corresponding to the recorded file in the three-dimensional space model, wherein the space coordinates of the sound source correspond to the position of a user when the original sound file is played in the live-action scene, and the space coordinates of the listener correspond to the position of the user when the recorded file is recorded in the live-action scene;
and carrying out sound rendering processing on the original sound file in the three-dimensional space model according to the space coordinates of the sound source corresponding to the original sound file and the listener corresponding to the recorded file in the three-dimensional space model to obtain a rendered audio file.
In a second aspect, an embodiment of the present invention provides a system for constructing a meta-cosmic scene, including:
the first construction module is used for constructing a three-dimensional space model of a meta-universe scene corresponding to the live-action scene according to a plurality of multi-angle live-action scene pictures uploaded by a user;
the receiving module is used for receiving an original sound file and a recorded file uploaded by a user, wherein the recorded file is a file recorded by the user when the original sound file is played in the live-action scene;
the rendering module is used for performing sound rendering processing on the original sound file in the three-dimensional space model to obtain a rendered audio file;
and the verification module is used for carrying out spectrum analysis and comparison on the recorded file and the rendered audio file, and judging whether the constructed three-dimensional space model is accurate or not according to the comparison result.
In a third aspect, an embodiment of the present invention provides a server, including: a processor, a memory, and a program stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the method of constructing a metauniverse scene as described in the first aspect above.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method for constructing a metauniverse scene as described in the first aspect above.
In the embodiment of the invention, the three-dimensional space model of the meta-universe scene corresponding to the live-action scene is constructed from a plurality of multi-angle live-action scene pictures uploaded by the user, which greatly reduces the dependence on UI designers while faithfully reproducing reality; and after the three-dimensional space model of the meta-universe scene is created, sound-space verification of the model is performed according to the audio files, ensuring the accuracy of the constructed three-dimensional space model.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a schematic flow chart of a method for constructing a meta-universe according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a user interface for a user to upload live-action scene pictures in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of a user interface for saving a constructed three-dimensional space model in accordance with an embodiment of the present invention;
FIGS. 4 and 5 are schematic diagrams of four-dimensional light field functions according to embodiments of the present invention;
FIG. 6 is a schematic diagram of a user interface for automatically fine-tuning a three-dimensional spatial model via audio data uploaded by a user in accordance with an embodiment of the present invention;
FIG. 7 is a schematic diagram of a construction system of a meta-universe according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, an embodiment of the present invention provides a method for constructing a meta-universe scene, including:
step 11: constructing a three-dimensional space model of a meta-universe scene corresponding to a live-action scene according to a plurality of multi-angle live-action scene pictures uploaded by a user;
in the embodiment of the invention, the multiple multi-angle live-action scene pictures are pictures in the same live-action scene space.
In an embodiment of the present invention, the angles may include at least one of: a front view, a left side view, a right side view, and the like.
Step 12: receiving an original sound file and a recorded file uploaded by a user, wherein the recorded file is a file recorded by the user when the original sound file is played in the live-action scene;
step 13: performing sound rendering processing on the original sound file in the three-dimensional space model to obtain a rendered audio file;
step 14: and carrying out spectrum analysis and comparison on the recorded file and the rendered audio file, and judging whether the constructed three-dimensional space model is accurate or not according to the comparison result.
In the embodiment of the invention, the three-dimensional space model of the meta-universe scene corresponding to the live-action scene is constructed from a plurality of multi-angle live-action scene pictures uploaded by the user, which greatly reduces the dependence on UI designers while faithfully reproducing reality; and after the three-dimensional space model of the meta-universe scene is created, sound-space verification of the model is performed according to the audio files, ensuring the accuracy of the constructed three-dimensional space model.
In the embodiment of the present invention, as shown in fig. 2, a user interface may be provided for a user to upload a live-action scene picture. The user may perform the following operations at the user interface:
1. The user may enter a scene name in the text entry box, such as a certain palace or a certain gate, or another custom scene name such as "my ideal castle", or, more specifically, a study room in a certain palace, etc.
2. And uploading the live-action scene picture by the user.
In an embodiment of the present invention, the live-action scene picture includes at least one of the following: a picture taken by the user, a picture uploaded by the user from a local album, and a picture selected by the user from pictures in the picture library that match the live-action scene.
That is, the user can upload a live-action scene picture by taking a photograph or browsing a local album. If the user chooses to use the picture library, the system searches the picture library for pictures matching the scene name entered in the first step; optionally, each picture in the picture library carries a scene classification label indicating the real scene to which the picture belongs. If pictures matching the scene name entered by the user are found in the picture library, they can be displayed in a browsing area on the right side of the user interface shown in fig. 2 for the user to select by clicking or dragging, and the user can browse more pictures by sliding and the like. If the scene name is a custom name such as "my ideal castle", there may be no matching picture in the library, in which case a prompt such as "no matching picture found" may be shown in the browsing area.
In the embodiment of the present invention, the angles of the live-action scene may include at least one of the following: a front view, a left side view, a right side view, and the like. For modeling accuracy, the number of pictures for each angle may be prescribed, for example at least two pictures per angle. In addition to photographs of the prescribed angles, the user may upload photographs of other angles.
3. And the user clicks a confirmation submitting button to upload the live-action scene picture to the server.
Optionally, before uploading to the server, simple data verification may also be performed, e.g., data verification includes at least one of: whether the number of pictures meets the standard, whether the scene name is input, and the like.
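As an illustration of such pre-upload checks, the following is a minimal sketch; the field names and the minimum-picture constant are assumptions for illustration rather than part of the embodiment.

```python
# Hypothetical pre-upload validation; names and constants are illustrative only.
MIN_PICTURES_PER_ANGLE = 2  # e.g. "at least two pictures per angle", as suggested above

def validate_submission(scene_name, pictures_by_angle):
    """Return a list of problems found before the pictures are uploaded."""
    errors = []
    if not scene_name or not scene_name.strip():
        errors.append("scene name is missing")
    for angle, pictures in pictures_by_angle.items():
        if len(pictures) < MIN_PICTURES_PER_ANGLE:
            errors.append(f"angle '{angle}' has fewer than {MIN_PICTURES_PER_ANGLE} pictures")
    return errors
```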
The method for constructing the picture library according to the embodiment of the invention is described below.
In an embodiment of the present invention, the pictures in the picture library include at least one of the following: pictures acquired by web crawlers and pictures uploaded by users.
When pictures are acquired through web crawlers, the scene can be set first, web picture crawling is performed according to the scene, the acquired pictures are downloaded into a specific local resource folder, and scene classification labels are set for the pictures.
In the embodiment of the present invention, referring to fig. 3, on the user interface for saving the constructed three-dimensional space model, the user may be asked whether to authorize sharing the pictures into the picture library. If the user does not click the authorization, then after the user clicks the completion button, the pictures uploaded on the user interface shown in fig. 2 are treated as temporary data and deleted in the background; if the user clicks the authorization, the pictures are uploaded to the server and, after subsequent processing (such as setting scene classification labels), added into the proprietary picture library.
In the embodiment of the invention, the pictures acquired by the web crawlers also need to be scene-classified and given scene classification labels; of course, for accuracy, the pictures uploaded by users may likewise be scene-classified and given scene classification labels.
In the embodiment of the invention, the construction of the picture library comprises the following steps:
1. performing rough classification on target pictures acquired by a web crawler and/or uploaded by a user, wherein the rough classification comprises the following steps:
1.1, labeling a scene major class for some of the target pictures; optionally, when manually labeling these pictures with scene major classes, it should be ensured that each scene major class contains no fewer than a preset number of pictures, for example no fewer than 5. The scene major classes may be, for example, five classes: the exterior of a certain palace, the buildings of the certain palace, the interior of the certain palace, others of the certain palace, and erroneous pictures.
1.2, extracting the digital characteristics of a target picture of a marked scene major class and a target picture to be classified by using a neural network model, calculating the distance between the target picture to be classified and the target picture of each marked scene major class, and determining the scene major class to which the target picture to be classified belongs according to the distance;
In the embodiment of the present invention, the distance may be the Euclidean distance, calculated as d(x, μ_i) = sqrt( Σ_j (x_j − μ_{i,j})² ), for i = 1, …, n,
where x is the digital feature of the target picture to be classified, μ_i is the mean of the digital features of the manually labeled pictures of the i-th class (the mean is taken because each class contains more than one picture), and n is the number of manually labeled scene major classes.
The target picture to be classified is assigned to the manually labeled scene major class whose mean feature vector is nearest to the digital feature of the picture.
Taking the five scene classes above as an example, after classification is completed, the picture resources under the "erroneous pictures" class can be deleted, and the remaining four classes of pictures proceed to subsequent processing.
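To illustrate the coarse classification above, the following is a minimal sketch of nearest-class-mean assignment over the extracted digital features; the feature extractor itself (the neural network model) is assumed to be given, and all names are illustrative.

```python
import numpy as np

def coarse_classify(features_to_classify, labeled_features):
    """Assign each unlabeled feature vector to the scene major class whose mean
    feature vector is nearest in Euclidean distance.

    features_to_classify: (m, d) array, digital features of pictures to classify.
    labeled_features: dict mapping scene major class name -> (k_i, d) array of
                      features of manually labeled pictures (e.g. k_i >= 5).
    """
    class_names = list(labeled_features)
    # Mean digital feature per manually labeled scene major class.
    class_means = np.stack([labeled_features[name].mean(axis=0) for name in class_names])

    labels = []
    for x in features_to_classify:
        distances = np.linalg.norm(class_means - x, axis=1)  # d(x, mu_i) for every class
        labels.append(class_names[int(np.argmin(distances))])
    return labels
```

Pictures that land in the "erroneous pictures" class can then be discarded before the fine classification.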
2. And carrying out fine classification on the roughly classified target pictures.
The fine classification includes:
2.1, carrying out object recognition on a first target picture and a second target picture belonging to a target scene major class to obtain an object list, and extracting regular-shaped objects in the first target picture and the second target picture;
optionally, the regular shape comprises at least one of: rectangular, circular, triangular.
2.2 if the ratio of the number of objects in the intersection of the object lists of the first target picture and the second target picture to the union of the object lists of the first target picture and the second target picture is greater than or equal to a first preset ratio (e.g. 40%), determining that the first target picture and the second target picture belong to the same scene subclass;
if the intersection is empty, determining that the first target picture and the second target picture do not belong to the same scene subclass;
and if the ratio of the intersection to the union is smaller than the first preset proportion, comparing the similarity of the regularly-shaped objects in the intersection, and if the similarity of the regularly-shaped objects in the intersection is greater than or equal to a second preset proportion (e.g. 60%), determining that the first target picture and the second target picture belong to the same scene subclass.
Taking a scene major class of a certain internal environment as an example, the scene minor class obtained after the fine classification can include: restaurants, study rooms, bedrooms, kitchens, toilets, and the like.
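The pairwise subclass decision described above can be sketched as follows; the 40% and 60% thresholds are taken from the examples in the text, and the routine assumes that a high similarity of the shared regular-shaped objects indicates the same subclass.

```python
def same_scene_subclass(objects_a, objects_b, regular_shape_similarity,
                        first_ratio=0.4, second_ratio=0.6):
    """Decide whether two pictures of the same scene major class belong to the
    same scene subclass, based on their recognized object lists.

    objects_a, objects_b: sets of object names recognized in the two pictures.
    regular_shape_similarity: callable taking the shared objects and returning a
        similarity in [0, 1] for the regular-shaped objects among them.
    """
    intersection = objects_a & objects_b
    union = objects_a | objects_b

    if not intersection:                                  # no shared objects at all
        return False
    if len(intersection) / len(union) >= first_ratio:     # object lists overlap enough
        return True
    # Small overlap: fall back to comparing the shared regular-shaped objects.
    return regular_shape_similarity(intersection) >= second_ratio
```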
In the embodiment of the present invention, optionally, the method for constructing a meta-universe scene further includes: acquiring the related text in the webpage where a picture acquired by the web crawler is located; and analyzing the text (for example by word segmentation) and determining the scene classification of the picture according to the analysis result. This back-tracing approach is suitable for the case where the scene classification of a picture cannot be determined by fine classification. For example, assuming that the picture ImgA is crawled from website h1, the text in the tagged content (such as <p> or <div>) immediately before and after the <img> tag of ImgA can be extracted and recorded as ImgA_h1_T0, and word segmentation is performed on this text to obtain the nouns in it. If a scene classification appears in the noun list, the label of the picture is found by analyzing the grammatical structure of the sentence; if not, the noun list is analyzed directly and a scene classification label is assigned by combining the context. If the scene classification labels of pictures ImgA and ImgB are the same, the two pictures can be correctly determined to be of the same type even if their object lists do not overlap.
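A minimal sketch of this back-tracing step is shown below; it assumes BeautifulSoup for parsing (the embodiment does not name a parser) and replaces the word-segmentation and grammatical analysis with a simple label lookup in the surrounding text.

```python
from bs4 import BeautifulSoup

def scene_labels_from_page(html, image_url, known_scene_labels):
    """Candidate scene labels for an image taken from the text around its <img> tag."""
    soup = BeautifulSoup(html, "html.parser")
    img = soup.find("img", src=image_url)
    if img is None:
        return []

    # Text of the nearest <p>/<div> blocks before and after the <img> tag
    # (recorded as ImgA_h1_T0 in the example above).
    nearby = []
    for tag in (img.find_previous(["p", "div"]), img.find_next(["p", "div"])):
        if tag is not None:
            nearby.append(tag.get_text(" ", strip=True))
    context = " ".join(nearby)

    # A full implementation would segment words and analyze the sentence structure;
    # here we only check which known scene classification labels occur in the text.
    return [label for label in known_scene_labels if label in context]
```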
In the embodiment of the present invention, optionally, constructing a three-dimensional space model of a meta-universe scene corresponding to a live-action scene according to a plurality of multi-angle live-action scene pictures uploaded by a user includes:
step 111: performing object recognition on the multiple multi-angle live-action scene pictures to obtain an object list of each live-action scene picture;
step 112: merging object lists of the live-action scene pictures under the same scene classification to obtain an object list set;
that is, the pictures belonging to the same scene classification are regarded as a whole, each picture is at a different angle of the whole, and different angle details and data are provided for drawing the whole.
The object list set for a certain scene classification can be expressed as ObjList = ObjList_1 ∪ ObjList_2 ∪ … ∪ ObjList_n,
where ObjList_i is the object list of the i-th picture under the scene classification (e.g., an "exterior wall" scene subclass under a certain building scene major class, or a "restaurant" scene subclass under the internal environment scene major class), and n is the total number of pictures under the scene classification.
Step 113: determining size data of objects in each of the live-action scene pictures under the same scene classification; the size data may include length, width, and height.
Assuming that the object list set ObjList = {a, b, c, d, e, f} and that the picture ImgA contains only the three objects {b, e, f}, then for the picture ImgA only the dimensions (length, width, height) of the three objects {b, e, f} are determined.
Step 114: determining target size data of each object in the object list set according to all size data of each object in the object list set in different live-action scene pictures;
in the embodiment of the invention, the adjacency list in the data structure can be used for storing the size data of the object.
For example, for an object a in the object list set of a certain scene classification, ka (1 ≤ ka ≤ n) groups of size data can be obtained after step 113 (n is the total number of pictures under the scene classification); similarly, for an object b in the object list set of the scene classification, kb (1 ≤ kb ≤ n) groups of size data can be obtained, and ka has no quantitative relation with kb.
In the embodiment of the present invention, determining the target size data of each object in the object list set may be performed as follows: normalize all size data of the target object in the object list set, compute the mode and the mean of the normalized size data, and when the proportion of samples equal to the mode exceeds a preset threshold, directly use the mode as the normalized size data of the object; otherwise, use the average of the mode and the mean as the normalized size data of the object.
For example, the ka groups of size data of object a are each normalized (for example, the width is unified to 100), completing the unification of scales; although the same object may appear at different sizes in different pictures, its aspect ratios remain similar. The mode and the mean of the length (or height, or width) over the normalized ka groups are computed; when the proportion of samples equal to the mode exceeds 50%, the mode is directly used as the normalized length (or height, or width) of the object; otherwise, the average of the mode and the mean is used.
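The normalization and mode/mean fusion can be sketched as follows; the width-to-100 normalization and the 50% share are taken from the example above, while rounding before taking the mode is an added assumption so that a mode of real-valued measurements is meaningful.

```python
from collections import Counter

def normalize_sizes(size_groups, target_width=100.0):
    """Rescale each (length, width, height) observation so that width == 100."""
    normalized = []
    for length, width, height in size_groups:
        scale = target_width / width
        normalized.append((length * scale, target_width, height * scale))
    return normalized

def fuse_dimension(values, mode_share_threshold=0.5, precision=1):
    """Fuse one dimension of an object observed in several normalized pictures."""
    rounded = [round(v, precision) for v in values]      # bin values so a mode exists
    mode_value, mode_count = Counter(rounded).most_common(1)[0]
    if mode_count / len(rounded) > mode_share_threshold:
        return mode_value                                # the mode clearly dominates
    mean_value = sum(rounded) / len(rounded)
    return (mode_value + mean_value) / 2                 # otherwise average mode and mean
```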
Step 115: and constructing a three-dimensional space model of the meta-universe scene corresponding to the real scene according to the target size data of each object in the multi-angle real scene pictures.
Because the live-action scene is usually a real building such as a famous venue, part of its real data can be obtained, and by comparing this partial real data with the obtained target size data (estimated size data) of the objects, the estimated size data can be corrected to obtain the real size data of the objects. Assuming the estimated size data of object a is (La, 100, Ha) and that of object b is (Lb, 100, Hb), the real size data of object b can be derived proportionally even when only the real size data of object a is known.
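A sketch of this correction is given below, assuming one object whose real height is known; the proportional-scaling rule is an interpretation of "mapping" the real size data from object a to object b.

```python
def scale_to_real_sizes(estimated_sizes, reference_object, reference_real_height):
    """Derive real sizes for all objects from one object with a known real height.

    estimated_sizes: dict mapping object name -> (length, width, height) on the
                     normalized scale, e.g. {"a": (La, 100.0, Ha), "b": (Lb, 100.0, Hb)}.
    """
    estimated_height = estimated_sizes[reference_object][2]
    scale = reference_real_height / estimated_height      # common scale factor
    return {name: tuple(dim * scale for dim in dims)
            for name, dims in estimated_sizes.items()}
```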
In the embodiment of the present invention, optionally, constructing a three-dimensional space model of the meta-universe scene corresponding to the live-action scene according to the target size data of each object in the multiple multi-angle live-action scene pictures includes:
step 116: acquiring the outer outline and the inner outline of an object in the live-action scene picture;
Acquiring the outer contour of an object means tracing the same object (which appears in multiple pictures) under the same scene classification (the fine classification obtained earlier, i.e. the scene subclass). Optionally, the recognition frame from object recognition is enlarged by a preset multiple (for example, 1.3 times) and the picture is cropped accordingly (the multiple can be adjusted according to the actual situation, but it must be greater than 1 so that the frame covers a sufficient area), giving a main image of the object that contains almost no other objects; image segmentation is then performed with a neural-network segmentation (unseg) model to obtain the outer contour of the object on the two-dimensional plane.
The inner contour of an object may be acquired as follows: following the idea of active contour models and edge detection algorithms, a curve is first defined and an energy function is derived from the picture data; minimizing the energy function drives the curve to change, and edge points are detected with first- or second-order derivatives by exploiting the discontinuity of pixel values between adjacent regions; the curve gradually approaches the target edge until it is found, yielding the inner contour of the object on the two-dimensional plane.
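The contour extraction can be sketched with conventional OpenCV operations as below; the neural-network segmentation model is assumed to be given (only its binary mask is used here), and the Canny edge detector stands in for the active-contour/edge-detection step described above.

```python
import cv2
import numpy as np

def crop_around_box(image, box, factor=1.3):
    """Crop the picture around a recognition box enlarged by `factor` (must be > 1)."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    half_w, half_h = w * factor / 2.0, h * factor / 2.0
    x0, y0 = int(max(cx - half_w, 0)), int(max(cy - half_h, 0))
    x1, y1 = int(min(cx + half_w, image.shape[1])), int(min(cy + half_h, image.shape[0]))
    return image[y0:y1, x0:x1]

def outer_contour(segmentation_mask):
    """Outer contour of the object: the largest external contour of its binary mask
    (OpenCV 4 return convention assumed)."""
    contours, _ = cv2.findContours(segmentation_mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea) if contours else None

def inner_edges(gray_crop, low=50, high=150):
    """Edge map inside the crop, standing in for the active-contour refinement."""
    return cv2.Canny(gray_crop, low, high)
```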
Step 117: acquiring depth information of an object in the live-action scene picture;
Alternatively, an epipolar plane image (EPI), i.e. a two-dimensional slice, may be obtained using a four-dimensional light field function, which includes a spatial dimension and an angular dimension; an object point in the scene appears in the EPI as an inclined straight line whose inclination is proportional to the distance from the object point to the camera. Accordingly, the depth information of the object can be acquired from the slope of the line corresponding to the object point.
Referring to figs. 4 and 5, the principle of the four-dimensional light field function is as follows: a light ray passes through two points on two parallel planes, with coordinates (u, v) and (s, t), the distance between the two planes being F; the two-dimensional angular information of the ray is given by the position coordinates of the two points, so the ray can be represented by a four-dimensional light field function LF(s, t, u, v). The relationship between the light field function LF(s, t, u, v) defined by the lens plane and the sensor plane and the light field function LF'(s, t, u, v) defined by the lens plane and the refocusing plane can be obtained by a geometric transformation, the propagation of the four-dimensional light field in space corresponding to a shear transformation. According to the four-dimensional light field function, the illumination produced at a point on the (s, t) plane by the radiance of rays from the (u, v) plane can be expressed as an integral over the ray radiance, from which a two-dimensional slice expression can be obtained.
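A highly simplified sketch of recovering depth from the slope of the line an object point traces in the EPI is given below; how the slope maps to metric distance depends on the plane separation F and the sampling geometry, so the calibration constant is purely illustrative.

```python
import numpy as np

def epi_line_slope(view_positions, image_positions):
    """Least-squares slope of the straight line that one object point traces in an
    (s, u) EPI slice, given its image position in each sampled view."""
    slope, _intercept = np.polyfit(view_positions, image_positions, deg=1)
    return slope

def depth_from_slope(slope, calibration_constant):
    """Per the description above, the inclination of the EPI line is taken to be
    proportional to the object-to-camera distance; the constant must be calibrated."""
    return calibration_constant * slope
```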
Step 118: and constructing a three-dimensional space model of the metauniverse scene corresponding to the live-action scene according to the target size data, the outer contour, the inner contour and the depth information of the object in the live-action scene picture.
In the embodiment of the present invention, optionally, performing sound rendering processing on the original sound file in the three-dimensional space model to obtain a rendered audio file, including:
determining the space coordinates of a sound source corresponding to the original sound file and a listener corresponding to the recorded file in the three-dimensional space model, wherein the space coordinates of the sound source correspond to the position of a user when the original sound file is played in the live-action scene, and the space coordinates of the listener correspond to the position of the user when the recorded file is recorded in the live-action scene;
and carrying out sound rendering processing on the original sound file in the three-dimensional space model according to the space coordinates of the sound source corresponding to the original sound file and the listener corresponding to the recorded file in the three-dimensional space model to obtain a rendered audio file.
Referring to fig. 6, fig. 6 is a schematic diagram of a user interface for automatically fine-tuning the three-dimensional space model using audio data uploaded by the user. On this user interface, the user can place the sound source (the small sun in fig. 6) and the listener (the smiling face in fig. 6) into the three-dimensional space model by dragging their icons. During dragging, the (X, Y, Z) coordinates on the right are updated in real time. Besides dragging, the user may also enter specific values for (X, Y, Z) on the right to determine the spatial coordinates of the sound source and the listener within the three-dimensional space model. The spatial coordinates of the sound source correspond to the position where the original sound file was played in the live-action scene, and the spatial coordinates of the listener correspond to the position where the recorded file was recorded in the live-action scene. In addition, during dragging, it can be checked whether the distance between the two points set by the user exceeds a minimum-distance threshold; if it does not, the user is prompted to reset the positions. The user then submits the recorded file and the original sound file by browsing and uploading, and clicks the confirm-optimization button to verify the three-dimensional space model.
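The embodiment does not specify the acoustic renderer; the following is a minimal direct-path sketch (propagation delay plus 1/r attenuation only), ignoring the reflections and occlusion that a full room-acoustics renderer would derive from the three-dimensional space model.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air

def render_direct_path(original, sample_rate, source_pos, listener_pos):
    """Rough direct-path rendering of the original sound at the listener position."""
    distance = float(np.linalg.norm(np.asarray(source_pos, dtype=float)
                                    - np.asarray(listener_pos, dtype=float)))
    delay_samples = int(round(distance / SPEED_OF_SOUND * sample_rate))
    gain = 1.0 / max(distance, 1.0)      # clamp so the gain stays bounded near the source
    rendered = np.zeros(len(original) + delay_samples, dtype=float)
    rendered[delay_samples:] = np.asarray(original, dtype=float) * gain
    return rendered
```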
In the embodiment of the present invention, optionally, multiple groups of positions may be used for playing the original sound file and recording the recorded file in the live-action scene.
The following example is illustrative. First, several groups of two-point positions in the scene are taken as acquisition points for the recorded file, including but not limited to the following groups: <boundary, opposite boundary>, <boundary, center point>. Next, the original sound file is played at one of the two points and a microphone is placed at the other point to record, giving the recorded file. In the embodiment of the present invention, the length of the recorded file may be set as required, for example between 30 seconds and 1 minute of audio data. The acquired recorded file is subjected to spectrum analysis and sampling and recorded as the data Table. In the constructed three-dimensional space model, the spatial coordinates of the sound source corresponding to the original sound file and of the listener corresponding to the recorded file are determined, where the spatial coordinates of the sound source correspond to the position where the original sound file was played in the live-action scene and the spatial coordinates of the listener correspond to the position where the recorded file was recorded; after the spatial coordinates are determined, sound rendering processing is performed on the original sound file. The data generated by the rendering is subjected to spectrum analysis and sampling and recorded as the data Test. The similarity of the data Table and the data Test is then compared: if they are very close, the three-dimensional space model is judged to have been constructed accurately; if they differ greatly, the three-dimensional space model has not been constructed accurately. In this way, the accuracy of the construction of the three-dimensional space model can be effectively ensured.
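The spectrum comparison can be sketched as below; the cosine similarity of magnitude spectra and the 0.9 acceptance threshold are illustrative choices, since the embodiment only states that the two spectra are analyzed, sampled and compared for similarity.

```python
import numpy as np

def spectral_similarity(recorded, rendered):
    """Cosine similarity between the magnitude spectra of the two signals
    (the data Table and the data Test), truncated to a common length."""
    n = min(len(recorded), len(rendered))
    spec_a = np.abs(np.fft.rfft(np.asarray(recorded[:n], dtype=float)))
    spec_b = np.abs(np.fft.rfft(np.asarray(rendered[:n], dtype=float)))
    denom = float(np.linalg.norm(spec_a) * np.linalg.norm(spec_b))
    return float(spec_a @ spec_b) / denom if denom > 0.0 else 0.0

def model_is_accurate(recorded, rendered, threshold=0.9):
    """Accept the three-dimensional space model when the spectra are close enough."""
    return spectral_similarity(recorded, rendered) >= threshold
```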
In the embodiment of the invention, the three-dimensional space model constructed by checking the audio data can be automatically adjusted according to the analysis result by analyzing the difference between the recorded file and the rendered audio file.
In the embodiment of the invention, after the three-dimensional space model is constructed, the user can also manually adjust it: for example, a user interface is provided on which the three-dimensional space model is displayed, and after the user selects an object to be adjusted in the model, its length can be increased or decreased by operations such as sliding a slider; parameters such as color and position may be adjusted as well.
Referring to fig. 7, an embodiment of the present invention provides a construction system 70 for a meta-universe scene, including:
the first construction module 71 is configured to construct a three-dimensional space model of a meta-universe scene corresponding to a live-action scene according to a plurality of multi-angle live-action scene pictures uploaded by a user;
the receiving module 72 is configured to receive an original sound file and a recorded file uploaded by a user, wherein the recorded file is a file recorded by the user when the original sound file is played in the live-action scene;
a rendering module 73, configured to perform sound rendering processing on the original sound file in the three-dimensional space model, so as to obtain a rendered audio file;
and the verification module 74 is used for carrying out spectrum analysis and comparison on the recorded file and the rendered audio file, and judging whether the constructed three-dimensional space model is accurate or not according to the comparison result.
Optionally, the live-action scene picture includes at least one of the following: a picture taken by the user, a picture uploaded by the user from a local album, and a picture selected by the user from pictures in a picture library that match the live-action scene, wherein the pictures in the picture library include at least one of the following: pictures acquired by web crawlers and pictures uploaded by users.
Optionally, the building system 70 further includes:
the second construction module is configured to construct a picture library, where the constructing the picture library includes:
performing rough classification on target pictures acquired by a web crawler and/or uploaded by a user, wherein the rough classification comprises the following steps: labeling a scene major class for part of the target pictures in the target pictures; extracting the digital characteristics of the target pictures of the marked scene major categories and the target pictures to be classified by using a neural network model, calculating the distance between the target pictures to be classified and the target pictures of each type of marked scene major categories, and determining the scene major categories to which the target pictures to be classified belong according to the distance;
performing fine classification on the roughly classified target pictures, wherein the fine classification comprises the following steps: performing object recognition on a first target picture and a second target picture belonging to a target scene major class to obtain their object lists, and extracting the regularly-shaped objects in the first target picture and the second target picture; if the ratio of the number of objects in the intersection of the object lists of the first target picture and the second target picture to the number of objects in their union is greater than or equal to a first preset proportion, determining that the first target picture and the second target picture belong to the same scene subclass; if the intersection is empty, determining that the first target picture and the second target picture do not belong to the same scene subclass; and if the ratio of the intersection to the union is smaller than the first preset proportion, comparing the similarity of the regularly-shaped objects in the intersection, and if the similarity of the regularly-shaped objects in the intersection is greater than or equal to a second preset proportion, determining that the first target picture and the second target picture belong to the same scene subclass.
Optionally, the building system 70 further includes:
the acquisition module is used for acquiring related texts in the webpage where the pictures acquired by the web crawler are located;
and the determining module is used for analyzing the text and determining scene classification of the pictures acquired by the web crawlers according to the analysis result.
Optionally, the first building module 71 is configured to perform object recognition on the multiple multi-angle live-action scene pictures to obtain an object list of each of the live-action scene pictures; merging object lists of the live-action scene pictures under the same scene classification to obtain an object list set; determining size data of objects in each of the live-action scene pictures under the same scene classification; determining target size data of each object in the object list set according to all size data of each object in the object list set in different live-action scene pictures; and constructing a three-dimensional space model of the meta-universe scene corresponding to the real scene according to the target size data of each object in the multi-angle real scene pictures.
Optionally, the first building module 71 is configured to obtain an outer contour and an inner contour of an object in the live-action scene picture; acquiring depth information of an object in the live-action scene picture; and constructing a three-dimensional space model of the metauniverse scene corresponding to the live-action scene according to the target size data, the outer contour, the inner contour and the depth information of the object in the live-action scene picture.
Optionally, the rendering module 73 is configured to determine a spatial coordinate of a sound source corresponding to the original sound file and a listener corresponding to the recording file in the three-dimensional space model, where the spatial coordinate of the sound source corresponds to a position of a user when the original sound file is played in the live-action scene, and the spatial coordinate of the listener corresponds to a position of the user when the recording file is recorded in the live-action scene; and carrying out sound rendering processing on the original sound file in the three-dimensional space model according to the space coordinates of the sound source corresponding to the original sound file and the listener corresponding to the recorded file in the three-dimensional space model to obtain a rendered audio file.
Referring to fig. 8, the embodiment of the present invention further provides a server 80, which includes a processor 81, a memory 82, and a computer program stored in the memory 82 and capable of running on the processor 81; when executed by the processor 81, the computer program implements each process of the above embodiment of the method for constructing a meta-universe scene and can achieve the same technical effects, which are not repeated here to avoid repetition.
The embodiment of the invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements each process of the above embodiment of the method for constructing a meta-universe scene and can achieve the same technical effects, which are not repeated here to avoid repetition. The computer-readable storage medium may be, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present invention and the scope of the claims, which are to be protected by the present invention.
Claims (10)
1. The construction method of the meta-universe scene is characterized by comprising the following steps of:
constructing a three-dimensional space model of a meta-universe scene corresponding to a live-action scene according to a plurality of multi-angle live-action scene pictures uploaded by a user;
receiving an original sound file and a recorded file uploaded by a user, wherein the recorded file is a file recorded by the user when the original sound file is played in the live-action scene;
performing sound rendering processing on the original sound file in the three-dimensional space model to obtain a rendered audio file;
and carrying out spectrum analysis and comparison on the recorded file and the rendered audio file, and judging whether the constructed three-dimensional space model is accurate or not according to the comparison result.
2. The method of claim 1, wherein the live-action scene picture comprises at least one of the following: a picture taken by the user, a picture uploaded by the user from a local album, and a picture selected by the user from pictures in a picture library that match the live-action scene, wherein the pictures in the picture library comprise at least one of the following: pictures acquired by web crawlers and pictures uploaded by users.
3. The method as recited in claim 2, further comprising:
constructing a picture library, wherein the constructing the picture library comprises:
performing rough classification on target pictures acquired by a web crawler and/or uploaded by a user, wherein the rough classification comprises the following steps: labeling a scene major class for part of the target pictures in the target pictures; extracting the digital characteristics of the target pictures of the marked scene major categories and the target pictures to be classified by using a neural network model, calculating the distance between the target pictures to be classified and the target pictures of each type of marked scene major categories, and determining the scene major categories to which the target pictures to be classified belong according to the distance;
performing fine classification on the roughly classified target pictures, wherein the fine classification comprises the following steps: performing object recognition on a first target picture and a second target picture belonging to a target scene major class to obtain their object lists, and extracting the regularly-shaped objects in the first target picture and the second target picture; if the ratio of the number of objects in the intersection of the object lists of the first target picture and the second target picture to the number of objects in their union is greater than or equal to a first preset proportion, determining that the first target picture and the second target picture belong to the same scene subclass; if the intersection is empty, determining that the first target picture and the second target picture do not belong to the same scene subclass; and if the ratio of the intersection to the union is smaller than the first preset proportion, comparing the similarity of the regularly-shaped objects in the intersection, and if the similarity of the regularly-shaped objects in the intersection is greater than or equal to a second preset proportion, determining that the first target picture and the second target picture belong to the same scene subclass.
4. A method according to claim 3, further comprising:
acquiring a related text in a webpage where a picture acquired by a web crawler is located;
and analyzing the text, and determining scene classification of the pictures acquired by the web crawlers according to an analysis result.
5. The method of claim 1, wherein constructing a three-dimensional spatial model of a metauniverse scene corresponding to the live-action scene from a plurality of multi-angle live-action scene pictures uploaded by a user comprises:
performing object recognition on the multiple multi-angle live-action scene pictures to obtain an object list of each live-action scene picture;
merging object lists of the live-action scene pictures under the same scene classification to obtain an object list set;
determining size data of objects in each of the live-action scene pictures under the same scene classification;
determining target size data of each object in the object list set according to all size data of each object in the object list set in different live-action scene pictures;
and constructing a three-dimensional space model of the meta-universe scene corresponding to the live-action scene according to the target size data of each object in the multi-angle live-action scene pictures.
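As a sketch of the size-aggregation step in claim 5 above, the target size of each object can be derived from its per-picture size estimates; taking the per-dimension median is an assumption made here for illustration, as the claim only requires determining one target size from the size data in different pictures:

```python
# Merge per-picture object lists into an object list set and aggregate the
# per-picture size estimates of each object into a single target size.
from statistics import median
from collections import defaultdict

def merge_object_sizes(per_picture_sizes):
    """per_picture_sizes: list of {object_name: (width, height, depth) in metres}."""
    collected = defaultdict(list)
    for picture in per_picture_sizes:
        for name, size in picture.items():
            collected[name].append(size)
    # The object list set is the union of all per-picture object lists.
    return {
        name: tuple(median(dim) for dim in zip(*sizes))
        for name, sizes in collected.items()
    }

pictures = [
    {"table": (1.60, 0.75, 0.80), "chair": (0.45, 0.90, 0.45)},
    {"table": (1.55, 0.74, 0.82), "window": (1.20, 1.40, 0.05)},
]
print(merge_object_sizes(pictures))
```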
6. The method of claim 5, wherein constructing a three-dimensional spatial model of a metauniverse scene corresponding to the live-action scene from target size data of each object in the plurality of multi-angle live-action scene pictures comprises:
acquiring the outer outline and the inner outline of an object in the live-action scene picture;
acquiring depth information of an object in the live-action scene picture;
and constructing a three-dimensional space model of the metauniverse scene corresponding to the live-action scene according to the target size data, the outer contour, the inner contour and the depth information of the object in the live-action scene picture.
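A rough sketch of how the inputs named in claim 6 above could be obtained with OpenCV (version 4 or later is assumed): the outer and inner contours come from a binary object mask, and the depth is summarised from a depth map produced by any monocular depth estimator; the mask source and the depth model are assumptions, not part of the claim:

```python
import cv2
import numpy as np

def extract_contours(object_mask):
    """object_mask: uint8 binary mask of one object in the live-action scene picture."""
    contours, hierarchy = cv2.findContours(
        object_mask, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
    if hierarchy is None:
        return [], []
    outer, inner = [], []
    for contour, h in zip(contours, hierarchy[0]):
        # With RETR_CCOMP, h[3] == -1 marks a top-level (outer) contour;
        # anything else is a hole, i.e. an inner contour.
        (outer if h[3] == -1 else inner).append(contour)
    return outer, inner

def object_depth(depth_map, object_mask):
    """Median depth of the masked pixels; depth_map may come from any estimator."""
    return float(np.median(depth_map[object_mask > 0]))
```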
7. The method of claim 1, wherein performing sound rendering processing on the original sound file in the three-dimensional space model to obtain a rendered audio file comprises:
determining the space coordinates of a sound source corresponding to the original sound file and a listener corresponding to the recorded file in the three-dimensional space model, wherein the space coordinates of the sound source correspond to the position of a user when the original sound file is played in the live-action scene, and the space coordinates of the listener correspond to the position of the user when the recorded file is recorded in the live-action scene;
and carrying out sound rendering processing on the original sound file in the three-dimensional space model according to the space coordinates of the sound source corresponding to the original sound file and the listener corresponding to the recorded file in the three-dimensional space model to obtain a rendered audio file.
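A minimal sketch of the rendering step in claim 7 above is given below: the sound source and the listener are placed at their space coordinates and the source signal is attenuated and delayed according to their distance. A full implementation would also model the room geometry (reflections, reverberation) of the three-dimensional space model; the coordinates and signal handling here are illustrative assumptions:

```python
# Distance-based attenuation plus propagation delay between the sound source
# position and the listener position inside the three-dimensional space model.
import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second

def render_mono(dry_signal, sample_rate, source_pos, listener_pos):
    source = np.asarray(source_pos, dtype=float)
    listener = np.asarray(listener_pos, dtype=float)
    distance = np.linalg.norm(source - listener)
    gain = 1.0 / max(distance, 1.0)                        # inverse-distance attenuation
    delay_samples = int(round(distance / SPEED_OF_SOUND * sample_rate))
    rendered = np.zeros(len(dry_signal) + delay_samples)
    rendered[delay_samples:] = gain * dry_signal           # delayed, attenuated copy
    return rendered

# e.g. source on a stage, listener three metres away at the same height
out = render_mono(np.random.randn(48000), 48000, (0.0, 0.0, 1.5), (3.0, 0.0, 1.5))
```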
8. A system for constructing a meta-universe scene, comprising:
the first construction module is used for constructing a three-dimensional space model of a meta-universe scene corresponding to the live-action scene according to a plurality of multi-angle live-action scene pictures uploaded by a user;
the receiving module is used for receiving an original sound file and a recorded file uploaded by a user, wherein the recorded file is a file recorded by the user when the original sound file is played in the live-action scene;
the rendering module is used for performing sound rendering processing on the original sound file in the three-dimensional space model to obtain a rendered audio file;
and the verification module is used for performing spectrum analysis on the recorded file and the rendered audio file, comparing the two, and determining whether the constructed three-dimensional space model is accurate according to the comparison result.
9. A server, comprising: a processor, a memory and a program stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the method of constructing a metauniverse scene as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the method of constructing a metauniverse scene according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311595119.3A CN117635830A (en) | 2023-11-27 | 2023-11-27 | Construction method, system, server and storage medium of meta-universe scene |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311595119.3A CN117635830A (en) | 2023-11-27 | 2023-11-27 | Construction method, system, server and storage medium of meta-universe scene |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117635830A (en) | 2024-03-01
Family
ID=90026402
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311595119.3A Pending CN117635830A (en) | 2023-11-27 | 2023-11-27 | Construction method, system, server and storage medium of meta-universe scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117635830A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118267718A (en) * | 2024-06-04 | 2024-07-02 | 四川物通科技有限公司 | Multi-scene operation management method and system based on meta universe |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10007867B2 (en) | Systems and methods for identifying entities directly from imagery | |
CN107833213B (en) | Weak supervision object detection method based on false-true value self-adaptive method | |
CN109582880B (en) | Interest point information processing method, device, terminal and storage medium | |
US9129191B2 (en) | Semantic object selection | |
US9129192B2 (en) | Semantic object proposal generation and validation | |
KR102406150B1 (en) | Method for creating obstruction detection model using deep learning image recognition and apparatus thereof | |
CN111062871A (en) | Image processing method and device, computer equipment and readable storage medium | |
US11704357B2 (en) | Shape-based graphics search | |
US20190114780A1 (en) | Systems and methods for detection of significant and attractive components in digital images | |
US10417833B2 (en) | Automatic 3D camera alignment and object arrangment to match a 2D background image | |
CN117635830A (en) | Construction method, system, server and storage medium of meta-universe scene | |
Wu et al. | Image completion with multi-image based on entropy reduction | |
US20230281350A1 (en) | A Computer Implemented Method of Generating a Parametric Structural Design Model | |
Li et al. | A method based on an adaptive radius cylinder model for detecting pole-like objects in mobile laser scanning data | |
CN114359590A (en) | NFT image work infringement detection method and device and computer storage medium | |
Slade et al. | Automatic semantic and geometric enrichment of CityGML building models using HOG-based template matching | |
Xiao et al. | Coupling point cloud completion and surface connectivity relation inference for 3D modeling of indoor building environments | |
CN111783561A (en) | Picture examination result correction method, electronic equipment and related products | |
CN112015937B (en) | Picture geographic positioning method and system | |
CN116415020A (en) | Image retrieval method, device, electronic equipment and storage medium | |
CN112132845B (en) | Method, device, electronic equipment and readable medium for singulating three-dimensional model | |
CN113128604A (en) | Page element identification method and device, electronic equipment and storage medium | |
CN111062388B (en) | Advertisement character recognition method, system, medium and equipment based on deep learning | |
KR20220036772A (en) | Personal record integrated management service connecting to repository | |
US9230366B1 (en) | Identification of dynamic objects based on depth data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |