CN111583417A - Method and device for constructing an indoor VR scene under the joint constraint of image semantics and scene geometry, electronic device, and medium

Method and device for constructing an indoor VR scene under the joint constraint of image semantics and scene geometry, electronic device, and medium

Info

Publication number
CN111583417A
Authority
CN
China
Prior art keywords
indoor
information
image
scene
indoor image
Prior art date
Legal status
Granted
Application number
CN202010399289.4A
Other languages
Chinese (zh)
Other versions
CN111583417B (en)
Inventor
吴洪宇
于昊楠
陈小武
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202010399289.4A priority Critical patent/CN111583417B/en
Publication of CN111583417A publication Critical patent/CN111583417A/en
Application granted granted Critical
Publication of CN111583417B publication Critical patent/CN111583417B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G06T 19/006 Mixed reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

Embodiments of the present disclosure disclose a method, an apparatus, an electronic device, and a medium for constructing an indoor VR scene under the joint constraint of image semantics and scene geometry. One embodiment of the method comprises: obtaining an image and the height of the camera above the ground; preprocessing the image; inputting the preprocessed image into a pre-trained deep residual network and outputting the feature information of the image; detecting the information of the straight lines in the feature information; inputting the line information into a convolutional neural network and outputting the information of the wall connections in the image; determining the indoor layout information; inputting the feature information into a pre-trained object detection network and outputting the indoor object information in the image; obtaining the position information of the indoor objects; and completing the construction of an indoor virtual reality interactive scene. Through the construction of the indoor VR interactive scene, this embodiment restores the real image scene accurately and quickly.

Description

Method and device for constructing an indoor VR scene under the joint constraint of image semantics and scene geometry, electronic device, and medium
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular to a method, an apparatus, an electronic device, and a medium for constructing an indoor VR scene under the joint constraint of image semantics and scene geometry.
Background
With the rapid development of virtual reality (VR) technology, interactive applications have become easier to build, and VR is increasingly used in daily life. In interior decoration, for example, VR allows designers to interact with customers about a design, addressing the diversity of customer requirements. However, because virtual reality scenes demand a high level of detail, building scenes by hand consumes considerable manpower and material resources and is costly, which is one of the factors restricting the development of virtual reality technology. Existing solutions include template-based scene generation and the use of various devices to capture real-world three-dimensional information and convert it into a three-dimensional model. Template-based methods rely on a set of pre-designed scene templates and then set parameters to generate three-dimensional scenes in bulk. These methods, however, are severely limited and have poor accuracy.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure provide a method, an apparatus, an electronic device, and a medium for constructing an indoor VR scene under the joint constraint of image semantics and scene geometry, so as to solve the technical problems mentioned in the background above.
In a first aspect, some embodiments of the present disclosure provide a method for constructing an indoor VR scene under the joint constraint of image semantics and scene geometry, the method including: obtaining an indoor image and the height of the camera above the ground when the indoor image was taken; preprocessing the indoor image; inputting the preprocessed indoor image into a pre-trained deep residual network and outputting the feature information of the indoor image; detecting the information of the straight lines in the feature information; inputting the line information into a convolutional neural network and outputting the information of the wall connections in the indoor image, where a wall connection is a straight line connecting walls in the indoor image; obtaining the indoor layout information based on the height of the camera above the ground and the information of the wall connections; inputting the feature information into a pre-trained object detection network and outputting the indoor object information in the indoor image; obtaining the position information of the indoor objects based on the indoor object information and the information of the wall connections; and completing the construction of an indoor virtual reality interactive scene according to the indoor layout information and the position information of the indoor objects.
In a second aspect, some embodiments of the present disclosure provide an apparatus for constructing an indoor VR scene under the joint constraint of image semantics and scene geometry, the apparatus comprising: an acquisition unit configured to obtain an indoor image and the height of the camera above the ground when the indoor image was taken; a processing unit configured to preprocess the indoor image; a first input/output unit configured to input the preprocessed indoor image into a pre-trained deep residual network and output the feature information of the indoor image; a detection unit configured to detect the information of the straight lines in the feature information; a second input/output unit configured to input the line information into a convolutional neural network and output the information of the wall connections in the indoor image, where a wall connection is a straight line connecting walls in the indoor image; a first determination unit configured to obtain the indoor layout information based on the height of the camera above the ground and the information of the wall connections; a third input/output unit configured to input the feature information into a pre-trained object detection network and output the indoor object information in the indoor image; a second determination unit configured to obtain the position information of the indoor objects based on the indoor object information and the information of the wall connections; and a third determination unit configured to complete the construction of the indoor virtual reality interactive scene according to the indoor layout information and the position information of the indoor objects.
In a third aspect, some embodiments of the present disclosure provide an electronic device, comprising: one or more processors; a storage device having one or more programs stored thereon which, when executed by one or more processors, cause the one or more processors to implement a method as in any one of the first and second aspects.
In a fourth aspect, some embodiments of the disclosure provide a computer readable medium having a computer program stored thereon, where the program when executed by a processor implements a method as in any of the first and second aspects.
One of the above embodiments of the present disclosure has the following beneficial effects. First, an indoor image is obtained together with the height of the camera above the ground when the image was taken. The indoor image is then preprocessed; the processed data make the deep learning network easier to train and indirectly improve its accuracy. The preprocessed indoor image is input into a pre-trained deep residual network to extract the feature information of the image, which lays the foundation for recognizing the wall connections and the objects. Next, the straight lines in the feature information are detected and stored, so that every line that may be a wall connection is found; a convolutional neural network can then score all of these candidate lines. The indoor layout information is essentially determined from the height of the camera above the ground and the information of the wall connections. The feature information is input into a pre-trained object detection network to determine the position, size, and other information of the indoor objects in the image. The position information of the indoor objects is then obtained from the indoor object information and the information of the wall connections. Finally, the indoor virtual reality interactive scene is constructed according to the indoor layout information and the position information of the indoor objects.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a flow diagram of some embodiments of a method for indoor VR scene construction according to the present disclosure.
Fig. 2 is a schematic structural diagram of some embodiments of an apparatus for indoor VR scene construction according to the present disclosure.
FIG. 3 is a schematic block diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. The embodiments and the features of the embodiments in the present disclosure may be combined with each other without conflict.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It should be noted that the modifiers "a", "an", and "the" mentioned in this disclosure are illustrative rather than limiting, and those skilled in the art should understand them as "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Referring to fig. 1, a flow 100 of some embodiments of a method for indoor VR scene construction according to the present disclosure is shown. The method may be performed by a server. The method for indoor VR scene construction comprises the following steps:
Step 101, obtaining an indoor image and the height of the camera above the ground when the indoor image is taken.
In some embodiments, the execution body of the method for indoor VR scene construction (which may be a server) may acquire the indoor image and the height of the camera above the ground when the image was captured in various ways. For example, the execution body may acquire the indoor image and the camera height through a wired connection or a wireless connection. It should be noted that the wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, UWB (ultra wideband), and other wireless connections now known or developed in the future.
Step 102, preprocessing the indoor image.
In some embodiments, the execution body preprocesses the indoor image obtained in step 101. Preprocessing here refers to the processing performed on the input image before feature extraction, segmentation, and matching. Its main purposes are to eliminate irrelevant information from the image, recover useful real information, enhance the detectability of relevant information, and simplify the data as much as possible, thereby improving the reliability of feature extraction, image segmentation, matching, and recognition. A preprocessing pipeline typically includes digitization, geometric transformation, normalization, smoothing, restoration, and enhancement.
In some optional implementations of some embodiments, preprocessing the indoor image includes: adjusting the indoor image to a preset resolution.
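As an illustration of this optional implementation, the following sketch resizes an indoor image to a preset resolution; the 224 × 224 target used here is an assumption taken from the ResNet-50 input size described in step 103, and OpenCV is only one possible way to perform the resizing.

import cv2

def preprocess(indoor_image, size=(224, 224)):
    # Resize to the preset resolution and scale pixel values to [0, 1].
    resized = cv2.resize(indoor_image, size, interpolation=cv2.INTER_LINEAR)
    return resized.astype("float32") / 255.0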
Step 103, inputting the preprocessed indoor image into a pre-trained deep residual network and outputting the feature information of the indoor image.
In some embodiments, the execution body inputs the preprocessed indoor image into a pre-trained deep residual network (ResNet) to extract the feature information of the indoor image. A ResNet-50 neural network is used as the feature extraction network. The network takes an input of 224 × 224 × 3 (length × width × number of color channels) and outputs an image feature tensor of 1024 × 1 × 256. ResNet uses a structure called a residual block: each block stacks three convolutional layers (1 × 1 × 64, 3 × 3 × 64, and 1 × 1 × 256) with a ReLU activation function between the layers, which effectively reduces the number of parameters in the neural network.
A Rectified Linear Unit (ReLU) is a commonly used activation function in neural networks. The ReLU processes the output of each layer of the neural network using the following formula:
f(x) = max{0, x}
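The following sketch illustrates how such a feature extractor can be assembled from a standard ResNet-50, truncating the network at an intermediate stage with 1024 channels; the use of torchvision, the truncation point, and the exact output shape are assumptions made for illustration, not the patent's implementation.

import torch
import torchvision

backbone = torchvision.models.resnet50(weights=None)
# Keep the layers up to and including layer3, whose output has 1024 channels.
feature_extractor = torch.nn.Sequential(
    backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
    backbone.layer1, backbone.layer2, backbone.layer3,
)

image = torch.randn(1, 3, 224, 224)        # a preprocessed indoor image
with torch.no_grad():
    features = feature_extractor(image)    # feature tensor, e.g. (1, 1024, 14, 14)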
Step 104, detecting the information of the straight lines in the feature information.
In some embodiments, the execution body uses the feature information of the indoor image obtained in step 103, detects the straight lines in the input panoramic image with a line detection algorithm, and stores the positions of the detected lines as features for judging the wall positions. The line detection algorithm may include, but is not limited to, at least one of the following: the CannyLines algorithm and the Hough transform.
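A minimal sketch of this step using OpenCV's probabilistic Hough transform is given below; the CannyLines detector mentioned above would be a drop-in alternative, and the edge and line thresholds are illustrative assumptions.

import cv2
import numpy as np

def detect_lines(panorama_bgr):
    gray = cv2.cvtColor(panorama_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                            threshold=80, minLineLength=60, maxLineGap=10)
    # Each row is (x1, y1, x2, y2); the positions are stored and later scored
    # as candidate wall connections.
    return lines.reshape(-1, 4) if lines is not None else np.empty((0, 4), int)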
Step 105, inputting the information of the straight lines into a convolutional neural network and outputting the information of the wall connections in the indoor image.
In some embodiments, the execution body inputs the line information obtained in step 104 into a convolutional neural network and outputs the information of the wall connections in the indoor image. In the wall-position judgment stage, a convolutional neural network (CNN) is used as a discriminator to score the positions of the straight lines extracted during feature extraction, and the scores determine which lines are the connections between walls. The neural network is iterated using the following loss function:
f_loss = -(1/n) Σ_{i=1..n} [ α · y*_i · log(P_i) + β · (1 − y*_i) · log(1 − P_i) ]
where f_loss is the loss value; α and β are user-defined parameters that weight the contributions of the two groups of pixels to the loss and need to be determined according to the training configuration of the actual model; y*_i is the ground-truth label of pixel sample i; P_i is the predicted probability that pixel sample i belongs to a corner; and n is the total number of pixels in the image. Training the neural network is performed as the process of minimizing this loss function.
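A sketch of this loss in PyTorch is shown below. It implements a pixel-wise cross entropy in which the corner and non-corner pixels are weighted by α and β; since the patent gives the loss only in outline, this is an illustrative reconstruction rather than the original implementation.

import torch

def wall_connection_loss(pred_prob, target, alpha=1.0, beta=1.0):
    # pred_prob: predicted corner probability per pixel, in (0, 1).
    # target: ground-truth label per pixel, 1 for corner pixels and 0 otherwise.
    eps = 1e-7
    pred_prob = pred_prob.clamp(eps, 1.0 - eps)
    pos = alpha * target * torch.log(pred_prob)
    neg = beta * (1.0 - target) * torch.log(1.0 - pred_prob)
    return -(pos + neg).mean()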
In order to optimize the generated wall information and improve the reconstruction accuracy, the finally generated wall information is evaluated using the following scoring formula:
score(L) = ω_corner · Σ_{l_c ∈ C} log P_corner(l_c) + ω_floor · Σ_{l_f ∈ L_f} log P_floor(l_f)
where score(L) is the score; C is the set of detected coordinates (x, y) of the corners between walls, and l_c ∈ C is a corner pixel coordinate in the set C; L_f is the set of all detected floor pixel coordinates (x, y), and l_f ∈ L_f is a floor pixel coordinate in the set L_f; P_corner and P_floor are the probabilities that each detected wall-corner or floor pixel belongs to that class; and ω_corner and ω_floor are weighting parameters that need to be determined experimentally for different data sets, and can also be determined by grid search.
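The scoring can be computed directly from the probability maps produced by the discriminator, as in the following sketch; the probability-map layout and the default weights are assumptions for illustration.

import numpy as np

def score_layout(corner_pts, floor_pts, p_corner, p_floor,
                 w_corner=1.0, w_floor=1.0):
    # corner_pts / floor_pts: iterables of (x, y) pixel coordinates.
    # p_corner / p_floor: per-pixel probability maps indexed as [y, x].
    eps = 1e-7
    s_corner = sum(np.log(p_corner[y, x] + eps) for x, y in corner_pts)
    s_floor = sum(np.log(p_floor[y, x] + eps) for x, y in floor_pts)
    return w_corner * s_corner + w_floor * s_floor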
Step 106, obtaining the indoor layout information based on the height of the camera above the ground and the information of the wall connections.
In some embodiments, the xy image coordinates above and below each corner are first converted, by a projective transformation that uses the height of the camera above the ground, into xyz coordinates in the three-dimensional environment. The xyz coordinates of the floor and the ceiling are then computed once for each corner, and the average of all these values is taken as the position of the reconstructed ceiling and floor. All identified candidate wall positions are traversed and filtered, keeping only sets of mutually perpendicular walls as candidates. The filtered candidate walls are voted on, and walls shorter than 0.16 m or whose two ends subtend an angle of less than 5 degrees at the camera position are deleted. To compensate for the distortion in the panoramic image, image stretching is applied, which improves the reconstruction accuracy. Finally, the indoor layout information is obtained.
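The projection at the heart of this step can be sketched as follows for a floor corner in an equirectangular panorama; the equirectangular convention and the axis layout (camera at the origin, y pointing up, floor at y = -camera height) are assumptions, since the patent does not spell out the projection model.

import numpy as np

def floor_corner_to_xyz(u, v, pano_w, pano_h, camera_height):
    # (u, v): pixel coordinates of a corner on the floor line of the panorama.
    lon = (u / pano_w - 0.5) * 2.0 * np.pi    # azimuth in [-pi, pi]
    lat = (0.5 - v / pano_h) * np.pi          # elevation; negative below the horizon
    depth = camera_height / np.tan(-lat)      # horizontal distance to the corner
    x = depth * np.sin(lon)
    z = depth * np.cos(lon)
    return np.array([x, -camera_height, z])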
Step 107, inputting the feature information into a pre-trained object detection network and outputting the indoor object information in the indoor image.
In the indoor object recognition stage, an R-FCN neural network is used for object recognition and localization. The input of the R-FCN network is the feature tensor output in step 103; in addition to the feature layers, the network includes three parts: a region proposal network (RPN), a position-sensitive prediction layer, and an RoI pooling layer.
The original image is cut into a grid, splitting out a number of regions of interest (RoIs) that may contain the target to be recognized. In order to detect targets in each region and address the translation invariance of neural network features, so that the network keeps its original accuracy when transferred to a new task, R-FCN introduces a position-sensitive pooling operation. Specifically, each generated region candidate box is split into k × k position-sensitive bins, each of size approximately (w/k) × (h/k), where w and h are the width and height of the candidate box. For each object class, the convolutional layers finally output k² score maps. The score maps are then pooled, and the pooling operation is computed as follows:
r_c(i, j | Θ) = (1/n_bin) · Σ_{(x, y) ∈ bin(i, j)} z_{i,j,c}(x + x_0, y + y_0 | Θ)
where r_c(i, j, Θ) is the pooling result for class c at bin (i, j) of this RoI, Θ denotes the learnable parameters of the network, and (x, y) ranges over the pixels in the (i, j)-th bin, with n_bin the number of pixels in that bin; z_{i,j,c} is the score map of class c for bin (i, j) of the RoI, (x_0, y_0) is the center coordinate of the RoI, and z_{i,j,c}(x + x_0, y + y_0) is the score for class c after the relative coordinates (x, y) within the RoI are converted into absolute coordinates.
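A compact NumPy sketch of the position-sensitive pooling described above is given below; the layout of the score-map tensor and the averaging over bins follow the usual R-FCN convention and are assumptions made for illustration.

import numpy as np

def ps_roi_pool(score_maps, roi, k, num_classes):
    # score_maps: array of shape (k*k*num_classes, H, W), one map per class per bin.
    # roi: (x0, y0, w, h) in feature-map coordinates.
    x0, y0, w, h = roi
    pooled = np.zeros((num_classes, k, k))
    for i in range(k):
        y_lo = int(y0 + i * h / k)
        y_hi = max(int(y0 + (i + 1) * h / k), y_lo + 1)
        for j in range(k):
            x_lo = int(x0 + j * w / k)
            x_hi = max(int(x0 + (j + 1) * w / k), x_lo + 1)
            for c in range(num_classes):
                # Bin (i, j) reads only its own dedicated score map for class c.
                m = score_maps[(i * k + j) * num_classes + c]
                pooled[c, i, j] = m[y_lo:y_hi, x_lo:x_hi].mean()
    # Averaging over the k*k bins votes a single score per class for this RoI.
    return pooled.reshape(num_classes, -1).mean(axis=1)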
In training an R-FCN network, the performance of the neural network is evaluated using the following loss function:
L(s, t_{x,y,w,h}) = L_cls(s_{c*}) + λ · [c* > 0] · L_regression(t, t*)
where t_{x,y,w,h} is the current bounding box, with x, y, w, h denoting the center coordinates, length, and width of the box; s is the target to be recognized inside the box t_{x,y,w,h}; c* is the true class of the RoI region in the training set, with c* = 0 meaning the region is classified as background; the bracket [c* > 0] equals 1 when the expression inside is true and 0 otherwise; s_{c*} is the predicted probability that the target s to be recognized belongs to class c*; L_cls(s_{c*}) = −log(s_{c*}) is the cross-entropy classification loss; L_regression(t, t*) is the bounding-box regression loss with respect to the correct label (ground truth), where t* is the corresponding ground-truth box; and λ is the balance weight between the classification and regression terms. If the training targets have the same weight, λ may be set to 1.
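For a single RoI, this loss can be sketched as below; smooth-L1 is assumed for the regression term, as is usual for R-FCN-style detectors, and the function signature is illustrative rather than the patent's implementation.

import torch
import torch.nn.functional as F

def detection_loss(class_logits, box_pred, c_star, box_target, lam=1.0):
    # class_logits: (num_classes + 1,) scores including the background class.
    # box_pred / box_target: predicted and ground-truth boxes t = (x, y, w, h).
    # c_star: true class of the RoI, 0 meaning background.
    cls_loss = F.cross_entropy(class_logits.unsqueeze(0), torch.tensor([c_star]))
    if c_star > 0:
        reg_loss = F.smooth_l1_loss(box_pred, box_target)
    else:
        reg_loss = torch.zeros((), dtype=box_pred.dtype)
    return cls_loss + lam * reg_loss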
After object detection is performed on the image with the R-FCN, the center coordinate P_obj of each recognized target and the length and width of its region are output.
Step 108, obtaining the position information of the indoor objects based on the indoor object information and the information of the wall connections.
In some embodiments, the two-dimensional panoramic image is first cropped according to the position coordinates of each wall, the position of each wall is recognized with the deep residual network, and the position coordinate P_wall of each wall connection in the two-dimensional panoramic image is detected. The center coordinate P_obj of each detected target region box is then extracted, and the relation of P_obj to each P_wall is computed to obtain the position of the object in the reconstructed three-dimensional scene.
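One simple way to realize this step is sketched below: each detected object center P_obj is anchored to the nearest detected wall connection P_wall, whose reconstructed 3D position is then reused to place the object; the nearest-wall heuristic is an assumption made for this sketch.

import numpy as np

def place_object(p_obj, wall_px, wall_xyz):
    # p_obj: (u, v) object center in the panorama.
    # wall_px: (N, 2) wall-connection pixel coordinates in the panorama.
    # wall_xyz: (N, 3) reconstructed 3D positions of the same wall connections.
    d = np.linalg.norm(np.asarray(wall_px) - np.asarray(p_obj), axis=1)
    nearest = int(np.argmin(d))
    offset_px = np.asarray(p_obj) - np.asarray(wall_px)[nearest]
    # The pixel offset from the anchoring wall can then be converted into a
    # 3D offset along that wall to position the object in the scene.
    return np.asarray(wall_xyz)[nearest], offset_px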
Step 109, completing the construction of the indoor virtual reality interactive scene according to the indoor layout information and the position information of the indoor objects.
In some embodiments, the virtual reality interactive scene is constructed from the indoor layout information obtained in step 106 and the position information of the indoor objects obtained in step 108.
With continued reference to fig. 2, as an implementation of the method shown in the above figures, the present disclosure provides some embodiments of an apparatus for indoor VR scene construction under the joint constraint of image semantics and scene geometry. These apparatus embodiments correspond to the method embodiments described above with reference to fig. 1, and the apparatus may be applied in various electronic devices.
As shown in fig. 2, the apparatus 200 for indoor VR scene construction of some embodiments includes:
an acquisition unit 201, a processing unit 202, a first input/output unit 203, a detection unit 204, a second input/output unit 205, a first determination unit 206, a third input/output unit 207, a second determination unit 208, and a third determination unit 209. The acquisition unit is configured to obtain an indoor image and the height of the camera above the ground when the indoor image was taken; the processing unit is configured to preprocess the indoor image; the first input/output unit is configured to input the preprocessed indoor image into a pre-trained deep residual network and output the feature information of the indoor image; the detection unit is configured to detect the information of the straight lines in the feature information; the second input/output unit is configured to input the line information into a convolutional neural network and output the information of the wall connections in the indoor image, where a wall connection is a straight line connecting walls in the indoor image; the first determination unit is configured to obtain the indoor layout information based on the height of the camera above the ground and the information of the wall connections; the third input/output unit is configured to input the feature information into a pre-trained object detection network and output the indoor object information in the indoor image; the second determination unit is configured to obtain the position information of the indoor objects based on the indoor object information and the information of the wall connections; and the third determination unit is configured to complete the construction of the indoor virtual reality interactive scene according to the indoor layout information and the position information of the indoor objects.
It will be understood that the units described in the apparatus 200 correspond to the various steps in the method described with reference to fig. 1. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 200 and the units included therein, and are not described herein again.
Referring now to FIG. 3, a schematic structural diagram of an electronic device (e.g., a server) 300 suitable for implementing some embodiments of the present disclosure is shown. The server shown in fig. 3 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 3, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage means 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data necessary for the operation of the electronic device 300 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
Generally, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 308 including, for example, magnetic tape, hard disk, etc.; and a communication device 309. The communication means 309 may allow the electronic device 300 to communicate wirelessly or by wire with other devices to exchange data. While fig. 3 illustrates an electronic device 300 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 3 may represent one device or may represent multiple devices, as desired.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network through the communication device 309, or installed from the storage device 308, or installed from the ROM 302. The computer program, when executed by the processing apparatus 301, performs the above-described functions defined in the methods of some embodiments of the present disclosure.
It should be noted that the computer readable medium described above in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be included in the electronic device described above, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtain an indoor image and the height of the camera above the ground when the indoor image was taken; preprocess the indoor image; input the preprocessed indoor image into a pre-trained deep residual network and output the feature information of the indoor image; detect the information of the straight lines in the feature information; input the line information into a convolutional neural network and output the information of the wall connections in the indoor image, where a wall connection is a straight line connecting walls in the indoor image; obtain the indoor layout information based on the height of the camera above the ground and the information of the wall connections; input the feature information into a pre-trained object detection network and output the indoor object information in the indoor image; obtain the position information of the indoor objects based on the indoor object information and the information of the wall connections; and complete the construction of the indoor virtual reality interactive scene according to the indoor layout information and the position information of the indoor objects.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, which may be described as: a processor comprising an acquisition unit, a processing unit, a first input/output unit, a detection unit, a second input/output unit, a first determination unit, a third input/output unit, a second determination unit, and a third determination unit. The names of these units do not in some cases limit the units themselves; for example, the acquisition unit may also be described as "a unit that obtains an indoor image and the height of the camera above the ground when the indoor image was taken".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and an illustration of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by interchanging the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims (9)

1. A method for constructing an indoor VR scene under the joint constraint of image semantics and scene geometry, comprising the following steps:
obtaining an indoor image and the height of the camera above the ground when the indoor image was taken;
preprocessing the indoor image;
inputting the preprocessed indoor image into a pre-trained deep residual network and outputting the feature information of the indoor image;
detecting the information of the straight lines in the feature information;
inputting the information of the straight lines into a convolutional neural network and outputting the information of the wall connections in the indoor image, wherein a wall connection is a straight line connecting walls in the indoor image;
obtaining the indoor layout information based on the height of the camera above the ground and the information of the wall connections;
inputting the feature information into a pre-trained object detection network and outputting the indoor object information in the indoor image;
obtaining the position information of the indoor objects based on the indoor object information and the information of the wall connections;
and completing the construction of the indoor virtual reality interactive scene according to the indoor layout information and the position information of the indoor objects.
2. The method of claim 1, wherein the preprocessing of the indoor image comprises:
adjusting the indoor image to a predetermined resolution.
3. The method of claim 1, wherein the detecting of the information of the straight lines in the feature information comprises:
detecting the positions of the straight lines in the feature information with a line detection algorithm, and storing the position information of the straight lines.
4. The method of claim 3, wherein the inputting of the information of the straight lines into a convolutional neural network and outputting of the information of the wall connections in the indoor image comprises:
using the convolutional neural network as a discriminator to score the positions of the stored straight lines, and obtaining the straight lines of the wall connections from the scores.
5. The method of claim 1, wherein the inputting of the feature information of the indoor image into a pre-trained object detection network and outputting of the indoor object information in the indoor image comprises:
detecting the indoor objects in the indoor image with the object detection network, and outputting the center coordinates and the length and width information of the indoor objects in the indoor image.
6. The method of claim 5, wherein the completing of the construction of the indoor virtual reality interactive scene according to the indoor layout information and the position information of the indoor objects comprises:
cropping the indoor image;
detecting the position coordinates of the wall connections in the indoor image from the information of the wall connections output by the convolutional neural network;
extracting the center coordinates of the indoor objects in the indoor image;
obtaining the positions of the indoor objects from the relation between the position coordinates of the wall connections in the indoor image and the center coordinates of the indoor objects;
and completing the construction of the indoor virtual reality interactive scene based on the positions of the indoor objects and the information of the wall connections.
7. An apparatus for constructing an indoor VR scene under the joint constraint of image semantics and scene geometry, comprising:
an acquisition unit configured to obtain an indoor image and the height of the camera above the ground when the indoor image was taken;
a processing unit configured to preprocess the indoor image;
a first input/output unit configured to input the preprocessed indoor image into a pre-trained deep residual network and output the feature information of the indoor image;
a detection unit configured to detect the information of the straight lines in the feature information;
a second input/output unit configured to input the information of the straight lines into a convolutional neural network and output the information of the wall connections in the indoor image, wherein a wall connection is a straight line connecting walls in the indoor image;
a first determination unit configured to obtain the indoor layout information based on the height of the camera above the ground and the information of the wall connections;
a third input/output unit configured to input the feature information into a pre-trained object detection network and output the indoor object information in the indoor image;
a second determination unit configured to obtain the position information of the indoor objects based on the indoor object information and the information of the wall connections;
and a third determination unit configured to complete the construction of the indoor virtual reality interactive scene according to the indoor layout information and the position information of the indoor objects.
8. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-6.
9. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-6.
CN202010399289.4A 2020-05-12 2020-05-12 Method and device for constructing indoor VR scene based on image semantics and scene geometry joint constraint, electronic equipment and medium Active CN111583417B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010399289.4A CN111583417B (en) 2020-05-12 2020-05-12 Method and device for constructing indoor VR scene based on image semantics and scene geometry joint constraint, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN111583417A true CN111583417A (en) 2020-08-25
CN111583417B CN111583417B (en) 2022-05-03

Family

ID=72116980

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010399289.4A Active CN111583417B (en) 2020-05-12 2020-05-12 Method and device for constructing indoor VR scene based on image semantics and scene geometry joint constraint, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN111583417B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050722A (en) * 2014-06-06 2014-09-17 北京航空航天大学 Indoor three-dimensional scene layout and color transfer generation method driven by image contents
CN106952338A (en) * 2017-03-14 2017-07-14 网易(杭州)网络有限公司 Method, system, and readable storage medium for three-dimensional reconstruction based on deep learning
CN108961395A (en) * 2018-07-03 2018-12-07 上海亦我信息技术有限公司 Method for reconstructing a three-dimensional spatial scene based on photographs

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHUHANG ZOU ET AL.: "LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition *
SHICHAO YANG ET AL.: "Real-time 3D Scene Layout from a Single Image Using Convolutional Neural Networks", 2016 IEEE International Conference on Robotics and Automation (ICRA) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112884841A (en) * 2021-04-14 2021-06-01 哈尔滨工业大学 Binocular vision positioning method based on semantic target
CN112884841B (en) * 2021-04-14 2022-11-25 哈尔滨工业大学 Binocular vision positioning method based on semantic target
CN117373055A (en) * 2023-08-25 2024-01-09 国家粮食和物资储备局科学研究院 Method, system, electronic equipment and storage medium for detecting and identifying pests
CN117373055B (en) * 2023-08-25 2024-09-06 国家粮食和物资储备局科学研究院 Method, system, electronic equipment and storage medium for detecting and identifying pests

Also Published As

Publication number Publication date
CN111583417B (en) 2022-05-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant