CN111158463A - SLAM-based computer vision large space positioning method and system - Google Patents
- Publication number: CN111158463A (application number CN201911206522.6A)
- Authority: CN (China)
- Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2203/00—Indexing scheme relating to G06F3/00 - G06F3/048
- G06F2203/01—Indexing scheme relating to G06F3/01
- G06F2203/012—Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment
Abstract
A SLAM-based computer vision large space positioning method and system comprise a shooting module, an inertia measurement module, a transmission module, an image preprocessing module, a SLAM module, a content generation module and a display module. The shooting module shoots a scene image within a user's visual field range; the inertia measurement module detects user posture information; the transmission module transmits the scene image and the user posture information to a server side; the image preprocessing module processes the scene image to obtain scene coordinate information and the user trunk posture; the SLAM module constructs an instant map according to the scene coordinate information and the user posture information; the content generation module generates a user virtual model in the instant map to obtain a virtual reality large-space scene; and the transmission module transmits the virtual reality large-space scene to the display module for display. Therefore, the cameras installed on the user terminals significantly reduce the cost of the virtual reality large-space scheme; by integrating the shooting visual angles of multiple users in different directions in the large space, the action posture of each user is completely and clearly identified, and the virtual reality interaction experience of the users is good.
Description
Technical Field
The invention relates to the technical field of space positioning, in particular to a computer vision large space positioning method and system based on SLAM.
Background
The virtual reality large space technology is used for realizing multi-user real-time virtual reality interaction in a wide scene by means of technologies such as wireless transmission, machine vision and space positioning, and is very suitable for application scenes such as offline multi-user VR battles, large space experience halls, virtual reality intelligent classrooms and virtual amusement parks.
However, the existing virtual reality large-space schemes face considerable problems in application: the plurality of cameras erected above the field is costly and difficult to debug and maintain. In addition, when users in a scene are too densely packed, some actions are occluded by other users and cannot be shot and identified, so the corresponding actions are lost in the virtual reality scene and the users' virtual reality interaction experience suffers.
Disclosure of Invention
The embodiment of the invention discloses a SLAM (Simultaneous Localization and Mapping)-based computer vision large space positioning method and system, which can greatly reduce the construction and maintenance cost of a virtual reality large-space scheme, completely capture the action postures of a plurality of users in a virtual reality large space, and ensure a good virtual reality interaction experience for the users.
The first aspect of the embodiment of the invention discloses a computer vision large space positioning method based on SLAM, which comprises the following steps:
the method comprises the steps that a shooting module shoots a scene image in a user visual field range, wherein the scene image comprises a user posture image;
the inertial measurement module detects user attitude information;
the transmission module transmits the scene image and the user posture information from a user terminal to a server terminal;
the shooting module, the inertia measurement module, the transmission module and the display module form the user terminal, and the number of the user terminals is at least two; the server side comprises the transmission module, an image preprocessing module, an SLAM module, a content generation module and an image processing acceleration module;
the image preprocessing module processes the scene image to obtain scene coordinate information and a user trunk posture;
the SLAM module constructs an instant map according to the user posture information and the scene coordinate information;
the content generation module generates a user virtual model in the instant map according to the scene coordinate information, the user posture information and the user trunk posture to obtain a virtual reality large-space scene;
the transmission module transmits the virtual reality large-space scene to the display module;
and the display module displays the virtual reality large space scene.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the processing, by the image preprocessing module, the scene image to obtain scene coordinate information and a user trunk pose includes:
recognizing the user posture image in the scene image by adopting a deep visual neural network to obtain the trunk posture of the user;
filtering the user gesture image in the scene image to obtain a pure scene image;
and identifying the depth information of the pure scene image to obtain the scene coordinate information.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the constructing, by the SLAM module, an instant map according to the user posture information and the scene coordinate information includes:
the scene coordinate information is obtained by adopting a direct dense method through a global minimum spatial norm function

T_k = argmin_T Σ_i ‖ I_k(π(T · P_i^(k−1))) − I_(k−1)(π(P_i^(k−1))) ‖²

where I_k denotes the scene image at the current moment k, I_(k−1) denotes the scene image of the previous frame (k−1), T_k denotes the pose transform solved for at the current moment, and P_i^(k−1) denotes the scene coordinate information corresponding to any object in the previous frame of scene image; global minimum spatial norm processing is performed in real time on the scene coordinate information corresponding to each frame of scene image, and the scene coordinate information corresponding to the previous frame of scene image is updated to obtain P_i^k, namely the instant map constructed at the current moment;

in addition, the scene coordinate information or the user posture information of any object in the current scene image can be obtained from that of the same object in the previous scene image as P_i = d · π⁻¹(u), where π denotes the camera projection function, u denotes the pixel at which the object is observed, and d denotes the depth distance between any object and the shooting module.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the generating a user virtual model in the instant map by the content generating module according to the scene coordinate information, the user posture information, and the user trunk posture to obtain a virtual reality large space scene includes:
generating the user virtual model according to a preset material template and the trunk posture of the user;
rendering the instant map according to the preset material template;
and synthesizing the instant map and the user virtual model to obtain the virtual reality large-space scene.
As an optional implementation manner, in the first aspect of this embodiment of the present invention, the method further includes:
the image processing acceleration module is used for accelerating the image processing flow of the image preprocessing module, the SLAM module and the content generation module when the image preprocessing module, the SLAM module and the content generation module run.
The second aspect of the embodiments of the present invention discloses a computer vision large space positioning system based on SLAM, which includes:
the shooting module is used for shooting a scene image in a user visual field range, wherein the scene image comprises a user posture image;
the inertial measurement module is used for detecting user posture information;
the transmission module is used for transmitting the scene image and the user posture information from a user terminal to a server terminal;
the shooting module, the inertia measurement module, the transmission module and the display module form the user terminal, and the number of the user terminals is at least two; the server side comprises the transmission module, an image preprocessing module, an SLAM module, a content generation module and an image processing acceleration module;
the image preprocessing module is used for processing the scene image to obtain scene coordinate information and a trunk posture of a user;
the SLAM module is used for constructing an instant map according to the user posture information and the scene coordinate information;
the content generation module is used for generating a user virtual model in the instant map according to the scene coordinate information, the user posture information and the user trunk posture to obtain a virtual reality large-space scene;
the transmission module is further used for transmitting the virtual reality large space scene to the display module;
and the display module is used for displaying the virtual reality large space scene.
As an alternative implementation manner, in the second aspect of the embodiment of the present invention, the image preprocessing module includes:
the gesture collection submodule is used for identifying the user gesture image in the scene image by adopting a deep visual neural network to obtain the trunk gesture of the user;
the gesture filtering submodule is used for filtering the user gesture image in the scene image to obtain a pure scene image;
and the coordinate identification submodule is used for identifying the depth information of the pure scene image to obtain the scene coordinate information.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the SLAM module adopts a direct dense method and finds the scene coordinate information through a global minimum spatial norm function

T_k = argmin_T Σ_i ‖ I_k(π(T · P_i^(k−1))) − I_(k−1)(π(P_i^(k−1))) ‖²

where I_k denotes the scene image at the current moment k, I_(k−1) denotes the scene image of the previous frame (k−1), T_k denotes the pose transform solved for at the current moment, and P_i^(k−1) denotes the scene coordinate information corresponding to any object in the previous frame of scene image; global minimum spatial norm processing is performed in real time on the scene coordinate information corresponding to each frame of scene image, and the scene coordinate information corresponding to the previous frame of scene image is updated to obtain P_i^k, namely the instant map constructed at the current moment;

in addition, the scene coordinate information or the user posture information of any object in the current scene image can be obtained from that of the same object in the previous scene image as P_i = d · π⁻¹(u), where π denotes the camera projection function, u denotes the pixel at which the object is observed, and d denotes the depth distance between any object and the shooting module.
As an optional implementation manner, in a second aspect of the embodiment of the present invention, the content generating module includes:
the model generation submodule is used for generating the user virtual model according to a preset material template and the user trunk posture;
the map rendering submodule is used for rendering the instant map according to the preset material template;
and the scene synthesis submodule is used for synthesizing the instant map and the user virtual model to obtain the virtual reality large-space scene.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the system further includes:
the image processing acceleration module is used for accelerating the image processing flow of the image preprocessing module, the SLAM module and the content generation module when the image preprocessing module, the SLAM module and the content generation module run.
The third aspect of the embodiments of the present invention discloses a computer vision large space positioning system based on SLAM, which includes:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program code stored in the memory to execute the SLAM-based computer vision large space positioning method disclosed by the first aspect of the embodiment of the invention.
A fourth aspect of the embodiments of the present invention discloses a computer-readable storage medium storing a computer program, where the computer program enables a computer to execute the SLAM-based computer vision large space positioning method disclosed in the first aspect of the embodiments of the present invention.
A fifth aspect of embodiments of the present invention discloses a computer program product, which, when run on a computer, causes the computer to perform some or all of the steps of any one of the methods of the first aspect.
A sixth aspect of the embodiments of the present invention discloses an application publishing platform, where the application publishing platform is configured to publish a computer program product, where the computer program product is configured to, when running on a computer, cause the computer to perform part or all of the steps of any one of the methods in the first aspect.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
in the embodiment of the invention, after the fixedly erected camera is replaced by the camera installed on the user terminal, the cost of the virtual reality large-space scheme is obviously reduced, and the action postures of each user can be completely and clearly identified and captured by integrating a plurality of shooting visual angles provided by a plurality of users in different directions in a large space, so that the virtual reality interaction experience of the user is good.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow chart of a SLAM-based computer vision large space positioning method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a SLAM-based computer vision large space positioning system according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of another SLAM-based computer vision large space positioning system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It is to be noted that the terms "comprises" and "comprising" and any variations thereof in the embodiments and drawings of the present invention are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to the listed steps or modules but may alternatively include other steps or modules not listed or inherent to such process, method, article, or apparatus.
The embodiment of the invention discloses a computer vision large space positioning method and system based on SLAM, wherein after a fixedly erected camera is replaced by the camera installed on a user terminal, the cost of a virtual reality large space scheme is obviously reduced, a plurality of shooting visual angles provided by a plurality of users in different directions in a large space are integrated, the action posture of each user can be completely and clearly identified and captured, and the virtual reality interaction experience of the user is good.
Example one
Referring to fig. 1, fig. 1 is a schematic flow chart of a SLAM-based computer vision large space positioning method according to an embodiment of the present invention. As shown in fig. 1, the large space positioning method may include the following steps:
101. the shooting module shoots a scene image in the visual field range of the user, and the scene image comprises a user posture image.
In the embodiment of the invention, a plurality of users wear glasses type user terminals in a large space for carrying out virtual reality interaction, each user terminal comprises a shooting module, an inertia measurement module, a transmission module and a display module, and the number of the user terminals is at least two; in addition, the server side comprises a transmission module, an image preprocessing module, an SLAM module, a content generation module and an image processing acceleration module. The number of the user terminals is limited to at least two, so that each user can be shot by the shooting module on the user terminal worn by other users, and virtual reality interaction is performed in a virtual reality large space.
As an alternative embodiment, the user terminal is of a glasses type, and the shooting module adopts two depth cameras installed on either side of the display module of the user terminal to shoot a scene image matching the user's actual visual field; the scene image contains the user posture images of other users. During virtual reality interaction, users in the large space naturally face other users, which steers their shooting modules toward those users and avoids the view-occlusion problem of cameras erected at fixed positions. By synthesizing the scene images shot by the user terminals worn by multiple users, the user posture images of the users in the large space and the scene image of the large space can be obtained accurately and completely, and the spatial position of each user terminal in the large space can further be obtained.
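By way of illustration only (the following sketch is not part of the claimed invention), an observation of another user made in one headset's camera frame can be mapped into a shared world frame and fused across several headsets. The yaw-only planar model and all function names here are simplifying assumptions:

```python
import math

def to_world(cam_pos, cam_yaw, obs_local):
    """Transform a point observed in a headset's camera frame (x right,
    z forward) into world coordinates, given the observing headset's
    world position (x, z) and yaw. Yaw-only model is an assumption."""
    c, s = math.cos(cam_yaw), math.sin(cam_yaw)
    x, z = obs_local
    # rotate by the headset's yaw, then translate by its world position
    wx = cam_pos[0] + c * x + s * z
    wz = cam_pos[1] - s * x + c * z
    return (wx, wz)

def fuse(observations):
    """Average the world-frame estimates of the same user as seen from
    several headsets -- a stand-in for integrating multiple viewpoints."""
    xs = [p[0] for p in observations]
    zs = [p[1] for p in observations]
    return (sum(xs) / len(xs), sum(zs) / len(zs))
```

With two headsets observing the same user, their per-camera estimates are first expressed in world coordinates via `to_world` and then combined with `fuse`.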
102. The inertial measurement module detects user attitude information.
In the embodiment of the invention, when a user moves or rotates the visual angle quickly, the shooting module has difficulty capturing a clear and complete scene image. The inertial sensors in the inertial measurement module therefore measure user posture information, including the user's pitch angle, yaw angle and roll angle, and provide auxiliary spatial positioning in cases where the shooting module fails.
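The auxiliary role of the inertial sensors can be sketched as a complementary filter that blends an integrated gyroscope rate with an accelerometer-derived angle. The update rule and the weight alpha are illustrative assumptions; the patent does not specify a fusion algorithm:

```python
import math

def complementary_pitch(pitch, gyro_rate, accel, dt, alpha=0.98):
    """One update step of a complementary filter for the pitch angle.
    gyro_rate: angular velocity about the pitch axis (rad/s);
    accel: (ax, ay, az) in m/s^2; alpha weights the gyro integral
    against the accelerometer angle. Illustrative sketch only."""
    gyro_pitch = pitch + gyro_rate * dt                 # integrate gyro
    ax, ay, az = accel
    accel_pitch = math.atan2(-ax, math.sqrt(ay * ay + az * az))
    return alpha * gyro_pitch + (1 - alpha) * accel_pitch
```

When the headset is stationary the accelerometer term pins the estimate to gravity; during fast motion the gyroscope term dominates, which is exactly the regime where the shooting module struggles.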
103. The transmission module transmits the scene image and the user posture information from the user terminal to the server terminal.
In the embodiment of the invention, the user terminal does not perform data processing tasks such as image processing and the like, but transmits the scene image to the server end in a low-delay wireless transmission mode through the transmission module, and the server end performs centralized calculation, so that the user terminal does not need to be provided with a special processor, the power consumption is saved, the weight of the user terminal is reduced, and the wearing experience of the user is good.
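As a sketch of the upload path only, one frame from a headset could be bundled as a length-prefixed header (user id and posture) followed by the compressed image bytes. The framing format is purely illustrative; a production system would more likely use a binary protocol over UDP/RTP for lower latency:

```python
import json

def pack_frame(user_id, jpeg_bytes, pose):
    """Bundle one headset frame for upload: a JSON header with the user
    id and IMU pose (pitch, yaw, roll), length-prefixed so the server
    can split header from image bytes. Field names are assumptions."""
    header = json.dumps({"user": user_id, "pose": pose}).encode("utf-8")
    return len(header).to_bytes(4, "big") + header + jpeg_bytes

def unpack_frame(blob):
    """Server-side inverse of pack_frame."""
    n = int.from_bytes(blob[:4], "big")
    header = json.loads(blob[4:4 + n].decode("utf-8"))
    return header, blob[4 + n:]
```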
104. The image preprocessing module processes the scene image to obtain scene coordinate information and the trunk posture of the user.
In the embodiment of the invention, the scene image is a depth image, from which information such as the actual distance between the shooting module and objects in the scene can be identified.
As an optional implementation manner, a user posture image in the scene image is recognized by adopting a deep visual neural network to obtain the user trunk posture; the user posture image is filtered out of the scene image to obtain a pure scene image; and the depth information of the pure scene image is identified to obtain the scene coordinate information. Specifically, the deep visual neural network conveniently recognizes, within the scene image, the user posture image whose depth differs from that of the large-space background environment; the user posture image is then filtered out of the scene image to obtain a pure scene image free of object interference, and the scene coordinate information of the scene image is determined according to the depth information of the pure scene image. By separating out the user posture image in this way, the image processing efficiency is improved and interference with the recognition of the scene coordinate information is avoided.
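A minimal sketch of this preprocessing step, assuming a 1-D row of depth pixels and a user mask already produced by the deep visual neural network (the pinhole parameters fx, cx and the 1-D layout are simplifying assumptions):

```python
def preprocess(depth_image, user_mask, fx=500.0, cx=320.0):
    """Split a row of depth pixels into a pure scene image and per-pixel
    scene coordinates, skipping pixels flagged as belonging to a user.
    Illustrative stand-in for the patent's preprocessing module."""
    clean, coords = [], []
    for u, (d, is_user) in enumerate(zip(depth_image, user_mask)):
        if is_user:                 # filter the user posture pixels out
            clean.append(None)
            continue
        clean.append(d)
        # back-project pixel u at depth d with a pinhole model
        coords.append(((u - cx) * d / fx, d))
    return clean, coords
```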
105. And the SLAM module constructs an instant map according to the user posture information and the scene coordinate information.
In the embodiment of the invention, as a user wearing a user terminal moves through the large space, the shooting module gradually shoots more and more scene images, so the actual environment and the user posture information of the large space are in a dynamically updating state, and SLAM is adopted to construct an instant map of the large space.
As an alternative implementation, a direct dense method is adopted, and the scene coordinate information is obtained through a global minimum spatial norm function

T_k = argmin_T Σ_i ‖ I_k(π(T · P_i^(k−1))) − I_(k−1)(π(P_i^(k−1))) ‖²

where I_k denotes the scene image at the current moment k, I_(k−1) denotes the scene image of the previous frame (k−1), T_k denotes the pose transform solved for at the current moment, and P_i^(k−1) denotes the scene coordinate information corresponding to any object in the previous frame of scene image; global minimum spatial norm processing is performed in real time on the scene coordinate information corresponding to each frame of scene image, and the scene coordinate information corresponding to the previous frame of scene image is updated to obtain P_i^k, namely the instant map constructed at the current moment;

in addition, the scene coordinate information or the user posture information of any object in the current scene image can be obtained from that of the same object in the previous scene image as P_i = d · π⁻¹(u), where π denotes the camera projection function, u denotes the pixel at which the object is observed, and d denotes the depth distance between any object and the shooting module.
Specifically, the SLAM module monitors the relative change of any object in the scene image and the user terminal in depth, obtains scene coordinate information or user posture information of any object relative to the user terminal through global minimum space specification processing, selects certain coordinate information as a fixed point, integrates the scene coordinate information and the user posture information acquired by a plurality of user terminals into the same coordinate system, and constructs an instant map including large-space scene coordinate information and user posture information of users in a large space. Therefore, the SLAM scheme is adopted to carry out real-time positioning on a large space, a real-time map is constructed, the positioning efficiency is effectively improved, and the identification effect on complex scenes is obviously improved.
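The global-minimum search of the direct dense method can be illustrated, in heavily reduced 1-D form, as choosing the shift that minimises the summed squared photometric error between consecutive frames. This grid search stands in for the continuous optimisation and is not the patent's actual solver:

```python
def align_direct(prev_img, cur_img, shifts):
    """Toy 1-D direct method: pick the integer shift minimising the
    mean squared photometric error between the previous and current
    frame intensities. Purely illustrative of the direct dense idea."""
    def cost(s):
        pairs = [(prev_img[i], cur_img[i + s])
                 for i in range(len(prev_img))
                 if 0 <= i + s < len(cur_img)]
        return sum((a - b) ** 2 for a, b in pairs) / len(pairs)
    return min(shifts, key=cost)
```

For a bright feature that moved one pixel between frames, the search recovers that motion, which in the full system corresponds to the camera pose update feeding the instant map.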
106. And the content generation module generates a user virtual model in the instant map according to the scene coordinate information, the user posture information and the user trunk posture to obtain the virtual reality large-space scene.
In the embodiment of the invention, through the steps 101-105, the instant map and the coordinate information of the user in the instant map are constructed, and at the moment, the model material to be displayed in the virtual reality large space can be generated according to the coordinate information.
As an optional implementation manner, the user virtual model is generated according to a preset material template and the user trunk posture; the instant map is rendered according to the preset material template; and the instant map and the user virtual model are synthesized to obtain the virtual reality large-space scene. Specifically, assuming the large space is applied to a virtual reality shooting game, the preset material templates should include a battlefield material template and a soldier material template. A soldier virtual model matched with each user's posture information and trunk posture is generated in the virtual reality large space according to the soldier material template, the large space is rendered into a battlefield scene according to the battlefield material template and the scene coordinate information, and the scene synthesis submodule synthesizes the battlefield scene with the soldier virtual models to generate the virtual reality large-space scene. In addition, by replacing the preset material template with ones of different themes, the application scene of the large space can be flexibly changed, giving strong applicability.
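A toy composition step, assuming a dict-based scene graph and template fields that are placeholders for a real rendering engine:

```python
def build_scene(template, map_points, users):
    """Compose a large-space scene description from a themed material
    template, the instant map points, and per-user (position, torso
    pose) pairs. Structure and field names are illustrative only."""
    scene = {"environment": template["environment"],
             "props": [{"at": p} for p in map_points]}
    scene["avatars"] = [
        {"model": template["avatar"], "pose": pose, "at": pos}
        for pos, pose in users
    ]
    return scene
```

Swapping in a template with a different theme changes the rendered environment and avatar models without touching the positioning pipeline, mirroring the flexibility described above.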
As another optional implementation, the server is provided with an image processing acceleration module, which includes an image processing chip dedicated to image processing, and under task scenes such as real-time processing of scene images, real-time construction of a real-time map, rendering and generation of a virtual reality large-space scene, the processing efficiency of the image processing chip is greatly superior to that of a conventional general processor, so that synchronization between the virtual reality large-space scene and an actual scene is ensured.
107. And the transmission module transmits the virtual reality large-space scene to the display module.
In the embodiment of the invention, the transmission module transmits the virtual reality large-space scene generated by the server end to the display module of the user terminal in a wireless transmission mode.
108. The display module displays a virtual reality large space scene.
In the embodiment of the invention, a user views an in-person virtual reality large-space scene through the display module.
It can be seen that by implementing the large space positioning method described in fig. 1, after the fixedly-erected camera is replaced by the camera installed at the user terminal, the cost of the virtual reality large space scheme is significantly reduced, and by integrating a plurality of shooting visual angles provided by multiple users in different directions in a large space, the action posture of each user can be recognized and captured completely and clearly, and the virtual reality interaction experience of the user is good.
Example two
Referring to fig. 2, fig. 2 is a schematic structural diagram of a SLAM-based computer vision large space positioning system according to an embodiment of the present invention. As shown in fig. 2, the large space positioning system may include:
the shooting module 201 is configured to shoot a scene image within a user visual field range, where the scene image includes a user gesture image;
an inertial measurement module 202 for detecting user attitude information;
a transmission module 203, configured to transmit the scene image and the user posture information from the user terminal to the server;
the shooting module 201, the inertia measurement module 202, the transmission module 203 and the display module 204 form a user terminal, and the number of the user terminals is at least two; the server side comprises the transmission module 203, an image preprocessing module 205, a SLAM module 206, a content generation module 207 and an image processing acceleration module 208;
the image preprocessing module 205 is configured to process a scene image to obtain scene coordinate information and a user trunk posture;
the SLAM module 206 is configured to construct an instant map according to the user posture information and the scene coordinate information;
the content generation module 207 is used for generating a user virtual model in the instant map according to the scene coordinate information, the user posture information and the user trunk posture to obtain a virtual reality large-space scene;
the transmission module 203 is further configured to transmit the virtual reality large-space scene to the display module 204;
the display module 204 is used for displaying the virtual reality large-space scene;
an image processing acceleration module 208, configured to accelerate the image processing flow of the image preprocessing module 205, the SLAM module 206, and the content generation module 207 when the image preprocessing module 205, the SLAM module 206, and the content generation module 207 run.
The image preprocessing module 205 further includes:
the gesture collection submodule 2051 is configured to identify a user gesture image in a scene image by using a deep visual neural network, so as to obtain a user trunk gesture;
a gesture filtering submodule 2052, configured to filter a user gesture image in the scene image to obtain a pure scene image;
and a coordinate identification submodule 2053, configured to identify depth information of the clean scene image, to obtain scene coordinate information.
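The three preprocessing submodules above (posture collection, posture filtering, coordinate identification) can be sketched as one pipeline. The patent does not disclose implementation details, so the following is a minimal illustration under stated assumptions: a depth camera, a precomputed per-pixel person mask standing in for the output of the deep visual neural network, and hypothetical pinhole intrinsics `fx, fy, cx, cy`:

```python
import numpy as np

def preprocess_scene(depth_image, person_mask, fx, fy, cx, cy):
    """Filter user-posture pixels out of a depth frame, then back-project
    the remaining pixels to 3D scene coordinates (pinhole model)."""
    # Posture filtering: blank out pixels belonging to users,
    # leaving a "pure" scene depth image.
    scene_depth = np.where(person_mask, np.nan, depth_image)

    # Coordinate identification: back-project every valid depth pixel.
    h, w = scene_depth.shape
    v, u = np.mgrid[0:h, 0:w]
    valid = ~np.isnan(scene_depth)
    d = scene_depth[valid]
    points = np.stack([
        (u[valid] - cx) * d / fx,   # X
        (v[valid] - cy) * d / fy,   # Y
        d,                          # Z: depth distance from the shooting module
    ], axis=1)
    return points                   # (N, 3) scene coordinate information
```

The returned point set plays the role of the "scene coordinate information" consumed by the SLAM module 206; in a real system the person mask would come from the deep visual neural network of submodule 2051.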
The SLAM module 206 is specifically configured to obtain the scene coordinate information by a direct dense method through a global minimum spatial norm function [formula not reproduced in the source],
where k denotes the scene image at the current time and (k-1) denotes the previous frame of scene image; [symbol not reproduced] denotes the scene coordinate information corresponding to any object in the previous frame of scene image. Global minimum spatial norm processing is performed in real time on the scene coordinate information corresponding to each frame of scene image, and the scene coordinate information corresponding to the previous frame is updated accordingly; the result is the instant map constructed at the current time.
In addition, the scene coordinate information or the user posture information of any object in the current scene image can be derived from the scene coordinate information or the user posture information of that object in the previous frame of scene image, where [symbol not reproduced] is the circumferential ratio (π) and [symbol not reproduced] is the depth distance between the object and the shooting module.
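The per-frame update described above (propagating the (k-1)-frame scene coordinates into frame k under camera motion estimated by a direct dense method) can be sketched as follows. The patent's actual formulas are not reproduced in the source, so the function names and the simplified sum-of-squares photometric residual here are illustrative assumptions, not the disclosed equations:

```python
import numpy as np

def photometric_residual(intensity_k, intensity_km1_warped):
    """Sum-of-squares photometric error between the current frame and the
    previous frame warped into it; a direct dense method estimates the
    camera motion that makes this error globally minimal."""
    r = intensity_k - intensity_km1_warped
    return float(np.sum(r * r))

def update_map_points(points_km1, R, t):
    """Propagate the (k-1)-frame scene coordinates of every object into
    frame k under the estimated relative motion (R, t), yielding the
    instant map at the current time."""
    return points_km1 @ R.T + t
```

With identity rotation and zero translation the map is unchanged, matching the intuition that a stationary camera leaves the instant map fixed.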
In addition, the content generation module 207 further includes:
the model generation submodule 2071 is used for generating the user virtual model according to a preset material template and the posture of the trunk of the user;
a map rendering submodule 2072, configured to render an instant map according to a preset material template;
and a scene synthesis submodule 2073, configured to synthesize the instant map and the user virtual model to obtain a virtual reality large-space scene.
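The data flow through the three content generation submodules can be illustrated with a minimal sketch. The template fields and mesh names below are hypothetical placeholders for the "preset material template" described above, not part of the patent:

```python
from dataclasses import dataclass, field

@dataclass
class MaterialTemplate:          # stand-in for the preset material template
    theme: str
    avatar_mesh: str             # e.g. a soldier model for a shooting game
    environment_mesh: str        # e.g. a battlefield environment

@dataclass
class VirtualScene:
    environment: str
    avatars: list = field(default_factory=list)

def generate_avatar(template, torso_pose):
    """Model generation: bind a template mesh to one user's trunk posture."""
    return {"mesh": template.avatar_mesh, "pose": torso_pose}

def synthesize_scene(template, instant_map, user_poses):
    """Map rendering + scene synthesis: dress the instant map in the
    template's environment, then place one avatar per tracked user."""
    scene = VirtualScene(environment=template.environment_mesh)
    for pose in user_poses:                       # one virtual model per user
        scene.avatars.append(generate_avatar(template, pose))
    return scene
```

Swapping in a `MaterialTemplate` with a different theme changes the whole large-space scene, which is the flexibility the embodiments below describe.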
As an alternative embodiment, the user terminal takes the form of glasses, and its shooting module employs two depth cameras mounted on either side of the display module 204 of the user terminal. The cameras shoot a scene image that matches the user's actual field of view, and that image contains the posture images of other users: during virtual reality interaction in the large space, a user naturally turns toward the other users, which points the shooting module at them and avoids the view-occlusion problem of a camera erected at a fixed position. By synthesizing the scene images shot by the user terminals worn by multiple users, the posture images of the users in the large space and the scene images of the large space itself can be obtained accurately and completely, and the spatial position of each user terminal within the large space can be further derived.
As an optional implementation, the user terminal performs no data processing tasks such as image processing; instead, the transmission module 203 transmits the scene image to the server side over a low-latency wireless link, and the server side performs the computation centrally. The user terminal therefore needs no dedicated processor, which saves power, reduces the terminal's weight, and improves wearing comfort.
As an optional implementation, the posture collection submodule 2051 identifies the user posture image in the scene image with a deep visual neural network to obtain the user trunk posture; the posture filtering submodule 2052 filters the user posture image out of the scene image to obtain a pure scene image; and the coordinate identification submodule 2053 identifies the depth information of the pure scene image to obtain the scene coordinate information. Specifically, the deep visual neural network lets the posture collection submodule 2051 conveniently pick out user posture images whose depth differs from that of the large-space background; the posture filtering submodule 2052 then removes them, leaving a pure scene image free of interfering objects, from which the coordinate identification submodule 2053 determines the scene coordinate information. Separating the user posture image from the scene image in this way improves image processing efficiency and prevents the posture image from interfering with recognition of the scene coordinate information.
As an optional implementation, the SLAM module 206 monitors the relative change in depth between any object in the scene image and the user terminal, obtains the scene coordinate information or user posture information of that object relative to the user terminal through global minimum spatial norm processing, selects one coordinate as a fixed point, and then unifies the scene coordinate information and user posture information collected by the multiple user terminals into the same coordinate system, constructing an instant map that contains both the large-space scene coordinate information and the posture information of the users within the large space. Adopting a SLAM scheme to position the large space in real time and build the map on the fly effectively improves positioning efficiency and markedly improves recognition of complex scenes.
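The fixed-point unification step described here (merging coordinates from several user terminals into one coordinate system) can be sketched as a pure translation alignment around a shared landmark. This is a simplifying assumption of mine: a full implementation would also estimate rotation between terminals, which the patent does not detail:

```python
import numpy as np

def to_shared_frame(points_local, anchor_local, anchor_shared):
    """Translate one terminal's local coordinates into the shared map frame,
    using a landmark visible to all terminals as the fixed point."""
    return points_local + (anchor_shared - anchor_local)

def fuse_terminals(points_per_terminal, anchors_per_terminal, anchor_shared):
    """Merge every terminal's points into a single instant-map point cloud."""
    return np.vstack([
        to_shared_frame(p, a, anchor_shared)
        for p, a in zip(points_per_terminal, anchors_per_terminal)
    ])
```

After fusion, the fixed-point landmark lands at the same shared coordinate regardless of which terminal observed it, which is exactly the property the instant map needs.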
As an optional implementation, the model generation submodule 2071 generates a user virtual model according to the preset material template and the user trunk posture; the map rendering submodule 2072 renders the instant map according to the preset material template; and the scene synthesis submodule 2073 synthesizes the instant map and the user virtual model to obtain the virtual reality large-space scene. Specifically, assuming the large space hosts a virtual reality shooting game, the preset material templates should include a battlefield material template for the scene image and a soldier image template. The model generation submodule 2071 then generates a soldier virtual model for each user from the soldier image template and that user's trunk posture, the map rendering submodule 2072 renders the large space into a battlefield scene from the battlefield material template and the scene coordinate information, and the scene synthesis submodule 2073 synthesizes the battlefield scene with the soldier virtual models to generate the virtual reality large-space scene. In addition, by swapping in preset material templates with different themes, the application scene of the large space can be changed flexibly, giving the system strong applicability.
As another optional implementation, the server side is provided with an image processing acceleration module containing an image processing chip dedicated to image processing. For tasks such as real-time processing of scene images, real-time construction of the instant map, and rendering and generation of the virtual reality large-space scene, the processing efficiency of such a chip is far superior to that of a conventional general-purpose processor, which keeps the virtual reality large-space scene synchronized with the actual scene.
It can be seen that, by implementing the large-space positioning system described in fig. 2, replacing fixedly erected cameras with cameras mounted on the user terminals significantly reduces the cost of the virtual reality large-space scheme; and by integrating the shooting angles contributed by multiple users at different positions in the large space, the action posture of every user can be recognized and captured completely and clearly, giving users a good virtual reality interaction experience.
EXAMPLE III
Referring to fig. 3, fig. 3 is a schematic structural diagram of another SLAM-based computer vision large space positioning system according to an embodiment of the present invention. As shown in fig. 3, the system may include:
a memory 301 storing executable program code;
a processor 302 coupled to the memory 301;
the processor 302 calls the executable program code stored in the memory 301 to execute a SLAM-based computer vision large space positioning method described in fig. 1.
The embodiment of the invention discloses a computer readable storage medium which stores a computer program, wherein the computer program enables a computer to execute a SLAM-based computer vision large space positioning method described in figure 1.
Embodiments of the present invention also disclose a computer program product, wherein, when the computer program product is run on a computer, the computer is caused to execute part or all of the steps of the method as in the above method embodiments.
The embodiment of the present invention also discloses an application publishing platform, wherein the application publishing platform is used for publishing a computer program product, and when the computer program product runs on a computer, the computer is caused to execute part or all of the steps of the method in the above method embodiments.
In the various embodiments of the present invention, it should be understood that the sequence numbers of the above processes do not imply a required order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
In the embodiments provided herein, it should be understood that "B corresponding to A" means that B is associated with A and that B can be determined from A. It should also be understood, however, that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
The integrated modules, if implemented as software functional modules and sold or used as a stand-alone product, may be stored in a computer-accessible memory. Based on this understanding, the part of the technical solution of the present invention that in essence contributes beyond the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server or a network device, and specifically a processor in the computer device) to execute part or all of the steps of the above-described method of each embodiment of the present invention.
It will be understood by those skilled in the art that all or part of the steps in the methods of the embodiments described above may be implemented by program instructions, which may be stored in a computer-readable storage medium, where the storage medium includes Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disc memory, magnetic disk memory, magnetic tape memory, or any other medium which can be used to carry or store data and which can be read by a computer.
The method and the system for computer vision large space positioning based on SLAM disclosed by the embodiment of the invention are described in detail, a specific example is applied in the text to explain the principle and the implementation mode of the invention, and the description of the embodiment is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.
Claims (10)
1. A computer vision large space positioning method based on SLAM is characterized by comprising the following steps:
the method comprises the steps that a shooting module shoots a scene image in a user visual field range, wherein the scene image comprises a user posture image;
the inertial measurement module detects user attitude information;
the transmission module transmits the scene image and the user posture information from a user terminal to a server terminal;
the shooting module, the inertia measurement module, the transmission module and the display module form the user terminal, and the number of the user terminals is at least two; the server side comprises the transmission module, an image preprocessing module, an SLAM module, a content generation module and an image processing acceleration module;
the image preprocessing module processes the scene image to obtain scene coordinate information and a user trunk posture;
the SLAM module constructs an instant map according to the user posture information and the scene coordinate information;
the content generation module generates a user virtual model in the instant map according to the scene coordinate information, the user posture information and the user trunk posture to obtain a virtual reality large-space scene;
the transmission module transmits the virtual reality large-space scene to the display module;
and the display module displays the virtual reality large space scene.
2. The method of claim 1, wherein the image preprocessing module processes the scene image to obtain scene coordinate information and a user trunk posture, comprising:
recognizing the user posture image in the scene image by adopting a deep visual neural network to obtain the trunk posture of the user;
filtering the user gesture image in the scene image to obtain a pure scene image;
and identifying the depth information of the pure scene image to obtain the scene coordinate information.
3. The method of claim 1, wherein the SLAM module constructs an instant map from the user pose information and the scene coordinate information, comprising:
the scene coordinate information is obtained by a direct dense method through a global minimum spatial norm function [formula not reproduced in the source],
where k denotes the scene image at the current time and (k-1) denotes the previous frame of scene image; [symbol not reproduced] denotes the scene coordinate information corresponding to any object in the previous frame of scene image; global minimum spatial norm processing is performed in real time on the scene coordinate information corresponding to each frame of scene image, and the scene coordinate information corresponding to the previous frame is updated accordingly, which yields the instant map constructed at the current time;
in addition, the scene coordinate information or the user posture information of any object in the current scene image can be derived from the scene coordinate information or the user posture information of that object in the previous frame of scene image, where [symbol not reproduced] is the circumferential ratio (π) and [symbol not reproduced] is the depth distance between the object and the shooting module.
4. The method of claim 3, wherein the content generation module generates a user virtual model in the instant map according to the scene coordinate information, the user posture information and the user trunk posture, and obtains a virtual reality large space scene, including:
generating the user virtual model according to a preset material template and the trunk posture of the user;
rendering the instant map according to the preset material template;
and synthesizing the instant map and the user virtual model to obtain the virtual reality large-space scene.
5. The method according to any one of claims 1 to 4, further comprising:
the image processing acceleration module is used for accelerating the image processing flow of the image preprocessing module, the SLAM module and the content generation module when the image preprocessing module, the SLAM module and the content generation module run.
6. A SLAM-based computer vision large space positioning system, comprising:
the shooting module is used for shooting a scene image in a user visual field range, wherein the scene image comprises a user posture image;
the inertial measurement module is used for detecting user posture information;
the transmission module is used for transmitting the scene image and the user posture information from a user terminal to a server terminal;
the shooting module, the inertia measurement module, the transmission module and the display module form the user terminal, and the number of the user terminals is at least two; the server side comprises the transmission module, an image preprocessing module, an SLAM module, a content generation module and an image processing acceleration module;
the image preprocessing module is used for processing the scene image to obtain scene coordinate information and a trunk posture of a user;
the SLAM module is used for constructing an instant map according to the user posture information and the scene coordinate information;
the content generation module is used for generating a user virtual model in the instant map according to the scene coordinate information, the user posture information and the user trunk posture to obtain a virtual reality large-space scene;
the transmission module is further used for transmitting the virtual reality large space scene to the display module;
and the display module is used for displaying the virtual reality large space scene.
7. The system of claim 6, wherein the image pre-processing module comprises:
the gesture collection submodule is used for identifying the user gesture image in the scene image by adopting a deep visual neural network to obtain the trunk gesture of the user;
the gesture filtering submodule is used for filtering the user gesture image in the scene image to obtain a pure scene image;
and the coordinate identification submodule is used for identifying the depth information of the pure scene image to obtain the scene coordinate information.
8. The system of claim 6, wherein the SLAM module obtains the scene coordinate information by a direct dense method through a global minimum spatial norm function [formula not reproduced in the source],
where k denotes the scene image at the current time and (k-1) denotes the previous frame of scene image; [symbol not reproduced] denotes the scene coordinate information corresponding to any object in the previous frame of scene image; global minimum spatial norm processing is performed in real time on the scene coordinate information corresponding to each frame of scene image, and the scene coordinate information corresponding to the previous frame is updated accordingly, which yields the instant map constructed at the current time;
in addition, the scene coordinate information or the user posture information of any object in the current scene image can be derived from the scene coordinate information or the user posture information of that object in the previous frame of scene image, where [symbol not reproduced] is the circumferential ratio (π) and [symbol not reproduced] is the depth distance between the object and the shooting module.
9. The system of claim 8, wherein the content generation module comprises:
the model generation submodule is used for generating the user virtual model according to a preset material template and the user trunk posture;
the map rendering submodule is used for rendering the instant map according to the preset material template;
and the scene synthesis submodule is used for synthesizing the instant map and the user virtual model to obtain the virtual reality large-space scene.
10. The system according to any one of claims 6 to 9, further comprising:
the image processing acceleration module is used for accelerating the image processing flow of the image preprocessing module, the SLAM module and the content generation module when the image preprocessing module, the SLAM module and the content generation module run.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911206522.6A CN111158463A (en) | 2019-11-29 | 2019-11-29 | SLAM-based computer vision large space positioning method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911206522.6A CN111158463A (en) | 2019-11-29 | 2019-11-29 | SLAM-based computer vision large space positioning method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111158463A true CN111158463A (en) | 2020-05-15 |
Family
ID=70556320
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911206522.6A Withdrawn CN111158463A (en) | 2019-11-29 | 2019-11-29 | SLAM-based computer vision large space positioning method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111158463A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113051424A (en) * | 2021-03-26 | 2021-06-29 | 联想(北京)有限公司 | Positioning method and device based on SLAM map |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106896925A (en) * | 2017-04-14 | 2017-06-27 | 陈柳华 | The device that a kind of virtual reality is merged with real scene |
CN107168532A (en) * | 2017-05-05 | 2017-09-15 | 武汉秀宝软件有限公司 | A kind of virtual synchronous display methods and system based on augmented reality |
CN107820593A (en) * | 2017-07-28 | 2018-03-20 | 深圳市瑞立视多媒体科技有限公司 | A kind of virtual reality exchange method, apparatus and system |
CN108022302A (en) * | 2017-12-01 | 2018-05-11 | 深圳市天界幻境科技有限公司 | A kind of sterically defined AR 3 d display devices of Inside-Out |
CN109358754A (en) * | 2018-11-02 | 2019-02-19 | 北京盈迪曼德科技有限公司 | A kind of mixed reality wears display system |
CN109387204A (en) * | 2018-09-26 | 2019-02-26 | 东北大学 | The synchronous positioning of the mobile robot of dynamic environment and patterning process in faced chamber |
CN109584295A (en) * | 2017-09-29 | 2019-04-05 | 阿里巴巴集团控股有限公司 | The method, apparatus and system of automatic marking are carried out to target object in image |
CN109671118A (en) * | 2018-11-02 | 2019-04-23 | 北京盈迪曼德科技有限公司 | A kind of more people's exchange methods of virtual reality, apparatus and system |
-
2019
- 2019-11-29 CN CN201911206522.6A patent/CN111158463A/en not_active Withdrawn
Non-Patent Citations (1)
Title |
---|
WAN Qin et al.: "A survey of multi-moving-target tracking methods based on three-dimensional vision systems", Computer Engineering and Applications *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10460512B2 (en) | 3D skeletonization using truncated epipolar lines | |
US10674142B2 (en) | Optimized object scanning using sensor fusion | |
US9855496B2 (en) | Stereo video for gaming | |
CN107341827B (en) | Video processing method, device and storage medium | |
CN111710036B (en) | Method, device, equipment and storage medium for constructing three-dimensional face model | |
CN110363133B (en) | Method, device, equipment and storage medium for sight line detection and video processing | |
US20130127827A1 (en) | Multiview Face Content Creation | |
CN113706699B (en) | Data processing method and device, electronic equipment and computer readable storage medium | |
CN106296598B (en) | 3 d pose processing method, system and camera terminal | |
CN106896925A (en) | The device that a kind of virtual reality is merged with real scene | |
US20160210761A1 (en) | 3d reconstruction | |
CN109255749A (en) | From the map structuring optimization in non-autonomous platform of advocating peace | |
CN112819875B (en) | Monocular depth estimation method and device and electronic equipment | |
CN111833457A (en) | Image processing method, apparatus and storage medium | |
CN111667588A (en) | Person image processing method, person image processing device, AR device and storage medium | |
KR20230078777A (en) | 3D reconstruction methods, devices and systems, media and computer equipment | |
US20210035326A1 (en) | Human pose estimation system | |
CN115482556A (en) | Method for key point detection model training and virtual character driving and corresponding device | |
CN111158463A (en) | SLAM-based computer vision large space positioning method and system | |
CN111383313B (en) | Virtual model rendering method, device, equipment and readable storage medium | |
CN111079535B (en) | Human skeleton action recognition method and device and terminal | |
CN113010009B (en) | Object sharing method and device | |
CN114882106A (en) | Pose determination method and device, equipment and medium | |
CN114299262A (en) | Display method and device for augmented reality AR scene | |
CA3172140A1 (en) | Full skeletal 3d pose recovery from monocular camera |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication |
Application publication date: 20200515 |