CN111158463A - SLAM-based computer vision large space positioning method and system - Google Patents

SLAM-based computer vision large space positioning method and system

Info

Publication number
CN111158463A
Authority
CN
China
Prior art keywords
scene
user
module
image
coordinate information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911206522.6A
Other languages
Chinese (zh)
Inventor
黄昌正
周言明
陈曦
黄庆麟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dongguan Yilian Interation Information Technology Co ltd
Huaibei Huanjing Intelligent Technology Co ltd
Original Assignee
Dongguan Yilian Interation Information Technology Co ltd
Huaibei Huanjing Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dongguan Yilian Interation Information Technology Co ltd, Huaibei Huanjing Intelligent Technology Co ltd filed Critical Dongguan Yilian Interation Information Technology Co ltd
Priority to CN201911206522.6A priority Critical patent/CN111158463A/en
Publication of CN111158463A publication Critical patent/CN111158463A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01Indexing scheme relating to G06F3/01
    • G06F2203/012Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment

Abstract

A SLAM-based computer vision large space positioning method and system. The system comprises a shooting module, an inertial measurement module, a transmission module, an image preprocessing module, a SLAM module, a content generation module and a display module. The shooting module shoots a scene image within the user's field of view; the inertial measurement module detects user posture information; the transmission module transmits the scene image and the user posture information to the server side; the image preprocessing module processes the scene image to obtain scene coordinate information and the user trunk posture; the SLAM module constructs an instant map according to the scene coordinate information and the user posture information; the content generation module generates a user virtual model in the instant map to obtain a virtual reality large-space scene; and the transmission module transmits the scene to the display module for display. Replacing fixedly erected cameras with cameras mounted on the user terminals significantly reduces the cost of the virtual reality large-space scheme; by integrating the shooting angles of multiple users facing different directions in the large space, each user's action posture is identified completely and clearly, and the users' virtual reality interaction experience is good.

Description

SLAM-based computer vision large space positioning method and system
Technical Field
The invention relates to the technical field of space positioning, in particular to a computer vision large space positioning method and system based on SLAM.
Background
Virtual reality large-space technology realizes real-time multi-user virtual reality interaction in a wide scene by means of technologies such as wireless transmission, machine vision and spatial positioning, and is well suited to application scenarios such as offline multi-user VR battles, large-space experience halls, virtual reality smart classrooms and virtual amusement parks.
However, existing virtual reality large-space schemes have more than a few problems in practice: the multiple cameras erected above the venue are costly and difficult to debug and maintain. In addition, when users in a scene are too densely packed, some actions are occluded by other users and cannot be shot and identified, so the corresponding actions go missing in the virtual reality scene, harming the users' virtual reality interaction experience.
Disclosure of Invention
The embodiment of the invention discloses a computer vision large space positioning method And system based on SLAM (Simultaneous Localization And Mapping), which can greatly reduce the construction And maintenance cost of a virtual reality large space scheme, completely capture the action postures of a plurality of users in a virtual reality large space And ensure good virtual reality interaction experience of the users.
The first aspect of the embodiment of the invention discloses a computer vision large space positioning method based on SLAM, which comprises the following steps:
the method comprises the steps that a shooting module shoots a scene image in a user visual field range, wherein the scene image comprises a user posture image;
the inertial measurement module detects user attitude information;
the transmission module transmits the scene image and the user posture information from a user terminal to a server terminal;
the shooting module, the inertia measurement module, the transmission module and the display module form the user terminal, and the number of the user terminals is at least two; the server side comprises the transmission module, an image preprocessing module, an SLAM module, a content generation module and an image processing acceleration module;
the image preprocessing module processes the scene image to obtain scene coordinate information and a user trunk posture;
the SLAM module constructs an instant map according to the user posture information and the scene coordinate information;
the content generation module generates a user virtual model in the instant map according to the scene coordinate information, the user posture information and the user trunk posture to obtain a virtual reality large-space scene;
the transmission module transmits the virtual reality large-space scene to the display module;
and the display module displays the virtual reality large space scene.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the processing, by the image preprocessing module, the scene image to obtain scene coordinate information and a user trunk pose includes:
recognizing the user posture image in the scene image by adopting a deep visual neural network to obtain the trunk posture of the user;
filtering the user gesture image in the scene image to obtain a pure scene image;
and identifying the depth information of the pure scene image to obtain the scene coordinate information.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the constructing, by the SLAM module, an instant map according to the user posture information and the scene coordinate information includes:
the scene coordinate information is obtained by adopting a direct dense method through a global minimum spatial norm function:

[the equations appear in the source only as image placeholders and are not reproduced here]

where k denotes the scene image at the current moment and (k-1) denotes the scene image of the previous frame, and one of the omitted symbols denotes the scene coordinate information corresponding to any object in the previous frame of scene image. Global minimum spatial norm processing is applied in real time to the scene coordinate information corresponding to each frame of scene image, and the scene coordinate information corresponding to the previous frame is updated; the updated result is the instant map constructed at the current moment. In addition, the scene coordinate information or the user posture information of any object in the current scene image can be calculated from the scene coordinate information of that object in the previous frame of scene image, using π (the circumference-to-diameter ratio of a circle) and the depth distance between the object and the shooting module.
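The equations themselves survive only as image placeholders. Purely as a point of reference, direct dense methods of the kind named here are conventionally written as a photometric error minimized over all pixels; the following is a sketch of that standard formulation under assumed notation, an illustration rather than the patent's own equations:

\[
T_{k,k-1}^{*} = \arg\min_{T} \sum_{u \in \Omega}
\bigl\| I_{k}\bigl(\mathrm{proj}\bigl(T \cdot \mathrm{proj}^{-1}(u,\, D_{k-1}(u))\bigr)\bigr) - I_{k-1}(u) \bigr\|^{2}
\]

Here T_{k,k-1} is the camera motion from frame (k-1) to frame k, I_k is the image at time k, D_{k-1} is the depth map of the previous frame, and proj / proj^{-1} are the pinhole projection and back-projection of the shooting module. Each pixel u with known depth is back-projected to a 3D point, moved by the candidate motion T, and reprojected into the current frame; the motion minimizing the summed intensity residual registers frame k against frame (k-1), consistent with the per-frame update of scene coordinate information described above.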
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the generating a user virtual model in the instant map by the content generating module according to the scene coordinate information, the user posture information, and the user trunk posture to obtain a virtual reality large space scene includes:
generating the user virtual model according to a preset material template and the trunk posture of the user;
rendering the instant map according to the preset material template;
and synthesizing the instant map and the user virtual model to obtain the virtual reality large-space scene.
As an optional implementation manner, in the first aspect of this embodiment of the present invention, the method further includes:
the image processing acceleration module is used for accelerating the image processing flow of the image preprocessing module, the SLAM module and the content generation module when the image preprocessing module, the SLAM module and the content generation module run.
The second aspect of the embodiments of the present invention discloses a computer vision large space positioning system based on SLAM, which includes:
the shooting module is used for shooting a scene image in a user visual field range, wherein the scene image comprises a user posture image;
the inertial measurement module is used for detecting user posture information;
the transmission module is used for transmitting the scene image and the user posture information from a user terminal to a server terminal;
the shooting module, the inertia measurement module, the transmission module and the display module form the user terminal, and the number of the user terminals is at least two; the server side comprises the transmission module, an image preprocessing module, an SLAM module, a content generation module and an image processing acceleration module;
the image preprocessing module is used for processing the scene image to obtain scene coordinate information and a trunk posture of a user;
the SLAM module is used for constructing an instant map according to the user posture information and the scene coordinate information;
the content generation module is used for generating a user virtual model in the instant map according to the scene coordinate information, the user posture information and the user trunk posture to obtain a virtual reality large-space scene;
the transmission module is further used for transmitting the virtual reality large space scene to the display module;
and the display module is used for displaying the virtual reality large space scene.
As an alternative implementation manner, in the second aspect of the embodiment of the present invention, the image preprocessing module includes:
the gesture collection submodule is used for identifying the user gesture image in the scene image by adopting a deep visual neural network to obtain the trunk gesture of the user;
the gesture filtering submodule is used for filtering the user gesture image in the scene image to obtain a pure scene image;
and the coordinate identification submodule is used for identifying the depth information of the pure scene image to obtain the scene coordinate information.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the SLAM module adopts a direct dense method and finds the scene coordinate information through a global minimum spatial norm function:

[the equations appear in the source only as image placeholders and are not reproduced here]

where k denotes the scene image at the current moment and (k-1) denotes the scene image of the previous frame, and one of the omitted symbols denotes the scene coordinate information corresponding to any object in the previous frame of scene image. Global minimum spatial norm processing is applied in real time to the scene coordinate information corresponding to each frame of scene image, and the scene coordinate information corresponding to the previous frame is updated; the updated result is the instant map constructed at the current moment. In addition, the scene coordinate information or the user posture information of any object in the current scene image can be calculated from the scene coordinate information of that object in the previous frame of scene image, using π (the circumference-to-diameter ratio of a circle) and the depth distance between the object and the shooting module.
As an optional implementation manner, in a second aspect of the embodiment of the present invention, the content generating module includes:
the model generation submodule is used for generating the user virtual model according to a preset material template and the user trunk posture;
the map rendering submodule is used for rendering the instant map according to the preset material template;
and the scene synthesis submodule is used for synthesizing the instant map and the user virtual model to obtain the virtual reality large-space scene.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the system further includes:
the image processing acceleration module is used for accelerating the image processing flow of the image preprocessing module, the SLAM module and the content generation module when the image preprocessing module, the SLAM module and the content generation module run.
The third aspect of the embodiments of the present invention discloses a computer vision large space positioning system based on SLAM, which includes:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program code stored in the memory to execute the SLAM-based computer vision large space positioning method disclosed by the first aspect of the embodiment of the invention.
A fourth aspect of the embodiments of the present invention discloses a computer-readable storage medium storing a computer program, where the computer program causes a computer to execute the SLAM-based computer vision large space positioning method disclosed in the first aspect of the embodiments of the present invention.
A fifth aspect of embodiments of the present invention discloses a computer program product, which, when run on a computer, causes the computer to perform some or all of the steps of any one of the methods of the first aspect.
A sixth aspect of the embodiments of the present invention discloses an application publishing platform, where the application publishing platform is configured to publish a computer program product which, when running on a computer, causes the computer to perform part or all of the steps of any one of the methods in the first aspect.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
in the embodiment of the invention, after the fixedly erected camera is replaced by the camera installed on the user terminal, the cost of the virtual reality large-space scheme is obviously reduced, and the action postures of each user can be completely and clearly identified and captured by integrating a plurality of shooting visual angles provided by a plurality of users in different directions in a large space, so that the virtual reality interaction experience of the user is good.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow chart of a SLAM-based computer vision large space positioning method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a SLAM-based computer vision large space positioning system according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of another SLAM-based computer vision large space positioning system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It is to be noted that the terms "comprises" and "comprising" and any variations thereof in the embodiments and drawings of the present invention are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to the listed steps or modules but may alternatively include other steps or modules not listed or inherent to such process, method, article, or apparatus.
The embodiment of the invention discloses a computer vision large space positioning method and system based on SLAM, wherein after a fixedly erected camera is replaced by the camera installed on a user terminal, the cost of a virtual reality large space scheme is obviously reduced, a plurality of shooting visual angles provided by a plurality of users in different directions in a large space are integrated, the action posture of each user can be completely and clearly identified and captured, and the virtual reality interaction experience of the user is good.
Example one
Referring to fig. 1, fig. 1 is a schematic flow chart of a SLAM-based computer vision large space positioning method according to an embodiment of the present invention. As shown in fig. 1, the large space positioning method may include the following steps:
101. the shooting module shoots a scene image in the visual field range of the user, and the scene image comprises a user posture image.
In the embodiment of the invention, multiple users wear glasses-type user terminals in a large space to carry out virtual reality interaction. Each user terminal comprises a shooting module, an inertial measurement module, a transmission module and a display module, and there are at least two user terminals; the server side comprises a transmission module, an image preprocessing module, a SLAM module, a content generation module and an image processing acceleration module. Requiring at least two user terminals ensures that each user can be shot by the shooting module on a user terminal worn by another user, so that virtual reality interaction can take place across the virtual reality large space.
As an alternative embodiment, the user terminal takes the form of glasses, and its shooting module adopts two depth cameras installed on either side of the user terminal's display module to shoot a scene image matching the user's actual field of view; the scene image contains the posture images of other users. During virtual reality interaction, a user in the large space naturally faces other users when acting, which points the shooting module at them and avoids the view-occlusion problem that arises when shooting with a camera erected at a fixed position. By synthesizing the scene images shot by the user terminals worn by multiple users, the posture images of the users in the large space and the scene images of the large space around them can be obtained accurately and completely, and the spatial position of each user terminal in the large space can further be obtained.
102. The inertial measurement module detects user attitude information.
In the embodiment of the invention, when a user moves or rotates the viewing angle quickly, the shooting module can hardly capture a clear and complete scene image; the inertial sensor in the inertial measurement module therefore measures user posture information, including the user's pitch angle, yaw angle and roll angle, providing auxiliary spatial positioning in the situations where the shooting module fails.
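The patent does not specify how the inertial data is fused; as a hedged, minimal stand-in, a complementary filter of the following form is one common way to keep pitch and roll stable when vision drops out (the function name and the alpha constant are assumptions):

import math

def complementary_filter(pitch, roll, gyro, accel, dt, alpha=0.98):
    """Blend integrated gyro rates with a gravity estimate from the accelerometer.

    pitch, roll -- previous angles in radians
    gyro        -- (gx, gy, gz) angular rates in rad/s
    accel       -- (ax, ay, az) accelerations in m/s^2
    dt          -- time step in seconds
    alpha       -- weight on the gyro prediction versus the accelerometer
    """
    # Predict orientation by integrating the angular rates.
    pitch_gyro = pitch + gyro[0] * dt
    roll_gyro = roll + gyro[1] * dt

    # Estimate pitch/roll from the gravity direction seen by the accelerometer.
    ax, ay, az = accel
    pitch_acc = math.atan2(ay, math.sqrt(ax * ax + az * az))
    roll_acc = math.atan2(-ax, az)

    # Gyro tracks fast motion; the accelerometer term cancels slow drift.
    # (Yaw is omitted: it needs a magnetometer or vision to correct drift.)
    pitch = alpha * pitch_gyro + (1.0 - alpha) * pitch_acc
    roll = alpha * roll_gyro + (1.0 - alpha) * roll_acc
    return pitch, roll

Such a filter keeps an orientation estimate available between clear camera frames, which is exactly the auxiliary role the inertial measurement module plays here.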
103. The transmission module transmits the scene image and the user posture information from the user terminal to the server terminal.
In the embodiment of the invention, the user terminal does not perform data processing tasks such as image processing; instead, the transmission module transmits the scene image to the server side over a low-latency wireless link, and the server side performs centralized computation. The user terminal therefore needs no dedicated processor, which saves power, reduces the terminal's weight and makes it comfortable to wear.
104. The image preprocessing module processes the scene image to obtain scene coordinate information and the trunk posture of the user.
In the embodiment of the invention, the scene image is a depth image, from which information such as the actual distances of objects in the scene can be identified.
As an optional implementation manner, the user posture image in the scene image is recognized by a deep visual neural network to obtain the user trunk posture; the user posture image is then filtered out of the scene image to obtain a pure scene image; and the depth information of the pure scene image is identified to obtain the scene coordinate information. Specifically, the deep visual neural network can conveniently recognize, within the scene image, the user posture image whose depth differs from that of the large-space background environment; filtering that image out of the scene image yields a pure scene image free of object interference, and the scene coordinate information of the scene image is then determined from the depth information of the pure scene image. Distinguishing the user posture image within the scene image thus improves image processing efficiency and keeps the recognition of scene coordinate information free of interference.
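A minimal sketch of this preprocessing flow (the segmentation mask, camera intrinsics and array layout are all assumptions; the patent names only a "deep visual neural network" without specifying one):

import numpy as np

def preprocess(depth_image, person_mask, fx, fy, cx, cy):
    """Split a depth frame into user-posture pixels and clean scene coordinates.

    depth_image -- HxW depth map in metres
    person_mask -- HxW boolean mask from a hypothetical person-segmentation
                   network standing in for the patent's deep visual neural network
    fx, fy, cx, cy -- assumed pinhole intrinsics of the depth camera
    """
    # The masked pixels carry the user trunk posture.
    user_pose_pixels = np.where(person_mask, depth_image, 0.0)

    # Filter the user out of the frame to get a "pure scene image".
    pure_scene = np.where(person_mask, 0.0, depth_image)

    # Back-project the remaining depth pixels to 3D scene coordinates.
    v, u = np.nonzero(pure_scene)
    z = pure_scene[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    scene_coords = np.stack([x, y, z], axis=1)  # N x 3 points
    return user_pose_pixels, scene_coords

Separating the two streams up front is what lets the coordinate recognition run on an interference-free depth image, as described above.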
105. And the SLAM module constructs an instant map according to the user posture information and the scene coordinate information.
In the embodiment of the invention, as a user wearing a user terminal moves through the large space, the shooting module gradually captures more and more scene images; the actual environment of the large space and the user posture information are thus in a state of dynamic updating, and SLAM is adopted to construct an instant map of the large space.
As an alternative implementation, a direct dense method is adopted, and the scene coordinate information is obtained through a global minimum spatial norm function:

[the equations appear in the source only as image placeholders and are not reproduced here]

where k denotes the scene image at the current moment and (k-1) denotes the scene image of the previous frame, and one of the omitted symbols denotes the scene coordinate information corresponding to any object in the previous frame of scene image. Global minimum spatial norm processing is applied in real time to the scene coordinate information corresponding to each frame of scene image, and the scene coordinate information corresponding to the previous frame is updated; the updated result is the instant map constructed at the current moment. In addition, the scene coordinate information or the user posture information of any object in the current scene image can be calculated from the scene coordinate information of that object in the previous frame of scene image, using π (the circumference-to-diameter ratio of a circle) and the depth distance between the object and the shooting module.
Specifically, the SLAM module monitors the relative change in depth between any object in the scene image and the user terminal, obtains the scene coordinate information or user posture information of any object relative to the user terminal through global minimum spatial norm processing, selects certain coordinate information as a fixed point, and integrates the scene coordinate information and user posture information collected by the multiple user terminals into the same coordinate system, constructing an instant map that contains the large-space scene coordinate information and the posture information of the users within the large space. Adopting a SLAM scheme to position the large space in real time and build the instant map thus effectively improves positioning efficiency and markedly improves recognition in complex scenes.
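The integration step can be pictured as follows; this is a hedged sketch in which the per-terminal rigid poses and the fixed-point-anchored world frame are assumptions, since the patent states only that one coordinate is chosen as a fixed point and all terminals' data are merged into one coordinate system:

import numpy as np

def merge_into_world(points_per_terminal, poses_per_terminal):
    """Integrate per-terminal observations into one shared coordinate system.

    points_per_terminal -- list of Nx3 arrays, each in its own terminal's frame
    poses_per_terminal  -- list of (R, t) pairs: a 3x3 rotation and a length-3
                           translation mapping that terminal's frame into the
                           world frame anchored at the chosen fixed point
    """
    world_points = []
    for pts, (R, t) in zip(points_per_terminal, poses_per_terminal):
        world_points.append(pts @ R.T + t)  # rigid transform per terminal
    return np.vstack(world_points)  # one combined instant-map point set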
106. And the content generation module generates a user virtual model in the instant map according to the scene coordinate information, the user posture information and the user trunk posture to obtain the virtual reality large-space scene.
In the embodiment of the invention, steps 101-105 construct the instant map and the user's coordinate information within it; the model material to be displayed in the virtual reality large space can then be generated from that coordinate information.
As an optional implementation manner, the user virtual model is generated according to a preset material template and the user trunk posture; the instant map is rendered according to the preset material template; and the instant map and the user virtual model are synthesized to obtain the virtual reality large-space scene. Specifically, assuming the large space hosts a virtual reality shooting game, the preset material templates would include a battlefield material template for the scene image and a soldier material template. A soldier virtual model matching each user's posture information and trunk posture is generated in the virtual reality large space according to the soldier material template and the user trunk posture; the large space is rendered into a battlefield scene according to the battlefield material template and the scene coordinate information; and the scene synthesis submodule synthesizes the battlefield scene and the soldier virtual models to generate the virtual reality large-space scene. Moreover, by swapping in preset material templates with different themes, the application scenario of the large space can be changed flexibly, giving strong applicability.
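A toy sketch of this assembly step (the data model, field names and MaterialTemplate type are hypothetical; the patent prescribes the flow, not a data structure):

from dataclasses import dataclass

@dataclass
class MaterialTemplate:
    theme: str        # e.g. "battlefield"; swapping themes re-skins the space
    avatar_mesh: str  # e.g. "soldier"; the mesh applied to every user avatar

def build_large_space_scene(instant_map, user_states, template):
    """Assemble a description of the virtual reality large-space scene.

    Illustrative only: a real system would hand this structure to a renderer.
    """
    return {
        # Render the instant map with the template's theme.
        "map": {"points": instant_map, "skin": template.theme},
        # One avatar per user, posed from the three information sources.
        "avatars": [
            {
                "mesh": template.avatar_mesh,
                "position": user["coords"],        # scene coordinate information
                "torso_pose": user["torso_pose"],  # from image preprocessing
                "orientation": user["imu_pose"],   # from inertial measurement
            }
            for user in user_states
        ],
    }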
As another optional implementation, the server side is provided with an image processing acceleration module containing an image processing chip dedicated to image processing. For tasks such as real-time processing of scene images, real-time construction of the instant map, and rendering and generation of the virtual reality large-space scene, the processing efficiency of such a chip greatly exceeds that of a conventional general-purpose processor, ensuring that the virtual reality large-space scene stays synchronized with the actual scene.
107. And the transmission module transmits the virtual reality large-space scene to the display module.
In the embodiment of the invention, the transmission module transmits the virtual reality large-space scene generated by the server end to the display module of the user terminal in a wireless transmission mode.
108. The display module displays a virtual reality large space scene.
In the embodiment of the invention, the user views the virtual reality large-space scene, as if present within it, through the display module.
It can be seen that by implementing the large space positioning method described in fig. 1, after the fixedly-erected camera is replaced by the camera installed at the user terminal, the cost of the virtual reality large space scheme is significantly reduced, and by integrating a plurality of shooting visual angles provided by multiple users in different directions in a large space, the action posture of each user can be recognized and captured completely and clearly, and the virtual reality interaction experience of the user is good.
Example two
Referring to fig. 2, fig. 2 is a schematic structural diagram of a SLAM-based computer vision large space positioning system according to an embodiment of the present invention. As shown in fig. 2, the large space positioning system may include:
the shooting module 201 is configured to shoot a scene image within a user visual field range, where the scene image includes a user gesture image;
an inertial measurement module 202 for detecting user attitude information;
a transmission module 203, configured to transmit the scene image and the user posture information from the user terminal to the server;
the shooting module 201, the inertia measurement module 202, the transmission module 203 and the display module 204 form a user terminal, and the number of the user terminals is at least two; the server side comprises the transmission module 203, an image preprocessing module 205, a SLAM module 206, a content generation module 207 and an image processing acceleration module 208;
the image preprocessing module 205 is configured to process a scene image to obtain scene coordinate information and a user trunk posture;
the SLAM module 206 is configured to construct an instant map according to the user posture information and the scene coordinate information;
the content generation module 207 is used for generating a user virtual model in the instant map according to the scene coordinate information, the user posture information and the user trunk posture to obtain a virtual reality large-space scene;
the transmission module 203 is further configured to transmit the virtual reality large space scene to the display module 204;
the display module 204 is used for displaying the virtual reality large space scene;
an image processing acceleration module 208, configured to accelerate the image processing flow of the image preprocessing module 205, the SLAM module 206, and the content generation module 207 when the image preprocessing module 205, the SLAM module 206, and the content generation module 207 run.
The image preprocessing module 205 further includes:
the gesture collection submodule 2051 is configured to identify a user gesture image in a scene image by using a deep visual neural network, so as to obtain a user trunk gesture;
a gesture filtering submodule 2052, configured to filter a user gesture image in the scene image to obtain a pure scene image;
and a coordinate identification submodule 2053, configured to identify depth information of the clean scene image, to obtain scene coordinate information.
The SLAM module 206 is specifically configured to adopt a direct dense method and obtain the scene coordinate information through a global minimum spatial norm function:

[the equations appear in the source only as image placeholders and are not reproduced here]

where k denotes the scene image at the current moment and (k-1) denotes the scene image of the previous frame, and one of the omitted symbols denotes the scene coordinate information corresponding to any object in the previous frame of scene image. Global minimum spatial norm processing is applied in real time to the scene coordinate information corresponding to each frame of scene image, and the scene coordinate information corresponding to the previous frame is updated; the updated result is the instant map constructed at the current moment. In addition, the scene coordinate information or the user posture information of any object in the current scene image can be calculated from the scene coordinate information of that object in the previous frame of scene image, using π (the circumference-to-diameter ratio of a circle) and the depth distance between the object and the shooting module.
In addition, the content generation module 207 further includes:
the model generation submodule 2071 is used for generating the user virtual model according to a preset material template and the posture of the trunk of the user;
a map rendering submodule 2072, configured to render an instant map according to a preset material template;
and a scene synthesis submodule 2073, configured to synthesize the instant map and the user virtual model to obtain a virtual reality large-space scene.
As an alternative embodiment, the user terminal takes the form of glasses, and its shooting module adopts two depth cameras installed on either side of the user terminal's display module 204 to shoot a scene image matching the user's actual field of view; the scene image contains the posture images of other users. During virtual reality interaction, a user in the large space naturally faces other users when acting, which points the shooting module at them and avoids the view-occlusion problem that arises when shooting with a camera erected at a fixed position. By synthesizing the scene images shot by the user terminals worn by multiple users, the posture images of the users in the large space and the scene images of the large space around them can be obtained accurately and completely, and the spatial position of each user terminal in the large space can further be obtained.
As an optional implementation manner, the user terminal does not perform data processing tasks such as image processing; instead, the transmission module 203 transmits the scene image to the server side over a low-latency wireless link, and the server side performs centralized computation. The user terminal therefore needs no dedicated processor, which saves power, reduces the terminal's weight and makes it comfortable to wear.
As an optional implementation manner, the gesture collection submodule 2051 identifies the user posture image in the scene image using a deep visual neural network to obtain the user trunk posture; the gesture filtering submodule 2052 filters the user posture image out of the scene image to obtain a pure scene image; and the coordinate identification submodule 2053 identifies the depth information of the pure scene image to obtain the scene coordinate information. Specifically, the gesture collection submodule 2051 can conveniently recognize, using the deep visual neural network, the user posture image whose depth differs from that of the large-space background environment; the gesture filtering submodule 2052 filters that image out of the scene image to obtain a pure scene image free of object interference; and the coordinate identification submodule 2053 then determines the scene coordinate information of the scene image from the depth information of the pure scene image. Distinguishing the user posture image within the scene image thus improves image processing efficiency and keeps the recognition of scene coordinate information free of interference.
As an optional implementation manner, the SLAM module 206 monitors the relative change in depth between any object in the scene image and the user terminal, obtains the scene coordinate information or user posture information of any object relative to the user terminal through global minimum spatial norm processing, selects certain coordinate information as a fixed point, and integrates the scene coordinate information and user posture information collected by the multiple user terminals into the same coordinate system, constructing an instant map that contains the large-space scene coordinate information and the posture information of the users within the large space. Adopting a SLAM scheme to position the large space in real time and build the instant map thus effectively improves positioning efficiency and markedly improves recognition in complex scenes.
As an optional implementation manner, the model generation submodule 2071 generates the user virtual model according to the preset material template and the user trunk posture; the map rendering submodule 2072 renders the instant map according to the preset material template; and the scene synthesis submodule 2073 synthesizes the instant map and the user virtual model to obtain the virtual reality large-space scene. Specifically, assuming the large space hosts a virtual reality shooting game, the preset material templates would include a battlefield material template for the scene image and a soldier material template; the model generation submodule 2071 generates a soldier virtual model for each user according to the soldier material template and that user's trunk posture, the map rendering submodule 2072 renders the large space into a battlefield scene according to the battlefield material template and the scene coordinate information, and the scene synthesis submodule 2073 synthesizes the battlefield scene and the soldier virtual models to generate the virtual reality large-space scene. Moreover, by swapping in preset material templates with different themes, the application scenario of the large space can be changed flexibly, giving strong applicability.
As another optional implementation, the server side is provided with an image processing acceleration module containing an image processing chip dedicated to image processing. For tasks such as real-time processing of scene images, real-time construction of the instant map, and rendering and generation of the virtual reality large-space scene, the processing efficiency of such a chip greatly exceeds that of a conventional general-purpose processor, ensuring that the virtual reality large-space scene stays synchronized with the actual scene.
It can be seen that by implementing the large space positioning system described in fig. 2, after the camera installed at the user terminal is adopted to replace the fixedly erected camera, the cost of the virtual reality large space scheme is significantly reduced, and by integrating a plurality of shooting visual angles provided by multiple users in different directions in a large space, the action posture of each user can be recognized and captured completely and clearly, and the virtual reality interaction experience of the user is good.
EXAMPLE III
Referring to fig. 3, fig. 3 is a schematic structural diagram of another SLAM-based computer vision large space positioning system according to an embodiment of the present invention. As shown in fig. 3, the large space positioning system may include:
a memory 301 storing executable program code;
a processor 302 coupled to the memory 301;
the processor 302 calls the executable program code stored in the memory 301 to execute a SLAM-based computer vision large space positioning method described in fig. 1.
The embodiment of the invention discloses a computer readable storage medium which stores a computer program, wherein the computer program enables a computer to execute a SLAM-based computer vision large space positioning method described in figure 1.
Embodiments of the present invention also disclose a computer program product, wherein, when the computer program product is run on a computer, the computer is caused to execute part or all of the steps of the method as in the above method embodiments.
The embodiment of the present invention also discloses an application publishing platform, wherein the application publishing platform is used for publishing a computer program product, and when the computer program product runs on a computer, the computer is caused to execute part or all of the steps of the method in the above method embodiments.
In various embodiments of the present invention, it should be understood that the sequence numbers of the above-mentioned processes do not imply an inevitable order of execution, and the execution order of the processes should be determined by their functions and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
In the embodiments provided herein, it should be understood that "B corresponding to A" means that B is associated with A from which B can be determined. It should also be understood, however, that determining B from a does not mean determining B from a alone, but may also be determined from a and/or other information.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
The integrated modules, if implemented as software functional modules and sold or used as a stand-alone product, may be stored in a computer accessible memory. Based on such understanding, the technical solution of the present invention, which is a part of or contributes to the prior art in essence, or all or part of the technical solution, can be embodied in the form of a software product, which is stored in a memory and includes several requests for causing a computer device (which may be a personal computer, a server, a network device, or the like, and may specifically be a processor in the computer device) to execute part or all of the steps of the above-described method of each embodiment of the present invention.
It will be understood by those skilled in the art that all or part of the steps in the methods of the embodiments described above may be implemented by program instructions, and the program may be stored in a computer-readable storage medium, where the storage medium includes Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape, or any other medium that can be used to carry or store data and that can be read by a computer.
The method and the system for computer vision large space positioning based on SLAM disclosed by the embodiment of the invention are described in detail, a specific example is applied in the text to explain the principle and the implementation mode of the invention, and the description of the embodiment is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A computer vision large space positioning method based on SLAM is characterized by comprising the following steps:
the method comprises the steps that a shooting module shoots a scene image in a user visual field range, wherein the scene image comprises a user posture image;
the inertial measurement module detects user attitude information;
the transmission module transmits the scene image and the user posture information from a user terminal to a server terminal;
the shooting module, the inertia measurement module, the transmission module and the display module form the user terminal, and the number of the user terminals is at least two; the server side comprises the transmission module, an image preprocessing module, an SLAM module, a content generation module and an image processing acceleration module;
the image preprocessing module processes the scene image to obtain scene coordinate information and a user trunk posture;
the SLAM module constructs an instant map according to the user posture information and the scene coordinate information;
the content generation module generates a user virtual model in the instant map according to the scene coordinate information, the user posture information and the user trunk posture to obtain a virtual reality large-space scene;
the transmission module transmits the virtual reality large-space scene to the display module;
and the display module displays the virtual reality large space scene.
2. The method of claim 1, wherein the image preprocessing module processes the scene imagery to obtain scene coordinate information and a user torso pose, comprising:
recognizing the user posture image in the scene image by adopting a deep visual neural network to obtain the trunk posture of the user;
filtering the user gesture image in the scene image to obtain a pure scene image;
and identifying the depth information of the pure scene image to obtain the scene coordinate information.
3. The method of claim 1, wherein the SLAM module constructs an instant map from the user pose information and the scene coordinate information, comprising:
the scene coordinate information is obtained by adopting a direct dense method through a global minimum spatial norm function:

[the equations appear in the source only as image placeholders and are not reproduced here]

where k denotes the scene image at the current moment and (k-1) denotes the scene image of the previous frame, and one of the omitted symbols denotes the scene coordinate information corresponding to any object in the previous frame of scene image; global minimum spatial norm processing is applied in real time to the scene coordinate information corresponding to each frame of scene image, and the scene coordinate information corresponding to the previous frame is updated, the updated result being the instant map constructed at the current moment; and the scene coordinate information or the user posture information of any object in the current scene image is calculated from the scene coordinate information of that object in the previous frame of scene image, using π (the circumference-to-diameter ratio of a circle) and the depth distance between the object and the shooting module.
4. The method of claim 3, wherein the content generation module generates a user virtual model in the instant map according to the scene coordinate information, the user posture information and the user trunk posture, and obtains a virtual reality large space scene, including:
generating the user virtual model according to a preset material template and the trunk posture of the user;
rendering the instant map according to the preset material template;
and synthesizing the instant map and the user virtual model to obtain the virtual reality large-space scene.
5. The method according to any one of claims 1 to 4, further comprising:
the image processing acceleration module is used for accelerating the image processing flow of the image preprocessing module, the SLAM module and the content generation module when the image preprocessing module, the SLAM module and the content generation module run.
6. A SLAM-based computer vision large space positioning system, comprising:
the shooting module is used for shooting a scene image in a user visual field range, wherein the scene image comprises a user posture image;
the inertial measurement module is used for detecting user posture information;
the transmission module is used for transmitting the scene image and the user posture information from a user terminal to a server terminal;
the shooting module, the inertia measurement module, the transmission module and the display module form the user terminal, and the number of the user terminals is at least two; the server side comprises the transmission module, an image preprocessing module, an SLAM module, a content generation module and an image processing acceleration module;
the image preprocessing module is used for processing the scene image to obtain scene coordinate information and a trunk posture of a user;
the SLAM module is used for constructing an instant map according to the user posture information and the scene coordinate information;
the content generation module is used for generating a user virtual model in the instant map according to the scene coordinate information, the user posture information and the user trunk posture to obtain a virtual reality large-space scene;
the transmission module is further used for transmitting the virtual reality large space scene to the display module;
and the display module is used for displaying the virtual reality large space scene.
7. The system of claim 6, wherein the image pre-processing module comprises:
the gesture collection submodule is used for identifying the user gesture image in the scene image by adopting a deep visual neural network to obtain the trunk gesture of the user;
the gesture filtering submodule is used for filtering the user gesture image in the scene image to obtain a pure scene image;
and the coordinate identification submodule is used for identifying the depth information of the pure scene image to obtain the scene coordinate information.
8. The system of claim 6, wherein the SLAM module adopts a direct dense method and obtains the scene coordinate information through a global minimum spatial norm function:

[the equations appear in the source only as image placeholders and are not reproduced here]

where k denotes the scene image at the current moment and (k-1) denotes the scene image of the previous frame, and one of the omitted symbols denotes the scene coordinate information corresponding to any object in the previous frame of scene image; global minimum spatial norm processing is applied in real time to the scene coordinate information corresponding to each frame of scene image, and the scene coordinate information corresponding to the previous frame is updated, the updated result being the instant map constructed at the current moment; and the scene coordinate information or the user posture information of any object in the current scene image is calculated from the scene coordinate information of that object in the previous frame of scene image, using π (the circumference-to-diameter ratio of a circle) and the depth distance between the object and the shooting module.
9. The system of claim 8, wherein the content generation module comprises:
the model generation submodule is used for generating the user virtual model according to a preset material template and the user trunk posture;
the map rendering submodule is used for rendering the instant map according to the preset material template;
and the scene synthesis submodule is used for synthesizing the instant map and the user virtual model to obtain the virtual reality large-space scene.
10. The system according to any one of claims 6 to 9, further comprising:
the image processing acceleration module is used for accelerating the image processing flow of the image preprocessing module, the SLAM module and the content generation module when the image preprocessing module, the SLAM module and the content generation module run.
CN201911206522.6A 2019-11-29 2019-11-29 SLAM-based computer vision large space positioning method and system Withdrawn CN111158463A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911206522.6A CN111158463A (en) 2019-11-29 2019-11-29 SLAM-based computer vision large space positioning method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911206522.6A CN111158463A (en) 2019-11-29 2019-11-29 SLAM-based computer vision large space positioning method and system

Publications (1)

Publication Number Publication Date
CN111158463A true CN111158463A (en) 2020-05-15

Family

ID=70556320

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911206522.6A Withdrawn CN111158463A (en) 2019-11-29 2019-11-29 SLAM-based computer vision large space positioning method and system

Country Status (1)

Country Link
CN (1) CN111158463A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113051424A (en) * 2021-03-26 2021-06-29 联想(北京)有限公司 Positioning method and device based on SLAM map

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106896925A (en) * 2017-04-14 2017-06-27 陈柳华 The device that a kind of virtual reality is merged with real scene
CN107168532A (en) * 2017-05-05 2017-09-15 武汉秀宝软件有限公司 A kind of virtual synchronous display methods and system based on augmented reality
CN107820593A (en) * 2017-07-28 2018-03-20 深圳市瑞立视多媒体科技有限公司 A kind of virtual reality exchange method, apparatus and system
CN108022302A (en) * 2017-12-01 2018-05-11 深圳市天界幻境科技有限公司 A kind of sterically defined AR 3 d display devices of Inside-Out
CN109358754A (en) * 2018-11-02 2019-02-19 北京盈迪曼德科技有限公司 A kind of mixed reality wears display system
CN109387204A (en) * 2018-09-26 2019-02-26 东北大学 The synchronous positioning of the mobile robot of dynamic environment and patterning process in faced chamber
CN109584295A (en) * 2017-09-29 2019-04-05 阿里巴巴集团控股有限公司 The method, apparatus and system of automatic marking are carried out to target object in image
CN109671118A (en) * 2018-11-02 2019-04-23 北京盈迪曼德科技有限公司 A kind of more people's exchange methods of virtual reality, apparatus and system

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106896925A (en) * 2017-04-14 2017-06-27 陈柳华 The device that a kind of virtual reality is merged with real scene
CN107168532A (en) * 2017-05-05 2017-09-15 武汉秀宝软件有限公司 A kind of virtual synchronous display methods and system based on augmented reality
CN107820593A (en) * 2017-07-28 2018-03-20 深圳市瑞立视多媒体科技有限公司 A kind of virtual reality exchange method, apparatus and system
CN109584295A (en) * 2017-09-29 2019-04-05 阿里巴巴集团控股有限公司 The method, apparatus and system of automatic marking are carried out to target object in image
CN108022302A (en) * 2017-12-01 2018-05-11 深圳市天界幻境科技有限公司 A kind of sterically defined AR 3 d display devices of Inside-Out
CN109387204A (en) * 2018-09-26 2019-02-26 东北大学 The synchronous positioning of the mobile robot of dynamic environment and patterning process in faced chamber
CN109358754A (en) * 2018-11-02 2019-02-19 北京盈迪曼德科技有限公司 A kind of mixed reality wears display system
CN109671118A (en) * 2018-11-02 2019-04-23 北京盈迪曼德科技有限公司 A kind of more people's exchange methods of virtual reality, apparatus and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
万琴等 (Wan Qin et al.): "基于三维视觉系统的多运动目标跟踪方法综述" [Survey of multi-moving-target tracking methods based on 3D vision systems], 《计算机工程与应用》 [Computer Engineering and Applications] *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113051424A (en) * 2021-03-26 2021-06-29 联想(北京)有限公司 Positioning method and device based on SLAM map

Similar Documents

Publication Publication Date Title
US10460512B2 (en) 3D skeletonization using truncated epipolar lines
US10674142B2 (en) Optimized object scanning using sensor fusion
US9855496B2 (en) Stereo video for gaming
CN107341827B (en) Video processing method, device and storage medium
CN111710036B (en) Method, device, equipment and storage medium for constructing three-dimensional face model
CN110363133B (en) Method, device, equipment and storage medium for sight line detection and video processing
US20130127827A1 (en) Multiview Face Content Creation
CN113706699B (en) Data processing method and device, electronic equipment and computer readable storage medium
CN106296598B (en) 3 d pose processing method, system and camera terminal
CN106896925A (en) The device that a kind of virtual reality is merged with real scene
US20160210761A1 (en) 3d reconstruction
CN109255749A (en) From the map structuring optimization in non-autonomous platform of advocating peace
CN112819875B (en) Monocular depth estimation method and device and electronic equipment
CN111833457A (en) Image processing method, apparatus and storage medium
CN111667588A (en) Person image processing method, person image processing device, AR device and storage medium
KR20230078777A (en) 3D reconstruction methods, devices and systems, media and computer equipment
US20210035326A1 (en) Human pose estimation system
CN115482556A (en) Method for key point detection model training and virtual character driving and corresponding device
CN111158463A (en) SLAM-based computer vision large space positioning method and system
CN111383313B (en) Virtual model rendering method, device, equipment and readable storage medium
CN111079535B (en) Human skeleton action recognition method and device and terminal
CN113010009B (en) Object sharing method and device
CN114882106A (en) Pose determination method and device, equipment and medium
CN114299262A (en) Display method and device for augmented reality AR scene
CA3172140A1 (en) Full skeletal 3d pose recovery from monocular camera

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20200515)