WO2022161386A1 - Pose determination method and related device - Google Patents
Pose determination method and related device
- Publication number
- WO2022161386A1 (application PCT/CN2022/073944)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target object
- information
- terminal
- pose
- image
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/40—Analysis of texture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/54—Extraction of image or video features relating to texture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/63—Control of cameras or camera modules by using electronic viewfinders
- H04N23/633—Control of cameras or camera modules by using electronic viewfinders for displaying additional information relating to control or operation of the camera
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/64—Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/66—Remote control of cameras or camera parts, e.g. by remote control devices
- H04N23/661—Transmitting camera control signals through networks, e.g. control via the Internet
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30244—Camera pose
Definitions
- The present application relates to the field of image processing, and in particular to a pose determination method and related device.
- Visual positioning technology addresses how to use images or videos captured by a camera to accurately determine the position and orientation of that camera in the real world.
- Visual localization has been an active and challenging research topic in computer vision in recent years. It is important in many fields, such as augmented reality, interactive virtual reality, robot visual navigation, public scene monitoring, and intelligent transportation.
- Current localization algorithms mainly use visual global features to perform image retrieval and determine candidate frames, perform feature matching based on visual local features, determine the correspondence between 2D image key points and the 3D point cloud, and then accurately estimate the camera pose.
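The retrieval step of the pipeline described above can be illustrated with a minimal sketch. It assumes that each database frame already has a global descriptor stored as a plain list of floats and that cosine similarity is the ranking metric; the function names, the descriptor format, and the metric are illustrative assumptions, not details taken from the application.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_candidate_frames(query_descriptor, database, top_k=2):
    """Return the ids of the top_k database frames most similar to the query.

    database maps a frame id to its global descriptor (hypothetical layout).
    """
    scored = [
        (cosine_similarity(query_descriptor, descriptor), frame_id)
        for frame_id, descriptor in database.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [frame_id for _, frame_id in scored[:top_k]]
```

The candidate frames returned here would then feed the local feature matching and 2D-3D correspondence steps, which this sketch does not cover.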
- the present application provides a method for determining a pose, which uses a target object in a scene to achieve high-precision pose positioning when high-precision pose information cannot be determined.
- the present application provides a method for determining a pose, the method comprising:
- the target object is around the position where the terminal is located, and the target object is not in the first image; the target object is used to obtain second pose information, the second pose information represents the pose of the terminal when it shoots the target object, and the second pose information does not satisfy the abnormal pose condition.
- the method further includes:
- the target image includes the target object
- the target object being located around the position of the terminal includes: the target object and the position of the terminal are within a preset distance range, the target object and the position of the terminal are within the map of the same area, and there are no obstacles between the target object and the position of the terminal.
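The three criteria above (distance range, same area map, no obstacle in between) can be sketched as a simple predicate. The `MapObject` fields, the 2D coordinates, and the precomputed `occluded` flag are hypothetical simplifications introduced for illustration; the application does not specify how these values are represented or obtained.

```python
import math
from dataclasses import dataclass


@dataclass
class MapObject:
    name: str
    x: float          # map coordinates in metres (assumed 2D for simplicity)
    y: float
    area_id: str      # identifier of the area map that contains the object
    occluded: bool    # True if an obstacle lies between object and terminal (assumed precomputed)


def is_around_terminal(obj, terminal_xy, terminal_area_id, max_distance_m):
    """Check the distance, same-area, and no-obstacle criteria together."""
    distance = math.hypot(obj.x - terminal_xy[0], obj.y - terminal_xy[1])
    return (
        distance <= max_distance_m
        and obj.area_id == terminal_area_id
        and not obj.occluded
    )
```

This sketch requires all three criteria at once; a variant that accepts any subset of them would only need the boolean expression changed.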
- before displaying the prompt information that indicates the target object to be photographed, the method further includes:
- the target object that meets the preset condition is determined from a digital map, wherein the digital map includes multiple objects, the multiple objects are objects around the location of the terminal, and the preset conditions include at least one of the following:
- at least one object, among the plurality of objects, that the terminal can reach from its location with a shorter moving distance.
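The shorter-moving-distance condition can be sketched as picking the nearest object from the digital map. Straight-line distance is used here as a stand-in for the moving distance, and the dictionary layout of the map is a hypothetical simplification; a real system might use walking-path length instead.

```python
import math


def pick_target_object(terminal_xy, objects):
    """Return the name of the object closest to the terminal, or None.

    objects maps an object name to its (x, y) map position (assumed layout).
    Straight-line distance approximates the moving distance.
    """
    if not objects:
        return None
    return min(objects, key=lambda name: math.dist(terminal_xy, objects[name]))
```

For example, with a terminal at the origin, an object at (3, 4) is preferred over one at (10, 0) because it is 5 units away rather than 10.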
- the information of the target object includes at least one of the following: the position of the target object, and the image, name, and category of the target object; correspondingly, the prompt information includes at least one of the following: the position of the target object, navigation information from the position of the terminal to the position of the target object, and the image, name, and category of the target object.
- the target object is an iconic object that can be completely imaged under the shooting parameters of the current terminal and has a relatively fixed physical position.
- the first image includes a first object
- the first object is used to determine the first pose information
- the texture features of the target object are more recognizable than the texture features of the first object.
- displaying prompt information for indicating the shooting target object including:
- the abnormal pose conditions include:
- the deviation between the currently determined pose information and the correct pose information is greater than the threshold.
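A deviation check of this kind can be sketched as follows. The sketch assumes a reference pose is available for comparison (for example from another sensor or an earlier reliable estimate), represents a pose as `(x, y, z, yaw_degrees)`, and uses separate position and heading thresholds; all of these choices and the default values are illustrative assumptions, since the text only states that the deviation exceeds a threshold.

```python
import math


def is_abnormal_pose(estimated, reference,
                     max_position_error_m=1.0, max_yaw_error_deg=10.0):
    """Return True if the estimated pose deviates too far from the reference.

    Each pose is (x, y, z, yaw_degrees). Thresholds are hypothetical defaults.
    """
    ex, ey, ez, eyaw = estimated
    rx, ry, rz, ryaw = reference
    position_error = math.dist((ex, ey, ez), (rx, ry, rz))
    # Wrap the heading difference into [-180, 180) before taking its magnitude.
    yaw_error = abs((eyaw - ryaw + 180.0) % 360.0 - 180.0)
    return position_error > max_position_error_m or yaw_error > max_yaw_error_deg
```

The wrap-around step matters for headings near 0/360 degrees: 350 degrees and 5 degrees differ by 15 degrees, not 345.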
- the present application provides a method for determining a pose, the method comprising:
- acquiring first pose information, where the first pose information is determined according to the first image, and the first pose information represents the pose of the terminal when it shoots the first image;
- a target object is determined according to the position of the terminal, wherein the target object is around the position of the terminal, and the target object is not in the first image;
- the target object is used to obtain second pose information
- the second pose information represents the pose corresponding to the terminal when the target object is photographed
- the second pose information does not satisfy the abnormal pose condition.
- the method further includes:
- the information of the target object includes at least one of the following information: a position of the target object, an image, a name, and a category of the target object.
- the target object is an iconic object that can be completely imaged under the shooting parameters of the current terminal and has a relatively fixed physical position.
- the first image includes a first object
- the first object is used to determine the first pose information
- the texture features of the target object are more recognizable than the texture features of the first object.
- the first pose information is determined based on the first 3D point cloud information corresponding to the first object in the digital map; or,
- the second pose information is determined based on the second 3D point cloud information corresponding to the target object in the digital map, and the point cloud density of the second 3D point cloud information is higher than the point cloud density of the first 3D point cloud information.
- the abnormal pose conditions include:
- the deviation between the currently determined pose information and the correct pose information is greater than the threshold.
- the determining the target object according to the location of the terminal includes:
- the target object that meets the preset condition is determined from a digital map, wherein the digital map includes multiple objects, the multiple objects are objects around the location of the terminal, and the preset conditions include at least one of the following:
- at least one object, among the plurality of objects, that the terminal can reach from its location with a shorter moving distance.
- the acquiring the first pixel position of the target object in the target image includes:
- the first pixel position of the target object in the target image sent by the terminal is received.
- the obtaining the first position information corresponding to the target object in the digital map includes:
- the first position information corresponding to the target object is determined in the digital map according to the target image.
- the obtaining the first position information corresponding to the target object in the digital map includes:
- the first position information corresponding to the target object in the digital map sent by the terminal is received.
- the determining the second pose information according to the first pixel position and the first position information includes:
- the second pose information is determined according to the 2D-3D correspondence.
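How a 2D-3D correspondence constrains a pose can be illustrated by scoring candidate poses with their reprojection error. This is a deliberately simplified stand-in, not a pose solver: it assumes a pinhole camera, a pose restricted to a camera centre plus a rotation about the vertical (Y) axis, and a small set of candidate poses to choose from. The function names, the pose parameterisation, and the intrinsics tuple are assumptions made for the sketch; a production system would more likely use a Perspective-n-Point solver.

```python
import math


def project(pose, point, intrinsics):
    """Project a 3D map point into pixel coordinates for a camera pose.

    pose is (cx, cy, cz, yaw): camera centre and rotation about the Y axis
    in radians. intrinsics is (fx, fy, u0, v0).
    Returns None when the point lies behind the camera.
    """
    cx, cy, cz, yaw = pose
    fx, fy, u0, v0 = intrinsics
    dx, dy, dz = point[0] - cx, point[1] - cy, point[2] - cz
    # Rotate the offset into the camera frame (inverse of the yaw rotation).
    xc = math.cos(yaw) * dx - math.sin(yaw) * dz
    yc = dy
    zc = math.sin(yaw) * dx + math.cos(yaw) * dz
    if zc <= 0.0:
        return None
    return (fx * xc / zc + u0, fy * yc / zc + v0)


def reprojection_error(pose, correspondences, intrinsics):
    """Mean pixel distance between observed pixels and projected map points.

    correspondences is a list of (pixel_uv, point_xyz) pairs.
    """
    if not correspondences:
        raise ValueError("at least one 2D-3D correspondence is required")
    total = 0.0
    for pixel, point in correspondences:
        projected = project(pose, point, intrinsics)
        if projected is None:
            return math.inf
        total += math.dist(pixel, projected)
    return total / len(correspondences)


def select_pose(candidates, correspondences, intrinsics):
    """Pick the candidate pose with the smallest mean reprojection error."""
    return min(
        candidates,
        key=lambda pose: reprojection_error(pose, correspondences, intrinsics),
    )
```

A pose that explains the correspondences well projects each 3D map point close to the pixel where it was observed, so its mean error approaches zero.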
- the first position information includes the global pose of the photographing device when the target object was photographed in advance; correspondingly, the second pose information represents the global pose of the terminal when it photographed the target image.
- the present application provides a device for determining a pose, the device comprising:
- an acquisition module for acquiring the first image
- a pose determination module configured to determine first pose information according to the first image, where the first pose information represents the pose corresponding to when the terminal shoots the first image;
- a display module configured to display prompt information that indicates a target object to be photographed when the first pose information satisfies the abnormal pose condition; wherein the target object is around the position where the terminal is located, and the target object is not in the first image; the target object is used to obtain second pose information, the second pose information represents the pose of the terminal when it shoots the target object, and the second pose information does not satisfy the abnormal pose condition.
- the obtaining module is used for:
- the target image includes the target object
- the target object being located around the position of the terminal includes: the target object and the position of the terminal are within a preset distance range, the target object and the position of the terminal are within the map of the same area, and there are no obstacles between the target object and the position of the terminal.
- the obtaining module is used for:
- the device also includes:
- a sending module configured to send the location of the terminal to the server
- a receiving module configured to receive the information of the target object sent by the server, wherein the target object is determined by the server based on the location of the terminal.
- the obtaining module is used for:
- the target object that meets the preset condition is determined from a digital map, wherein the digital map includes multiple objects, the multiple objects are objects around the location of the terminal, and the preset conditions include at least one of the following:
- at least one object, among the plurality of objects, that the terminal can reach from its location with a shorter moving distance.
- the information of the target object includes at least one of the following: the position of the target object, and the image, name, and category of the target object; correspondingly, the prompt information includes at least one of the following: the position of the target object, navigation information from the position of the terminal to the position of the target object, and the image, name, and category of the target object.
- the target object is an iconic object that can be completely imaged under the shooting parameters of the current terminal and has a relatively fixed physical position.
- the first image includes a first object
- the first object is used to determine the first pose information
- the texture features of the target object are more recognizable than the texture features of the first object.
- the sending module is configured to send the first pose information to the server;
- the acquiring module is configured to receive first information sent by the server, the first information indicating that the first pose information satisfies the abnormal pose condition;
- the display module is configured to display, according to the first information, prompt information that indicates the target object to be photographed.
- the abnormal pose conditions include:
- the deviation between the currently determined pose information and the correct pose information is greater than the threshold.
- the present application provides a device for determining a pose, the device comprising:
- an acquiring module configured to acquire first pose information, where the first pose information is determined according to the first image, and the first pose information represents the pose corresponding to when the terminal shoots the first image;
- a target object determination module configured to determine a target object according to the position of the terminal when the first pose information satisfies the abnormal pose condition, wherein the target object is around the position of the terminal, and the target object is not in the first image;
- a sending module configured to send the information of the target object to the terminal, wherein the target object is used to obtain second pose information, the second pose information represents the pose of the terminal when it shoots the target object, and the second pose information does not satisfy the abnormal pose condition.
- the obtaining module is used for:
- the information of the target object includes at least one of the following information: a position of the target object, an image, a name, and a category of the target object.
- the target object is an iconic object that can be completely imaged under the shooting parameters of the current terminal and has a relatively fixed physical position.
- the first image includes a first object
- the first object is used to determine the first pose information
- the texture features of the target object are more recognizable than the texture features of the first object.
- the first pose information is determined based on the first 3D point cloud information corresponding to the first object in the digital map; or,
- the second pose information is determined based on the second 3D point cloud information corresponding to the target object in the digital map, and the point cloud density of the second 3D point cloud information is higher than the point cloud density of the first 3D point cloud information.
- the abnormal pose conditions include:
- the deviation between the currently determined pose information and the correct pose information is greater than the threshold.
- the target object determination module is configured to determine the target object that meets a preset condition from a digital map according to the location of the terminal, wherein the digital map includes a plurality of objects, the multiple objects are objects around the location where the terminal is located, and the preset condition includes at least one of the following:
- at least one object, among the plurality of objects, that the terminal can reach from its location with a shorter moving distance.
- the obtaining module is used for:
- first position information corresponding to the target object in the digital map, wherein the first position information represents the position of the target object in the digital map
- the second pose information is determined according to the first pixel position and the first position information.
- the obtaining module is specifically used for:
- the first pixel position of the target object in the target image sent by the terminal is received.
- the obtaining module is specifically used for:
- the first position information corresponding to the target object is determined in the digital map according to the target image.
- the obtaining module is specifically used for:
- the first position information corresponding to the target object in the digital map sent by the terminal is received.
- the obtaining module is specifically used for:
- the second pose information is determined according to the 2D-3D correspondence.
- the first position information includes the global pose corresponding to the first image obtained by the photographing device photographing the target object; correspondingly, the second pose information represents the global pose of the terminal when it shoots the target image.
- the present application provides a pose determination apparatus, including: a display screen; a camera; one or more processors; a memory; a plurality of application programs; and one or more computer programs. Wherein, one or more computer programs are stored in the memory, the one or more computer programs comprising instructions. When the instruction is executed by the pose determining device, the pose determining device is caused to perform the steps described in the first aspect and any possible implementation manner of the first aspect.
- the present application provides a server, comprising: one or more processors; a memory; a plurality of application programs; and one or more computer programs. Wherein, one or more computer programs are stored in the memory, the one or more computer programs comprising instructions. The instructions, when executed by one or more processors, cause the one or more processors to perform the steps described in the second aspect and any possible implementations of the second aspect above.
- the present application provides a computer storage medium including computer instructions; when the computer instructions are run on an electronic device or a server, the steps described in the first aspect and any possible implementation of the first aspect, and in the second aspect and any possible implementation of the second aspect, are executed.
- the present application provides a computer program product; when the computer program product is run on an electronic device or a server, the steps described in the first aspect and any possible implementation of the first aspect, and in the second aspect and any possible implementation of the second aspect, are executed.
- An embodiment of the present application provides a method for determining a pose, the method including: acquiring a first image; determining first pose information according to the first image, where the first pose information represents the pose of the terminal when it shoots the first image; when the first pose information satisfies the abnormal pose condition, displaying prompt information that indicates a target object to be photographed; wherein the target object is around the position where the terminal is located, and the target object is not in the first image; the target object is used to obtain second pose information, the second pose information represents the pose of the terminal when it shoots the target object, and the second pose information does not satisfy the abnormal pose condition.
- the target object in the scene is used for pose positioning, so that valid information in the scene yields pose information of higher precision; and
- prompt information is displayed to guide the user to shoot the target object, which avoids the situation in which the user does not know how to proceed or scans an invalid target object.
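The overall flow summarised above can be sketched as a small control function. Every step is injected as a callable so that the sketch stays self-contained; the parameter names and the fallback of returning the first pose when no target object is found are assumptions for illustration, not behaviour stated in the application.

```python
def determine_pose(first_image, estimate_pose, is_abnormal,
                   find_target_object, show_prompt, capture_image):
    """Estimate a pose and re-estimate from a prompted target if it is abnormal.

    All arguments after first_image are hypothetical callables standing in
    for the localization, checking, map lookup, UI, and camera steps.
    """
    first_pose = estimate_pose(first_image)
    if not is_abnormal(first_pose):
        return first_pose

    target = find_target_object()
    if target is None:
        # Assumed fallback: no suitable target nearby, keep the first estimate.
        return first_pose

    # Guide the user to photograph the target, then localize from that image.
    show_prompt(target)
    target_image = capture_image()
    return estimate_pose(target_image)
```

Passing the steps in as callables also makes the flow easy to exercise with stubs, for example a dictionary lookup in place of the localization step.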
- FIG. 1 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
- FIG. 2a is a block diagram of a software structure of a terminal device according to an embodiment of the application.
- FIG. 2b is a block diagram of a server structure according to an embodiment of the application.
- FIG. 2c is a structural block diagram of a pose determination system according to an embodiment of the present application.
- FIG. 3 is a schematic diagram of an embodiment of a pose determination method provided by an embodiment of the present application.
- FIG. 4a is a schematic diagram of a terminal interface in an embodiment of the present application.
- FIG. 4b is a schematic diagram of a terminal interface in an embodiment of the present application.
- FIG. 5 is a schematic diagram of a terminal interface in an embodiment of the application.
- FIG. 6 is a schematic diagram of a terminal interface in an embodiment of the present application.
- FIG. 7 is a schematic diagram of a terminal interface in an embodiment of the present application.
- FIG. 8a is a schematic diagram of a terminal interface in an embodiment of the present application.
- FIG. 8b is a schematic diagram of a terminal interface in an embodiment of the present application.
- FIG. 9a is a schematic diagram of a pose determination method provided by an embodiment of the present application.
- FIG. 9b is a schematic diagram of offline data collection provided by an embodiment of the present application.
- FIG. 9c is a schematic diagram of a pose determination method provided by an embodiment of the present application.
- FIG. 10 is a schematic diagram of a terminal interface in an embodiment of the application.
- FIG. 11 is a schematic diagram of a terminal interface in an embodiment of the application.
- FIG. 12 is a schematic diagram of a pose determination method provided by an embodiment of the present application.
- FIG. 13 is a schematic structural diagram of a pose determination device provided by an embodiment of the application.
- FIG. 14 is a schematic structural diagram of a pose determination device provided by an embodiment of the application.
- FIG. 15 is a schematic structural diagram of a terminal device provided by an embodiment of the application.
- FIG. 16 is a schematic structural diagram of a server provided by an embodiment of the present application.
- FIG. 1 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
- the terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone interface 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on.
- the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
- the structures illustrated in the embodiments of the present invention do not constitute a specific limitation on the terminal 100.
- the terminal 100 may include more or fewer components than shown, may combine some components, may split some components, or may arrange the components differently.
- the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
- the processor 110 may include one or more processing units, for example, the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), controller, video codec, digital signal processor (digital signal processor, DSP), baseband processor, and/or neural-network processing unit (neural-network processing unit, NPU), etc. Wherein, different processing units may be independent devices, or may be integrated in one or more processors.
- the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
- a memory may also be provided in the processor 110 for storing instructions and data.
- the memory in the processor 110 is a cache memory. This memory may hold instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs the instruction or data again, it can fetch it directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby increasing the efficiency of the system.
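- the caching behaviour described above can be illustrated with a minimal sketch. Python is used for illustration only; the class name `InstructionCache`, the capacity, and the least-recently-used eviction policy are assumptions and are not specified by the embodiment.

```python
from collections import OrderedDict


class InstructionCache:
    """Toy cache placed in front of a slower backing store (hypothetical model)."""

    def __init__(self, backing_store, capacity=4):
        self.backing_store = backing_store  # dict: address -> instruction or data
        self.capacity = capacity            # assumed number of cache lines
        self.lines = OrderedDict()          # cached entries, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, address):
        # Recently used entries are served directly from the cache.
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)
            return self.lines[address]
        # Otherwise fall back to the slower backing store and keep a copy.
        self.misses += 1
        value = self.backing_store[address]
        self.lines[address] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used entry
        return value
```

Reading the same address twice in a row results in one miss followed by one hit, which is the repeated access that the cache is meant to absorb.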
- the processor 110 may include one or more interfaces.
- the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
- the I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL).
- the processor 110 may contain multiple sets of I2C buses.
- the processor 110 can be respectively coupled to the touch sensor 180K, the charger, the flash, the camera 193 and the like through different I2C bus interfaces.
- the processor 110 may couple the touch sensor 180K through the I2C interface, so that the processor 110 and the touch sensor 180K communicate with each other through the I2C bus interface, so as to realize the touch function of the terminal 100 .
- the I2S interface can be used for audio communication.
- the processor 110 may contain multiple sets of I2S buses.
- the processor 110 may be coupled with the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170 .
- the audio module 170 can transmit audio signals to the wireless communication module 160 through the I2S interface, so as to realize the function of answering calls through a Bluetooth headset.
- the PCM interface can also be used for audio communication, to sample, quantize, and encode analog signals.
- the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
- the audio module 170 can also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
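- the sampling, quantizing, and encoding steps mentioned for the PCM interface can be sketched as follows. This is an illustrative sketch only; the function name `pcm_encode`, the 8-bit depth, and the uniform quantizer are assumptions and do not describe the actual audio path of the terminal 100.

```python
def pcm_encode(signal, n_samples, sample_rate_hz, bits=8):
    """Sample, quantize, and encode an analog signal given as a function t -> [-1.0, 1.0]."""
    levels = 2 ** bits
    codes = []
    for n in range(n_samples):
        # Sampling: read the analog amplitude at evenly spaced instants.
        amplitude = signal(n / sample_rate_hz)
        amplitude = max(-1.0, min(1.0, amplitude))  # clip to the valid range
        # Quantizing: map the amplitude onto one of `levels` discrete steps.
        level = round((amplitude + 1.0) / 2.0 * (levels - 1))
        # Encoding: express the step index as a fixed-width binary code word.
        codes.append(format(level, f"0{bits}b"))
    return codes
```

With 8 bits, a constant full-scale positive input is encoded as `11111111` and a full-scale negative input as `00000000`.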
- the UART interface is a universal serial data bus used for asynchronous communication.
- the bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication.
- a UART interface is typically used to connect the processor 110 with the wireless communication module 160 .
- the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function.
- the audio module 170 can transmit audio signals to the wireless communication module 160 through the UART interface, so as to realize the function of playing music through the Bluetooth headset.
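- the conversion between parallel and serial data performed by a UART can be sketched as follows. The sketch assumes the common 8N1 frame format (one start bit, eight data bits sent least significant bit first, one stop bit), which the embodiment does not specify; the function names are hypothetical.

```python
def uart_serialize(byte):
    """Convert one parallel byte into a serial 8N1 frame (list of bits)."""
    data_bits = [(byte >> i) & 1 for i in range(8)]  # least significant bit first
    return [0] + data_bits + [1]                     # start bit, data, stop bit


def uart_deserialize(frame):
    """Convert a serial 8N1 frame back into a parallel byte."""
    if len(frame) != 10 or frame[0] != 0 or frame[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(frame[1:9]))
```

Serializing and then deserializing any value from 0 to 255 returns the original byte.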
- the MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193 .
- MIPI interfaces include camera serial interface (CSI), display serial interface (DSI), etc.
- the processor 110 communicates with the camera 193 through the CSI interface, so as to realize the shooting function of the terminal 100 .
- the processor 110 communicates with the display screen 194 through the DSI interface to implement the display function of the terminal 100 .
- the GPIO interface can be configured by software.
- the GPIO interface can be configured as a control signal or as a data signal.
- the GPIO interface may be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like.
- the GPIO interface can also be configured as I2C interface, I2S interface, UART interface, MIPI interface, etc.
- the USB interface 130 is an interface that conforms to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like.
- the USB interface 130 can be used to connect a charger to charge the terminal 100, and can also be used to transmit data between the terminal 100 and peripheral devices. It can also be used to connect headphones to play audio through the headphones.
- the interface can also be used to connect other electronic devices, such as AR devices.
- the interface connection relationship between the modules illustrated in the embodiment of the present invention is only a schematic illustration, and does not constitute a structural limitation of the terminal 100 .
- the terminal 100 may also adopt interface connection manners different from those in the foregoing embodiments, or a combination of multiple interface connection manners.
- the charging management module 140 is used to receive charging input from the charger.
- the charger may be a wireless charger or a wired charger.
- the charging management module 140 may receive charging input from the wired charger through the USB interface 130 .
- the charging management module 140 may receive wireless charging input through the wireless charging coil of the terminal 100 . While the charging management module 140 charges the battery 142 , it can also supply power to the electronic device through the power management module 141 .
- the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
- the power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the display screen 194, the camera 193, and the wireless communication module 160.
- the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle count, and battery health status (leakage, impedance).
- the power management module 141 may also be provided in the processor 110 .
- the power management module 141 and the charging management module 140 may also be provided in the same device.
- the wireless communication function of the terminal 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
- Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
- Each antenna in the terminal 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization.
- for example, the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
- the mobile communication module 150 may provide a wireless communication solution including 2G/3G/4G/5G, etc. applied on the terminal 100.
- the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like.
- the mobile communication module 150 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation.
- the mobile communication module 150 can also amplify the signal modulated by the modem processor, and then convert it into an electromagnetic wave for radiation through the antenna 1 .
- at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110 .
- at least part of the functional modules of the mobile communication module 150 may be provided in the same device as at least part of the modules of the processor 110 .
- the modem processor may include a modulator and a demodulator.
- the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
- the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
- the low frequency baseband signal is processed by the baseband processor and passed to the application processor.
- the application processor outputs sound signals through audio devices (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or videos through the display screen 194 .
- the modem processor may be a stand-alone device.
- the modem processor may be independent of the processor 110, and may be provided in the same device as the mobile communication module 150 or other functional modules.
- the wireless communication module 160 can provide wireless communication solutions applied on the terminal 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, and the like.
- the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
- the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
- the wireless communication module 160 can also receive the signal to be sent from the processor 110 , perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through the antenna 2 .
- the antenna 1 of the terminal 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the terminal 100 can communicate with the network and other devices through wireless communication technology.
- the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
- the GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the Beidou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
- the terminal 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
- the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
- the GPU is used to perform mathematical and geometric calculations for graphics rendering.
- Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
- Display screen 194 is used to display images, videos, and the like.
- Display screen 194 includes a display panel.
- the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a Micro LED, a Micro OLED, quantum dot light-emitting diodes (QLED), and so on.
- the terminal 100 may include one or N display screens 194 , where N is a positive integer greater than one.
- the terminal 100 can realize the shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194 and the application processor.
- the ISP is used to process the data fed back by the camera 193 .
- for example, when a photo is taken, the shutter is opened and light is transmitted through the lens to the photosensitive element of the camera, where the light signal is converted into an electrical signal. The photosensitive element transmits the electrical signal to the ISP, which processes it and converts it into an image visible to the naked eye.
- ISP can also perform algorithm optimization on image noise, brightness, and skin tone.
- ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
- the ISP may be provided in the camera 193 .
- Camera 193 is used to capture still images or video.
- an optical image of the object is generated through the lens and projected onto the photosensitive element.
- the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
- the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
- the ISP outputs the digital image signal to the DSP for processing.
- DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
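- the relationship between the RGB and YUV formats mentioned above can be illustrated with a per-pixel conversion. The sketch assumes the widely used BT.601 luma coefficients and the classic analog U/V scaling factors; the embodiment does not state which variant the DSP uses, and the function name `rgb_to_yuv` is hypothetical.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel (components in [0.0, 1.0]) to YUV using BT.601 weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luma: weighted sum of the colour channels
    u = 0.492 * (b - y)                    # blue-difference chroma
    v = 0.877 * (r - y)                    # red-difference chroma
    return y, u, v
```

For any grey pixel (equal R, G, and B), both chroma components are approximately zero and the luma equals the grey level.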
- the terminal 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
- a digital signal processor is used to process digital signals. In addition to digital image signals, it can also process other digital signals. For example, when the terminal 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the energy of the frequency point, and so on.
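- as an illustration of evaluating the energy at a single frequency point, the following sketch computes one bin of a discrete Fourier transform. The helper names `make_tone` and `bin_energy` are hypothetical, and a direct single-bin DFT is only one possible approach; the embodiment does not describe the algorithm actually used.

```python
import cmath
import math


def make_tone(freq_bin, n):
    """Generate n samples of a cosine that completes `freq_bin` cycles over the window."""
    return [math.cos(2 * math.pi * freq_bin * i / n) for i in range(n)]


def bin_energy(samples, k):
    """Return the squared magnitude of DFT bin k for the given samples."""
    n = len(samples)
    coefficient = sum(
        x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(samples)
    )
    return abs(coefficient) ** 2
```

For a pure tone that falls exactly on bin 3 of a 32-sample window, the energy is concentrated in bin 3 (and its mirror bin), while other bins are close to zero.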
- Video codecs are used to compress or decompress digital video.
- Terminal 100 may support one or more video codecs.
- the terminal 100 can play or record videos in various encoding formats, for example, moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
- the NPU is a neural-network (NN) computing processor.
- Applications such as intelligent cognition of the terminal 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
- the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the terminal 100.
- the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function, for example, to save files such as music and videos in the external memory card.
- Internal memory 121 may be used to store computer executable program code, which includes instructions.
- the internal memory 121 may include a storage program area and a storage data area.
- the storage program area can store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like.
- the storage data area may store data (such as audio data, phone book, etc.) created during the use of the terminal 100 and the like.
- the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
- the processor 110 executes various functional applications and data processing of the terminal 100 by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
- the terminal 100 may implement audio functions, such as music playback and recording, through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like.
- the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
- the speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals.
- the user can listen to music or a hands-free call through the speaker 170A of the terminal 100.
- the receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals.
- the voice can be heard by placing the receiver 170B close to the human ear.
- the microphone 170C, also called a "mic", is used to convert sound signals into electrical signals.
- the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C.
- the terminal 100 may be provided with at least one microphone 170C.
- the terminal 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals.
- the terminal 100 may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
- the earphone interface 170D is used to connect wired earphones.
- the earphone interface 170D can be the USB interface 130, or can be a 3.5mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.
- the pressure sensor 180A is used to sense pressure signals, and can convert the pressure signals into electrical signals.
- the pressure sensor 180A may be provided on the display screen 194 .
- the capacitive pressure sensor may be comprised of at least two parallel plates of conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes.
- the terminal 100 determines the intensity of the pressure according to the change in capacitance. When a touch operation acts on the display screen 194, the terminal 100 detects the intensity of the touch operation according to the pressure sensor 180A.
- the terminal 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A.
- touch operations acting on the same touch position but with different touch operation intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than the first pressure threshold acts on the short message application icon, the instruction for viewing the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, the instruction to create a new short message is executed.
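- the threshold rule in the short message example above can be sketched as follows. The numeric value of the first pressure threshold and the returned instruction names are hypothetical placeholders; the embodiment does not give concrete values.

```python
FIRST_PRESSURE_THRESHOLD = 0.5  # hypothetical normalized pressure value


def sms_icon_instruction(intensity, threshold=FIRST_PRESSURE_THRESHOLD):
    """Map the intensity of a touch on the short message icon to an instruction."""
    if intensity < threshold:
        return "view_short_message"    # light press: view the short message
    return "create_short_message"      # press at or above the threshold: create a new one
```

A press exactly at the threshold falls into the second branch, matching the "greater than or equal to" wording of the example.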
- the gyro sensor 180B may be used to determine the motion attitude of the terminal 100 .
- the angular velocity of the terminal 100 about three axes (i.e., the x, y, and z axes) can be determined by the gyro sensor 180B.
- the gyro sensor 180B can be used for image stabilization.
- the gyroscope sensor 180B detects the angle at which the terminal 100 shakes, calculates the distance to be compensated by the lens module according to the angle, and allows the lens to counteract the shake of the terminal 100 through reverse motion to achieve anti-shake.
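- one simple way to relate the shake angle to a lens compensation distance is sketched below. The sketch assumes a thin-lens model in which a tilt by an angle shifts the image by roughly the focal length times the tangent of that angle; this model, the function name `lens_shift_mm`, and the example focal length are assumptions, since the embodiment does not state how the distance is calculated.

```python
import math


def lens_shift_mm(focal_length_mm, shake_angle_deg):
    """Return the lens displacement that counteracts a shake of the given angle.

    The sign is negative because the lens moves opposite to the shake.
    """
    image_shift = focal_length_mm * math.tan(math.radians(shake_angle_deg))
    return -image_shift
```

Under this assumed model, a 1 degree shake with a 4 mm focal length calls for a compensation of about 0.07 mm in the opposite direction.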
- the gyro sensor 180B can also be used for navigation and somatosensory game scenarios.
- the air pressure sensor 180C is used to measure air pressure.
- the terminal 100 calculates the altitude through the air pressure value measured by the air pressure sensor 180C to assist in positioning and navigation.
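- a common way to estimate altitude from air pressure is the international barometric formula, sketched below. The embodiment does not say which formula the terminal 100 uses; the standard sea-level pressure of 1013.25 hPa and the function name `altitude_m` are assumptions made for this illustration.

```python
def altitude_m(pressure_hpa, sea_level_hpa=1013.25):
    """Estimate altitude in metres from air pressure using the barometric formula."""
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))
```

The estimate is zero at the reference pressure and increases as the measured pressure drops.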
- the magnetic sensor 180D includes a Hall sensor.
- the terminal 100 can detect the opening and closing of a flip leather case using the magnetic sensor 180D.
- the terminal 100 can detect the opening and closing of a flip cover according to the magnetic sensor 180D. Further, features such as automatic unlocking upon opening the flip cover are set according to the detected opening and closing state of the leather case or of the flip cover.
- the acceleration sensor 180E can detect the magnitude of the acceleration of the terminal 100 in various directions (generally three axes). When the terminal 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of electronic devices, and can be used in applications such as horizontal and vertical screen switching, pedometers, etc.
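- horizontal and vertical screen switching based on the detected gravity direction can be sketched as follows. The axis convention (x along the short edge of the screen, y along the long edge, z out of the screen), the tilt threshold, and the function name `screen_orientation` are assumptions for illustration.

```python
def screen_orientation(ax, ay, az, min_tilt=3.0):
    """Classify the device posture from accelerometer readings in m/s^2."""
    # When gravity lies almost entirely along z, the device is lying flat.
    if max(abs(ax), abs(ay)) < min_tilt:
        return "flat"
    # Otherwise the axis carrying most of gravity indicates the orientation.
    return "landscape" if abs(ax) > abs(ay) else "portrait"
```

A stationary device held upright reports gravity mainly on the y axis and is classified as portrait under these assumptions.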
- the terminal 100 can measure the distance through infrared or laser. In some embodiments, when shooting a scene, the terminal 100 can use the distance sensor 180F to measure the distance to achieve fast focusing.
- Proximity light sensor 180G may include, for example, light emitting diodes (LEDs) and light detectors, such as photodiodes.
- the light emitting diodes may be infrared light emitting diodes.
- the terminal 100 emits infrared light to the outside through light emitting diodes.
- the terminal 100 detects infrared reflected light from nearby objects using a photodiode. When sufficient reflected light is detected, it can be determined that there is an object near the terminal 100 . When insufficient reflected light is detected, the terminal 100 may determine that there is no object near the terminal 100 .
- the terminal 100 can use the proximity light sensor 180G to detect that the user holds the terminal 100 close to the ear to talk, so as to automatically turn off the screen to save power.
- the proximity light sensor 180G can also be used in leather case mode and pocket mode to automatically unlock and lock the screen.
- the ambient light sensor 180L is used to sense ambient light brightness.
- the terminal 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness.
- the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
- the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the terminal 100 is in a pocket, so as to prevent accidental touch.
- the fingerprint sensor 180H is used to collect fingerprints.
- the terminal 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, application lock access, fingerprint photographing, fingerprint call answering, and the like.
- the temperature sensor 180J is used to detect the temperature.
- the terminal 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold value, the terminal 100 reduces the performance of the processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection.
- when the temperature is lower than another threshold, the terminal 100 heats the battery 142 to avoid abnormal shutdown of the terminal 100 due to low temperature.
- the terminal 100 may also boost the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
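- the temperature processing strategy described above can be sketched as a simple threshold policy. All threshold values and action names below are hypothetical; the embodiment only states that thresholds exist, not what they are.

```python
def thermal_actions(temp_c, high_c=45.0, low_c=0.0, very_low_c=-10.0):
    """Return the protective actions for a reported temperature (thresholds are assumed)."""
    actions = []
    if temp_c > high_c:
        # Too hot: throttle the processor near the sensor to cut power consumption.
        actions.append("reduce_processor_performance")
    if temp_c < low_c:
        # Cold: warm the battery to avoid an abnormal shutdown.
        actions.append("heat_battery")
    if temp_c < very_low_c:
        # Very cold: additionally boost the battery output voltage.
        actions.append("boost_battery_output_voltage")
    return actions
```

At a normal temperature the list is empty, meaning no protective action is needed.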
- the touch sensor 180K is also called a "touch device".
- the touch sensor 180K may be disposed on the display screen 194 , and the touch sensor 180K and the display screen 194 together form a touchscreen, also called a "touch-control screen".
- the touch sensor 180K is used to detect a touch operation on or near it.
- the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
- Visual output related to touch operations may be provided through display screen 194 .
- the touch sensor 180K may also be disposed on the surface of the terminal 100 , which is different from the position where the display screen 194 is located.
- the bone conduction sensor 180M can acquire vibration signals.
- the bone conduction sensor 180M can acquire the vibration signal of the bone vibrated by the human vocal part.
- the bone conduction sensor 180M can also be in contact with the pulse of the human body and receive the blood pressure pulse signal.
- the bone conduction sensor 180M can also be disposed in an earphone to form a bone conduction earphone.
- the audio module 170 can parse out a voice signal based on the vibration signal of the bone vibrated by the vocal part, as obtained by the bone conduction sensor 180M, to implement a voice function.
- the application processor can parse heart rate information based on the blood pressure pulse signal obtained by the bone conduction sensor 180M, to implement heart rate detection.
- the keys 190 include a power key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys.
- the terminal 100 may receive key input and generate key signal input related to user settings and function control of the terminal 100 .
- the motor 191 can generate vibration prompts.
- the motor 191 can be used for vibrating alerts for incoming calls, and can also be used for touch vibration feedback.
- touch operations acting on different applications can correspond to different vibration feedback effects.
- touch operations acting on different areas of the display screen 194 can also cause the motor 191 to produce different vibration feedback effects.
- different application scenarios (for example, time reminders, receiving information, alarm clocks, and games) can also correspond to different vibration feedback effects.
- the touch vibration feedback effect can also support customization.
- the indicator 192 can be an indicator light, which can be used to indicate the charging state, the change of the power, and can also be used to indicate a message, a missed call, a notification, and the like.
- the SIM card interface 195 is used to connect a SIM card.
- the SIM card can be brought into contact with or separated from the terminal 100 by being inserted into or pulled out of the SIM card interface 195 .
- the terminal 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
- the SIM card interface 195 can support Nano SIM card, Micro SIM card, SIM card and so on. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the plurality of cards may be the same or different.
- the SIM card interface 195 can also be compatible with different types of SIM cards.
- the SIM card interface 195 is also compatible with external memory cards.
- the terminal 100 interacts with the network through the SIM card to realize functions such as calls and data communication.
- the terminal 100 employs an eSIM, i.e., an embedded SIM card.
- the eSIM card can be embedded in the terminal 100 and cannot be separated from the terminal 100 .
- the software system of the terminal 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
- the embodiments of the present invention take an Android system with a layered architecture as an example to illustrate the software structure of the terminal 100.
- FIG. 2a is a software structural block diagram of the terminal 100 according to an embodiment of the present disclosure.
- the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces.
- the Android system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, the Android runtime and system libraries, and a kernel layer.
- the application layer can include a series of application packages.
- the application package can include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, etc.
- the application framework layer provides an application programming interface (API) and a programming framework for applications in the application layer.
- the application framework layer includes some predefined functions.
- the application framework layer may include a window manager, content provider, view system, telephony manager, resource manager, notification manager, etc.
- a window manager is used to manage window programs.
- the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, etc.
- Content providers are used to store and retrieve data and make these data accessible to applications.
- the data may include video, images, audio, calls made and received, browsing history and bookmarks, phone book, etc.
- the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. View systems can be used to build applications.
- a display interface can consist of one or more views.
- the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
- the telephony manager is used to provide the communication function of the terminal 100 .
- for example, the management of call status (including connecting, hanging up, etc.).
- the resource manager provides various resources for the application, such as localization strings, icons, pictures, layout files, video files and so on.
- the notification manager enables applications to display notification information in the status bar, which can be used to convey notification-type messages, and can disappear automatically after a brief pause without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc.
- the notification manager can also display notifications in the status bar at the top of the system in the form of graphs or scroll bar text, such as notifications of applications running in the background, and notifications on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a prompt sound is issued, the electronic device vibrates, and the indicator light flashes.
- Android Runtime includes core libraries and a virtual machine. Android Runtime is responsible for scheduling and management of the Android system.
- The core library consists of two parts: the functions that the Java language needs to call, and the core library of Android.
- The application layer and the application framework layer run in the virtual machine.
- The virtual machine executes the Java files of the application layer and the application framework layer as binary files.
- The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
- A system library can include multiple functional modules, for example: a surface manager, media libraries (Media Libraries), a 3D graphics processing library (e.g., OpenGL ES), and a 2D graphics engine (e.g., SGL).
- the Surface Manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
- the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
- the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
- the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.
- 2D graphics engine is a drawing engine for 2D drawing.
- the kernel layer is the layer between hardware and software.
- the kernel layer contains at least display drivers, camera drivers, audio drivers, and sensor drivers.
- the workflow of the software and hardware of the terminal 100 is exemplarily described below with reference to the photographing scene.
- When a touch operation is received, a corresponding hardware interrupt is sent to the kernel layer.
- the kernel layer processes touch operations into raw input events (including touch coordinates, timestamps of touch operations, etc.). Raw input events are stored at the kernel layer.
- The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Taking as an example a touch operation that is a tap whose corresponding control is the camera application icon: the camera application calls the interface of the application framework layer to start the camera application, and then starts the camera driver by calling the kernel layer.
- the camera 193 captures still images or video.
- This embodiment of the present application further provides a server 1300 .
- the server 1300 may include a processor 1310 and a transceiver 1320, and the transceiver 1320 may be connected with the processor 1310, as shown in FIG. 2b.
- the transceiver 1320 may include a receiver and a transmitter, and may be used to receive or transmit messages or data, and the transceiver 1320 may be a network card.
- the server 1300 may also include an acceleration component (which may be referred to as an accelerator), and when the acceleration component is a network acceleration component, the acceleration component may be a network card.
- the processor 1310 may be the control center of the server 1300, and uses various interfaces and lines to connect various parts of the entire server 1300, such as the transceiver 1320 and the like.
- the processor 1310 may be a central processing unit (Central Processing Unit, CPU).
- the processor 1310 may include one or more processing units.
- the processor 1310 may also be a digital signal processor, an application specific integrated circuit, a field programmable gate array, a GPU, or other programmable logic device, or the like.
- the server 1300 may further include a memory 1330, which can be used to store software programs and modules.
- the processor 1310 reads the software codes and modules stored in the memory 1330 to execute various functional applications and data processing of the server 1300.
- An embodiment of the present application also provides a system for determining a pose. As shown in FIG. 2c, the system may include a terminal device and a server.
- the terminal device may be a mobile terminal, a human-computer interaction device, or a vehicle-mounted visual perception device, such as a mobile phone, a floor sweeper, an intelligent robot, an unmanned vehicle, an intelligent monitor, an Augmented Reality (AR) wearable device, and the like.
- the methods provided by the embodiments of the present disclosure can be used in application fields such as human-computer interaction, vehicle-mounted visual perception, augmented reality, intelligent monitoring, and unmanned driving.
- FIG. 3 is a schematic diagram of an embodiment of a pose determination method provided by an embodiment of the present application.
- the pose determination method provided by the present application includes:
- To display the AR interface, the terminal may acquire a video stream that it captures; the first image is an image frame in that video stream.
- To display the AR interface, the terminal may acquire the video stream that it captures and, based on the video stream, acquire the pose of the terminal at the time the video stream was captured.
- The following describes how to obtain, based on the video stream, the pose of the terminal at the time the video stream was captured.
- The terminal can calculate its own pose from the data obtained by its camera and by its positioning-related sensors. Alternatively, the terminal can send the data obtained by its camera and positioning-related sensors to a server on the cloud side; the server calculates the pose of the terminal and sends the calculated pose to the terminal.
- The terminal device can send to the server the video stream captured by its own shooting device, the location information of the terminal (for example, location information obtained from the Global Positioning System (GPS) or location-based services (LBS)), simultaneous localization and mapping (SLAM) poses at historical moments, positioning poses at historical moments, and other data.
- The SLAM pose at a historical moment is the change in SLAM pose recorded by the terminal device during previous online positioning.
- The positioning pose at a historical moment is the positioning pose result recorded by the terminal device during previous online positioning.
- the server can extract an image from the received video stream as an input frame, and then extract the global features of the input frame, and use the global features to search for images similar to the input frame in the digital map to obtain multiple candidate frames.
- the searched candidate frame and the input frame have a common view relationship.
- the so-called common view relationship means that the searched candidate frame is within X meters near the position of the input frame, and the shooting angle is within Y degrees.
- X and Y can be preset values.
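The retrieval step above can be sketched as follows. This is a minimal illustration, not the method of the application: it assumes each map frame stores a global feature vector, a position, and a single shooting direction (yaw), that similarity is measured by cosine similarity, and that the input frame has a rough position and heading (for example from GPS). The names `MapFrame`, `retrieve_candidates`, `max_dist_m` (X) and `max_angle_deg` (Y) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MapFrame:
    """Hypothetical record for one image (the input frame or a map frame)."""
    name: str
    global_feature: List[float]           # assumed global descriptor
    position: Tuple[float, float, float]  # metres, map coordinates
    yaw_deg: float                        # assumed shooting direction


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def angle_diff_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def is_covisible(query: MapFrame, cand: MapFrame,
                 max_dist_m: float, max_angle_deg: float) -> bool:
    """Common-view test: within X metres and within Y degrees."""
    close = math.dist(query.position, cand.position) <= max_dist_m
    aligned = angle_diff_deg(query.yaw_deg, cand.yaw_deg) <= max_angle_deg
    return close and aligned


def retrieve_candidates(query: MapFrame, map_frames: List[MapFrame],
                        top_k: int, max_dist_m: float,
                        max_angle_deg: float) -> List[MapFrame]:
    """Keep co-visible frames, then rank them by global-feature similarity."""
    covisible = [f for f in map_frames
                 if is_covisible(query, f, max_dist_m, max_angle_deg)]
    ranked = sorted(
        covisible,
        key=lambda f: cosine_similarity(query.global_feature, f.global_feature),
        reverse=True,
    )
    return ranked[:top_k]
```

Filtering by the common-view test before ranking keeps visually similar but distant frames out of the candidate set; the order of the two steps is a design choice of this sketch.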
- A digital map is a repository for organizing, storing and managing map data. It can include scene map data (images, feature data including global features and local features, and point cloud data) and 3D object data (images, point clouds, and feature data including global features and local features), which are added to the digital map after offline registration processing. How to construct a digital map is described in subsequent embodiments and is not repeated here.
- The server can extract the local features of the input frame and perform image matching between the input frame and the multiple candidate frames to obtain 2D-2D correspondences.
- The correspondence between the 2D points of the candidate frames and the 3D points of the point cloud can be obtained from the digital map.
- The server can calculate the pose of the input frame through a pose solving algorithm; this is the preliminary result of the pose of the terminal device.
- The pose solving algorithm may include, but is not limited to, perspective-n-point (PnP), perspective-2-point (P2P), and so on.
- The above description takes the case in which the server calculates the pose information of the terminal device as an example.
- The following takes the case in which the terminal device itself completes the calculation of the pose information as an example:
- The terminal device can obtain the video stream captured by its own shooting device, the location information of the terminal (for example, location information obtained from GPS or LBS), SLAM poses at historical moments, positioning poses at historical moments, and other data.
- The terminal device can extract the local features of the input frame and perform image matching between the input frame and the multiple candidate frames to obtain 2D-2D correspondences.
- the terminal device can calculate the pose of the input frame through the pose solving algorithm based on the 2D-3D correspondence between the input frame and the point cloud.
- the pose in this embodiment of the present application may include three-dimensional position coordinates, yaw angle, pitch angle, and roll angle of the terminal device when the image is captured.
- An augmented reality (AR) interface may be displayed based on the pose corresponding to the video stream, wherein the AR interface may include the preview stream corresponding to the video stream.
- The terminal may display the AR interface based on the pose, where the AR interface may include an image of the environment in which the terminal device is currently located (the preview stream) and an identifier generated based on the terminal's own pose information.
- For example, if the AR interface is an AR navigation interface, the identifier can be a navigation guide.
- If the AR interface is a scene explanation AR interface, such as an exhibit explanation interface in a museum, the identifier can be an indicator mark for the exhibit.
- The first image is a frame in the video stream, and the pose corresponding to the video stream includes first pose information, which indicates the pose of the terminal when it captures the first image.
- When the first pose information satisfies the abnormal pose condition, prompt information for instructing the user to photograph a target object is displayed, wherein the target object is around the position where the terminal is located and is not in the first image. The target object is used to obtain second pose information; the second pose information represents the pose of the terminal when the target object is photographed, and the second pose information does not satisfy the abnormal pose condition.
- the terminal can display the AR interface based on the pose corresponding to the video stream.
- When the pose corresponding to the first image in the video stream is calculated, the pose calculation result (the first pose information) may, for environmental reasons at the position where the terminal captured the first image, satisfy the abnormal pose condition. The abnormal pose condition may include: the pose information cannot be obtained; or the deviation between the currently determined pose information and the correct pose information is greater than a threshold.
- The inability to obtain the pose information can be understood as follows: the pose information cannot be obtained within time T1, or the pose information cannot be calculated based on the image. For example, the terminal does not receive the pose information calculated by the server within time T1, or the terminal receives an indication fed back by the server that the pose information cannot be calculated, or the terminal itself cannot calculate the pose information based on the image.
- T1 may be a preset time
- T1 may be a value within 0-0.5 seconds, for example, T1 may be 0.1 seconds or 0.3 seconds.
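The three "pose information cannot be obtained" cases can be sketched as a single check. This is only an illustrative sketch: the `PoseReply` structure, the convention that a missing reply is represented by `None`, and the default of 0.3 seconds for T1 (one of the example values above) are assumptions made here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PoseReply:
    """Hypothetical result of one pose request (from the server or local solver)."""
    pose: Optional[Tuple[float, ...]]  # None means the solver reported failure
    elapsed_s: float                   # time taken to obtain the reply


def pose_unavailable(reply: Optional[PoseReply], t1_s: float = 0.3) -> bool:
    """Return True when the pose counts as 'cannot be obtained'."""
    if reply is None:
        return True   # nothing was received within the waiting period
    if reply.pose is None:
        return True   # explicit "pose cannot be calculated" indication
    return reply.elapsed_s > t1_s  # a pose arrived, but later than T1
```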
- the correct pose information can be understood as the pose information that can be calculated by the server based on a standard digital map, and the correct pose information can objectively and correctly represent the current pose of the terminal;
- That the deviation between the currently determined pose information and the correct pose information is greater than the threshold can be understood as: the deviation between the currently determined pose information and the current correct pose of the terminal is too large.
- A corresponding threshold can be set for each of the six degrees of freedom: the position coordinates (X coordinate, Y coordinate, and Z coordinate) and the remaining angle information (yaw angle, pitch angle, and roll angle) are each set with a corresponding threshold.
- For the X and Y coordinates, the corresponding threshold can be set to a value between 0 and 2 m; for example, it can be set to 0.5 m or 1 m.
- For the Z coordinate, the corresponding threshold can be set to a value between 0 and 0.5 m; for example, it can be set to 0.1 m or 0.25 m.
- For the angles, the corresponding threshold can be set to a value between 0 and 10 degrees; for example, the thresholds corresponding to the yaw angle, the pitch angle, and the roll angle can be set to 5 degrees or 4 degrees.
- When the deviation between any one of the 6 degrees of freedom and the corresponding correct pose value exceeds the corresponding threshold, the deviation between the currently determined pose information and the correct pose information is considered greater than the threshold; alternatively, the deviation is considered greater than the threshold when the deviations of several specified degrees of freedom among the 6 exceed their corresponding thresholds.
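The per-degree-of-freedom threshold test can be sketched as below. It is an illustration under stated assumptions rather than the application's implementation: the `Pose` class, the default threshold values (picked from the example ranges above), and the wrap-around handling of angles are choices made here, and the optional `dofs` argument reflects one reading of "several specified degrees of freedom", namely that the check is restricted to that subset. A reference ("correct") pose is assumed to be available for the comparison.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Pose:
    x: float      # metres
    y: float      # metres
    z: float      # metres
    yaw: float    # degrees
    pitch: float  # degrees
    roll: float   # degrees


# Example thresholds taken from the ranges quoted in the text (assumed defaults).
DEFAULT_THRESHOLDS: Dict[str, float] = {
    "x": 0.5, "y": 0.5, "z": 0.1,
    "yaw": 5.0, "pitch": 5.0, "roll": 5.0,
}
ANGLE_DOFS = ("yaw", "pitch", "roll")


def dof_deviation(name: str, estimated: Pose, reference: Pose) -> float:
    """Absolute deviation of one degree of freedom; angles wrap at 360 degrees."""
    diff = getattr(estimated, name) - getattr(reference, name)
    if name in ANGLE_DOFS:
        diff = (diff + 180.0) % 360.0 - 180.0
    return abs(diff)


def deviation_exceeds_threshold(estimated: Pose, reference: Pose,
                                thresholds: Optional[Dict[str, float]] = None,
                                dofs: Optional[Iterable[str]] = None) -> bool:
    """True if any checked degree of freedom deviates by more than its threshold."""
    limits = thresholds if thresholds is not None else DEFAULT_THRESHOLDS
    names = list(dofs) if dofs is not None else list(limits)
    return any(dof_deviation(n, estimated, reference) > limits[n] for n in names)
```

Passing `dofs=("x", "y")`, for instance, restricts the check to the horizontal position and ignores height and orientation.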
- the pose calculation result may also include a confidence level corresponding to the pose information.
- The confidence level may be determined based on the reprojection error, the number of inliers, etc., which is not limited in the embodiments of the present application.
- When the confidence is too low, the deviation between the currently determined pose information and the correct pose information can be considered greater than the threshold. For example, if the full score of the confidence is 1.0, the deviation is considered greater than the threshold when the confidence is lower than 0.6, or alternatively when the confidence is lower than 0.7.
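One possible way to turn reprojection error and inlier count into a confidence in the range 0 to 1.0 is sketched below. The text does not fix a formula, so everything here is an assumption: a pinhole camera model with intrinsics `(fx, fy, cx, cy)`, a pose given as a rotation matrix `R` and translation `t` mapping map coordinates to camera coordinates, a 2-pixel inlier tolerance, and confidence defined as the inlier ratio.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def to_camera(R: Sequence[Sequence[float]], t: Sequence[float], p: Vec3) -> Vec3:
    """Transform a 3D map point into camera coordinates: R * p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(p_cam: Vec3, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point onto the image plane."""
    x, y, z = p_cam
    return (fx * x / z + cx, fy * y / z + cy)


def inlier_confidence(R, t, points_3d: List[Vec3],
                      pixels_2d: List[Tuple[float, float]],
                      intrinsics: Tuple[float, float, float, float],
                      max_error_px: float = 2.0) -> float:
    """Fraction of 2D-3D matches whose reprojection error is within tolerance."""
    if not points_3d:
        return 0.0
    fx, fy, cx, cy = intrinsics
    inliers = 0
    for point, (u, v) in zip(points_3d, pixels_2d):
        p_cam = to_camera(R, t, point)
        if p_cam[2] <= 0.0:
            continue  # a point behind the camera is treated as an outlier
        pu, pv = project(p_cam, fx, fy, cx, cy)
        if math.hypot(pu - u, pv - v) <= max_error_px:
            inliers += 1
    return inliers / len(points_3d)


def is_low_confidence(confidence: float, min_confidence: float = 0.6) -> bool:
    """Apply the example cut-off from the text (0.6, or 0.7 as an alternative)."""
    return confidence < min_confidence
```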
- Objects with significant visual features are objects whose texture features have a high degree of visual recognizability.
- A high degree of recognizability of texture features means that few objects in the world have the same texture features as the target object, so the target object can be identified based on its texture features; examples include cultural relics in museums and statues in parks.
- Using these objects as positioning targets can greatly improve the positioning success rate. However, in the existing digital map modeling process, video sequences are collected along a fixed route over a large scene and a sparse point cloud of the scene is then generated through offline processing, so a single object has only a small, sparse point cloud, which cannot meet the requirements of 3D object positioning. Therefore, 3D objects can be collected separately and processed offline to generate dense data such as point clouds and images.
- The server can make a judgment on the calculated preliminary pose result (the first pose information). If the first pose information satisfies the abnormal pose condition, the server can determine, from the digital map and based on the location of the terminal, objects located around the terminal (referred to as target objects in this embodiment), and send information including the target objects to the terminal.
- "Around the terminal" can be understood as meaning that the target object and the position of the terminal are within a preset distance range. Because the distance is short, the user can easily move to the vicinity of the target object.
- "Around the terminal" can also be understood as meaning that the target object and the location of the terminal are in the map of the same area.
- For example, the target object and the first object are both in the museum, and the user can easily move to the vicinity of the target object.
- "Around the terminal" can also be understood as meaning that there are no other obstacles between the target object and the position where the terminal is located.
- the digital map may include pre-collected information of multiple objects, and the information may include but not limited to the position of the object, the image of the object, the point cloud of the object, and the like.
- When the server determines that the accuracy of the real-time pose of the terminal device satisfies the abnormal pose condition, it can obtain objects (including target objects) around the location of the terminal from the digital map, and send information indicating these objects to the terminal. Further, the terminal can display the information of these objects on the target interface. Further, the terminal may capture a target image including these objects, and re-determine the pose based on the target image.
- That the target object is not in the first image can be understood as either of the following: the first image does not include any part of the target object; or the first image includes only a part of the target object, another part of the target object is not in the first image, and the part of the target object included in the first image is not sufficient for determining the position information of the terminal.
- the digital map may include 3D point cloud information of multiple objects, wherein the first object in the digital map corresponds to the first 3D point cloud information, the target object corresponds to the second 3D point cloud information in the digital map, and The point cloud density of the second 3D point cloud information is higher than the point cloud density of the first 3D point cloud information.
- the target object is an iconic object that can be completely imaged under the shooting parameters of the current terminal and has a relatively fixed physical position.
- the target object is a small and medium-sized object, so that the user can capture the whole picture of the target object under the shooting parameters of the current terminal.
- A relatively fixed physical position does not mean that the target object cannot be moved; it means that the target object, in its natural state, is stationary relative to the ground. For example, in a museum scene, the target object can be an exhibit.
- the digital map includes a plurality of objects around the location of the terminal, and the server or the terminal may select at least one object (including a target object) from the plurality of objects based on preset conditions.
- The following describes how to select at least one object from the multiple objects based on preset conditions:
- the target object that satisfies a preset condition may be determined from a digital map according to the location of the terminal, wherein the digital map includes multiple objects, and the multiple objects are objects around the location where the terminal is located, and the preset conditions include at least one of the following:
- at least one object, among the plurality of objects, that requires a shorter moving distance for the terminal to reach from its location.
- At least one object that is closer to the location of the terminal may be selected from the multiple objects; or at least one object with no other obstacle between it and the location of the terminal may be selected; or at least one object that requires a shorter moving distance for the terminal to reach from its location may be selected.
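The selection of target objects under these preset conditions can be sketched as follows. The text allows any one of the conditions to be used; this sketch combines all three as an illustration. The `MapObject` fields, the 50 m radius, and the precomputed `walking_distance_m` and `has_obstacle` values are hypothetical; how a real digital map would supply them is not specified here.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MapObject:
    """Hypothetical digital-map entry for one pre-collected object."""
    name: str
    position: Tuple[float, float]  # metres, map coordinates
    walking_distance_m: float      # assumed route length from the terminal
    has_obstacle: bool             # assumed: obstacle between terminal and object


def select_target_objects(terminal_pos: Tuple[float, float],
                          objects: List[MapObject],
                          max_count: int = 1,
                          max_radius_m: float = 50.0) -> List[MapObject]:
    """Pick nearby objects, preferring unobstructed ones with short routes."""
    nearby = [o for o in objects
              if math.dist(terminal_pos, o.position) <= max_radius_m]
    unobstructed = [o for o in nearby if not o.has_obstacle]
    pool = unobstructed if unobstructed else nearby  # fall back if all are blocked
    ranked = sorted(
        pool,
        key=lambda o: (o.walking_distance_m, math.dist(terminal_pos, o.position)),
    )
    return ranked[:max_count]
```

Returning more than one object (`max_count > 1`) corresponds to the case in which several candidates are presented and the user chooses among them.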
- the server may send a target interface display indication to the terminal device, the target interface display indication may include information of the target object, and correspondingly, the terminal device may display prompt information instructing to photograph the target object.
- The terminal device can calculate the first pose information by itself and make a judgment on the calculated preliminary pose result (the first pose information). If the first pose information satisfies the abnormal pose condition, for example the solution of the first pose information fails or the deviation of the first pose information from the correct value is greater than the threshold, an object located within a certain distance of the terminal (called the target object in this embodiment) can be determined from the digital map based on the terminal's own position.
- The terminal device can make a judgment on the calculated preliminary pose result (the first pose information) and, if the first pose information satisfies the abnormal pose condition, send the server an indication that the pose satisfies the abnormal pose condition.
- The server may then determine, from the digital map, an object located within a certain distance of the terminal (called the target object in this embodiment), and send information including the target object to the terminal.
- the terminal device may receive the information of the target object sent by the server.
- The information of the target object may include the position of the target object; correspondingly, the terminal device may display the position of the target object, or display navigation information from the position of the terminal to the position of the target object.
- the information of the target object may also include the image, name and/or category of the target object.
- the terminal device may display the image, name and/or category of the target object, wherein the image may be obtained by photographing the target object in advance.
- the name can be the specific name of the target object. For example, in a museum scene, the name of the target object can be the name of the exhibit, the serial number of the exhibit, the category of the exhibit, and so on.
- The server can send information about multiple objects located near the location of the terminal device to the terminal device, where the target object is one of the multiple objects.
- The target interface may include multiple pieces of information, each indicating one of these objects, and the user can select one of the objects.
- the terminal device can obtain the information of the target object from the digital map by itself.
- FIG. 4a is a schematic diagram of a terminal interface in an embodiment of the present application.
- The terminal device may display an application navigation interface. The interface shown in FIG. 4a includes an AR navigation application, and the user can open the AR navigation application.
- Further, the terminal may display the interface shown in FIG. 4b, which is a schematic diagram of a terminal interface in an embodiment of the application. As shown in FIG. 4b, the AR navigation interface may include a preview stream captured by the terminal device and a navigation identifier, wherein the navigation identifier is generated based on the real-time pose information of the terminal device at the time the preview stream was captured.
- The terminal device may display the terminal interface shown in FIG. 5, which may include an indicator of the current positioning failure and a control for enabling determination of the pose information based on target-object positioning (the open object recognition and positioning control shown in FIG. 5); it may also include a repositioning control.
- The user can tap the open object recognition and positioning control. In response to this operation, the terminal device can display the terminal interface shown in FIG. 6, which may include prompt information for instructing the user to photograph the target object, such as the names of the target objects ("A", "B", "C" and "D" shown in FIG. 6) and their positions ("Position 1", "Position 2", "Position 3" and "Position 4" shown in FIG. 6).
- The terminal device may display the terminal interface shown in FIG. 7, which may include information of the target object (for example, navigation information from the location of the terminal to the position of the target object), such as the navigation interface shown in FIG. 6. The navigation interface can be a flat map that includes an indication of the location of the terminal device and an indication of the location of the target object.
- the terminal may acquire a target image captured by the user according to the prompt information, where the target image includes the target object.
- the user can find the location of the target object according to the prompt.
- the terminal device can display the name, image or location information of at least one exhibit, and the user can select one exhibit (target object) and find the location of the target object based on the name, image or location information.
- The user may photograph the target object to obtain the target image, or the user may photograph the target object to obtain a video stream, where the target image is an image frame in the video stream.
- As shown in FIG. 7, when the user reaches the vicinity of the target object, the user can tap the "Start Shooting" control displayed on the terminal interface in FIG. 7.
- The terminal device can then display the shooting interface shown in FIG. 8a or FIG. 8b.
- the terminal device may send the target image to the server, so that the server can calculate the pose information of the terminal device based on the target image.
- the terminal device after acquiring the video stream including the target image, can send the video stream to the server, so that the server can calculate the pose information of the terminal device based on the target image in the video stream.
- the terminal device may calculate the pose information of the terminal device based on the target image.
- the terminal device may calculate the pose information of the terminal device based on the target image in the video stream.
- FIG. 8a is a schematic diagram of an interface for photographing a target object displayed by a terminal device, and a user can obtain a target image including the target object by photographing the target object through the photographing interface shown in FIG. 8a.
- Fig. 8b is a schematic diagram of an interface for photographing a target object displayed by a terminal device.
- the user can scan the target object through the photographing interface shown in Fig. 8b to obtain a video stream including the target object.
- the second pose information may be obtained according to the target object in the target image.
- The first pixel position of the target object in the target image may be obtained, and the first position information corresponding to the target object in the digital map may be obtained, where the first position information indicates the position of the target object in the digital map; the second pose information is then determined according to the first pixel position and the first position information.
- the terminal device may acquire the first pixel position of the target object in the target image.
- the determination of the first pixel position can be done independently by the terminal device, or realized by interaction between the terminal device and the server, that is, the server determines the first pixel position and sends the first pixel position to the terminal device.
- the terminal device may acquire the first position information corresponding to the target object in the digital map.
- the determination of the first location information may be completed independently by the terminal device, or implemented by interaction between the terminal device and the server, that is, the server determines the first location information and sends the first location information to the terminal device.
- the terminal device may acquire the second pose information.
- The step of determining the second pose information according to the first pixel position and the first position information may be completed independently by the terminal device, or implemented through interaction between the terminal device and the server; that is, the server determines the second pose information and sends it to the terminal device.
- The terminal device may send the target image to the server and receive the second pose information sent by the server, where the second pose information is determined by the server according to the first pixel position of the target object in the target image and the first position information corresponding to the target object in the digital map, and the first position information indicates the position of the target object in the digital map.
- the second pose information is determined according to the first pixel position and the 2D-3D correspondence of the first position information, wherein the 2D-3D correspondence indicates that the target object is in the target image.
- The terminal device may acquire the first pixel position of the target object in the target image, send the first pixel position to the server, and receive the second pose information sent by the server, wherein the second pose information is determined by the server according to the first pixel position of the target object in the target image and the first position information corresponding to the target object in the digital map, and the first position information represents the position of the target object in the digital map.
- the terminal device may acquire the first position information corresponding to the target object in the digital map, where the first position information indicates the position of the target object in the digital map, send the target image and the first position information to the server, and receive the second pose information sent by the server, wherein the second pose information is determined by the server according to the first pixel position of the target object in the target image and the first position information corresponding to the target object in the digital map.
- the target object is a landmark object that can be completely imaged under the current shooting parameters of the terminal and has a relatively fixed physical position, and the texture features of the target object are more recognizable than those of the first object. Therefore, the second pose information determined based on the target object does not meet the abnormal pose condition; for example, the second pose information is solved successfully and its difference from the correct pose information is less than a threshold.
- the video frame sequence of the target object can be collected in advance, for example by circling the target object through 360 degrees, and the video frame sequence can be processed to model the 3D object offline, outputting multiple images of the target object, the local pose and global pose of each image, 3D point cloud data, and so on. A video frame sequence of the scene in which positioning is required can also be collected and processed to complete the sparse reconstruction of the scene offline and output a scene image database.
- the scene image database can include scene image data, global pose and point cloud.
- the scene image database is searched for images containing the target object, and multiple associated frames of the scene map are output.
- the features of the associated frames are extracted and matched against the images of the target object and their local poses, and the 2D-2D correspondence between the associated-frame features and the target object image features is output.
- based on the 2D-3D correspondence between the associated frames and the 3D point cloud of the target object, the relative pose between the associated frames and the target object can be solved by a pose solving algorithm, and the global pose of the target object can be calculated by combining the global poses of the associated frames.
- the digital map may include the data and global pose of the target object obtained by the above calculation.
- a graph optimization algorithm can also be applied to the global pose, so as to obtain a more robust global pose of the target object.
- for example, the poses of the three frames P1, P2, and P3 are calculated from the multi-frame associated images, and the lines connecting the feature points X1 to X6 on the target object to the optical center of the camera intersect the image plane.
- the difference between these intersection points and the corresponding observed feature positions (the reprojection error) needs to be minimized to obtain the optimal camera pose.
- solving this optimization problem is called BA (bundle adjustment) optimization, which can be computed with the LM (Levenberg-Marquardt) algorithm while exploiting the sparsity of the BA model.
- the LM algorithm combines the steepest descent method (gradient descent method) and the Gauss-Newton method.
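- For reference, the quantity minimized by BA is commonly written as a total reprojection error. The notation below is an assumption introduced for illustration and does not appear in the original text: $T_i$ is the pose of frame $P_i$, $X_j$ a 3D feature point, $x_{ij}$ the observed pixel of $X_j$ in frame $i$, $\mathcal{V}(i)$ the set of points visible in frame $i$, $\pi$ the camera projection, $r$ the stacked residual vector with Jacobian $J$, and $\lambda$ the LM damping factor.

```latex
\[
\min_{\{T_i\},\,\{X_j\}} \; \sum_{i=1}^{3} \sum_{j \in \mathcal{V}(i)}
\left\| x_{ij} - \pi\!\left(T_i, X_j\right) \right\|^{2}
\]
% One LM iteration solves the damped normal equations for the update \delta:
\[
\left( J^{\top} J + \lambda I \right) \delta = - J^{\top} r
\]
```

- With a large $\lambda$ the update approaches a short gradient descent step, and with a small $\lambda$ it approaches the Gauss-Newton step, which is the sense in which LM combines the two methods.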
- the traditional pose solution is to use the 2D features of the 3D object image and the 2D-3D relationship of the 3D point cloud of the scene to solve the 3D object pose.
- however, the visual features are extracted in a large-scale environment, so the features and point cloud extracted for 3D objects in the scene are sparse. When this sparse point cloud is matched against images of the 3D object for pose solving, neither the accuracy nor the success rate is optimal.
- the reverse pose solution is to use the scene image and the dense 3D object point cloud for matching and pose solution, so that the accuracy and success rate of positioning are greatly improved.
- the first pixel position may be a feature point of the target object in the target image or a pixel position of a feature line.
- the feature point may be the corner point of the target object in the target image
- the feature line may be the edge line of the target object in the target image, which is not limited in this embodiment.
- the first position information may include 3D object point cloud information of the target object in the digital map, and the first position information may also include the global pose corresponding to when the shooting device captured the target object to obtain the first image.
- the finally calculated second pose information may represent the global pose corresponding to the terminal when the target image is captured.
- the 2D-3D correspondence between the first pixel position and the first position information may be acquired, wherein the 2D-3D correspondence represents the correspondence between the two-dimensional coordinates of the target object in the target image and the three-dimensional coordinates in the actual space, and the second pose information is determined according to the 2D-3D correspondence.
- the second pose information can be calculated through a pose solving algorithm, and the pose solving algorithm can include but is not limited to a perspective-n-point (PnP) pose solving algorithm and a perspective-2-point (P2P) pose solving algorithm.
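- As an illustration of the quantity a PnP-style solver evaluates, the following minimal Python sketch projects the 3D points of a set of 2D-3D correspondences into the image under a candidate pose and measures the mean reprojection error. The distortion-free pinhole model, the world-to-camera pose convention, and all names (`project`, `mean_reprojection_error`, the intrinsics tuple) are assumptions for illustration and are not part of the described method.

```python
import math


def project(point_w, rotation, translation, fx, fy, cx, cy):
    """Project a 3D world point to pixel coordinates with a pinhole camera.

    rotation is a 3x3 world-to-camera matrix (list of rows) and translation
    a 3-vector, so that p_cam = R * p_world + t.
    """
    xc, yc, zc = (
        sum(rotation[r][c] * point_w[c] for c in range(3)) + translation[r]
        for r in range(3)
    )
    if zc <= 0:
        raise ValueError("point is behind the camera")
    return (fx * xc / zc + cx, fy * yc / zc + cy)


def mean_reprojection_error(correspondences, rotation, translation, intrinsics):
    """Mean pixel distance between observed pixels and reprojected 3D points.

    correspondences is a list of ((u, v), (x, y, z)) pairs.
    """
    fx, fy, cx, cy = intrinsics
    total = 0.0
    for (u, v), point_w in correspondences:
        pu, pv = project(point_w, rotation, translation, fx, fy, cx, cy)
        total += math.hypot(pu - u, pv - v)
    return total / len(correspondences)


# Hypothetical example data: two points observed by a camera at the origin.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
INTRINSICS = (500.0, 500.0, 320.0, 240.0)
PAIRS = [((320.0, 240.0), (0.0, 0.0, 2.0)), ((570.0, 240.0), (1.0, 0.0, 2.0))]
```

- A solver would search for the rotation and translation that make this error small; an error that stays large after solving is one possible indication that the pose solution has failed.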
- object recognition may be performed on the target image first.
- a neural network model based on deep learning can be used to identify the target object in the target image and output preliminary pose information of the terminal device shooting the target object; local visual features (the first pixel position in the target image) are then extracted and matched 2D-2D against the image of the target object in the digital map; the 3D object point cloud (the first position information) in the digital map is then combined to obtain the 2D-3D correspondence, and the 2D-3D correspondence is input into the pose solving algorithm to solve the pose, finally yielding a more accurate 3D object pose (the second pose information).
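- The step from 2D-2D matches to 2D-3D correspondences can be sketched as a simple chaining operation. The data layout below (index pairs for matches, a dictionary from reference feature index to 3D point) and the function name `build_2d3d_correspondences` are hypothetical choices for illustration only.

```python
def build_2d3d_correspondences(query_keypoints, matches, reference_points_3d):
    """Chain 2D-2D matches with the reference point cloud.

    query_keypoints: list of (u, v) pixels detected in the target image.
    matches: list of (query_index, reference_index) pairs from 2D-2D matching.
    reference_points_3d: dict mapping a reference feature index to its
        (x, y, z) point in the digital map; features without a 3D point
        are absent from the dict.
    """
    correspondences = []
    for query_index, reference_index in matches:
        point_3d = reference_points_3d.get(reference_index)
        if point_3d is None:
            continue  # the matched reference feature has no 3D point
        correspondences.append((query_keypoints[query_index], point_3d))
    return correspondences
```

- The resulting list of (pixel, 3D point) pairs is the kind of input a pose solving algorithm such as PnP consumes.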
- the 3D point cloud of point P1 in the digital map is X1-X4, and P2 and P3 are similar.
- the second pose information may include the yaw angle, pitch angle, and roll angle of the terminal device when the target image is captured.
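- As a minimal sketch of how yaw, pitch, and roll relate to a rotation matrix, the following Python code converts between the two. The axis convention (R = Rz(yaw) · Ry(pitch) · Rx(roll), angles in radians) is an assumption; the original text does not state which convention the terminal uses, and the extraction is only well defined away from a pitch of ±90 degrees.

```python
import math


def rotation_from_ypr(yaw, pitch, roll):
    """Build R = Rz(yaw) * Ry(pitch) * Rx(roll) as a list of rows."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def ypr_from_rotation(rotation):
    """Recover (yaw, pitch, roll) from a matrix built with the convention above."""
    # Clamp to guard asin against rounding slightly outside [-1, 1].
    sin_pitch = max(-1.0, min(1.0, -rotation[2][0]))
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(rotation[1][0], rotation[0][0])
    roll = math.atan2(rotation[2][1], rotation[2][2])
    return yaw, pitch, roll
```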
- the terminal device is positioned while moving towards the target object (the lion statue shown in Fig. 9a): at time T1, the global pose of visual positioning is Tvps_1 and the local pose of SLAM positioning is Tslam_1; at time T2, the global pose of visual positioning is Tvps_2 and the local pose of SLAM positioning is Tslam_2; at time T3, the obtained second pose information is T3d_3 and the local pose of SLAM positioning is Tslam_3; at time T4, the obtained second pose information is T3d_4 and the local pose of SLAM positioning is Tslam_4; at time T5, the acquired second pose information is Tvps_5 and the local pose of SLAM positioning is Tslam_5.
- the acquired second pose information is a global pose, and the following constraint holds between these poses: the transformation matrix between the global poses at any two moments should equal the transformation matrix between the local SLAM poses at the corresponding moments. Under this constraint, graph optimization is used to minimize the difference between the two transformation matrices, and the optimized second pose information at time T5 is output.
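- The constraint can be illustrated with a small residual function that compares the relative transform between two global poses with the relative transform between the corresponding SLAM poses. For brevity this sketch works with planar poses (x, y, theta) rather than full 3D transformation matrices, which is a simplifying assumption; the function names are hypothetical.

```python
import math


def compose(a, b):
    """Return the planar pose a * b, with poses given as (x, y, theta)."""
    xa, ya, ta = a
    xb, yb, tb = b
    return (
        xa + math.cos(ta) * xb - math.sin(ta) * yb,
        ya + math.sin(ta) * xb + math.cos(ta) * yb,
        ta + tb,
    )


def inverse(a):
    """Return the inverse of a planar pose."""
    xa, ya, ta = a
    return (
        -(math.cos(ta) * xa + math.sin(ta) * ya),
        math.sin(ta) * xa - math.cos(ta) * ya,
        -ta,
    )


def relative(pose_i, pose_j):
    """Transform from moment i to moment j, expressed in the frame of pose i."""
    return compose(inverse(pose_i), pose_j)


def consistency_residual(global_i, global_j, local_i, local_j):
    """Return (translation difference, rotation difference) of the two relative transforms."""
    gx, gy, gt = relative(global_i, global_j)
    lx, ly, lt = relative(local_i, local_j)
    dtheta = math.atan2(math.sin(gt - lt), math.cos(gt - lt))
    return math.hypot(gx - lx, gy - ly), abs(dtheta)
```

- A graph optimizer would sum such residuals over pairs of moments (for example T1 to T5) and adjust the global poses to make the sum small; the optimizer itself is not shown here.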
- the terminal interface shown in FIG. 10 may be displayed; the terminal interface is used to indicate that the positioning is successful, and further, the terminal device can return to the AR interface (for example, the AR navigation interface shown in FIG. 11).
- the terminal device may also obtain the pose of the terminal device and determine the real-time pose according to the second pose information and the obtained pose change of the terminal device.
- the terminal device can use the acquired second pose information as the initial pose, determine the pose change of the terminal device through simultaneous localization and mapping (SLAM) tracking technology, and determine the real-time pose from the initial pose and the pose change of the terminal.
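- A minimal sketch of this composition, assuming poses are represented as 4x4 rigid transforms (device frame to reference frame) stored as nested lists; the representation and the helper names are assumptions for illustration.

```python
def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def rigid_inverse(t):
    """Invert a 4x4 rigid transform [R | p] as [R^T | -R^T p]."""
    rt = [[t[c][r] for c in range(3)] for r in range(3)]
    p = [-sum(rt[r][c] * t[c][3] for c in range(3)) for r in range(3)]
    return [rt[0] + [p[0]], rt[1] + [p[1]], rt[2] + [p[2]], [0.0, 0.0, 0.0, 1.0]]


def realtime_pose(initial_global, slam_at_init, slam_now):
    """Apply the SLAM pose change since initialization to the initial global pose."""
    pose_change = matmul(rigid_inverse(slam_at_init), slam_now)
    return matmul(initial_global, pose_change)


def translation(x, y, z):
    """Helper for examples: a pure translation transform."""
    return [
        [1.0, 0.0, 0.0, x],
        [0.0, 1.0, 0.0, y],
        [0.0, 0.0, 1.0, z],
        [0.0, 0.0, 0.0, 1.0],
    ]
```

- Here `initial_global` plays the role of the second pose information, and `slam_at_init` and `slam_now` are the local SLAM poses at initialization and at the current moment.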
- the terminal device can perform navigation, route planning, obstacle avoidance and other processing based on the real-time pose. For example, when performing path planning, the terminal device plans a path according to the coordinate position, where the starting point or end point of the planned path is the coordinate position, and displays a two-dimensional navigation interface that includes the planned path.
- an AR navigation interface is displayed, where the AR navigation interface includes an image of the environment where the terminal device is currently located and a navigation guide, where the navigation guide is determined based on the yaw angle, pitch angle, and roll angle of the terminal device.
- the terminal device can also obtain the preview stream of the current scene; according to the second pose information, determine the preset media content included in the digital map corresponding to the scene in the preview stream; render the media content in the preview stream.
- Fig. 10 is an interface displayed after the terminal obtains the second pose information.
- the second pose information does not meet the abnormal pose condition, which is equivalent to the difference between the second pose information and the correct pose value being smaller than the threshold; that is, the second pose information can correctly represent the current pose of the terminal. Then, as shown in FIG. 11, the AR navigation application can continue based on the current pose of the terminal, and the AR navigation interface is displayed.
- the terminal device is a mobile phone or an AR wearable device, etc.
- a virtual scene can be constructed based on the pose information.
- the terminal device can obtain the preview stream of the current scene, for example, the user can shoot the preview stream of the current environment in a shopping mall.
- the terminal device may determine the second pose information as the initial pose according to the method mentioned above.
- the terminal device can obtain a digital map that records the three-dimensional coordinates of each position in the world coordinate system, where preset three-dimensional coordinate positions have corresponding preset media content, and the terminal can determine the target three-dimensional coordinates corresponding to the real-time pose in the digital map.
- the preset media content corresponding to the target three-dimensional coordinates is then acquired. For example, when a user shoots a target store, the terminal recognizes the real-time pose and determines that the camera is currently shooting the target store, and the preset media content corresponding to the target store can be obtained.
- the preset media content corresponding to the target store can be description information of the target store, such as which products in the target store are worth buying. Based on this, the terminal can render the media content in the preview stream.
- the user can view the preset media content corresponding to the target store in the preset area near the image corresponding to the target store in the mobile phone. After viewing the preset media content corresponding to the target store, the user can have a general understanding of the target store.
- Different digital maps can be set for different places, so that when the user moves to other places, the preset media content corresponding to the real-time pose can also be obtained based on the method of rendering media content provided in the embodiment of the present disclosure, and in the preview stream Render media content.
- An embodiment of the present application provides a method for determining a pose, the method including: acquiring a first image; determining first pose information according to the first image, where the first pose information represents the pose corresponding to when a terminal shoots the first image; when the first pose information satisfies the abnormal pose condition, displaying prompt information for indicating that a target object is to be photographed; wherein the target object is around the position where the terminal is located, and the target object is not in the first image; the target object is used to obtain second pose information, the second pose information represents the pose corresponding to the terminal when the target object is photographed, and the second pose information does not satisfy the abnormal pose condition.
- the target object in the scene is used for pose positioning, so that valid information in the scene is used to determine the pose information with higher precision; in addition, prompt information guiding the user to shoot the target object is displayed, which avoids the situation in which the user does not know how to operate or scans an invalid target object.
- FIG. 12 is a schematic diagram of an embodiment of a pose determination method provided by an embodiment of the present application. As shown in FIG. 12, the pose determination method provided by the present application includes:
- the server obtains first pose information, where the first pose information is determined according to a first image, and the first pose information represents the pose corresponding to when the terminal shoots the first image;
- For the specific description of step 1201, reference may be made to the description related to the server acquiring the first pose information in step 301 and step 302, which will not be repeated here.
- For the specific description of step 1202, reference may be made to the description in step 302 that the server obtains the location where the terminal is located, and details are not repeated here.
- when the first pose information satisfies the abnormal pose condition, determine a target object according to the position of the terminal, wherein the target object is around the position of the terminal, and the target object is not in the first image;
- For the specific description of step 1203, reference may be made to the description related to acquiring the information of the target object in step 303, which will not be repeated here.
- For the specific description of step 1204, reference may be made to the description related to sending the information of the target object to the terminal in step 303, which will not be repeated here.
- the server may also acquire a target image sent by the terminal, where the target image includes the target object;
- the information of the target object includes at least one of the following information: a position of the target object, an image, a name, and a category of the target object.
- the target object is an iconic object that can be completely imaged under the shooting parameters of the current terminal and has a relatively fixed physical position.
- the first image includes a first object, the first object is used to determine the first pose information, and the texture features of the target object are more recognizable than those of the first object.
- the first pose information is determined based on the first 3D point cloud information corresponding to the first object in the digital map; or,
- the second pose information is determined based on the second 3D point cloud information corresponding to the target object in the digital map, and the point cloud density of the second 3D point cloud information is higher than the point cloud density of the first 3D point cloud information.
- the abnormal pose conditions include:
- the deviation between the currently determined pose information and the correct pose information is greater than the threshold.
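- A threshold check of this kind might be sketched as follows. How the "correct" pose is obtained is not specified in this passage, so the `reference` argument (for example, a prior from other sensors), the pose layout (x, y, z, yaw in degrees), and both threshold values are hypothetical. A failed pose solution is also treated as abnormal here, in line with the statement elsewhere in the description that a normal pose is one that was solved successfully.

```python
import math


def is_abnormal_pose(estimated, reference,
                     max_position_error_m=1.0, max_yaw_error_deg=10.0):
    """Return True if the estimated pose deviates too much from the reference.

    estimated: (x, y, z, yaw_deg), or None when pose solving failed.
    reference: (x, y, z, yaw_deg) standing in for the correct pose.
    """
    if estimated is None:
        return True  # no solution at all counts as abnormal
    position_error = math.dist(estimated[:3], reference[:3])
    # Wrap the yaw difference into [-180, 180) before taking its magnitude.
    yaw_error = abs((estimated[3] - reference[3] + 180.0) % 360.0 - 180.0)
    return position_error > max_position_error_m or yaw_error > max_yaw_error_deg
```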
- the determining the target object according to the location of the terminal includes:
- the target object is determined from a digital map according to the location of the terminal, wherein the digital map includes a plurality of objects, and the target object is an object, among the plurality of objects, that is around the location of the terminal.
- the server may also obtain the first pixel position of the target object in the target image, and obtain first position information corresponding to the target object in the digital map, wherein the first position information represents the location of the target object in the digital map;
- the second pose information is determined according to the first pixel position and the first position information.
- the server may also receive the first pixel position of the target object in the target image sent by the terminal.
- the server may also receive the target image sent by the terminal;
- the first position information corresponding to the target object is determined in the digital map according to the target image.
- the server may also receive the first position information corresponding to the target object in the digital map sent by the terminal.
- the server may further acquire a 2D-3D correspondence between the first pixel position and the first position information, wherein the 2D-3D correspondence represents the correspondence between the two-dimensional coordinates of the target object in the target image and the three-dimensional coordinates in the actual space;
- the second pose information is determined according to the 2D-3D correspondence.
- the first position information includes the global pose of the photographing device when the target object was photographed in advance; correspondingly, the second pose information represents the global pose corresponding to the terminal when it photographed the target image.
- An embodiment of the present application provides a method for determining a pose.
- the method includes: acquiring first pose information, where the first pose information is determined according to a first image, and the first pose information represents the pose corresponding to when the terminal shoots the first image; obtaining the position of the terminal; when the first pose information satisfies the abnormal pose condition, determining the target object according to the position of the terminal, wherein the target object is around the position where the terminal is located, and the target object is not in the first image; and sending the information of the target object to the terminal, where the target object is used to obtain the second pose information, the second pose information represents the pose corresponding to when the terminal shoots the target object, and the second pose information does not satisfy the abnormal pose condition.
- the target object in the scene is used to locate the pose, and the valid information in the scene is used to realize the confirmation of the pose information.
- FIG. 13 is a schematic structural diagram of a pose determination apparatus provided by an embodiment of the application. As shown in FIG. 13, the pose determination apparatus 1300 includes:
- a pose determination module 1302 configured to determine first pose information according to the first image, where the first pose information represents the pose corresponding to when the terminal shoots the first image;
- For the specific description of the pose determination module 1302, reference may be made to the description in the embodiment corresponding to step 302, which will not be repeated here.
- the display module 1303 is configured to display prompt information for indicating that a target object is to be photographed when the first pose information meets the abnormal pose condition; wherein the target object is around the position where the terminal is located, and the target object is not in the first image; the target object is used to obtain second pose information, the second pose information represents the pose corresponding to the terminal when the target object is photographed, and the second pose information does not satisfy the abnormal pose condition.
- For the specific description of the display module 1303, reference may be made to the description in the embodiment corresponding to step 304, and details are not repeated here.
- the acquiring module 1301 is configured to acquire a target image captured by a user according to the prompt information, where the target image includes the target object;
- the target object being around the position of the terminal includes: the target object and the position of the terminal are within a preset distance range, the target object and the position of the terminal are within the map of the same area, and there are no other obstacles between the target object and the position of the terminal.
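- The three conditions can be sketched as a filter over the objects of a digital map. The `MapObject` fields, the precomputed `occluded` flag (in practice occlusion would depend on the terminal position), the default distance, and the choice of the nearest remaining candidate are all assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class MapObject:
    name: str
    position: tuple  # (x, y) in map coordinates, metres
    area_id: str     # identifier of the area map the object belongs to
    occluded: bool   # hypothetical flag: True if an obstacle blocks the view


def select_target_object(objects, terminal_position, terminal_area_id,
                         max_distance_m=50.0):
    """Return the nearest object satisfying all three 'around' conditions, or None."""
    candidates = [
        obj for obj in objects
        if obj.area_id == terminal_area_id            # same area map
        and not obj.occluded                          # no obstacle in between
        and math.dist(obj.position, terminal_position) <= max_distance_m
    ]
    return min(
        candidates,
        key=lambda obj: math.dist(obj.position, terminal_position),
        default=None,
    )
```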
- the obtaining module 1301 is used for:
- the device also includes:
- a sending module configured to send the location of the terminal to the server
- a receiving module configured to receive the information of the target object sent by the server, wherein the target object is determined by the server based on the location of the terminal.
- the obtaining module 1301 is used for:
- the information of the target object is obtained from a digital map, wherein the digital map includes multiple objects, and the target object is one of the multiple objects where the terminal is located. objects around the location.
- the information of the target object includes at least one of the following information: the position of the target object, and the image, name and category of the target object; correspondingly, the prompt information includes at least one of the following information: the position of the target object, the navigation information from the position where the terminal is located to the position of the target object, and the image, name and category of the target object.
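- As a rough sketch of how prompt information could be assembled from the information of the target object, assuming a plain dictionary layout; the key names and the `build_prompt` function are hypothetical, and the navigation entry only records the two endpoints rather than an actual route.

```python
def build_prompt(target_info, terminal_position=None):
    """Assemble prompt information from whichever target fields are available.

    target_info: dict with any of 'position', 'image', 'name', 'category'.
    terminal_position: optional position of the terminal, used for navigation.
    """
    prompt = {
        key: target_info[key]
        for key in ("position", "image", "name", "category")
        if key in target_info
    }
    if terminal_position is not None and "position" in target_info:
        prompt["navigation"] = {
            "from": terminal_position,
            "to": target_info["position"],
        }
    if not prompt:
        raise ValueError("prompt needs at least one kind of information")
    return prompt
```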
- the target object is an iconic object that can be completely imaged under the shooting parameters of the current terminal and has a relatively fixed physical position.
- the first image includes a first object, the first object is used to determine the first pose information, and the texture features of the target object are more recognizable than those of the first object.
- the sending module is configured to send the first pose information to the server;
- the acquiring module is configured to receive first information sent by the server, where the first information is used to indicate that the first pose information satisfies the abnormal pose condition;
- the display module is configured to display prompt information for indicating the shooting target object according to the first information.
- the abnormal pose conditions include:
- the deviation between the currently determined pose information and the correct pose information is greater than the threshold.
- FIG. 14 is a schematic structural diagram of a pose determination apparatus provided by an embodiment of the application. As shown in FIG. 14, the pose determination apparatus 1400 includes:
- the obtaining module 1401 is configured to obtain the first pose information, the first pose information is determined according to the first image, and the first pose information represents the pose corresponding to when the terminal shoots the first image; obtain the location of the terminal;
- the target object determination module 1402 is configured to determine a target object according to the position of the terminal when the first pose information satisfies the abnormal pose condition, wherein the target object is around the position of the terminal, and the target object is not in the first image;
- For the specific description of the target object determination module 1402, reference may be made to the description in the embodiment corresponding to step 1203, which will not be repeated here.
- the sending module 1403 is configured to send the information of the target object to the terminal, where the target object is used to obtain second pose information, the second pose information represents the pose corresponding to when the terminal shoots the target object, and the second pose information does not satisfy the abnormal pose condition.
- the obtaining module 1401 is used for:
- the information of the target object includes at least one of the following information: a position of the target object, an image, a name, and a category of the target object.
- the target object is an iconic object that can be completely imaged under the shooting parameters of the current terminal and has a relatively fixed physical position.
- the first image includes a first object, the first object is used to determine the first pose information, and the texture features of the target object are more recognizable than those of the first object.
- the first pose information is determined based on the first 3D point cloud information corresponding to the first object in the digital map; or,
- the second pose information is determined based on the second 3D point cloud information corresponding to the target object in the digital map, and the point cloud density of the second 3D point cloud information is higher than the point cloud density of the first 3D point cloud information.
- the abnormal pose conditions include:
- the target object determination module is configured to determine the target object from a digital map according to the location of the terminal, wherein the digital map includes a plurality of objects, and the target object is an object, among the plurality of objects, that is around the position where the terminal is located.
- the obtaining module 1401 is used for:
- acquiring the first pixel position of the target object in the target image, and acquiring first position information corresponding to the target object in the digital map, wherein the first position information represents the position of the target object in the digital map;
- the second pose information is determined according to the first pixel position and the first position information.
- the obtaining module 1401 is specifically used for:
- the first pixel position of the target object in the target image sent by the terminal is received.
- the obtaining module 1401 is specifically used for:
- the first position information corresponding to the target object is determined in the digital map according to the target image.
- the obtaining module 1401 is specifically used for:
- the first position information corresponding to the target object in the digital map sent by the terminal is received.
- the obtaining module is specifically used for:
- the second pose information is determined according to the 2D-3D correspondence.
- the first position information includes the global pose corresponding to when the shooting device shot the target object to obtain the first image; correspondingly, the second pose information represents the global pose corresponding to when the terminal shot the target image.
- FIG. 15 is a schematic structural diagram of the terminal device provided by the embodiment of the present application.
- the device 1500 may specifically be represented as a virtual reality VR device, a mobile phone, a tablet, a notebook computer, a smart wearable device, etc., which is not limited here.
- the terminal device 1500 includes: a receiver 1501, a transmitter 1502, a processor 1503, and a memory 1504 (wherein the number of processors 1503 in the terminal device 1500 may be one or more, and one processor is taken as an example in FIG. 15 ) , wherein the processor 1503 may include an application processor 15031 and a communication processor 15032.
- the receiver 1501, the transmitter 1502, the processor 1503, and the memory 1504 may be connected by a bus or otherwise.
- Memory 1504 may include read-only memory and random access memory, and provides instructions and data to processor 1503 .
- a portion of memory 1504 may also include non-volatile random access memory (NVRAM).
- the memory 1504 stores processors and operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, wherein the operating instructions may include various operating instructions for implementing various operations.
- the processor 1503 controls the operation of the terminal device.
- various components of the terminal device are coupled together through a bus system, where the bus system may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus.
- the various buses are referred to as bus systems in the figures.
- the methods disclosed in the above embodiments of the present application may be applied to the processor 1503 or implemented by the processor 1503 .
- the processor 1503 may be an integrated circuit chip, which has signal processing capability. In the implementation process, each step of the above-mentioned method can be completed by an integrated logic circuit of hardware in the processor 1503 or an instruction in the form of software.
- the above-mentioned processor 1503 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, and may further include an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
- the processor 1503 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application.
- a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
- the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
- the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
- the storage medium is located in the memory 1504, and the processor 1503 reads the information in the memory 1504, and completes the steps of the above method in combination with its hardware. Specifically, the processor 1503 can read the information in the memory 1504, and complete the steps related to data processing in steps 301 to 303 in the above embodiment in combination with its hardware.
- the receiver 1501 can be used to receive input digital or character information, and generate signal input related to the related settings and function control of the terminal device.
- the transmitter 1502 can be used to output digital or character information through the first interface; the transmitter 1502 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the transmitter 1502 can also include a display device such as a display screen.
- FIG. 16 is a schematic structural diagram of the server provided by the embodiment of the present application.
- the server 1600 may vary considerably in configuration or performance, and may include one or more central processing units (CPUs) 1616 (e.g., one or more processors), memory 1632, and one or more storage media 1630 (e.g., one or more mass storage devices) that store application programs 1642 or data 1644.
- the memory 1632 and the storage medium 1630 may be short-term storage or persistent storage.
- the program stored in the storage medium 1630 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server.
- the central processing unit 1616 may be configured to communicate with the storage medium 1630 and to execute, on the server 1600, the series of instruction operations stored in the storage medium 1630.
- Server 1600 may also include one or more power supplies 1626, one or more wired or wireless network interfaces 1650, one or more input and output interfaces 1658, and/or one or more operating systems 1641, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
- the central processing unit 1616 can complete the steps related to data processing in steps 1201 to 1204 in the above embodiment.
- the embodiments of the present application also provide a computer program product that, when running on a computer, causes the computer to execute the steps of the pose determination method.
- Embodiments of the present application further provide a computer-readable storage medium that stores a program for performing signal processing; when the program runs on a computer, it causes the computer to execute the methods described in the foregoing embodiments.
- the device embodiments described above are only schematic: the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
- the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
- storage media such as a USB flash drive (U disk), removable hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk
- a computer device which may be a personal computer, server, or network device, etc.
- the computer program product includes one or more computer instructions.
- the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
- the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave).
- the computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media.
- the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVD), or semiconductor media (eg, Solid State Disk (SSD)), and the like.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Signal Processing (AREA)
- Image Analysis (AREA)
- Telephone Function (AREA)
Abstract
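Embodiments of this application provide a pose determination method, including: obtaining a first image, and when first pose information determined from the first image satisfies a pose anomaly condition, displaying prompt information that instructs the user to photograph a target object; second pose information that does not satisfy the pose anomaly condition can then be obtained from the target object in the target image that the user captures according to the prompt. When high-precision pose information cannot be determined, this application uses a target object in the scene to achieve pose localization. In addition, while the pose information of the terminal device is being determined, prompt information guiding the user to photograph the target object is displayed, which avoids situations in which the user does not know how to proceed or scans an invalid target object.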
Description
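This application claims priority to Chinese Patent Application No. 202110134812.5, filed with the Chinese Patent Office on January 30, 2021 and entitled "一种位姿确定方法以及相关设备" (Pose determination method and related device), which is incorporated herein by reference in its entirety.
This application relates to the field of image processing, and in particular to a pose determination method and a related device.
The problem that visual localization aims to solve is how to use images or video captured by a camera to accurately determine the camera's position and orientation in the real world. Visual localization has been a hot and very challenging topic in computer vision in recent years, and is of great significance in many fields such as augmented reality, interactive virtual reality, robot visual navigation, public scene surveillance, and intelligent transportation. Current localization algorithms mainly rely on global visual features for image retrieval to determine candidate frames, perform feature matching based on local visual features to establish correspondences between 2D image keypoints and a 3D point cloud, and then accurately estimate the camera pose.
Existing visual localization solutions in the industry mainly rely on visual features for image search and localization. However, solutions based on visual features perform poorly in some scenes. For example, in an indoor museum, lighting is poor and effective feature points cannot be extracted from most of the image; in an outdoor park, the field of view is open and most of the image is occupied by vegetation, so the extracted feature points cannot serve as effective matching points. The scenarios to which visual localization applies are therefore limited.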
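Summary
The pose determination method provided by this application achieves high-precision pose localization by using a target object in the scene when high-precision pose information cannot otherwise be determined.
According to a first aspect, this application provides a pose determination method, the method including:
obtaining a first image;
determining first pose information from the first image, where the first pose information represents the pose of a terminal when it captured the first image; and
when the first pose information satisfies a pose anomaly condition, displaying prompt information that instructs photographing a target object, where the target object is around the location of the terminal and is not in the first image; the target object is used to obtain second pose information, the second pose information represents the pose of the terminal when it photographs the target object, and the second pose information does not satisfy the pose anomaly condition.
In a possible implementation, the method further includes:
obtaining a target image captured by the user according to the prompt information, the target image including the target object; and
obtaining the second pose information from the target image.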
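In a possible implementation, that the target object is around the location of the terminal includes: the target object is within a preset distance of the location of the terminal; the target object and the location of the terminal are within the map of the same area; or there is no other obstacle between the target object and the location of the terminal.
In a possible implementation, before the displaying of the prompt information that instructs photographing the target object, the method further includes:
obtaining the location of the terminal;
sending the location of the terminal to a server; and
receiving information about the target object from the server, where the target object is determined by the server based on the location of the terminal.
In a possible implementation, before the displaying of the prompt information that instructs photographing the target object, the method further includes:
obtaining the location of the terminal; and
determining, from a digital map and according to the location of the terminal, the target object that satisfies a preset condition, where the digital map includes a plurality of objects, the plurality of objects are objects around the location of the terminal, and the preset condition includes at least one of the following:
at least one object, among the plurality of objects, that is closer to the location of the terminal;
at least one object determined at random from the plurality of objects;
at least one object, among the plurality of objects, with no other obstacle between it and the location of the terminal;
at least one object, among the plurality of objects, that requires a shorter travel distance for the terminal to reach from its location.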
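In a possible implementation, the information about the target object includes at least one of the following: the location of the target object, or an image, name, or category of the target object; correspondingly, the prompt information includes at least one of the following: the location of the target object, navigation information from the location of the terminal to the location of the target object, or an image, name, or category of the target object.
In a possible implementation, the target object is a landmark object that can be imaged in full under the current shooting parameters of the terminal and whose physical position is relatively fixed.
In a possible implementation, the first image includes a first object, the first object is used to determine the first pose information, and the texture features of the target object are more distinctive than those of the first object.
In a possible implementation, the displaying, when the first pose information satisfies the pose anomaly condition, of prompt information that instructs photographing the target object includes:
sending the first pose information to the server; receiving, from the server, first information indicating that the first pose information satisfies the pose anomaly condition; and displaying, according to the first information, the prompt information that instructs photographing the target object.
In a possible implementation, the pose anomaly condition includes:
pose information cannot be obtained; or
the deviation between the currently determined pose information and the correct pose information is greater than a threshold.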
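According to a second aspect, this application provides a pose determination method, the method including:
obtaining first pose information, where the first pose information is determined from a first image and represents the pose of a terminal when it captured the first image;
obtaining the location of the terminal;
when the first pose information satisfies a pose anomaly condition, determining a target object according to the location of the terminal, where the target object is around the location of the terminal and is not in the first image; and
sending information about the target object to the terminal, where the target object is used to obtain second pose information, the second pose information represents the pose of the terminal when it photographs the target object, and the second pose information does not satisfy the pose anomaly condition.
In a possible implementation, the method further includes:
obtaining a target image sent by the terminal, the target image including the target object; and
obtaining the second pose information from the target image, and sending the second pose information to the terminal.
In a possible implementation, the information about the target object includes at least one of the following: the location of the target object, or an image, name, or category of the target object.
In a possible implementation, the target object is a landmark object that can be imaged in full under the current shooting parameters of the terminal and whose physical position is relatively fixed.
In a possible implementation, the first image includes a first object, the first object is used to determine the first pose information, and the texture features of the target object are more distinctive than those of the first object.
In a possible implementation, the first pose information is determined based on first 3D point cloud information corresponding to the first object in a digital map; or
the second pose information is determined based on second 3D point cloud information corresponding to the target object in the digital map, and the point cloud density of the second 3D point cloud information is higher than that of the first 3D point cloud information.
In a possible implementation, the pose anomaly condition includes:
pose information cannot be obtained; or
the deviation between the currently determined pose information and the correct pose information is greater than a threshold.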
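In a possible implementation, the determining of the target object according to the location of the terminal includes:
determining, from a digital map and according to the location of the terminal, the target object that satisfies a preset condition, where the digital map includes a plurality of objects, the plurality of objects are objects around the location of the terminal, and the preset condition includes at least one of the following:
at least one object, among the plurality of objects, that is closer to the location of the terminal;
at least one object determined at random from the plurality of objects;
at least one object, among the plurality of objects, with no other obstacle between it and the location of the terminal;
at least one object, among the plurality of objects, that requires a shorter travel distance for the terminal to reach from its location.
In a possible implementation, the obtaining of a first pixel position of the target object in the target image includes:
receiving, from the terminal, the first pixel position of the target object in the target image.
In a possible implementation, the obtaining of first position information corresponding to the target object in the digital map includes:
receiving the target image sent by the terminal; and
determining, in the digital map and according to the target image, the first position information corresponding to the target object.
In a possible implementation, the obtaining of first position information corresponding to the target object in the digital map includes:
receiving, from the terminal, the first position information corresponding to the target object in the digital map.
In a possible implementation, the determining of the second pose information according to the first pixel position and the first position information includes:
obtaining a 2D-3D correspondence between the first pixel position and the first position information, where the 2D-3D correspondence represents the correspondence between the two-dimensional coordinates of the target object in the target image and its three-dimensional coordinates in real space; and
determining the second pose information according to the 2D-3D correspondence.
In a possible implementation, the first position information includes the global pose of the capturing device when the target object was photographed in advance; correspondingly, the second pose information represents the global pose of the terminal when it captured the target image.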
第三方面,本申请提供了一种位姿确定装置,所述装置包括:
获取模块,用于获取第一图像;
位姿确定模块,用于根据所述第一图像确定第一位姿信息,所述第一位姿信息表示终端拍摄所述第一图像时所对应的位姿;
显示模块,用于当所述第一位姿信息满足位姿异常条件时,显示用于指示拍摄目标物体的提示信息;其中,所述目标物体在所述终端所处的位置周围,且所述目标物体不在所述第一图像中;所述目标物体用于获得第二位姿信息,所述第二位姿信息表示终端拍摄所 述目标物体时所对应的位姿,且所述第二位姿信息不满足所述位姿异常条件。
在一种可能的实现中,所述获取模块,用于:
获取用户根据所述提示信息拍摄得到的目标图像,所述目标图像包括所述目标物体;
根据所述目标图像,获取所述第二位姿信息。
在一种可能的实现中,所述目标物体在所述终端所处的位置周围,包括:所述目标物体与所述终端所处的位置在预设距离范围内、所述目标物体与所述终端所处的位置在同一区域的地图内、所述目标物体与所述终端所处的位置之间没有其他障碍物。
在一种可能的实现中,所述获取模块,用于:
获取所述终端所处的位置;
所述装置还包括:
发送模块,用于向服务器发送所述终端所处的位置;
接收模块,用于接收所述服务器发送的所述目标物体的信息,其中,所述目标物体为所述服务器基于所述终端所处的位置确定的。
在一种可能的实现中,所述获取模块,用于:
获取所述终端所处的位置;
根据所述终端所处的位置,从数字地图中确定满足预设条件的所述目标物体,其中,所述数字地图包括多个物体,所述多个物体为在所述终端所处的位置周围的物体,所述预设条件包括如下的至少一个:
所述多个物体中距离所述终端所处的位置更近的至少一个物体;
所述多个物体中随机确定的至少一个物体;
所述多个物体中与所述终端所处的位置之间没有其他障碍物的至少一个物体;
所述终端从所处的位置移动至所述多个物体中所需移动距离更少的至少一个物体。
在一种可能的实现中,所述目标物体的信息包括如下信息的至少一种:所述目标物体的位置、所述目标物体的图像、名称以及类别;相应的,所述提示信息,包括如下信息的至少一种:所述目标物体的位置、由所述终端所处的位置至所述目标物体的位置的导航信息、所述目标物体的图像、名称以及类别。
在一种可能的实现中,所述目标物体为在当前终端的拍摄参数下能够完整成像,且物理位置相对固定的标志性物体。
在一种可能的实现中,所述第一图像包括第一物体,所述第一物体用于确定所述第一位姿信息,且所述目标物体的纹理特征比所述第一物体的纹理特征具有更高的辨识度。
在一种可能的实现中,所述发送模块,用于向所述服务器发送所述第一位姿信息;所述获取模块,用于接收所述服务器发送的用于指示所述第一位姿信息满足位姿异常条件的第一信息;所述显示模块,用于根据所述第一信息,显示用于指示拍摄目标物体的提示信息。
在一种可能的实现中,所述位姿异常条件,包括:
无法获取到位姿信息;或,
当前确定出的位姿信息与正确位姿信息之间的偏差大于阈值。
第四方面,本申请提供了一种位姿确定装置,所述装置包括:
获取模块,用于获取第一位姿信息,所述第一位姿信息为根据第一图像确定的,所述第一位姿信息表示终端拍摄所述第一图像时所对应的位姿;
获取所述终端所处的位置;
目标物体确定模块,用于当所述第一位姿信息满足位姿异常条件时,根据所述终端所处的位置确定目标物体,其中,所述目标物体在所述终端所处的位置周围,且所述目标物体不在所述第一图像中;
发送模块,用于向所述终端发送所述目标物体的信息,所述目标物体用于获得第二位姿信息,所述第二位姿信息表示终端拍摄所述目标物体时所对应的位姿,且所述第二位姿信息不满足所述位姿异常条件。
在一种可能的实现中,所述获取模块,用于:
获取终端发送的目标图像,所述目标图像包括所述目标物体;
根据所述目标图像,获取所述第二位姿信息,并向所述终端发送所述第二位姿信息。
在一种可能的实现中,所述目标物体的信息包括如下信息的至少一种:所述目标物体的位置、所述目标物体的图像、名称以及类别。
在一种可能的实现中,所述目标物体为在当前终端的拍摄参数下能够完整成像,且物理位置相对固定的标志性物体。
在一种可能的实现中,所述第一图像包括第一物体,所述第一物体用于确定所述第一位姿信息,且所述目标物体的纹理特征比所述第一物体的纹理特征具有更高的辨识度。
在一种可能的实现中,所述第一位姿信息为基于所述第一物体在数字地图中对应的第一3D点云信息确定的;或,
所述第二位姿信息为基于所述目标物体在数字地图中对应的第二3D点云信息确定的,且所述第二3D点云信息的点云密度高于所述第一3D点云信息的点云密度。
在一种可能的实现中,所述位姿异常条件,包括:
无法获取到位姿信息;或,
当前确定出的位姿信息与正确位姿信息之间的偏差大于阈值。
在一种可能的实现中,所述目标物体确定模块,用于根据所述终端所处的位置,从数字地图中确定满足预设条件的所述目标物体,其中,所述数字地图包括多个物体,所述多个物体为在所述终端所处的位置周围的物体,所述预设条件包括如下的至少一个:
所述多个物体中距离所述终端所处的位置更近的至少一个物体;
所述多个物体中随机确定的至少一个物体;
所述多个物体中与所述终端所处的位置之间没有其他障碍物的至少一个物体;
所述终端从所处的位置移动至所述多个物体中所需移动距离更少的至少一个物体。
在一种可能的实现中,所述获取模块,用于:
获取所述目标物体在所述目标图像中的第一像素位置;
获取所述目标物体在数字地图中对应的第一位置信息,其中,所述第一位置信息表示所述目标物体在所述数字地图中的位置;
根据所述第一像素位置以及所述第一位置信息确定第二位姿信息。
在一种可能的实现中,所述获取模块,具体用于:
接收所述终端发送的所述目标物体在目标图像中的第一像素位置。
在一种可能的实现中,所述获取模块,具体用于:
接收所述终端发送的目标图像;
根据所述目标图像在数字地图中确定所述目标物体对应的第一位置信息。
在一种可能的实现中,所述获取模块,具体用于:
接收所述终端发送的所述目标物体在数字地图中对应的第一位置信息。
在一种可能的实现中,所述获取模块,具体用于:
获取所述第一像素位置以及所述第一位置信息的2D-3D对应关系,其中,所述2D-3D对应关系表示所述目标对象在所述目标图像中的二维坐标与在实际空间中的三维坐标的对应关系;
根据所述2D-3D对应关系,确定所述第二位姿信息。
在一种可能的实现中,所述第一位置信息包括拍摄设备拍摄所述目标对象得到第一图像时所对应的全局位姿;相应的,所述第二位姿信息表示终端拍摄所述目标图像时所对应 的全局位姿。
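According to a fifth aspect, this application provides a pose determination apparatus, including: a display; a camera; one or more processors; a memory; a plurality of application programs; and one or more computer programs. The one or more computer programs are stored in the memory and include instructions. When the instructions are executed by the pose determination apparatus, the apparatus performs the steps of the first aspect or of any possible implementation of the first aspect.
According to a sixth aspect, this application provides a server, including: one or more processors; a memory; a plurality of application programs; and one or more computer programs. The one or more computer programs are stored in the memory and include instructions. When the instructions are executed by the one or more processors, the one or more processors perform the steps of the second aspect or of any possible implementation of the second aspect.
According to a seventh aspect, this application provides a computer storage medium including computer instructions. When the computer instructions run on an electronic device or a server, the steps of the first aspect, any possible implementation of the first aspect, the second aspect, or any possible implementation of the second aspect are performed.
According to a ninth aspect, this application provides a computer program product. When the computer program product runs on an electronic device or a server, the steps of the first aspect, any possible implementation of the first aspect, the second aspect, or any possible implementation of the second aspect are performed.
Embodiments of this application provide a pose determination method, the method including: obtaining a first image; determining first pose information from the first image, where the first pose information represents the pose of a terminal when it captured the first image; and when the first pose information satisfies a pose anomaly condition, displaying prompt information that instructs photographing a target object, where the target object is around the location of the terminal and is not in the first image, the target object is used to obtain second pose information, the second pose information represents the pose of the terminal when it photographs the target object, and the second pose information does not satisfy the pose anomaly condition. In this way, on the one hand, when high-precision pose information cannot be determined, a target object in the scene is used for pose localization, so that valid information in the scene is exploited to determine pose information with higher precision. On the other hand, while the pose information of the terminal device is being determined, prompt information guiding the user to photograph the target object is displayed, which avoids situations in which the user does not know how to proceed or scans an invalid target object.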
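FIG. 1 is a schematic structural diagram of a terminal device according to an embodiment of this application;
FIG. 2a is a block diagram of the software structure of a terminal device according to an embodiment of this application;
FIG. 2b is a block diagram of the structure of a server according to an embodiment of this application;
FIG. 2c is a block diagram of the structure of a pose determination system according to an embodiment of this application;
FIG. 3 is a schematic diagram of an embodiment of a pose determination method according to an embodiment of this application;
FIG. 4a is a schematic diagram of a terminal interface in an embodiment of this application;
FIG. 4b is a schematic diagram of a terminal interface in an embodiment of this application;
FIG. 5 is a schematic diagram of a terminal interface in an embodiment of this application;
FIG. 6 is a schematic diagram of a terminal interface in an embodiment of this application;
FIG. 7 is a schematic diagram of a terminal interface in an embodiment of this application;
FIG. 8a is a schematic diagram of a terminal interface in an embodiment of this application;
FIG. 8b is a schematic diagram of a terminal interface in an embodiment of this application;
FIG. 9a is a schematic diagram of a pose determination method according to an embodiment of this application;
FIG. 9b is a schematic diagram of offline data collection according to an embodiment of this application;
FIG. 9c is a schematic diagram of a pose determination method according to an embodiment of this application;
FIG. 10 is a schematic diagram of a terminal interface in an embodiment of this application;
FIG. 11 is a schematic diagram of a terminal interface in an embodiment of this application;
FIG. 12 is a schematic diagram of a pose determination method according to an embodiment of this application;
FIG. 13 is a schematic structural diagram of a pose determination apparatus according to an embodiment of this application;
FIG. 14 is a schematic structural diagram of a pose determination apparatus according to an embodiment of this application;
FIG. 15 is a schematic structural diagram of a terminal device according to an embodiment of this application;
FIG. 16 is a schematic structural diagram of a server according to an embodiment of this application.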
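The embodiments of the present invention are described below with reference to the accompanying drawings. The terms used in the description of the embodiments are intended only to explain specific embodiments of the present invention and are not intended to limit it.
The embodiments of this application are described below with reference to the accompanying drawings. A person of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided by the embodiments of this application are equally applicable to similar technical problems.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish between similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; this is merely the manner of distinguishing objects with the same attributes when describing the embodiments of this application. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device that includes a series of units is not necessarily limited to those units, but may include other units that are not expressly listed or that are inherent to such a process, method, product, or device.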
为便于理解,下面将对本申请实施例提供的终端100的结构进行示例说明。参见图1,图1是本申请实施例提供的终端设备的结构示意图。
如图1所示,终端100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器 180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。
可以理解的是,本发明实施例示意的结构并不构成对终端100的具体限定。在本申请另一些实施例中,终端100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。
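The I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may contain multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, a charger, a flash, the camera 193, and so on through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate over the I2C bus interface to implement the touch function of the terminal 100.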
I2S接口可以用于音频通信。在一些实施例中,处理器110可以包含多组I2S总线。处理器110可以通过I2S总线与音频模块170耦合,实现处理器110与音频模块170之间的通信。在一些实施例中,音频模块170可以通过I2S接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。
PCM接口也可以用于音频通信,将模拟信号抽样,量化和编码。在一些实施例中,音频模块170与无线通信模块160可以通过PCM总线接口耦合。在一些实施例中,音频模块170也可以通过PCM接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。所述I2S接口和所述PCM接口都可以用于音频通信。
UART接口是一种通用串行数据总线,用于异步通信。该总线可以为双向通信总线。它将要传输的数据在串行通信与并行通信之间转换。在一些实施例中,UART接口通常被用于连接处理器110与无线通信模块160。例如:处理器110通过UART接口与无线通信模块160中的蓝牙模块通信,实现蓝牙功能。在一些实施例中,音频模块170可以通过UART接口向无线通信模块160传递音频信号,实现通过蓝牙耳机播放音乐的功能。
MIPI接口可以被用于连接处理器110与显示屏194,摄像头193等外围器件。MIPI接口包括摄像头串行接口(camera serial interface,CSI),显示屏串行接口(display serial interface,DSI)等。在一些实施例中,处理器110和摄像头193通过CSI接口通信,实现终端100的拍摄功能。处理器110和显示屏194通过DSI接口通信,实现终端100的显示功能。
GPIO接口可以通过软件配置。GPIO接口可以被配置为控制信号,也可被配置为数据信号。在一些实施例中,GPIO接口可以用于连接处理器110与摄像头193,显示屏194,无线通信模块160,音频模块170,传感器模块180等。GPIO接口还可以被配置为I2C接口,I2S接口,UART接口,MIPI接口等。
USB接口130是符合USB标准规范的接口,具体可以是Mini USB接口,Micro USB接口,USB Type C接口等。USB接口130可以用于连接充电器为终端100充电,也可以用于终端100与外围设备之间传输数据。也可以用于连接耳机,通过耳机播放音频。该接口还可以用于连接其他电子设备,例如AR设备等。
可以理解的是,本发明实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对终端100的结构限定。在本申请另一些实施例中,终端100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块140可以通过USB接口130接收有线充电器的充电输入。在一些无线充电的实施例中,充电管理模块140可以通过终端100的无线充电线圈接收无线充电输入。充电管理模块140为电池142充电的同时,还可以通过电源管理模块141为电子设备供电。
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,显示屏194,摄像头193,和无线通信模块160等供电。电源管理模块141还可以用于监测电池容量,电池循环次数,电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块141也可以设置于处理器110中。在另一些实施例中,电源管理模块141和充电管理模块140也可以设置于同一个器件中。
终端100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。终端100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块150可以提供应用在终端100上的包括2G/3G/4G/5G等无线通信的解决 方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器170A,受话器170B等)输出声音信号,或通过显示屏194显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器110,与移动通信模块150或其他功能模块设置在同一个器件中。
无线通信模块160可以提供应用在终端100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
在一些实施例中,终端100的天线1和移动通信模块150耦合,天线2和无线通信模块160耦合,使得终端100可以通过无线通信技术与网络以及其他设备通信。所述无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。所述GNSS可以包括全球卫星定位系统(global positioning system,GPS),全球导航卫星系统(global navigation satellite system,GLONASS),北斗卫星导航系统(beidou navigation satellite system,BDS),准天顶卫星系统(quasi-zenith satellite system,QZSS)和/或星基增强系统(satellite based augmentation systems,SBAS)。
终端100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏194用于显示图像,视频等。显示屏194包括显示面板。显示面板可以采用液 晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode的,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,终端100可以包括1个或N个显示屏194,N为大于1的正整数。
终端100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,终端100可以包括1个或N个摄像头193,N为大于1的正整数。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当终端100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。终端100可以支持一种或多种视频编解码器。这样,终端100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现终端100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展终端100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。
内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储终端100使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。处理器110通过运行存储在内 部存储器121的指令,和/或存储在设置于处理器中的存储器的指令,执行终端100的各种功能应用以及数据处理。
终端100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。终端100可以通过扬声器170A收听音乐,或收听免提通话。
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当终端100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。终端100可以设置至少一个麦克风170C。在另一些实施例中,终端100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,终端100还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
压力传感器180A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器180A可以设置于显示屏194。压力传感器180A的种类很多,如电阻式压力传感器,电感式压力传感器,电容式压力传感器等。电容式压力传感器可以是包括至少两个具有导电材料的平行板。当有力作用于压力传感器180A,电极之间的电容改变。终端100根据电容的变化确定压力的强度。当有触摸操作作用于显示屏194,终端100根据压力传感器180A检测所述触摸操作强度。终端100也可以根据压力传感器180A的检测信号计算触摸的位置。在一些实施例中,作用于相同触摸位置,但不同触摸操作强度的触摸操作,可以对应不同的操作指令。例如:当有触摸操作强度小于第一压力阈值的触摸操作作用于短消息应用图标时,执行查看短消息的指令。当有触摸操作强度大于或等于第一压力阈值的触摸操作作用于短消息应用图标时,执行新建短消息的指令。
陀螺仪传感器180B可以用于确定终端100的运动姿态。在一些实施例中,可以通过陀螺仪传感器180B确定终端100围绕三个轴(即,x,y和z轴)的角速度。陀螺仪传感器180B可以用于拍摄防抖。示例性的,当按下快门,陀螺仪传感器180B检测终端100抖动的角度,根据角度计算出镜头模组需要补偿的距离,让镜头通过反向运动抵消终端100的抖动,实现防抖。陀螺仪传感器180B还可以用于导航,体感游戏场景。
气压传感器180C用于测量气压。在一些实施例中,终端100通过气压传感器180C测 得的气压值计算海拔高度,辅助定位和导航。
磁传感器180D包括霍尔传感器。终端100可以利用磁传感器180D检测翻盖皮套的开合。在一些实施例中,当终端100是翻盖机时,终端100可以根据磁传感器180D检测翻盖的开合。进而根据检测到的皮套的开合状态或翻盖的开合状态,设置翻盖自动解锁等特性。
加速度传感器180E可检测终端100在各个方向上(一般为三轴)加速度的大小。当终端100静止时可检测出重力的大小及方向。还可以用于识别电子设备姿态,应用于横竖屏切换,计步器等应用。
距离传感器180F,用于测量距离。终端100可以通过红外或激光测量距离。在一些实施例中,拍摄场景,终端100可以利用距离传感器180F测距以实现快速对焦。
接近光传感器180G可以包括例如发光二极管(LED)和光检测器,例如光电二极管。发光二极管可以是红外发光二极管。终端100通过发光二极管向外发射红外光。终端100使用光电二极管检测来自附近物体的红外反射光。当检测到充分的反射光时,可以确定终端100附近有物体。当检测到不充分的反射光时,终端100可以确定终端100附近没有物体。终端100可以利用接近光传感器180G检测用户手持终端100贴近耳朵通话,以便自动熄灭屏幕达到省电的目的。接近光传感器180G也可用于皮套模式,口袋模式自动解锁与锁屏。
环境光传感器180L用于感知环境光亮度。终端100可以根据感知的环境光亮度自适应调节显示屏194亮度。环境光传感器180L也可用于拍照时自动调节白平衡。环境光传感器180L还可以与接近光传感器180G配合,检测终端100是否在口袋里,以防误触。
指纹传感器180H用于采集指纹。终端100可以利用采集的指纹特性实现指纹解锁,访问应用锁,指纹拍照,指纹接听来电等。
温度传感器180J用于检测温度。在一些实施例中,终端100利用温度传感器180J检测的温度,执行温度处理策略。例如,当温度传感器180J上报的温度超过阈值,终端100执行降低位于温度传感器180J附近的处理器的性能,以便降低功耗实施热保护。在另一些实施例中,当温度低于另一阈值时,终端100对电池142加热,以避免低温导致终端100异常关机。在其他一些实施例中,当温度低于又一阈值时,终端100对电池142的输出电压执行升压,以避免低温导致的异常关机。
触摸传感器180K,也称“触控器件”。触摸传感器180K可以设置于显示屏194,由触摸传感器180K与显示屏194组成触摸屏,也称“触控屏”。触摸传感器180K用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏194提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器180K也可以设置于终端100的表面,与显示屏194所处的位置不同。
骨传导传感器180M可以获取振动信号。在一些实施例中,骨传导传感器180M可以获取人体声部振动骨块的振动信号。骨传导传感器180M也可以接触人体脉搏,接收血压跳动信号。在一些实施例中,骨传导传感器180M也可以设置于耳机中,结合成骨传导耳机。音频模块170可以基于所述骨传导传感器180M获取的声部振动骨块的振动信号,解 析出语音信号,实现语音功能。应用处理器可以基于所述骨传导传感器180M获取的血压跳动信号解析心率信息,实现心率检测功能。
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。终端100可以接收按键输入,产生与终端100的用户设置以及功能控制有关的键信号输入。
马达191可以产生振动提示。马达191可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏194不同区域的触摸操作,马达191也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。
指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。
SIM卡接口195用于连接SIM卡。SIM卡可以通过插入SIM卡接口195,或从SIM卡接口195拔出,实现和终端100的接触和分离。终端100可以支持1个或N个SIM卡接口,N为大于1的正整数。SIM卡接口195可以支持Nano SIM卡,Micro SIM卡,SIM卡等。同一个SIM卡接口195可以同时插入多张卡。所述多张卡的类型可以相同,也可以不同。SIM卡接口195也可以兼容不同类型的SIM卡。SIM卡接口195也可以兼容外部存储卡。终端100通过SIM卡和网络交互,实现通话以及数据通信等功能。在一些实施例中,终端100采用eSIM,即:嵌入式SIM卡。eSIM卡可以嵌在终端100中,不能和终端100分离。
终端100的软件系统可以采用分层架构,事件驱动架构,微核架构,微服务架构,或云架构。本发明实施例以分层架构的Android系统为例,示例性说明终端100的软件结构。
图2a是本公开实施例的终端100的软件结构框图。
分层架构将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将Android系统分为四层,从上至下分别为应用程序层,应用程序框架层,安卓运行时(Android runtime)和系统库,以及内核层。
应用程序层可以包括一系列应用程序包。
如图2a所示,应用程序包可以包括相机,图库,日历,通话,地图,导航,WLAN,蓝牙,音乐,视频,短信息等应用程序。
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。
如图2a所示,应用程序框架层可以包括窗口管理器,内容提供器,视图系统,电话管理器,资源管理器,通知管理器等。
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于 构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。
电话管理器用于提供终端100的通信功能。例如通话状态的管理(包括接通,挂断等)。
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,电子设备振动,指示灯闪烁等。
Android Runtime包括核心库和虚拟机。Android runtime负责安卓系统的调度和管理。
核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是安卓的核心库。
应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。
系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(Media Libraries),三维图形处理库(例如:OpenGL ES),2D图形引擎(例如:SGL)等。
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D和3D图层的融合。
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。
三维图形处理库用于实现三维图形绘图,图像渲染,合成,和图层处理等。
2D图形引擎是2D绘图的绘图引擎。
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。
下面结合拍照场景,示例性说明终端100软件以及硬件的工作流程。
当触摸传感器180K接收到触摸操作,相应的硬件中断被发给内核层。内核层将触摸操作加工成原始输入事件(包括触摸坐标,触摸操作的时间戳等信息)。原始输入事件被存储在内核层。应用程序框架层从内核层获取原始输入事件,识别该输入事件所对应的控件。以该触摸操作是触摸单击操作,该单击操作所对应的控件为相机应用图标的控件为例,相机应用调用应用框架层的接口,启动相机应用,进而通过调用内核层启动摄像头驱动,通过摄像头193捕获静态图像或视频。
本申请实施例还提供了一种服务器1300。
服务器1300可以包括处理器1310、收发器1320,收发器1320可以与处理器1310连接,如图2b所示。收发器1320可以包括接收器和发送器,可以用于接收或者发送消息或数据,收发器1320可以是网卡。服务器1300还可以包括加速部件(可称为加速器),当加 速部件为网络加速部件时,加速部件可以为网卡。处理器1310可以是服务器1300的控制中心,利用各种接口和线路连接整个服务器1300的各个部分,如收发器1320等。在本发明中,处理器1310可以是中央处理器(Central Processing Unit,CPU),可选的,处理器1310可以包括一个或多个处理单元。处理器1310还可以是数字信号处理器、专用集成电路、现场可编程门阵列、GPU或者其他可编程逻辑器件等。服务器1300还可以包括存储器1330,存储器1330可用于存储软件程序以及模块,处理器1310通过读取存储在存储器1330的软件代码以及模块,从而执行服务器1300的各种功能应用以及数据处理。
本申请实施例还提供了一种位姿确定系统,如图2c所示,该系统可以包括终端设备和服务器。其中,终端设备可以是可移动终端、人机交互设备、车载视觉感知设备,如手机、扫地机、智能机器人、无人驾驶车辆、智能监控器、增强现实(Augmented Reality,AR)穿戴设备等。相应地,本公开实施例提供的方法可以用于人机交互、车载视觉感知、增强现实、智能监控、无人驾驶等应用领域中。
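For ease of understanding, the pose determination method provided by the embodiments of this application is described in detail below with reference to the accompanying drawings and application scenarios.
Refer to FIG. 3, which is a schematic diagram of an embodiment of a pose determination method according to an embodiment of this application. As shown in FIG. 3, the pose determination method provided by this application includes:
301. Obtain a first image.
In this embodiment of this application, in order to display an AR interface, the terminal may obtain a video stream that it captures; the first image is one image frame of that video stream.
302. Determine first pose information from the first image, where the first pose information represents the pose of the terminal when it captured the first image.
In this embodiment of this application, in order to display the AR interface, the terminal may obtain the video stream that it captures and, based on the video stream, obtain the pose of the terminal at the time the video stream was captured.
The following describes how that pose is obtained from the video stream.
In one implementation, the terminal may compute its own pose from data obtained by its on-board camera and by positioning-related sensors. Alternatively, the terminal may send that data to a cloud-side server, which computes the terminal's pose and sends the computed pose to the terminal.
Taking computation of the terminal's pose information by the server as an example: the terminal device may send to the server the video stream captured by its camera, the location information of the terminal (for example, location information obtained from the Global Positioning System (GPS) or from location-based services (LBS)), historical simultaneous localization and mapping (SLAM) poses, historical localization poses, and other data. The historical SLAM poses are the SLAM pose changes recorded by the terminal device during earlier online localization, and the historical localization poses are the localization results recorded by the terminal device during earlier online localization.
The server may extract one frame from the received video stream as the input frame, extract the global feature of the input frame, and use that feature to search the digital map for images similar to the input frame, obtaining multiple candidate frames. The retrieved candidate frames are co-visible with the input frame; that is, a candidate frame is an image taken within X meters of the position of the input frame, with a shooting angle that differs by no more than Y degrees, and whose content overlaps with that of the input frame. X and Y may be preset values.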
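The candidate-frame retrieval described above can be sketched roughly as follows. This is an illustration only: the `MapFrame` record, the use of cosine similarity for global features, and the parameter names `max_dist_m` (X) and `max_angle_deg` (Y) are assumptions made for the example, not details given in this application.

```python
import math
from dataclasses import dataclass


@dataclass
class MapFrame:
    """Hypothetical record for one image stored in the digital map."""
    frame_id: str
    position: tuple        # (x, y) in metres, map coordinates
    heading_deg: float     # shooting direction
    global_feature: tuple  # global descriptor of the image


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def angle_diff_deg(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def retrieve_candidates(query_feature, prior_position, prior_heading_deg,
                        frames, max_dist_m, max_angle_deg, top_k):
    """Keep frames within X metres and Y degrees, ranked by feature similarity."""
    scored = []
    for frame in frames:
        if math.dist(prior_position, frame.position) > max_dist_m:
            continue
        if angle_diff_deg(prior_heading_deg, frame.heading_deg) > max_angle_deg:
            continue
        score = cosine_similarity(query_feature, frame.global_feature)
        scored.append((score, frame.frame_id))
    scored.sort(reverse=True)
    return [frame_id for _, frame_id in scored[:top_k]]
```

Here the coarse position prior (for example from GPS or LBS) and heading prune the map before similarity ranking; a real system would more likely use an approximate nearest-neighbour index over the global features.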
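It should be understood that the digital map is a repository that organizes, stores, and manages map data. It may include images, feature data (global and local features), and point cloud data of the scene map, as well as images, point clouds, and feature data (global and local features) of 3D objects; this data is added to the digital map after offline registration. How the digital map is built is described in a later embodiment and is not repeated here.
After obtaining the candidate frames, the server may extract local features of the input frame and perform image matching between the input frame and the candidate frames to obtain 2D-2D correspondences. The digital map provides matched pairs between the 2D points of the candidate frames and the point cloud, from which the 2D-3D correspondences between the input frame and the point cloud can be derived.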
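As a minimal sketch of how the 2D-2D matches can be chained with the candidate frame's stored 2D-to-point-cloud pairs, consider the following. The data layout (keypoint ids as dictionary keys) and the function name `lift_matches_to_3d` are hypothetical and chosen only for illustration.

```python
def lift_matches_to_3d(matches_2d2d, candidate_2d3d):
    """Chain 2D-2D matches with a candidate frame's 2D-3D pairs.

    matches_2d2d:   list of (query_pixel_xy, candidate_keypoint_id)
    candidate_2d3d: dict mapping candidate_keypoint_id -> (X, Y, Z) map point
    Returns a list of (query_pixel_xy, (X, Y, Z)) correspondences.
    """
    correspondences = []
    for query_xy, candidate_id in matches_2d2d:
        point_3d = candidate_2d3d.get(candidate_id)
        # Candidate keypoints without a triangulated map point are skipped.
        if point_3d is not None:
            correspondences.append((query_xy, point_3d))
    return correspondences
```

Matches whose candidate keypoint has no associated point in the point cloud contribute nothing to pose solving, which is one reason sparse point clouds on individual objects can make localization fail.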
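Based on the 2D-3D correspondences between the input frame and the point cloud, the server may compute the pose of the input frame, that is, a preliminary pose result for the terminal device, using a pose-solving algorithm. Pose-solving algorithms may include, but are not limited to, perspective-n-point (PnP) and perspective-two-point (P2P) algorithms.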
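The sketch below does not solve PnP itself. It only shows, under a standard pinhole camera assumption, the reprojection residual that a PnP-style solver typically tries to minimize for a candidate pose. The intrinsics `fx`, `fy`, `cx`, `cy` and the world-to-camera convention (camera = R * world + t) are assumptions for the example.

```python
import math


def project(point_world, rotation, translation, fx, fy, cx, cy):
    """Project a 3D map point into the image for a given camera pose.

    rotation is a 3x3 world-to-camera matrix (nested lists) and translation
    a 3-vector, so that camera = rotation * world + translation.
    Returns None when the point lies behind the camera.
    """
    cam = [
        sum(rotation[r][i] * point_world[i] for i in range(3)) + translation[r]
        for r in range(3)
    ]
    if cam[2] <= 0:
        return None
    return (fx * cam[0] / cam[2] + cx, fy * cam[1] / cam[2] + cy)


def reprojection_errors(correspondences, rotation, translation, fx, fy, cx, cy):
    """Pixel distance between each observed 2D point and its projected 3D point."""
    errors = []
    for (u, v), point_world in correspondences:
        projected = project(point_world, rotation, translation, fx, fy, cx, cy)
        if projected is None:
            errors.append(math.inf)
        else:
            errors.append(math.hypot(projected[0] - u, projected[1] - v))
    return errors
```

In practice an existing solver (for example a RANSAC-based PnP routine from a computer vision library) would produce the pose, and residuals like these would be used to count inliers.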
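The above takes computation of the pose information by the server as an example. The following takes computation of the pose information by the terminal device itself as an example:
The terminal device may obtain the video stream captured by its camera, the location information of the terminal (for example, location information obtained from the Global Positioning System (GPS) or from location-based services (LBS)), historical simultaneous localization and mapping (SLAM) poses, historical localization poses, and other data. It extracts one frame from the video stream as the input frame, extracts the global feature of the input frame, and uses that feature to search the digital map for images similar to the input frame, obtaining multiple candidate frames. After obtaining the candidate frames, the terminal device may extract local features of the input frame and perform image matching between the input frame and the candidate frames to obtain 2D-2D correspondences; the digital map provides matched pairs between the 2D points of the candidate frames and the point cloud, from which the 2D-3D correspondences between the input frame and the point cloud can be derived. Based on these 2D-3D correspondences, the terminal device may compute the pose of the input frame using a pose-solving algorithm.
It should be noted that a pose in the embodiments of this application may include the three-dimensional position coordinates, yaw angle, pitch angle, and roll angle of the terminal device at the time the image was captured.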
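For illustration, a six-degree-of-freedom pose of this kind could be held in a small record such as the one below. The class name `Pose` and the Z-Y-X rotation order (yaw about z, then pitch about y, then roll about x) are assumptions; this application does not specify an angle convention.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    """Position in metres plus yaw, pitch, and roll in degrees."""
    x: float
    y: float
    z: float
    yaw_deg: float
    pitch_deg: float
    roll_deg: float

    def rotation_matrix(self):
        """Rotation as R = Rz(yaw) * Ry(pitch) * Rx(roll) (assumed convention)."""
        cy, sy = math.cos(math.radians(self.yaw_deg)), math.sin(math.radians(self.yaw_deg))
        cp, sp = math.cos(math.radians(self.pitch_deg)), math.sin(math.radians(self.pitch_deg))
        cr, sr = math.cos(math.radians(self.roll_deg)), math.sin(math.radians(self.roll_deg))
        return [
            [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
            [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
            [-sp, cp * sr, cp * cr],
        ]
```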
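Once the pose corresponding to the capture of the video stream has been obtained, an augmented reality (AR) interface may be displayed based on that pose, where the AR interface may include a preview stream corresponding to the video stream. Specifically, after obtaining the pose, the terminal may display the AR interface based on it; the AR interface may include an image of the environment in which the terminal device is currently located (the preview stream) and markers generated from the terminal's own pose information. For example, if the AR interface is an AR navigation interface, the markers may be navigation guidance; if the AR interface is an AR scene-explanation interface, such as an exhibit explanation interface in a museum, the markers may be indicators of the exhibits.
In this embodiment of this application, the first image is one frame of the video stream, the pose corresponding to the capture of the video stream includes the first pose information, and the first pose information represents the pose of the terminal when it captured the first image.
303. When the first pose information satisfies a pose anomaly condition, display prompt information that instructs photographing a target object, where the target object is around the location of the terminal and is not in the first image; the target object is used to obtain second pose information, the second pose information represents the pose of the terminal when it photographs the target object, and the second pose information does not satisfy the pose anomaly condition.
In some scenes, the environment in which the video is captured makes it impossible to extract effective feature points from the input frame. For example, in some indoor scenes, such as museums and art galleries, lighting is poor and most walls and floors have weak or repetitive textures, so effective visual feature points cannot be extracted. In some outdoor scenes, such as parks and large squares, most of the environment is vegetation and there are no prominent buildings, so the extracted visual feature points cannot be used for localization.
Specifically, in this embodiment, the terminal may display the AR interface based on the pose corresponding to the video stream. However, when the pose corresponding to the first image in the video stream is computed, the environment in which the terminal captured the first image may cause the resulting pose (the first pose information) to satisfy the pose anomaly condition. The pose anomaly condition may include: pose information cannot be obtained; or the deviation between the currently determined pose information and the correct pose information is greater than a threshold.
That pose information cannot be obtained may be understood as: pose information cannot be obtained within a time T1, or pose information cannot be computed from the image. For example, the terminal never receives pose information computed by the server within T1, or the terminal receives an indication from the server that pose information cannot be computed, or the terminal itself cannot compute pose information from the image. It should be understood that T1 may be a preset time with a value between 0 and 0.5 seconds, for example 0.1 seconds or 0.3 seconds.
The correct pose information may be understood as the pose information that the server can compute based on a standard digital map; the correct pose information objectively and correctly represents the current pose of the terminal.
That the deviation between the currently determined pose information and the correct pose information is greater than a threshold may be understood as: the deviation between the currently determined pose information and the terminal's actual correct pose is too large. In one implementation, taking a pose expressed in six degrees of freedom (6DOF) as an example, a threshold may be set for each degree of freedom; alternatively, thresholds may be set for the position coordinates (X, Y, and Z) and separately for the remaining angles (yaw θ1, pitch θ2, and roll θ3). Specifically, the threshold for the X and Y coordinates may be a value between 0 and 2 m, for example 0.5 m or 1 m; the threshold for the Z coordinate may be a value between 0 and 0.5 m, for example 0.1 m or 0.25 m; and the threshold for the yaw θ1, pitch θ2, and roll θ3 may be a value between 0 and 10 degrees, for example 5 degrees or 4 degrees. In this case, the deviation is considered greater than the threshold when the deviation of any of the six degrees of freedom from the corresponding correct value exceeds its threshold, or when the deviations of several specified degrees of freedom from the corresponding correct values exceed their thresholds.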
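A per-degree-of-freedom check of this kind might look like the sketch below. The threshold values are taken from the example ranges above (0.5 m for X and Y, 0.1 m for Z, 5 degrees for the angles); the `Pose6Dof` record, the `is_pose_abnormal` name, and the assumption that a reference ("correct") pose is available for comparison are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pose6Dof:
    x: float
    y: float
    z: float
    yaw_deg: float
    pitch_deg: float
    roll_deg: float


# Example thresholds chosen from the ranges mentioned in the text.
DEFAULT_THRESHOLDS = {
    "x": 0.5, "y": 0.5, "z": 0.1,
    "yaw_deg": 5.0, "pitch_deg": 5.0, "roll_deg": 5.0,
}


def angle_diff_deg(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def is_pose_abnormal(estimated: Optional[Pose6Dof], reference: Pose6Dof,
                     thresholds=DEFAULT_THRESHOLDS) -> bool:
    """True if no pose was obtained or any checked DOF deviates too much."""
    if estimated is None:  # pose information could not be obtained
        return True
    for name, limit in thresholds.items():
        est, ref = getattr(estimated, name), getattr(reference, name)
        if name.endswith("_deg"):
            deviation = angle_diff_deg(est, ref)
        else:
            deviation = abs(est - ref)
        if deviation > limit:
            return True
    return False
```

Passing a `thresholds` dictionary that contains only some of the six keys corresponds to the variant in which only several specified degrees of freedom are checked.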
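In one implementation, the pose computation result may include, in addition to the pose information, a confidence for that pose information. The confidence may be determined from the reprojection error, the number of inliers, or in other ways; the embodiments of this application impose no limitation. When the confidence is too low, the deviation between the currently determined pose information and the correct pose information may be considered greater than the threshold. For example, assuming a maximum confidence of 1.0, the deviation may be considered greater than the threshold when the confidence is below 0.6, or when the confidence is below 0.7.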
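One simple way to turn reprojection errors into such a confidence is the inlier ratio, sketched below. The 4-pixel inlier threshold and the function names are assumptions made for the example; the 0.6 cut-off is one of the example values mentioned above.

```python
def pose_confidence(errors_px, inlier_threshold_px=4.0):
    """Fraction of correspondences whose reprojection error is small.

    errors_px: per-correspondence reprojection errors in pixels.
    Returns a value in [0.0, 1.0]; 0.0 when there are no correspondences.
    """
    if not errors_px:
        return 0.0
    inliers = sum(1 for e in errors_px if e <= inlier_threshold_px)
    return inliers / len(errors_px)


def is_low_confidence(confidence, min_confidence=0.6):
    """Treat the pose as abnormal when its confidence is below the cut-off."""
    return confidence < min_confidence
```

For example, errors of 1, 2, 3, 10, and 20 pixels give an inlier ratio of 3/5, which passes a 0.6 cut-off but fails a 0.7 cut-off.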
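In the above scenes, there are usually objects with salient visual features, for example cultural relics in a museum or statues in a park. (Having salient visual features means that the object's visual texture features are highly distinctive; that is, few objects in the world share the same texture features as the target object, so the target object can be identified from its texture features.) Using such objects as localization targets can greatly increase the localization success rate. However, in the existing digital map modeling process, video sequences of a large scene are collected along fixed routes and then processed offline to generate a sparse point cloud of the scene; an individual object has only a small number of sparse points, which does not meet the needs of 3D object localization. Therefore, 3D objects may be captured and processed offline separately to generate dense point clouds, images, and other data.
In one implementation, the server may evaluate the computed preliminary pose result (the first pose information). If the first pose information satisfies the pose anomaly condition, the server may determine, from the digital map and based on the location of the terminal, an object around the terminal (referred to in this embodiment as the target object) and send information including the target object to the terminal. Being around the terminal may be understood as the target object being within a preset distance of the location of the terminal; because the distance is short, the user can easily move to the vicinity of the target object. It may also be understood as the target object and the location of the terminal being within the map of the same area; for example, in a museum scene, the target object and the first object are both inside the museum, so the user can easily move to the vicinity of the target object. It may also be understood as there being no other obstacle between the target object and the location of the terminal.
Specifically, the digital map may include information about a plurality of objects collected in advance; the information may include, but is not limited to, the location of an object, an image of the object, and a point cloud of the object. When the server determines that the real-time pose of the terminal device satisfies the pose anomaly condition, it may obtain from the digital map the objects around the location of the terminal (including the target object) and send information indicating those objects to the terminal, so that the terminal can display the information about those objects on a target interface. The terminal can then capture a target image that includes these objects and re-determine the pose based on that target image.
It should be understood that the target object is not in the first image. In one implementation, the first image does not include any part of the target object. In another implementation, the first image includes only part of the target object, the rest of the target object is not in the first image, and the part of the target object included in the first image is not sufficient to determine the pose information of the terminal.
Specifically, the digital map may include 3D point cloud information of a plurality of objects, where the first object corresponds to first 3D point cloud information in the digital map, the target object corresponds to second 3D point cloud information in the digital map, and the point cloud density of the second 3D point cloud information is higher than that of the first 3D point cloud information.
In this embodiment of this application, the target object is a landmark object that can be imaged in full under the current shooting parameters of the terminal and whose physical position is relatively fixed. That the object can be imaged in full under the current shooting parameters of the terminal may be understood as the target object being a small or medium-sized object, so that the user can photograph the whole of it under the current shooting parameters of the terminal. That the physical position is relatively fixed does not mean that the target object cannot be moved; it means that, in its natural state, the target object is stationary relative to the ground. For example, in a museum scene, the target object may be an exhibit.
In one implementation, the digital map includes a plurality of objects around the location of the terminal, and the server or the terminal may select at least one object (including the target object) from the plurality of objects based on a preset condition. The following describes how at least one object is selected from the plurality of objects based on the preset condition:
In this embodiment of this application, the target object that satisfies the preset condition may be determined from the digital map according to the location of the terminal, where the digital map includes a plurality of objects, the plurality of objects are objects around the location of the terminal, and the preset condition includes at least one of the following:
at least one object, among the plurality of objects, that is closer to the location of the terminal;
at least one object determined at random from the plurality of objects;
at least one object, among the plurality of objects, with no other obstacle between it and the location of the terminal;
at least one object, among the plurality of objects, that requires a shorter travel distance for the terminal to reach from its location.
To make it easy for the user carrying the terminal to move to the vicinity of the target object, at least one object that is closer to the location of the terminal may be selected from the plurality of objects, or at least one object with no other obstacle between it and the location of the terminal may be selected, or at least one object that requires a shorter travel distance for the terminal to reach from its location may be selected.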
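The preset conditions listed above can be sketched as interchangeable selection strategies. Everything named here (`MapObject`, its `occluded` and `walking_distance_m` fields, the strategy strings) is hypothetical; in particular, how occlusion and walking distance would actually be derived from the digital map is not specified in this application.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class MapObject:
    """Hypothetical digital-map entry for an object near the terminal."""
    name: str
    position: tuple            # (x, y) in metres, map coordinates
    occluded: bool = False     # True if something blocks the path or view
    walking_distance_m: float = 0.0  # route length from the terminal


def select_targets(terminal_position, objects, strategy="nearest", count=1, rng=None):
    """Pick `count` target objects according to one preset condition."""
    if strategy == "nearest":
        ranked = sorted(objects, key=lambda o: math.dist(terminal_position, o.position))
    elif strategy == "unobstructed":
        ranked = [o for o in objects if not o.occluded]
    elif strategy == "least_travel":
        ranked = sorted(objects, key=lambda o: o.walking_distance_m)
    elif strategy == "random":
        ranked = (rng or random).sample(objects, min(count, len(objects)))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return ranked[:count]
```

Straight-line distance and walking distance can rank objects differently (for example when a wall separates the user from the nearest exhibit), which is presumably why the text lists them as separate conditions.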
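More specifically, the server may send a target interface display instruction to the terminal device, and the instruction may include the information about the target object; correspondingly, the terminal device may display the prompt information that instructs photographing the target object.
In one implementation, the terminal device may compute the first pose information itself and evaluate the computed preliminary pose result (the first pose information). If the first pose information satisfies the pose anomaly condition, for example because solving for the first pose information failed or because the deviation of the first pose information from the correct value is greater than the threshold, the terminal device may determine, from the digital map and based on its own location, an object within a certain distance of the terminal (referred to in this embodiment as the target object). Alternatively, the terminal device may evaluate the computed preliminary pose result (the first pose information) and, if the first pose information satisfies the pose anomaly condition, send the server an indication that the pose satisfies the pose anomaly condition; the server may then determine, from the digital map, an object within a certain distance of the terminal (the target object) and send information including the target object to the terminal.
The following describes how the terminal device displays the prompt information that instructs photographing the target object.
In one implementation, the terminal device may receive the information about the target object from the server.
In this embodiment of this application, the information about the target object may include the location of the target object; correspondingly, the terminal device may display the location of the target object, or display navigation information from the location of the terminal to the location of the target object. The information about the target object may further include an image, name, and/or category of the target object; correspondingly, the terminal device may display the image, name, and/or category of the target object. The image may be obtained by photographing the target object in advance, and the name may be the specific name of the target object; for example, in a museum scene, the name of the target object may be the name of an exhibit, the serial number of an exhibit, the category of an exhibit, and so on.
It should be understood that the server may send the terminal device information about a plurality of objects near the location of the terminal device, the target object being one of the plurality of objects. Correspondingly, the target interface may include information indicating a plurality of objects, and the user may select one of them.
In one implementation, the terminal device may obtain the information about the target object from the digital map itself.
Specifically, refer to FIG. 4a, which is a schematic diagram of a terminal interface in an embodiment of this application. The terminal device may display an application navigation interface; the interface shown in FIG. 4a includes an AR navigation application, which the user can open. The terminal may then display the interface shown in FIG. 4b, which is a schematic diagram of a terminal interface in an embodiment of this application. As shown in FIG. 4b, the AR navigation interface may include the preview stream captured by the terminal device and navigation markers, where the navigation markers are generated from the real-time pose information corresponding to the capture of the preview stream. If the pose precision of the real-time pose information is lower than a threshold, the terminal device may display the terminal interface shown in FIG. 5, which may include an indicator that the current localization has failed and a control for enabling pose determination based on a target object (the "enable object recognition localization" control shown in FIG. 5), and may further include a relocalization control.
As shown in FIG. 5, the user may tap the "enable object recognition localization" control. In response, the terminal device may display the terminal interface shown in FIG. 6, which may include prompt information instructing the user to photograph a target object, for example the names of the target objects ("A", "B", "C", and "D" in FIG. 6) and the locations of the target objects ("Location 1", "Location 2", "Location 3", and "Location 4" in FIG. 6).
Alternatively, in response to the user tapping the "enable object recognition localization" control, the terminal device may display the terminal interface shown in FIG. 7, which may include information about the target object (for example, navigation information from the location of the terminal to the location of the target object), such as the navigation interface shown in FIG. 7. The navigation interface may be a plan map that includes an indication of the location of the terminal device and an indication of the location of the target object on the plan map.
It should be understood that the interface layouts and control types in the terminal interfaces shown in FIG. 4b to FIG. 7 are merely illustrative and do not limit this embodiment.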
本申请实施例中,终端可以获取用户根据所述提示信息拍摄得到的目标图像,所述目标图像包括所述目标物体。
本申请实施例中,用户可以根据所述提示找到目标物体所在的位置。
例如,在博物馆的场景中,终端设备可以显示至少一个展品的名称、图像或者位置信息,用户可以选择其中的一个展品(目标物体),并基于名称、图像或者位置信息找到目标物体所在的位置。
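The user may then photograph the target object to obtain the target image, or film the target object to obtain a video stream, in which case the target image is one image frame of the video stream.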
接下来,描述终端如何拍摄目标物体得到目标图像。
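As shown in FIG. 6 or FIG. 7, when the user arrives near the target object, the user may tap the "Start shooting" control displayed on the terminal interface. In response to the tap, the terminal device may display the shooting interface shown in FIG. 8a or FIG. 8b.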
在一种实现中,终端设备在获取到目标图像之后,可以将目标图像发送至服务器,以便服务器基于目标图像进行终端设备的位姿信息计算。
在一种实现中,终端设备在获取到包括目标图像的视频流之后,可以将视频流发送至服务器,以便服务器基于视频流中的目标图像进行终端设备的位姿信息计算。
在一种实现中,终端设备在获取到目标图像之后,可以基于目标图像进行终端设备的位姿信息计算。
在一种实现中,终端设备在获取到包括目标图像的视频流之后,可以基于视频流中的目标图像进行终端设备的位姿信息计算。
如图8a所示,图8a为终端设备显示的拍摄目标物体的界面示意,用户可以通过图8a示出的拍摄界面拍摄目标物体得到包括目标物体的目标图像。
如图8b所示,图8b为终端设备显示的拍摄目标物体的界面示意,用户可以通过图8b示出的拍摄界面扫描目标物体,以得到包括目标物体的视频流。
本申请实施例中,在获取到用户根据所述提示信息拍摄得到的目标图像之后,可以根据所述目标图像中的所述目标物体,获取所述第二位姿信息。
本申请实施例中,可以获取所述目标物体在所述目标图像中的第一像素位置,获取所述目标物体在数字地图中对应的第一位置信息,其中,所述第一位置信息表示所述目标物体在所述数字地图中的位置,并根据所述第一像素位置以及所述第一位置信息确定第二位姿信息。
本申请实施例中,终端设备可以获取到目标物体在所述目标图像中的第一像素位置。
在一种实现中,第一像素位置的确定可以由终端设备独立完成,或者由终端设备和服务器的交互来实现,即,服务器确定第一像素位置,并将第一像素位置发送给终端设备。
本申请实施例中,终端设备可以获取所述目标物体在数字地图中对应的第一位置信息。
在一种实现中,第一位置信息的确定可以由终端设备独立完成,或者由终端设备和服务器的交互来实现,即服务器确定第一位置信息,并将第一位置信息发送给终端设备。
本申请实施例中,终端设备可以获取第二位姿信息。
在一种实现中,根据所述第一像素位置和所述第一位置信息来确定第二位姿信息的步骤可以由终端设备独立完成,或者由终端设备和服务器的交互来实现,即,服务器确定第二位姿信息,并将第二位姿信息发送给终端设备。
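In this embodiment of this application, the terminal device may send the target image to the server and receive the second pose information sent by the server, where the second pose information is determined by the server based on the first pixel position of the target object in the target image and the first position information corresponding to the target object in the digital map, and the first position information indicates the position of the target object in the digital map.
The second pose information is determined based on a 2D-3D correspondence between the first pixel position and the first position information, where the 2D-3D correspondence indicates the correspondence between the two-dimensional coordinates of the target object in the target image and its three-dimensional coordinates in real space.
In this embodiment of this application, the terminal device may obtain the first pixel position of the target object in the target image, send the first pixel position to the server, and receive the second pose information sent by the server, where the second pose information is determined by the server based on the first pixel position of the target object in the target image and the first position information corresponding to the target object in the digital map, and the first position information indicates the position of the target object in the digital map.
In this embodiment of this application, the terminal device may obtain the first position information corresponding to the target object in the digital map, where the first position information indicates the position of the target object in the digital map, send the target image and the first position information to the server, and receive the second pose information sent by the server, where the second pose information is determined by the server based on the first pixel position of the target object in the target image and the first position information.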
本申请实施例中,所述目标物体为在当前终端的拍摄参数下能够完整成像,且物理位置相对固定的标志性物体,且所述目标物体的纹理特征比所述第一物体的纹理特征具有更高的辨识度,因此,基于目标物体确定的所述第二位姿信息不满足所述位姿异常条件,例如第二位姿信息的求解成功,且与正确位姿信息之间的差异小于阈值。
接下来描述如何构建本实施例中的数字地图,参照图9b,可以预先采集目标物体的视频帧序列,例如可以360度环绕采集目标物体的视频帧序列,并处理视频帧序列,离线进行3D物体建模,输出目标物体的多个图像、每个图像的局部位姿以及全局位姿、3D点云数据等等。可以对需要定位的场景做视频帧序列采集,并处理场景的视频帧序列,离线完成场景的稀疏重建,输出场景图像数据库,场景图像数据库可以包括场景的图像数据、全局位姿和点云。然后在场景图像数据库中,搜索包含有目标物体的图像,输出场景地图的多帧关联图像。并对多帧关联图像做特征提取,与目标物体的局部位姿做图像匹配,可输出多帧关联图像特征与目标物体图像特征的2D-2D对应关系,并基于多帧关联图与目标物体的3D点云的2D-3D对应关系,由位姿求解算法即可求解出多帧关联帧与目标物体的位姿相对关系,并结合关联帧的全局位姿,即可计算出目标物体的全局位姿。数字地图中可以包括上述计算得到的目标物体的数据和全局位姿。
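In addition, the global pose may be refined with a graph optimization algorithm to obtain a more robust global pose of the target object. Specifically, as shown in FIG. 9a, the poses of three frames P1, P2, and P3 are computed from the associated frames. The line connecting each feature point X1 to X6 on the target object with the camera's optical center intersects the image plane. This intersection differs from the observed projection of the real object point on the image plane (that is, the pixel in the image), and the difference is in practice never exactly 0, so it is minimized to obtain the optimal camera pose. Solving this minimization problem is called bundle adjustment (BA). It can be computed with the Levenberg-Marquardt (LM) algorithm while exploiting the sparsity of the BA model; the LM algorithm combines the steepest descent (gradient descent) method with the Gauss-Newton method.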
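For reference, the minimization described above is commonly written as a sum of squared reprojection errors. The notation here is an assumption introduced for illustration and does not appear in this application: $R_i, t_i$ denote the pose of frame $P_i$, $X_j$ a feature point on the target object, $x_{ij}$ the observed pixel of $X_j$ in frame $P_i$, $K$ the camera intrinsics, $\pi$ the perspective division, and $v_{ij}\in\{0,1\}$ whether $X_j$ is visible in $P_i$.

```latex
\min_{\{R_i,\,t_i\},\,\{X_j\}}
\sum_{i=1}^{3}\sum_{j=1}^{6}
v_{ij}\,\bigl\lVert x_{ij} - \pi\bigl(K\,(R_i X_j + t_i)\bigr) \bigr\rVert^{2}

% One Levenberg-Marquardt step for the stacked residual r with Jacobian J:
\bigl(J^{\top} J + \lambda I\bigr)\,\delta = -J^{\top} r
```

A large damping factor $\lambda$ makes the step behave like gradient descent, while a small $\lambda$ approaches the Gauss-Newton step, which is the sense in which LM combines the two methods. Because each residual depends on only one frame and one point, $J^{\top}J$ is sparse, which is the sparsity property the paragraph refers to.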
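Conventional pose solving uses the 2D-3D relationship between the 2D features of an image of the 3D object and the 3D point cloud of the scene to solve for the pose of the 3D object. However, because scene-map acquisition and sparse reconstruction extract visual features over a large environment, the features and point cloud extracted on a 3D object within the scene are sparse. Matching this sparse point cloud against images of the 3D object and solving the pose from it gives suboptimal accuracy and success rate. The reverse pose solving in this embodiment instead matches images of the scene against the dense point cloud of the 3D object, which considerably improves positioning accuracy and success rate.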
接下来描述如何获取到目标物体在所述目标图像中的第一像素位置以及目标物体在数字地图中对应的第一位置信息:
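In this embodiment of this application, the first pixel position may be the pixel position of a feature point or a feature line of the target object in the target image. A feature point may be a corner point of the target object in the target image, and a feature line may be an edge line of the target object in the target image; this embodiment imposes no limitation on this.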
本申请实施例中,第一位置信息可以包括目标物体在数字地图中的三维3D物体点云信息,所述第一位置信息还可以包括拍摄设备拍摄所述目标对象得到第一图像时所对应的全局位姿。相应的,最后计算得到的第二位姿信息可以表示终端拍摄所述目标图像时所对应的全局位姿。
本申请实施例中,可以获取所述第一像素位置以及所述第一位置信息的2D-3D对应关系,其中,所述2D-3D对应关系表示所述目标对象在所述目标图像中的二维坐标与在实际空间中的三维坐标的对应关系,根据所述2D-3D对应关系,确定所述第二位姿信息。
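Specifically, after the 2D-3D correspondence between the first pixel position and the first position information is obtained, the second pose information may be computed with a pose-solving algorithm, which may include but is not limited to a Perspective-n-Point (PnP) algorithm, a Perspective-2-Point (P2P) algorithm, and the like.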
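A pose solver of this kind looks for the camera pose under which the 3D points of the target object project onto their matched pixels. The sketch below does not implement a PnP solver; it only shows the pinhole projection and the reprojection error that such a solver minimizes, which can also serve to check a candidate pose. The intrinsics `fx, fy, cx, cy`, the function names, and the sample values are assumptions for illustration. In practice a library routine such as OpenCV's `solvePnP` could perform the actual solve.

```python
import math


def project(point_3d, rotation, translation, fx, fy, cx, cy):
    """Project a 3D map point into the image with a pinhole camera model."""
    # Transform the point from map coordinates into camera coordinates.
    cam = [sum(rotation[r][c] * point_3d[c] for c in range(3)) + translation[r]
           for r in range(3)]
    if cam[2] <= 0:
        raise ValueError("point is behind the camera")
    return (fx * cam[0] / cam[2] + cx, fy * cam[1] / cam[2] + cy)


def mean_reprojection_error(correspondences, rotation, translation,
                            fx, fy, cx, cy):
    """Average pixel distance between observed and projected points."""
    errors = []
    for (u, v), point_3d in correspondences:
        pu, pv = project(point_3d, rotation, translation, fx, fy, cx, cy)
        errors.append(math.hypot(pu - u, pv - v))
    return sum(errors) / len(errors)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# 2D-3D correspondences: (pixel position, 3D point of the object point cloud).
pairs = [
    ((320.0, 240.0), (0.0, 0.0, 2.0)),
    ((570.0, 240.0), (1.0, 0.0, 2.0)),
    ((320.0, 365.0), (0.0, 1.0, 4.0)),
]
good = mean_reprojection_error(pairs, IDENTITY, [0.0, 0.0, 0.0],
                               500.0, 500.0, 320.0, 240.0)
bad = mean_reprojection_error(pairs, IDENTITY, [0.1, 0.0, 0.0],
                              500.0, 500.0, 320.0, 240.0)
```

With these sample values the correct pose reproduces every pixel, while the shifted pose leaves a clearly non-zero error, which is the signal a solver drives toward zero.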
在一种实现中,在获取到目标图像之后,可以首先对目标图像做物体识别。具体可以是基于深度学习的神经网络模型来识别出目标图像中的目标物体,并输出一个初步的拍摄目标物体的终端设备位姿信息;然后再提取目标图像中的局部视觉特征(第一像素位置),同数字地图中的目标物体的图像做2D-2D匹配,再结合数字地图中的3D物体点云(第一位置信息)可得到2D-3D对应关系,把2D-3D对应关系输入至位姿求解算法做位姿求解,最后得到更精确的3D物体位姿(第二位姿信息)。关于2D-3D匹配的示意可以参照图9c。其中,点P1在数字地图中的3D点云为X1-X4,P2和P3类似。
本申请实施例中,所述第二位姿信息可以包括拍摄所述目标图像时,所述终端设备的偏航角、俯仰角和横滚角。
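If the solved pose is held as a rotation matrix, the three angles can be read from it as sketched below. The application does not state an angle convention, so this sketch assumes the common Z-Y-X order, R = Rz(yaw) · Ry(pitch) · Rx(roll), with angles in radians; the function names are illustrative.

```python
import math


def ypr_to_matrix(yaw, pitch, roll):
    """Build R = Rz(yaw) * Ry(pitch) * Rx(roll) (assumed convention)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def matrix_to_ypr(rot):
    """Recover (yaw, pitch, roll) from a rotation matrix in the same convention."""
    # Clamp to guard against values slightly outside [-1, 1] from rounding.
    sin_pitch = max(-1.0, min(1.0, -rot[2][0]))
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(rot[1][0], rot[0][0])
    roll = math.atan2(rot[2][1], rot[2][2])
    return yaw, pitch, roll


angles = matrix_to_ypr(ypr_to_matrix(0.5, -0.2, 0.1))
```

The sketch does not handle the degenerate case in which the pitch is close to ±90 degrees, where yaw and roll are no longer separately determined.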
应理解,在获取到第二位姿信息之后,还可以基于第二位姿信息以及终端设备的历史时刻SLAM位姿和定位位姿结果,做位姿优化,输出优化后的第二位姿信息。
具体的,参照图9a,终端设备在对着目标物体(图9a中示出的狮子雕像)一边移动一边定位;在T1时刻,视觉定位全局位姿为Tvps_1,SLAM定位局部位姿为Tslam_1;在T2时刻,视觉定位全局位姿为Tvps_2,SLAM定位局部位姿为Tslam_2;在T3时刻,获取到的第二位姿信息为T3d_3,SLAM定位局部位姿为Tslam_3;在T4时刻,获取到的第二位姿信息为T3d_4,SLAM定位局部位姿为Tslam_4;在T5时刻,获取到的第二位姿信息为Tvps_5,SLAM定位局部位姿为Tslam_5;
获取到的第二位姿信息的结果是全局位姿,这些位姿间存在这样的约束关系:即任意两个时刻全局位姿间的变换矩阵,应该同对应时刻局部SLAM位姿间的变换矩阵相等。根据这个约束条件,使用图优化使两个变换矩阵的差值最小,输出优化后的T5时刻的第二位姿信息。
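The constraint can be made concrete with a simplified planar example. The sketch below uses 2D poses `(x, y, theta)` instead of full 3D transformation matrices, which is an assumption made only to keep the example short; the function names are illustrative. It computes the relative transform between two moments for both the global poses and the SLAM poses and returns their difference, which is the residual a graph optimizer would minimize.

```python
import math


def wrap(angle):
    """Wrap an angle to the interval (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def compose(a, b):
    """Apply pose b in the frame of pose a."""
    ax, ay, at = a
    bx, by, bt = b
    return (ax + math.cos(at) * bx - math.sin(at) * by,
            ay + math.sin(at) * bx + math.cos(at) * by,
            wrap(at + bt))


def inverse(p):
    """Inverse of a planar rigid transform."""
    x, y, t = p
    c, s = math.cos(t), math.sin(t)
    return (-(c * x + s * y), s * x - c * y, wrap(-t))


def relative(a, b):
    """Transform that takes pose a to pose b."""
    return compose(inverse(a), b)


def consistency_residual(global_i, global_j, slam_i, slam_j):
    """Difference between the global and the SLAM relative transforms."""
    gx, gy, gt = relative(global_i, global_j)
    sx, sy, st = relative(slam_i, slam_j)
    return (gx - sx, gy - sy, wrap(gt - st))


def predict_global(global_i, slam_i, slam_j):
    """Propagate a global pose to a later moment using SLAM motion only."""
    return compose(global_i, relative(slam_i, slam_j))


# SLAM poses at two moments and a fixed (unknown in practice) map alignment.
alignment = (10.0, 5.0, math.pi / 2)
slam_1, slam_2 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.1)
global_1, global_2 = compose(alignment, slam_1), compose(alignment, slam_2)
residual = consistency_residual(global_1, global_2, slam_1, slam_2)
```

When the global poses are consistent with the SLAM trajectory the residual is zero up to rounding; a global pose that drifts from the trajectory produces a non-zero residual, and the optimization adjusts the poses to reduce it.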
本申请实施例中,终端设备在获取到第二位姿信息,或者服务器获取到第二位姿信息并将第二位姿信息发送至终端设备之后,可以显示如图10所示的终端界面,其中该终端界面用于指示定位成功,进而,终端设备可以返回AR界面(例如图11所示的AR导航界面)。
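In this embodiment of this application, after the terminal device obtains the second pose information, or after the server obtains the second pose information and sends it to the terminal device, the terminal device may further obtain its own pose change and determine a real-time pose based on the second pose information and the obtained pose change.
In the embodiments of this application, the terminal device may use the obtained second pose information as the initial pose, determine its pose change by using simultaneous localization and mapping (SLAM) tracking, and determine the real-time pose based on the initial pose and the pose change. The terminal device may perform processing such as navigation, route planning, and obstacle avoidance based on the real-time pose. For example, during path planning, the terminal device plans a path based on the coordinate position in the real-time pose to obtain a planned path whose start point or end point is that coordinate position, and displays a two-dimensional navigation interface that includes the planned path. Alternatively, the terminal device displays an AR navigation interface that includes an image of the environment in which the terminal device is currently located and navigation guidance, where the navigation guidance is determined based on the yaw angle, pitch angle, and roll angle of the terminal device.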
可选地,除了可以基于实时位姿,进行导航、路线规划、避障等处理之外,在获取到第二位姿信息之后,终端设备还可以获取当前场景的预览流;根据第二位姿信息,确定预览流中的场景对应的数字地图中包含的预设媒体内容;在预览流中渲染媒体内容。
参照图10,图10为终端获取到第二位姿信息之后显示的界面,如图10所示,第二位姿信息不满足位姿异常条件,则相当于第二位姿信息与位姿的正确值之间的差异小于阈值,即第二位姿信息可以正确的表示出终端当前所处的位姿,进而,如图11所示,AR导航应用可以以终端当前所处的位姿开始继续进行AR导航的界面显示。
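In the embodiments of this application, if the terminal device is a mobile phone, an AR wearable device, or the like, a virtual scene may be constructed based on the pose information. First, the terminal device may obtain a preview stream of the current scene; for example, a user may capture a preview stream of the current environment in a shopping mall. Next, the terminal device may determine the second pose information as the initial pose according to the method described above. The terminal device may then obtain a digital map, which records the three-dimensional coordinates of positions in the world coordinate system, with preset media content associated with preset three-dimensional coordinate positions. The terminal may determine, in the digital map, the target three-dimensional coordinates corresponding to the real-time pose and, if preset media content exists at those coordinates, obtain it. For example, when the user points the camera at a target store, the terminal determines from the real-time pose that the camera is currently facing that store and may obtain the preset media content for it, such as descriptive information about the store, for example which goods in the store are worth buying. The terminal may then render the media content in the preview stream, so that the user sees the preset media content in a preset area near the image of the target store on the phone. After viewing it, the user has a general understanding of the target store.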
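The lookup of preset media content from the real-time pose can be sketched as a simple visibility test. This is a planar simplification under assumptions: the anchor table, the `max_range` and `half_fov_deg` parameters, and the sample contents are hypothetical, and a real implementation would work with 3D coordinates and the camera's actual field of view.

```python
import math


def visible_media(position_xy, yaw_deg, anchors, max_range=15.0, half_fov_deg=30.0):
    """Return (name, content) of media anchors the camera is facing, nearest first."""
    px, py = position_xy
    hits = []
    for name, (ax, ay, content) in anchors.items():
        dx, dy = ax - px, ay - py
        dist = math.hypot(dx, dy)
        if dist == 0.0 or dist > max_range:
            continue
        # Signed angle between the camera heading and the direction to the anchor.
        bearing = math.degrees(math.atan2(dy, dx))
        diff = (bearing - yaw_deg + 180.0) % 360.0 - 180.0
        if abs(diff) <= half_fov_deg:
            hits.append((dist, name, content))
    return [(name, content) for _, name, content in sorted(hits)]


# Hypothetical anchors: map position plus the media content stored there.
anchors = {
    "shop": (10.0, 1.0, "Recommended goods of the target store"),
    "cafe": (-5.0, 0.0, "Today's menu"),
    "cinema": (100.0, 0.0, "Now showing"),
}
to_render = visible_media((0.0, 0.0), 0.0, anchors)
```

Here only the store in front of the camera and within range is returned; the content behind the user and the content that is too far away are skipped.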
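Different digital maps may be set for different places, so that when the user moves to another place, the preset media content corresponding to the real-time pose can likewise be obtained and rendered in the preview stream in the manner provided in the embodiments of this application.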
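An embodiment of this application provides a pose determination method. The method includes: obtaining a first image; determining first pose information based on the first image, where the first pose information indicates the pose of the terminal when it captured the first image; and, when the first pose information meets the pose anomaly condition, displaying prompt information that instructs the user to photograph a target object. The target object is located around the terminal's position and is not in the first image. The target object is used to obtain second pose information, which indicates the pose of the terminal when it captures the target object and does not meet the pose anomaly condition. In this way, on the one hand, when high-precision pose information cannot otherwise be determined, a target object in the scene is used for pose positioning, so that valid information in the scene yields more accurate pose information. On the other hand, while the pose information of the terminal device is being determined, prompt information guides the user to photograph the target object, which avoids situations in which the user does not know how to proceed or scans an invalid target object.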
参照图12,图12为本申请实施例提供的一种位姿确定方法的实施例示意图,如图12示出的那样,本申请提供的位姿确定方法,包括:
1201、服务器获取第一位姿信息,所述第一位姿信息为根据第一图像确定的,所述第一位姿信息表示终端拍摄所述第一图像时所对应的位姿;
步骤1201的具体描述可以参照步骤301以及步骤302中与服务器获取第一位姿信息相关的描述,这里不再赘述。
1202、获取所述终端所处的位置;
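For a detailed description of step 1202, refer to the description in step 302 of how the server obtains the terminal's position; details are not repeated here.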
1203、当所述第一位姿信息满足位姿异常条件,根据所述终端所处的位置确定目标物体,其中,所述目标物体在所述终端所处的位置周围,且所述目标物体不在所述第一图像中;
步骤1203的具体描述可以参照步骤303中与获取目标物体的信息相关的描述,这里不再赘述。
1204、向所述终端发送所述目标物体的信息,所述目标物体用于获得第二位姿信息,所述第二位姿信息表示终端拍摄所述目标物体时所对应的位姿,且所述第二位姿信息不满足所述位姿异常条件。
步骤1204的具体描述可以参照步骤303中与向所述终端发送所述目标物体的信息相关的描述,这里不再赘述。
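Steps 1201 to 1204 can be read as a single server-side handler, sketched below. Everything concrete here is an assumption for illustration: the dictionary-based digital map, the `seen_in_first_image` set used to exclude objects that already appear in the first image, the error estimate, and the message layout are not specified by this application.

```python
import math


def handle_first_pose(first_pose, pose_error, terminal_xy, digital_map,
                      seen_in_first_image, error_threshold=0.5, radius=20.0):
    """Return a target-object message for the terminal, or None if the pose is fine."""
    # Steps 1201 and 1202: the first pose and the terminal position are the inputs.
    abnormal = first_pose is None or pose_error > error_threshold
    if not abnormal:
        return None
    # Step 1203: choose objects around the terminal that are not in the first image.
    tx, ty = terminal_xy
    targets = []
    for obj in digital_map:
        dist = math.hypot(obj["x"] - tx, obj["y"] - ty)
        if dist <= radius and obj["name"] not in seen_in_first_image:
            targets.append({
                "name": obj["name"],
                "category": obj["category"],
                "position": (obj["x"], obj["y"]),
                "distance": dist,
            })
    targets.sort(key=lambda t: t["distance"])
    # Step 1204: this message is what would be sent to the terminal.
    return {"type": "target_object_info", "targets": targets}


digital_map = [
    {"name": "A", "category": "exhibit", "x": 3.0, "y": 4.0},
    {"name": "B", "category": "exhibit", "x": 6.0, "y": 8.0},
    {"name": "C", "category": "statue", "x": 60.0, "y": 0.0},
]
message = handle_first_pose(None, None, (0.0, 0.0), digital_map,
                            seen_in_first_image={"A"})
```

In this example the first pose could not be solved, object A is skipped because it already appeared in the first image, and object C is skipped because it is too far away, so only B is offered to the terminal.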
在一种可能的实现中,服务器还可以获取终端发送的目标图像,所述目标图像包括所述目标物体;
根据所述目标图像,获取所述第二位姿信息,并向所述终端发送所述第二位姿信息。
在一种可能的实现中,所述目标物体的信息包括如下信息的至少一种:所述目标物体的位置、所述目标物体的图像、名称以及类别。
在一种可能的实现中,所述目标物体为在当前终端的拍摄参数下能够完整成像,且物理位置相对固定的标志性物体。
在一种可能的实现中,所述第一图像包括第一物体,所述第一物体用于确定所述第一位姿信息,且所述目标物体的纹理特征比所述第一物体的纹理特征具有更高的辨识度。
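In a possible implementation, the first pose information is determined based on first 3D point cloud information corresponding to the first object in the digital map; or,
the second pose information is determined based on second 3D point cloud information corresponding to the target object in the digital map, and the point cloud density of the second 3D point cloud information is higher than that of the first 3D point cloud information.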
在一种可能的实现中,所述位姿异常条件,包括:
无法获取到位姿信息;或,
当前确定出的位姿信息与正确位姿信息之间的偏差大于阈值。
在一种可能的实现中,所述根据所述终端所处的位置确定目标物体,包括:
根据所述终端所处的位置,从数字地图中确定所述目标物体,其中,所述数字地图包括多个物体,所述目标物体为所述多个物体中在所述终端所处的位置周围的物体。
在一种可能的实现中,服务器还可以获取所述目标物体在所述目标图像中的第一像素位置;获取所述目标物体在数字地图中对应的第一位置信息,其中,所述第一位置信息表示所述目标物体在所述数字地图中的位置;
根据所述第一像素位置以及所述第一位置信息确定第二位姿信息。
在一种可能的实现中,服务器还可以接收所述终端发送的所述目标物体在目标图像中的第一像素位置。
在一种可能的实现中,服务器还可以接收所述终端发送的目标图像;
根据所述目标图像在数字地图中确定所述目标物体对应的第一位置信息。
在一种可能的实现中,服务器还可以接收所述终端发送的所述目标物体在数字地图中对应的第一位置信息。
在一种可能的实现中,服务器还可以获取所述第一像素位置以及所述第一位置信息的2D-3D对应关系,其中,所述2D-3D对应关系表示所述目标对象在所述目标图像中的二维坐标与在实际空间中的三维坐标的对应关系;
根据所述2D-3D对应关系,确定所述第二位姿信息。
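In a possible implementation, the first position information includes the global pose of the capture device when the target object was photographed in advance; correspondingly, the second pose information indicates the global pose of the terminal when it captures the target image.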
本申请实施例提供了一种位姿确定方法,所述方法包括:获取第一位姿信息,所述第一位姿信息为根据第一图像确定的,所述第一位姿信息表示终端拍摄所述第一图像时所对应的位姿;获取所述终端所处的位置;当所述第一位姿信息满足位姿异常条件,根据所述终端所处的位置确定目标物体,其中,所述目标物体在所述终端所处的位置周围,且所述目标物体不在所述第一图像中;向所述终端发送所述目标物体的信息,所述目标物体用于获得第二位姿信息,所述第二位姿信息表示终端拍摄所述目标物体时所对应的位姿,且所述第二位姿信息不满足所述位姿异常条件。通过上述方式,在无法进行高精度的位姿信息确定时,利用场景中的目标物体进行位姿定位,利用场景中的有效信息,实现了位姿信息的确认。
本申请还提供了一种位姿确定装置,位姿确定装置可以是终端设备,参照图13,图13为本申请实施例提供的一种位姿确定装置的结构示意,如图13中示出的那样,所述位姿确定装置1300包括:
获取模块1301,用于获取第一图像;
关于获取模块1301的具体描述,可以参照步骤301对应的实施例中的描述,这里不再赘述。
位姿确定模块1302,用于根据所述第一图像确定第一位姿信息,所述第一位姿信息表示终端拍摄所述第一图像时所对应的位姿;
关于位姿确定模块1302的具体描述,可以参照步骤302对应的实施例中的描述,这里不再赘述。
显示模块1303,用于当所述第一位姿信息满足位姿异常条件时,显示用于指示拍摄目标物体的提示信息;其中,所述目标物体在所述终端所处的位置周围,且所述目标物体不在所述第一图像中;所述目标物体用于获得第二位姿信息,所述第二位姿信息表示终端拍摄所述目标物体时所对应的位姿,且所述第二位姿信息不满足所述位姿异常条件。
关于显示模块1303的具体描述,可以参照步骤304对应的实施例中的描述,这里不再赘述。
在一种可能的实现中,所述获取模块1301,用于获取用户根据所述提示信息拍摄得到的目标图像,所述目标图像包括所述目标物体;
根据所述目标图像,获取所述第二位姿信息。
在一种可能的实现中,所述目标物体在所述终端所处的位置周围,包括:所述目标物体与所述终端所处的位置在预设距离范围内、所述目标物体与所述终端所处的位置在同一区域的地图内、所述目标物体与所述终端所处的位置之间没有其他障碍物。
在一种可能的实现中,所述获取模块1301,用于:
获取所述终端所处的位置;
所述装置还包括:
发送模块,用于向服务器发送所述终端所处的位置;
接收模块,用于接收所述服务器发送的所述目标物体的信息,其中,所述目标物体为所述服务器基于所述终端所处的位置确定的。
在一种可能的实现中,所述获取模块1301,用于:
获取所述终端所处的位置;
根据所述终端所处的位置,从数字地图中获取所述目标物体的信息,其中,所述数字地图包括多个物体,所述目标物体为所述多个物体中在所述终端所处的位置周围的物体。
在一种可能的实现中,所述目标物体的信息包括如下信息的至少一种:所述目标物体的位置、所述目标物体的图像、名称以及类别;相应的,所述提示信息,包括如下信息的至少一种:所述目标物体的位置、由所述终端所处的位置至所述目标物体的位置的导航信息、所述目标物体的图像、名称以及类别。
在一种可能的实现中,所述目标物体为在当前终端的拍摄参数下能够完整成像,且物理位置相对固定的标志性物体。
在一种可能的实现中,所述第一图像包括第一物体,所述第一物体用于确定所述第一位姿信息,且所述目标物体的纹理特征比所述第一物体的纹理特征具有更高的辨识度。
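In a possible implementation, the sending module is configured to send the first pose information to the server; the obtaining module is configured to receive first information sent by the server indicating that the first pose information meets the pose anomaly condition; and the display module is configured to display, based on the first information, the prompt information instructing the user to photograph the target object.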
在一种可能的实现中,所述位姿异常条件,包括:
无法获取到位姿信息;或,
当前确定出的位姿信息与正确位姿信息之间的偏差大于阈值。
本申请还提供了一种位姿确定装置,位姿确定装置可以是服务器,参照图14,图14为本申请实施例提供的一种位姿确定装置的结构示意,如图14中示出的那样,所述位姿确定装置1400包括:
获取模块1401,用于获取第一位姿信息,所述第一位姿信息为根据第一图像确定的,所述第一位姿信息表示终端拍摄所述第一图像时所对应的位姿;获取所述终端所处的位置;
关于获取模块1401的具体描述,可以参照步骤1201以及步骤1202对应的实施例中的描述,这里不再赘述。
目标物体确定模块1402,用于当所述第一位姿信息满足位姿异常条件,根据所述终端所处的位置确定目标物体,其中,所述目标物体在所述终端所处的位置周围,且所述目标物体不在所述第一图像中;
关于目标物体确定模块1402的具体描述,可以参照步骤1203对应的实施例中的描述,这里不再赘述。
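Sending module 1403, configured to send information about the target object to the terminal, where the target object is used to obtain second pose information, the second pose information indicates the pose of the terminal when it captures the target object, and the second pose information does not meet the pose anomaly condition.
For a detailed description of the sending module 1403, refer to the description in the embodiment corresponding to step 1204; details are not repeated here.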
在一种可能的实现中,所述获取模块1401,用于:
获取终端发送的目标图像,所述目标图像包括所述目标物体;
根据所述目标图像,获取所述第二位姿信息,并向所述终端发送所述第二位姿信息。
在一种可能的实现中,所述目标物体的信息包括如下信息的至少一种:所述目标物体的位置、所述目标物体的图像、名称以及类别。
在一种可能的实现中,所述目标物体为在当前终端的拍摄参数下能够完整成像,且物理位置相对固定的标志性物体。
在一种可能的实现中,所述第一图像包括第一物体,所述第一物体用于确定所述第一位姿信息,且所述目标物体的纹理特征比所述第一物体的纹理特征具有更高的辨识度。
在一种可能的实现中,所述第一位姿信息为基于所述第一物体在数字地图中对应的第一3D点云信息确定的;或,
所述第二位姿信息为基于所述目标物体在数字地图中对应的第二3D点云信息确定的,且所述第二3D点云信息的点云密度高于所述第一3D点云信息的点云密度。
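In a possible implementation, the pose anomaly condition includes:
pose information cannot be obtained; or,
the deviation between the currently determined pose information and the correct pose information is greater than a threshold.
In a possible implementation, the target object determining module is configured to determine the target object from the digital map based on the terminal's position, where the digital map includes multiple objects and the target object is an object, among the multiple objects, that is located around the terminal's position.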
在一种可能的实现中,所述获取模块1401,用于:
获取所述目标物体在所述目标图像中的第一像素位置;
获取所述目标物体在数字地图中对应的第一位置信息,其中,所述第一位置信息表示所述目标物体在所述数字地图中的位置;
根据所述第一像素位置以及所述第一位置信息确定第二位姿信息。
在一种可能的实现中,所述获取模块1401,具体用于:
接收所述终端发送的所述目标物体在目标图像中的第一像素位置。
在一种可能的实现中,所述获取模块1401,具体用于:
接收所述终端发送的目标图像;
根据所述目标图像在数字地图中确定所述目标物体对应的第一位置信息。
在一种可能的实现中,所述获取模块1401,具体用于:
接收所述终端发送的所述目标物体在数字地图中对应的第一位置信息。
在一种可能的实现中,所述获取模块,具体用于:
获取所述第一像素位置以及所述第一位置信息的2D-3D对应关系,其中,所述2D-3D对应关系表示所述目标对象在所述目标图像中的二维坐标与在实际空间中的三维坐标的对应关系;
根据所述2D-3D对应关系,确定所述第二位姿信息。
在一种可能的实现中,所述第一位置信息包括拍摄设备拍摄所述目标对象得到第一图像时所对应的全局位姿;相应的,所述第二位姿信息表示终端拍摄所述目标图像时所对应的全局位姿。
接下来介绍本申请实施例提供的一种终端设备,终端设备可以为图13中的位姿确定装置,请参阅图15,图15为本申请实施例提供的终端设备的一种结构示意图,终端设备1500具体可以表现为虚拟现实VR设备、手机、平板、笔记本电脑、智能穿戴设备等,此处不做限定。具体的,终端设备1500包括:接收器1501、发射器1502、处理器1503和存储器1504(其中终端设备1500中的处理器1503的数量可以一个或多个,图15中以一个处理器为例),其中,处理器1503可以包括应用处理器15031和通信处理器15032。在本申请的一些实施例中,接收器1501、发射器1502、处理器1503和存储器1504可通过总线或其它方式连接。
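The memory 1504 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1503. A part of the memory 1504 may further include a non-volatile random access memory (NVRAM). The memory 1504 stores processor-executable operation instructions, executable modules, or data structures, or a subset or an extended set thereof, where the operation instructions may include various operation instructions for implementing various operations.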
处理器1503控制终端设备的操作。具体的应用中,终端设备的各个组件通过总线系统耦合在一起,其中总线系统除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都称为总线系统。
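The methods disclosed in the foregoing embodiments of this application may be applied to, or implemented by, the processor 1503. The processor 1503 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing methods may be completed by integrated logic circuits of hardware in the processor 1503 or by instructions in the form of software. The processor 1503 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 1503 may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed with reference to the embodiments of this application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. A software module may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1504; the processor 1503 reads information from the memory 1504 and completes the steps of the foregoing methods in combination with its hardware. Specifically, the processor 1503 may read the information in the memory 1504 and, in combination with its hardware, complete the data-processing steps among step 301 to step 303 of the foregoing embodiments.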
接收器1501可用于接收输入的数字或字符信息,以及产生与终端设备的相关设置以及功能控制有关的信号输入。发射器1502可用于通过第一接口输出数字或字符信息;发射器1502还可用于通过第一接口向磁盘组发送指令,以修改磁盘组中的数据;发射器1502还可以包括显示屏等显示设备。
本申请实施例还提供了一种服务器,服务器可以为图14中的位姿确定装置,请参阅图16,图16是本申请实施例提供的服务器一种结构示意图,具体的,服务器1600可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(central processing units,CPU)1616(例如,一个或一个以上处理器)和存储器1632,一个或一个以上存储应用程序1642或数据1644的存储介质1630(例如一个或一个以上海量存储设备)。其中,存储器1632和存储介质1630可以是短暂存储或持久存储。存储在存储介质1630的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对服务器中的一系列指令操作。更进一步地,中央处理器1616可以设置为与存储介质1630通信,在服务器1600上执行存储介质1630中的一系列指令操作。
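The server 1600 may further include one or more power supplies 1626, one or more wired or wireless network interfaces 1650, one or more input/output interfaces 1658, and/or one or more operating systems 1641, such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
Specifically, the central processing unit 1616 may complete the data-processing steps among step 1201 to step 1204 of the foregoing embodiments.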
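An embodiment of this application further provides a computer program product that, when run on a computer, causes the computer to perform the steps of the pose determination method.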
本申请实施例中还提供一种计算机可读存储介质,该计算机可读存储介质中存储有用于进行信号处理的程序,当其在计算机上运行时,使得计算机执行如前述实施例描述的方法中的位姿确定方法的步骤。
另外需说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本申请提供的装置实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件的方式来实现,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下,凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现,而且,用来实现同一功能的具体硬件结构也可以是多种多样的,例如模拟电路、数字电路或专用电路等。但是,对本申请而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在可读取的存储介质中,如计算机的软盘、U盘、移动硬盘、ROM、RAM、磁碟或者光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述的方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。
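The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state disk (SSD)), or the like.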
Claims (52)
- 一种位姿确定方法,其特征在于,所述方法包括:获取第一图像;根据所述第一图像确定第一位姿信息,所述第一位姿信息表示终端拍摄所述第一图像时所对应的位姿;当所述第一位姿信息满足位姿异常条件时,显示用于指示拍摄目标物体的提示信息;其中,所述目标物体在所述终端所处的位置周围,且所述目标物体不在所述第一图像中;所述目标物体用于获得第二位姿信息,所述第二位姿信息表示终端拍摄所述目标物体时所对应的位姿,且所述第二位姿信息不满足所述位姿异常条件。
- 根据权利要求1所述的方法,其特征在于,所述方法还包括:获取用户根据所述提示信息拍摄得到的目标图像,所述目标图像包括所述目标物体;根据所述目标图像,获取所述第二位姿信息。
- 根据权利要求1或2所述的方法,其特征在于,所述目标物体在所述终端所处的位置周围,包括:所述目标物体与所述终端所处的位置在预设距离范围内、所述目标物体与所述终端所处的位置在同一区域的地图内、所述目标物体与所述终端所处的位置之间没有其他障碍物。
- 根据权利要求1至3任一所述的方法,其特征在于,所述显示用于指示拍摄目标物体的提示信息之前,所述方法还包括:获取所述终端所处的位置;向服务器发送所述终端所处的位置;接收所述服务器发送的所述目标物体的信息,其中,所述目标物体为所述服务器基于所述终端所处的位置确定的。
- 根据权利要求1至3任一所述的方法,其特征在于,所述显示用于指示拍摄目标物体的提示信息之前,所述方法还包括:获取所述终端所处的位置;根据所述终端所处的位置,从数字地图中确定满足预设条件的所述目标物体,其中,所述数字地图包括多个物体,所述多个物体为在所述终端所处的位置周围的物体,所述预设条件包括如下的至少一个:所述多个物体中距离所述终端所处的位置更近的至少一个物体;所述多个物体中随机确定的至少一个物体;所述多个物体中与所述终端所处的位置之间没有其他障碍物的至少一个物体;所述终端从所处的位置移动至所述多个物体中所需移动距离更少的至少一个物体。
- 根据权利要求4或5所述的方法,其特征在于,所述目标物体的信息包括如下信息的至少一种:所述目标物体的位置、所述目标物体的图像、名称以及类别;相应的,所述提示信息,包括如下信息的至少一种:所述目标物体的位置、由所述终端所处的位置至所述目标物体的位置的导航信息、所述目标物体的图像、名称以及类别。
- 根据权利要求1至6任一所述的方法,其特征在于,所述目标物体为在当前终端的拍摄参数下能够完整成像,且物理位置相对固定的标志性物体。
- 根据权利要求1至7任一所述的方法,其特征在于,所述第一图像包括第一物体,所述第一物体用于确定所述第一位姿信息,且所述目标物体的纹理特征比所述第一物体的纹理特征具有更高的辨识度。
- 根据权利要求1至8任一所述的方法,其特征在于,所述当所述第一位姿信息满足位姿异常条件时,显示用于指示拍摄目标物体的提示信息,包括:向所述服务器发送所述第一位姿信息;接收所述服务器发送的用于指示所述第一位姿信息满足位姿异常条件的第一信息,并根据所述第一信息,显示用于指示拍摄目标物体的提示信息。
- 根据权利要求1至9任一所述的方法,其特征在于,所述位姿异常条件,包括:无法获取到位姿信息;或,当前确定出的位姿信息与正确位姿信息之间的偏差大于阈值。
- 一种位姿确定方法,其特征在于,所述方法包括:获取第一位姿信息,所述第一位姿信息为根据第一图像确定的,所述第一位姿信息表示终端拍摄所述第一图像时所对应的位姿;获取所述终端所处的位置;当所述第一位姿信息满足位姿异常条件时,根据所述终端所处的位置确定目标物体,其中,所述目标物体在所述终端所处的位置周围,且所述目标物体不在所述第一图像中;向所述终端发送所述目标物体的信息,所述目标物体用于获得第二位姿信息,所述第二位姿信息表示终端拍摄所述目标物体时所对应的位姿,且所述第二位姿信息不满足所述位姿异常条件。
- 根据权利要求11所述的方法,其特征在于,所述方法还包括:获取终端发送的目标图像,所述目标图像包括所述目标物体;根据所述目标图像,获取所述第二位姿信息,并向所述终端发送所述第二位姿信息。
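- The method according to claim 11 or 12, wherein the information about the target object includes at least one of the following: the position of the target object, and the image, name, and category of the target object.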
- 根据权利要求11至13任一所述的方法,其特征在于,所述目标物体为在当前终端的拍摄参数下能够完整成像,且物理位置相对固定的标志性物体。
- 根据权利要求11至14任一所述的方法,其特征在于,所述第一图像包括第一物体,所述第一物体用于确定所述第一位姿信息,且所述目标物体的纹理特征比所述第一物体的纹理特征具有更高的辨识度。
- 根据权利要求15所述的方法,其特征在于,所述第一位姿信息为基于所述第一物体在数字地图中对应的第一3D点云信息确定的;或,所述第二位姿信息为基于所述目标物体在数字地图中对应的第二3D点云信息确定的,且所述第二3D点云信息的点云密度高于所述第一3D点云信息的点云密度。
- 根据权利要求11至16任一所述的方法,其特征在于,所述位姿异常条件,包括:无法获取到位姿信息;或,当前确定出的位姿信息与正确位姿信息之间的偏差大于阈值。
- 根据权利要求11至17任一所述的方法,其特征在于,所述根据所述终端所处的位置确定目标物体,包括:根据所述终端所处的位置,从数字地图中确定满足预设条件的所述目标物体,其中,所述数字地图包括多个物体,所述多个物体为在所述终端所处的位置周围的物体,所述预设条件包括如下的至少一个:所述多个物体中距离所述终端所处的位置更近的至少一个物体;所述多个物体中随机确定的至少一个物体;所述多个物体中与所述终端所处的位置之间没有其他障碍物的至少一个物体;所述终端从所处的位置移动至所述多个物体中所需移动距离更少的至少一个物体。
- 根据权利要求12至18任一所述的方法,其特征在于,所述根据所述目标图像,获取第二位姿信息,包括:获取所述目标物体在所述目标图像中的第一像素位置;获取所述目标物体在数字地图中对应的第一位置信息,其中,所述第一位置信息表示所述目标物体在所述数字地图中的位置;根据所述第一像素位置以及所述第一位置信息确定第二位姿信息。
- 根据权利要求19所述的方法,其特征在于,所述获取所述目标物体在目标图像中的第一像素位置,包括:接收所述终端发送的所述目标物体在目标图像中的第一像素位置。
- 根据权利要求19或20所述的方法,其特征在于,所述获取所述目标物体在数字地图中对应的第一位置信息,包括:接收所述终端发送的目标图像;根据所述目标图像在数字地图中确定所述目标物体对应的第一位置信息。
- 根据权利要求19或20所述的方法,其特征在于,所述获取所述目标物体在数字地图中对应的第一位置信息,包括:接收所述终端发送的所述目标物体在数字地图中对应的第一位置信息。
- 根据权利要求19至22任一所述的方法,其特征在于,所述根据所述第一像素位置以及所述第一位置信息确定第二位姿信息,包括:获取所述第一像素位置以及所述第一位置信息的2D-3D对应关系,其中,所述2D-3D对应关系表示所述目标对象在所述目标图像中的二维坐标与在实际空间中的三维坐标的对应关系;根据所述2D-3D对应关系,确定所述第二位姿信息。
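- The method according to any one of claims 19 to 23, wherein the first position information includes the global pose of the capture device when the target object was photographed in advance; correspondingly, the second pose information indicates the global pose of the terminal when it captures the target image.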
- 一种位姿确定装置,其特征在于,所述装置包括:获取模块,用于获取第一图像;位姿确定模块,用于根据所述第一图像确定第一位姿信息,所述第一位姿信息表示终端拍摄所述第一图像时所对应的位姿;显示模块,用于当所述第一位姿信息满足位姿异常条件时,显示用于指示拍摄目标物体的提示信息;其中,所述目标物体在所述终端所处的位置周围,且所述目标物体不在所述第一图像中;所述目标物体用于获得第二位姿信息,所述第二位姿信息表示终端拍摄所述目标物体时所对应的位姿,且所述第二位姿信息不满足所述位姿异常条件。
- 根据权利要求25所述的装置,其特征在于,所述获取模块,用于:获取用户根据所述提示信息拍摄得到的目标图像,所述目标图像包括所述目标物体;根据所述目标图像,获取所述第二位姿信息。
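- The apparatus according to claim 25 or 26, wherein the target object being located around the terminal's position includes: the target object and the terminal's position being within a preset distance range, the target object and the terminal's position being within the map of the same area, and there being no other obstacle between the target object and the terminal's position.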
- 根据权利要求25至27任一所述的装置,其特征在于,所述获取模块,用于:获取所述终端所处的位置;所述装置还包括:发送模块,用于向服务器发送所述终端所处的位置;接收模块,用于接收所述服务器发送的所述目标物体的信息,其中,所述目标物体为所述服务器基于所述终端所处的位置确定的。
- 根据权利要求25至27任一所述的装置,其特征在于,所述获取模块,用于:获取所述终端所处的位置;根据所述终端所处的位置,从数字地图中确定满足预设条件的所述目标物体,其中,所述数字地图包括多个物体,所述多个物体为在所述终端所处的位置周围的物体,所述预设条件包括如下的至少一个:所述多个物体中距离所述终端所处的位置更近的至少一个物体;所述多个物体中随机确定的至少一个物体;所述多个物体中与所述终端所处的位置之间没有其他障碍物的至少一个物体;所述终端从所处的位置移动至所述多个物体中所需移动距离更少的至少一个物体。
- 根据权利要求28或29所述的装置,其特征在于,所述目标物体的信息包括如下信息的至少一种:所述目标物体的位置、所述目标物体的图像、名称以及类别;相应的,所述提示信息,包括如下信息的至少一种:所述目标物体的位置、由所述终端所处的位置至所述目标物体的位置的导航信息、所述目标物体的图像、名称以及类别。
- 根据权利要求25至30任一所述的装置,其特征在于,所述目标物体为在当前终端的拍摄参数下能够完整成像,且物理位置相对固定的标志性物体。
- 根据权利要求25至31任一所述的装置,其特征在于,所述第一图像包括第一物体,所述第一物体用于确定所述第一位姿信息,且所述目标物体的纹理特征比所述第一物体的纹理特征具有更高的辨识度。
- 根据权利要求25至32任一所述的装置,其特征在于,所述发送模块,用于向所述服务器发送所述第一位姿信息;所述获取模块,用于接收所述服务器发送的用于指示所述第一位姿信息满足位姿异常条件的第一信息;所述显示模块,用于根据所述第一信息,显示用于指示拍摄目标物体的提示信息。
- 根据权利要求25至33任一所述的装置,其特征在于,所述位姿异常条件,包括:无法获取到位姿信息;或,当前确定出的位姿信息与正确位姿信息之间的偏差大于阈值。
- 一种位姿确定装置,其特征在于,所述装置包括:获取模块,用于获取第一位姿信息,所述第一位姿信息为根据第一图像确定的,所述第一位姿信息表示终端拍摄所述第一图像时所对应的位姿;获取所述终端所处的位置;目标物体确定模块,用于当所述第一位姿信息满足位姿异常条件时,根据所述终端所处的位置确定目标物体,其中,所述目标物体在所述终端所处的位置周围,且所述目标物体不在所述第一图像中;发送模块,用于向所述终端发送所述目标物体的信息,所述目标物体用于获得第二位姿信息,所述第二位姿信息表示终端拍摄所述目标物体时所对应的位姿,且所述第二位姿信息不满足所述位姿异常条件。
- 根据权利要求35所述的装置,其特征在于,所述获取模块,用于:获取终端发送的目标图像,所述目标图像包括所述目标物体;根据所述目标图像,获取所述第二位姿信息,并向所述终端发送所述第二位姿信息。
- 根据权利要求35或36所述的装置,其特征在于,所述目标物体的信息包括如下信息的至少一种:所述目标物体的位置、所述目标物体的图像、名称以及类别。
- 根据权利要求35至37任一所述的装置,其特征在于,所述目标物体为在当前终端的拍摄参数下能够完整成像,且物理位置相对固定的标志性物体。
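- The apparatus according to any one of claims 35 to 38, wherein the first image includes a first object, the first object is used to determine the first pose information, and the texture features of the target object are more distinctive than the texture features of the first object.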
- 根据权利要求39所述的装置,其特征在于,所述第一位姿信息为基于所述第一物体在数字地图中对应的第一3D点云信息确定的;或,所述第二位姿信息为基于所述目标物体在数字地图中对应的第二3D点云信息确定的,且所述第二3D点云信息的点云密度高于所述第一3D点云信息的点云密度。
- 根据权利要求35至40任一所述的装置,其特征在于,所述位姿异常条件,包括:无法获取到位姿信息;或,当前确定出的位姿信息与正确位姿信息之间的偏差大于阈值。
- 根据权利要求35至41任一所述的装置,其特征在于,所述目标物体确定模块,用于根据所述终端所处的位置,从数字地图中确定满足预设条件的所述目标物体,其中,所述数字地图包括多个物体,所述多个物体为在所述终端所处的位置周围的物体,所述预设条件包括如下的至少一个:所述多个物体中距离所述终端所处的位置更近的至少一个物体;所述多个物体中随机确定的至少一个物体;所述多个物体中与所述终端所处的位置之间没有其他障碍物的至少一个物体;所述终端从所处的位置移动至所述多个物体中所需移动距离更少的至少一个物体。
- 根据权利要求36至42任一所述的装置,其特征在于,所述获取模块,用于:获取所述目标物体在所述目标图像中的第一像素位置;获取所述目标物体在数字地图中对应的第一位置信息,其中,所述第一位置信息表示所述目标物体在所述数字地图中的位置;根据所述第一像素位置以及所述第一位置信息确定第二位姿信息。
- 根据权利要求43所述的装置,其特征在于,所述获取模块,具体用于:接收所述终端发送的所述目标物体在目标图像中的第一像素位置。
- 根据权利要求42或43所述的装置,其特征在于,所述获取模块,具体用于:接收所述终端发送的目标图像;根据所述目标图像在数字地图中确定所述目标物体对应的第一位置信息。
- 根据权利要求42或43所述的装置,其特征在于,所述获取模块,具体用于:接收所述终端发送的所述目标物体在数字地图中对应的第一位置信息。
- 根据权利要求42至46任一所述的装置,其特征在于,所述获取模块,具体用于:获取所述第一像素位置以及所述第一位置信息的2D-3D对应关系,其中,所述2D-3D对应关系表示所述目标对象在所述目标图像中的二维坐标与在实际空间中的三维坐标的对应关系;根据所述2D-3D对应关系,确定所述第二位姿信息。
- 根据权利要求42至47任一所述的装置,其特征在于,所述第一位置信息包括拍摄设备拍摄所述目标对象得到第一图像时所对应的全局位姿;相应的,所述第二位姿信息表示终端拍摄所述目标图像时所对应的全局位姿。
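- A pose determination apparatus, wherein the pose determination apparatus comprises a processor, a memory, a transceiver, a camera, and a bus, wherein: the processor, the memory, the transceiver, and the camera are connected through the bus; the camera is configured to capture images; the transceiver is configured to receive and send data; the memory is configured to store a computer program; and the processor is configured to control the memory, the transceiver, and the camera and to execute the program stored in the memory, so as to implement the method steps according to any one of claims 1 to 10.
- A server, wherein the server comprises a processor, a memory, a transceiver, and a bus, wherein: the processor, the memory, and the transceiver are connected through the bus; the transceiver is configured to receive and send data; the memory is configured to store a computer program; and the processor is configured to execute the program stored in the memory, so as to implement the method steps according to any one of claims 11 to 24.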
- 一种计算机可读存储介质,包括程序,当其在计算机上运行时,使得计算机执行如权利要求1至24中任一项所述的方法。
- 一种包含指令的计算机程序产品,其特征在于,当所述计算机程序产品在终端上运行时,使得所述终端执行所述权利要求1-24中任一权利要求所述的方法。
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP22745260.4A EP4276760A4 (en) | 2021-01-30 | 2022-01-26 | METHOD FOR DETERMINING INSTALLATION AND ASSOCIATED DEVICE |
US18/361,010 US20230368417A1 (en) | 2021-01-30 | 2023-07-28 | Pose determining method and related device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110134812.5A CN114842069A (zh) | 2021-01-30 | 2021-01-30 | 一种位姿确定方法以及相关设备 |
CN202110134812.5 | 2021-01-30 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/361,010 Continuation US20230368417A1 (en) | 2021-01-30 | 2023-07-28 | Pose determining method and related device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022161386A1 true WO2022161386A1 (zh) | 2022-08-04 |
Family
ID=82561383
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/073944 WO2022161386A1 (zh) | 2021-01-30 | 2022-01-26 | 一种位姿确定方法以及相关设备 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20230368417A1 (zh) |
EP (1) | EP4276760A4 (zh) |
CN (1) | CN114842069A (zh) |
WO (1) | WO2022161386A1 (zh) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120300020A1 (en) * | 2011-05-27 | 2012-11-29 | Qualcomm Incorporated | Real-time self-localization from panoramic images |
CN107918482A (zh) * | 2016-10-08 | 2018-04-17 | 天津锋时互动科技有限公司深圳分公司 | 沉浸式vr系统中避免过度刺激的方法与系统 |
CN111256701A (zh) * | 2020-04-26 | 2020-06-09 | 北京外号信息技术有限公司 | 一种设备定位方法和系统 |
US20200372672A1 (en) * | 2019-05-21 | 2020-11-26 | Microsoft Technology Licensing, Llc | Image-based localization |
CN112179330A (zh) * | 2020-09-14 | 2021-01-05 | 浙江大华技术股份有限公司 | 移动设备的位姿确定方法及装置 |
CN112284394A (zh) * | 2020-10-23 | 2021-01-29 | 北京三快在线科技有限公司 | 一种地图构建及视觉定位的方法及装置 |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
IL169934A (en) * | 2005-07-27 | 2013-02-28 | Rafael Advanced Defense Sys | Real-time geographic information system and method |
- 2021-01-30 CN CN202110134812.5A patent/CN114842069A/zh active Pending
- 2022-01-26 WO PCT/CN2022/073944 patent/WO2022161386A1/zh unknown
- 2022-01-26 EP EP22745260.4A patent/EP4276760A4/en active Pending
- 2023-07-28 US US18/361,010 patent/US20230368417A1/en active Pending
Non-Patent Citations (1)
Title |
---|
See also references of EP4276760A4 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116664684A (zh) * | 2022-12-13 | 2023-08-29 | 荣耀终端有限公司 | 定位方法、电子设备及计算机可读存储介质 |
CN116664684B (zh) * | 2022-12-13 | 2024-04-05 | 荣耀终端有限公司 | 定位方法、电子设备及计算机可读存储介质 |
CN115619837A (zh) * | 2022-12-20 | 2023-01-17 | 中科航迈数控软件(深圳)有限公司 | 一种ar图像生成方法及相关设备 |
Also Published As
Publication number | Publication date |
---|---|
US20230368417A1 (en) | 2023-11-16 |
EP4276760A1 (en) | 2023-11-15 |
EP4276760A4 (en) | 2024-06-19 |
CN114842069A (zh) | 2022-08-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22745260 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2022745260 Country of ref document: EP Effective date: 20230810 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |