WO2023216957A1 - A target positioning method, system and electronic device - Google Patents

A target positioning method, system and electronic device

Info

Publication number
WO2023216957A1
WO2023216957A1 PCT/CN2023/092023 CN2023092023W WO2023216957A1 WO 2023216957 A1 WO2023216957 A1 WO 2023216957A1 CN 2023092023 W CN2023092023 W CN 2023092023W WO 2023216957 A1 WO2023216957 A1 WO 2023216957A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
electronic device
pose
server
image
Prior art date
Application number
PCT/CN2023/092023
Other languages
English (en)
French (fr)
Inventor
于海星
张磊
秦瑞
李江伟
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023216957A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/20 Scenes; Scene-specific elements in augmented reality scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/64 Three-dimensional objects

Definitions

  • Embodiments of the present application relate to the field of electronic technology, and in particular, to a target positioning method, system and electronic device.
  • 3D recognition and tracking algorithms can only identify and track predefined objects, and cannot instantly add target objects of interest through online updates.
  • the number of target objects that can be recognized and tracked in 3D is limited, and cannot meet a digital twin system's requirements for sustainable data expansion or users' diverse and personalized recognition needs.
  • the present application provides a target positioning method, system and electronic device.
  • the pose estimation model library in the server includes multiple pose estimation models, and the target corresponds to the pose estimation model.
  • when the server adds a positionable target, it only needs to add the pose estimation model corresponding to that target to the pose estimation model library, which can meet the user's need to expand the number of targets online and significantly increase the number of positionable targets.
  • embodiments of the present application provide a target positioning method, which is applied to a server.
  • the method includes:
  • the server receives a positioning request sent by the first electronic device, where the positioning request includes the image to be processed;
  • the server identifies objects in the image to be processed
  • the server searches for the target pose estimation model corresponding to the target from the pose estimation model library.
  • the pose estimation model library includes pose estimation models corresponding to multiple objects;
  • the server obtains the target's pose based on the image to be processed and the target pose estimation model
  • the server sends the target's pose to the first electronic device.
  • after receiving the image to be processed sent by the electronic device, the server can identify the target in the image to be processed, search the pose estimation model library for the target pose estimation model corresponding to the target, and then obtain the target's pose based on the image to be processed and the target pose estimation model.
  • the pose estimation model library in the server includes pose estimation models corresponding to multiple objects.
  • before the server searches for the target pose estimation model corresponding to the target in the pose estimation model library, the method further includes:
  • the server receives the three-dimensional model corresponding to the target sent by the second electronic device
  • the server renders the three-dimensional model corresponding to the target and generates multiple training images
  • the server trains the initial pose estimation model based on multiple training images to obtain the target pose estimation model.
  • the server can receive a three-dimensional model corresponding to the target from the electronic device, such as a computer-aided design model or a point cloud model of the target, and then, based on the three-dimensional model, generate the pose estimation model corresponding to the target, that is, the target pose estimation model.
  • the user can send a three-dimensional model of the target of interest to the server through an electronic device, so that the server can realize the function of locating the target.
  • This method can meet the user's need to expand the number of targets online and greatly increase the number of targets that the server can locate.
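  • As a minimal sketch of how a server might render a registered 3D model from multiple viewpoints to produce such training images, the snippet below assumes the trimesh and pyrender Python libraries; the camera distance, view count and lighting are illustrative values, not parameters specified by this application.

```python
# Hedged sketch: render a registered 3D model from several viewpoints to
# generate training images for the pose estimation model. Assumes trimesh
# and pyrender are available; all parameters are illustrative only.
import numpy as np
import trimesh
import pyrender

def render_training_images(cad_path, num_views=36, size=(640, 480)):
    # trimesh.load may return a Scene for some formats; a single mesh is assumed.
    mesh = pyrender.Mesh.from_trimesh(trimesh.load(cad_path))
    renderer = pyrender.OffscreenRenderer(*size)
    # Camera 2 units in front of the object, looking down the -z axis;
    # assumes the model is roughly centered at the origin and of unit scale.
    cam_pose = np.eye(4)
    cam_pose[2, 3] = 2.0
    images = []
    for i in range(num_views):
        scene = pyrender.Scene()
        # Rotate the model about its vertical axis to vary the viewpoint.
        angle = 2 * np.pi * i / num_views
        rot = trimesh.transformations.rotation_matrix(angle, [0, 1, 0])
        scene.add(mesh, pose=rot)
        scene.add(pyrender.PerspectiveCamera(yfov=np.pi / 3.0), pose=cam_pose)
        scene.add(pyrender.DirectionalLight(intensity=3.0), pose=cam_pose)
        color, _depth = renderer.render(scene)
        images.append(color)
    return images
```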
  • the electronic device is installed with a positioning application, and the user can send the three-dimensional model corresponding to the target to the server through the positioning application; the server can receive three-dimensional models corresponding to targets sent from different electronic devices and generate the corresponding pose estimation models;
  • the pose estimation models are then used to position a target whenever an electronic device needs to locate it.
  • the server's identification of targets in the image to be processed includes:
  • the server extracts feature vectors from the image to be processed
  • the server queries the feature vector library for the feature vector with the highest similarity to the extracted feature vector; the feature vector library includes feature vectors corresponding to the identifiers of multiple objects;
  • the server determines the target based on the identifier corresponding to the queried feature vector.
  • the server can first determine the image block including the target from the image to be processed, and then extract the feature vector from the image block. This method can prevent objects other than the target in the image to be processed from interfering with recognition, thereby improving the accuracy of identifying the target.
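  • The retrieval step can be pictured with the small numpy sketch below, which performs a cosine-similarity lookup against a feature vector library keyed by target identifiers; the feature extractor itself (for example a CNN embedding network) is assumed and not shown.

```python
# Sketch of querying a feature vector library for the most similar entry.
# The feature extractor that produces query_vec is assumed and not shown.
import numpy as np

def identify_target(query_vec, feature_library):
    """feature_library: dict mapping target identifier -> 1-D feature vector."""
    best_id, best_sim = None, -1.0
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    for target_id, vec in feature_library.items():
        v = vec / (np.linalg.norm(vec) + 1e-12)
        sim = float(np.dot(q, v))  # cosine similarity
        if sim > best_sim:
            best_id, best_sim = target_id, sim
    # The identifier of the most similar stored vector determines the target.
    return best_id, best_sim
```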
  • the target pose estimation model includes a key point recognition model and a perspective-n-point (PnP) algorithm.
  • the server obtains the target pose based on the image to be processed and the target pose estimation model, including:
  • the server inputs the image to be processed into the key point recognition model and obtains the two-dimensional coordinates of at least four key points corresponding to the target;
  • the server determines the three-dimensional coordinates of at least four key points based on the three-dimensional model corresponding to the target;
  • based on the two-dimensional and three-dimensional coordinates of the at least four key points, the server obtains the pose of the target through the perspective-n-point algorithm.
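  • Assuming the two-dimensional key points come from the key point recognition model and the matching three-dimensional key points come from the target's three-dimensional model, the PnP step can be realized, for instance, with OpenCV's solvePnP as in the sketch below; the camera intrinsics are placeholder values.

```python
# Sketch of the PnP step: recover the target pose from >= 4 matched
# 2D/3D key points. Camera intrinsics here are placeholder values.
import numpy as np
import cv2

def pose_from_keypoints(pts_2d, pts_3d, fx, fy, cx, cy):
    camera_matrix = np.array([[fx, 0, cx],
                              [0, fy, cy],
                              [0,  0,  1]], dtype=np.float64)
    dist_coeffs = np.zeros(5)  # assume an undistorted image
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(pts_3d, dtype=np.float64),   # 3D points on the model
        np.asarray(pts_2d, dtype=np.float64),   # their 2D projections
        camera_matrix, dist_coeffs,
        flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)   # rotation matrix R
    return R, tvec               # pose (R, T) of the target in the camera frame
```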
  • before the server sends the pose of the target to the first electronic device, the method further includes:
  • based on the pose of the target, the server reprojects the three-dimensional model corresponding to the target onto the image to be processed to obtain the rendered image;
  • the server performs M optimization processes, where M is a positive integer.
  • the optimization process includes: calculating the optimized pose based on the rendered image and the image to be processed; the server calculates the pose error based on the image to be processed and the optimized pose;
  • when the pose error is less than the preset error value, the server updates the pose of the target to the optimized pose; when the pose error is not less than the preset error value, the server updates the rendered image based on the optimized pose and performs the optimization process again.
  • the server can optimize the pose through the above reprojection process to improve the accuracy of positioning.
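  • The loop below is only a structural sketch of this reprojection-based refinement; render_model, refine_pose and pose_error are hypothetical helpers standing in for the rendering, optimization and error-measurement steps, which this application does not tie to a specific algorithm.

```python
# Structural sketch of the M-step reprojection refinement. The helpers
# render_model, refine_pose and pose_error are hypothetical placeholders
# for the rendering, optimization and error computation described above.
def refine_by_reprojection(image, model_3d, pose, render_model,
                           refine_pose, pose_error,
                           max_iters=5, error_threshold=1.0):
    rendered = render_model(model_3d, pose)          # reproject model at current pose
    for _ in range(max_iters):                       # at most M optimization passes
        pose = refine_pose(rendered, image, pose)    # optimized pose from the two images
        err = pose_error(image, pose)                # pose error against the input image
        if err < error_threshold:
            return pose                              # accept the optimized pose
        rendered = render_model(model_3d, pose)      # otherwise re-render and iterate
    return pose
```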
  • before the server receives the positioning request sent by the first electronic device, the method further includes:
  • the three-dimensional model corresponding to the target is sent to the first electronic device, and the three-dimensional model is used by the first electronic device to track the target in combination with the pose of the target.
  • after obtaining the three-dimensional model corresponding to the target, the server can send the three-dimensional model to the first electronic device; the first electronic device can store the three-dimensional model, so that, after receiving the pose of the target, the first electronic device no longer needs to obtain the 3D model from the server but directly retrieves the 3D model from storage to track the target.
  • This method can improve the efficiency of target tracking by the electronic device by sending the three-dimensional model to the electronic device in advance.
  • the target can also be tracked by the server.
  • the electronic device can send each acquired image frame to the server; the server can perform target positioning on key frame images and then, based on the positioning result of the target in the earlier of two key frames, perform target tracking on the images between those two key frames. The method by which the server positions the target can be consistent with the positioning method described above, and the method by which the server tracks the target can be consistent with the tracking method of the electronic device described above; furthermore, the server sends the tracking results of the target to the electronic device.
  • embodiments of the present application provide a target positioning method applied to electronic devices.
  • the method includes:
  • the electronic device acquires the image to be processed
  • the electronic device sends a positioning request to the server, and the positioning request includes the image to be processed; the positioning request is used to request identification of the target in the image to be processed and to obtain the first pose of the target in the image to be processed; the first pose is obtained by the server based on the target pose estimation model corresponding to the target, as queried from the pose estimation model library.
  • the pose estimation model library includes pose estimation models corresponding to multiple objects;
  • the electronic device receives the first pose sent by the server
  • the electronic device renders virtual information on the image to be processed based on the first pose
  • the electronic device displays the rendered image.
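  • On the device side, the second-aspect flow could look roughly like the sketch below; the /locate endpoint, the JSON field names and the overlay helper are assumptions made for illustration, not an interface defined by this application.

```python
# Hedged sketch of the device-side flow: send the image to the server,
# receive the first pose, render virtual information and display it.
# The URL, JSON fields and the overlay drawing are assumptions.
import cv2
import requests

def draw_virtual_info(frame, pose):
    # Placeholder overlay: a real app would project virtual content using the pose.
    cv2.putText(frame, f"pose: {pose}", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return frame

def locate_and_render(frame, server_url="http://server.example/locate"):
    ok, jpeg = cv2.imencode(".jpg", frame)
    if not ok:
        return frame
    resp = requests.post(server_url, files={"image": jpeg.tobytes()}, timeout=5)
    result = resp.json()                      # e.g. {"target_id": ..., "pose": ...}
    first_pose = result.get("pose")
    if first_pose is None:
        return frame                          # no registered target recognized
    rendered = draw_virtual_info(frame, first_pose)
    cv2.imshow("positioning", rendered)
    cv2.waitKey(1)
    return rendered
```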
  • the method further includes:
  • the electronic device acquires the current image frame, and the current frame image is the image acquired after acquiring the image to be processed;
  • the electronic device determines the second pose of the target in the current image frame based on the current image frame, the three-dimensional model corresponding to the target, and the first pose.
  • the electronic device determines the second pose of the target in the current image frame based on the current image frame, the three-dimensional model of the target and the first pose, which includes:
  • the electronic device receives the identification of the target sent by the server
  • the electronic device obtains the three-dimensional model from the storage based on the identification of the target.
  • the three-dimensional model is obtained by the electronic device from the server and stored in the storage.
  • determining the second pose of the target in the current image frame includes:
  • the electronic device performs N optimization processes, where N is a positive integer.
  • the optimization process includes: the electronic device calculates a pose correction amount based on the energy function, the three-dimensional model and the second pose; updates the second pose based on the pose correction amount; and calculates the energy function value based on the energy function and the updated second pose;
  • when the energy function value meets the preset condition, the electronic device outputs the second pose; otherwise, the optimization process is performed again.
  • the energy function includes at least one of a gravity axis constraint, a region constraint, a pose estimation algorithm constraint, or a regularization constraint
  • the gravity axis constraint term is used to constrain the error of the second pose in the direction of the gravity axis; the region constraint term is used to constrain the contour error of the target under the second pose based on the pixel values of the current image frame; the pose estimation algorithm constraint term is used to constrain the error of the second pose relative to an estimated pose,
  • where the estimated pose is obtained based on the first pose and the pose of the electronic device given by the simultaneous localization and mapping (SLAM) algorithm; the regularization constraint term is used to constrain the contour error of the second pose based on the three-dimensional model corresponding to the target.
  • the electronic device optimizes the tracking pose of the target through the above energy function, which can greatly improve the accuracy of the tracked pose.
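  • One plausible way to write the combined energy, purely to illustrate how the four constraint terms might be weighted and summed (the weights and the exact form of each term are not specified by this application), is:

```latex
% Assumed illustrative form of the tracking energy over the second pose p
E(p) = \lambda_{g}\,E_{\mathrm{gravity}}(p)
     + \lambda_{r}\,E_{\mathrm{region}}(p)
     + \lambda_{e}\,E_{\mathrm{estimate}}(p)
     + \lambda_{s}\,E_{\mathrm{regular}}(p)
```

  • Each optimization pass would then compute a correction that decreases E(p), update the second pose, and stop once E(p) meets the preset condition.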
  • before the electronic device sends the positioning request to the server, the method includes:
  • the electronic device displays a user interface, and the user interface includes registration controls;
  • when the electronic device detects a user operation on the registration control, it sends the three-dimensional model corresponding to the target to the server.
  • the three-dimensional model is obtained by the electronic device receiving external input or generated by the electronic device based on user operations.
  • the embodiments of this application provide an exemplary registration interface for users to register locatable targets.
  • the user can send the three-dimensional model corresponding to the target to the server through operations on the registration interface displayed on the electronic device, so that the server can generate a pose estimation model corresponding to the target based on the three-dimensional model, thereby increasing the number of targets that can be identified.
  • this application provides a server.
  • the server may include memory and a processor.
  • memory can be used to store computer programs.
  • the processor may be used to call a computer program so that the server executes the method of the first aspect or any possible implementation of the first aspect.
  • the present application provides an electronic device.
  • the electronic device may include memory and a processor.
  • memory can be used to store computer programs.
  • the processor may be used to call a computer program so that the electronic device executes the second aspect or any possible implementation of the second aspect.
  • the present application provides a computer program product containing instructions, which when the computer program product is run on an electronic device, causes the electronic device to execute the first aspect or any possible implementation of the first aspect.
  • the present application provides a computer program product containing instructions, which when the computer program product is run on an electronic device, causes the electronic device to execute the second aspect or any possible implementation of the second aspect.
  • the present application provides a computer-readable storage medium, including instructions, which when the above instructions are run on an electronic device, cause the electronic device to execute the first aspect or any possible implementation of the first aspect.
  • the present application provides a computer-readable storage medium, including instructions, which when the above instructions are run on an electronic device, cause the electronic device to execute the second aspect or any possible implementation of the second aspect.
  • embodiments of the present application provide a target positioning system.
  • the target positioning system includes a server and an electronic device.
  • the server is the server described in the third aspect
  • the electronic device is the electronic device described in the fourth aspect.
  • the server provided by the third aspect, the electronic device provided by the fourth aspect, the computer program products provided by the fifth and sixth aspects, and the computer-readable storage media provided by the seventh and eighth aspects are all used to execute the methods provided by the embodiments of this application. Therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding methods, which will not be repeated here.
  • Figure 1 is a schematic architectural diagram of a target positioning system provided by an embodiment of the present application.
  • Figure 2A is a schematic diagram of the hardware structure of an electronic device 100 provided by an embodiment of the present application.
  • Figure 2B is a software structure block diagram of the electronic device 100 provided by the embodiment of the present application.
  • Figure 3 is a target positioning method provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of a scenario provided by an embodiment of the present application.
  • Figure 5 is a flow chart of another target positioning method provided by an embodiment of the present application.
  • Figure 6 is a schematic diagram of a subject detection model provided by an embodiment of the present application.
  • Figure 7 is a schematic diagram of determining an image block where a target is located according to an embodiment of the present application.
  • Figure 8 is a schematic diagram of a server storing data provided by an embodiment of the present application.
  • Figure 9 is a flow chart of a method for tracking a target provided by an embodiment of the present application.
  • Figures 10A-10D are user interfaces implemented on the electronic device provided by the embodiment of the present application.
  • The terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly specifying the quantity of the indicated technical features. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of such features. In the description of the embodiments of this application, unless otherwise specified, "plurality" means two or more.
  • GUI graphical user interface
  • the pose of an object refers to the position and attitude of the object in a certain coordinate system, which can be described by the relative attitude of the coordinate system attached to the object.
  • an object can be represented by coordinate system B attached to the object, then the attitude of the object relative to coordinate system A is equivalent to the attitude of coordinate system B relative to coordinate system A.
  • for example, the pose of a robot relative to the environment coordinate system F0 is equivalent to the pose of the robot coordinate system F2 relative to the environment coordinate system F0.
  • the attitude of coordinate system B relative to coordinate system A can be expressed by the rotation matrix R and the translation matrix T.
  • the pose of coordinate system B relative to coordinate system A can therefore be expressed as the pair (R, T), or equivalently as the homogeneous transformation matrix $T^{A}_{B} = \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix}$, and the pose of the object relative to coordinate system A can be expressed by the same matrix. It should be noted that when coordinate system A is the environment coordinate system F0, the pose of the object can be expressed by $T^{F0}_{B}$.
  • SLAM (simultaneous localization and mapping) can be divided into laser SLAM based on lidar and vision-based VSLAM;
  • laser SLAM is based on the point cloud information returned by the lidar;
  • visual SLAM is based on the image information returned by the camera.
  • the pixel coordinate system is a two-dimensional coordinate system, the unit of the pixel coordinate system is pixel, and the coordinate origin O is in the upper left corner.
  • a pixel coordinate system can be established.
  • the units of the abscissa u and ordinate v of the pixel coordinate system are both pixels. It can be understood that the abscissa u and ordinate v of a pixel in the image respectively indicate the number of columns and rows where the pixel is located.
  • the PnP algorithm is a method for solving the motion of 3D to 2D point pairs. It is used to estimate the pose of an object based on n 3D space points and the projected positions of n 3D space points.
  • n 3D space points are points on the object, and n is a positive integer.
  • the projection positions of n 3D space points can be obtained based on the structured light system, and the projection positions of n 3D space points can be expressed by coordinates in the pixel coordinate system.
  • PnP here refers to Perspective-n-Point;
  • common solution methods include direct linear transformation (DLT), EPnP, and methods such as P3P that use three pairs of points to estimate the pose.
  • nonlinear optimization can be used to construct a least squares problem and solve it iteratively.
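  • As one assumed way to realize this, the sketch below refines an initial PnP solution (rvec, tvec) by minimizing the reprojection error as a least-squares problem with SciPy; it is an illustrative implementation, not the specific method claimed by this application.

```python
# Sketch: nonlinear refinement of a PnP solution by minimizing the
# reprojection error as a least-squares problem (assumed implementation).
import numpy as np
import cv2
from scipy.optimize import least_squares

def refine_pnp(pts_3d, pts_2d, camera_matrix, rvec0, tvec0):
    pts_3d = np.asarray(pts_3d, dtype=np.float64)
    pts_2d = np.asarray(pts_2d, dtype=np.float64)

    def residuals(x):
        rvec, tvec = x[:3].reshape(3, 1), x[3:].reshape(3, 1)
        proj, _ = cv2.projectPoints(pts_3d, rvec, tvec, camera_matrix, None)
        return (proj.reshape(-1, 2) - pts_2d).ravel()  # per-point reprojection error

    x0 = np.hstack([np.ravel(rvec0), np.ravel(tvec0)])
    sol = least_squares(residuals, x0, method="lm")    # iterative Levenberg-Marquardt
    return sol.x[:3], sol.x[3:]                        # refined rvec, tvec
```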
  • a three-dimensional model can refer to a triangular mesh model of an object, that is, a Computer Aided Design (CAD) model.
  • a rigid body is an object whose shape and size remain unchanged when acted upon by forces during motion, and whose internal points keep their relative positions. Absolutely rigid bodies do not actually exist and are only an idealized model, because any object deforms to a greater or lesser extent under force. If the degree of deformation is extremely small relative to the geometric size of the object itself, the deformation is negligible when studying the object's motion.
  • the position of a rigid body in space is determined by the spatial position of any point in the rigid body and the orientation of the rigid body as it rotates around that point, so a rigid body has six degrees of freedom in space.
  • the three-dimensional model of the target may be three-dimensional spatial data used to represent a certain rigid body, such as a CAD model or point cloud model of the rigid body.
  • Figure 1 is a schematic architectural diagram of a target positioning system provided by an embodiment of the present application.
  • the system may include a first electronic device 101, a server 102 and a second electronic device 103.
  • the second electronic device 103 may include a plurality of electronic devices, wherein:
  • the first electronic device 101 is used for user registration of identifiable targets.
  • the first electronic device may receive a registration operation input by the user and send to the server 102 a three-dimensional model corresponding to the target that the user wants to register.
  • the first electronic device 101 may also include multiple electronic devices. For example, multiple users may send three-dimensional models corresponding to different targets to the server through different electronic devices.
  • the server 102 is used to generate a pose estimation model, a point cloud model and a feature vector corresponding to the target based on the three-dimensional model corresponding to the target.
  • the point cloud model is the three-dimensional model converted into a preset point cloud format;
  • the server is also used to locate the target in the image to be processed based on the three-dimensional model, pose estimation model and point cloud model corresponding to the target.
  • the server can be a cloud server or an edge server.
  • the second electronic device 103 is used to send the image to be processed to the server; to receive the pose of the target in the image to be processed sent by the server; and to track the target based on the pose of the target.
  • first electronic device 101 and the second electronic device 103 may also be the same electronic device, that is to say, the electronic device has the functions of the first electronic device 101 and the second electronic device 103 .
  • after the user registers an identifiable target on the first electronic device, the user can identify and locate the target through the target application of the first electronic device.
  • a user may register an identifiable target via the first electronic device 101 .
  • when the first electronic device 101 detects a user operation of registering an identifiable target, it can send the three-dimensional model corresponding to the target to the server 102; the server 102 can generate the pose estimation model, point cloud model and feature vector corresponding to the target based on the three-dimensional model; furthermore, the server 102 can save the pose estimation model, point cloud model and feature vector corresponding to the target, and send the point cloud model corresponding to the target to all second electronic devices 103 on which the target application is installed.
  • the target applications can be Huawei AR Map, Hetu Map Tool, Hetu City Check-in Application, etc., which can be used for special effects display, virtual spraying and stylization, etc.
  • the user can identify the target through the second electronic device 103.
  • the target is an object registered in the server, that is, the server includes the pose estimation model, point cloud model and feature vector corresponding to the target.
  • the second electronic device 103 detects a user operation to identify the target, it may send a recognition request to the server 102.
  • the recognition request may include the image to be processed; the server 102 locates the target in the image to be processed based on the pose estimation model, point cloud model and feature vector generated from the three-dimensional model corresponding to the target, and obtains the pose of the target; the server 102 can then send the pose of the target to the second electronic device 103.
  • the second electronic device 103 can also track the target based on the pose of the target and the point cloud model corresponding to the target.
  • the above-mentioned first electronic device 101 and the above-mentioned second electronic device 103 are both electronic devices, and the electronic devices may be mobile phones, tablet computers, desktop computers, laptop computers, handheld computers, notebook computers, Ultra-mobile personal computers (UMPCs), netbooks, and cellular phones, personal digital assistants (PDAs), augmented reality (AR) devices, virtual reality (VR) devices , artificial intelligence (AI) devices, wearable devices, vehicle-mounted devices, smart home devices and/or smart city devices, etc.
  • FIG. 2A exemplarily shows a schematic diagram of the hardware structure of the electronic device 100.
  • electronic device 100 may have more or fewer components than shown in the figures, may combine two or more components, or may have a different component configuration.
  • the various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software including one or more signal processing and/or application specific integrated circuits.
  • the electronic device 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, etc.
  • the sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown in the figures, or some components may be combined, some components may be separated, or some components may be arranged differently.
  • the components illustrated may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units.
  • the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • the controller may be the nerve center and command center of the electronic device 100 .
  • the controller can generate operation control signals based on the instruction operation code and timing signals to complete the control of fetching and executing instructions.
  • the processor 110 may also be provided with a memory for storing instructions and data.
  • the memory in processor 110 is cache memory. This memory may hold instructions or data that have been recently used or recycled by processor 110 . If the processor 110 needs to use the instructions or data again, it can be called directly from the memory. Repeated access is avoided and the waiting time of the processor 110 is reduced, thus improving the efficiency of the system.
  • processor 110 may include one or more interfaces.
  • Interfaces may include integrated circuit (inter-integrated circuit, I2C) interface, integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, pulse code modulation (pulse code modulation, PCM) interface, universal asynchronous receiver and transmitter (universal asynchronous receiver/transmitter (UART) interface, mobile industry processor interface (MIPI), general-purpose input/output (GPIO) interface, subscriber identity module (SIM) interface, and /or universal serial bus (USB) interface, etc.
  • the I2C interface is a bidirectional synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL).
  • processor 110 may include multiple sets of I2C buses.
  • the processor 110 can separately couple the touch sensor 180K, charger, flash, camera 193, etc. through different I2C bus interfaces.
  • the processor 110 can be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to implement the touch function of the electronic device 100 .
  • the I2S interface can be used for audio communication.
  • processor 110 may include multiple sets of I2S buses.
  • the processor 110 can be coupled with the audio module 170 through the I2S bus to implement communication between the processor 110 and the audio module 170 .
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the I2S interface to implement the function of answering calls through a Bluetooth headset.
  • the PCM interface can also be used for audio communications to sample, quantize and encode analog signals.
  • the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
  • the audio module 170 can also transmit audio signals to the wireless communication module 160 through the PCM interface to implement the function of answering calls through a Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus can be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • a UART interface is generally used to connect the processor 110 and the wireless communication module 160 .
  • the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function.
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the UART interface to implement the function of playing music through a Bluetooth headset.
  • the MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193 .
  • MIPI interfaces include camera serial interface (CSI), display serial interface (DSI), etc.
  • the processor 110 and the camera 193 communicate through the CSI interface to implement the shooting function of the electronic device 100 .
  • the processor 110 and the display screen 194 communicate through the DSI interface to implement the display function of the electronic device 100 .
  • the GPIO interface can be configured through software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface can be used to connect the processor 110 with the camera 193, display screen 194, wireless communication module 160, audio module 170, sensor module 180, etc.
  • the GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the SIM interface can be used to communicate with the SIM card interface 195 to implement the function of transmitting data to the SIM card or reading data in the SIM card.
  • the USB interface 130 is an interface that complies with the USB standard specification, and may be a Mini USB interface, a Micro USB interface, a USB Type C interface, etc.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transmit data between the electronic device 100 and peripheral devices. It can also be used to connect headphones to play audio through them. This interface can also be used to connect other electronic devices, such as AR devices, etc.
  • the interface connection relationships between the modules illustrated in the embodiments of the present application are only schematic illustrations and do not constitute a structural limitation of the electronic device 100 .
  • the electronic device 100 may also adopt different interface connection methods in the above embodiments, or a combination of multiple interface connection methods.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, internal memory 121, external memory, display screen 194, camera 193, wireless communication module 160, etc.
  • the wireless communication function of the electronic device 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization. For example: Antenna 1 can be reused as a diversity antenna for a wireless LAN. In other embodiments, antennas may be used in conjunction with tuning switches.
  • the mobile communication module 150 can provide solutions for wireless communication including 2G/3G/4G/5G applied on the electronic device 100 .
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc.
  • the mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform filtering, amplification and other processing on the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves through the antenna 1 for radiation.
  • at least part of the functional modules of the mobile communication module 150 may be disposed in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
  • a modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low-frequency baseband signal to be sent into a medium-high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
  • the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the application processor outputs sound signals through audio devices (not limited to speaker 170A, receiver 170B, etc.), or displays images or videos through display screen 194.
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent of the processor 110 and may be provided in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area network (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), etc.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110, frequency modulate it, amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.
  • the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the Beidou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS) and/or satellite-based augmentation systems (SBAS).
  • the electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is an image processing microprocessor and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • the display screen 194 is used to display images, videos, etc.
  • Display 194 includes a display panel.
  • the display panel can use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Miniled, MicroLed, Micro-oLed, a quantum dot light-emitting diode (QLED), etc.
  • the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
  • the electronic device 100 can implement the shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
  • the ISP is used to process the data fed back by the camera 193. For example, when taking a photo, the shutter is opened, the light is transmitted to the camera sensor through the lens, the optical signal is converted into an electrical signal, and the camera sensor passes the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye. ISP can also perform algorithm optimization on image noise, brightness, and skin color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene. In some embodiments, the ISP may be provided in the camera 193.
  • Camera 193 is used to capture still images or video.
  • the object passes through the lens to produce an optical image that is projected onto the photosensitive element.
  • the photosensitive element can be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then passes the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other format image signals.
  • the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy.
  • Video codecs are used to compress or decompress digital video.
  • Electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in multiple encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
  • NPU is a neural network (NN) computing processor.
  • Intelligent cognitive applications of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, etc.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function. Such as saving music, videos, etc. files in external memory card.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the processor 110 executes instructions stored in the internal memory 121 to execute various functional applications and data processing of the electronic device 100 .
  • the internal memory 121 may include a program storage area and a data storage area.
  • the stored program area can store the operating system, at least one application required for the function (such as face recognition function, fingerprint recognition function, mobile payment function, etc.).
  • the storage data area can store data created during the use of the electronic device 100 (such as face information template data, fingerprint information templates, etc.).
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, universal flash storage (UFS), etc.
  • the electronic device 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor. Such as music playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signals. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
  • Speaker 170A also called “speaker” is used to convert audio electrical signals into sound signals.
  • the electronic device 100 can listen to music through the speaker 170A, or listen to hands-free calls.
  • Receiver 170B also called “earpiece” is used to convert audio electrical signals into sound signals.
  • the electronic device 100 answers a call or a voice message, the voice can be heard by bringing the receiver 170B close to the human ear.
  • Microphone 170C, also called a "mic" or "mike", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak close to the microphone 170C to input the sound signal into the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In this embodiment of the present application, the electronic device 100 can be provided with two microphones 170C, which in addition to collecting sound signals can also implement a noise reduction function. In the embodiment of the present application, the electronic device 100 can also be equipped with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions, etc.
  • the headphone interface 170D is used to connect wired headphones.
  • the headphone interface 170D may be a USB interface 130, or may be a 3.5mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense pressure signals and can convert the pressure signals into electrical signals.
  • pressure sensor 180A may be disposed on display screen 194 .
  • the capacitive pressure sensor may be composed of at least two parallel plates of conductive material.
  • the electronic device 100 determines the intensity of the pressure based on the change in capacitance.
  • the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A.
  • the electronic device 100 may also calculate the touched position based on the detection signal of the pressure sensor 180A.
  • touch operations acting on the same touch location but with different touch operation intensities may correspond to different operation instructions. For example: when a touch operation with a touch operation intensity less than the first pressure threshold is applied to the short message application icon, an instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold is applied to the short message application icon, an instruction to create a new short message is executed.
  • the gyro sensor 180B may be used to determine the motion posture of the electronic device 100 .
  • the angular velocity of the electronic device 100 about three axes (i.e., the x, y, and z axes) can be determined by the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization. For example, when the shutter is pressed, the gyro sensor 180B detects the angle at which the electronic device 100 shakes, calculates the distance that the lens module needs to compensate based on the angle, and allows the lens to offset the shake of the electronic device 100 through reverse movement to achieve anti-shake.
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenes.
  • Air pressure sensor 180C is used to measure air pressure. In some embodiments, the electronic device 100 calculates the altitude through the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
  • Magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 may utilize the magnetic sensor 180D to detect opening and closing of the flip holster.
  • the electronic device 100 may detect the opening and closing of the flip according to the magnetic sensor 180D. Then, based on the detected opening and closing status of the leather case or the opening and closing status of the flip cover, features such as automatic unlocking of the flip cover are set.
  • the acceleration sensor 180E can detect the acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of electronic devices and be used in horizontal and vertical screen switching, pedometer and other applications.
  • Distance sensor 180F for measuring distance.
  • Electronic device 100 can measure distance via infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 may utilize the distance sensor 180F to measure distance to achieve fast focusing.
  • Proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the electronic device 100 emits infrared light outwardly through the light emitting diode.
  • Electronic device 100 uses photodiodes to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100 . When insufficient reflected light is detected, the electronic device 100 may determine that there is no object near the electronic device 100 .
  • the electronic device 100 can use the proximity light sensor 180G to detect when the user holds the electronic device 100 close to the ear for talking, so as to automatically turn off the screen to save power.
  • the proximity light sensor 180G can also be used in holster mode and pocket mode to automatically unlock and lock the screen.
  • the ambient light sensor 180L is used to sense ambient light brightness.
  • the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in the pocket to prevent accidental touching.
  • Fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to achieve fingerprint unlocking, access to application locks, fingerprint photography, fingerprint answering of incoming calls, etc.
  • Temperature sensor 180J is used to detect temperature.
  • the electronic device 100 utilizes the temperature detected by the temperature sensor 180J to execute the temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of a processor located near the temperature sensor 180J in order to reduce power consumption and implement thermal protection.
  • the electronic device 100 when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to avoid the low temperature causing the electronic device 100 to shut down abnormally.
  • when the temperature is below yet another threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
  • Touch sensor 180K also called “touch panel”.
  • the touch sensor 180K can be disposed on the display screen 194.
  • the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen”.
  • the touch sensor 180K is used to detect a touch operation on or near the touch sensor 180K.
  • the touch sensor can pass the detected touch operation to the application processor to determine the touch event type.
  • Visual output related to the touch operation may be provided through display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100 in a position different from that of the display screen 194 .
  • the buttons 190 include a power button, a volume button, etc.
  • Key 190 may be a mechanical key. It can also be a touch button.
  • the electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
  • the motor 191 can generate vibration prompts.
  • the motor 191 can be used for vibration prompts for incoming calls and can also be used for touch vibration feedback.
  • touch operations for different applications can correspond to different vibration feedback effects.
  • the motor 191 can also produce different vibration feedback effects for touch operations in different areas of the display screen 194.
  • Different application scenarios (such as time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also be customized.
  • the indicator 192 may be an indicator light, which may be used to indicate charging status and power changes, or to indicate messages, missed calls, notifications, etc.
  • the SIM card interface 195 is used to connect a SIM card.
  • the SIM card can be connected to or separated from the electronic device 100 by inserting it into the SIM card interface 195 or pulling it out from the SIM card interface 195 .
  • the electronic device 100 can support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
  • SIM card interface 195 can support Nano SIM card, Micro SIM card, SIM card, etc. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the plurality of cards may be the same or different.
  • the SIM card interface 195 is also compatible with different types of SIM cards.
  • the SIM card interface 195 is also compatible with external memory cards.
  • the electronic device 100 interacts with the network through the SIM card to implement functions such as calls and data communications.
  • the electronic device 100 can execute the target positioning method through the processor 110 .
  • FIG. 2B is a software structure block diagram of the electronic device 100 provided by the embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has clear roles and division of labor.
  • the layers communicate through software interfaces.
  • the Android system is divided into four layers, from top to bottom: application layer, application framework layer, Android runtime and system libraries, and kernel layer.
  • the application layer can include a series of application packages.
  • the application package can include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, and target application (such as positioning application).
  • the user can send the CAD model of the target to the server through the target application, and can also identify, locate, and track the target through the target application. For details, see the relevant description below.
  • the application framework layer provides an application programming interface (API) and programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer may include a display manager, a sensor manager, a cross-device connection manager, an event manager, an activity manager, a window manager, a content provider, a view system, a resource manager, a notification manager, etc.
  • the display manager is used for system display management and is responsible for the management of all display-related transactions, including creation, destruction, orientation switching, size and status changes, etc.
  • the sensor manager is responsible for managing the status of sensors, managing applications to monitor sensor events, and reporting events to applications in real time.
  • Cross-device connection manager is used to establish communication connections with other devices to share resources.
  • the event manager is used for the event management service of the system; it is responsible for receiving the events reported by the underlying layer and distributing them to each window, completing the reception and distribution of events.
  • the task manager is used for the management of task (Activity) components, including startup management, life cycle management, task direction management, etc.
  • a window manager is used to manage window programs.
  • the window manager can obtain the display size, determine whether there is a status bar, lock the screen, capture the screen, etc.
  • the window manager is also responsible for window display management, including window display mode, display size, display coordinate position, display level and other related management.
  • Content providers are used to store and retrieve data and make this data accessible to applications.
  • Said data can include videos, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
  • the view system includes visual controls, such as controls that display text, controls that display pictures, etc.
  • a view system can be used to build applications.
  • the display interface can be composed of one or more views.
  • a display interface including a text message notification icon may include a view for displaying text and a view for displaying pictures.
  • the resource manager provides various resources to applications, such as localized strings, icons, pictures, layout files, video files, etc.
  • the notification manager allows applications to display notification information in the status bar, which can be used to convey notification-type messages and can automatically disappear after a short stay without user interaction.
  • the notification manager is used to notify download completion, message reminders, etc.
  • the notification manager can also manage notifications that appear in the status bar at the top of the system in the form of charts or scroll bar text, such as notifications for applications running in the background, or notifications that appear on the screen in the form of dialog windows; for example, text information is prompted in the status bar, a beep sounds, the electronic device vibrates, or the indicator light flashes.
  • Android Runtime includes core libraries and virtual machines. Android runtime is responsible for the scheduling and management of the Android system.
  • the core library contains two parts: one part consists of the functions that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and application framework layer run in virtual machines.
  • the virtual machine converts the java files of the application layer and the application framework layer into binary files and executes them.
  • the virtual machine is used to perform object life cycle management, stack management, thread management, security and exception management, and garbage collection and other functions.
  • the system library can include multiple functional modules. For example: surface manager (surface manager), media libraries (Media Libraries), 3D graphics processing libraries (for example: OpenGL ES), 2D graphics engines (for example: SGL) and event data, etc.
  • the surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as static image files, etc.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, composition, and layer processing.
  • 2D Graphics Engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display driver, camera driver, audio driver, and sensor driver.
  • Figure 3 is a target positioning method provided by an embodiment of the present application.
  • any one of the first user operation or the second user operation may be the user's touch operation (such as a click operation, a long press operation, a slide-up operation, a slide-down operation, or a side-swipe operation), a non-contact operation (such as an air gesture), or the user's voice command, which is not specifically limited in the embodiments of the present application.
  • the method includes the following steps:
  • When detecting the first user operation, the first electronic device sends the CAD model corresponding to the target to the server.
  • the object that needs to be positioned is the above-mentioned target, and the object can be an object or a person, which is not limited here; the CAD model can be spatial point cloud model data or mesh model data. It should be noted that the objects targeted by the embodiments of the present application are generally rigid bodies, such as stationary vehicles and cultural relics in museums.
  • the target application can be installed on the first electronic device, and the electronic device can display a relevant user interface of the target application, for example, an interface with a control for confirming the upload; when the first electronic device detects the user's operation on the control, the first electronic device may send the CAD model corresponding to the target to the server.
  • the CAD model corresponding to the target may be generated by the first electronic device.
  • for example, the user generates the CAD model corresponding to the target through the target application or other applications; the CAD model may also be obtained by the first electronic device from other devices, which is not limited here.
  • the first electronic device can also send the auxiliary information of the target to the server, where the auxiliary information of the target can be information such as the name of the target or characteristics of the target; the auxiliary information of the target can be used by the electronic device to render virtual information on images that include the target.
  • the user can draw a CAD model of the vehicle on the first electronic device.
  • when the first electronic device detects the user's drawing operation, in response to the user's drawing operation, the first electronic device can generate a CAD model of the vehicle; the first electronic device may also send the CAD model of the vehicle to the server when detecting the user's operation of uploading the CAD model of the vehicle to the server.
  • the user can also send the vehicle's auxiliary information to the server, such as vehicle length and vehicle width.
  • when the first electronic device detects the user's operation of uploading the vehicle's auxiliary information to the server, the first electronic device can send the vehicle's auxiliary information to the server.
  • the applications used by the first electronic device to generate the CAD model and to send the CAD model may be the same application, or they may be different applications.
  • the first electronic device can also be multiple electronic devices, that is, different users can send CAD models corresponding to the targets that the user is interested in to the server through different electronic devices.
  • the server obtains the point cloud model corresponding to the target based on the CAD model corresponding to the target.
  • the server receives the CAD model corresponding to the target sent by the first electronic device, and then converts the CAD model corresponding to the target into a preset point cloud data format to obtain the point cloud model corresponding to the target.
  • the server can also process the position information of the point cloud model corresponding to the target according to the preset coordinate system.
  • the server can also identify noise points in the point cloud model corresponding to the target, and then remove noise points in the target point cloud model corresponding to the target.
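  • As an illustration only (not part of the embodiment), the following sketch shows one possible way to convert a mesh/CAD model into a point cloud, re-center it in a preset coordinate system, and remove noise points; the Open3D library, the sampling count, and the file names are assumptions.

```python
# A minimal sketch (under assumed tooling) of CAD-to-point-cloud conversion and noise removal.
import open3d as o3d

def cad_to_point_cloud(cad_path: str, out_path: str, n_points: int = 20000):
    mesh = o3d.io.read_triangle_mesh(cad_path)                      # load the CAD / mesh model
    pcd = mesh.sample_points_uniformly(number_of_points=n_points)   # convert to a point cloud format
    pcd.translate(-pcd.get_center())                                # re-center in a preset coordinate system
    pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)  # remove noise points
    o3d.io.write_point_cloud(out_path, pcd)                         # store the point cloud model
    return pcd

# Example (illustrative file names): cad_to_point_cloud("vehicle.obj", "vehicle_point_cloud.ply")
```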
  • the server when the server obtains the point cloud model corresponding to the target, it can store the point cloud model in storage.
  • the server can have a three-dimensional model library. After the server obtains the point cloud model corresponding to the target, it can store the point cloud model in the three-dimensional model library.
  • the server can calibrate the target, that is, determine the identity of the target, so that the identity of the target is associated with other data of the target, such as point cloud model and feature vector.
  • the identifier of the target can be the identity identifier of the target (Identification, ID), such as the fine-grained category of the target; taking the target as a vehicle as an example, the identity of the vehicle can be the name of the vehicle.
  • the server stores the point cloud model corresponding to the target in the three-dimensional model library. When the server receives a query for the point cloud model corresponding to the target, it can query the point cloud model corresponding to the target from the three-dimensional model library based on the target's identification.
  • the server may have different databases, such as any one or more of a CAD model library or a point cloud model library.
  • the server may store the received CAD model in the CAD model library and store the point cloud model in the point cloud model library.
  • the server can obtain data corresponding to the target, such as a point cloud model, from the database based on the target's identification.
  • the server sends the point cloud model corresponding to the target to the second electronic device.
  • the second electronic device may be one or more electronic devices.
  • the second electronic device may be an electronic device with the target application installed. Then, after obtaining the point cloud model corresponding to the target, the server may send the point cloud model corresponding to the target to all electronic devices installed with the target application.
  • the server can add a point cloud model of the positionable object in the second electronic device without the user being aware of it. It can be understood that after the server obtains the point cloud model corresponding to the target, it sends the point cloud model corresponding to the target to the second electronic device, and then the second electronic device can store the point cloud model, and when the target needs to be tracked The point cloud model can be obtained directly from storage to improve tracking efficiency.
  • the first electronic device and the second electronic device may also be the same electronic device.
  • the server renders the point cloud model corresponding to the target based on the automated rendering algorithm and generates multiple training images of the target.
  • the automated rendering algorithm can use neural rendering technology (Neural render).
  • the server can take the point cloud model corresponding to the target as input, select different perspective poses, simulate different background environments to render the point cloud model, and generate rendered images with pose labels, thereby obtaining multiple training images.
  • the generated rendered images with pose labels can have different lighting and different backgrounds, different lighting but the same background, or the same lighting but different backgrounds, which is not limited here; the perspective poses and the background environments can be preset data.
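  • For illustration, the sketch below shows how pose-labelled training samples could be enumerated by sampling preset perspective poses and background environments; the viewpoint grid, distances, and helper names are assumptions, and the neural rendering step itself is not reproduced here.

```python
# A minimal sketch of enumerating (pose label, background) pairs for rendering training images.
import itertools
import numpy as np

def look_at_pose(azimuth_deg, elevation_deg, distance):
    """Build a camera pose (R, t) looking at the origin from the given spherical viewpoint."""
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    cam = distance * np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])
    z = -cam / np.linalg.norm(cam)                    # viewing direction: camera towards the origin
    x = np.cross(np.array([0.0, 0.0, 1.0]), z)
    x = x / np.linalg.norm(x)
    y = np.cross(z, x)
    R = np.stack([x, y, z], axis=0)                   # world -> camera rotation
    t = -R @ cam                                      # world -> camera translation
    return R, t

def enumerate_training_samples(backgrounds, azimuths=range(0, 360, 30),
                               elevations=(0, 20, 40), distances=(2.0, 3.0)):
    """Yield (pose label, background id) pairs; each pair corresponds to one rendered image."""
    for az, el, d, bg in itertools.product(azimuths, elevations, distances, backgrounds):
        yield look_at_pose(az, el, d), bg
```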
  • the server obtains the pose estimation model and feature vector corresponding to the target based on multiple training images of the target.
  • the server can store the pose estimation model and feature vector in storage.
  • the server has a feature vector retrieval library and a pose estimation model library.
  • the server can store the pose estimation model and the feature vector corresponding to the target into the pose estimation model library and the feature vector retrieval library respectively.
  • the server can train the original pose estimation model based on multiple training images, that is, finetune the original pose estimation model through multiple training images to obtain the pose estimation model corresponding to the target;
  • the server can extract feature vectors from the training images based on the feature extractor to obtain the feature vector corresponding to the target. It can be understood that there is a one-to-one correspondence between the target and the trained pose estimation network.
  • the feature vector of the target extracted by the server can be obtained based on one training image, or it can be obtained based on the feature vectors extracted from multiple training images.
  • for details about the feature extractor, please refer to the related content below.
  • both the pose estimation model corresponding to the target and the feature vector corresponding to the target can be associated with the identity of the target. Then the server can obtain the pose estimation model corresponding to the target and the feature vector corresponding to the target from the pose estimation model library and the feature vector retrieval library respectively based on the identification of the target.
  • When detecting the second user operation, the second electronic device sends a positioning request to the server, where the positioning request includes the image to be processed.
  • the second electronic device may send the image to be processed to the server when detecting the second user operation.
  • for the manner in which the second electronic device obtains the image to be processed, please refer to the relevant content of step S501.
  • the server locates the target in the image to be processed based on the point cloud model, pose estimation model and feature vector corresponding to the target, and obtains the pose of the target.
  • the server can extract the feature vector of the target in the image to be processed; based on the extracted feature vector, obtain the feature vector corresponding to the target in the server storage; and then, based on the feature vector corresponding to the target, determine the identity of the target; based on the target identification, determine the pose estimation model and point cloud model corresponding to the target; based on the pose estimation model and point cloud model corresponding to the target, locate the target in the image to be processed, and obtain the pose of the target.
  • regarding the specific process of the server positioning the target, please refer to the relevant content in the following steps S503 to S508.
  • the server sends the target's pose to the second electronic device.
  • the server may send the identification of the target and the pose of the target to the second electronic device.
  • the server may also send the target's auxiliary information to the second electronic device.
  • the second electronic device tracks the target based on the pose of the target and the point cloud model corresponding to the target.
  • the second electronic device stores a point cloud model corresponding to the target, and then after receiving the pose of the target and the identification of the target, the second electronic device can query the point cloud model corresponding to the target based on the identification of the target; Furthermore, the target in the image obtained after the image to be processed can be tracked based on the target's pose and point cloud model.
  • the second electronic device can also display the target's auxiliary information at a preset position of the target on the display screen.
  • the image including the target and the auxiliary information of the target is a combined virtual and real image rendered after the second electronic device locates the target.
  • the auxiliary information of the target displayed on the virtual combined image may also be called virtual information.
  • the target is a vehicle
  • the auxiliary information of the target is the outline of the vehicle and the introduction information of the vehicle.
  • the second electronic device can render the outline of the vehicle on the vehicle in the image to be processed, and render the introduction information of the vehicle at a position around the vehicle.
  • the contour line and the introduction information are virtual content rendered on the image to be processed, the contour line is used to indicate the target, and the introduction information is used to introduce the identified target.
  • for example, if the vehicle is a white vehicle and the contour lines in the content information corresponding to the target are blue, blue contour lines are rendered on the outline of the vehicle in the image displayed on the second electronic device.
  • the above target positioning method may be executed only by the electronic device.
  • the electronic device can generate a point cloud model, pose estimation model, and feature vector corresponding to the target based on the CAD model corresponding to the target, and position the target based on the point cloud model, pose estimation model, and feature vector corresponding to the target;
  • the first electronic device can generate the point cloud model, pose estimation model and feature vector corresponding to the target based on the CAD model corresponding to the target, and send the point cloud model, pose estimation model and feature vector corresponding to the target to the second electronic device
  • the second electronic device can locate the target based on the point cloud model, pose estimation model and feature vector corresponding to the target.
  • the scene includes a user, an electronic device, and a target, where the user is the user of the electronic device, and the target is the object that the electronic device is positioning and tracking, such as a vehicle.
  • a user can hold an electronic device to photograph a vehicle, and then the electronic device can locate the vehicle based on the captured image.
  • the electronic device can also perform any one or more of the following functions: tracking the vehicle, or rendering and displaying the auxiliary information of the vehicle around the vehicle.
  • the image displayed by the electronic device is a rendered image.
  • the rendered image has auxiliary information such as vehicle width and length around the vehicle.
  • the display of the auxiliary information presents an AR effect.
  • the electronic device can position and track the target, thereby displaying auxiliary information of the target on the display screen of the electronic device.
  • the electronic device can be a mobile phone
  • the target can be a cultural relic in the museum
  • the auxiliary information of the target can be the introduction content of the cultural relic, such as the name, date, and historical story of the cultural relic; specifically, this applies to the scenario in which the user is visiting the cultural relic.
  • electronic devices can position and track targets, thereby displaying virtual content corresponding to the targets in the virtual scene.
  • the electronic device can be VR glasses
  • the target can be other game users
  • the virtual content corresponding to the target is the virtual image of other game users in the game scene; specifically, the user can wear VR glasses.
  • the VR glasses can capture images including other game users, and then position and track those game users to display their avatars in the game scene.
  • FIG. 5 schematically shows the flow of another target positioning method provided by the embodiment of the present application. It should be noted that this target positioning method describes a specific implementation of identifying and tracking a target. The embodiment shown in FIG. 5 can be considered as a specific implementation of steps S306 to S309 in FIG. 3 .
  • the user operation may be the user's touch operation (such as a click operation, a long press operation, a slide-up operation, a slide-down operation, or a side-swipe operation), a non-contact operation (such as an air gesture), or the user's voice command, which is not specifically limited in the embodiments of the present application.
  • this targeting method may include some or all of the following steps:
  • the electronic device captures an image to be processed through the camera, and the image to be processed includes the target.
  • the targets can be different objects.
  • in a museum scene, the target can be a cultural relic; in a VR game scene, the target can be a game prop; the target is not limited here.
  • the image to be processed may include multiple objects, among which the object that needs to be positioned is the above-mentioned target; the object may be an object or a person, which is not limited here.
  • there are many objects such as object A, object B, and object C in the image to be processed, and the target that the user wants to identify and locate is object A, then object A is the above-mentioned target.
  • the electronic device is installed with the target application; when the electronic device detects a user operation on the target application, in response to the user operation, the electronic device can turn on the camera and capture the image to be processed.
  • the target applications can be Huawei AR Map, Hetu Map Tool, Hetu City Check-in Application, etc., which can be used for special effects display, virtual spraying and stylization, etc.
  • the user holds an electronic device, points the camera of the electronic device at the target, and inputs a user operation for the target application on the electronic device. Accordingly, the electronic device turns on the camera to shoot, and obtains an image to be processed including the target.
  • the electronic device can acquire multiple images to be processed. Then, the electronic device can perform the target positioning method provided by the embodiment of the present application on one or more images to be processed.
  • the electronic device sends a positioning request to the server.
  • the positioning request includes the image to be processed.
  • the positioning request is used to request positioning of the target in the image to be processed.
  • the electronic device is installed with the target application; when the electronic device detects the user's operation on the target application, in response to the user's operation on the target application, the electronic device can capture the image to be processed through the camera, and then, The image to be processed is sent to the server.
  • the target application displays the user interface 62 as shown in Figure 10B.
  • when the electronic device detects the user's operations on the option bar 622 and the confirmation control 623, in response to these operations, the electronic device can obtain the image to be processed and send the image to be processed to the server.
  • the server identifies the subject from the image to be processed.
  • the server can input the image to be processed into the subject detection model to obtain the position of the target in the image to be processed.
  • the subject detection model can be trained based on sample images and the marked targets; the position of the target in the image to be processed can be represented by the two-dimensional coordinates of points corresponding to the target in the pixel coordinate system corresponding to the image to be processed, for example, the position of the target can be represented by the two-dimensional coordinates of the upper left corner and the lower right corner of a rectangle centered on the geometric center of the target.
  • the subject detection model can use Enhanced-ShuffleNet as the backbone network and Cross-Stage-Partial Path-Aggregation Network (CSP-PAN) as the detection head.
  • the first module in the subject detection model includes a 3×3 convolution layer (3×3 Conv), a maximum pooling layer (Max Pool) and an Enhanced Shuffle Block;
  • the second module includes 1×1 convolution layers (1×1 Conv) with strides of 8, 16 and 32 respectively, an upsample layer (Upsample), cross-stage partial networks (Cross Stage Partial, CSP) and detection heads (Heads) with strides of 8, 16 and 32 respectively; finally, after the three-channel data are processed by the classification (Class) and detection box (Box) branches respectively, the subject detection result, that is, the identified subject, can be output through Non-Maximum Suppression (NMS).
  • the label allocation mechanism can use a label allocation algorithm such as SimOTA.
  • the loss function can use the varifocal loss function (Varifocal Loss) and the generalized intersection over union loss (GIoU Loss). For example, the loss matrix is calculated using a weighted combination of Varifocal Loss and GIoU Loss, and the subject detection model is optimized through this loss matrix; when the loss meets the preset conditions, the above subject detection model is obtained.
  • the subject detection model shown in Figure 6 and the above training method are an implementation provided by the embodiment of the present application.
  • Other neural networks can also be used as the subject detection model, and the training method and loss function can also have other choices.
  • the server may also determine the object with the largest area in the image to be processed or the object at the focus as the target. There is no limitation on the method of the server identifying the target.
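  • As a simple illustration of the Non-Maximum Suppression (NMS) step mentioned above, the following sketch keeps the highest-scoring boxes and suppresses overlapping ones; the 0.5 IoU threshold is an assumption, not a value given by the embodiment.

```python
# A minimal sketch of Non-Maximum Suppression over detected boxes.
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,). Returns kept indices."""
    order = np.argsort(scores)[::-1]                  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_o = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_o - inter)       # overlap with the kept box
        order = order[1:][iou <= iou_thresh]          # suppress heavily overlapping boxes
    return keep
```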
  • the server determines the image block where the target is located from the image to be processed.
  • the server may intercept the image block where the target is located from the image to be processed based on the subject recognized in the image to be processed. It is understandable that the server can subsequently locate the target based on the image block where the target is located, which can avoid the influence of other objects in the image to be processed on locating the target.
  • FIG. 7 is a schematic diagram of determining an image block where a target is located according to an embodiment of the present application.
  • (a) in Figure 7 is the image to be processed, and the image to be processed includes street lights and vehicles; assuming that the vehicle is the target, point A and point B in (b) in Figure 7 can represent, through their two-dimensional coordinates in the pixel coordinate system corresponding to the image to be processed, the position of the subject recognized by the server.
  • the black dots represent point A and point B; further, the server can determine the image block where the target is located based on the two-dimensional coordinates of point A and point B, and the image block where the target is located can be shown in (c) in Figure 7.
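  • The following is a minimal sketch of intercepting the image block where the target is located, given two corner points such as point A and point B in pixel coordinates; the function and variable names are illustrative.

```python
# A minimal sketch of cropping the target's image block from the image to be processed.
import numpy as np

def crop_target_block(image: np.ndarray, point_a, point_b) -> np.ndarray:
    """image: H x W x C array; point_a / point_b: (x, y) pixel coordinates of two corners."""
    (xa, ya), (xb, yb) = point_a, point_b
    x1, x2 = sorted((int(xa), int(xb)))
    y1, y2 = sorted((int(ya), int(yb)))
    return image[y1:y2, x1:x2]          # rows are y, columns are x
```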
  • the server extracts the feature vector from the image block where the target is located.
  • the server can input image patches into a feature extractor to obtain feature vectors.
  • the feature vector may be a 1024-dimensional feature vector; the feature extractor may be trained based on the sample image and the labeled feature vector.
  • the feature extractor can be a convolutional neural network (Convolutional Neural Networks, CNN) or a transformation neural network (Transformer).
  • the feature extractor can be a deep self-attention transform network (Vision Transformer, VIT), such as VIT-16, no limitation here. It should be noted that this feature extractor can also be called a feature extraction network.
  • when the server trains the feature extractor, it can adopt a metric learning (Metric Learning) training strategy, calculate the loss matrix through the angular contrastive loss (Angular Contrastive Loss) function, and train the feature extractor based on the loss matrix to obtain the above feature extractor.
  • the server may not execute steps S504 and S505; instead, after identifying the target from the image to be processed in step S503, it may directly extract the feature vector from the image to be processed based on the position of the target, and then execute step S506. For example, any one or more of the image data or the position of the target can be used as the input of the feature extractor, thereby obtaining the extracted feature vector. It can be understood that in step S505, the server extracts the feature vector from the image block where the target is located, which can avoid errors caused by objects other than the target in the image to be processed when the server extracts the feature vector.
  • the server obtains the target feature vector from the feature vector retrieval library based on the extracted feature vector.
  • the server can search for the feature vector corresponding to the extracted feature vector from the feature vector retrieval library, and the found feature vector is the target feature vector. For example, the server can obtain the feature vector with the greatest cosine similarity to the extracted feature vector from the feature vector retrieval library based on the cosine similarity algorithm, and determine the feature vector with the greatest cosine similarity to the extracted feature vector as the target feature vector.
  • the feature vector retrieval library may include at least two feature vectors.
  • the target feature vector is a feature vector generated based on the three-dimensional model corresponding to the target.
  • when the user uploads a three-dimensional model corresponding to the target when registering the target, the server generates training images based on the three-dimensional model corresponding to the target, generates the feature vector corresponding to the target based on the training images, and stores the feature vector corresponding to the target in the feature vector retrieval library.
  • the server can use a related algorithm (such as the above-mentioned cosine similarity algorithm) to use the feature vector in the feature vector retrieval library that is closest to the feature vector extracted in step S505 as the target feature vector.
  • the extraction method of the feature vectors in the feature vector retrieval library and the extraction method of the feature vector in step S505 can be the same, for example, using the same feature extractor.
  • the server may first normalize the extracted feature vectors, and then search for the target feature vector from the feature vector retrieval library based on the normalized feature vectors.
  • the server can first perform L2 normalization (L2 Normalization) on the feature vector, then, based on the cosine similarity algorithm, obtain from the feature vector retrieval library the feature vector with the largest cosine similarity to the processed feature vector, and use this feature vector as the target feature vector.
  • normalization processing can improve the accuracy of data processing, and thus the server can find the target feature vector more accurately.
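  • The retrieval step described above can be sketched as follows: L2-normalize the extracted feature vector and select the library entry with the largest cosine similarity. The 1024-dimensional size follows the description above; the contents of the retrieval library and the helper name are placeholders.

```python
# A minimal sketch of L2 normalization plus cosine-similarity retrieval of the target feature vector.
import numpy as np

def retrieve_target_feature(query: np.ndarray, library: np.ndarray):
    """query: (1024,) extracted feature; library: (N, 1024) stored features. Returns (index, similarity)."""
    q = query / np.linalg.norm(query)                            # L2 normalization of the query
    lib = library / np.linalg.norm(library, axis=1, keepdims=True)
    sims = lib @ q                                               # cosine similarity with each stored vector
    best = int(np.argmax(sims))                                  # entry with the largest cosine similarity
    return best, float(sims[best])
```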
  • the server obtains the target pose estimation model and the target point cloud model based on the target feature vector.
  • since the feature vector of the target corresponds to the pose estimation model and the point cloud model of the target, the server can obtain the pose estimation model and the point cloud model corresponding to the target feature vector, thereby obtaining the target pose estimation model and the target point cloud model.
  • for example, if the target feature vector is W, the server can obtain the pose estimation model corresponding to W from the pose estimation model library, and obtain the point cloud model corresponding to W from the point cloud model library; the pose estimation model corresponding to W and the point cloud model corresponding to W are the target pose estimation model and the target point cloud model.
  • the target has a unique identifier, and the target identifier corresponds to the target's feature vector, pose estimation model, and point cloud model.
  • the server can obtain the target's identifier based on the target feature vector; the server can then obtain the pose estimation model and point cloud model corresponding to the target's identifier, thereby obtaining the target pose estimation model and the target point cloud model.
  • the identification of the target may be the ID of the target, such as the fine-grained category of the target. Taking the target as a vehicle as an example, the identity of the vehicle may be the name of the vehicle.
  • the target pose estimation model and the target point cloud model can be stored in the server, that is, the server can query the target pose estimation model and the target point cloud model from the stored data.
  • the server can include a feature vector retrieval library, a pose estimation model library, and a point cloud model library.
  • the server can store them separately.
  • the server can also query the feature vector, pose estimation model and point cloud model corresponding to the target from the feature vector retrieval library, pose estimation model library and point cloud model library based on the target's identification.
  • the server obtains the pose of the target based on the image block where the target is located, the target pose estimation model, and the target point cloud model.
  • the target pose estimation model includes key point identification model and PnP algorithm.
  • the server can input the image block where the target is located into a key point recognition model to obtain the positions of at least four key control points in the image block, where the key point recognition model is used to obtain the positions, in the image block, of key control points predefined on the target point cloud model.
  • the position of a key control point in the image block can be the two-dimensional coordinates of the key control point in the pixel coordinate system corresponding to the image block where the target is located; the server can determine the pose of the target in the camera coordinate system through the PnP algorithm based on the positions of at least four key control points in the target point cloud model and in the image block, as well as the camera parameters of the device, thereby obtaining the initial pose of the target; finally, the initial pose can be used as the pose of the target in the camera coordinate system.
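  • A minimal sketch of the PnP step described above, using OpenCV's solvePnP to recover the initial pose from at least four 2D-3D correspondences; the camera intrinsic matrix K and the helper name are assumptions.

```python
# A minimal sketch of recovering the initial pose from key control points via PnP.
import cv2
import numpy as np

def initial_pose_from_keypoints(pts_3d: np.ndarray, pts_2d: np.ndarray, K: np.ndarray):
    """pts_3d: (N, 3) key control points in the target point cloud model;
    pts_2d: (N, 2) their positions in the image block (pixel coordinates); N >= 4."""
    ok, rvec, tvec = cv2.solvePnP(pts_3d.astype(np.float32),
                                  pts_2d.astype(np.float32),
                                  K.astype(np.float32),
                                  distCoeffs=None)
    if not ok:
        raise RuntimeError("PnP failed")
    R, _ = cv2.Rodrigues(rvec)        # rotation vector -> rotation matrix
    return R, tvec                    # initial pose of the target in the camera coordinate system
```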
  • the key point identification model can be a pixel-wise voting network (PVNet); it can also be Efficient-PVNet, which uses the EfficientNet-b1 network structure to optimize PVNet; it can also be another deep learning model, which is not restricted here. It should be noted that the key point identification model can also be called a key point regression neural network.
  • PVNet can use a deep residual network (ResNet) as the backbone network (backbone), such as ResNet-18.
  • Efficient-PVNet can be used as the key point identification model. Since Efficient-PVNet uses the EfficientNet-b1 network structure to optimize the structure of PVNet, it can improve the fitting ability of the network at a higher resolution (640*480).
  • Efficient-PVNet requires less computing power, is faster, and has stronger generalization capabilities.
  • the server can optimize the initial pose of the target and use the optimized pose as the target's pose in the camera coordinate system.
  • the image to be processed is a real RGB image
  • the server can reproject the target onto the real RGB image based on the initial pose, the target point cloud model, and the camera parameters, and obtain a rendered RGB image; determine several feature points of the target in the real RGB image and in the rendered RGB image respectively, and pair the feature points in the real RGB image with the feature points in the rendered RGB image one by one; then, based on the correspondence between the feature points in the rendered RGB image and the 3D points of the target point cloud model, and the pairing relationship between the feature points in the real RGB image and the feature points in the rendered RGB image, determine, for at least four feature points, their two-dimensional coordinates in the image to be processed and their three-dimensional coordinates in the target point cloud model (that is, at least four 2D-3D matching point pairs); finally, based on the at least four 2D-3D matching point pairs, the optimized pose is obtained through the PnP method, and the optimized pose is used as the pose of the target in the camera coordinate system.
  • the feature point M in the real RGB image and the feature point m in the rendered RGB image are two matching points
  • the feature point m in the rendered RGB image corresponds to the 3D point N of the target point cloud model.
  • then the feature point M in the real RGB image and the 3D point N of the target point cloud model can form a pair of 2D-3D matching points.
  • the specific process for the server to determine several feature points of the target in the real RGB image and in the rendered RGB image may include: first, based on the initial pose, the target point cloud model and the camera parameters, reproject the target onto the image to be processed to render the RGB image and the depth map; then, the depth map is used as a mask to obtain the bounding box of the target from the rendered RGB image; based on the bounding box, image blocks are intercepted from the real RGB image and the rendered RGB image respectively, and two image blocks including the target are obtained.
  • two-dimensional image feature extraction methods, such as superpoint features or ORB (Oriented FAST and Rotated BRIEF) features, are used to extract the feature points and corresponding descriptors of the two image blocks including the target, and to perform one-to-one matching at the same time.
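  • The one-to-one feature matching between the real image block and the rendered image block can be sketched with ORB features as follows (superpoint features could be used instead, as noted above); parameter values and the helper name are illustrative.

```python
# A minimal sketch of ORB feature extraction and one-to-one matching between two image blocks.
import cv2

def match_orb_features(real_block, rendered_block, max_features=500):
    orb = cv2.ORB_create(nfeatures=max_features)
    kp1, des1 = orb.detectAndCompute(real_block, None)           # feature points + descriptors (real block)
    kp2, des2 = orb.detectAndCompute(rendered_block, None)       # feature points + descriptors (rendered block)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)   # cross-check enforces one-to-one matching
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    pairs = [(kp1[m.queryIdx].pt, kp2[m.trainIdx].pt) for m in matches]
    return pairs   # matched (real 2D point, rendered 2D point) pairs
```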
  • the server can also calculate the pose error between the optimized pose and the target in the real RGB image; when the pose error is not less than the preset error value, the optimized pose is iteratively optimized.
  • the specific optimization process can be seen in the related content above; until the pose error is less than the preset error value, the iteratively optimized pose is used as the pose of the target in the camera coordinate system.
  • since the initial pose of the target cannot guarantee sufficient accuracy to support the subsequent 3D tracking process, the embodiment of this application performs secondary iterative optimization on the initial pose of the target, using iterative rendering to calculate the reprojection error and update the pose, which can improve the precision and accuracy of the target's pose in the camera coordinate system.
  • the server sends the target's pose to the electronic device.
  • the server sends the identification of the target and the pose of the target to the electronic device.
  • the identification of the target can be used by the electronic device to obtain the point cloud model corresponding to the target.
  • the server renders the target in the image to be processed based on the pose of the target, and displays the rendered image.
  • for example, a color is rendered on the shell of the vehicle and the rendered image is displayed, so that the user can clearly see the recognized vehicle.
  • the server also sends the target's auxiliary information to the electronic device; then the electronic device can receive the target's pose and the target's auxiliary information. After receiving the target's pose, the electronic device can render the auxiliary information of the target in the image to be processed based on the target's pose, and then display the rendered image.
  • the electronic device tracks the target based on the target's pose and the target point cloud model.
  • the electronic device stores the target point cloud model. After receiving the pose of the target and the identification of the target, the electronic device can query the target point cloud model based on the identification of the target; furthermore, the electronic device can query the target point cloud model based on the location of the target. pose and target point cloud models to track targets in images acquired after the image to be processed.
  • the following is an exemplary method for tracking a target, see Figure 9.
  • the method includes the following steps:
  • in the following, the pose of the target obtained by the server based on the above pose estimation model is called the positioning pose.
  • the pose of the target in the next frame image, obtained by the electronic device based on the target's pose in the previous frame image, is called the tracking pose. It should be noted that, in the process of acquiring multiple images, the electronic device may calculate the positioning pose of an image every 100 frames, and the tracking of all frames between two positionings uses the previous positioning pose.
  • the electronic device acquires the current image frame, and the current frame image includes the target.
  • tracking a target means taking continuously acquired images including the target as input during rigid body motion, and solving the pose of the target in each frame.
  • the current image frame is one of the continuously acquired images including the target.
  • after the electronic device receives the positioning pose of the first frame of the image to be processed sent by the server, it can use the image captured in the next frame as the current image frame; it can also regard each frame of image including the target that is captured after the first frame of the image to be processed as the current image frame.
  • the electronic device calculates the pose correction amount based on the energy function and the pose of the target in the current image frame.
  • the electronic device can solve the optimal correction amount corresponding to the current image frame through the Gauss-Newton method based on the energy function and the pose P 0 of the target in the current image frame.
  • the pose P 0 of the target in the current image frame can be the pose of the target obtained by interpolating the tracking pose of the previous frame image of the current image frame and the current SLAM pose of the electronic device, where the SLAM pose is the current position and orientation of the electronic device in the environment coordinate system, obtained by the electronic device based on the SLAM algorithm.
  • the tracking pose of the previous frame of the image to be processed may be the positioning pose of the target obtained by the server performing S503 to step S508 on the previous frame of the image.
  • the electronic device can calculate the position and orientation of the current electronic device in the environmental coordinate system based on the SLAM algorithm.
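  • As an illustration of interpolating two poses (the embodiment does not detail how the previous tracking pose and the current SLAM pose are combined, so this is only an assumed form), rotations can be blended by spherical linear interpolation and translations linearly:

```python
# A minimal sketch of interpolating between two poses (slerp for rotation, lerp for translation).
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_pose(R0, t0, R1, t1, alpha=0.5):
    """R0/R1: 3x3 rotation matrices; t0/t1: (3,) translations; alpha in [0, 1]."""
    slerp = Slerp([0.0, 1.0], Rotation.from_matrix(np.stack([R0, R1])))
    R = slerp([alpha])[0].as_matrix()                               # interpolated rotation
    t = (1.0 - alpha) * np.asarray(t0) + alpha * np.asarray(t1)     # interpolated translation
    return R, t
```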
  • in the following formulas, x is the two-dimensional pixel point of the target object in the image; R is the rotation component; T is the translation component; and π(·) is the projection transformation used for image imaging, which defaults to the camera imaging model function.
  • the energy function E(ξ) can use the parameter ξ as a variable to represent the pose of the target; for example, it can be a weighted sum of the following terms:
  • E(ξ) = λ1·E_gravity(ξ) + λ2·E_RBOT(ξ) + λ3·E_cloudrefine(ξ) + λ4·E_regular(ξ)
  • where E_gravity(ξ) is the gravity axis constraint, E_RBOT(ξ) is the area constraint, E_cloudrefine(ξ) is the pose estimation algorithm constraint, E_regular(ξ) is the regular term constraint, and λ1, λ2, λ3 and λ4 are the weight coefficients of each term respectively; for ξ, corresponding methods in Lie algebra, such as the perturbation model, can be used for derivation and calculation.
  • the gravity axis constraint can be constructed from the following quantities: V1 is the three-dimensional coordinates of several sampling points randomly sampled, in a three-dimensional coordinate system established with the geometric center of the target as the origin, on a circle with the origin as the center and a preset radius on a cross section parallel to the horizontal plane; R is the rotation component to be optimized; (RV)z is the z-axis component of a sampling point after the rotation transformation, that is, the gravity axis direction component.
  • the above gravity axis constraint term can be added to the energy function.
  • the direction of the gravity axis is the direction perpendicular to the horizontal plane, and the object with a constant direction of the gravity axis refers to an object that will not flip in the direction of the gravity axis.
  • for example, a stationary vehicle can be considered to be an object with a constant gravity axis direction; by contrast, since the posture of a sphere usually flips in the direction of the gravity axis during rolling, a sphere cannot be considered an object with a constant gravity axis direction.
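  • For illustration, the sketch below assembles the energy as a weighted sum of its four terms and gives one plausible form of the gravity axis term that penalizes changes in the gravity-axis component of the sampled points after rotation; the exact functional forms and the weight values are assumptions, since the embodiment does not spell them out.

```python
# A minimal sketch (under stated assumptions) of the weighted energy and a gravity-axis term.
import numpy as np

def gravity_axis_term(R, V1, z_ref):
    """V1: (N, 3) points sampled on a horizontal circle around the target's geometric center;
    z_ref: reference gravity-axis (z) components of those points; R: rotation to be optimized."""
    return float(np.sum(((V1 @ R.T)[:, 2] - z_ref) ** 2))   # (R V)_z should stay close to the reference

def total_energy(e_gravity, e_rbot, e_cloudrefine, e_regular,
                 lambdas=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four constraint terms; the weights are illustrative."""
    l1, l2, l3, l4 = lambdas
    return l1 * e_gravity + l2 * e_rbot + l3 * e_cloudrefine + l4 * e_regular
```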
  • in the pose estimation algorithm constraint, V2 is a sampling point on the target; R_cloud and T_cloud represent the target pose obtained by interpolating the positioning pose of the previous frame image of the current image frame and the current SLAM pose of the electronic device; R and T are the pose variables to be optimized.
  • the previous frame of the current image frame refers to the image uploaded to the server for positioning in the previous frame, which can also be called a key frame.
  • the interval between key frames can be dynamically determined based on the server load; for example, there can be dozens to hundreds of frames between key frames.
  • correcting the target's pose through the positioning pose of the previous frame of the current image frame can reduce the cumulative error, improve the continuous tracking capability of the algorithm, and increase the robustness of the algorithm in harsh lighting environments.
  • in the area constraint, I represents the current frame image; P_f(I(x)) represents the probability distribution that I(x) belongs to the foreground area; P_b(I(x)) represents the probability distribution that I(x) belongs to the background area; Φ(x) is the level-set function determined by the contour of the target; Ω_f represents the foreground area (that is, the area where the target is located); Ω_b represents the background area; C represents the outline of the target; s is the hyperparameter of the He (smoothed Heaviside) function, which can be set to 1.2.
  • the electronic device can use a rendering tool to perform fragment-by-fragment rendering based on the target's pose P 0 in the current image frame, the target point cloud data, and the camera imaging model function, and generate the depth map of the target object under the current perspective of the device; furthermore, using the depth map information as a mask, the outline of the target's projection in the current image frame is obtained.
  • in the regular term constraint, c is a contour point of the target in the current image frame; C is the three-dimensional space point corresponding to the contour point of the target in the current image frame; R and T are the rotation and translation components of the pose to be optimized; and π(·) is the camera imaging model function.
  • the contour points of the target are sampling points on the contour of the target.
  • this energy function is used to ensure that the projection of the three-dimensional points of the contour after the pose is optimized does not differ too much from the previous two-dimensional position, so as to ensure the coherence of the changes in the contour points.
  • the electronic device updates the pose of the target in the current image frame based on the pose correction amount.
  • the electronic device may update the pose of the target in the current image frame based on the pose correction amount.
  • the electronic device calculates the energy function value based on the energy function and the pose of the target in the current image frame.
  • the electronic device after updating the pose of the current image frame, can input the pose of the target in the current image frame into the energy function to obtain the energy function value.
  • the preset condition may be that the energy function value is less than a preset threshold or the number of iterations reaches a predetermined number.
  • when the energy function value meets the preset condition, the electronic device can use the pose of the target in the current image frame as the tracking pose of the target; otherwise, steps S902 to S905 are performed to iteratively optimize the pose of the target in the current image frame until the tracking pose of the target is output.
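  • The iterative optimization of steps S902 to S905 can be summarized by the following skeleton, in which compute_correction(), apply_correction() and energy() are hypothetical helpers standing in for the Gauss-Newton step and the energy function; the iteration limit and threshold are assumptions.

```python
# A minimal skeleton of the iterative tracking loop: correct, update, evaluate, stop.
def track_pose(initial_pose, compute_correction, apply_correction, energy,
               max_iters=10, energy_thresh=1e-3):
    pose = initial_pose
    for _ in range(max_iters):                      # bounded number of iterations
        delta = compute_correction(pose)            # S902: pose correction amount
        pose = apply_correction(pose, delta)        # S903: update the pose
        if energy(pose) < energy_thresh:            # S904/S905: check the preset condition
            break
    return pose                                     # tracking pose of the target
```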
  • the electronic device can track the target and display the auxiliary information of the target on each frame of image including the target.
  • the target is a vehicle
  • the auxiliary information of the target is the vehicle model, vehicle manufacturer, vehicle data, etc.
  • the electronic device can render the vehicle model, manufacturer, vehicle data, etc. in the image to be processed.
  • the vehicle model, manufacturer, vehicle data, etc. can be displayed around the vehicle; for example, in a video recording scene, each frame of image displayed by the electronic device has rendered auxiliary information.
  • the display of auxiliary information can present AR effects.
  • the target can also be tracked by the server.
  • the electronic device can send each acquired frame of image to the server; the server can perform target positioning on the key frame images, and the server can also perform target tracking on the images between two key frame images based on the positioning result of the target in the previous frame image.
  • the method for positioning the target by the server can be consistent with the method for positioning the target by the above-mentioned electronic device.
  • the method for the server to track the target can be the same as the method for the above-mentioned electronic device to track the target; the server can send the tracking results of the target to the electronic device.
  • Figures 10A-10D are user interfaces implemented on electronic devices.
  • Figure 10A shows an exemplary user interface 61 on the first terminal for displaying installed applications.
  • the user interface 61 may include: a status bar, a calendar indicator, a weather indicator, an icon 611 of a target application, an icon of a gallery application, icons of other applications, etc.
  • the status bar may include: one or more signal strength indicators of mobile communication signals (also called cellular signals), one or more signal strength indicators of Wi-Fi signals, battery status indicators, time indicators, etc.
  • the user interface 61 shown in FIG. 10A may be a home screen. It can be understood that FIG. 10A only illustrates a user interface of an electronic device and should not be construed as limiting the embodiments of the present application.
  • the electronic device when the electronic device detects a user operation on the icon 611 of the target application, in response to the user operation, the electronic device may display the user interface 62 shown in FIG. 10B .
  • the user can also trigger the electronic device to display the user interface 62 through other methods, which is not limited in the embodiment of the present application.
  • for example, the user can enter a preset URL in the input box of the browser to trigger the electronic device to display the user interface 62.
  • the user interface 62 may display an option bar 621 , an option bar 622 , and a confirmation control 623 .
  • the option bar 621 is used for the user to register an identifiable target, and the option bar 622 is used for the user to identify a target.
  • the user can click the option bar 621 and then, as shown in Figure 10B, click the confirmation control 623; correspondingly, upon detecting these user operations, the electronic device can display the user interface 63 shown in Figure 10C.
  • alternatively, the user can simply click the option bar 621, and the electronic device can likewise display the user interface 63 shown in FIG. 10C.
  • the user interface 63 may include a name input box 631, an import control 632, an information input field 633, an import control 634 and a confirmation control 635.
  • the name input box 631 is used to input the name of the target;
  • the import control 632 is used to input the CAD model corresponding to the target;
  • the information input box 633 is used to input the information corresponding to the target; and
  • the import control 634 is used to input related images of the target.
  • the user can click the confirmation control 635 after inputting the target data on the user interface 63.
  • the electronic device detects the above user operation and uploads the input target data to the server in response to the above user operation.
  • the user can click the option bar 622 and then, as shown in Figure 10B, click the confirmation control 623; correspondingly, upon detecting these user operations, the electronic device can display the user interface 64 shown in Figure 10D.
  • alternatively, the user can also click the option bar 622, and correspondingly the electronic device can display the user interface 63 shown in FIG. 10C.
  • the user interface 64 may display the captured image or the image obtained by performing the above target positioning method on the captured image.
  • Figure 10D exemplarily shows an image rendered after positioning the target, in which the vehicle is the target; the dotted lines around the vehicle, as well as the displayed vehicle length, vehicle width and other contents, are the auxiliary information of the vehicle.
  • the vehicle is the real information captured in the image, whereas the dotted lines, vehicle length, vehicle width and other displayed contents are virtual information rendered in a preset area around the target after the target has been positioned; this virtual information is only shown on the display of the electronic device and is not real-world information.
  • Embodiments of the present application also provide an electronic device.
  • the electronic device includes one or more processors and one or more memories; the one or more memories are coupled to the one or more processors and are used to store computer program code, and the computer program code includes computer instructions.
  • when the one or more processors execute the computer instructions, the electronic device is caused to perform the method described in the above embodiments.
  • Embodiments of the present application also provide a computer program product containing instructions, which when the computer program product is run on an electronic device, causes the electronic device to execute the method described in the above embodiment.
  • Embodiments of the present application also provide a computer-readable storage medium, which includes instructions.
  • when the instructions are run on an electronic device, the electronic device is caused to execute the method described in the above embodiments.
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • when implemented using software, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • when the computer program instructions are loaded and executed on a computer, the processes or functions described in this application are generated in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line) or a wireless manner (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or a data center that integrates one or more available media.
  • the available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid state disks), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

This application provides a target positioning method, system and electronic device. The method includes: a server receives a positioning request sent by a first electronic device, the positioning request including an image to be processed; the server identifies a target in the image to be processed; the server searches a pose estimation model library for a target pose estimation model corresponding to the target, the pose estimation model library including pose estimation models respectively corresponding to multiple objects; the server obtains the pose of the target based on the image to be processed and the target pose estimation model; and the server sends the pose of the target to the first electronic device.

Description

一种目标定位方法、系统和电子设备
本申请要求于2022年05月11日提交中国专利局、申请号为202210508427.7、申请名称为“一种目标定位方法、系统和电子设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及电子技术领域,尤其涉及一种目标定位方法、系统和电子设备。
背景技术
随着增强现实(Augmented Reality)技术的发展,出现了越来越多的AR产品,例如华为AR地图等。在AR场景中,对真实世界物体的识别和空间定位是沟通现实世界和虚拟世界的重要桥梁,也是数字孪生(digital twins)的关键技术。
目前,三维识别和跟踪算法只能对预先定义的物体进行识别与追踪,无法通过在线更新的方式,即时添加感兴趣的目标物体,导致能够三维识别跟踪的目标物体有限,无法满足数字孪生系统对数据可持续性拓展的能力要求以及用户多样化、个性化的识别需求。
发明内容
本申请提供了一种目标定位方法、系统和电子设备,该方法中,服务器中的位姿估计模型库中包括多个位姿估计模型,目标与位姿估计模型相对应。通过该方法,服务器增加可定位目标时仅需在位姿估计模型库中增加该目标对应的位姿估计模型,可以满足用户在线拓展目标数量的需求,大幅增加可定位的目标的数量。
第一方面,本申请实施例提供了一种目标定位方法,应用于服务器,该方法包括:
服务器接收第一电子设备发送的定位请求,定位请求包括待处理图像;
服务器识别待处理图像中的目标;
服务器从位姿估计模型库中查找目标对应的目标位姿估计模型,位姿估计模型库包括多个对象分别对应的位姿估计模型;
服务器基于待处理图像和目标位姿估计模型,得到目标的位姿;
服务器将目标的位姿发送至第一电子设备。
实施本申请实施例,服务器可以在接收到电子设备发送的待处理图像后,识别待处理图像中的目标,从位姿估计模型库中查找目标对应的目标位姿估计模型,进而,基于待处理图像和目标位姿估计模型,得到目标的位姿。其中,服务器中的位姿估计模型库包括多个对象分别对应的位姿估计模型。通过该方法,服务器增加可定位目标时仅需在位姿估计模型库中增加该目标对应的位姿估计模型,可以满足用户在线拓展目标数量的需求,大幅增加可定位的目标的数量。
结合第一方面,在一种可能的实现方式中,在服务器从位姿估计模型库中查找目标对应的目标位姿估计模型之前,方法还包括:
服务器接收第二电子设备发送的目标对应的三维模型;
服务器对目标对应的三维模型进行渲染,生成多张训练图像;
服务器基于多张训练图像对初始位姿估计模型进行训练,得到目标位姿估计模型。
在本申请实施例中,服务器可以接收来自电子设备的目标对应的三维模型,如目标的计算机辅助设计模型或点云模型;进而,基于该三维模型,生成该目标对应的位姿估计模型,即目标位姿估计模型。该方法中,用户可以通过电子设备向服务器发送自己感兴趣的目标的三维模型,从而使服务器实现对目标进行定位的功能,该方法可以实现用户在线拓展目标数量的需求,大幅增加服务器可定位的目标的数量。
在一种可能的实现方式中,电子设备安装有定位应用,用户可以通过定位应用向服务器发送目标对应的三维模型;服务器可以接收来自不同电子设备发送的目标对应的三维模型,生成多个目标对应的位姿估计模型,以使在有电子设备需要对目标进行定位时实现对目标的定位。
结合第一方面,在一种可能的实现方式中,服务器识别待处理图像中的目标包括:
服务器从待处理图像中提取特征向量;
服务器从特征向量库中查询与提取的特征向量相似度最高的特征向量;特征向量库包括多个对象的标识分别对应的特征向量;
服务器基于查询到的特征向量对应的标识,确定目标。
在一种可能的实现方式中,服务器可以先从待处理图像中确定包括目标的图像块,进而,从该图像块中提取特征向量,该方法可以避免待处理图像中除目标外的其它对象对识别造成干扰,从而提高识别目标的准确度。
结合第一方面,在一种可能的实现方式中,目标位姿估计模型包括关键点识别模型和N点透视算法,服务器基于待处理图像和目标位姿估计模型,得到目标的位姿,包括:
服务器将待处理图像输入关键点识别模型,得到目标对应的至少四个关键点的二维坐标;
服务器基于目标对应的三维模型,确定至少四个关键点的三维坐标;
服务器基于至少四个关键点的二维坐标和三维坐标,通过N点透视算法,得到目标的位姿。
结合第一方面,在一种可能的实现方式中,服务器将目标的位姿发送至第一电子设备之前,方法还包括:
服务器基于目标的位姿,将目标对应的三维模型重投影至待处理图像上,得到渲染图像;
服务器执行M次优化过程,M为正整数,优化过程包括:基于渲染图像和待处理图像,计算优化位姿;服务器基于待处理图像和优化位姿,计算位姿误差;
服务器在位姿误差小于预设误差值时,将优化位姿更新为目标的位姿;在位姿误差不小于预设误差值时,基于优化位姿更新渲染图像,执行优化过程。
在本申请实施例中,服务器通过上述重投影过程可以对位姿进行优化,提高定位的准确性。
结合第一方面,在一种可能的实现方式中,在服务器接收第一电子设备发送的定位请求之前,方法还包括:
将目标对应的三维模型发送至第一电子设备,三维模型用于第一电子设备结合目标的位姿对目标进行跟踪。
在本申请实施例中,服务器可以在得到目标对应的三维模型后,将三维模型发送至第一电子设备;第一电子设备可以存储该三维模型,进而,在接收到目标的位姿后,第一电子设 备无需再从服务器获取该三维模型,而是直接从存储中获取该三维模型对目标进行跟踪。该方法通过预先将三维模型发送至电子设备,可以提高电子设备进行目标跟踪的效率。
在一种可能的实现方式中,也可以由服务器对目标进行跟踪。例如,电子设备可以将获取的每一帧图像都发送至服务器;服务器可以对关键帧图像进行目标定位,进而,基于两帧关键帧图像中的前一帧图像中目标的定位结果对两帧关键帧图像之间的图像进行目标跟踪,其中,服务器对目标进行定位的方法可以与上述电子设备对目标进行定位的方法一致,对目标进行跟踪的方法可以与上述电子设备对目标进行跟踪的方法一致;进而,服务器将目标的跟踪结果发送至电子设备。
第二方面,本申请实施例提供了一种目标定位方法,应用于电子设备,该方法包括:
电子设备获取待处理图像;
电子设备向服务器发送定位请求,定位请求包括待处理图像;定位请求用于请求识别待处理图像中的目标和获得目标在待处理图像中的第一位姿;第一位姿是服务器基于从位姿估计模型库中查询的目标对应的目标位姿估计模型得到的,位姿估计模型库包括多个对象分别对应的位姿估计模型;
电子设备接收服务器发送的第一位姿;
电子设备基于第一位姿在待处理图像上渲染虚拟信息;
电子设备显示渲染后的图像。
结合第二方面,在一种可能的实现方式中,电子设备接收服务器发送的第一位姿之后,方法还包括:
电子设备获取当前图像帧,当前帧图像为获取待处理图像后获取的图像;
电子设备基于当前图像帧、目标对应的三维模型和第一位姿,确定目标在当前图像帧中的第二位姿。
结合第二方面,在一种可能的实现方式中,在电子设备基于当前图像帧、目标的三维模型和第一位姿,确定目标在当前图像帧中的第二位姿之前,包括:
电子设备接收服务器发送的目标的标识;
电子设备基于目标的标识从存储中获取三维模型,三维模型是电子设备从服务器获取后保存在存储中的。
结合第二方面,在一种可能的实现方式中,确定目标在当前图像帧中的第二位姿,包括:
电子设备执行N次优化过程,N为正整数,优化过程包括:电子设备基于能量函数、三维模型和第二位姿,计算位姿矫正量;基于位姿矫正量更新第二位姿;基于能量函数和第二位姿,计算能量函数值;
电子设备在能量函数值满足预设条件时,输出第二位姿;反之,执行优化过程。
结合第二方面,在一种可能的实现方式中,能量函数包括重力轴约束项、区域约束项、位姿估计算法约束项或正则项约束中的至少一项,重力轴约束项用于约束第二位姿在重力轴方向的误差;区域约束项用于基于当前帧图像的像素值约束第二位姿时目标的轮廓误差;位姿估计算法约束项用于基于估计位姿约束第二位姿的误差,估计位姿是基于第一位姿和基于实时定位与同步地图构建算法得到的电子设备的位姿得到的;正则项约束项是基于目标对应的三维模型约束第二位姿时目标的轮廓误差。
本申请实施例中,电子设备通过上述能量函数对目标的跟踪位姿进行优化,可以大幅提 高位姿的准确性。
结合第二方面,在一种可能的实现方式中,在电子设备向服务器发送定位请求之前,方法包括:
电子设备显示用户界面,用户界面包括注册控件;
电子设备在检测到针对注册控件的用户操作时,向服务器发送目标对应的三维模型。
结合第二方面,在一种可能的实现方式中,三维模型是电子设备接收外部输入得到的或电子设备基于用户操作生成的。
本申请实施例示例性提供了用户注册可定位目标的注册界面,用户可以通过在电子设备上显示的注册界面上的操作,向服务器发送目标对应的三维模型,以使服务器可以基于该三维模型,生成该目标对应的位姿估计模型,从而提高可以识别的目标的数量。
第三方面,本申请提供一种服务器。该服务器可包括存储器和处理器。其中,存储器可用于存储计算机程序。处理器可用于调用计算机程序,使得电子设备执行如第一方面或第一方面中任一可能的实现方式。
第四方面,本申请提供一种电子设备。该电子设备可包括存储器和处理器。其中,存储器可用于存储计算机程序。处理器可用于调用计算机程序,使得电子设备执行如第二方面或第二方面中任一可能的实现方式。
第五方面,本申请提供一种包含指令的计算机程序产品,当上述计算机程序产品在电子设备上运行时,使得该电子设备执行如第一方面或第一方面中任一可能的实现方式。
第六方面,本申请提供一种包含指令的计算机程序产品,当上述计算机程序产品在电子设备上运行时,使得该电子设备执行如第二方面或第二方面中任一可能的实现方式。
第七方面,本申请提供一种计算机可读存储介质,包括指令,当上述指令在电子设备上运行时,使得该电子设备执行如第一方面或第一方面中任一可能的实现方式。
第八方面,本申请提供一种计算机可读存储介质,包括指令,当上述指令在电子设备上运行时,使得该电子设备执行如第二方面或第二方面中任一可能的实现方式。
第九方面,本申请实施例提供了一种目标定位系统,该目标定位系统包括服务器和电子设备,该服务器为第三方面描述的服务器,该电子设备为第四方面描述的电子设备。
可以理解地,上述第三方面和第四方面提供的电子设备、第五方面和第六方面提供的计算机程序产品、第七方面和第八方面提供的计算机可读存储介质均用于执行本申请实施例所提供的方法。因此,其所能达到的有益效果可参考对应方法中的有益效果,此处不再赘述。
附图说明
图1是本申请实施例提供的一种目标定位系统的架构示意图;
图2A是本申请实施例提供的一种电子设备100的硬件结构示意图;
图2B是本申请实施例提供的电子设备100的软件结构框图;
图3是本申请实施例提供的一种目标定位方法;
图4是本申请实施例提供的一种场景示意图;
图5是本申请实施例提供的另一种目标定位方法流程图;
图6是本申请实施例提供的一种主体检测模型的示意图;
图7是本申请实施例提供的一种确定目标所在的图像块的示意图;
图8是本申请实施例提供的一种服务器存储数据的示意图;
图9是本申请实施例提供的一种跟踪目标的方法流程图;
图10A-图10D是本申请实施例提供的电子设备上实现的用户界面。
具体实施方式
下面将结合附图对本申请实施例中的技术方案进行清楚、详尽地描述。其中,在本申请实施例的描述中,除非另有说明,“/”表示或的意思,例如,A/B可以表示A或B;文本中的“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况,另外,在本申请实施例的描述中,“多个”是指两个或多于两个。
以下,术语“第一”、“第二”仅用于描述目的,而不能理解为暗示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征,在本申请实施例的描述中,除非另有说明,“多个”的含义是两个或两个以上。
本申请以下实施例中的术语“用户界面(user interface,UI)”,是应用程序或操作系统与用户之间进行交互和信息交换的介质接口,它实现信息的内部形式与用户可以接受形式之间的转换。用户界面是通过java、可扩展标记语言(extensible markup language,XML)等特定计算机语言编写的源代码,界面源代码在电子设备上经过解析,渲染,最终呈现为用户可以识别的内容。用户界面常用的表现形式是图形用户界面(graphic user interface,GUI),是指采用图形方式显示的与计算机操作相关的用户界面。它可以是在电子设备的显示屏中显示的文本、图标、按钮、菜单、选项卡、文本框、对话框、状态栏、导航栏、Widget等可视的界面元素。
首先,下面先对本申请实施例中涉及的技术术语进行描述。
1、位姿的定义
物体的位姿,是指物体在某个坐标系中的位置和姿态,可以用附着于物体上的坐标系的相对姿态来描述。
例如,物体可以用附着于物体上的坐标系B来表示,则物体相对于坐标系A的姿态等价与坐标系B相对于坐标系A的姿态。例如,机器人上以一固定点为原点,建立机器人坐标系F2,则机器人相对于环境坐标系F0的姿态等价于机器人相对于机器人坐标系F2相对于环境坐标系F0的姿态。
其中，坐标系B相对于坐标系A的姿态可以通过旋转矩阵R和平移矩阵T来表示，即坐标系B相对于坐标系A的姿态可以表示为 $T_{B}^{A}=\{R,T\}$，则物体相对于坐标系A的姿态可以用 $T_{B}^{A}$ 表示。需要说明的是，在坐标系A为环境坐标系F0时，物体的位姿可以用 $T_{B}^{F_{0}}$ 表示。
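As an illustrative aside (not part of the original description), the pose convention above can be sketched in a few lines: a rotation matrix R and a translation T are packed into a 4×4 homogeneous transform, and poses compose by matrix multiplication. The frame names and values below are placeholders.

```python
import numpy as np

def make_pose(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Pack a 3x3 rotation R and a translation t into a 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Pose of the object frame B in frame A, and pose of frame A in the environment frame F0;
# the object's pose in F0 is obtained by composing the two transforms.
T_A_B = make_pose(np.eye(3), np.array([0.5, 0.0, 1.0]))
T_F0_A = make_pose(np.eye(3), np.array([1.0, 2.0, 0.0]))
T_F0_B = T_F0_A @ T_A_B
```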
2、实时定位与同步地图构建(Simultaneous Localization And Mapping,SLAM)算法
目前用在SLAM上的传感器主要分为这两类,一种是基于激光雷达的激光SLAM(Lidar SLAM)和基于视觉的VSLAM(Visual SLAM)。其中,激光SLAM基于激光雷达返回的点云信息,视觉SLAM基于相机返回的图像信息。
3、像素坐标系
像素坐标系为二维坐标系,像素坐标系的单位是像素,坐标原点O在左上角。
以图像的左上角为原点O,可以建立像素坐标系,该像素坐标系的横坐标u和纵坐标v的单位均为像素。可以理解的,图像中像素的横坐标u与纵坐标v分别指示该像素所在的列数与行数。
4、N点透视算法(Perspective-n-Point,PnP)
PnP算法是求解3D到2D点对运动的方法,用于基于n个3D空间点以及n个3D空间点的投影位置时,估计物体的位姿。其中,n个3D空间点为物体上的点,n为正整数。其中,n个3D空间点的投影位置可以基于结构光系统得到,n个3D空间点的投影位置可以用像素坐标系中的坐标来表示。
PnP问题有很多种求解方法,例如用三对点估计位姿的P3P、直接线性变换(DLT)、EPnP。此外,还能用非线性优化的方式,构建最小二乘问题并迭代求解。
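For illustration only, a hedged sketch of solving the PnP problem with an off-the-shelf solver (OpenCV's solvePnP, here with the EPnP flag mentioned above); all point coordinates and camera intrinsics are placeholders rather than values from this application.

```python
import cv2
import numpy as np

# n >= 4 known 3D points on the object (model frame) and their observed 2D projections (pixels).
object_points = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.float64)
image_points = np.array([[320, 240], [400, 235], [325, 180], [318, 300]], dtype=np.float64)
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float64)  # camera intrinsics
dist = np.zeros(5)                                    # assume no lens distortion

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist, flags=cv2.SOLVEPNP_EPNP)
R, _ = cv2.Rodrigues(rvec)                            # rotation vector -> rotation matrix
# (R, tvec) is the estimated pose of the object in the camera coordinate system.
```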
5、三维模型
三维模型是可以是指物体的三角形网格模型,即计算机辅助设计(Computer Aided Design,CAD)模型。
6.刚体
刚体是指在运动中受力作用后,形状和大小不变,而且内部各点的相对位置不变的物体。绝对刚体实际上是不存在的,只是一种理想模型,因为任何物体在受力作用后,都或多或少地变形,如果变形的程度相对于物体本身几何尺寸来说极为微小,在研究物体运动时变形就可以忽略不计。刚体在空间的位置,必须根据刚体中任一点的空间位置和刚体绕该点转动时的位置来确定,所以刚体在空间有六个自由度。
在本申请实施例中,目标的三维模型可以是用于代表某一刚体的三维空间数据,例如该刚体的CAD模型或点云模型。
为了更加清楚、详细地介绍本申请实施例提供的目标定位方法,下面先介绍本申请实施例提供的系统架构。
请参见图1,图1是本申请实施例提供的一种目标定位系统的架构示意图。如图1所示,该系统可以包括第一电子设备101,服务器102和第二电子设备103,第二电子设备103可以包括多个电子设备,其中:
第一电子设备101用于用户注册可识别目标。例如,第一电子设备可以接收用户输入的注册操作,向服务器102发送用户想要注册的目标对应的三维模型。其中,第一电子设备101也可以包括多个电子设备,如多个用户可以分别通过不同的电子设备向服务器发送不同目标对应的三维模型。
服务器102用于基于目标对应的三维模型生成目标对应的位姿估计模型、点云模型和特征向量,该点云模型为该三维模型进行预设点云格式转换后的模型;还用于基于目标对应的三维模型、位姿估计模型和点云模型,对待处理图像中的目标进行定位。该服务器可以为云端服务器,也可以为边缘服务器。
第二电子设备103用于向服务器发送待处理图像;接收服务器发送的待处理图像中目标的位姿;还可以用于基于目标的位姿对目标进行跟踪。
需要说明的是,第一电子设备101和第二电子设备103也可以是同一个电子设备,也即是说,该电子设备具备第一电子设备101和第二电子设备103的功能。也就说是,用户在第 一电子设备在注册可识别目标后,用户可以通过第一电子设备的目标应用对该目标进行识别和定位。
在一些实施例中,用户可以通过第一电子设备101注册可识别目标。例如,第一电子设备101可以检测到注册可识别目标的用户操作时,向服务器102发送该目标对应的三维模型;服务器102可以基于目标对应的三维模型生成目标对应的位姿估计模型、点云模型和特征向量;进而,服务器102可以保存目标对应的位姿估计模型、点云模型和特征向量,向所有安装有目标应用的第二电子设备103发送目标对应的点云模型。其中,目标应用可以为华为AR地图,河图图鸦工具,河图城市打卡应用等,可用于特效展示,虚拟喷涂以及风格化等。
本申请实施例中,用户可以通过第二电子设备103识别目标。假设该目标为注册在服务器中的对象,即服务器包括该目标对应的位姿估计模型、点云模型和特征向量。例如,第二电子设备103可以在检测到识别目标的用户操作时,向服务器102发送识别请求,该识别请求可以包括待处理图像;服务器102基于目标对应的三维模型生成目标对应的位姿估计模型、点云模型和特征向量,对待处理图像中的目标进行定位,得到目标的位姿;服务器102可以将目标的位姿发送至第二电子设备103。进一步的,第二电子设备103还可以基于目标的位姿和目标对应的点云模型对目标进行跟踪。
在本申请实施例中,上述第一电子设备101和上述第二电子设备103均为电子设备,该电子设备可以是手机、平板电脑、桌面型计算机、膝上型计算机、手持计算机、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本,以及蜂窝电话、个人数字助理(personal digital assistant,PDA)、增强现实(augmented reality,AR)设备、虚拟现实(virtual reality,VR)设备、人工智能(artificial intelligence,AI)设备、可穿戴式设备、车载设备、智能家居设备和/或智慧城市设备等等。
图2A示例性示出了电子设备100的硬件结构示意图。
应该理解的是,电子设备100可以具有比图中所示的更多的或者更少的部件,可以组合两个或多个的部件,或者可以具有不同的部件配置。图中所示出的各种部件可以在包括一个或多个信号处理和/或专用集成电路在内的硬件、软件、或硬件和软件的组合中实现。
电子设备100可以包括:处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。
可以理解的是,本申请实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器 (application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
其中,控制器可以是电子设备100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。
I2C接口是一种双向同步串行总线,包括一根串行数据线(serial data line,SDA)和一根串行时钟线(derail clock line,SCL)。在一些实施例中,处理器110可以包含多组I2C总线。处理器110可以通过不同的I2C总线接口分别耦合触摸传感器180K,充电器,闪光灯,摄像头193等。例如:处理器110可以通过I2C接口耦合触摸传感器180K,使处理器110与触摸传感器180K通过I2C总线接口通信,实现电子设备100的触摸功能。
I2S接口可以用于音频通信。在一些实施例中,处理器110可以包含多组I2S总线。处理器110可以通过I2S总线与音频模块170耦合,实现处理器110与音频模块170之间的通信。在一些实施例中,音频模块170可以通过I2S接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。
PCM接口也可以用于音频通信,将模拟信号抽样,量化和编码。在一些实施例中,音频模块170与无线通信模块160可以通过PCM总线接口耦合。在一些实施例中,音频模块170也可以通过PCM接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。所述I2S接口和所述PCM接口都可以用于音频通信。
UART接口是一种通用串行数据总线,用于异步通信。该总线可以为双向通信总线。它将要传输的数据在串行通信与并行通信之间转换。在一些实施例中,UART接口通常被用于连接处理器110与无线通信模块160。例如:处理器110通过UART接口与无线通信模块160中的蓝牙模块通信,实现蓝牙功能。在一些实施例中,音频模块170可以通过UART接口向无线通信模块160传递音频信号,实现通过蓝牙耳机播放音乐的功能。
MIPI接口可以被用于连接处理器110与显示屏194,摄像头193等外围器件。MIPI接口包括摄像头串行接口(camera serial interface,CSI),显示屏串行接口(display serial interface,DSI)等。在一些实施例中,处理器110和摄像头193通过CSI接口通信,实现电子设备100的拍摄功能。处理器110和显示屏194通过DSI接口通信,实现电子设备100的显示功能。
GPIO接口可以通过软件配置。GPIO接口可以被配置为控制信号,也可被配置为数据信号。在一些实施例中,GPIO接口可以用于连接处理器110与摄像头193,显示屏194,无线通信模块160,音频模块170,传感器模块180等。GPIO接口还可以被配置为I2C接口,I2S接口,UART接口,MIPI接口等。
SIM接口可以被用于与SIM卡接口195通信,实现传送数据到SIM卡或读取SIM卡中数据的功能。
USB接口130是符合USB标准规范的接口,具体可以是Mini USB接口,Micro USB接口,USB Type C接口等。USB接口130可以用于连接充电器为电子设备100充电,也可以用于电子设备100与外围设备之间传输数据。也可以用于连接耳机,通过耳机播放音频。该接口还可以用于连接其他电子设备,例如AR设备等。
可以理解的是,本申请实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对电子设备100的结构限定。在本申请另一些实施例中,电子设备100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,外部存储器,显示屏194,摄像头193,和无线通信模块160等供电。
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。电子设备100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块150可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器170A,受话器170B等)输出声音信号,或通过显示屏194显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。本申请实施例中,调制解调处理器可以独立于处理器110,与移动通信模块150或其他功能模块设置在同一个器件中。
无线通信模块160可以提供应用在电子设备100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency  modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
在一些实施例中,电子设备100的天线1和移动通信模块150耦合,天线2和无线通信模块160耦合,使得电子设备100可以通过无线通信技术与网络以及其他设备通信。所述无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。所述GNSS可以包括全球卫星目标定位系统(global positioning system,GPS),全球导航卫星系统(global navigation satellite system,GLONASS),北斗卫星导航系统(beidou navigation satellite system,BDS),准天顶卫星系统(quasi-zenith satellite system,QZSS)和/或星基增强系统(satellite based augmentation systems,SBAS)。
电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏194用于显示图像,视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,电子设备100可以包括1个或N个显示屏194,N为大于1的正整数。
电子设备100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现电子设备100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。
内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器110通过运行存储在内部存储器121的指令,从而执行电子设备100的各种功能应用以及数据处理。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用(比如人脸识别功能,指纹识别功能、移动支付功能等)等。存储数据区可存储电子设备100使用过程中所创建的数据(比如人脸信息模板数据,指纹信息模板等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器170A收听音乐,或收听免提通话。
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。电子设备100可以设置至少一个麦克风170C。本申请实施例中,电子设备100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。本申请实施例中,电子设备100还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
压力传感器180A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器180A可以设置于显示屏194。压力传感器180A的种类很多,如电阻式压力传感器,电感式压力传感器,电容式压力传感器等。电容式压力传感器可以是包括至少两个具有 导电材料的平行板。当有力作用于压力传感器180A,电极之间的电容改变。电子设备100根据电容的变化确定压力的强度。当有触摸操作作用于显示屏194,电子设备100根据压力传感器180A检测所述触摸操作强度。电子设备100也可以根据压力传感器180A的检测信号计算触摸的位置。在一些实施例中,作用于相同触摸位置,但不同触摸操作强度的触摸操作,可以对应不同的操作指令。例如:当有触摸操作强度小于第一压力阈值的触摸操作作用于短消息应用图标时,执行查看短消息的指令。当有触摸操作强度大于或等于第一压力阈值的触摸操作作用于短消息应用图标时,执行新建短消息的指令。
陀螺仪传感器180B可以用于确定电子设备100的运动姿态。在一些实施例中,可以通过陀螺仪传感器180B确定电子设备100围绕三个轴(即,x,y和z轴)的角速度。陀螺仪传感器180B可以用于拍摄防抖。示例性的,当按下快门,陀螺仪传感器180B检测电子设备100抖动的角度,根据角度计算出镜头模组需要补偿的距离,让镜头通过反向运动抵消电子设备100的抖动,实现防抖。陀螺仪传感器180B还可以用于导航,体感游戏场景。
气压传感器180C用于测量气压。在一些实施例中,电子设备100通过气压传感器180C测得的气压值计算海拔高度,辅助定位和导航。
磁传感器180D包括霍尔传感器。电子设备100可以利用磁传感器180D检测翻盖皮套的开合。在一些实施例中,当电子设备100是翻盖机时,电子设备100可以根据磁传感器180D检测翻盖的开合。进而根据检测到的皮套的开合状态或翻盖的开合状态,设置翻盖自动解锁等特性。
加速度传感器180E可检测电子设备100在各个方向上(一般为三轴)加速度的大小。当电子设备100静止时可检测出重力的大小及方向。还可以用于识别电子设备姿态,应用于横竖屏切换,计步器等应用。
距离传感器180F,用于测量距离。电子设备100可以通过红外或激光测量距离。在一些实施例中,拍摄场景,电子设备100可以利用距离传感器180F测距以实现快速对焦。
接近光传感器180G可以包括例如发光二极管(LED)和光检测器,例如光电二极管。发光二极管可以是红外发光二极管。电子设备100通过发光二极管向外发射红外光。电子设备100使用光电二极管检测来自附近物体的红外反射光。当检测到充分的反射光时,可以确定电子设备100附近有物体。当检测到不充分的反射光时,电子设备100可以确定电子设备100附近没有物体。电子设备100可以利用接近光传感器180G检测用户手持电子设备100贴近耳朵通话,以便自动熄灭屏幕达到省电的目的。接近光传感器180G也可用于皮套模式,口袋模式自动解锁与锁屏。
环境光传感器180L用于感知环境光亮度。电子设备100可以根据感知的环境光亮度自适应调节显示屏194亮度。环境光传感器180L也可用于拍照时自动调节白平衡。环境光传感器180L还可以与接近光传感器180G配合,检测电子设备100是否在口袋里,以防误触。
指纹传感器180H用于采集指纹。电子设备100可以利用采集的指纹特性实现指纹解锁,访问应用锁,指纹拍照,指纹接听来电等。
温度传感器180J用于检测温度。在一些实施例中,电子设备100利用温度传感器180J检测的温度,执行温度处理策略。例如,当温度传感器180J上报的温度超过阈值,电子设备100执行降低位于温度传感器180J附近的处理器的性能,以便降低功耗实施热保护。本申请实施例中,当温度低于另一阈值时,电子设备100对电池142加热,以避免低温导致电子设备100异常关机。在其他一些实施例中,当温度低于又一阈值时,电子设备100对电池 142的输出电压执行升压,以避免低温导致的异常关机。
触摸传感器180K,也称“触控面板”。触摸传感器180K可以设置于显示屏194,由触摸传感器180K与显示屏194组成触摸屏,也称“触控屏”。触摸传感器180K用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏194提供与触摸操作相关的视觉输出。本申请实施例中,触摸传感器180K也可以设置于电子设备100的表面,与显示屏194所处的位置不同。
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。
马达191可以产生振动提示。马达191可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏194不同区域的触摸操作,马达191也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。
指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于合成请求,未接来电,通知等。
SIM卡接口195用于连接SIM卡。SIM卡可以通过插入SIM卡接口195,或从SIM卡接口195拔出,实现和电子设备100的接触和分离。电子设备100可以支持1个或N个SIM卡接口,N为大于1的正整数。SIM卡接口195可以支持Nano SIM卡,Micro SIM卡,SIM卡等。同一个SIM卡接口195可以同时插入多张卡。所述多张卡的类型可以相同,也可以不同。SIM卡接口195也可以兼容不同类型的SIM卡。SIM卡接口195也可以兼容外部存储卡。电子设备100通过SIM卡和网络交互,实现通话以及数据通信等功能。
本申请实施例中,电子设备100可以通过处理器110执行所述目标定位方法。
图2B是本申请实施例提供的电子设备100的软件结构框图。
分层架构将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将Android系统分为四层,从上至下分别为应用程序层,应用程序框架层,安卓运行时(Android runtime)和系统库,以及内核层。
应用程序层可以包括一系列应用程序包。
如图2B所示,应用程序包可以包括相机,图库,日历,通话,地图,导航,WLAN,蓝牙,音乐,视频,短信息和目标应用(例如定位应用)等应用程序。
在一些实施例中,用户可以通过目标应用向服务器发送目标的CAD模型,还可以通过目标应用对目标进行识别、定位以及跟踪,具体内容可见下文中的相关描述。
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。
如图2B所示,应用程序框架层可以包括显示(display)管理器,传感器(sensor)管理器,跨设备连接管理器,事件管理器,任务(activity)管理器,窗口管理器,内容提供器,视图系统,资源管理器,通知管理器等。
显示管理器用于系统的显示管理,负责所有显示相关事务的管理,包括创建、销毁、方向切换、大小和状态变化等。
传感器管理器负责传感器的状态管理,并管理应用向其监听传感器事件,将事件实时上报给应用。
跨设备连接管理器用于和其他设备建立通信连接以共享资源。
事件管理器用于系统的事件管理服务,负责接收底层上传的事件并分发给各窗口,完成事件的接收和分发等工作。
任务管理器用于任务(Activity)组件的管理,包括启动管理、生命周期管理、任务方向管理等。
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。窗口管理器还用于负责窗口显示管理,包括窗口显示方式、显示大小、显示坐标位置、显示层级等相关的管理。
以上各个实施例的具体执行过程可以参见下文中目标定位方法的相关内容。
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,电子设备振动,指示灯闪烁等。
Android Runtime包括核心库和虚拟机。Android runtime负责安卓系统的调度和管理。
核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是安卓的核心库。
应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。
系统库(也可称为数据管理层)可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(Media Libraries),三维图形处理库(例如:OpenGL ES),2D图形引擎(例如:SGL)和事件数据等。
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D和3D图层的融合。
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。
三维图形处理库用于实现三维图形绘图,图像渲染,合成,和图层处理等。
2D图形引擎是2D绘图的绘图引擎。
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。
基于图1所示的系统架构、图2A和图2B所示的电子设备100,介绍本申请实施例提供的目标定位方法。
请参见图3,图3是本申请实施例提供的一种目标定位方法。需要说明的是,本申请实施例中,第一用户操作或第二用户操作中的任意一个可以为用户的触控操作(例如点击操作、长按操作、上滑操作、下滑操作或侧滑操作),也可以为非接触操作(例如隔空手势),还可以为用户的语音指令,本申请实施例对此不做具体限制。
如图3所示,该方法包括以下步骤:
S301、第一电子设备检测到第一用户操作时,将目标对应的CAD模型发送至服务器。
其中,需要定位的对象为上述目标,对象可以为物体,也可以为人物,此处不作限定;CAD模型可以为空间点云模型数据或网格(mesh)模型数据。需要说明的是,本申请实施例针对的对象一般为刚体,例如静止的车辆以及博物馆内的文物等。
在一种实现中,第一电子设备上可以安装有目标应用,电子设备可以显示目标应用上的相关用户界面,例如该界面上有确认上传的控件;当第一电子设备检测到用户针对该控件的第一用户操作,响应于用户针对该控件的第一用户操作,第一电子设备可以向服务器发送目标对应的CAD模型。其中,目标对应的CAD模型可以是第一电子设备生成的,例如用户通过该目标应用,或其他应用生成目标对应的CAD模型;也可以是第一电子设备从其他设备获取的,此处不作限定。
可选地,第一电子设备还可以将目标的辅助信息发送至服务器,其中,目标的辅助信息可以为目标的名称或目标的特征等信息;该目标的辅助信息可以用于电子设备在包括目标的图像进行渲染虚拟信息。
例如用户可以在第一电子设备上绘制车辆的CAD模型,示例性的,当第一电子设备检测到用户的绘制操作,响应于用户的绘制操作,第一电子设备可以生成车辆的CAD模型;第一电子设备还可以在检测到用户要将车辆的CAD模型上传至服务器的操作时,向服务器发送车辆的CAD模型。可选地,用户还可以向服务器发送车辆的辅助信息,如车长和车宽等信息,示例性的,当第一电子设备检测到用户要将车辆的辅助信息上传至服务器的操作时,第一电子设备可以向服务器发送车辆的辅助信息。其中,第一电子设备生成CAD模型和发送CAD模型的应用可以为同一个应用,也可以为不同应用。
可以理解的,第一电子设备也可以为多个电子设备,也即是,不同的用户可以分别通过不同的电子设备向服务器发送用户感兴趣的目标对应的CAD模型。
S302、服务器基于目标对应的CAD模型,得到目标对应的点云模型。
在一种实现中,服务器接收第一电子设备发送的目标对应的CAD模型,进而,将目标对应的CAD模型转化为预设点云数据格式,得到目标对应的点云模型。
可选的,服务器还可以根据预设坐标系对目标对应的点云模型的位置信息进行处理。
可选地,服务器还可以识别目标对应的点云模型中的杂点,进而,去除目标对应的目标点云模型的杂点。
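A hedged sketch of the mesh-to-point-cloud conversion, coordinate normalization and stray-point removal described above, assuming the Open3D library is used; the file names, sampling density and filter parameters are illustrative only, not values from this application.

```python
import open3d as o3d

mesh = o3d.io.read_triangle_mesh("target_cad.obj")              # CAD/mesh model uploaded by the user
pcd = mesh.sample_points_uniformly(number_of_points=20000)      # convert the mesh to a point cloud

pcd.translate(-pcd.get_center())                                 # normalize: center at the origin

# Remove stray points with a statistical outlier filter (parameters are placeholders).
pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
o3d.io.write_point_cloud("target_points.ply", pcd)               # store in the point cloud library
```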
可选地,服务器在得到该目标对应的点云模型时,可以将该点云模型存入存储中。例如,服务器中可以有三维模型库,服务器在得到目标对应的点云模型后,可以将该点云模型存入三维模型库中。
可选地,服务器可以对目标进行标定,即确定该目标的标识,以使该目标的标识与该目标的其它数据例如点云模型和特征向量等关联。其中,目标的标识可以为目标的身份标识 (Identification,ID),如目标的细粒度类别;以目标为车辆为例,该车辆的身份标识可以为该车辆的名称。例如服务器将目标对应的点云模型存入三维模型库中,则服务器在接收到查询该目标对应的点云模型时,可以基于目标的标识,从三维模型库中查询该目标对应的点云模型。
在一些实施例中,服务器中可以有不同的数据库,例如CAD模型库或点云模型库中的任意一个或多个,则服务器可以将接收到CAD模型存入CAD模型库,将点云模型存入点云模型库。进而,服务器可以基于目标的标识,从数据库中获取目标对应的数据,例如点云模型。
S303、服务器向第二电子设备发送目标对应的点云模型。
其中,第二电子设备可以为一个或多个电子设备。
在一些实施例中,第二电子设备可以为安装有目标应用的电子设备,则服务器可以在得到目标对应的点云模型后,可以向所有安装有目标应用的电子设备发送目标对应的点云模型。
在一种实现中,服务器可以在用户无感知的情况下,增加第二电子设备中可定位对象的点云模型。可以理解的,服务器在得到该目标对应的点云模型后,将目标对应的点云模型发送至第二电子设备,进而第二电子设备可以存储该点云模型,则在需要对目标进行跟踪时可以直接从存储中获取该点云模型,提高跟踪效率。
需要说明的是,第一电子设备和第二电子设备也可以是同一个电子设备。
S304、服务器基于自动化渲染算法对目标对应的点云模型进行渲染,生成目标的多张训练图像。
其中,自动化渲染算法可以采用神经渲染技术(Neural render)。
在一种实现中,服务器可以将目标对应的点云模型作为输入,选定不同的视角位姿,模拟不同的背景环境对该点云模型进行渲染,生成带有位姿标签的渲染图像,得到多张训练图像。其中,生成的带有位姿标签的渲染图像可以是不同光照且不同背景的,也可以是不同光照但相同背景的,还可以是相同光照但不同背景的,此处不作限定;视角位姿和背景环境可以为预设数据。
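The viewpoint-sampling part of this rendering step can be pictured as below: camera poses are placed on a sphere around the model and each recorded as the pose label of one rendered training image. This is only a generic numpy sketch; the neural-rendering backend and background compositing are not shown, and the radius and angles are placeholders.

```python
import numpy as np

def look_at(eye, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """Camera rotation/translation looking from `eye` towards `target` (columns are camera axes in world)."""
    z = eye - target; z = z / np.linalg.norm(z)
    x = np.cross(up, z); x = x / np.linalg.norm(x)
    y = np.cross(z, x)
    return np.stack([x, y, z], axis=1), eye

pose_labels = []
radius = 2.0
for elev in np.deg2rad([15, 30, 45, 60]):
    for azim in np.deg2rad(np.arange(0, 360, 30)):
        eye = radius * np.array([np.cos(elev) * np.cos(azim),
                                 np.cos(elev) * np.sin(azim),
                                 np.sin(elev)])
        pose_labels.append(look_at(eye))   # one (R, t) label per rendered training image
```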
S305、服务器基于目标的多张训练图像,得到目标对应的位姿估计模型和特征向量。
在一些实施例中,服务器在得到目标对应的位姿估计模型和特征向量后,可以将该位姿估计模型和特征向量存入存储中。例如服务器中有特征向量检索库和位姿估计模型库,服务器在得到目标对应的位姿估计模型和目标对应的特征向量后,可以分别将目标对应的位姿估计模型和目标对应的特征向量存入位姿估计模型库和特征向量检索库中。
在一种实现中,服务器可以基于多张训练图像对原始位姿估计模型进行训练,即通过多张训练图像对原始位姿估计模型进行网络调参(finetune),得到目标对应的位姿估计模型;服务器可以基于特征提取器从训练图像中提取特征向量,得到目标对应的特征向量。可以理解的,目标与训练后的位姿估计网络是一一对应的。
其中,服务器提取目标的特征向量可以是基于一张训练图像得到的,也可以是基于多张训练图像提取的特征向量得到的,此处不做限定;关于特征提取器的详细内容可以参见下文中的相关内容。
可选地,目标对应的位姿估计模型和目标对应的特征向量均可以与目标的标识相关联。则服务器可以基于目标的标识,分别从位姿估计模型库和特征向量检索库获取目标对应的位姿估计模型和目标对应的特征向量。
S306、第二电子设备在检测到第二用户操作时,向服务器发送定位请求,所述定位请求包括待处理图像。
在一些实施例中,第二电子设备可以在检测到第二用户操作时,可以向服务器发送待处理图像。第二电子设备获取待处理图像的具体内容可以参见步骤S501的相关内容。
S307、服务器基于目标对应的点云模型、位姿估计模型和特征向量,对待处理图像中的目标进行定位,得到目标的位姿。
在一些实施例中,服务器可以提取待处理图像中目标的特征向量;基于提取的特征向量,获取服务器存储中目标对应的特征向量;进而,基于目标对应的特征向量,确定目标的标识;基于目标的标识,确定目标对应的位姿估计模型和点云模型;基于目标对应的位姿估计模型和点云模型,对待处理图像中的目标进行定位,得到目标的位姿。
关于服务器对目标进行定位的具体内容可以参见以下步骤S503至步骤S508中的相关内容。
S308、服务器向第二电子设备发送目标的位姿。
在一些实施例中,服务器可以将目标的标识和目标的位姿发送至第二电子设备。
可选地,服务器还可以将目标的辅助信息发送至第二电子设备。
S309、第二电子设备基于目标的位姿和目标对应的点云模型对目标进行跟踪。
在一些实施例中,第二电子设备存储有目标对应的点云模型,则第二电子设备在接收到目标的位姿和目标的标识后,可以基于目标的标识查询目标对应的点云模型;进而,可以基于目标的位姿和点云模型,对待处理图像后获取的图像中的目标进行跟踪。
可选地,第二电子设备还可以在显示屏上目标的预设位置上显示目标的辅助信息。可以理解的,包括目标和目标的辅助信息的图像为第二电子设备定位目标后渲染的虚实结合的图像,显示在虚拟结合图像上的目标的辅助信息也可以称为虚拟信息。例如目标为车辆,目标的辅助信息为车辆的轮廓线条和车辆的介绍信息,第二电子设备可以在待处理图像中车辆上渲染车辆的轮廓线条,在车辆周围的位置渲染车辆的介绍信息,可以理解的,该轮廓线条和介绍信息均为在待处理图像上渲染的虚拟内容,该轮廓线条用于指示目标,介绍信息用于介绍识别的目标。又例如该车辆为白色车辆,目标对应内容信息中的轮廓线条为蓝色,则第二电子设备上显示的图像中车辆轮廓上渲染后的蓝色线条。
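One simple way to place such auxiliary information, sketched below under stated assumptions: project a 3D anchor point of the model (here an arbitrary point above the model origin) into the image using the returned pose and the camera intrinsics, and draw the label near the resulting pixel. The pose, intrinsics and anchor values are placeholders.

```python
import numpy as np

def project_point(X, R, t, K):
    """Project a 3D point X (model frame) to pixel coordinates given pose (R, t) and intrinsics K."""
    Xc = R @ X + t                        # model frame -> camera frame
    u, v, w = K @ Xc
    return np.array([u / w, v / w])

K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=float)   # placeholder intrinsics
R_target, t_target = np.eye(3), np.array([0.0, 0.0, 5.0])              # placeholder pose from the server
anchor = np.array([0.0, 0.0, 1.6])                                      # illustrative 3D anchor point
label_pixel = project_point(anchor, R_target, t_target, K)
# The contour lines, model name, introduction text, etc. would be rendered near `label_pixel`.
```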
本申请实施例中,上述目标定位方法可以仅由电子设备执行。例如,电子设备可以基于目标对应的CAD模型生成目标对应的点云模型、位姿估计模型和特征向量,并基于目标对应的点云模型、位姿估计模型和特征向量对目标进行定位;示例性的,第一电子设备可以基于目标对应的CAD模型生成目标对应的点云模型、位姿估计模型和特征向量,将目标对应的点云模型、位姿估计模型和特征向量发送至第二电子设备,第二电子设备可以基于目标对应的点云模型、位姿估计模型和特征向量对目标进行定位。
以下示例性的介绍几种目标定位的应用场景。
请参见图4,图4是本申请实施例提供的一种场景示意图。如图4所示,该场景包括用户、电子设备和目标,其中,用户为电子设备的使用者,目标为电子设备定位跟踪的对象,例如车辆。例如,用户可以手持电子设备拍摄车辆,进而,电子设备可以基于拍摄的图像对车辆进行定位。可选地,电子设备还可以实现跟踪车辆或在车辆周围渲染显示车辆的辅助信 息中的任意一项或多项功能。如图4所示,电子设备显示的图像为渲染后的图像,该渲染后的图像在车辆周围有如车宽和车长等辅助信息,该辅助信息的显示呈现AR效果。
在AR场景中,电子设备可以对目标进行定位跟踪,从而在电子设备的显示屏上显示目标的辅助信息。例如在博物馆的场景中,电子设备可以为手机,目标可以为博物馆的文物,目标的辅助信息可以为文物的介绍内容,如该文物的名称、年份以及历史故事等;具体的,用户在参观文物的过程中,想要了解某文物的详细资料,可以在手机上打开目标应用,将摄像头对准文物;相应的,当手机检测到用户的用户操作,响应于用户的用户操作,手机可以打开摄像头,拍摄图像,在该文物的周围位置显示该文物的介绍内容。
在VR场景中,电子设备可以对目标进行定位跟踪,从而在虚拟场景中显示目标对应的虚拟内容。例如,在VR游戏场景中,电子设备可以为VR眼镜,目标可以为其他游戏用户,目标对应的虚拟内容为其他游戏用户在该游戏场景中的虚拟形象;具体的,用户可以戴上VR眼镜,可以观看到一种虚拟的沉浸式场景,其他游戏用户位于用户前方,相应的,VR眼镜可以拍摄到包括其他游戏用户的图像,进而,对该游戏用户进行定位跟踪,显示该游戏用户在该游戏场景中的虚拟形象。
需要说明的是,以上仅为本申请实施例示例性提供的应用场景,本申请实施例提供的目标定位方法还可以应用在其他场景中,此处不做限定。
图5示例性示出了本申请实施例提供的另一种目标定位方法流程。需要说明的是,该目标定位方法描述了对目标进行识别跟踪的具体实现,可以认为图5所示的实施例为上述图3中步骤S306至步骤S309的一种具体实现。
需要说明的是,本申请实施例中,用户操作可以为用户的触控操作(例如点击操作、长按操作、上滑操作、下滑操作或侧滑操作),也可以为非接触操作(例如隔空手势),还可以为用户的语音指令,本申请实施例对此不做具体限制。
请参见图5,该目标定位方法可以包括以下部分或全部步骤:
S501、电子设备响应于用户操作,通过摄像头拍摄待处理图像,待处理图像包括目标。
可以理解的,随着场景不同,目标可以为不同对象。例如在博物馆的场景中,目标可以为文物;在VR游戏场景中,目标可以为游戏道具,此处对目标不做限定。需要说明的,待处理图像中可以包括多个对象,其中,需要定位的对象为上述目标;对象可以为物体,也可以为人物,此处不作限定。例如待处理图像中存在对象A、对象B和对象C等诸多物体,用户想要识别定位的目标为对象A,则对象A为上述目标。
在一些实施例中,电子设备安装有目标应用;当电子设备检测到用户针对目标应用的用户操作,响应于该用户操作,电子设备可以打开摄像头,拍摄待处理图像。其中,目标应用可以为华为AR地图,河图图鸦工具,河图城市打卡应用等,可用于特效展示,虚拟喷涂以及风格化等。例如,用户手持电子设备,将电子设备的摄像头对着目标,在电子设备输入针对目标应用的用户操作,相应的,电子设备打开摄像头进行拍摄,获得包括目标的待处理图像。
需要说明的是,在录像场景中,电子设备可以获取多张待处理图像,那么,电子设备可以对一张或多张待处理图像执行本申请实施例提供的目标定位方法。
S502、电子设备向服务器发送定位请求,该定位请求包括待处理图像,该定位请求用于请求对待处理图像中的目标进行定位。
在一些实施例中,电子设备安装有目标应用;当电子设备检测到用户针对目标应用的用户操作,响应于用户针对目标应用的该用户操作,电子设备可以通过摄像头拍摄待处理图像,进而,将待处理图像发送至服务器。例如,目标应用显示如图10B的用户界面62,当电子设备检测到用户针对选项栏622和确认控件623的用户操作,响应于用户针对目标应用的该用户操作,电子设备可以获取待处理图像,将待处理图像发送至服务器。
S503、服务器从待处理图像中识别主体。
在一些实施例中,服务器在获取待处理图像后,可以将待处理图像输入主体检测模型,得到目标在待处理图像中的位置。其中,该主体检测模型可以是基于样本图像和标记目标训练得到的;目标在待处理图像中的位置可以用目标对应的点在待处理图像对应的像素坐标系中的二维坐标表示,例如,可以用以目标的几何中心为中心的矩形的左上角和右上角两点的二维坐标表示目标的位置。
在一种实现中,主体检测模型可以采用增强混洗网络(Enhanced-ShuffleNet)作为主干网络,将跨阶段局部路径聚合网络(Cross-Stage-Partial Path-Aggregation Network,CSP-PAN)作为检测头。请参见图6,图6为本申请实施例提供的一种主体检测模型的示意图。如图6所示,服务器可以将待处理图像输入主体检测模型,其中,主体检测模型中的第一个模块包括3×3卷积层(3×3Conv)、最大池化层(Max Pool)和增强混洗模块(Enhanced Shuffle Block);第二个模块包括跳格平移(Stride)分别为8、16和32的1×1卷积层(1×1Conv)、上采样层(Upsample)、跨阶段局部网络(Cross Stage Partial,CSP),跳格平移(Stride)分别为8、16和32的检测头(Head);最后,三路数据分别经过分类(Class)和检测框(box)处理后,通过非极大值抑制(Non-Maximum Suppression,NMS),可以输出主体检测结果,即识别到的主体。
可选地,对该主体检测模型进行训练时,可以采用不使用锚点(anchor free)的训练策略,标注分配机制可以采用标签分配算法如SimOTA,损失函数可以采用变焦损失函数(Varifocal Loss)和广义交并比损失函数(Generalized Intersection over Union Loss,GIoU Loss),例如,以使用Varifocal Loss和GIoU Loss的加权组合计算损失矩阵,通过该损失矩阵对主体检测模型进行优化,在损失满足预设条件时,得到上述主体检测模型。
需要说明的是,图6所示的主体检测模型和上述训练方法为本申请实施例提供的一种实现,还可以将其他神经网络作为主体检测模型,训练方法以及损失函数也可以有其他选择,此处不作限定。例如,服务器也可以将待处理图像中面积最大的对象或者位于焦点的对象确定为目标,此处对服务器识别目标的方法不做限定。
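Independent of the specific detection network, the non-maximum suppression step at the end of the detection head described above can be sketched as follows (a plain IoU-based NMS; the threshold is illustrative).

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thr: float = 0.5):
    """boxes: (N, 4) as [x1, y1, x2, y2]; returns indices of the kept boxes."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        if rest.size == 0:
            break
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou < iou_thr]
    return keep
```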
S504、服务器基于识别到的主体,从待处理图像中确定目标所在的图像块。
在一些实施例中,服务器可以基于在待处理图像中识别到的主体,从待处理图像中截取目标所在的图像块。可以理解的,后续服务器可以基于目标所在的图像块对目标进行定位,可以避免待处理图像中其他对象对定位目标的影响。
请参见图7,图7是本申请实施例提供的一种确定目标所在的图像块的示意图。如图7所示,图7中的(a)为待处理图像,待处理图像中包括路灯和车辆;假设车辆为目标,可以由图7中的(b)中的点A和点B在待处理图像对应的像素坐标系的二维坐标代表服务器识别到的主体的位置,图7中以黑色圆点代表点A和点B;进而,服务器可以基于点A和点B的二维坐标,确定目标所在的图像块,目标所在的图像块可以如图7中的(c)所示。
S505、服务器从目标所在的图像块中提取特征向量。
在一些实施例中,服务器可以将图像块输入特征提取器,得到特征向量。其中,特征向量可以为1024维特征向量;该特征提取器可以是基于样本图像和标记特征向量训练得到的。
其中,特征提取器可以为卷积神经网络(Convolutional Neural Networks,CNN),也可以为变换神经网络(Transformer),例如,特征提取器可以为深度自注意力变换网络(Vision Transformer,VIT),如VIT-16,此处不做限定。需要说明的,该特征提取器也可以称为特征提取网络。
可选地,服务器在对特征提取器进行训练时,可以采用尺度学习(Metric Learning)的训练策略,通过角度对比损失函数(Angular Contrastive Loss)计算损失矩阵,基于损失矩阵对特征提取器进行训练,得到上述特征提取器。
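A hedged sketch of extracting a global feature vector from the target's image patch with a ViT backbone, assuming the timm and PyTorch libraries; the backbone name, input file and embedding dimension are placeholders, and the metric-learning training with the angular contrastive loss is not shown.

```python
import timm
import torch
from PIL import Image
from timm.data import create_transform, resolve_data_config

model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)  # headless ViT
model.eval()
transform = create_transform(**resolve_data_config({}, model=model))

patch = Image.open("target_patch.png").convert("RGB")        # image block containing the target
with torch.no_grad():
    feat = model(transform(patch).unsqueeze(0))               # (1, embed_dim) feature vector
feat = torch.nn.functional.normalize(feat, dim=1)              # L2-normalize before retrieval
```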
需要说明的是,本申请实施例中,服务器也可以不执行步骤S504和步骤S505,而是在步骤S503从待处理图像中识别到目标后,直接基于目标的位置从待处理图像中提取特征向量,进而,执行步骤S506。示例性的,可以以图像数据或目标的位置中的任意一项或多项作为特征提取器的输入,从而得到提取到的特征向量。可以理解的,步骤S505中服务器从目标所在的图像块中提取特征向量,可以避免待处理图像中除目标外的其它对象在服务器提取特征向量时造成的误差。
S506、服务器基于提取到的特征向量,从特征向量检索库中获取目标特征向量。
在一些实施例中,服务器可以从特征向量检索库中,查找提取的特征向量对应的特征向量,该查找到的特征向量即为目标特征向量。例如,服务器可以基于余弦相似度算法,从特征向量检索库中获取与提取的特征向量的余弦相似度最大的特征向量,将与提取的特征向量的余弦相似度最大的特征向量确定为目标特征向量。其中,特征向量检索库可以包括至少两个特征向量。
其中,目标特征向量为基于目标对应的三维模型生成的特征向量。例如,用户在注册该目标时上传了该目标对应的三维模型,则服务器基于该目标对应的三维模型生成训练图像;基于训练图像生成该目标对应的特征向量;将该目标对应的特征向量存储在特征向量检索库中。
可以理解的,尽管用户上传的目标和待处理图像中的目标为同一对象,但是由于获取特征向量的过程的差异,导致同一对象在特征向量检索库中的特征向量和步骤S505中提取的特征向量可能并不完全相同,因此,服务器可以通过相关算法(如上述余弦相似度算法),将特征向量检索库中最接近步骤S505中提取的特征向量作为目标特征向量。
可以理解的,为保证服务器提取到的特征向量能够在特征向量检索库中检索到目标对应的检索向量,特征向量检索库中的特征向量的提取方法与步骤S505中提取特征向量的提取方法可以是相同的,如使用的是同样的特征提取器。
本申请实施例中,服务器可以先对提取的特征向量进行归一化处理,再基于归一化处理后的特征向量,从特征向量检索库中查找目标特征向量。例如,服务器可以先对该特征向量进行L2归一化(L2 Normalization)处理,再基于余弦相似度算法,从特征向量检索库中获取与处理后的特征向量的余弦相似度最大的特征向量,将该特征向量作为目标特征向量。需要说明的是,归一化处理可以提高数据处理的准确度,进而,服务器可以更准确的查找到目标特征向量。
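The normalization and cosine-similarity lookup can be pictured with the pure-numpy sketch below; the gallery here stands for the feature vector retrieval library, and all values are placeholders.

```python
import numpy as np

def retrieve(query: np.ndarray, gallery: np.ndarray):
    """query: (d,) extracted feature; gallery: (n, d) registered features. Returns best index and score."""
    q = query / np.linalg.norm(query)                           # L2 normalization of the query
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q                                                # cosine similarity to every entry
    best = int(np.argmax(sims))
    return best, float(sims[best])

# The row index `best` identifies the registered target whose pose estimation model
# and point cloud model are then loaded from the corresponding libraries.
```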
S507、服务器基于目标特征向量,获取目标位姿估计模型和目标点云模型。
在一些实施例中,目标的特征向量与该目标的位姿估计模型和点云模型相对应,则服务 器可以获取目标特征向量对应的位姿估计模型和点云模型,得到目标位姿估计模型和目标点云模型。例如目标特征向量为W,服务器可以从位姿估计模型库中获取W对应的位姿估计模型,从点云模型库中获取W对应的点云模型,W对应的位姿估计模型和W对应的点云模型即为目标位姿估计模型和目标点云模型。
本申请实施例中,目标有唯一的标识,目标的标识对应着该目标的特征向量、位姿估计模型和点云模型,则服务器可以基于目标特征向量,获取目标的标识;服务器可以获取目标的标识对应的位姿估计模型和点云模型,得到目标位姿估计算法模型和目标点云模型。其中,目标的标识可以为目标的ID,如目标的细粒度类别,以目标为车辆为例,该车辆的身份标识可以为该车辆的名称。
其中,目标位姿估计模型和目标点云模型可以是存储在服务器内的,即服务器可以从存储的数据中查询目标位姿估计模型和目标点云模型。请参见图8,图8是本申请实施例提供的一种服务器存储数据的示意图。如图8所示,服务器可以包括特征向量检索库、位姿估计模型库和点云模型库,例如服务器在得到对象A的特征向量、位姿估计模型和点云模型后,可以将其分别存入特征向量检索库、位姿估计模型库和点云模型库。那么,服务器也可以基于目标的标识从特征向量检索库、位姿估计模型库和点云模型库中查询目标对应的特征向量、位姿估计模型和点云模型。
S508、服务器基于该目标所在的图像块、目标位姿估计模型和目标点云模型,得到目标的位姿。
其中,目标位姿估计模型包括关键点识别模型和PnP算法。
在一些实施例中,服务器可以将目标所在的图像块输入关键点识别模型,得到至少四个关键控制点在图像块中的位置,其中,该关键点识别模型用于得到目标点云模型中预定义的关键控制点在图像块中的位置,关键控制点在图像块中的位置可以为关键控制点在目标所在的图像块对应的像素坐标系中的二维坐标;服务器可以基于至少四个关键控制点在目标点云模型中的位置和在图像块中的位置,以及设备相机参数,通过pnp算法确定目标在相机坐标系中的位姿,得到目标的初始位姿;最后,可以将该初始位姿作为目标在相机坐标系中的位姿。
其中,关键点识别模型可以为像素级投票网络(Pixel-wise Voting Net,PVNet);也可以为利用EffecientNet-b1网络结构对PVnet进行优化后的Efficient-PVNet,还可以为其他深度学习模型,此处不作限定。需要说明的是,关键点识别模型也可以称为关键点回归神经网络。
需要说明的是,现有技术中,PVNet可以采用深度残差网络(Deep residual network,ResNet)作为骨干网络(backbone),如ResNet-18,在更高分辨率的情况下,其拟合能力并不理想;本申请实施例中可以采用Efficient-PVNet作为关键点识别模型,由于Efficient-PVNet利用EffecientNet-b1网络结构对PVnet进行了结构优化,可以提高网络在较高分辨率(640*480)下的拟合能力。此外,相对于现有技术,Efficient-PVNet算力需求更少,速度更快,泛化能力更强。
本申请实施例中,服务器可以将目标的初始位姿进行优化,将优化后的位姿作为目标在相机坐标系中的位姿。
在一种实现中,待处理图像为一张真实RGB图,服务器可以基于初始位姿、目标点云模型以及相机参数,将目标重投影至真实RGB图上,渲染出渲染RGB图;分别在真实 RGB图和渲染RGB图确定若干个目标的特征点,并将真实RGB图中的特征点和渲染RGB图中的特征点进行一一配对;进而,基于渲染RGB图中的特征点与目标点云模型的3D点的对应关系,以及真实RGB图中的特征点与渲染RGB图中的特征点的配对关系,确定至少四个特征点在待处理图像中的二维坐标和在目标点云模型中的三维坐标(即至少四个2D-3D匹配点对);最后,基于至少四个2D-3D匹配点对,通过PnP方法求出优化后的位姿,将该优化后的位姿作为目标在相机坐标系中的位姿。
例如,真实RGB图中的特征点M和渲染RGB图中的特征点m为配对的两个点,渲染RGB图中的特征点m与目标点云模型的3D点N为匹配的两个点,则渲染RGB图中的特征点m和目标点云模型的3D点N可以为一对2D-3D匹配点对。
其中,服务器分别在真实RGB图和渲染RGB图确定若干个目标的特征点的具体过程可以包括:先基于初始位姿、目标点云模型以及相机参数,将目标重投影至待处理图像上,渲染出RGB图和深度图;进而,将深度图作为掩膜,从渲染RGB图中得到目标的包围框;基于包围框,分别从真实RGB图和渲染RGB图中截取图像块,得到两个包括目标的图像块;利用二维图像特征提取方法,如超级点(Superpoint)特征或图像处理检测点(Oriented FAST and Rotated BRIEF,ORB)特征,提取两个包括目标的图像块的特征点及相应描述子,同时并进行一一匹配。
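A hedged OpenCV sketch of the 2D feature matching between the real and rendered image patches described above; ORB is used here purely for illustration (the text also mentions SuperPoint features), and the image files are placeholders. Each match pairs a real pixel with a rendered pixel whose 3D model point is known from the render, which yields the 2D-3D correspondences fed to PnP.

```python
import cv2

real_patch = cv2.imread("real_patch.png", cv2.IMREAD_GRAYSCALE)          # crop from the real RGB image
rendered_patch = cv2.imread("rendered_patch.png", cv2.IMREAD_GRAYSCALE)  # crop from the rendered RGB image

orb = cv2.ORB_create(nfeatures=1000)
kp_real, des_real = orb.detectAndCompute(real_patch, None)
kp_rend, des_rend = orb.detectAndCompute(rendered_patch, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_real, des_rend), key=lambda m: m.distance)

# kp_real[m.queryIdx].pt is the 2D point in the real image; kp_rend[m.trainIdx].pt indexes
# the rendered pixel whose corresponding 3D model point is known, giving one 2D-3D pair.
pairs_2d = [kp_real[m.queryIdx].pt for m in matches[:50]]
```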
进一步的,服务器在得到优化后的位姿后,还可以计算优化后的位姿与目标在真实RGB图像中的位姿误差;在位姿误差不小于预设误差值时,对该优化后的位姿进行迭代优化,具体优化过程可见上述相关内容,直至位姿误差小于预设误差值时,将该迭代优化后的位姿作为目标在相机坐标系中的位姿。
可以理解的,在实际应用中,由于光照、遮挡、目标拍摄不全以及边缘轮廓不清晰等问题,导致目标的初始位姿并不能保证足够的精度以支撑后续3D追踪流程;本申请实施例对初始位姿进行二次迭代优化,采用迭代渲染的方式,计算重投影误差更新位姿,可以提高目标在相机坐标系中的位姿的精度和准确度。
S509、服务器向电子设备发送目标的位姿。
在一些实施例中,服务器向电子设备发送了目标的标识和目标的位姿,该目标的标识可以用于电子设备获取该目标对应的点云模型。
可选地,服务器基于目标的位姿,对待处理图像中的目标进行渲染,显示渲染后的图像。例如在车辆的外壳渲染颜色,显示渲染后的图像,以使用户可以清晰看到识别到的车辆。
在一些实施例中,服务器还向电子设备发送了目标的辅助信息,则电子设备可以在接收目标的位姿和目标的辅助信息,电子设备可以在接收目标的位姿后,基于目标的位姿在待处理图像中渲染目标的辅助信息,进而,显示渲染后的图像。
S510、电子设备基于目标的位姿和目标点云模型对目标进行跟踪。
在一些实施例中,电子设备存储有该目标点云模型,则电子设备在接收到目标的位姿和目标的标识后,可以基于目标的标识查询目标点云模型;进而,可以基于目标的位姿和目标点云模型,对在待处理图像之后获取的图像中的目标进行跟踪。
以下示例性的提供一种跟踪目标的方法,请参见图9,该方法包括以下步骤:
为方便描述,以下服务器基于上述位姿估计模型得到目标的位姿称为定位位姿,将电子设备基于目标在上一帧图像中的位姿得到的下一帧图像中目标的位姿称为跟踪位姿。需要说明的是,在获取多张图像的过程中,电子设备对图像中目标的定位可以是隔100帧计算一帧 图像的定位位姿,在两次定位之间的所有帧跟踪,都要使用前一次定位位姿。
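The keyframe-versus-tracking split described here can be pictured with the following schematic loop; every function is a hypothetical stand-in (locate_on_server for the server-side S503-S508 pipeline, track_locally for steps S901-S905 on the device), and the interval of 100 frames is only the example given above.

```python
def locate_on_server(frame):
    """Stand-in for the server-side positioning pipeline (steps S503-S508)."""
    return "pose_from_server"

def track_locally(frame, previous_pose):
    """Stand-in for the on-device tracking optimization (steps S901-S905)."""
    return previous_pose

def render_overlay(frame, pose):
    """Stand-in for rendering the auxiliary information with the current pose."""
    pass

KEYFRAME_INTERVAL = 100                      # illustrative; can be tuned to the server load
pose = None
for idx, frame in enumerate(range(1000)):    # stand-in for the camera frame stream
    if idx % KEYFRAME_INTERVAL == 0:
        pose = locate_on_server(frame)       # positioning pose on a key frame
    else:
        pose = track_locally(frame, pose)    # tracking pose between key frames
    render_overlay(frame, pose)
```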
S901、电子设备获取当前图像帧,当前帧图像包括目标。
需要说明是的,跟踪目标即在刚体运动中以连续获取的包括目标的图像为输入,求解每帧中目标的位姿。当前图像帧即为连续获取的包括目标的图像中的一帧图像。
在一些实施例中,电子设备接收服务器发送的第一帧待处理图像的定位位姿后,可以将下一帧拍摄的图像作为当前图像帧;也可以将在第一帧待处理图像后拍摄的每一帧包括目标的图像均作为当前图像帧。
S902、电子设备基于能量函数和当前图像帧中目标的位姿,计算位姿矫正量。
在一些实施例中,电子设备可以基于能量函数和当前图像帧中目标的位姿P0,通过高斯牛顿法求解当前图像帧对应的最优矫正量。其中,当前图像帧中目标的位姿P0可以为基于当前图像帧的上一帧图像的跟踪位姿和电子设备当前的SLAM位姿插值得到的目标的位姿,其中,SLAM位姿为电子设备基于SLAM算法得到的当前电子设备在环境坐标系中的位姿。其中,第一次优化时,上一帧待处理图像的跟踪位姿可以为服务器对上一帧图像执行S503至步骤S508得到的目标的定位位姿。
需要说明的是,该电子设备可以基于SLAM算法计算当前电子设备在环境坐标系中的位姿。
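A generic Gauss-Newton update for a 6-DoF correction, shown only to make the iteration above concrete; the residual below is a toy function and a numerical Jacobian is used for brevity, whereas a real implementation would build the residuals and analytic Jacobians of the energy terms of E(ξ).

```python
import numpy as np

def gauss_newton_step(residual_fn, xi, eps=1e-6, damping=1e-8):
    """One Gauss-Newton update of the 6-DoF correction xi; residual_fn maps xi -> residual vector."""
    r = residual_fn(xi)
    J = np.zeros((r.size, xi.size))
    for k in range(xi.size):                               # numerical Jacobian, illustration only
        d = np.zeros_like(xi); d[k] = eps
        J[:, k] = (residual_fn(xi + d) - r) / eps
    H = J.T @ J + damping * np.eye(xi.size)                # (damped) normal equations
    delta = np.linalg.solve(H, -J.T @ r)
    return xi + delta

xi = np.zeros(6)                                            # initial pose correction
xi = gauss_newton_step(lambda x: x - np.array([0.01, 0, 0, 0, 0, 0.02]), xi)  # toy residual
```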
以下介绍本申请实施例示例性提供的一种能量函数E(ξ)。
首先对参数ξ与位姿的关系进行介绍。
由于,在刚体运动中,目标刚体对于和当前电子设备在环境坐标系中的相对位姿可以通过旋转分量R和位移分量T表示。在电子设备的投影成像过程中,目标在电子设备成像的任一二维像素点x,可以看做其对应的点云模型中的点X,经过位姿变换和投影变换生成。即有如下关系:
x=π(RX+T)
其中，x为图像中目标物体的二维像素点，R为旋转分量，T为位移分量，π为用于图片成像的投影变换，默认为相机成像模型函数。
又由于,旋转R和平移T可以由6自由度的变量ξ决定,对上式进行简写:
x(ξ)=f(ξ;X,π)
因此能量函数E(ξ)可以将参数ξ作为变量,以代表目标的位姿。
该能量函数E(ξ)可以如下所示:
$E(\xi)=\lambda_{1}E_{RBOT}(\xi)+\lambda_{2}E_{cloudrefine}(\xi)+\lambda_{3}E_{gravity}(\xi)+\lambda_{4}E_{regular}(\xi)$
其中,Egravity(ξ)为重力轴约束项;ERBOT(ξ)为区域约束项;Ecloudrefine(ξ)为位姿估计算法约束项;Eregular(ξ)为正则项约束项;λ1、λ2、λ3和λ4分别为各项的权重系数;对ξ可使用李代数中对应方法如扰动模型进行求导和计算。
各项公式可以如下所示:
(1)重力轴约束项可以如下所示:
其中,V1为以目标的几何中心为原点建立的三维坐标系中,在水平面平行的横截面上以原点为圆心,预设半径的圆上随机采样的若干个采样点的三维坐标;R为待优化旋转分量;(RV)z为对采样点进行旋转变换后的z轴分量,即重力轴方向分量。
需要说明的是,针对具有重力轴方向恒定的物体(例如车辆,无需考虑其重力轴翻转现 象),可以在能量函数中增加上述重力轴约束项。其中,重力轴方向为垂直于水平面的方向,重力轴方向恒定的物体是指在重力轴方向上不会翻转的物体。例如车辆在正常情况中,不会出现重力轴方向翻转的姿态,则可以认为车辆为重力轴方向恒定的物体;例如,球体在滚动过程中的姿态通常会在重力轴方向翻转,则可以认为球体不是重力轴方向恒定的物体。
(2)位姿估计算法约束项可以如下所示:
需要说明的是,V2为目标上的采样点,Rcloud,Tcloud表示基于当前图像帧的上一帧图像的定位位姿和电子设备当前的SLAM位姿插值得到的目标的位姿;R,T是待优化位姿变量。需要说明的是,上述当前图像帧的上一帧图像指的是上一帧上传至服务器进行定位的图像,也可以称为关键帧,其中,关键帧之间的间隔可以综合服务器负载情况动态确定,例如关键帧之间可以间隔几十到几百帧。
可以理解的,通过当前图像帧的上一帧图像的定位位姿对目标的位姿进行纠偏,可以减少累计误差,提升算法持续追踪能力,并增加算法恶劣光照环境下的鲁棒性。
(3)区域约束项ERBOT(ξ)可以如下所示:
其中,I代表当前帧图像,y=I(x)表示图像上x位置处的像素值(灰度、RGB等);Pf(I(x))代表I(x)属于前景区域的概率分布;Pb(I(x))代表I(x)属于背景区域的概率分布;
φ(x)为符号距离函数（signed distance function），具体如下所示：
$$\varphi(x)=\begin{cases}-d(x,C), & x\in\Omega_{f}\\ d(x,C), & x\in\Omega_{b}\end{cases},\qquad d(x,C)=\min_{c\in C}\lVert x-c\rVert$$
$$C=\{x\mid\varphi(x)=0\}$$
其中,Ωf代表前景区域(即目标所在区域);Ωb代表背景区域;C代表目标的轮廓;
He为步平滑函数(smooth step function),具体如下所示:
其中,s为He函数的超参数,可以设置为1.2。
以下示例性提供一种确定目标的轮廓C的实现:电子设备可以基于当前图像帧中目标的位姿P0、目标点云数据、相机成像模型函数,利用渲染工具进行逐片元渲染,生成在设备当前视角下的目标物体深度图;进而,基于深度图信息作为掩膜获取目标投影在当前图像帧的轮廓得到的。
(4)正则项约束项
其中,c为当前图像帧中目标的轮廓点,C为当前图像帧中目标的轮廓点对应的三维空间点;R、T为待优化位姿旋转和平移分量,π为相机成像模型函数。可以理解的,目标的轮廓点为目标的轮廓上的采样点,具体获取方法可以参见上文中的相关内容。
需要说明的是,该能量函数用于保证位姿优化后轮廓的三维点的投影和之前的二维位置不要相差太大,以保证轮廓点变化的连贯性。
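The contour C used by the region and regularization terms above is obtained from the rendered depth map as described; a hedged OpenCV sketch follows, with a synthetic depth map standing in for the renderer's output.

```python
import cv2
import numpy as np

# Rendered depth map of the model at the current pose estimate (0 where nothing is hit).
depth = np.zeros((480, 640), dtype=np.float32)
depth[200:300, 250:400] = 1.5                               # placeholder rendered region

mask = (depth > 0).astype(np.uint8) * 255                   # depth map used as a foreground mask
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
contour_pts = contours[0].reshape(-1, 2) if contours else np.empty((0, 2), dtype=int)
# contour_pts are samples of the contour C of the target projected at the current pose.
```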
S903、电子设备基于位姿矫正量更新当前图像帧中目标的位姿。
在一些实施例中,电子设备可以基于位姿矫正量更新当前图像帧中目标的位姿。
S904、电子设备基于能量函数和当前图像帧中目标的位姿,计算能量函数值。
在一些实施例中,在更新当前图像帧的位姿后,电子设备可以将当前图像帧中目标的位姿输入能量函数中,得到能量函数值。
S905、在能量函数值满足预设条件时,输出当前图像帧中目标的位姿;反之,执行步骤S902至S905。
其中,预设条件可以为能量函数值小于预设阈值或迭代次数达到预定次数。
在一些实施例中,电子设备可以在能量函数值小于预设阈值或迭代次数达到预定次数时,将当前图像帧中目标的位姿作为目标的跟踪位姿;反之,执行步骤S902至S905,对当前图像帧中目标的位姿进行迭代优化,直至输出目标的跟踪位姿。
本申请实施例中,电子设备可以对目标进行跟踪,并在包括目标的每一帧图像上均显示目标的辅助信息。例如目标为车辆,目标的辅助信息为车辆的车型、车的生厂商以及车辆数据等,电子设备可以在待处理图像中渲染车辆的车型、生厂商以及车辆数据等,例如该车辆周围可以显示该车辆的车型、生厂商以及车辆数据等;例如在录像场景中,电子设备显示的每一帧图像中均有渲染的辅助信息。其中,辅助信息的显示可以呈现AR效果。
本申请实施例中,也可以由服务器对目标进行跟踪。例如,电子设备可以将获取的每一帧图像都发送至服务器;服务器可以对关键帧图像进行目标定位,服务器也可以基于两帧关键帧图像中的前一帧图像中目标的定位结果对两帧关键帧图像之间的图像进行目标跟踪,其中,服务器对目标进行定位的方法可以与上述电子设备对目标进行定位的方法一致,服务器对目标进行跟踪的方法可以与上述电子设备对目标进行跟踪的方法一致;服务器可以将目标的跟踪结果发送至电子设备。
以下示例性的介绍电子设备在执行上述目标定位方法时的一些用户界面。图10A-图10D为电子设备上实现的用户界面。
图10A示出了第一终端上的用于展示已安装应用程序的示例性用户界面61。如图10A所示,该用户界面61可以包括:状态栏、日历指示符、天气指示符、目标应用的图标611、图库应用的图标以及其他应用程序的图标等。其中,状态栏可包括:移动通信信号(又可称为蜂窝信号)的一个或多个信号强度指示符、Wi-Fi信号的一个或多个信号强度指示符,电池状态指示符、时间指示符等。
在一些实施例中,图10A所示的用户界面61可以为主屏幕界面(home screen)。可以理解的是,图10A仅仅示例性示出了电子设备的一个用户界面,不应构成对本申请实施例的限定。
如图10A所示,当电子设备检测到作用于目标应用的图标611的用户操作,响应于该用户操作,电子设备可以显示图10B所示的用户界面62。
不限于在图10A所示的主界面上,用户还可以通过其他方式,触发电子设备显示用户界面62,本申请实施例对此不做限制。例如,电子设备可以在浏览器的输入框上输入预设网址,触发电子设备显示用户界面62。
如图10B所示,用户界面62可显示选项栏621、选项栏622和确认控件623。其中,选项栏621用于用户注册可识别目标,选项栏622用于用户识别目标。
在一些实施例中,用户可以点击选项栏621;进而,如图10B所示,再点击确认控件623,相应的,当电子设备检测到上述用户操作,响应于上述用户操作,电子设备可以显示图10C所示的用户界面63。或者,用户也可以点击选项栏621,相应的,当电子设备检测到上述用户操作,响应于上述用户操作,电子设备可以显示图10C所示的用户界面63。
其中,用户界面63可包括名称输入框631、导入控件632、信息输入栏633和导入控件634和确认控件635。其中,名称输入框631用于输入目标的名称;导入控件632用于输入目标对应的CAD模型;信息输入栏633用于输入目标对应的信息;导入控件634用于输入目标的相关图像。
在一种实现中,用户可以在用户界面63输入目标的数据后,点击确认控件635,相应的,电子设备检测到上述用户操作,响应于上述用户操作,将输入的目标的数据上传至服务器。
本申请实施例中,用户可以点击选项栏622;进而,如图10B所示,再点击确认控件623,相应的,当电子设备检测到上述用户操作时,响应于上述用户操作,电子设备可以显示图10D所示的用户界面64。或者,用户也可以点击确认选项栏622,相应的,当电子设备检测到上述用户操作时,响应于上述用户操作,电子设备可以显示图10C所示的用户界面63。
其中,用户界面64可以显示拍摄到的图像或对拍摄到的图像执行上述目标定位方法后的图像。图10D示例性示出了一张对目标进行定位后渲染的图像,其中,车辆为目标;车辆周围的虚线以及车长、车宽等显示内容为该车辆的辅助信息。可以理解的,车辆为拍摄到图像中的真实信息,车辆周围的虚线以及车长、车宽等显示内容为对目标定位后在目标周围的预设区域渲染的虚拟信息,该虚拟信息仅在电子设备的显示屏上显示,并不是真实世界内的信息。
需要说明的是,上述用户界面仅为本申请实施例提供的一种示例,不应构成对本申请实施例的限定。本申请实施例中,图10C和图10D所示的用户界面可以为不同应用对应的界面。
本申请实施例还提供了一种电子设备,电子设备包括一个或多个处理器和一个或多个存储器;其中,一个或多个存储器与一个或多个处理器耦合,一个或多个存储器用于存储计算机程序代码,计算机程序代码包括计算机指令,当一个或多个处理器执行计算机指令时,使得电子设备执行上述实施例描述的方法。
本申请实施例还提供了一种包含指令的计算机程序产品,当计算机程序产品在电子设备上运行时,使得电子设备执行上述实施例描述的方法。
本申请实施例还提供了一种计算机可读存储介质,包括指令,当指令在电子设备上运行时,使得电子设备执行上述实施例描述的方法。
可以理解的是,本申请的各实施方式可以任意进行组合,以实现不同的技术效果。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包 括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线)或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘Solid State Disk)等。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,该流程可以由计算机程序来指令相关的硬件完成,该程序可存储于计算机可读取存储介质中,该程序在执行时,可包括如上述各方法实施例的流程。而前述的存储介质包括:ROM或随机存储记忆体RAM、磁碟或者光盘等各种可存储程序代码的介质。
总之,以上所述仅为本申请技术方案的实施例而已,并非用于限定本申请的保护范围。凡根据本申请的揭露,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (18)

  1. 一种目标定位方法,其特征在于,应用于服务器,所述方法包括:
    所述服务器接收第一电子设备发送的定位请求,所述定位请求包括待处理图像;
    所述服务器识别所述待处理图像中的目标;
    所述服务器从位姿估计模型库中查找所述目标对应的目标位姿估计模型,所述位姿估计模型库包括多个对象分别对应的位姿估计模型;
    所述服务器基于所述待处理图像和所述目标位姿估计模型,得到所述目标的位姿;
    所述服务器将所述目标的位姿发送至所述第一电子设备。
  2. 根据权利要求1所述的方法,其特征在于,在所述服务器从位姿估计模型库中查找所述目标对应的目标位姿估计模型之前,所述方法还包括:
    所述服务器接收第二电子设备发送的所述目标对应的三维模型;
    所述服务器对所述目标对应的三维模型进行渲染,生成多张训练图像;
    所述服务器基于所述多张训练图像对初始位姿估计模型进行训练,得到所述目标位姿估计模型。
  3. 根据权利要求1或2所述的方法,其特征在于,所述服务器识别所述待处理图像中的目标包括:
    所述服务器从所述待处理图像中提取特征向量;
    所述服务器从特征向量库中查询与所述提取的特征向量相似度最高的特征向量;所述特征向量库包括多个对象的标识分别对应的特征向量;
    所述服务器基于查询到的特征向量对应的标识,确定所述目标。
  4. 根据权利要求1-3任一项所述的方法,其特征在于,所述目标位姿估计模型包括关键点识别模型和N点透视算法,所述服务器基于所述待处理图像和所述目标位姿估计模型,得到所述目标的位姿,包括:
    所述服务器将所述待处理图像输入所述关键点识别模型,得到所述目标对应的至少四个关键点的二维坐标;
    所述服务器基于所述目标对应的三维模型,确定所述至少四个关键点的三维坐标;
    所述服务器基于所述至少四个关键点的二维坐标和三维坐标,通过N点透视算法,得到所述目标的位姿。
  5. 根据权利要求1-4任一项所述的方法,其特征在于,所述服务器将所述目标的位姿发送至所述第一电子设备之前,所述方法还包括:
    所述服务器基于所述目标的位姿,将所述目标对应的三维模型重投影至所述待处理图像上,得到渲染图像;
    所述服务器执行M次优化过程,所述M为正整数,所述优化过程包括:基于所述渲染图像和所述待处理图像,计算优化位姿;所述服务器基于所述待处理图像和所述优化位姿,计算位姿误差;
    所述服务器在所述位姿误差小于预设误差值时,将所述优化位姿更新为所述目标的位姿;在所述位姿误差不小于所述预设误差值时,基于所述优化位姿更新所述渲染图像,执行所述优化过程。
  6. 根据权利要求1-5任一项所述的方法,其特征在于,在所述服务器接收第一电子设备发送的定位请求之前,所述方法还包括:
    将所述目标对应的三维模型发送至所述第一电子设备,所述三维模型用于所述第一电子设备结合所述目标的位姿对所述目标进行跟踪。
  7. 一种目标定位方法,其特征在于,应用于电子设备,所述方法包括:
    所述电子设备获取待处理图像;
    所述电子设备向服务器发送定位请求,所述定位请求包括所述待处理图像;所述定位请求用于请求识别所述待处理图像中的目标和获得所述目标在所述待处理图像中的第一位姿;所述第一位姿是所述服务器基于从位姿估计模型库中查询的所述目标对应的目标位姿估计模型得到的,所述位姿估计模型库包括多个对象分别对应的位姿估计模型;
    所述电子设备接收所述服务器发送的所述第一位姿;
    所述电子设备基于所述第一位姿在所述待处理图像上渲染虚拟信息;
    所述电子设备显示渲染后的图像。
  8. 根据权利要求7所述的方法,其特征在于,所述电子设备接收所述服务器发送的所述第一位姿之后,所述方法还包括:
    所述电子设备获取当前图像帧,所述当前帧图像为获取所述待处理图像后获取的图像;
    所述电子设备基于所述当前图像帧、所述目标对应的三维模型和所述第一位姿,确定所述目标在所述当前图像帧中的第二位姿。
  9. 根据权利要求8所述的方法,其特征在于,在所述电子设备基于所述当前图像帧、所述目标的三维模型和所述第一位姿,确定所述目标在所述当前图像帧中的第二位姿之前,包括:
    所述电子设备接收所述服务器发送的所述目标的标识;
    所述电子设备基于所述目标的标识从存储中获取所述三维模型,所述三维模型是所述电子设备从所述服务器获取后保存在所述存储中的。
  10. 根据权利要求8或9所述的方法,其特征在于,所述确定所述目标在所述当前图像帧中的第二位姿,包括:
    所述电子设备执行N次优化过程,所述N为正整数,所述优化过程包括:所述电子设备基于能量函数、所述三维模型和所述第二位姿,计算位姿矫正量;基于位姿矫正量更新所述第二位姿;基于所述能量函数和所述第二位姿,计算能量函数值;
    所述电子设备在能量函数值满足预设条件时,输出所述第二位姿;反之,执行所述优化过程。
  11. 根据权利要求10所述的方法,其特征在于,所述能量函数包括重力轴约束项、区域约束项、位姿估计算法约束项或正则项约束中的至少一项,所述重力轴约束项用于约束所述第二位姿在重力轴方向的误差;所述区域约束项用于基于所述当前帧图像的像素值约束所述第二位姿时所述目标的轮廓误差;所述位姿估计算法约束项用于基于估计位姿约束所述第二位姿的误差,所述估计位姿是基于所述第一位姿和基于实时定位与同步地图构建算法得到的电子设备的位姿得到的;所述正则项约束项是基于所述目标对应的三维模型约束所述第二位姿时所述目标的轮廓误差。
  12. 根据权利要求7-11所述的方法,其特征在于,在所述电子设备向服务器发送定位请求之前,所述方法包括:
    所述电子设备显示用户界面,所述用户界面包括注册控件;
    所述电子设备在检测到针对所述注册控件的用户操作时,向所述服务器发送所述目标对应的三维模型。
  13. 根据权利要求12所述的方法,其特征在于,所述三维模型是所述电子设备接收外部输入得到的或所述电子设备基于用户操作生成的。
  14. 一种服务器,其特征在于,所述服务器包括一个或多个处理器和一个或多个存储器;其中,所述一个或多个存储器与所述一个或多个处理器耦合,所述一个或多个存储器用于存储计算机程序代码,所述计算机程序代码包括计算机指令,当所述一个或多个处理器执行所述计算机指令时,使得所述服务器执行如权利要求1-6中任一项所述的方法。
  15. 一种电子设备,其特征在于,所述电子设备包括一个或多个处理器和一个或多个存储器;其中,所述一个或多个存储器与所述一个或多个处理器耦合,所述一个或多个存储器用于存储计算机程序代码,所述计算机程序代码包括计算机指令,当所述一个或多个处理器执行所述计算机指令时,使得所述电子设备执行如权利要求7-13中任一项所述的方法。
  16. 一种包含指令的计算机程序产品,其特征在于,当所述计算机程序产品在电子设备上运行时,使得所述电子设备执行如权利要求1-6或7-13中任一项所述的方法。
  17. 一种计算机可读存储介质,包括指令,其特征在于,当所述指令在电子设备上运行时,使得所述电子设备执行如权利要求1-6或7-13中任一项所述的方法。
  18. 一种目标定位系统,其特征在于,所述目标定位系统包括服务器和电子设备,所述服务器用于执行如权利要求1-6中任一项所述的方法,所述电子设备用于执行如权利要求7-13中任一项所述的方法。
PCT/CN2023/092023 2022-05-11 2023-05-04 一种目标定位方法、系统和电子设备 WO2023216957A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210508427.7 2022-05-11
CN202210508427.7A CN117095319A (zh) 2022-05-11 2022-05-11 一种目标定位方法、系统和电子设备

Publications (1)

Publication Number Publication Date
WO2023216957A1 true WO2023216957A1 (zh) 2023-11-16

Family

ID=88729683

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/092023 WO2023216957A1 (zh) 2022-05-11 2023-05-04 一种目标定位方法、系统和电子设备

Country Status (2)

Country Link
CN (1) CN117095319A (zh)
WO (1) WO2023216957A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117746462A (zh) * 2023-12-19 2024-03-22 深圳职业技术大学 基于互补特征动态融合网络模型的行人重识别方法及装置

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180082475A1 (en) * 2016-09-21 2018-03-22 Verizon Patent And Licensing Inc. Placing and presenting virtual objects in an augmented reality environment
CN108346179A (zh) * 2018-02-11 2018-07-31 北京小米移动软件有限公司 Ar设备显示方法和装置
CN110335351A (zh) * 2019-07-02 2019-10-15 北京百度网讯科技有限公司 多模态ar处理方法、装置、系统、设备及可读存储介质
CN110648361A (zh) * 2019-09-06 2020-01-03 深圳市华汉伟业科技有限公司 一种三维目标物体的实时位姿估计方法及定位抓取系统
CN111638793A (zh) * 2020-06-04 2020-09-08 浙江商汤科技开发有限公司 飞行器的展示方法、装置、电子设备及存储介质
CN111694430A (zh) * 2020-06-10 2020-09-22 浙江商汤科技开发有限公司 一种ar场景画面呈现方法、装置、电子设备和存储介质
CN112907658A (zh) * 2019-11-19 2021-06-04 华为技术有限公司 视觉定位评估方法和电子设备
CN112991551A (zh) * 2021-02-10 2021-06-18 深圳市慧鲤科技有限公司 图像处理方法、装置、电子设备和存储介质
CN113436251A (zh) * 2021-06-24 2021-09-24 东北大学 一种基于改进的yolo6d算法的位姿估计系统及方法

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180082475A1 (en) * 2016-09-21 2018-03-22 Verizon Patent And Licensing Inc. Placing and presenting virtual objects in an augmented reality environment
CN108346179A (zh) * 2018-02-11 2018-07-31 北京小米移动软件有限公司 Ar设备显示方法和装置
CN110335351A (zh) * 2019-07-02 2019-10-15 北京百度网讯科技有限公司 多模态ar处理方法、装置、系统、设备及可读存储介质
CN110648361A (zh) * 2019-09-06 2020-01-03 深圳市华汉伟业科技有限公司 一种三维目标物体的实时位姿估计方法及定位抓取系统
CN112907658A (zh) * 2019-11-19 2021-06-04 华为技术有限公司 视觉定位评估方法和电子设备
CN111638793A (zh) * 2020-06-04 2020-09-08 浙江商汤科技开发有限公司 飞行器的展示方法、装置、电子设备及存储介质
CN111694430A (zh) * 2020-06-10 2020-09-22 浙江商汤科技开发有限公司 一种ar场景画面呈现方法、装置、电子设备和存储介质
CN112991551A (zh) * 2021-02-10 2021-06-18 深圳市慧鲤科技有限公司 图像处理方法、装置、电子设备和存储介质
CN113436251A (zh) * 2021-06-24 2021-09-24 东北大学 一种基于改进的yolo6d算法的位姿估计系统及方法

Also Published As

Publication number Publication date
CN117095319A (zh) 2023-11-21

Similar Documents

Publication Publication Date Title
CN110495819B (zh) 机器人的控制方法、机器人、终端、服务器及控制系统
WO2021249053A1 (zh) 图像处理的方法及相关装置
CN115473957B (zh) 一种图像处理方法和电子设备
US12020472B2 (en) Image processing method and image processing apparatus
CN114119758B (zh) 获取车辆位姿的方法、电子设备和计算机可读存储介质
US20220262035A1 (en) Method, apparatus, and system for determining pose
CN112087649B (zh) 一种设备搜寻方法以及电子设备
WO2022179604A1 (zh) 一种分割图置信度确定方法及装置
WO2022156473A1 (zh) 一种播放视频的方法及电子设备
US20240193945A1 (en) Method for determining recommended scenario and electronic device
WO2023216957A1 (zh) 一种目标定位方法、系统和电子设备
WO2022161386A1 (zh) 一种位姿确定方法以及相关设备
CN113538227A (zh) 一种基于语义分割的图像处理方法及相关设备
CN115150542B (zh) 一种视频防抖方法及相关设备
WO2022068522A1 (zh) 一种目标跟踪方法及电子设备
CN116826892B (zh) 充电方法、充电装置、电子设备及可读存储介质
CN115032640B (zh) 手势识别方法和终端设备
CN115775395A (zh) 图像处理方法及相关装置
WO2022161011A1 (zh) 生成图像的方法和电子设备
CN114812381A (zh) 电子设备的定位方法及电子设备
CN117727073B (zh) 模型训练方法及相关设备
CN117131213B (zh) 图像处理方法及相关设备
WO2022222705A1 (zh) 设备控制方法和电子设备
WO2024046162A1 (zh) 一种图片推荐方法及电子设备
US20240184504A1 (en) Screen projection method and system, and related apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23802729

Country of ref document: EP

Kind code of ref document: A1