WO2021013230A1

WO2021013230A1 - Robot control method, robot, terminal, server, and control system

Info

Publication number: WO2021013230A1
Application number: PCT/CN2020/103859
Authority: WO
Inventors: 薛清风; 彭洪彬
Original assignee: 华为技术有限公司
Priority date: 2019-07-24
Filing date: 2020-07-23
Publication date: 2021-01-28
Also published as: CN110495819B; CN110495819A

Abstract

A robot (300) control method, the robot (300), a terminal (100), a server (200), and a control system are provided. The control method comprises: the robot (300) uploads to the server (200) a visual SLAM map and the coordinate conversion relationship between a first coordinate system and a second coordinate system; the terminal (100) uploads a target image frame and feature data of a target point to the server (200); the server (200) determines first coordinates of the target point in the first coordinate system according to the target image frame, the feature data of the target point, and the visual SLAM map; the server (200) converts the first coordinates into second coordinates of the target point in the second coordinate system and sends the second coordinates to the robot (300); the robot (300) receives the second coordinates, determines a movement path according to the second coordinates and the coordinates of its current location in the second coordinate system, and moves to the target point along the movement path. A user can thus precisely select the position to which the robot (300) is expected to move, so that the robot (300) accurately moves to the target point.

Description

Robot control method, robot, terminal, server and control system

This application claims priority to Chinese Patent Application No. 201910673025.0, filed with the Chinese Patent Office on July 24, 2019 and entitled "Robot control method, robot, terminal, server and control system", which is incorporated herein by reference in its entirety.

Technical field

This application relates to the field of artificial intelligence technology, and in particular to a robot control method, robot, terminal, server and control system.

Background

At present, household sweeping robots are becoming increasingly popular, and users need to issue instructions to control them to clean designated areas.

There are currently two mainstream control methods. The first uses a remote control: the user presses the up, down, left, and right buttons to steer the robot to a specified target. With this control mode the sweeper must stay within the user's field of vision, yet a working sweeper travels throughout the house, so the user experience is poor.

In the second, the user manually selects a target point on an electronic map drawn in a mobile app (as shown in FIG. 1) and sends it to the sweeper. This interaction relies on an electronic map generated by SLAM (simultaneous localization and mapping) technology. However, the drawn map reproduces the real environment with low fidelity, and manually selecting a target point on it introduces a large (meter-level) error, so the robot cannot move accurately to the target point.

Summary

The embodiments of the present application provide a robot control method, robot, terminal, server, and control system to control the robot to accurately move to a target point.

A first aspect of the embodiments of this application provides a robot control method applied to a system consisting of a robot, a terminal, and a server, where the robot carries a camera and a lidar. The method includes: the robot creates a visual SLAM map with its camera and a laser SLAM map with its lidar, takes the projection of the visual SLAM map's coordinate system onto the horizontal plane as the first coordinate system and the coordinate system of the laser SLAM map as the second coordinate system, and uploads the visual SLAM map and the coordinate conversion relationship between the first and second coordinate systems to the server; the terminal captures the current interface to obtain a target image frame, extracts the feature data of the target point, and uploads the target image frame and the feature data of the target point to the server; the server receives them and determines the first coordinate of the target point in the first coordinate system according to the target image frame, the feature data of the target point, and the visual SLAM map; the server converts the first coordinate into the second coordinate of the target point in the second coordinate system and sends the second coordinate to the robot; the robot receives the second coordinate, determines a motion path according to the second coordinate and the coordinates of its current position in the second coordinate system, and moves to the target point along the motion path.

In the embodiments of this application, the user does not need to select the target point on a low-fidelity electronic map but selects it directly on the terminal's interface. The user can therefore accurately select the position to which the robot should move, and the robot can accurately move to the target point.

Optionally, the origin of the first coordinate system may or may not coincide with the origin of the second coordinate system. If the origins do not coincide, the coordinate conversion relationship is more complex: it includes the angle between the axes of the first coordinate system and the axes of the second coordinate system, and the relative position of the two origins. If the origins coincide, the relationship is simpler: it includes the coordinates of the same point in both coordinate systems, or the angle between the axes of the two coordinate systems. When the origins coincide, the server can convert the first coordinate into the second coordinate according to the following formulas: X' = X*cos(θ) - Y*sin(θ), Y' = X*sin(θ) + Y*cos(θ), where (X', Y') is the second coordinate, (X, Y) is the first coordinate, and θ is the angle between the axes of the first coordinate system and the axes of the second coordinate system.
The angle θ satisfies θ = arccos[(X₁X₁′ + Y₁Y₁′)/(X₁² + Y₁²)], where (X₁, Y₁) and (X₁′, Y₁′) are the coordinates of the same point in the first coordinate system and in the second coordinate system, respectively.
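The coincident-origin conversion is a plain 2D rotation. A minimal Python sketch, assuming θ is given in radians and both systems share an origin; the function name `first_to_second` is hypothetical:

```python
import math
from typing import Tuple


def first_to_second(x: float, y: float, theta: float) -> Tuple[float, float]:
    """Rotate a point from the first (projected visual SLAM) coordinate system
    into the second (laser SLAM) coordinate system.

    Assumes both systems share the same origin; theta is in radians.
    """
    x2 = x * math.cos(theta) - y * math.sin(theta)
    y2 = x * math.sin(theta) + y * math.cos(theta)
    return x2, y2


# A 90-degree rotation maps (1, 0) to approximately (0, 1).
print(first_to_second(1.0, 0.0, math.pi / 2))
```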

Optionally, the included angle θ can be measured either before the robot leaves the factory or after it reaches the user. In the first case, the robot builds a laser SLAM map and a visual SLAM map of a given environment before leaving the factory; after the maps are built, the robot is moved to some point, its coordinates (X₁, Y₁) in the first coordinate system and (X₁′, Y₁′) in the second coordinate system are read, and θ is calculated as θ = arccos[(X₁X₁′ + Y₁Y₁′)/(X₁² + Y₁²)]. In the second case, θ is not measured at the factory: after the user takes the robot home, the robot builds a laser SLAM map and a visual SLAM map of the indoor environment, is moved to some point in the room, and θ is calculated with the same formula from the robot's coordinates in the two coordinate systems.
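Both calibration variants reduce to the same single-point computation. A minimal Python sketch, assuming coincident origins and a calibration point away from the origin; `estimate_theta` is a hypothetical name, and the clamp to [-1, 1] is an added safeguard against measurement noise, not something stated in the application:

```python
import math
from typing import Tuple


def estimate_theta(p_first: Tuple[float, float], p_second: Tuple[float, float]) -> float:
    """theta = arccos[(X1*X1' + Y1*Y1') / (X1^2 + Y1^2)] from one calibration point.

    p_first is (X1, Y1) in the first system, p_second is (X1', Y1') in the second.
    Assumes coincident origins; returns radians in [0, pi].
    """
    x1, y1 = p_first
    x1p, y1p = p_second
    norm_sq = x1 * x1 + y1 * y1
    if norm_sq == 0.0:
        raise ValueError("calibration point must not coincide with the origin")
    c = (x1 * x1p + y1 * y1p) / norm_sq
    # Clamp so small measurement noise cannot push acos outside its domain.
    c = max(-1.0, min(1.0, c))
    return math.acos(c)


# Calibration point (2, 0) in the first system observed at (0, 2) in the second.
print(math.degrees(estimate_theta((2.0, 0.0), (0.0, 2.0))))  # about 90 degrees
```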

A second aspect of the embodiments of this application provides a robot control method applied to a robot that includes a camera and a lidar. The method includes: the robot creates a visual SLAM map with the camera and a laser SLAM map with the lidar; the robot uploads the coordinate conversion relationship between the first coordinate system and the second coordinate system to the server, where the first coordinate system is the projection of the visual SLAM map's coordinate system onto the horizontal plane and the second coordinate system is the coordinate system of the laser SLAM map; the robot uploads the visual SLAM map to the server; the robot receives second coordinates that the server obtains based on the visual SLAM map and the coordinate conversion relationship, the second coordinates being the coordinates of the target point in the second coordinate system; the robot determines a movement path according to the second coordinates and the coordinates of its current position in the second coordinate system; and the robot moves to the target point along the movement path.
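The application does not specify how the movement path is planned. As one illustration only, assuming the laser SLAM map has been rasterized into an occupancy grid (0 = free, 1 = occupied) and both the current position and the second coordinates have been converted to grid cells, a breadth-first search finds a shortest 4-connected path. `plan_path` and the grid layout are hypothetical:

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]  # (row, col) in a hypothetical occupancy grid


def plan_path(grid: List[List[int]], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Shortest 4-connected path from start to goal; 0 = free, 1 = occupied."""
    rows, cols = len(grid), len(grid[0])
    came_from: Dict[Cell, Optional[Cell]] = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the parent links back to the start, then reverse.
            path: List[Cell] = []
            node: Optional[Cell] = current
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        r, c = current
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and grid[nr][nc] == 0 and (nr, nc) not in came_from:
                came_from[(nr, nc)] = current
                queue.append((nr, nc))
    return None  # goal is unreachable


grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
route = plan_path(grid, (0, 0), (2, 0))
print(route)  # [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]
```

A real robot would more likely use A* or a cost-aware planner on the laser map; BFS is chosen here only because it is short and obviously correct on a uniform-cost grid.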

If the origin of the first coordinate system does not coincide with the origin of the second coordinate system, the coordinate conversion relationship includes the angle between the axes of the first coordinate system and the axes of the second coordinate system, and the relative position of the two origins.

If the origin of the first coordinate system coincides with the origin of the second coordinate system, the coordinate conversion relationship includes the coordinates of the same point in the first and second coordinate systems, or the angle between the axes of the first coordinate system and the axes of the second coordinate system. This angle θ satisfies θ = arccos[(X₁X₁′ + Y₁Y₁′)/(X₁² + Y₁²)], where (X₁, Y₁) and (X₁′, Y₁′) are the coordinates of the same point in the first coordinate system and in the second coordinate system, respectively.
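One caveat about the arccos formula: arccos returns values only in [0, π], so on its own it cannot tell whether the second coordinate system is rotated clockwise or counterclockwise relative to the first. A common alternative, not taken from this application, recovers the signed angle with atan2 of the cross and dot products; for the rotation X' = X*cos(θ) - Y*sin(θ), Y' = X*sin(θ) + Y*cos(θ), these equal |p|²·sin(θ) and |p|²·cos(θ). A hedged Python sketch, where `signed_theta` is a hypothetical helper:

```python
import math
from typing import Tuple


def signed_theta(p_first: Tuple[float, float], p_second: Tuple[float, float]) -> float:
    """Signed rotation angle in (-pi, pi] from one point seen in both systems."""
    x1, y1 = p_first
    x1p, y1p = p_second
    dot = x1 * x1p + y1 * y1p      # |p|^2 * cos(theta)
    cross = x1 * y1p - y1 * x1p    # |p|^2 * sin(theta)
    return math.atan2(cross, dot)


# (1, 0) observed at (0, -1): arccos would report +90 degrees, atan2 gives -90.
print(math.degrees(signed_theta((1.0, 0.0), (0.0, -1.0))))
```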

A third aspect of the embodiments of this application provides a robot control method applied to a terminal. The method includes: the terminal receives a touch operation from the user; the terminal captures the current interface to obtain a target image frame; the terminal determines the feature data of the target point; and the terminal uploads the target image frame and the feature data of the target point to the server.
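The application does not define the upload format. The sketch below is one hypothetical way a terminal could package the captured frame and the target point's data as JSON; every field name, and the choice of base64 plus JSON, is an assumption for illustration:

```python
import base64
import json
from dataclasses import asdict, dataclass
from typing import List, Sequence


@dataclass
class TargetUpload:
    """Hypothetical terminal-to-server message; field names are illustrative."""
    image_b64: str                 # captured target image frame, base64-encoded
    target_screen_xy: List[float]  # where the user tapped, in screen pixels
    target_descriptor: List[int]   # feature data of the target point (format assumed)


def build_upload(frame_bytes: bytes, tap_xy: Sequence[float], descriptor: Sequence[int]) -> str:
    msg = TargetUpload(
        image_b64=base64.b64encode(frame_bytes).decode("ascii"),
        target_screen_xy=[float(tap_xy[0]), float(tap_xy[1])],
        target_descriptor=list(descriptor),
    )
    return json.dumps(asdict(msg))


payload = build_upload(b"fake-frame-bytes", (540, 1200), [12, 200, 7])
print(payload)
```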

Optionally, the control method of the robot further includes: the terminal extracts feature points from the target image frame according to a preset feature extraction algorithm; the terminal determines the feature data of the feature points; and the terminal uploads the feature data of the feature points to the server.

Optionally, the robot control method further includes: the terminal determines the screen coordinates of the target point and of the feature points; the terminal determines the relative position of the target point and the feature points according to those screen coordinates together with the feature data of the target point and of the feature points; and the terminal uploads this relative position to the server.
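The screen-coordinate part of this step is a simple offset computation. A minimal Python sketch, assuming pixel coordinates with y growing downward; how the feature data (for example depth) refines this offset is not specified in the application and is left out. `relative_position` is a hypothetical name:

```python
import math
from typing import Tuple


def relative_position(
    target_xy: Tuple[float, float], feature_xy: Tuple[float, float]
) -> Tuple[float, float, float, float]:
    """Offset of the target point from one feature point, in screen pixels.

    Returns (dx, dy, distance, bearing_rad); screen y grows downward.
    """
    dx = target_xy[0] - feature_xy[0]
    dy = target_xy[1] - feature_xy[1]
    return dx, dy, math.hypot(dx, dy), math.atan2(dy, dx)


print(relative_position((300, 400), (0, 0)))
```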

A fourth aspect of the embodiments of this application provides a robot control method applied to a server. The method includes: the server receives the visual SLAM map sent by the robot; the server receives the coordinate conversion relationship between the first coordinate system and the second coordinate system sent by the robot, where the first coordinate system is the projection of the visual SLAM map's coordinate system onto the horizontal plane, the second coordinate system is the coordinate system of the laser SLAM map acquired by the robot, and the visual SLAM map and the laser SLAM map are created by the robot in the same environment; the server receives the target image frame and the feature data of the target point uploaded by the terminal; the server determines the first coordinate of the target point in the first coordinate system according to the target image frame, the feature data of the target point, and the visual SLAM map; the server converts the first coordinate into the second coordinate, that is, the coordinate of the target point in the second coordinate system; and the server sends the second coordinate to the robot.

Optionally, determining the first coordinate of the target point includes: the server obtains the feature data of the feature points in the target image frame; the server matches the feature data of the feature points against the feature data in the visual SLAM map to determine the coordinates of the feature points in the first coordinate system; the server determines the relative position of the target point and the feature points; and the server determines the first coordinate of the target point from the coordinates of the feature points in the first coordinate system and that relative position.

Optionally, determining the relative position of the target point and the feature points includes: the server receives, from the terminal, the screen coordinates of the target point and of the feature points together with the feature data of the target point and of the feature points, and determines the relative position from these screen coordinates and feature data.

Optionally, converting the first coordinate into the second coordinate includes: the server converts the first coordinate into the second coordinate according to the relative position of the origins of the first and second coordinate systems and the angle between the axes of the first coordinate system and those of the second coordinate system.
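The matching step can be illustrated with brute-force nearest-neighbour search. A minimal Python sketch, assuming binary descriptors stored as integers and compared by Hamming distance (as is typical for ORB-style features, though the application does not name a descriptor), and assuming the relative offset is already expressed in first-system units; `locate_feature`, `target_first_coordinate`, and `max_dist` are hypothetical:

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Landmark = Tuple[int, Point]  # (binary descriptor, (X, Y) in the first system)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def locate_feature(descriptor: int, landmarks: List[Landmark], max_dist: int = 10) -> Optional[Point]:
    """Nearest map landmark by Hamming distance; None if no match is close enough.

    landmarks must be non-empty.
    """
    best = min(landmarks, key=lambda lm: hamming(descriptor, lm[0]))
    return best[1] if hamming(descriptor, best[0]) <= max_dist else None


def target_first_coordinate(feature_xy: Point, offset_xy: Sequence[float]) -> Point:
    """Target's first coordinate = feature coordinate + relative offset,
    assuming the offset is already expressed in the first coordinate system."""
    return (feature_xy[0] + offset_xy[0], feature_xy[1] + offset_xy[1])


landmarks = [(0b1011_0000, (1.0, 2.0)), (0b0000_1111, (4.0, -1.0))]
feat = locate_feature(0b1011_0001, landmarks)  # 1 bit away from the first landmark
print(target_first_coordinate(feat, (0.5, 0.25)))  # (1.5, 2.25)
```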
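For non-coincident origins the application names the ingredients (axis angle and relative origin position) but gives no formula. One standard parameterisation, assumed here, is a rotation followed by a translation, where (tx, ty) is the position of the first system's origin expressed in the second system; with tx = ty = 0 it reduces to the coincident-origin formula. `first_to_second_general` is a hypothetical name:

```python
import math
from typing import Tuple


def first_to_second_general(
    x: float, y: float, theta: float, tx: float, ty: float
) -> Tuple[float, float]:
    """Rotate by theta (radians), then translate by (tx, ty).

    (tx, ty) is assumed to be the first system's origin expressed in the
    second system; this parameterisation is an illustration, not the patent's.
    """
    xr = x * math.cos(theta) - y * math.sin(theta)
    yr = x * math.sin(theta) + y * math.cos(theta)
    return xr + tx, yr + ty


# The first system's origin lands exactly on (tx, ty).
print(first_to_second_general(0.0, 0.0, 0.3, 1.5, -2.0))  # (1.5, -2.0)
```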

Optionally, when the origin of the first coordinate system coincides with the origin of the second coordinate system, the first coordinate and the second coordinate satisfy X' = X*cos(θ) - Y*sin(θ), Y' = X*sin(θ) + Y*cos(θ), where (X', Y') is the second coordinate, (X, Y) is the first coordinate, and θ is the angle between the axes of the first coordinate system and those of the second coordinate system. θ is determined by θ = arccos[(X₁X₁′ + Y₁Y₁′)/(X₁² + Y₁²)], where (X₁, Y₁) and (X₁′, Y₁′) are the coordinates of the same point in the first coordinate system and in the second coordinate system, respectively.

A fifth aspect of the embodiments of this application provides a robot, including a memory and a processor. The memory stores information including program instructions, and the processor controls execution of the program instructions. When the program instructions are loaded and executed by the processor, the robot is caused to perform the method described in the second aspect.

A sixth aspect of the embodiments of this application provides a terminal, including a memory and a processor. The memory stores information including program instructions, and the processor controls execution of the program instructions. When the program instructions are loaded and executed by the processor, the terminal is caused to perform the method described in the third aspect.

A seventh aspect of the embodiments of this application provides a server, including a memory and a processor. The memory stores information including program instructions, and the processor controls execution of the program instructions. When the program instructions are loaded and executed by the processor, the server is caused to perform the method described in the fourth aspect.

An eighth aspect of the embodiments of the present application provides a control system. The control system includes the robot described in the fifth aspect, the terminal described in the sixth aspect, and the server described in the seventh aspect.

In the embodiments of this application, the robot creates a visual SLAM map with its own camera and a laser SLAM map with its own lidar, so the robot or the server can learn the conversion relationship between the two maps. Based on the target point the user selects in an image of the actual environment provided by the terminal, the server determines the position of that point in the visual SLAM map and, using the conversion relationship, obtains its position in the laser SLAM map. The robot can then conveniently be controlled, based on the laser SLAM map, to move to the user-selected target point. Because the user selects the target point directly on the terminal's interface instead of on a low-fidelity electronic map, the user can accurately select the position to which the robot should move, and the robot can accurately move to the target point.

Description of the drawings

FIG. 1 is a schematic diagram of an electronic map in the prior art;

FIG. 2A is a schematic diagram of a control system according to an embodiment of this application;

FIG. 2B is a schematic structural diagram of a robot according to an embodiment of this application;

FIG. 2C is a block diagram of the software structure of a robot according to an embodiment of this application;

FIG. 3 is a schematic diagram of selecting a target point through a terminal according to an embodiment of this application;

FIG. 4A is a schematic diagram of an image obtained by a terminal photographing an indoor environment according to an embodiment of this application;

FIG. 4B is a schematic diagram of feature points in an image obtained by a terminal photographing an indoor environment according to an embodiment of this application;

FIG. 5A is a flowchart of interaction among a robot, a terminal, and a server according to an embodiment of this application;

FIG. 5B is another flowchart of interaction among a robot, a terminal, and a server according to an embodiment of this application;

FIG. 6 is a schematic diagram of a right-hand coordinate system and the ground according to an embodiment of this application;

FIG. 7 is a schematic diagram of the positional relationship between the first coordinate system and the second coordinate system according to an embodiment of this application;

FIG. 8 is a flowchart of a robot control method according to an embodiment of this application;

FIG. 9 is a flowchart of a robot control method according to an embodiment of this application.

Detailed description of embodiments

The terms used in the embodiments of this application are intended only to explain specific embodiments of this application and are not intended to limit this application.

In this application, "at least one" means one or more, and "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that only A exists, both A and B exist, or only B exists, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following items" or a similar expression means any combination of these items, including a single item or any combination of multiple items. For example, at least one of a, b, or c may represent a, b, c, ab, ac, bc, or abc, where each of a, b, and c may be single or multiple.
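The "at least one of a, b, or c" convention corresponds to the non-empty combinations of the listed items. A small Python illustration (`at_least_one_of` is a hypothetical helper) reproduces the seven cases enumerated above:

```python
from itertools import combinations
from typing import List, Sequence


def at_least_one_of(items: Sequence[str]) -> List[str]:
    """All non-empty combinations, i.e. what 'at least one of ...' can denote."""
    return [
        "".join(combo)
        for r in range(1, len(items) + 1)
        for combo in combinations(items, r)
    ]


print(at_least_one_of(["a", "b", "c"]))  # ['a', 'b', 'c', 'ab', 'ac', 'bc', 'abc']
```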

Referring to FIG. 2A, an embodiment of the present application provides a control system, including a terminal 100, a server 200, and a robot 300.

The terminal 100, also called user equipment (UE), is a device that provides voice and/or data connectivity to a user, for example a handheld device or a vehicle-mounted device with wireless connectivity. Common terminals include mobile phones, tablet computers, notebook computers, palmtop computers, and mobile Internet devices (MID). The terminal 100 can control the robot 300 through the server 200. The robot 300 may also be replaced by another electronic device that has functions similar to those described in the embodiments of this application and can be controlled by the terminal 100.

FIG. 2B shows a schematic structural diagram of the terminal 100.

The terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.

It is understandable that the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the terminal 100. In other embodiments of the present application, the terminal 100 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components. The illustrated components can be implemented in hardware, software, or a combination of software and hardware.

The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). The different processing units may be independent devices or may be integrated into one or more processors.

The controller can generate operation control signals based on instruction operation codes and timing signals, to control instruction fetching and execution.

A memory may also be provided in the processor 110 to store instructions and data. In some embodiments, the memory in the processor 110 is a cache. It can hold instructions or data that the processor 110 has just used or uses cyclically; if the processor 110 needs them again, it can call them directly from this memory. This avoids repeated accesses, reduces the waiting time of the processor 110, and improves system efficiency.
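The benefit described above, serving repeated accesses from a small fast store, can be illustrated with a toy cache. The application does not state a replacement policy; least-recently-used eviction is assumed here, and `TinyCache` is a hypothetical class:

```python
from collections import OrderedDict
from typing import Dict


class TinyCache:
    """Illustrative least-recently-used cache; the replacement policy is an assumption."""

    def __init__(self, capacity: int, backing: Dict[int, int]) -> None:
        self.capacity = capacity
        self.backing = backing  # stands in for slower main memory
        self.lines: "OrderedDict[int, int]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, addr: int) -> int:
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)  # mark as most recently used
            return self.lines[addr]
        self.misses += 1
        value = self.backing[addr]  # slow path: fetch from backing store
        self.lines[addr] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict least recently used entry
        return value


memory = {addr: addr * 10 for addr in range(8)}
cache = TinyCache(capacity=2, backing=memory)
for addr in [0, 1, 0, 2, 0, 1]:
    cache.read(addr)
print(cache.hits, cache.misses)  # 2 4
```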

In some embodiments, the processor 110 may include one or more interfaces, such as an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.

The I2C interface is a bidirectional synchronous serial bus consisting of a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may include multiple groups of I2C buses. The processor 110 may be coupled to the touch sensor 180K, a charger, a flash, the camera 193, and so on through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate over the I2C bus to implement the touch function of the terminal 100.
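On a 7-bit-addressed I2C bus, the controller's first byte after the START condition carries the target address in bits 7 to 1 and the read/write bit in bit 0 (1 = read, 0 = write). A minimal Python sketch; the address 0x38 below is an arbitrary example, not a value from this application, and `i2c_address_byte` is a hypothetical name:

```python
def i2c_address_byte(address: int, read: bool) -> int:
    """First byte sent after START on a 7-bit-addressed I2C bus:
    address in bits 7..1, R/W bit (1 = read, 0 = write) in bit 0."""
    if not 0 <= address <= 0x7F:
        raise ValueError("7-bit address expected")
    return (address << 1) | (1 if read else 0)


print(hex(i2c_address_byte(0x38, read=True)))  # 0x71
```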

The I2S interface can be used for audio communication. In some embodiments, the processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled with the audio module 170 through an I2S bus to realize communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit audio signals to the wireless communication module 160 through an I2S interface, so as to realize the function of answering calls through a Bluetooth headset.

The PCM interface can also be used for audio communication to sample, quantize and encode analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
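The three PCM steps named above (sample, quantize, encode) can be shown on a synthetic tone. A minimal Python sketch, assuming a signal normalised to [-1, 1] and signed 16-bit codes; `pcm_encode` and the 1 kHz / 8 kHz figures are illustrative choices:

```python
import math
from typing import Callable, List


def pcm_encode(
    signal: Callable[[float], float], num_samples: int, sample_rate_hz: float, bits: int = 16
) -> List[int]:
    """Sample an analog signal (function of time in seconds, range [-1, 1])
    and quantize each sample to a signed integer code of the given width."""
    max_code = 2 ** (bits - 1) - 1
    codes = []
    for i in range(num_samples):
        t = i / sample_rate_hz                  # sampling instant
        v = max(-1.0, min(1.0, signal(t)))      # clip to full scale
        codes.append(round(v * max_code))       # quantization and encoding
    return codes


def tone(t: float) -> float:
    return math.sin(2 * math.pi * 1000 * t)     # 1 kHz test tone


samples = pcm_encode(tone, num_samples=8, sample_rate_hz=8000)
print(samples)
```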

The UART interface is a universal serial data bus used for asynchronous communication. The bus can be bidirectional, and it converts the data to be transmitted between serial and parallel form. In some embodiments, the UART interface is generally used to connect the processor 110 and the wireless communication module 160. For example, the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function. In some embodiments, the audio module 170 may transmit audio signals to the wireless communication module 160 through the UART interface, so as to play music through a Bluetooth headset.
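The serial/parallel conversion a UART performs can be illustrated with the common 8N1 frame format: one start bit (0), eight data bits sent least-significant bit first, and one stop bit (1), with the line idling high. The application does not specify a frame format; 8N1 is assumed, and the function names are hypothetical:

```python
from typing import List


def uart_frame_8n1(byte: int) -> List[int]:
    """Serialize one byte: start bit 0, data bits LSB first, stop bit 1."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


def uart_deframe_8n1(bits: List[int]) -> int:
    """Check start/stop bits and reassemble the parallel byte."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


frame = uart_frame_8n1(0x41)  # ASCII 'A'
print(frame)  # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
```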

The MIPI interface can be used to connect the processor 110 with the display screen 194, the camera 193 and other peripheral devices. The MIPI interface includes camera serial interface (camera serial interface, CSI), display serial interface (display serial interface, DSI), etc. In some embodiments, the processor 110 and the camera 193 communicate through a CSI interface to implement the shooting function of the terminal 100. The processor 110 and the display screen 194 communicate through a DSI interface to realize the display function of the terminal 100.

The GPIO interface can be configured through software. The GPIO interface can be configured as a control signal or as a data signal. In some embodiments, the GPIO interface can be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and so on. GPIO interface can also be configured as I2C interface, I2S interface, UART interface, MIPI interface, etc.

The USB interface 130 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on. The USB interface 130 can be used to connect a charger to charge the terminal 100, and can also be used to transfer data between the terminal 100 and peripheral devices. It can also be used to connect headphones and play audio through the headphones. This interface can also be used to connect other electronic devices, such as AR devices.

It can be understood that the interface connection relationships between the modules illustrated in the embodiments of this application are merely schematic and do not constitute a structural limitation on the terminal 100. In other embodiments of this application, the terminal 100 may adopt interface connection modes different from those in the foregoing embodiments, or a combination of multiple interface connection modes.

The charging management module 140 is used to receive charging input from the charger. Among them, the charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive the charging input of the wired charger through the USB interface 130. In some embodiments of wireless charging, the charging management module 140 may receive the wireless charging input through the wireless charging coil of the terminal 100. While the charging management module 140 charges the battery 142, it can also supply power to the terminal 100 through the power management module 141.

The power management module 141 is used to connect the battery 142, the charging management module 140, and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the display screen 194, the camera 193, the wireless communication module 160, and so on. The power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle count, and battery health status (leakage, impedance). In some other embodiments, the power management module 141 may also be provided in the processor 110. In still other embodiments, the power management module 141 and the charging management module 140 may be provided in the same device.

The wireless communication function of the terminal 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.

The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the terminal 100 can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization. For example, antenna 1 can be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna can be used in combination with a tuning switch.

The mobile communication module 150 can provide wireless communication solutions applied to the terminal 100, including 2G/3G/4G/5G. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and the like. The mobile communication module 150 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 can also amplify a signal modulated by the modem processor and convert it into electromagnetic waves radiated through the antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be provided in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 and at least some modules of the processor 110 may be provided in the same device.

The modem processor may include a modulator and a demodulator. The modulator is used to modulate a low-frequency baseband signal to be sent into a medium- or high-frequency signal. The demodulator is used to demodulate a received electromagnetic wave signal into a low-frequency baseband signal, and then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, and the like), or displays an image or video through the display screen 194. In some embodiments, the modem processor may be an independent device. In other embodiments, the modem processor may be independent of the processor 110 and be provided in the same device as the mobile communication module 150 or other functional modules.
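As a textbook illustration of modulation and demodulation, and not the modem processor's actual scheme (which is not specified here), the following Python sketch mixes a baseband signal onto a cosine carrier and recovers it by coherent demodulation followed by a simple moving-average low-pass filter. The function names, carrier frequency, and sample rate are illustrative assumptions.

```python
import math


def modulate(baseband, carrier_hz, sample_rate_hz):
    """Shift a low-frequency baseband signal up to the carrier (double-sideband mixing)."""
    return [
        x * math.cos(2 * math.pi * carrier_hz * n / sample_rate_hz)
        for n, x in enumerate(baseband)
    ]


def demodulate(passband, carrier_hz, sample_rate_hz):
    """Mix back down with the same carrier, then low-pass with a moving average.

    Multiplying by 2*cos(...) yields the baseband plus a term at twice the
    carrier frequency; averaging over one carrier period cancels that term.
    """
    mixed = [
        2 * y * math.cos(2 * math.pi * carrier_hz * n / sample_rate_hz)
        for n, y in enumerate(passband)
    ]
    window = round(sample_rate_hz / carrier_hz)  # samples per carrier period
    return [
        sum(mixed[i:i + window]) / window
        for i in range(len(mixed) - window + 1)
    ]
```

For example, `demodulate(modulate([0.5] * 64, 8, 64), 8, 64)` returns values equal to 0.5 up to floating-point error, because each averaging window spans whole periods of the double-frequency term. Real modems use far more elaborate schemes and filters.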

The wireless communication module 160 can provide wireless communication solutions applied to the terminal 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite systems (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves through the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 can also receive a to-be-sent signal from the processor 110, perform frequency modulation and amplification on it, and convert it into electromagnetic waves radiated through the antenna 2.

In some embodiments, the antenna 1 of the terminal 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the terminal 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite-based augmentation systems (SBAS).

The terminal 100 implements a display function through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing and is connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.

The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the terminal 100 may include one or N display screens 194, where N is a positive integer greater than 1.

The terminal 100 can realize a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, and an application processor.

The ISP is used to process data fed back by the camera 193. For example, during photographing, the shutter opens, light passes through the lens to the photosensitive element of the camera, and the optical signal is converted into an electrical signal; the photosensitive element then transfers the electrical signal to the ISP, which processes it into an image visible to the naked eye. The ISP can also optimize image noise, brightness, and skin tone, as well as parameters of the shooting scene such as exposure and color temperature. In some embodiments, the ISP may be provided in the camera 193.

The camera 193 is used to capture still images or videos. An optical image of an object is generated through the lens and projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and transmits it to the ISP, which converts it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing, and the DSP converts it into an image signal in a standard format such as RGB or YUV. In some embodiments, the terminal 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
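The text names RGB and YUV as output formats without fixing a conversion matrix. As an assumption for illustration, the sketch below converts one 8-bit pixel using the full-range BT.601 coefficients familiar from JPEG (JFIF) YCbCr, in which U and V are offset by 128 so that neutral grey has zero chroma. The function name is hypothetical.

```python
def rgb_to_yuv(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YUV (BT.601 / JFIF YCbCr coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    v = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    # Round to the nearest integer and clamp to the valid 8-bit range.
    return tuple(min(255, max(0, round(c))) for c in (y, u, v))
```

White maps to (255, 128, 128) and black to (0, 128, 128): both have full or zero luma and no chroma, which is a quick sanity check on the coefficients.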

The digital signal processor is used to process digital signals; in addition to digital image signals, it can process other digital signals. For example, when the terminal 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the energy at that frequency point.
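One straightforward way to obtain the energy at a single frequency point is to evaluate one bin of the discrete Fourier transform directly instead of computing a full FFT. The sketch below is a generic illustration, not the terminal's DSP firmware; `bin_energy` is a hypothetical name. In practice the Goertzel algorithm computes the same single-bin quantity more cheaply.

```python
import cmath
import math


def bin_energy(samples, k):
    """Energy |X[k]|^2 of DFT bin k, evaluated directly for one frequency point."""
    n_total = len(samples)
    x_k = sum(
        x * cmath.exp(-2j * math.pi * k * n / n_total)
        for n, x in enumerate(samples)
    )
    return abs(x_k) ** 2
```

For a 64-sample cosine that completes exactly 5 cycles, bin 5 holds energy (64 / 2)^2 = 1024, while unrelated bins such as bin 10 are essentially zero.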

Video codecs are used to compress or decompress digital video. The terminal 100 may support one or more video codecs. In this way, the terminal 100 can play or record videos in multiple encoding formats, for example: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.

The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example, the transfer mode between neurons in the human brain, it processes input information quickly and can also learn continuously by itself. Applications involving intelligent cognition on the terminal 100, such as image recognition, face recognition, voice recognition, and text understanding, can be implemented through the NPU.

The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the terminal 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function, for example, saving files such as music and video in the external memory card.

The internal memory 121 may be used to store computer-executable program code, which includes instructions. The internal memory 121 may include a program storage area and a data storage area. The program storage area can store an operating system and an application required by at least one function (such as a sound playback function or an image playback function). The data storage area can store data (such as audio data and a phone book) created during use of the terminal 100. In addition, the internal memory 121 may include a high-speed random access memory and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS). The processor 110 executes various functional applications and data processing of the terminal 100 by running instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.

The terminal 100 can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, the application processor, and the like.

The audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal. The audio module 170 can also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110, or part of the functional modules of the audio module 170 may be provided in the processor 110.

The speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals. Through the speaker 170A, the user can listen to music or to a hands-free call on the terminal 100.

The receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals. When a call or voice message is answered on the terminal 100, the user can hear the voice by holding the receiver 170B close to the ear.

The microphone 170C, also called a "mic", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal into it. The terminal 100 may be provided with at least one microphone 170C. In other embodiments, the terminal 100 may be provided with two microphones 170C, which can implement noise reduction in addition to collecting sound signals. In other embodiments, the terminal 100 may be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording.

The earphone interface 170D is used to connect wired earphones. The earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.

The pressure sensor 180A is used to sense a pressure signal and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be provided on the display screen 194. There are many types of pressure sensors 180A, such as resistive, inductive, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates made of conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes, and the terminal 100 determines the pressure intensity from the change in capacitance. When a touch operation acts on the display screen 194, the terminal 100 detects the intensity of the touch operation through the pressure sensor 180A. The terminal 100 may also calculate the touched position from the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same position but with different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the short message application icon, an instruction to view short messages is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
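The pressure-dependent dispatch in the example above reduces to a single comparison. In this sketch the threshold value, its normalized scale, and the instruction names are hypothetical; the text only states that intensities below the first pressure threshold view short messages and intensities at or above it create a new one.

```python
# Hypothetical normalized threshold; real devices calibrate this per panel.
FIRST_PRESSURE_THRESHOLD = 0.5


def icon_press_action(intensity, threshold=FIRST_PRESSURE_THRESHOLD):
    """Map a touch on the short message icon to an instruction by its pressure."""
    if intensity < threshold:
        return "view_short_message"
    # Equal to or above the threshold counts as a firm press.
    return "create_new_short_message"
```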

The gyro sensor 180B may be used to determine the motion posture of the terminal 100. In some embodiments, the angular velocity of the terminal 100 around three axes (that is, the x, y, and z axes) can be determined through the gyro sensor 180B. The gyro sensor 180B can be used for image stabilization during photographing. For example, when the shutter is pressed, the gyro sensor 180B detects the angle at which the terminal 100 shakes, the distance the lens module needs to compensate is calculated from that angle, and the lens counteracts the shake of the terminal 100 through reverse movement to achieve stabilization. The gyro sensor 180B can also be used in navigation and motion-sensing game scenarios.
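Under a simple pinhole-camera approximation (an assumption, not a detail given in the text), rotating the camera by a small shake angle θ moves the image on the sensor by about f·tan θ, where f is the focal length, so the lens is driven the same distance in the opposite direction. The function name and the 4 mm focal length used in the test are illustrative; real optical image stabilization also applies calibrated gains and filtering.

```python
import math


def ois_compensation_mm(shake_angle_deg, focal_length_mm):
    """Lens shift (mm) that cancels an angular shake under a pinhole approximation.

    The image moves by roughly f * tan(theta), so the lens is moved by the
    same amount in the opposite direction (hence the negative sign).
    """
    image_shift = focal_length_mm * math.tan(math.radians(shake_angle_deg))
    return -image_shift
```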

The air pressure sensor 180C is used to measure air pressure. In some embodiments, the terminal 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
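The text does not say which pressure-to-altitude model the terminal uses. A common choice, assumed here, is the international barometric formula h = 44330 · (1 − (p/p₀)^(1/5.255)) with a standard sea-level pressure p₀ = 1013.25 hPa. Because p₀ varies with the weather, absolute altitude from this formula is only approximate, while relative changes are more reliable for assisting navigation.

```python
# Standard atmosphere sea-level pressure; local values vary with the weather.
SEA_LEVEL_PRESSURE_HPA = 1013.25


def altitude_m(pressure_hpa, sea_level_hpa=SEA_LEVEL_PRESSURE_HPA):
    """Approximate altitude in metres using the international barometric formula."""
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))
```

A reading of about 899 hPa corresponds to roughly 1,000 m above sea level under these assumptions.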

The magnetic sensor 180D includes a Hall sensor. The terminal 100 may use the magnetic sensor 180D to detect the opening and closing of a flip leather case. In some embodiments, when the terminal 100 is a flip phone, the terminal 100 can detect the opening and closing of the flip cover through the magnetic sensor 180D. Features such as automatic unlocking when the flip cover is opened can then be set according to the detected open or closed state of the leather case or the flip cover.

The acceleration sensor 180E can detect the magnitude of the acceleration of the terminal 100 in various directions (generally along three axes). When the terminal 100 is stationary, it can detect the magnitude and direction of gravity. It can also be used to identify the posture of the terminal 100 in applications such as landscape/portrait screen switching and pedometers.
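A minimal sketch of landscape/portrait switching from the gravity vector, assuming Android's sensor axis convention (x pointing right across the screen, y pointing up along it, z pointing out of the display). The function name and the rule of keeping the previous orientation when the device lies flat are illustrative choices, not details from the text.

```python
def screen_orientation(ax, ay, az, previous="portrait"):
    """Pick portrait or landscape from a stationary accelerometer reading (m/s^2)."""
    # Lying flat: gravity is mostly along z and neither screen axis is
    # reliable, so keep whatever orientation was shown before.
    if abs(az) > max(abs(ax), abs(ay)):
        return previous
    # Gravity mostly along the long (y) axis means the device is upright.
    return "portrait" if abs(ay) >= abs(ax) else "landscape"
```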

The distance sensor 180F is used to measure distance. The terminal 100 can measure distance by infrared or laser. In some embodiments, in a photographing scenario, the terminal 100 may use the distance sensor 180F to measure distance to achieve fast focusing.

The proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector such as a photodiode. The light emitting diode may be an infrared light emitting diode. The terminal 100 emits infrared light outward through the light emitting diode and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, the terminal 100 can determine that there is an object near it; when insufficient reflected light is detected, the terminal 100 can determine that there is no object nearby. The terminal 100 can use the proximity light sensor 180G to detect that the user is holding the terminal 100 close to the ear during a call, and then automatically turn off the screen to save power. The proximity light sensor 180G can also be used to automatically unlock and lock the screen in leather case mode and pocket mode.

The ambient light sensor 180L is used to sense the brightness of the ambient light. The terminal 100 can adjust the brightness of the display screen 194 automatically according to the perceived brightness of the ambient light. The ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures. The ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the terminal 100 is in a pocket to prevent accidental touch.

The fingerprint sensor 180H is used to collect fingerprints. The terminal 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, application lock access, fingerprint-triggered photographing, fingerprint-based call answering, and the like.

The temperature sensor 180J is used to detect temperature. In some embodiments, the terminal 100 executes a temperature processing strategy based on the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the terminal 100 reduces the performance of a processor located near the temperature sensor 180J to lower power consumption and implement thermal protection. In other embodiments, when the temperature is lower than another threshold, the terminal 100 heats the battery 142 to avoid an abnormal shutdown caused by low temperature. In some other embodiments, when the temperature is lower than another threshold, the terminal 100 boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by low temperature.
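The temperature processing strategy can be sketched as a set of threshold checks. All three threshold values below are hypothetical, and the sketch combines the three embodiments into one function purely for illustration; the text presents them as alternatives.

```python
HIGH_TEMP_C = 45.0        # hypothetical throttling threshold
LOW_TEMP_HEAT_C = 0.0     # hypothetical battery-heating threshold
LOW_TEMP_BOOST_C = -10.0  # hypothetical voltage-boost threshold


def thermal_actions(temp_c):
    """Return the protective actions triggered by a temperature reading (deg C)."""
    actions = []
    if temp_c > HIGH_TEMP_C:
        # Thermal protection: throttle the processor near the sensor.
        actions.append("throttle_nearby_processor")
    if temp_c < LOW_TEMP_HEAT_C:
        # Warm the battery to avoid a cold-induced shutdown.
        actions.append("heat_battery")
    if temp_c < LOW_TEMP_BOOST_C:
        # Very cold: also raise the battery output voltage.
        actions.append("boost_battery_voltage")
    return actions
```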

The touch sensor 180K is also called a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 together form a touchscreen, also called a "touch control screen". The touch sensor 180K is used to detect touch operations acting on or near it, and can pass a detected touch operation to the application processor to determine the type of touch event. Visual output related to the touch operation can be provided through the display screen 194. In other embodiments, the touch sensor 180K may instead be disposed on a surface of the terminal 100 at a position different from that of the display screen 194.

The bone conduction sensor 180M can acquire vibration signals. In some embodiments, the bone conduction sensor 180M can acquire the vibration signal of the bone that vibrates when a person speaks. The bone conduction sensor 180M can also be in contact with the human pulse and receive a blood pressure pulse signal. In some embodiments, the bone conduction sensor 180M may be provided in an earphone to form a bone conduction earphone. The audio module 170 can parse out a voice signal from the vibration signal of the vocal bone acquired by the bone conduction sensor 180M, to implement a voice function. The application processor can parse heart rate information from the blood pressure pulse signal acquired by the bone conduction sensor 180M, to implement heart rate detection.

The button 190 includes a power button, a volume button, and the like. The button 190 may be a mechanical button or a touch button. The terminal 100 can receive button input and generate button signal input related to user settings and function control of the terminal 100.

The motor 191 can generate vibration prompts. The motor 191 can be used for incoming call vibration prompts and for touch vibration feedback. For example, touch operations on different applications (such as photographing and audio playback) can correspond to different vibration feedback effects, and touch operations on different areas of the display screen 194 can also correspond to different vibration feedback effects of the motor 191. Different application scenarios (such as time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects. The touch vibration feedback effect can also be customized.

The indicator 192 may be an indicator light, which may be used to indicate the charging status, power change, or to indicate messages, missed calls, notifications, and so on.

The SIM card interface 195 is used to connect a SIM card. A SIM card can be inserted into or removed from the SIM card interface 195 to come into contact with or separate from the terminal 100. The terminal 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 can support Nano SIM cards, Micro SIM cards, standard SIM cards, and the like. Multiple cards can be inserted into the same SIM card interface 195 at the same time, and they may be of the same type or of different types. The SIM card interface 195 can also be compatible with different types of SIM cards and with external memory cards. The terminal 100 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the terminal 100 uses an eSIM, that is, an embedded SIM card, which is embedded in the terminal 100 and cannot be separated from it.

The software system of the terminal 100 may use a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. In the embodiments of the present application, an Android system with a layered architecture is used as an example to illustrate the software structure of the terminal 100.

FIG. 2C is a block diagram of the software structure of the terminal 100 according to an embodiment of the present application.

The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, from top to bottom: the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.

The application layer can include a series of application packages.

As shown in Figure 2C, the application package may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, etc.

The application framework layer provides application programming interfaces (application programming interface, API) and programming frameworks for applications in the application layer. The application framework layer includes some predefined functions.

As shown in Figure 2C, the application framework layer may include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and so on.

The window manager is used to manage window programs. The window manager can obtain the size of the display, determine whether there is a status bar, lock the screen, take a screenshot, etc.

The content provider is used to store and retrieve data and make these data accessible to applications. The data may include video, image, audio, phone calls made and received, browsing history and bookmarks, phone book, etc.

The view system includes visual controls, such as controls that display text and controls that display pictures. The view system can be used to build applications. The display interface can be composed of one or more views. For example, a display interface that includes a short message notification icon may include a view that displays text and a view that displays pictures.

The phone manager is used to provide the communication functions of the terminal 100, for example, management of the call status (including connecting, hanging up, and the like).

The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, etc.

The notification manager enables an application to display notification information in the status bar. It can be used to convey notification-type messages, which can disappear automatically after a short stay without user interaction; for example, the notification manager is used to notify download completion, give message reminders, and the like. The notification manager may also present notifications in the status bar at the top of the system in the form of a graph or scrolling text, such as notifications of applications running in the background, or present notifications on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt tone is played, the terminal vibrates, or an indicator light flashes.

Android Runtime includes core libraries and virtual machines. Android runtime is responsible for the scheduling and management of the Android system.

The core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the Android core library.

The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.

The system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), three-dimensional graphics processing library (for example: OpenGL ES), 2D graphics engine (for example: SGL), etc.

The surface manager is used to manage the display subsystem and provides a combination of 2D and 3D layers for multiple applications.

The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media library can support multiple audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.

The 3D graphics processing library is used to realize 3D graphics drawing, image rendering, synthesis, and layer processing.

The 2D graphics engine is a drawing engine for 2D drawing.

The kernel layer is the layer between hardware and software. It contains at least a display driver, a camera driver, an audio driver, and a sensor driver.

The following exemplarily describes the software and hardware workflow of the terminal 100 with reference to a photographing scenario.

When the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into an original input event (including information such as the touch coordinates and the timestamp of the touch operation), and the original input event is stored in the kernel layer. The application framework layer obtains the original input event from the kernel layer and identifies the control corresponding to the input event. For example, if the touch operation is a tap and the corresponding control is the camera application icon, the camera application calls an interface of the application framework layer to start itself, and then starts the camera driver by calling the kernel layer, so that the camera 193 captures a still image or video.
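The step in which the framework layer identifies the control corresponding to the input event is essentially a hit test. The following simplified sketch uses hypothetical class and control names; Android's real dispatch path (for example through `ViewGroup.dispatchTouchEvent`) is considerably more elaborate.

```python
from dataclasses import dataclass


@dataclass
class RawInputEvent:
    x: float            # touch coordinates in screen pixels
    y: float
    timestamp_ms: int   # time of the touch operation


@dataclass
class Control:
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x, y):
        """True if the point lies inside this control's bounding rectangle."""
        return self.left <= x < self.right and self.top <= y < self.bottom


def find_target_control(event, controls):
    """Return the topmost control under the touch; controls are ordered back to front."""
    for control in reversed(controls):
        if control.contains(event.x, event.y):
            return control
    return None
```

Iterating back to front means a small icon drawn on top of a full-screen background wins the hit test, which matches how stacked views are normally resolved.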

After purchasing the robot, the user can use the terminal to download an APP for controlling the robot. The APP associates the terminal with the robot and uploads the association relationship to the server, which stores it. A robot can be associated with one terminal or with two or more terminals. For example, if A purchases the robot R1, the server stores the association relationship between A's mobile phone and the robot R1, so that A can control the movement of the robot R1 by opening the APP on the mobile phone. For another example, someone buys a robot R2 and gives it to B. B's family has 3 members, and B associates all 3 members' mobile phones with the robot R2, so that any of the 3 family members can control the movement of the robot R2 by opening the APP on their own mobile phone. Because the association relationship between the terminal and the robot is established and stored, the robot can be controlled by any terminal associated with it.
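The association relationship stored by the server can be modelled as a mapping from each robot to the set of terminals allowed to control it. The class and identifiers below are hypothetical; a real server would persist this data and authenticate requests.

```python
from collections import defaultdict


class AssociationStore:
    """Server-side record of which terminals are associated with which robots."""

    def __init__(self):
        # robot_id -> set of terminal_ids allowed to control that robot
        self._terminals_by_robot = defaultdict(set)

    def associate(self, terminal_id, robot_id):
        self._terminals_by_robot[robot_id].add(terminal_id)

    def can_control(self, terminal_id, robot_id):
        # .get() avoids creating empty entries for unknown robots.
        return terminal_id in self._terminals_by_robot.get(robot_id, set())
```

Mirroring the examples above: A's phone is associated with R1, and all three phones in B's family are associated with R2.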

After purchasing the robot, the user places it in an indoor environment, for example, the user's own home, and turns it on. The robot then constructs a laser SLAM map and a visual SLAM map of the indoor environment.

The following describes simultaneous localization and mapping technology.

Simultaneous localization and mapping (SLAM) usually refers to a system that collects and computes data from various sensors on a robot or another carrier to estimate the carrier's own position and posture and to build a map of the scene. SLAM technology is critical to the mobility and interaction capabilities of robots and other agents, because it provides the basis of these capabilities: knowing where it is, knowing its surroundings, and knowing how to act autonomously next. It is widely used in fields such as autonomous driving, service robots, unmanned aerial vehicles, and AR/VR; arguably, every agent with some degree of mobility has some form of SLAM system.

Generally speaking, SLAM systems usually contain multiple sensors and multiple functional modules. According to the core functional modules, the current common robot SLAM system generally has two forms: SLAM based on lidar (laser SLAM) and SLAM based on vision (Visual SLAM, VSLAM or visual SLAM).

Laser SLAM evolved from early ranging-based positioning methods (such as ultrasonic and infrared single-point ranging). The emergence and popularization of lidar made measurement faster and more accurate and the information richer. The object information collected by lidar appears as a series of scattered points with accurate angle and distance information, called a point cloud. Generally, a laser SLAM system calculates the relative movement distance and posture change of the lidar by matching and comparing two point clouds captured at different times, thereby localizing the robot itself. Lidar ranging is relatively accurate, its error model is simple, it runs stably in all environments except those with direct strong sunlight, and point clouds are relatively easy to process. In addition, the point cloud itself contains direct geometric relationships, which makes robot path planning and navigation intuitive.
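The point-cloud matching step can be illustrated with a bare-bones 2D iterative closest point (ICP) loop: pair each point with its nearest neighbour in the other scan, solve for the best rigid rotation and translation in closed form, and repeat. This is a generic sketch of one well-known technique, assumed here for illustration; the text does not say which matching algorithm the robot's laser SLAM uses, and production systems add outlier rejection, spatial indexing, and convergence checks.

```python
import math


def rotate(p, theta):
    """Rotate 2D point p about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def transform_point(p, theta, t):
    """Apply the rigid transform: rotate by theta, then translate by t."""
    r = rotate(p, theta)
    return (r[0] + t[0], r[1] + t[1])


def best_rigid_transform(src, dst):
    """Closed-form least-squares (theta, t) mapping each src[i] onto dst[i]."""
    n = len(src)
    csx, csy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    cdx, cdy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    cross = dot = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - csx, sy - csy  # centred source point
        bx, by = dx - cdx, dy - cdy  # centred destination point
        cross += ax * by - ay * bx
        dot += ax * bx + ay * by
    theta = math.atan2(cross, dot)
    rc = rotate((csx, csy), theta)
    return theta, (cdx - rc[0], cdy - rc[1])


def icp(src, dst, iterations=10):
    """Estimate the pose change (theta, t) that aligns scan src with scan dst."""
    theta, t = 0.0, (0.0, 0.0)
    for _ in range(iterations):
        moved = [transform_point(p, theta, t) for p in src]
        # Nearest-neighbour correspondences (brute force for clarity).
        matched = [
            min(dst, key=lambda q: (q[0] - m[0]) ** 2 + (q[1] - m[1]) ** 2)
            for m in moved
        ]
        d_theta, d_t = best_rigid_transform(moved, matched)
        # Compose the increment with the running estimate.
        rt = rotate(t, d_theta)
        theta, t = theta + d_theta, (rt[0] + d_t[0], rt[1] + d_t[1])
    return theta, t
```

The returned rotation and translation correspond to the "relative movement distance and posture change" of the lidar between the two scans.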

The eyes are humans' main source of information about the outside world, and visual SLAM has similar characteristics: it can obtain massive, redundant texture information from the environment and has strong scene recognition capabilities. Early visual SLAM was based on filtering theory, and its nonlinear error model and huge computational load were obstacles to practical use. In recent years, with advances in sparse nonlinear optimization (bundle adjustment), camera technology, and computing performance, real-time visual SLAM has become practical.

Generally, a visual SLAM system consists of a front end and a back end. The front end calculates the robot's pose incrementally from visual data, which is fast. The back end is mainly responsible for two functions: first, when a loop occurs (that is, when the robot is determined to have returned to a place it visited before), detecting the loop and correcting the positions and postures of the two visits; second, when front-end tracking is lost, relocalizing the robot based on visual texture information. Simply put, the front end handles fast localization, and the back end handles slower map maintenance.

In the embodiment of the present application, after the robot constructs the laser SLAM map and the visual SLAM map of the indoor environment, it transmits the constructed visual SLAM map to the server, and the server receives and stores the visual SLAM map.

In the embodiment of this application, the user is in an indoor space and wants to control the movement of the robot through the terminal. For example, the robot can be a sweeping robot, and the user wants to use the terminal to make the sweeping robot move to a certain position in a certain room and clean it. For ease of description, this position is called the target point.

After the user's terminal has downloaded the APP for controlling the robot, when the user wants the robot to move to the target point, the user opens the APP on the terminal, and the APP calls the terminal's camera. Referring to Figure 3, the user can tilt the terminal at an appropriate angle so that the camera captures an image containing the target point, and the current image captured by the camera is displayed on the terminal interface. The user touches a point on the terminal interface with a finger to select the target point. The contact point between the user's finger and the terminal screen is called the touch point. The terminal takes a screenshot of the current interface to obtain the target image frame.

Referring to Figure 3, the touch point is a point on the screen of the terminal, and the target point is the position the user wants the robot to move to. When the user wants to change the position of the target point, for example from a point in front of the sofa to a point in front of the bookcase, the user can tilt or rotate the terminal to change the direction of its camera, or touch a different point on the screen. Based on the position of the touch point and the current tilt angle of the terminal, the target point in the image (that is, in the indoor three-dimensional space) that the user wants to select can be identified.
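The source does not say how the touch point and the tilt angle are turned into a point in the room. One common approach, offered here only as a hypothetical Python sketch, is to cast a ray through the touched pixel using a pinhole camera model and intersect it with the floor. The intrinsics `fx`, `fy`, `cx`, `cy`, the camera height and the pitch (tilt below horizontal) are all assumed inputs, and the result is the floor point relative to the spot directly below the camera, not a map coordinate.

```python
import math


def touch_to_floor(u, v, fx, fy, cx, cy, cam_height, pitch):
    """Intersect the ray through pixel (u, v) with the floor plane.

    Hypothetical pinhole model: image x points right, image y points down,
    the camera looks forward and is tilted down by `pitch` radians, and its
    optical centre is `cam_height` metres above the floor.
    Returns (forward, left) in metres, or None if the ray misses the floor.
    """
    a = (u - cx) / fx  # normalised image coordinate, rightwards
    b = (v - cy) / fy  # normalised image coordinate, downwards
    # Ray direction in a floor-aligned frame (forward, left, up).
    d_forward = math.cos(pitch) - b * math.sin(pitch)
    d_left = -a
    d_up = -(b * math.cos(pitch) + math.sin(pitch))
    if d_up >= 0:  # ray points level or upward: no floor intersection
        return None
    t = cam_height / -d_up  # ray length until it reaches the floor
    return t * d_forward, t * d_left
```

For example, touching the image centre while the phone is held 1 m above the floor and tilted 45 degrees down gives a floor point 1 m straight ahead.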

Please refer to Figure 4A, which shows the target image frame obtained by the mobile phone taking a screenshot of the current interface.

Refer to Figure 4B: point A is the touch point. The terminal uses a preset feature extraction algorithm to extract feature points in the target image frame (for example, points B1, B2, ..., B9 are extracted feature points), and determines the screen coordinates of each feature point. The preset feature extraction algorithm may be, for example, the SIFT algorithm. SIFT (Scale-Invariant Feature Transform) is a feature description method used in image processing; the description is scale-invariant and can be used to detect key points in an image.

Two concepts are introduced next: the first coordinate system and the second coordinate system.

The first coordinate system is the projected coordinate system of the coordinate system of the visual SLAM map on the horizontal plane, the second coordinate system is the coordinate system of the laser SLAM map, and both the first coordinate system and the second coordinate system are two-dimensional coordinate systems.

The terminal uploads the screen coordinates of the feature points and the target point, together with the feature data of the feature points and the target point, to the server; it can also upload the data of the target image frame. The server determines the relative position of the target point and each feature point according to their feature data, and finds the coordinates of these feature points in the visual SLAM map according to the feature data of each feature point. Based on the relative position of the target point and each feature point, it then obtains the coordinates of the target point in the visual SLAM map, that is, the first coordinates of the target point in the first coordinate system. The server converts the coordinates of the target point in the first coordinate system into the coordinates of the target point in the second coordinate system and sends them to the sweeping robot. The sweeping robot plans a movement path according to the coordinates of the target point and of its current position in the second coordinate system, moves to the target point along the path, and then cleans the area at the target point.

Referring to FIG. 5A and FIG. 5B, the robot control method provided by the embodiment of the present application involves the interaction between the terminal, the server, and the robot. The control method of the robot shown in FIG. 5A includes the following steps S1 to S10. The robot control method shown in FIG. 5B includes the following steps S1-S4, steps S5'-S7', and steps S8-S10. Figure 5A is described in detail below.

Step S1: The robot constructs a visual SLAM map and a laser SLAM map.

The process by which the robot constructs the laser SLAM map includes steps S400 to S403. A lidar is installed on the robot; it emits laser light and receives the laser light reflected by obstacles. Here, obstacles are objects placed indoors that are generally stationary.

Step S400: Predetermine the origin and coordinate system of the laser SLAM map to be created. The rule for selecting the origin may be set before the robot leaves the factory; for example, the origin may be the position of the robot's charging pile.

Step S401: While the robot moves indoors, the lidar continuously emits laser light, which is reflected by obstacle points before being received by the lidar. An obstacle point is the point on an obstacle that the laser strikes.

Step S402: The robot determines the direction of the obstacle point according to the orientation of the lidar, and determines the distance between the obstacle point and the robot according to the time elapsed between emitting the laser and receiving its reflection.

After the origin and coordinate system are determined, the robot can obtain the coordinates of its current location from its own sensors as it moves. In this way, the distance between the obstacle point and the robot's current position can be determined from the laser round-trip time, and the distance between the obstacle point and the origin can then also be determined.
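The ranging and positioning in steps S402 and this paragraph can be sketched in Python as below. This is a minimal illustration that assumes the lidar sits at the robot's centre and that the robot's pose (position and heading in the map frame) comes from its own sensors, as the text states; the function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def tof_distance(round_trip_s):
    """Distance to the obstacle point: the laser travels there and back."""
    return SPEED_OF_LIGHT * round_trip_s / 2.0


def obstacle_point(robot_x, robot_y, robot_heading, beam_angle, round_trip_s):
    """Map coordinates of the obstacle point hit by one laser beam.

    robot_heading: robot orientation in the map frame (radians).
    beam_angle: lidar beam direction relative to the robot (radians).
    """
    d = tof_distance(round_trip_s)
    direction = robot_heading + beam_angle
    return robot_x + d * math.cos(direction), robot_y + d * math.sin(direction)
```

The distance between the obstacle point and the origin then follows as `math.hypot(x, y)` of the returned coordinates.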

Step S403: By moving indoors and continuously emitting laser light, the robot obtains the direction of each obstacle point and the distance between each obstacle point and the origin, and then creates the laser SLAM map from these directions and distances.
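One simple way to store such a map is an occupancy grid. The Python sketch below only records obstacle hits in grid cells; it assumes the map origin lies at the corner of cell (row 0, column 0) and a fixed cell size. A real laser SLAM system would also trace the free space along each beam and refine the robot poses, which is not shown here.

```python
import math


def build_grid(points, resolution, width, height):
    """Mark obstacle points in a width x height occupancy grid.

    Assumes the map origin sits at the corner of cell (row 0, col 0), x grows
    with the column index and y with the row index; 1 = obstacle, 0 = not hit.
    """
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        col = math.floor(x / resolution)
        row = math.floor(y / resolution)
        if 0 <= row < height and 0 <= col < width:  # ignore points off the map
            grid[row][col] = 1
    return grid
```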

The process of the robot constructing the visual SLAM map includes steps S501 to S505. A camera is installed in the robot to take pictures of the surrounding environment.

Step S501: Predetermine the origin and coordinate system of the visual SLAM map to be created. The rule for selecting the origin may be set before the robot leaves the factory; for example, the origin may be the position of the robot's charging pile.

Step S502: The robot moves indoors and photographs the surrounding environment with the camera. After the origin and coordinate system are determined, the robot can obtain the coordinates of its current location from its own sensors as it moves.

Step S503: The robot extracts feature points from the captured images according to a preset feature extraction algorithm and obtains the positions of these feature points relative to the robot. The preset feature extraction algorithm may be, for example, the SIFT algorithm described above.

Step S504: The robot calculates the coordinates of the feature point in the coordinate system of the visual SLAM map according to the position of the feature point relative to the robot and the current coordinate of the robot in the coordinate system of the visual SLAM map.
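The calculation in step S504 amounts to composing the robot's pose with the feature point's relative position. The Python sketch below assumes planar robot motion, a camera at the robot's centre, and a relative position expressed as (forward, left, up) in metres; these conventions and the function name are assumptions, not taken from the source.

```python
import math


def feature_to_map(robot_x, robot_y, robot_yaw, rel_forward, rel_left, rel_up):
    """Visual SLAM map coordinates of a feature point.

    Rotates the robot-relative offset by the robot's heading (yaw) and adds
    the robot's current map position.
    """
    c, s = math.cos(robot_yaw), math.sin(robot_yaw)
    x = robot_x + rel_forward * c - rel_left * s
    y = robot_y + rel_forward * s + rel_left * c
    return x, y, rel_up  # height is unchanged by a rotation about the vertical axis
```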

Step S505: By moving indoors and continuously acquiring the coordinates of surrounding feature points in the coordinate system of the visual SLAM map, the robot builds the visual SLAM map from these coordinates.

Refer to Figure 6. In a spatial rectangular coordinate system, let the right thumb point in the positive direction of the x-axis and the index finger in the positive direction of the y-axis; if the middle finger then points in the positive direction of the z-axis, the coordinate system is called a right-handed coordinate system. The XY plane of the visual SLAM coordinate system describes the plane tangent to the Earth's surface at the origin. In the embodiment of this application, the Earth's surface can be regarded as the ground plane, so the XY plane describes the ground and coincides with the plane described by the coordinate system of the laser SLAM map. As mentioned above, the first coordinate system is the projection of the coordinate system of the visual SLAM map onto the horizontal plane. Therefore, for any point in three-dimensional space, the abscissa of the point in the visual SLAM map equals its abscissa in the first coordinate system, and the ordinate of the point in the visual SLAM map equals its ordinate in the first coordinate system. The second coordinate system is the coordinate system of the laser SLAM map; for any point in three-dimensional space, the abscissa and ordinate of the point in the laser SLAM map equal its abscissa and ordinate in the second coordinate system.

In the visual SLAM map, the positions of feature points are accumulated relative to the robot's position, so the map contains cumulative error. While the visual SLAM map is being constructed, the laser SLAM map can be used to correct it. Specifically, let a point (called point B for ease of description) have abscissa X₀ and ordinate Y₀ in the visual SLAM map, and let its abscissa and ordinate in the laser SLAM map be X₀′ and Y₀′ respectively; the coordinates of the point in the visual SLAM map can then be corrected according to its coordinates in the laser SLAM map.

As one possible way, the corrected abscissa of point B in the visual SLAM map is K1*X₀+K2*X₀′, and the corrected ordinate is K1*Y₀+K2*Y₀′, where K1 and K2 are constants whose values may or may not be equal. For example, K1 and K2 can both be set to 1/2, in which case the corrected abscissa of point B in the visual SLAM map is (X₀+X₀′)/2 and the corrected ordinate is (Y₀+Y₀′)/2. Alternatively, K1=0.25 and K2=0.75 can be used, in which case the corrected abscissa is 0.25*X₀+0.75*X₀′ and the corrected ordinate is 0.25*Y₀+0.75*Y₀′. As yet another option, K1=0 and K2=1 can be used, in which case the corrected abscissa of point B in the visual SLAM map is X₀′ and the corrected ordinate is Y₀′.
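The weighted correction above translates directly into code. In this Python sketch the function name and the default K1 = K2 = 1/2 are illustrative choices; the formula itself is the one given in the text.

```python
def correct_point(visual_xy, laser_xy, k1=0.5, k2=0.5):
    """Blend a point's visual SLAM coordinates with its laser SLAM coordinates.

    Returns (K1*X0 + K2*X0', K1*Y0 + K2*Y0') as in the text.
    """
    x0, y0 = visual_xy
    x0_laser, y0_laser = laser_xy
    return k1 * x0 + k2 * x0_laser, k1 * y0 + k2 * y0_laser
```

With visual coordinates (2, 4) and laser coordinates (4, 8), the three example settings give (3, 6), (3.5, 7) and (4, 8) respectively.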

Step S2: The robot uploads the visual SLAM map to the server.

Step S3: The robot uploads the relative position of the origin of the first coordinate system and the origin of the second coordinate system, and the angle between the axial direction of the first coordinate system and the axial direction of the second coordinate system to the server.

If the distance between the two origins and the direction of one origin relative to the other are known, the relative position of the two origins is known. Likewise, if the coordinates of the origin of the first coordinate system in the second coordinate system, or the coordinates of the origin of the second coordinate system in the first coordinate system, are known, the relative position of the two origins is also known. Accordingly, in step S3 the robot uploads one of the following to the server: the distance between the origin of the first coordinate system and the origin of the second coordinate system, together with the direction of one origin relative to the other; the coordinates of the origin of the first coordinate system in the second coordinate system; or the coordinates of the origin of the second coordinate system in the first coordinate system.

The origin of the first coordinate system and the origin of the second coordinate system may or may not coincide. When they do not coincide, the formula for calculating the second coordinates from the first coordinates is more complicated than when they coincide. When the origins of the two coordinate systems coincide, the robot only needs to upload the angle between the axial direction of the first coordinate system and the axial direction of the second coordinate system to the server.
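The source does not spell out the formula for non-coincident origins. A standard 2D rigid transform, sketched here in Python under the assumption that the relative position is expressed as the coordinates (a, b) of the first coordinate system's origin in the second coordinate system, rotates by θ and then shifts by (a, b). If the robot uploads a distance and a direction instead, and the direction is measured from the second system's horizontal axis, (a, b) follows from polar-to-Cartesian conversion. The function names are hypothetical.

```python
import math


def origin_offset(distance, direction):
    """Coordinates (a, b) of the first system's origin in the second system,
    given its distance from the second origin and its direction in radians,
    measured from the second system's horizontal axis (assumed convention)."""
    return distance * math.cos(direction), distance * math.sin(direction)


def first_to_second(x, y, theta, a=0.0, b=0.0):
    """Convert first coordinates (x, y) into second coordinates.

    Rotates by theta and then shifts by (a, b). With a = b = 0 this reduces
    to the coincident-origin case.
    """
    xp = x * math.cos(theta) - y * math.sin(theta) + a
    yp = x * math.sin(theta) + y * math.cos(theta) + b
    return xp, yp
```

A quick sanity check of the design: the first system's origin (0, 0) always lands at (a, b), whatever θ is.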

The angle between the axial direction of the first coordinate system and the axial direction of the second coordinate system specifically refers to the angle between the horizontal axis of the first coordinate system and the horizontal axis of the second coordinate system, or the angle between the vertical axis of the first coordinate system and the vertical axis of the second coordinate system. The included angle lies in the interval [0, π]. Referring to FIG. 7, xOy represents the first coordinate system, x'Oy' represents the second coordinate system, and θ is the axial angle between the two coordinate systems.

Step S4: The terminal displays a visual interface.

When the user wants to control the robot to move to a target point, for example, when the user wants to control the sweeping robot to move to the target point to clean, the user can open an APP installed on the terminal for controlling the robot. The APP calls the camera of the terminal to take an image of the current environment and provides a visual interface for the user. The user can see the indoor environment in the image taken by the camera, for example, the user can see the sofa, desk, chair, bookcase, etc. in the image taken by the camera. If the user wants the robot to move to a certain position in front of the sofa, the position in front of the sofa is the target point; if the user wants the robot to move to a certain position beside the bookcase, the position beside the bookcase is the target point.

Step S5: The terminal receives the user's touch operation, takes a screenshot of the interface to obtain the target image frame, extracts the feature points in the target image frame, and determines the feature data of each feature point and the relative position of the target point and each feature point. Generally speaking, the number of extracted feature points is greater than 2.

The user touches the screen of the terminal with a finger to select a target point; assume the touch point (the position where the finger contacts the screen) is point A (see FIG. 4B). Based on the position of point A and the current tilt angle of the terminal, the terminal identifies the target point in the image (that is, in the indoor three-dimensional space) that the user wants to select. The terminal takes a screenshot of the interface displayed on the screen to obtain the target image frame and extracts the feature points in it; Figure 4B shows some of the extracted feature points (for example, points B1, B2, ..., B9). The terminal then determines the relative position of the target point and each feature point.

It should be noted that the feature extraction algorithm used by the terminal in step S5 to extract feature points from the target image frame is the same as the one used by the robot when constructing the visual SLAM map.

Step S6: The terminal uploads the feature data of each feature point and the relative position of the target point and each feature point to the server.

Step S7: The server finds the coordinates of the feature points in the visual SLAM map according to the feature data of each feature point, and obtains the coordinates of the target point in the visual SLAM map based on the relative position of the target point and each feature point; that is, it acquires the coordinates of the target point in the first coordinate system. For ease of description, these coordinates are called the first coordinates.
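The source does not say how the server searches the visual SLAM map by feature data. A common choice, swapped in here as an assumption, is nearest-neighbour descriptor matching with Lowe's ratio test; the Python sketch below works on descriptors of any fixed length (SIFT descriptors, for instance, are 128-dimensional) and stores each map feature as a (descriptor, map coordinates) pair, a hypothetical layout.

```python
import math


def match_features(query_descs, map_features, ratio=0.8):
    """Match terminal feature descriptors to stored map features.

    map_features: list of (descriptor, map_xyz) pairs from the visual SLAM map.
    Returns (query_index, map_xyz) for each match whose best distance is
    clearly smaller than the second-best distance (Lowe's ratio test).
    """
    matches = []
    for i, q in enumerate(query_descs):
        best = second = math.inf
        best_xyz = None
        for desc, xyz in map_features:
            d = math.dist(q, desc)  # Euclidean distance between descriptors
            if d < best:
                best, second, best_xyz = d, best, xyz
            elif d < second:
                second = d
        if best_xyz is not None and best < ratio * second:
            matches.append((i, best_xyz))
    return matches
```

The ratio test discards ambiguous feature points whose two closest map candidates are almost equally similar, which matters indoors where textures such as floor tiles repeat.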

Specifically, the server receives the feature data of the feature points uploaded by the terminal, searches for the corresponding feature points in the visual SLAM map it stores, and determines the coordinates of those feature points in the visual SLAM map. These coordinates can be three-dimensional, consisting of an abscissa, an ordinate, and a vertical (height) coordinate. The abscissa is taken as the abscissa of the feature point in the first coordinate system, and the ordinate as its ordinate in the first coordinate system. The coordinates of the target point in the first coordinate system, that is, the first coordinates, are then determined from the coordinates of the feature points in the first coordinate system and the relative position of the feature points and the target point.
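The source leaves open how each "relative position" is represented and how several feature points are combined. The Python sketch below assumes, purely for illustration, that each relative position is a 2D offset (dx, dy) from the feature point to the target point already expressed in the first coordinate system; it drops the height (the projection onto the horizontal plane) and averages the per-feature estimates.

```python
def target_first_coordinates(matched_xyz, offsets):
    """Estimate the target point's first coordinates.

    matched_xyz: map coordinates (x, y, z) of the matched feature points.
    offsets: for each feature point, the target's offset (dx, dy) from it,
             assumed to be expressed in the first coordinate system.
    """
    if not matched_xyz or len(matched_xyz) != len(offsets):
        raise ValueError("need one offset per matched feature point")
    # Project each feature point onto the horizontal plane and add its offset.
    xs = [x + dx for (x, _, _), (dx, _) in zip(matched_xyz, offsets)]
    ys = [y + dy for (_, y, _), (_, dy) in zip(matched_xyz, offsets)]
    # Averaging several estimates reduces the effect of individual errors.
    return sum(xs) / len(xs), sum(ys) / len(ys)
```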

Step S8: The server converts the first coordinate into the second coordinate according to the relative position of the origin of the first coordinate system and the second coordinate system, and the angle between the axial direction of the first coordinate system and the axial direction of the second coordinate system.

Assume that in the first coordinate system the target point has abscissa X and ordinate Y, and in the second coordinate system it has abscissa X′ and ordinate Y′. Converting the first coordinates into the second coordinates is then the process of calculating (X′, Y′) from (X, Y).

Please refer to Figure 7: xOy represents the first coordinate system, x'Oy' represents the second coordinate system, and the angle between the horizontal axis of the first coordinate system and the horizontal axis of the second coordinate system is θ. Assume the origins of the first and second coordinate systems coincide, and let point A have coordinates (X, Y) in the first coordinate system and (X′, Y′) in the second coordinate system. Let r be the distance between point A and the common origin O of the two coordinate systems, and β the angle between line segment OA and the horizontal axis of the first coordinate system. The following formulas hold:

X=r*cos(β) (1)

Y=r*sin(β) (2)

X′=r*cos(θ+β) (3)

Y′=r*sin(θ+β) (4)

Expanding formula (3) gives formula (5):

X′=r*[cos(θ)*cos(β)–sin(θ)*sin(β)] (5)

Expanding formula (4) gives formula (6):

Y′=r*[sin(θ)*cos(β)+cos(θ)*sin(β)] (6)

Substituting formulas (1) and (2) into formulas (5) and (6) gives the following results

X′=X*cos(θ)–Y*sin(θ) (7)

Y′=X*sin(θ)+Y*cos(θ) (8)

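Converted to matrix form, formulas (7) and (8) become formula (9):

$$\begin{bmatrix} X' \\ Y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix} \qquad (9)$$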

The coordinates (X', Y') of the target point in the second coordinate system can be calculated according to formulas (7) and (8), or equivalently formula (9).

Calculating the second coordinates with formulas (7) and (8), or with formula (9), requires the value of θ, the angle between the horizontal axis of the first coordinate system and the horizontal axis of the second coordinate system. The positive direction of the horizontal axis of the first coordinate system is determined by fixed parameters of the camera, and the positive direction of the horizontal axis of the second coordinate system is determined by fixed parameters of the lidar. Once the robot is manufactured, θ is therefore determined and is a fixed constant. θ can be deduced from the coordinates of one and the same point in the first coordinate system and in the second coordinate system.

The method of deducing θ is described in detail below. Select a point (called point C for ease of description), move the robot to point C, and read the robot's coordinates in the first coordinate system and in the second coordinate system. Assume they are (X₁, Y₁) and (X₁′, Y₁′) respectively; that is, point C has coordinates (X₁, Y₁) in the first coordinate system and (X₁′, Y₁′) in the second coordinate system.

According to formulas (7) and (8) above:

X₁′=X₁*cos(θ)–Y₁*sin(θ) (10)

Y₁′=X₁*sin(θ)+Y₁*cos(θ) (11)

Multiplying both sides of formula (10) by X₁ gives formula (12):

X₁X₁′=X₁²*cos(θ)–X₁Y₁*sin(θ) (12)

Multiplying both sides of formula (11) by Y₁ gives formula (13):

Y₁Y₁′=Y₁X₁*sin(θ)+Y₁²*cos(θ) (13)

Adding formulas (12) and (13) side by side gives formula (14):

X₁X₁′+Y₁Y₁′=X₁²*cos(θ)+Y₁²*cos(θ)=(X₁²+Y₁²)*cos(θ) (14)

Therefore,

cos(θ)=(X₁X₁′+Y₁Y₁′)/(X₁²+Y₁²) (15)

Once cos(θ) is known, θ and sin(θ) can be obtained from the trigonometric relationships; since θ lies in [0, π], sin(θ)=√(1–cos²(θ)). Point C must not coincide with the origin, because formula (15) requires X₁²+Y₁²≠0.
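A related derivation also yields sin(θ) directly: multiplying formula (10) by –Y₁ and formula (11) by X₁ and adding gives X₁Y₁′–Y₁X₁′=(X₁²+Y₁²)*sin(θ). Combining both expressions with `atan2` gives a signed angle and avoids the loss of precision of `acos` near 0 and π. The Python sketch below implements formula (15) and this `atan2` variant; summing over several points, as a simple least-squares-style extension for coincident origins, is an assumption beyond the source, which uses a single point C.

```python
import math


def cos_theta_from_point(x1, y1, x1p, y1p):
    """Formula (15): cos(theta) from one point seen in both coordinate systems."""
    return (x1 * x1p + y1 * y1p) / (x1 * x1 + y1 * y1)


def estimate_theta(pairs):
    """Signed angle theta from one or more ((X1, Y1), (X1', Y1')) pairs.

    Uses X1*X1' + Y1*Y1' for the cosine term and X1*Y1' - Y1*X1' for the sine
    term; the common positive factor X1^2 + Y1^2 cancels inside atan2.
    """
    s_cos = sum(x * xp + y * yp for (x, y), (xp, yp) in pairs)
    s_sin = sum(x * yp - y * xp for (x, y), (xp, yp) in pairs)
    return math.atan2(s_sin, s_cos)
```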

It should be noted that the visual SLAM coordinate system has cumulative error, and since the first coordinate system is the projection of the visual SLAM coordinate system onto the horizontal plane, the first coordinate system has cumulative error as well: the farther a point is from the origin, the less accurate its coordinates. Therefore, by selecting a position close to the origin (for example, with point C within 30 cm of the origin), the robot's coordinates in the first coordinate system are more accurate, and so is the θ calculated from them.

Step S9: The server sends the second coordinates to the robot. The second coordinate is the coordinate of the target point in the laser SLAM map.

Step S10: the robot determines the movement path according to the second coordinates, and moves to the target point according to the movement path.

Specifically, the robot knows the coordinates of its current position in the second coordinate system, and also knows the coordinates of the target point in the second coordinate system, determines the motion path according to the laser SLAM map, and moves to the target point according to the motion path.
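The source does not name a path planning algorithm. As a stand-in, the Python sketch below runs breadth-first search on an occupancy grid derived from the laser SLAM map (0 = free, 1 = obstacle), which finds a shortest 4-connected path; real robots often use A* or Dijkstra with clearance costs instead. Converting second coordinates into grid cells by dividing by the map resolution is assumed.

```python
from collections import deque


def plan_path(grid, start, goal):
    """Shortest 4-connected path on an occupancy grid (0 = free, 1 = obstacle).

    start and goal are (row, col) cells. Returns the list of cells from start
    to goal, or None if the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk back from the goal to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] == 0 and nxt not in came_from):
                came_from[nxt] = cell
                queue.append(nxt)
    return None
```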

Steps S1-S4 in the robot control method shown in FIG. 5B are the same as steps S1-S4 shown in FIG. 5A. After step S4, the method further includes the following steps S5'-S7'.

Step S5': The terminal receives the user's touch operation, and takes a screenshot of the interface to obtain the target image frame.

The user touches the screen of the terminal with a finger to select a target point; assume the touch point (the position where the finger contacts the screen) is point A (see FIG. 4B). Based on the position of point A and the current tilt angle of the terminal, the terminal identifies the target point in the image (that is, in the indoor three-dimensional space) that the user wants to select. The terminal then takes a screenshot of the interface displayed on the screen to obtain the target image frame.

Step S6': The terminal uploads the target image frame to the server.

Step S7': The server extracts feature points from the target image frame, determines the feature data of each feature point and the relative position of the target point and each feature point, finds the coordinates of these feature points in the visual SLAM map based on their feature data, and obtains the coordinates of the target point in the visual SLAM map based on the relative position of the target point and each feature point; that is, it acquires the coordinates of the target point in the first coordinate system. For ease of description, these coordinates are called the first coordinates.

Generally speaking, the number of extracted feature points is greater than 2. FIG. 4B shows some extracted feature points (for example, point B1, point B2, ..., point B9).

After step S7', the robot control method shown in FIG. 5B further includes steps S8-S10. The steps S8-S10 included in the robot control method shown in FIG. 5B are the same as the steps S8-S10 shown in FIG. 5A.

In the process shown in FIG. 5A, the terminal extracts the feature points from the target image frame and uploads the feature data of the feature points to the server instead of uploading the target image frame. The advantage of this approach is that the server cannot reconstruct the target image frame from the feature point data, which effectively protects user privacy.

In the process shown in FIG. 5B, the terminal uploads the target image frame to the server, and the server extracts the feature points in the target image frame. The advantage of this method is that the calculation amount of the terminal is effectively reduced, and the occupation of the calculation resources of the terminal is reduced, so that the configuration requirements of the terminal are lower. Moreover, since the computing power of the server is much stronger than that of the terminal, the computing speed in this way is faster, so the robot responds faster.

Please refer to FIG. 8, which shows a flowchart of a robot control method provided by an embodiment of this application. The embodiment shown in FIG. 8 takes a mobile phone and a sweeping robot as examples; the control method of this application can also be applied to terminals other than mobile phones and to robots other than sweeping robots, for example, a tablet computer controlling a mopping robot or another movable device.

The process of the method shown in Figure 8 can be divided into three parts:

The first part: after purchasing the cleaning robot and before using it to clean, the user needs to download an APP for controlling the cleaning robot on the mobile phone and associate the mobile phone with the cleaning robot. This part includes steps S101 to S103.

Step S101: The user opens the APP installed on the mobile phone for controlling the sweeping robot.

Step S102: The user associates the mobile phone with the sweeping robot on the APP, and the mobile phone uploads the association relationship to the server. The purpose of associating the mobile phone with the cleaning robot is to make the cleaning robot be controlled by the specific terminal associated with it.

Step S103: The server receives the association relationship between the mobile phone and the cleaning robot uploaded by the mobile phone, and stores the association relationship.

The second part: the sweeping robot constructs a laser SLAM map and a visual SLAM map of the indoor environment, and the server stores the visual SLAM map uploaded by the sweeping robot. This part includes steps S201 to S206.

Step S201: The sweeping robot separately establishes a laser SLAM map and a visual SLAM map of the indoor environment.

Step S202: The sweeping robot uploads the visual SLAM map to the server.

Step S203: The cleaning robot uploads to the server the coordinates (X₁, Y₁) in the first coordinate system and the coordinates (X₁′, Y₁′) in the second coordinate system of the same point in the indoor environment.

Step S204: The server receives the visual SLAM map uploaded by the cleaning robot, and associates the visual SLAM map with the cleaning robot.

Step S205: The server receives the coordinates (X₁, Y₁) in the first coordinate system and the coordinates (X₁′, Y₁′) in the second coordinate system of the same point, uploaded by the cleaning robot. The first coordinate system is the projection of the coordinate system of the visual SLAM map onto the horizontal plane, and the second coordinate system is the coordinate system of the laser SLAM map.

Step S206: From the coordinates (X₁, Y₁) of the same point in the first coordinate system and its coordinates (X₁′, Y₁′) in the second coordinate system, the server calculates the included angle θ between the horizontal axis of the first coordinate system and the horizontal axis of the second coordinate system, and associates θ with the sweeping robot. As mentioned above, the positive direction of the horizontal axis of the first coordinate system is determined by fixed parameters of the camera, and the positive direction of the horizontal axis of the second coordinate system is determined by fixed parameters of the lidar, so once the robot is manufactured, θ is a fixed constant. Because different types of robots generally have different fixed camera and lidar parameters, the included angle θ may differ between robot types.

The purpose of associating the included angle θ with the cleaning robot is to let the server know which included angle θ to use when converting the first coordinates into the second coordinates. For example, as shown in Table 1, mobile phone P1 is associated with cleaning robot R1, mobile phone P2 with cleaning robot R2, and mobile phone P3 with cleaning robot R3; as shown in Table 2, cleaning robot R1 is associated with included angle θ1, cleaning robot R2 with included angle θ2, and cleaning robot R3 with included angle θ3. The server stores these association relationships. When the server receives the screen coordinates of the feature points and of the target point uploaded by mobile phone P1, it searches its stored association relationships, learns that mobile phone P1 is associated with cleaning robot R1 and that cleaning robot R1 is associated with included angle θ1, and substitutes θ1 into formulas (7) and (8), or formula (9), when calculating the second coordinates. Similarly, when the server receives the screen coordinates of the feature points and of the target point uploaded by mobile phone P3, it learns that mobile phone P3 is associated with cleaning robot R3 and that cleaning robot R3 is associated with included angle θ3, and substitutes θ3 into formulas (7) and (8), or formula (9), when calculating the second coordinates.

Table 1

Cell phone	Sweeping robot
P1	R1
P2	R2
P3	R3

Table 2

Sweeping robot	Included angle
R1	θ1
R2	θ2
R3	θ3
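The two-level lookup described above (phone to robot, robot to included angle) can be sketched with two dictionaries. This Python class is a hypothetical illustration of the server-side tables, not the source's implementation; it assumes θ is stored in radians and applies formulas (7) and (8) once the angle is found.

```python
import math


class AssociationStore:
    """Hypothetical server-side association tables, as in Table 1 and Table 2.

    Phone IDs map to robot IDs (step S103); robot IDs map to their included
    angle theta in radians (step S206).
    """

    def __init__(self):
        self.phone_to_robot = {}
        self.robot_to_theta = {}

    def associate_phone(self, phone_id, robot_id):
        self.phone_to_robot[phone_id] = robot_id

    def associate_theta(self, robot_id, theta):
        self.robot_to_theta[robot_id] = theta

    def to_second(self, phone_id, x, y):
        """Look up the robot and theta for a phone, then apply formulas (7) and (8)."""
        robot_id = self.phone_to_robot[phone_id]  # KeyError if the phone is not associated
        theta = self.robot_to_theta[robot_id]
        xp = x * math.cos(theta) - y * math.sin(theta)
        yp = x * math.sin(theta) + y * math.cos(theta)
        return robot_id, (xp, yp)
```

Returning the robot ID alongside the second coordinates also tells the server which robot to send them to in step S9.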

The third part: the user uses the mobile phone to control the sweeping robot to move to the target point for cleaning. This part includes steps S301 to S316.

Step S301: After the robot has been associated with the terminal, when the user wants to use the cleaning robot, for example to have it move to the target point and clean the location of the target point, the user opens the APP on the mobile phone, and the APP calls the mobile phone's camera.

Step S302: The camera of the mobile phone collects an image of the current environment, and displays the image on the screen of the mobile phone.

Step S303: The user selects the target point by touching a point on the mobile phone screen. The touched point is referred to as the touch point.

Referring to Figure 3, the user can tilt the mobile phone to a suitable angle so that the camera captures an image containing the target point. The image currently captured by the camera is displayed on the phone interface, and the user selects the target point by touching a point on that interface. The point of contact between the user's finger and the mobile phone screen is called the touch point. Using the position of the touch point and the current tilt angle of the mobile phone, the mobile phone can determine the target point in the actual scene that the user wants to select.

Step S304: The mobile phone takes a screenshot of the current screen interface, and the obtained screenshot is the aforementioned target image frame.

Step S305: The mobile phone extracts N feature points from the target image frame, where N is a natural number greater than 1.

Step S306: The mobile phone determines the feature data of each of the N feature points and the position of the target point relative to each of the N feature points.

Step S307: The mobile phone uploads the feature data of the N feature points and the relative positions of the target point and the N feature points to the server.

Step S308: The server receives the feature data of the N feature points and the relative positions of the target point and the N feature points uploaded by the mobile phone.

It should be noted that the mobile phone may instead upload only the target image frame to the server. In that case, the server extracts the feature data of the feature points and calculates the relative positions between the target point and the feature points.

Step S309: The server finds the coordinates of the N feature points from the visual SLAM map according to the feature data of the N feature points. The coordinates of the N feature points refer to the coordinates of the N feature points in the first coordinate system.
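
The source does not state how the feature data is matched against the visual SLAM map. One common approach, assumed here for illustration only, is to compare binary descriptors (such as ORB-style bit strings) by Hamming distance and take the nearest map landmark. In the sketch below, the landmark layout, the function names, and the `max_distance` threshold are all hypothetical.

```python
from typing import Optional

# Hypothetical map layout: each visual-SLAM landmark stores a binary descriptor
# (packed into an int here) and its (x, y) coordinate in the first coordinate system.
Landmark = tuple[int, tuple[float, float]]


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed binary descriptors."""
    return bin(a ^ b).count("1")


def lookup_first_coords(descriptor: int, slam_map: list[Landmark],
                        max_distance: int = 10) -> Optional[tuple[float, float]]:
    """Return the first-system coordinates of the closest landmark by brute force.

    Returns None if the map is empty or no landmark lies within max_distance bits,
    which is treated as "feature point not found in the map".
    """
    best = min(slam_map, key=lambda lm: hamming(descriptor, lm[0]), default=None)
    if best is None or hamming(descriptor, best[0]) > max_distance:
        return None
    return best[1]
```

A brute-force scan keeps the sketch short. For a large map, an indexed nearest-neighbour search would be the more likely choice.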

Step S310: The server determines the coordinates (X, Y) of the target point in the first coordinate system according to the coordinates of the N feature points and the relative positions of the target point and the N feature points.
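
Step S310 does not specify the form of the "relative position". Suppose, for illustration only, that each relative position has already been expressed as a 2-D offset (dx, dy) from a feature point to the target point in the first coordinate system. Each feature point then yields one estimate of (X, Y), and averaging the estimates damps per-feature error. The function name and the averaging strategy are assumptions, not taken from the source.

```python
def target_first_coords(feature_coords: list[tuple[float, float]],
                        offsets: list[tuple[float, float]]) -> tuple[float, float]:
    """Estimate the target point's (X, Y) in the first coordinate system.

    Assumes offsets[i] = target - feature_i, already expressed in the first
    coordinate system. Each feature point gives one estimate; the mean is returned.
    """
    if not feature_coords or len(feature_coords) != len(offsets):
        raise ValueError("need at least one feature point and one offset per point")
    estimates = [(fx + dx, fy + dy)
                 for (fx, fy), (dx, dy) in zip(feature_coords, offsets)]
    n = len(estimates)
    return (sum(x for x, _ in estimates) / n,
            sum(y for _, y in estimates) / n)
```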

Step S311: The server calculates the coordinates (X', Y') of the target point in the second coordinate system, that is, the second coordinate, according to the formulas X'=X*cos(θ)-Y*sin(θ) and Y'=X*sin(θ)+Y*cos(θ).
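
The rotation in step S311 can be written in a few lines of code. The sketch below implements the formula with θ in radians. The optional `origin_offset` parameter is an assumption covering the non-coincident-origin case mentioned in the claims. It is taken here to be the first system's origin expressed in second-system coordinates. With its default of (0, 0), the function reduces exactly to the formula of step S311. The function name is hypothetical.

```python
import math


def first_to_second(x: float, y: float, theta: float,
                    origin_offset: tuple[float, float] = (0.0, 0.0)) -> tuple[float, float]:
    """Convert a first-system coordinate (X, Y) to a second-system coordinate (X', Y').

    Applies X' = X*cos(θ) - Y*sin(θ) and Y' = X*sin(θ) + Y*cos(θ), with θ in radians.
    origin_offset (assumption, not from the source) is added after the rotation; it is
    taken to be the first system's origin expressed in second-system coordinates.
    Leaving it at (0, 0) gives the coincident-origin case of step S311.
    """
    c, s = math.cos(theta), math.sin(theta)
    tx, ty = origin_offset
    return x * c - y * s + tx, x * s + y * c + ty
```

For example, `first_to_second(1.0, 0.0, math.pi / 2)` returns a point numerically close to (0, 1).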

Step S312: The server sends the second coordinates (X', Y') to the cleaning robot.

Step S313: The cleaning robot receives the second coordinates (X', Y') sent by the server.

Step S314: The sweeping robot determines the coordinates of its current position in the second coordinate system.

Step S315: The cleaning robot plans a movement path according to the second coordinates (X′, Y′) and the coordinates of its current position in the second coordinate system.

Step S316: The cleaning robot moves to the target point along the planned movement path.
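
The source does not name the planner used in steps S315 and S316. As a stand-in, the sketch below runs breadth-first search on an occupancy grid, which yields a path with the fewest grid steps for 4-connected moves. It assumes the laser SLAM map has been rasterized into a grid (0 = free, 1 = occupied), and that both the second coordinates (X′, Y′) and the robot's current position have already been converted into grid cells. The function name and grid encoding are hypothetical. A* or a cost-aware planner are common alternatives.

```python
from collections import deque
from typing import Optional

Cell = tuple[int, int]  # (row, column) in the hypothetical occupancy grid


def plan_path(grid: list[list[int]], start: Cell, goal: Cell) -> Optional[list[Cell]]:
    """Fewest-step 4-connected path on an occupancy grid (0 = free, 1 = occupied).

    Returns the cells from start to goal inclusive, or None if the goal is unreachable.
    """
    if not grid or not grid[0]:
        return None
    rows, cols = len(grid), len(grid[0])

    def free(cell: Cell) -> bool:
        r, c = cell
        return 0 <= r < rows and 0 <= c < cols and grid[r][c] == 0

    if not (free(start) and free(goal)):
        return None

    parent: dict[Cell, Optional[Cell]] = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            # Walk parent links back to the start, then reverse.
            path: list[Cell] = []
            node: Optional[Cell] = cur
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if free(nxt) and nxt not in parent:
                parent[nxt] = cur
                queue.append(nxt)
    return None
```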

Optionally, the included angle θ between the horizontal axis of the first coordinate system and the horizontal axis of the second coordinate system is calculated from the coordinates (X₁, Y₁) of a single point in the first coordinate system and the coordinates (X₁′, Y₁′) of the same point in the second coordinate system. This step can also be performed by the cleaning robot. After calculating θ, the cleaning robot uploads it to the server, which can then directly associate θ with the cleaning robot and use it when converting the first coordinate into the second coordinate.
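
The angle θ can be computed from a single point pair using the arccos formula given in the claims, θ = arccos[(X₁X₁′ + Y₁Y₁′)/(X₁² + Y₁²)]. That formula only holds when the origins coincide and both systems use the same scale, so that the point lies at the same distance from the origin in each. Because arccos returns values in [0, π], it cannot distinguish a clockwise rotation from a counterclockwise one. The second function is an added variant, not from the source, that uses atan2 of the cross and dot products to recover the sign. Under the rotation convention of step S311, it returns θ itself. Both function names are hypothetical.

```python
import math


def included_angle(p_first: tuple[float, float],
                   p_second: tuple[float, float]) -> float:
    """θ = arccos[(X1*X1' + Y1*Y1') / (X1^2 + Y1^2)], in radians within [0, π].

    Assumes coincident origins and equal scale. The ratio is clamped to [-1, 1]
    so that small measurement noise cannot push it outside arccos's domain.
    """
    x1, y1 = p_first
    x1p, y1p = p_second
    norm_sq = x1 * x1 + y1 * y1
    if norm_sq == 0.0:
        raise ValueError("the reference point must not be the common origin")
    ratio = (x1 * x1p + y1 * y1p) / norm_sq
    return math.acos(max(-1.0, min(1.0, ratio)))


def signed_included_angle(p_first: tuple[float, float],
                          p_second: tuple[float, float]) -> float:
    """Variant (not from the source): signed θ in (-π, π] via atan2(cross, dot).

    For p' = R(θ)p with R as in step S311, cross = |p|^2 sin θ and dot = |p|^2 cos θ.
    """
    x1, y1 = p_first
    x1p, y1p = p_second
    return math.atan2(x1 * y1p - y1 * x1p, x1 * x1p + y1 * y1p)
```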

Referring to FIG. 9, a robot control method provided by an embodiment of the present application involves interaction between a mobile phone, a server, and a cleaning robot, and includes the following steps S901 to S907.

S901: The sweeping robot collects environmental information and constructs a laser SLAM map and a visual SLAM map.

S902: The sweeping robot uploads the visual SLAM map to the server.

S903: The mobile phone takes a screenshot of the current interface to obtain a target image frame, and extracts feature points in the target image frame.

Referring to Figure 3, the user can tilt the mobile phone to a suitable angle so that the camera captures an image containing the target point. The image currently captured by the camera is displayed on the phone interface, and the user selects the target point by touching a point on that interface. The point of contact between the user's finger and the mobile phone screen is called the touch point. Using the position of the touch point and the current tilt angle of the mobile phone, the mobile phone can determine the target point in the actual scene that the user wants to select. The mobile phone takes a screenshot of the current interface to obtain the target image frame. It then extracts the feature points in the target image frame and determines the feature data of each feature point and the relative position of the target point and each feature point.

S904: The mobile phone uploads the feature data of each feature point and the relative position of the target point and each feature point to the server.

S905: The server calculates the coordinates of the target point in the first coordinate system, and converts the coordinates of the target point in the first coordinate system into the coordinates of the target point in the second coordinate system.

Specifically, the server looks up the coordinates of each feature point in the visual SLAM map according to that feature point's feature data; these are the coordinates of the feature points in the first coordinate system. The server then calculates the coordinates of the target point in the first coordinate system from the coordinates of each feature point in the first coordinate system and the relative position of the target point and each feature point. Finally, it converts the coordinates of the target point in the first coordinate system into the coordinates of the target point in the second coordinate system.

S906: The server delivers the coordinates of the target point in the second coordinate system to the cleaning robot.

S907: The cleaning robot plans a movement path according to the coordinates of the target point in the second coordinate system and the coordinates of its current position in the second coordinate system, and autonomously moves to the target point.

It can be understood that some or all of the steps or operations in the foregoing embodiments are merely examples. The embodiments of this application may also perform other operations or variations of the various operations. In addition, the steps may be performed in an order different from that presented in the foregoing embodiments, and not all of the operations in the foregoing embodiments need to be performed.

An embodiment of this application further provides a computer-readable storage medium storing a computer program. When the computer program runs on a computer, the computer performs the method described in the foregoing embodiments.

In addition, an embodiment of this application further provides a computer program product including a computer program. When the computer program runs on a computer, the computer performs the method described in the foregoing embodiments.

The foregoing embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When software is used, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in this application are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, over coaxial cable, optical fiber, or a digital subscriber line) or in a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state disk).

Claims

1. A method for controlling a robot, characterized in that the method is applied to a system comprising a robot, a terminal, and a server, and the method comprises:

The robot creates a visual SLAM map through its own camera, and creates a laser SLAM map through its own lidar;

The robot uploads the coordinate conversion relationship between a first coordinate system and a second coordinate system to the server, wherein the first coordinate system is the projection of the coordinate system of the visual SLAM map onto the horizontal plane, and the second coordinate system is the coordinate system of the laser SLAM map;

The robot uploads the visual SLAM map to the server;

The terminal captures a screenshot of the current interface to obtain a target image frame, extracts feature data of a target point, and uploads the target image frame and the feature data of the target point to the server;

The server receives the target image frame and the feature data of the target point, and determines a first coordinate of the target point in the first coordinate system according to the target image frame, the feature data of the target point, and the visual SLAM map;

The server converts the first coordinates into second coordinates, where the second coordinates are coordinates of the target point in the second coordinate system;

The server sends the second coordinates to the robot;

The robot receives the second coordinates, and determines a movement path according to the second coordinates and the coordinates of the current position of the robot in the second coordinate system;

The robot moves to the target point according to the movement path.
2. The method according to claim 1, wherein if the origin of the first coordinate system does not coincide with the origin of the second coordinate system, the coordinate conversion relationship includes the angle between the axial direction of the first coordinate system and the axial direction of the second coordinate system, and the relative position of the origin of the first coordinate system and the origin of the second coordinate system.
3. The method according to claim 1, wherein if the origin of the first coordinate system coincides with the origin of the second coordinate system, the coordinate conversion relationship includes the coordinates of a single point in both the first coordinate system and the second coordinate system, or the angle between the axial direction of the first coordinate system and the axial direction of the second coordinate system.
4. The method according to claim 3, wherein:

The included angle θ between the axial direction of the first coordinate system and the axial direction of the second coordinate system satisfies the following formula: θ = arccos[(X₁X₁′ + Y₁Y₁′)/(X₁² + Y₁²)], where (X₁, Y₁) and (X₁′, Y₁′) respectively represent the coordinates of the same point in the first coordinate system and in the second coordinate system.
5. The method according to any one of claims 1 to 4, wherein the converting, by the server, of the first coordinate into the second coordinate comprises:

The server converts the first coordinate into the second coordinate according to the relative position of the origin of the first coordinate system and the origin of the second coordinate system, and the angle between the axial direction of the first coordinate system and the axial direction of the second coordinate system.
6. The method according to claim 5, wherein the origin of the first coordinate system coincides with the origin of the second coordinate system, and the first coordinate and the second coordinate satisfy the following formula:

X'=X*cos(θ)-Y*sin(θ), Y'=X*sin(θ)+Y*cos(θ); where (X', Y') represents the second coordinate, (X, Y) represents the first coordinate, and θ represents the angle between the axial direction of the first coordinate system and the axial direction of the second coordinate system.
7. A method for controlling a robot, characterized in that the method is applied to a robot comprising a camera and a lidar, and the method comprises:

The robot creates a visual SLAM map through the camera, and creates a laser SLAM map through the lidar;

The robot uploads the coordinate conversion relationship between a first coordinate system and a second coordinate system to a server, wherein the first coordinate system is the projection of the coordinate system of the visual SLAM map onto the horizontal plane, and the second coordinate system is the coordinate system of the laser SLAM map;

The robot uploads the visual SLAM map to the server;

The robot receives a second coordinate obtained by the server based on the visual SLAM map and the coordinate conversion relationship between the first coordinate system and the second coordinate system, where the second coordinate is the coordinate of a target point in the second coordinate system;

The robot determines its movement path according to the second coordinate and the coordinate of its current position in the second coordinate system;

The robot moves to the target point according to the movement path.
8. The method according to claim 7, wherein if the origin of the first coordinate system does not coincide with the origin of the second coordinate system, the coordinate conversion relationship includes the angle between the axial direction of the first coordinate system and the axial direction of the second coordinate system, and the relative position of the origin of the first coordinate system and the origin of the second coordinate system.
9. The method according to claim 7, wherein if the origin of the first coordinate system coincides with the origin of the second coordinate system, the coordinate conversion relationship includes the coordinates of a single point in both the first coordinate system and the second coordinate system, or the angle between the axial direction of the first coordinate system and the axial direction of the second coordinate system.
10. The method according to claim 9, wherein:

The included angle θ between the axial direction of the first coordinate system and the axial direction of the second coordinate system satisfies the following formula: θ = arccos[(X₁X₁′ + Y₁Y₁′)/(X₁² + Y₁²)], where (X₁, Y₁) and (X₁′, Y₁′) respectively represent the coordinates of the same point in the first coordinate system and in the second coordinate system.
11. A method for controlling a robot, characterized in that the method is applied to a terminal, and the method comprises:

The terminal acquires an image of the current environment;

The terminal receives an operation from a user;

In response to the user's operation, the terminal obtains, according to the current environment image, a target image frame and a target point located in the target image frame;

The terminal determines feature data of the target point;

The terminal uploads the target image frame and the feature data of the target point to a server.
12. The method according to claim 11, wherein the method further comprises:

The terminal extracts feature points from the target image frame according to a preset feature extraction algorithm;

The terminal determines the feature data of the feature point;

The terminal uploads the feature data of the feature point to the server.
13. The method according to claim 12, wherein the method further comprises:

The terminal determines the screen coordinates of the target point and the screen coordinates of the feature point;

The terminal determines the relative position of the target point and the feature point according to the screen coordinates of the target point and of the feature point, and the feature data of the target point and of the feature point;

The terminal uploads the relative position of the target point and the feature point to the server.
14. A method for controlling a robot, characterized in that the method is applied to a server, and the method comprises:

The server receives a visual SLAM map sent by a robot;

The server receives the coordinate conversion relationship between a first coordinate system and a second coordinate system sent by the robot, wherein the first coordinate system is the projection of the coordinate system of the visual SLAM map onto the horizontal plane, the second coordinate system is the coordinate system of a laser SLAM map acquired by the robot, and the visual SLAM map and the laser SLAM map are created by the robot in the same environment;

The server receives a target image frame and feature data of a target point uploaded by a terminal;

The server determines a first coordinate of the target point in the first coordinate system according to the target image frame, the feature data of the target point, and the visual SLAM map;

The server converts the first coordinate into a second coordinate, where the second coordinate is the coordinate of the target point in the second coordinate system;

The server sends the second coordinate to the robot.
15. The method according to claim 14, wherein the determining, by the server, of the first coordinate of the target point in the first coordinate system according to the target image frame, the feature data of the target point, and the visual SLAM map comprises:

The server acquires feature data of a feature point in the target image frame;

The server matches the feature data of the feature point against the feature data in the visual SLAM map, and determines the coordinates of the feature point in the first coordinate system;

The server determines the relative position of the target point and the feature point;

The server determines the first coordinate of the target point in the first coordinate system according to the coordinates of the feature point in the first coordinate system and the relative position of the target point and the feature point.
16. The method according to claim 15, wherein the determining, by the server, of the relative position of the target point and the feature point comprises:

The server receives the screen coordinates of the target point, the screen coordinates of the feature point, the feature data of the target point, and the feature data of the feature point, all uploaded by the terminal;

The server determines the relative position of the target point and the feature point according to the screen coordinates of the target point and of the feature point, and the feature data of the target point and of the feature point.
17. The method according to any one of claims 14 to 16, wherein the converting, by the server, of the first coordinate into the second coordinate comprises:

The server converts the first coordinate into the second coordinate according to the relative position of the origin of the first coordinate system and the origin of the second coordinate system, and the angle between the axial direction of the first coordinate system and the axial direction of the second coordinate system.
18. The method according to claim 17, wherein the origin of the first coordinate system coincides with the origin of the second coordinate system, and the first coordinate and the second coordinate satisfy the following formula:

X'=X*cos(θ)-Y*sin(θ), Y'=X*sin(θ)+Y*cos(θ); where (X', Y') represents the second coordinate, (X, Y) represents the first coordinate, and θ represents the angle between the axial direction of the first coordinate system and the axial direction of the second coordinate system.
19. The method according to claim 18, wherein θ is determined by the following formula:

θ = arccos[(X₁X₁′ + Y₁Y₁′)/(X₁² + Y₁²)], where (X₁, Y₁) and (X₁′, Y₁′) respectively represent the coordinates of the same point in the first coordinate system and in the second coordinate system.
20. A robot, comprising a memory and a processor, wherein the memory is configured to store information including program instructions, the processor is configured to control execution of the program instructions, and when the program instructions are loaded and executed by the processor, the robot performs the method according to any one of claims 7 to 10.
21. A terminal, comprising a memory and a processor, wherein the memory is configured to store information including program instructions, the processor is configured to control execution of the program instructions, and when the program instructions are loaded and executed by the processor, the terminal performs the method according to any one of claims 11 to 13.
22. A server, comprising a memory and a processor, wherein the memory is configured to store information including program instructions, the processor is configured to control execution of the program instructions, and when the program instructions are loaded and executed by the processor, the server performs the method according to any one of claims 14 to 19.
23. A control system, characterized in that the control system comprises the robot according to claim 20, the terminal according to claim 21, and the server according to claim 22.