CN115063627A - Robot scene recognition method and device and electronic equipment - Google Patents
Robot scene recognition method and device and electronic equipment
- Publication number
- CN115063627A (application number CN202210603699.5A)
- Authority
- CN
- China
- Prior art keywords
- scene
- robot
- image
- classification result
- images
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/35—Categorising the entire scene, e.g. birthday party or wedding scene
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
The disclosure relates to the technical field of intelligent robots and provides a robot scene recognition method, apparatus, and electronic device. The method includes: acquiring a scene image of the robot's surroundings; inputting the scene image into a preset deep learning network and obtaining at least one scene feature map of the scene image at the output of the deep learning network; and classifying the at least one scene feature map with a pre-trained scene recognition model to determine a classification result of the scene feature map, where the classification result includes the scene category around the robot. By first extracting scene features from the scene image and then classifying those features, the method avoids interference from other content in the scene image, allows the model to recognize the scene feature map accurately, reduces the amount of data the model must compute, and enables the robot to recognize the surrounding scene category quickly and accurately.
Description
Technical Field
The disclosure relates to the technical field of intelligent robots, in particular to a robot scene recognition method and device and electronic equipment.
Background
When an intelligent robot performs transport tasks in scenes such as hotels and office buildings, the scene category around it needs to be continuously calculated and confirmed so that the robot can be controlled to carry out the transport work smoothly. Because the image computing power of a single intelligent robot is limited, relying only on the robot to compute the image data may, on the one hand, exceed or overburden its computing resources and, on the other hand, produce delayed results because of low computing efficiency.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a robot scene recognition method, apparatus, and electronic device, so as to solve the problem in the prior art that an intelligent robot calculates and confirms the surrounding scene category inefficiently.
In a first aspect of the embodiments of the present disclosure, a robot scene recognition method is provided, including:
acquiring a scene image around the robot;
inputting the scene image into a preset deep learning network, and obtaining at least one scene feature map of the scene image at the output of the deep learning network;
and classifying the at least one scene feature map by using a pre-trained scene recognition model, and determining a classification result of the scene feature map, wherein the classification result includes the scene category around the robot.
In a second aspect of the embodiments of the present disclosure, there is provided a robot scene recognition apparatus, including:
an image acquisition module configured to acquire an image of a scene around the robot;
the feature extraction module is configured to input the scene image into a preset deep learning network, and obtain at least one scene feature map of the scene image at the output of the deep learning network;
the scene recognition module is configured to classify at least one scene feature map by using a scene recognition model obtained through pre-training, and determine a classification result of the scene feature map, wherein the classification result comprises scene categories around the robot.
In a third aspect of the embodiments of the present disclosure, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the above method when executing the computer program.
Compared with the prior art, embodiments of the present disclosure have the following beneficial effects: a deep learning network first extracts scene feature maps from the scene images around the robot, and a scene recognition model then classifies those feature maps to determine the scene category around the robot. By extracting the scene features of the scene image before classifying them, interference from other content in the scene image is avoided, so the model can recognize the scene feature maps accurately, the amount of data the model must compute is reduced, and the robot can recognize the surrounding scene category quickly and accurately.
Drawings
To more clearly illustrate the technical solutions in the embodiments of the present disclosure, the drawings needed for describing the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present disclosure, and those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of a robot scene recognition method according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a robot scene recognition apparatus provided in an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.
A robot scene recognition method and apparatus according to an embodiment of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an application scenario of an embodiment of the present disclosure. The application scenario may include scene images 1, 2, and 3, a robot 4, an artificial intelligence model 5, and a network 6.
The scene images 1, 2, and 3 are images of the surroundings of the robot 4; for example, they are images captured by the robot 4 in different directions of its surroundings.
The robot 4 may be a wheeled robot capable of intelligent movement. The robot 4 may include a vision system and a computing system: the vision system acquires the scene images 1, 2, and 3 around the robot 4, and the computing system then analyzes and processes them to obtain the scene category around the robot. Alternatively, the robot 4 may send the scene images 1, 2, and 3 collected by the vision system to the artificial intelligence model 5, which analyzes and processes them and returns the processing result (i.e., the scene category around the robot) to the robot 4; or the computing system in the robot 4 may first perform a simple calculation on the collected scene images 1, 2, and 3 to obtain an intermediate result, send the intermediate result to the artificial intelligence model 5 for analysis and processing, and finally receive the processing result back. Of course, the robot 4 may also be a mobile robot with another structure, which is not limited by the embodiments of the present disclosure.
The artificial intelligence model 5 may be a machine learning model, a deep learning network, or the like, trained in advance on data samples. For example, a large number of scene images around the robot are collected in advance and labeled with their scene categories, and the labeled scene images are then used as samples to train a machine learning model or a neural network, thereby obtaining the target model or target network.
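By way of illustration only, the following Python sketch shows one way such a target model could be trained from labeled scene images; the backbone network, directory layout, and hyperparameters are assumptions made for the example and are not specified by this disclosure.

```python
# Hypothetical training sketch (not from the patent): fine-tune a small CNN
# on scene images that were collected around the robot and labeled by category.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Assumed directory layout: scene_samples/<category>/<image>.jpg
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("scene_samples", transform=transform)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# A generic backbone stands in for the unspecified "target network".
model = models.resnet18(num_classes=len(dataset.classes))
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

for epoch in range(10):                      # assumed epoch count
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```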
Specifically, the artificial intelligence model 5 may be implemented in hardware or software. When implemented in hardware, it may be any of various electronic devices that provide computing services for the scene images 1, 2, and 3. When implemented in software, it may be a single piece of software or a software module that provides intelligent recognition services for the scene images 1, 2, and 3 captured by the robot 4, deployed on an electronic device such as the robot 4, a server (not shown in fig. 1), or a workstation (not shown in fig. 1); it may also be multiple pieces of software or software modules that provide intelligent recognition services for the scene images 1, 2, and 3 captured by the robot 4, deployed on different electronic devices such as the robot 4 and a server, which serve the scene images 1, 2, and 3 in stages.
The network 6 may be a wired network using coaxial cable, twisted pair, or optical fiber, or a wireless network that interconnects communication devices without wiring, for example Bluetooth, Near Field Communication (NFC), or infrared, which is not limited in the embodiments of the present disclosure.
In the application scenario shown in fig. 1, the robot 4 may send the collected scene images 1, 2, and 3 to the artificial intelligence model 5 via the network 6. The artificial intelligence model 5 performs scene feature map extraction and scene feature map classification on the scene images 1, 2, and 3 to obtain their scene recognition results and returns those results to the robot 4, so the image computing power of the robot 4 is not occupied and the efficiency of recognizing the scene around the robot 4 can be effectively improved.
It should be noted that the specific types, numbers and combinations of the scene images 1, 2 and 3, the robot 4, the artificial intelligence model 5 and the network 6 may be adjusted according to the actual requirements of the application scene, and the embodiment of the present disclosure does not limit this.
Fig. 2 is a schematic flowchart of a robot scene recognition method according to an embodiment of the present disclosure. The robot scene recognition method of fig. 2 may be performed by an electronic device arranged with the artificial intelligence model 5 of fig. 1. As shown in fig. 2, the robot scene recognition method includes:
s201, acquiring a scene image around the robot;
s202, inputting the scene image into a preset deep learning network, and obtaining at least one scene characteristic graph of the scene image at the output of the deep learning network;
s203, classifying at least one scene feature map by using a scene recognition model obtained through pre-training, and determining a classification result of the scene feature map, wherein the classification result comprises scene categories around the robot.
Specifically, the scene image may be a two-dimensional image or a three-dimensional image. In the embodiment of the present disclosure, the scene image is preferably a three-dimensional image, for example, the scene image is point cloud data or an RGB image around the robot.
The deep learning network may be a feature extraction network obtained by using scene images around the robot as samples in advance, labeling the scene features in those images, and then training on the labeled samples. The feature extraction network is deployed on the robot or on another electronic device connected to the robot and is used to extract scene feature maps from the scene images around the robot.
Similarly, the scene recognition model may be a machine learning model trained in advance with scene images around the robot as samples, but the labels differ: for the deep learning network the samples are labeled with scene feature maps, whereas for the scene recognition model the samples are labeled with the scene category corresponding to each scene image. The scene recognition model may be deployed on the same electronic device as the deep learning network or on a different one. For example, in some embodiments, the deep learning network may be deployed on the robot and the scene recognition model on a background server connected to the robot; the background server then undertakes the main image computation, which reduces the computing power required of the robot and also improves the efficiency with which the robot recognizes the surrounding scene categories.
According to the technical solution provided by the embodiments of the present disclosure, the deep learning network extracts scene feature maps from the scene images around the robot, and the scene recognition model then classifies those feature maps to determine the scene category around the robot. Extracting the scene features of the scene image before classifying them avoids interference from other content in the scene image, so the model can recognize the scene feature maps accurately and the amount of data the model must compute is reduced, enabling the robot to recognize the surrounding scene category quickly and accurately.
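A minimal sketch of this two-stage pipeline is given below, assuming a truncated convolutional backbone as the feature extraction network and a small pooling-plus-linear head as the scene recognition model; the category names, layer shapes, and the split of the two stages between robot and background server are illustrative assumptions rather than the patented implementation.

```python
# Illustrative two-stage pipeline (assumptions, not the patented implementation):
# stage 1 extracts a scene feature map, stage 2 classifies it into a scene category.
import torch
from torch import nn
from torchvision import models

SCENE_CATEGORIES = ["hotel_lobby", "outside_lobby", "corridor", "elevator", "room"]

# Stage 1 (e.g. deployed on the robot): a backbone truncated before its
# classification head, so its output is a spatial feature map, not class scores.
backbone = nn.Sequential(*list(models.resnet18(weights=None).children())[:-2])

# Stage 2 (e.g. deployed on the background server): a small head that turns a
# feature map into a scene category.
class SceneRecognizer(nn.Module):
    def __init__(self, in_channels=512, num_classes=len(SCENE_CATEGORIES)):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, feature_map):
        x = self.pool(feature_map).flatten(1)
        return self.fc(x)

recognizer = SceneRecognizer()

def recognize_scene(image: torch.Tensor) -> str:
    """image: a (3, H, W) RGB tensor from the robot's vision system."""
    with torch.no_grad():
        feature_map = backbone(image.unsqueeze(0))   # would run on the robot
        logits = recognizer(feature_map)             # would run on the server
    return SCENE_CATEGORIES[int(logits.argmax(dim=1))]
```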
Specifically, when a scene image is fed to the deep learning network as input to extract scene feature maps, the network may output one or more scene feature maps.
In the case that at least two scene feature maps of a scene image are obtained from the output of the deep learning network, according to the method provided in fig. 2, the method of classifying the scene feature maps by using a scene recognition model obtained by pre-training to determine the classification result of the scene feature maps includes:
classifying the at least two scene feature maps respectively by using a pre-trained scene recognition model to obtain a classification result for each scene feature map;
detecting whether the classification results of the scene feature maps are consistent:
if they are inconsistent, calculating the proportion of each distinct classification result and determining the classification result with the highest proportion as the scene recognition result for the scene images acquired by the robot this time;
and if they are consistent, taking that classification result as the scene recognition result of the robot.
When the at least two scene feature maps yield different classification results at the output of the scene recognition model, a target classification result that is likely closest to the real scene needs to be determined from among the different results. Therefore, in the embodiments of the present disclosure, the proportion of each classification result is calculated, and the classification result with the larger proportion is selected as the target classification result. For example, if two scene feature maps are input to the scene recognition model and one is classified as A while the other is classified as B, the proportions of A and B are each 1/2; in this case, the classification result of the scene images around the robot at the previous time can be consulted, and the current classification result is determined to be B if the previous result was A, or A if the previous result was B. For another example, if three scene feature maps are input to the scene recognition model and the first is classified as A, the second as B, and the third as A, then the proportion of A is 2/3 and the proportion of B is 1/3, so according to the technical solution provided by the present disclosure the classification result A is taken as the target classification result. Similarly, if more than three scene feature maps are input and different classification results are obtained, the target classification result may be determined by analogy with the foregoing examples.
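As an illustration, the proportion-based selection just described could be sketched as follows; the function names are hypothetical, and the tie-breaking branch simply reproduces the rule stated above (consulting the previous result and switching to the other category on an exact tie).

```python
# Illustrative sketch of the proportion-based (majority) selection described above.
from collections import Counter

def select_scene(classification_results, previous_result=None):
    """classification_results: one predicted category per scene feature map."""
    counts = Counter(classification_results)
    if len(counts) == 1:
        # All feature maps agree: use that classification result directly.
        return classification_results[0]

    (top, top_count), (second, second_count) = counts.most_common(2)
    if top_count != second_count:
        # Inconsistent results: take the category with the highest proportion.
        return top

    # Exact tie between two categories (e.g. A and B each at 1/2): the embodiment
    # consults the previous recognition result and picks the other category.
    if previous_result == top:
        return second
    return top

# Example: three feature maps classified as A, B, A -> A is selected (2/3 vs 1/3).
print(select_scene(["A", "B", "A"]))
```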
In addition, when multiple scene feature maps input to the scene recognition model yield different classification results, the scene feature maps and their corresponding classification results can be collected and stored as new samples. The re-collected samples are then labeled manually, that is, the actual scene categories of the scene feature maps are annotated while the classification results output by the scene recognition model are retained, and the scene recognition model is retrained with the labeled samples to obtain a more accurate scene recognition model and improve recognition accuracy.
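A sketch of how such disagreement samples might be set aside for manual labeling and later retraining is shown below; the storage paths, file format, and field names are assumptions made for the example.

```python
# Illustrative sketch (assumed paths/format): store feature maps whose classification
# results disagree so they can be manually labeled and used to retrain the model.
import json
import time
from pathlib import Path

import torch

DISAGREEMENT_DIR = Path("retraining_samples")   # assumed location

def store_disagreement(feature_maps, classification_results):
    """Save the feature maps and the model's (conflicting) outputs as one sample."""
    DISAGREEMENT_DIR.mkdir(exist_ok=True)
    stamp = time.strftime("%Y%m%d_%H%M%S")
    torch.save(torch.stack(feature_maps), DISAGREEMENT_DIR / f"{stamp}.pt")
    record = {
        "file": f"{stamp}.pt",
        "model_results": classification_results,   # retained model outputs
        "manual_label": None,                      # filled in later by a human
    }
    with open(DISAGREEMENT_DIR / f"{stamp}.json", "w") as f:
        json.dump(record, f)
```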
In the case that one scene feature map of the scene image is obtained at the output of the deep learning network, according to the method provided in fig. 2, classifying the scene feature map by using a pre-trained scene recognition model and determining the classification result of the scene feature map includes: inputting the scene feature map into the pre-trained scene recognition model and obtaining the corresponding classification result at the output of the scene recognition model.
That is, when there is only one scene feature map, the scene feature map is input directly into the scene recognition model, and the classification result obtained at its output is taken as the target classification result.
In some embodiments, when the robot operates in a hotel, the classification result includes at least one of the following scene categories: inside the hotel lobby, outside the hotel lobby, corridor, inside an elevator, and inside a room. Of course, if the robot works in other scenes, the scene categories may also be defined according to those scenes, which is not limited in the embodiments of the present disclosure.
In the robot scene recognition method provided by some of the embodiments above, continuously acquiring scene images for scene recognition would consume a large amount of computing resources. Therefore, in the following embodiments, the way the scene images around the robot are acquired is optimized: the scene images used for scene recognition are acquired according to the working state of the robot, so that image computing resources are not excessively occupied by continuous image acquisition.
In some embodiments, on the basis of some of the above embodiments, after determining the scene category around the robot, the method further includes:
detecting the working state of the robot;
and determining an acquisition mode of a scene image for robot scene recognition based on the working state.
Specifically, the working state of the robot may include at least one of a task state, a fault state, a charging state, and a standby state. In different working states the robot has different requirements for recognizing the surrounding scene categories, so the embodiments of the present disclosure determine how the robot acquires scene images according to its working state, thereby avoiding excessive occupation of image computing resources caused by continuously acquiring scene images.
Next, in some embodiments, determining an acquisition mode of a scene image for robot scene recognition includes:
under the condition that the robot is detected to be in a task state, controlling the robot to periodically collect surrounding images, wherein the images comprise at least one frame of RGB scene image;
under the condition that the robot is detected to be in a fault state or a standby state, controlling the robot to collect surrounding images once, wherein the images comprise multiple frames of RGB scene images;
and controlling the robot to stop collecting surrounding images under the condition that the robot is detected to be in a charging state.
Specifically, when the robot is in the task state, the main task is the movement task, that is, the robot is in the movement state for a long time. Therefore, the scene around the robot may change continuously, so that the surrounding scene image may be periodically acquired to identify the scene type. The specific time of the period may be a preset time interval, or may also be a new time interval obtained by readjusting the current period according to the determined scene category, which is not limited in this embodiment of the disclosure.
Specifically, when the robot is in a fault or standby state, it generally stops moving and the surrounding scene category is fixed. Therefore, only a single acquisition of scene images is needed to identify the scene category, which effectively reduces the occupation of image computing resources. Similarly, when the robot is in a charging state, the surrounding scene category is fixed or known in advance, since the robot generally charges at a preset position; acquisition of surrounding scene images can therefore be stopped, and the scene category can be identified without occupying image computing resources.
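The working-state-dependent acquisition policy described above could be expressed roughly as in the following sketch; the capture interface (a `robot_vision` object with a `capture_frames` method), the state names, and the sampling period are assumptions, not details taken from this disclosure.

```python
# Illustrative sketch of the working-state-dependent acquisition policy described above.
# The capture_frames interface and the 5-second period are assumptions.
import time
from enum import Enum, auto

class WorkingState(Enum):
    TASK = auto()       # executing a movement/transport task
    FAULT = auto()
    STANDBY = auto()
    CHARGING = auto()

def acquire_scene_images(robot_vision, state, period_s=5.0):
    """Return the frames to use for scene recognition, or None if none are needed."""
    if state is WorkingState.TASK:
        # Task state: the surroundings keep changing, so sample periodically.
        time.sleep(period_s)
        return robot_vision.capture_frames(min_frames=1)   # at least one RGB frame
    if state in (WorkingState.FAULT, WorkingState.STANDBY):
        # Fault/standby: robot is stationary, a single multi-frame capture suffices.
        return robot_vision.capture_frames(min_frames=3)   # multiple RGB frames
    # Charging: scene is fixed or known in advance, stop collecting images.
    return None
```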
In addition, in some embodiments, when the robot captures surrounding images in different working states and those images include multiple frames of RGB scene images, the frames respectively correspond to images captured at the same time by a vision system preset on the robot in different directions of the surroundings.
Specifically, in the application scenario shown in fig. 1, a vision system is preset on the robot 4 and can photograph the robot's surroundings in different directions. When a scene image around the robot needs to be acquired, the vision system can therefore be controlled to capture images in multiple different directions around the robot at the same moment, obtaining multiple frames of RGB scene images, such as the scene images 1, 2, and 3 shown in fig. 1. The cameras used by the vision system to capture the multi-frame RGB scene images may be monocular or binocular cameras, which is not limited by the embodiments of the present disclosure.
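One possible way to trigger such simultaneous captures in several directions, assuming a hypothetical per-direction camera object with a read() method, is sketched below.

```python
# Illustrative sketch: trigger all direction cameras at (approximately) the same
# moment to obtain the multi-frame RGB scene images. The camera objects are hypothetical.
from concurrent.futures import ThreadPoolExecutor

def capture_all_directions(cameras):
    """cameras: mapping of direction name -> camera object with a read() method."""
    with ThreadPoolExecutor(max_workers=len(cameras)) as pool:
        futures = {name: pool.submit(cam.read) for name, cam in cameras.items()}
        # e.g. {"front": frame1, "left": frame2, "right": frame3}
        return {name: future.result() for name, future in futures.items()}
```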
All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods. For details not disclosed in the embodiments of the apparatus of the present disclosure, refer to the embodiments of the method of the present disclosure.
Fig. 3 is a schematic diagram of a robot scene recognition device according to an embodiment of the present disclosure. As shown in fig. 3, the robot scene recognition apparatus includes:
an image acquisition module 301 configured to acquire an image of a scene around the robot;
the feature extraction module 302 is configured to input the scene image into a preset deep learning network, and obtain at least one scene feature map of the scene image at the output of the deep learning network;
the scene recognition module 303 is configured to classify at least one scene feature map by using a pre-trained scene recognition model, and determine a classification result of the scene feature map, where the classification result includes a scene category around the robot.
According to the technical solution provided by the embodiments of the present disclosure, the deep learning network extracts scene feature maps from the scene images around the robot, and the scene recognition model then classifies those feature maps to determine the scene category around the robot. Extracting the scene features of the scene image before classifying them avoids interference from other content in the scene image, so the model can recognize the scene feature maps accurately and the amount of data the model must compute is reduced, enabling the robot to recognize the surrounding scene category quickly and accurately.
In some embodiments, when the output of the deep learning network yields at least two scene feature maps of the scene image, the scene recognition module 303 in fig. 3 classifies the at least two scene feature maps respectively using a pre-trained scene recognition model to obtain a classification result for each scene feature map, and then detects whether the classification results of the scene feature maps are consistent: if they are inconsistent, it calculates the proportion of each distinct classification result and determines the classification result with the highest proportion as the scene recognition result for the scene images acquired by the robot this time; if they are consistent, it takes that classification result as the scene recognition result of the robot.
In some embodiments, in the case that the output of the deep learning network obtains one scene feature map of the scene image, the scene recognition module 303 in fig. 3 inputs the one scene feature map into a scene recognition model obtained by pre-training, and obtains a corresponding classification result at the output of the scene recognition model.
In some embodiments, when the robot operates in a hotel, the classification result includes at least one of a scene category within a hotel lobby, outside the hotel lobby, corridor, within an elevator, and within a room.
In some embodiments, after determining the scene category around the robot, the robot scene recognition apparatus further includes:
an operating state detection module 304 configured to detect an operating state of the robot;
an image acquisition mode module 305 configured to determine an acquisition mode of a scene image for robot scene recognition based on the operating state.
In some embodiments, the operating state includes at least one of a task state, a fault state, a charge state, and a standby state.
In some embodiments, the image acquisition mode module 305 in fig. 3 controls the robot to periodically collect surrounding images when it detects that the robot is in the task state, where the images include at least one frame of RGB scene image; controls the robot to collect surrounding images once when it detects that the robot is in a fault state or a standby state, where the images include multiple frames of RGB scene images; and controls the robot to stop collecting surrounding images when it detects that the robot is in a charging state.
In some embodiments, when the image includes a plurality of frames of RGB scene images, the frames respectively correspond to images captured at the same time by a vision system preset on the robot in different directions of the surroundings.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present disclosure.
Fig. 4 is a schematic diagram of an electronic device 400 provided by an embodiment of the disclosure. The electronic device 400 here may be a server, or a computer, etc. arranged with the artificial intelligence model 5 in fig. 1. As shown in fig. 4, the electronic apparatus 400 of this embodiment includes: a processor 401, a memory 402 and a computer program 403 stored in the memory 402 and executable on the processor 401. The steps in the various method embodiments described above are implemented when the processor 401 executes the computer program 403. Alternatively, the processor 401 implements the functions of the respective modules/units in the above-described respective apparatus embodiments when executing the computer program 403.
The electronic device 400 may be a desktop computer, a notebook, a palm top computer, a cloud server, or other electronic devices. The electronic device 400 may include, but is not limited to, a processor 401 and a memory 402. Those skilled in the art will appreciate that fig. 4 is merely an example of an electronic device 400 and does not constitute a limitation of electronic device 400 and may include more or fewer components than shown, or different components.
The Processor 401 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, or the like.
The storage 402 may be an internal storage unit of the electronic device 400, for example, a hard disk or a memory of the electronic device 400. The memory 402 may also be an external storage device of the electronic device 400, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the electronic device 400. The memory 402 may also include both internal and external storage units of the electronic device 400. The memory 402 is used for storing computer programs and other programs and data required by the electronic device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules, so as to perform all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, the present disclosure may implement all or part of the flow of the methods in the above embodiments by instructing the related hardware through a computer program, where the computer program may be stored in a computer readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program may comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer readable medium may be appropriately added to or removed from as required by legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, computer readable media do not include electrical carrier signals or telecommunications signals in accordance with legislation and patent practice.
The above examples are only intended to illustrate the technical solutions of the present disclosure, not to limit them; although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present disclosure, and they should be construed as being included in the scope of the present disclosure.
Claims (10)
1. A robot scene recognition method is characterized by comprising the following steps:
acquiring a scene image around the robot;
inputting the scene image into a preset deep learning network, and obtaining at least one scene feature map of the scene image at the output of the deep learning network;
and classifying the at least one scene feature map by using a scene recognition model obtained by pre-training, and determining a classification result of the scene feature map, wherein the classification result comprises scene categories around the robot.
2. The method according to claim 1, wherein in a case that the output of the deep learning network obtains at least two scene feature maps of the scene image, the classifying the scene feature maps by using a scene recognition model obtained through pre-training to determine a classification result of the scene feature maps comprises:
classifying the at least two scene feature maps respectively by using a scene recognition model obtained by pre-training to obtain a classification result of each scene feature map;
detecting whether the classification results of the scene feature maps are consistent or not:
if the classification results are inconsistent, calculating the proportions of the different classification results, and determining the classification result with the highest proportion as the scene recognition result of the scene images acquired by the robot this time;
and if the classification results are consistent, taking the classification result as the scene recognition result of the robot.
3. The method according to claim 1, wherein in a case that an output of the deep learning network obtains a scene feature map of the scene image, the classifying the scene feature map by using a scene recognition model obtained through pre-training to determine a classification result of the scene feature map comprises:
inputting the scene feature map into a scene recognition model obtained by pre-training, and obtaining a corresponding classification result at the output of the scene recognition model.
4. The method of claim 1, wherein the classification results include at least one of scene categories within a hotel lobby, outside a hotel lobby, corridor, elevator, and room when the robot is operating in a hotel.
5. The method of any of claims 1-4, further comprising, after determining the category of the scene surrounding the robot:
detecting the working state of the robot;
and determining an acquisition mode of a scene image for robot scene recognition based on the working state.
6. The method of claim 5, wherein the operating state comprises at least one of a task state, a fault state, a charge state, and a standby state.
7. The method of claim 6, wherein determining a manner of acquiring a scene image for robot scene recognition based on the operating state comprises:
under the condition that the robot is detected to be in a task state, controlling the robot to periodically collect surrounding images, wherein the images comprise at least one frame of RGB scene image;
under the condition that the robot is detected to be in a fault state or a standby state, controlling the robot to collect surrounding images once, wherein the images comprise multiple frames of RGB scene images;
and controlling the robot to stop collecting surrounding images under the condition that the robot is detected to be in a charging state.
8. The method according to claim 7, characterized in that, in the case where the image comprises a plurality of frames of RGB scene images, the plurality of frames of RGB scene images respectively correspond to images captured at the same time by a vision system preset on the robot in different directions of the surroundings.
9. A robot scene recognition apparatus, comprising:
an image acquisition module configured to acquire an image of a scene around the robot;
the feature extraction module is configured to input the scene image into a preset deep learning network, and obtain at least one scene feature map of the scene image at the output of the deep learning network;
the scene recognition module is configured to classify the at least one scene feature map by using a scene recognition model obtained through pre-training, and determine a classification result of the scene feature map, wherein the classification result comprises a scene category around the robot.
10. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 8 when executing the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210603699.5A CN115063627A (en) | 2022-05-30 | 2022-05-30 | Robot scene recognition method and device and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210603699.5A CN115063627A (en) | 2022-05-30 | 2022-05-30 | Robot scene recognition method and device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115063627A true CN115063627A (en) | 2022-09-16 |
Family
ID=83198208
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210603699.5A Withdrawn CN115063627A (en) | 2022-05-30 | 2022-05-30 | Robot scene recognition method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115063627A (en) |
- 2022-05-30: CN application CN202210603699.5A filed, published as CN115063627A (status: withdrawn)
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | WW01 | Invention patent application withdrawn after publication | Application publication date: 20220916