US20230030837A1 - Human-object scene recognition method, device and computer-readable storage medium - Google Patents
- Publication number
- US20230030837A1 (application US 17/386,531)
- Authority
- US
- United States
- Prior art keywords
- humans
- bounding boxes
- objects
- detected
- detected objects
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06K9/00664—
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
- G06K9/00362—
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Definitions
- the present disclosure generally relates to field of object recognition, and particularly to a human-object scene recognition method, device and computer-readable storage medium.
- Scene understanding is a deeper level of object detection, recognition and reasoning based on image analysis. On the basis of image understanding, image data is processed to obtain an understanding of the content of the scene reflected in the image.
- FIG. 1 is a schematic diagram of a robot according to one embodiment.
- FIG. 2 is a schematic block diagram of the robot according to one embodiment.
- FIG. 3 shows an image of an exemplary scene including a person standing away from a chair.
- FIG. 4 shows an image of an exemplary scene including a person sitting on a chair.
- FIG. 5 shows an image of an exemplary scene including a bed and a chair standing away from the bed.
- FIG. 6 shows an image of an exemplary scene including a bed and a nightstand near the bed.
- FIG. 7 shows an image of an exemplary scene including a table and two chairs.
- FIG. 8 is an exemplary flowchart of a human-object scene recognition method according to one embodiment.
- FIG. 9 is an exemplary flowchart of a human-object scene recognition method according to another embodiment.
- FIG. 10 is an exemplary flowchart of step S 98 of the method of FIG. 9 .
- FIG. 11 is a processing logic flowchart of computer programs in a method for a robot to recognize a human-object scene.
- FIG. 12 is schematic block diagram of a human-object recognition device according to one embodiment.
- FIG. 1 is a schematic diagram of a robot 10 according to one embodiment.
- FIG. 2 is a schematic block diagram of the robot 10 according to one embodiment.
- the robot 10 may be a mobile robot (e.g., wheeled robot).
- the robot 10 can operate in various application environments, such as hospitals, factories, warehouses, malls, streets, airports, homes, elder care centers, museums, restaurants, hotels, and even wild fields, etc.
- FIG. 1 is merely an illustrative example.
- the robot 10 may be other types of robots.
- the robot 10 may include a camera 101 , an actuator 102 , a mobility mechanism 103 , a processor 104 , a storage 105 , and a communication interface module 106 .
- the camera 101 may be, for example, an RGB-D three-dimensional sensor arranged on the body of the robot 10 .
- the camera 101 is electrically connected to the processor 104 for transmitting the captured image data to the processor 104 .
- the actuator 102 may be a motor or a servo.
- the mobility mechanism 103 may include one or more wheels and/or tracks, and wheels are illustrated in FIG. 1 as an example.
- the actuator 102 is electrically coupled to the mobility mechanism 103 and the processor 104 , and can actuate movement of the mobility mechanism 103 according to commands from the processor 104 .
- the storage 105 may include a non-transitory computer-readable storage medium.
- One or more executable computer programs 107 are stored in the storage 105 .
- the processor 104 is electrically connected to the storage 105 , and performs corresponding operations by executing the executable computer programs stored in the storage 105 .
- the communication interface module 106 may include a wireless transmitter, a wireless receiver, and computer programs executable by the processor 104 .
- the communication interface module 106 is electrically connected to the processor 104 and is configured for communication between the processor 104 and external devices.
- the camera 101 , the actuator 102 , the mobility mechanism 103 , the processor 104 , the storage 105 , and the communication interface module 106 may be connected to one another by a bus.
- when the processor 104 executes the computer programs 107 , steps S 81 through S 86 in FIG. 8 , steps S 91 through S 98 in FIG. 9 , and steps S 981 through S 987 in FIG. 10 , are implemented.
- the processor 104 may be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a programmable logic device, a discrete gate, a transistor logic device, or a discrete hardware component.
- the general-purpose processor may be a microprocessor or any conventional processor or the like.
- the storage 105 may be an internal storage unit of the robot 10 , such as a hard disk or a memory.
- the storage 105 may also be an external storage device of the robot 10 , such as a plug-in hard disk, a smart memory card (SMC), and a secure digital (SD) card, or any suitable flash cards.
- the storage 105 may also include both an internal storage unit and an external storage device.
- the storage 105 is used to store computer programs, other programs, and data required by the robot.
- the storage 105 can also be used to temporarily store data that have been output or is about to be output.
- the one or more computer programs 107 may be divided into one or more modules/units, and the one or more modules/units are stored in the storage 105 and executable by the processor 104 .
- the one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the one or more computer programs 107 in the robot 10 .
- the one or more computer programs 107 may be divided into an acquiring unit, a detecting unit, a recognizing unit and a control unit.
- the acquiring unit is configured to acquire an input RGB image and a depth image corresponding to the RGB image.
- the detecting unit is configured to detect objects and humans in the RGB image using a segmentation classification algorithm based on a sample database.
- the recognizing unit is configured to, in response to detection of objects and/or humans, determine a position of each of the detected objects and/or humans by performing a segment detection to each of the detected objects and/or humans based on the RGB image and the depth image and performing a 3D bounding box calculation to each of the detected objects and/or humans.
- the control unit is configured to control the robot to perform predetermined tasks according to the determined positions of the detected objects and/or humans.
- a method for a robot to recognize a human-object scene allows the robot to automatically set a target position and navigate while avoiding collisions.
- the method can also provide application scenarios such as whether a target object is in the scene, the position of the target object, and semantic information about whether human/other humans are near the target object.
- an RGB image and a corresponding depth image are inputted.
- the RGB image first goes through a segmentation classification algorithm for the detection of common objects and humans in the scene. Before the final 3D bounding boxes are generated, the method detects whether separate segments should be merged as one object.
- final information of the 3D bounding boxes of each detected object/human is generated and set as independent output, which can be directly used for robot target position setup and/or collision avoidance during navigation when needed.
- a customer assigned object(s) of interest can be taken as the target object(s) for the calculation as human-object or object-object relationship.
- the analysis of whether the detected object/human is near the target object(s) can only be performed when target object(s) (and person if only one object defined) are present in the scene.
- a stereo based calculation step is performed for the “near” check.
- An output of whether the person is near the target object(s) or whether two or more target objects are near each other would be generated.
- with this human/object-environment interaction information, a guide for the robot-human-environment interaction can be achieved.
- FIGS. 3 and 4 show images containing a person and a chair, which are taken by the camera 101 of the robot.
- in FIG. 3 , the person is standing away from the chair, and in FIG. 4 , the person is sitting on the chair.
- the upper left corner shows the recognition results of a target object (i.e., the chair) present in the scene, and information about whether the person is near the target object.
- FIGS. 5 - 7 show images containing objects, which are taken by the camera 101 of the robot. In FIG. 5 , the chair is standing away from the bed. In FIG. 6 , the nightstand is near the bed. FIG. 7 shows two chairs near the table.
- the upper left corner shows the recognition results of target objects and information about whether the target objects are near each other.
- the recognized humans/objects in each image are surrounded by 3D bounding boxes.
- the recognized humans and objects in FIGS. 3 and 5 - 7 are surrounded by 3D bounding boxes 301 , 302 , 501 , 502 , 601 , 602 , 701 , 702 , and 703 .
- the 2D bounding boxes surrounding the recognized human/chair in FIG. 4 are only for representation purposes.
- the robot captures images through the camera 101 while moving, and sends the captured images to the processor 104 .
- the processor 104 processes the captured images by executing the executable computer programs 107 to complete the recognition of the human-object scene. Specifically, the process is as follows: acquiring an input RGB image and a depth image corresponding to the RGB image; detecting objects and humans in the RGB image using a segmentation classification algorithm based on a sample database; and in response to detection of objects and/or humans, determining a position of each of the detected objects and/or humans by performing a segment detection to each of the detected objects and/or humans based on the RGB image and the depth image and performing a 3D bounding box calculation to each of the detected objects and/or humans.
- FIG. 8 shows an exemplary flowchart of a method for recognizing a human-object scene according to one embodiment.
- the method can be implemented to control the movement of the robot 10 shown in FIGS. 1 and 2 , and can be specifically implemented by the robot 10 shown in FIG. 2 or other control devices electrically coupled to the robot 10 .
- the control devices may include, but are not limited to: desktop computers, tablet computers, laptop computers, multimedia players, servers, smart mobile devices (such as smart phones, handheld phones, etc.) and smart wearable devices (such as smart watches, smart glasses, smart cameras, smart bands, etc.) and other computing devices with computing and control capabilities.
- the method may include steps S 81 to S 86 .
- Step S 81 Acquiring an input RGB image and a depth image corresponding to the RGB image.
- the RGB-D three-dimensional sensor equipped on the robot 10 captures the scene image in front of the robot to obtain the RGB image and the depth image corresponding to the RGB image.
- Step S 82 Detecting objects and humans in the RGB image using a segmentation classification algorithm based on a sample database.
- the segmentation detection of the image is to detect the objects and humans in the input single RGB image by using a deep learning method. It should be noted that there may be only objects in the RGB image, only humans in the RGB image, or humans and objects in the RGB image. In one embodiment, the objects and humans refer to common objects and humans that are objects and humans in the ordinary sense and do not specifically refer to certain persons or certain objects. The image characteristics of various common objects and humans that may appear in each scene are pre-stored, which can serve as a basis for determining the characteristics of common objects and humans in image detection.
- Step S 83 In response to detection of objects and/or humans, performing a segment detection to each of the detected objects and/or humans based on the RGB image and the depth image, and acquiring a result of the segment detection.
- the depth values of the pixels of each segment can be used for three-dimensional coordinate calculation.
- the depth values can be obtained from the depth image corresponding to the RGB image.
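For illustration, the three-dimensional coordinate calculation from the depth values can be sketched with the standard pinhole camera model. This is a sketch of a common approach, not the patent's exact computation; the intrinsic parameters `fx`, `fy`, `cx`, `cy` and the toy depth values are assumptions:

```python
import numpy as np

def backproject(us, vs, depth, fx, fy, cx, cy):
    """Back-project segment pixels (us, vs) with a depth image (meters)
    into 3D camera coordinates using the pinhole model:
    X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth[v, u]."""
    zs = depth[vs, us]              # depth at each segment pixel
    xs = (us - cx) * zs / fx
    ys = (vs - cy) * zs / fy
    return np.stack([xs, ys, zs], axis=1)

# Toy 4x4 depth image where every point is 2 m away; intrinsics are invented.
depth = np.full((4, 4), 2.0)
pts = backproject(np.array([0, 2]), np.array([1, 3]), depth,
                  fx=2.0, fy=2.0, cx=2.0, cy=2.0)
print(pts)  # one 3D point per segment pixel
```

Running this over every pixel of a segment yields the point group to which the Convex Hull calculation described later is applied.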
- Step S 84 Calculating 3D bounding boxes for each of the detected objects and/or humans according to the result of the segment detection.
- Step S 85 Determining a position of each of the detected objects and/or humans according to the 3D bounding boxes.
- Step S 86 Controlling the robot to perform predetermined tasks according to the determined positions of the detected objects and/or humans.
- the predetermined tasks correspond to the positions of the detected objects and humans.
- the robot can select pre-set tasks corresponding to the positions of the detected objects and humans according to the position distribution of the objects and humans in the recognized scene.
- the predetermined tasks may include bypassing obstacles, slow movement, interactions, and the like.
- the method shown in FIG. 8 can be implemented by other devices, such as a computer equipped with a depth camera.
- the computer may output the determined positions of the detected objects and/or humans to a user after step S 83 .
- FIG. 9 shows an exemplary flowchart of a method for a robot to recognize a human-object scene according to one embodiment.
- the method can be implemented to control the movement of the robot 10 shown in FIGS. 1 and 2 , and can be specifically implemented by the robot 10 shown in FIG. 2 or other control devices electrically coupled to the robot 10 .
- the method may include steps S 91 to S 98 .
- Step S 91 Setting an object of interest as a target object.
- a user may input the name, shape, contour, size and other data of objects through a robot or computer to define the objects of interest.
- One or more objects inputted by the user as the objects of interest serve as a basis for determining the human-object or object-object relationship.
- the chair is set as the target object, and it is determined whether the human is near the chair in each frame of the image.
- “being near” means that the one or more objects of interest are in contact with at least one surface of another object or human. When the one or more objects of interest is not in contact with any surfaces of the object or human, it is determined as “being not near.” In one embodiment, only when the target objects are present in the scene (if only one target object is defined, it is a person), can it be analyzed whether the target objects are near another object or human. A distance threshold can be preset as the criterion for “being near.”
- Step S 92 Acquiring an input RGB image and a depth image corresponding to the RGB image.
- the RGB-D three-dimensional sensor equipped on the robot 10 captures the scene image in front of the robot to obtain the RGB image and the depth image corresponding to the RGB image.
- Step S 93 Detecting objects and humans in the RGB image using a segmentation classification algorithm based on a sample database.
- the segmentation classification algorithm is to detect common objects and humans in the scene.
- a deep learning method (e.g., the Mask-RCNN algorithm) may be used to perform the detection.
- the algorithm detects objects and humans in the RGB image, and the result of the detection is a segmentation mask for each of the common objects and humans in the RGB image, together with the coordinates of the pixels of each of the common objects and humans. All of or a portion of the objects and humans in the image can be detected.
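As a minimal sketch of what this detection result looks like downstream, the per-instance pixel coordinates can be read off an integer label mask. The mask stands in for the output of a segmentation network such as Mask-RCNN, and the 4x4 values are invented:

```python
import numpy as np

# Hypothetical segmentation output: 0 is background, each positive integer
# labels one detected object or human.
label_mask = np.array([
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [2, 0, 0, 0],
    [2, 2, 0, 0],
])

def segment_pixels(mask):
    """Return {instance_id: (rows, cols)} pixel coordinates per instance."""
    return {int(i): np.nonzero(mask == i) for i in np.unique(mask) if i != 0}

segs = segment_pixels(label_mask)
print(sorted(segs))      # detected instance ids: [1, 2]
print(len(segs[1][0]))   # number of pixels belonging to instance 1: 4
```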
- Step S 94 In response to detection of no objects and humans, outputting the detection result.
- Step S 95 In response to detection of the objects and humans, performing a segment detection to each of the detected objects and/or humans based on the RGB image and the depth image and performing a 3D bounding box calculation to each of the detected objects and/or humans.
- the depth values of the pixels of each segment can be used for three-dimensional coordinate calculation.
- performing the segment detection to each of the detected objects and/or humans based on the RGB image and the depth image may include shrinking contours of objects and/or humans in each segment of the RGB image and the depth image inwardly using an erode algorithm, to acquire confident segments of the objects and/or humans in each segment of the RGB image and the depth image; and calculating 3D bounding boxes corresponding to the shrunk data using a Convex Hull algorithm to compensate for the volume of the objects and/or humans in each segment of the RGB image and the depth image.
- the contour pixels in each segment have the highest possibility of misclassification, such as the pixels between the person and the background segment in FIG. 4 .
- This method is to use the erode algorithm to inwardly shrink the contour of the detected objects/humans, and the shrinkage number is changed by defining the number of iterations. It is worth noting that the number of iterations is an adjustable parameter and can be different for different objects/humans. The shrinking leads to a reliable segmentation of the objects/humans. Then the Convex Hull algorithm is used to calculate the 3D bounding boxes corresponding to the shrunk data.
- the values of the 3D bounding boxes, which are adjustable variables, are increased by a certain amount. This process is called compensation for the volume. It should be noted that the above-mentioned calculation is performed for each segmentation. Later, it will be determined whether to perform the merge operation based on the relative positions of the same objects/humans.
- the number of pixels to shrink along the contour of each segment and the volume value to be added are parameters that can be adjusted to achieve the best balance.
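The contour-shrinking step can be sketched as a plain binary erosion. A production implementation would more likely call `cv2.erode`; the 4-neighbour rule and the 5x5 mask below are illustrative assumptions:

```python
import numpy as np

def erode(mask, iterations=1):
    """Shrink a binary segment mask inward: a pixel survives an iteration only
    if it and its four neighbours are all foreground. `iterations` is the
    adjustable shrinkage parameter and may differ per object class."""
    m = mask.astype(bool)
    for _ in range(iterations):
        p = np.pad(m, 1, constant_values=False)
        m = (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
             & p[1:-1, :-2] & p[1:-1, 2:])
    return m

mask = np.ones((5, 5), dtype=bool)
print(erode(mask, 1).sum())  # 9: only the inner 3x3 survives one iteration
print(erode(mask, 2).sum())  # 1: only the centre pixel survives two
```

The likely misclassified contour pixels are discarded this way, and the lost extent is restored later by the volume-compensation step.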
- the point group of each segment can be expressed using base frame X-, Y-, and Z-coordinates, where the X-Y plane is the ground in the real world, and Z- is for height.
- a Convex Hull calculation is applied for the point group of each segment.
- the Convex Hull calculation is to save the shape data of the target objects with the least data, and the target objects refer to the objects currently being analyzed.
- the Convex Hull calculation method specifically refers to a method based on the coordinates of the vertices of the outermost contour of the objects.
- the Convex Hull can calculate whether each point is contained in the closed graph formed by the rest of the points. If it is contained in the closed graph, the point will be discarded. If it is not contained in the closed graph, the point will be used as a new contribution point to form a closed graph, until no point can be surrounded by the closed graph formed by the rest of the points.
- the Convex Hull only applies to the coordinates of each point group projected onto the X-Y plane, and for the Z-values, only the minimum/maximum values are needed. Instead of using the thousands of points initially in the point group of each segment, 30 points may be extracted as the Convex Hull points, which preserve all useful information for the 3D bounding box calculation.
- the useful information here refers to the coordinates, the shape, size and pose of the objects/humans being processed.
- the convex hull points are the output result of the convex hull algorithm.
- the projection of these convex hull points on the ground plane is the vertices of the outer boundary of the projection of the objects/humans on the ground plane.
- the heights of the convex hull points are the height values of the upper and lower planes of the objects/humans, and the upper surface height or the lower surface height is randomly selected here.
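The hull step above can be sketched with Andrew's monotone-chain algorithm, one common convex-hull method (the patent does not name a specific implementation; the sample points are invented):

```python
def convex_hull_xy(points):
    """Convex hull of (x, y) tuples via Andrew's monotone chain;
    returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

# Project a segment's 3D points onto the X-Y (ground) plane; keep only the
# minimum/maximum Z, as described above. The interior point is discarded.
segment = [(0, 0, 0.1), (2, 0, 0.2), (2, 2, 1.5), (0, 2, 0.3), (1, 1, 0.9)]
hull = convex_hull_xy([(x, y) for x, y, _ in segment])
z_min = min(z for _, _, z in segment)
z_max = max(z for _, _, z in segment)
print(len(hull), z_min, z_max)  # 4 0.1 1.5
```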
- the method of detecting a target human is the same as the method of detecting a target object described above, and the target human refers to the human currently being analyzed.
- a three-dimensional position/orientation with a minimum-volume bounding box can be generated for each analyzed object/human in the scene in the RGB image.
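One standard way to obtain a minimum-footprint oriented box from the hull points is the rotating-edge search: the optimal rectangle is aligned with some hull edge, so each edge direction is tried in turn. This is offered as an illustrative sketch, since the patent does not specify its exact minimization:

```python
import math

def min_area_rect(hull):
    """Minimum-area oriented bounding rectangle of 2D hull vertices.
    The optimal rectangle shares an edge direction with the hull, so it
    suffices to try the direction of each hull edge."""
    best = None
    n = len(hull)
    for i in range(n):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % n]
        theta = math.atan2(y2 - y1, x2 - x1)
        c, s = math.cos(-theta), math.sin(-theta)
        xs = [c * x - s * y for x, y in hull]   # hull rotated edge-aligned
        ys = [s * x + c * y for x, y in hull]
        area = (max(xs) - min(xs)) * (max(ys) - min(ys))
        if best is None or area < best[0]:
            best = (area, theta)
    return best  # (footprint area, orientation angle of the box)

# A unit square rotated by 45 degrees; the minimum-area box recovers area ~1.
sq45 = [(0.0, 0.0), (0.7071, 0.7071), (0.0, 1.4142), (-0.7071, 0.7071)]
area, theta = min_area_rect(sq45)
print(round(area, 3))  # 1.0
```

Combined with the minimum/maximum Z values, this yields the three-dimensional position/orientation with a minimum-volume bounding box described above.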
- Step S 96 Determining whether two or more segments of a same object category need to be merged as one of the objects or humans.
- One object/human may include multiple discontinuous segments due to occlusion. Therefore, it is necessary to determine whether two or more segments are a portion of one object/human.
- the segments of the same object category may be multiple segments of the same object.
- the table in FIG. 7 is separated into three segments 1 , 2 , and 3 . Therefore, before generating the final 3D bounding boxes, an additional step is performed to check whether the two or more segments need to be merged into one object/human. The calculation is based on the three-dimensional positions, directions and sizes of the bounding boxes of each segment. A tolerance threshold distance is also set as an adjustable parameter for best performance.
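The merge decision can be sketched as a distance test between the per-segment 3D boxes. The axis-aligned gap measure and the 0.05 m tolerance below are illustrative assumptions:

```python
def gap(b1, b2):
    """Largest per-axis separation between two 3D boxes given as
    ((xmin, ymin, zmin), (xmax, ymax, zmax)); 0.0 if they touch or overlap."""
    d = 0.0
    for a in range(3):
        d = max(d, b1[0][a] - b2[1][a], b2[0][a] - b1[1][a])
    return d

def should_merge(b1, b2, tolerance=0.05):
    """Merge two same-category segments (e.g. table parts split by occlusion)
    when their boxes are within the tolerance distance (adjustable, meters)."""
    return gap(b1, b2) <= tolerance

left  = ((0.00, 0.0, 0.0), (0.50, 1.0, 0.8))  # one visible part of a table
right = ((0.53, 0.0, 0.0), (1.00, 1.0, 0.8))  # another part, ~3 cm away
print(should_merge(left, right))                               # True
print(should_merge(left, ((2.0, 0.0, 0.0), (3.0, 1.0, 0.8))))  # False
```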
- Step S 97 Outputting each detected objects and/or humans with corresponding classification names, and 3D bounding boxes of the detected objects and/or humans.
- after Step S 96 , the information of the 3D bounding boxes of each object/person is generated and set as independent output, which can be directly used for automatic robot target position setup and/or collision avoidance during navigation when needed.
- Step S 98 Determining whether the detected objects in the RGB image comprise the target object according to 3D bounding boxes; in response to detection of the target object, acquiring three-dimensional position and orientation with minimum-volume 3D bounding boxes of the detected objects and/or humans and the detected target object; determining the positional relationship between the one or more objects or humans and the objects of interest according to the three-dimensional position and orientation, and determining a predetermined task according to the positional relationship.
- determining the positional relationship between the one or more detected objects and/or humans and the detected target object according to the three-dimensional position and orientation may include determining whether the one or more of the detected objects and/or humans are near the detected target object by performing a stereo based calculation based on the information of the 3D bounding boxes of the detected target object and the one or more of the detected objects and/or humans.
- step S 98 may include the following steps.
- Step S 981 Comparing positions of first 2D bounding boxes formed by projection of the 3D bounding boxes of the detected objects or humans on a supporting surface (e.g., floor, ground, etc.), with positions of second 2D bounding boxes formed by projection of the 3D bounding boxes of the target object on the supporting surface.
- the objects or humans other than the target object are compared with the target object to determine the positional relationship between the objects or humans and the target object.
- the position relationship includes “near” and “not near”.
- Step S 982 In response to the positions of the first 2D bounding boxes partly overlapping the positions of the second 2D bounding boxes, determining that the one or more of the detected objects and/or humans are near the detected target object.
- Step S 983 In response to the positions of the first 2D bounding boxes not overlapping the positions of the second 2D bounding boxes, determining whether the positions of the first 2D bounding boxes overlap the positions of the second 2D bounding boxes after the first 2D bounding boxes and the second 2D bounding boxes are rotated.
- Step S 984 In response to the positions of the first 2D bounding boxes overlapping the positions of the second 2D bounding boxes after the first 2D bounding boxes and the second 2D bounding boxes are rotated, determining that the one or more of the detected objects and/or humans are near the detected target object.
- Step S 985 In response to the positions of the first 2D bounding boxes not overlapping the positions of the second 2D bounding boxes after the first 2D bounding boxes and the second 2D bounding boxes are rotated, determining whether a shortest distance between the positions of the first 2D bounding boxes and the second 2D bounding boxes is less than a variable threshold.
- the variable threshold is variable for each target object.
- Step S 986 In response to the shortest distance between the positions of the first 2D bounding boxes and the second 2D bounding boxes being less than the variable threshold, determining that the one or more of the detected objects and/or humans are near the detected target object.
- Step S 987 In response to the shortest distance between the positions of the first 2D bounding boxes and the second 2D bounding boxes being greater than the variable threshold, determining that the one or more of the detected objects and/or humans are not near the detected target object.
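Steps S 981 through S 987 can be sketched with axis-aligned ground-plane footprints. The rotation check of steps S 983 and S 984 is omitted for brevity, and the 0.3 m threshold is an assumed per-object value:

```python
def overlap_2d(r1, r2):
    """Do two axis-aligned ground rectangles ((xmin, ymin), (xmax, ymax))
    overlap? (Analogue of step S982.)"""
    return (r1[0][0] < r2[1][0] and r2[0][0] < r1[1][0] and
            r1[0][1] < r2[1][1] and r2[0][1] < r1[1][1])

def shortest_distance(r1, r2):
    """Shortest distance between two rectangles; 0.0 when they overlap."""
    dx = max(r1[0][0] - r2[1][0], r2[0][0] - r1[1][0], 0.0)
    dy = max(r1[0][1] - r2[1][1], r2[0][1] - r1[1][1], 0.0)
    return (dx * dx + dy * dy) ** 0.5

def is_near(person, target, threshold=0.3):
    """Near if the footprints overlap, otherwise near if the shortest
    gap is below the variable threshold (analogue of steps S985-S987)."""
    return overlap_2d(person, target) or shortest_distance(person, target) < threshold

chair  = ((0.0, 0.0), (0.6, 0.6))   # target object's projected 2D box
person = ((0.8, 0.0), (1.2, 0.6))   # detected human, 0.2 m from the chair
print(is_near(person, chair))                    # True: 0.2 < 0.3
print(is_near(((3.0, 3.0), (3.4, 3.6)), chair))  # False
```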
- the method according to the aforementioned embodiments can provide scene understanding information based on the relationship between the robot and the objects/humans in the RGB image.
- the scene understanding information may include positional relationship between the target object and other detected objects and/or humans, which serves as a basis for the next operation to be performed. This can be critical in various daily situations when human reaches a target object, the robot would be able to react quickly and perform the assistance accordingly. For example, when an old person sits on the chair, a robot would detect this scene and approach the person and provide water/food/other assistance as needed.
- the method according to the aforementioned embodiments has advantages as follows.
- the positions and directions of objects and humans in three-dimensional space are detected; the position of every custom input object can be determined, and the direction can be determined according to its presence in the current scene.
- This can further be used for robot target position setup as well as occlusion avoidance during navigation.
- the position and orientation can be dynamically updated based on position change of the robot.
- Shrinking contour and compensating for volume are introduced to remove misclassification values.
- the Convex Hull is used for the minimum memory/CPU cost while persisting all useful information.
- the stereo-based calculation method is introduced to merge occlusion-caused segmentation pieces into one object.
- the semantic scene understanding system is developed and allows a user to set target objects. The system is easy to apply to any scenes or objects of interest.
- the method according to the aforementioned embodiments can be used for object stereo information calculation, finding target objects in current scene, and scene understanding of human-object and object-object relationship.
- the RGBD camera sensor is economical and can be arranged at various positions on the robot with different quaternion angles. With knowledge of the camera mounting height and quaternion values, the relative position/orientation angle of each object near the robot and the relationships between the objects can be generated.
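To illustrate the last point, here is a minimal sketch (my own helper names, not the disclosure's implementation) of expressing a camera-frame point in the robot base frame using the mounting quaternion and mounting height:

```python
# Sketch only: quaternion given as (w, x, y, z), assumed normalized.

def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def quat_rotate(q, v):
    """Rotate vector v by unit quaternion q = (w, x, y, z)."""
    w, qv = q[0], q[1:]
    t = tuple(2.0 * c for c in _cross(qv, v))
    tv = _cross(qv, t)
    return tuple(v[i] + w * t[i] + tv[i] for i in range(3))

def camera_to_base(point_cam, mount_quat, mount_height):
    """Express a camera-frame point in the robot base frame: rotate by the
    camera mounting quaternion, then offset by the mounting height."""
    x, y, z = quat_rotate(mount_quat, point_cam)
    return (x, y, z + mount_height)
```

With this, every back-projected segment point can be expressed in the base frame whose X-Y plane is the ground, as assumed in the bounding-box calculation.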
- FIG. 12 is a schematic block diagram of a human-object recognizing device according to one embodiment.
- the human-object recognizing device may include, but is not limited to: cellular phones, smart phones, other wireless communication devices, personal digital assistants, audio players, other media players, music recorders, video recorders, cameras, other media recorders, radios, vehicle transportation equipment, laptop computers, desktop computers, netbook computers, Personal Digital Assistants (PDA), Portable Multimedia Players (PMP), Moving Picture Experts Group (MPEG-1 or MPEG-2) Audio Layer 3 (MP3) players, portable gaming devices (such as Nintendo DS™, PlayStation Portable™, Gameboy Advance™, iPhone™), portable Internet devices, data storage devices, smart wearable devices (for example, head-mounted devices (HMD) such as smart glasses, smart clothes, smart bracelets, smart necklaces, or smart watches), digital cameras, and combinations thereof.
- the device can be installed on the robot, or it can be the robot itself. In some cases, the device can perform multiple functions, such as playing music, displaying videos, storing pictures, and the like.
- the device may include a processor 110 , a storage 111 and one or more executable computer programs 112 that are stored in the storage 111 and executable by the processor 110 .
- When the processor 110 executes the computer programs 112 , the steps in the embodiments of the method for controlling the robot 10 , such as steps S 81 to S 86 in FIG. 8 , are implemented.
- the one or more computer programs 112 may be divided into one or more modules/units, and the one or more modules/units are stored in the storage 111 and executable by the processor 110 .
- the one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the one or more computer programs 112 in the device.
- the one or more computer programs 112 may be divided into an acquiring unit, a detecting unit, a recognition unit and a control unit.
- the acquiring unit is configured to acquire an input RGB image and a depth image corresponding to the RGB image.
- the detecting unit is configured to detect objects and humans in the RGB image using a segmentation classification algorithm based on a sample database.
- the recognizing unit is configured to, in response to detection of objects and/or humans, determine a position of each of the detected objects and/or humans by performing a segment detection to each of the detected objects and/or humans based on the RGB image and the depth image and performing a 3D bounding box calculation to each of the detected objects and/or humans.
- the control unit is configured to control the robot to perform predetermined tasks according to the determined positions of the detected objects and/or humans.
- FIG. 12 is only an example of the device 11 , and does not constitute a limitation on the device 11 . In practical applications, it may include more or fewer components, or a combination of certain components, or different components.
- the device 11 may also include: input/output devices (such as keyboards, microphones, cameras, speakers, display screens, etc.), network access equipment, buses, sensors, etc.
- the processor 110 may be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a programmable logic device, a discrete gate, a transistor logic device, or a discrete hardware component.
- the general-purpose processor may be a microprocessor or any conventional processor or the like.
- the storage 111 may be an internal storage unit, such as a hard disk or a memory.
- the storage 111 may also be an external storage device, such as a plug-in hard disk, a smart memory card (SMC), and a secure digital (SD) card, or any suitable flash cards.
- the storage 111 may also include both an internal storage unit and an external storage device.
- the storage 111 is used to store computer programs, other programs, and data required by the robot.
- the storage 111 can also be used to temporarily store data that have been output or is about to be output.
- a non-transitory computer-readable storage medium is provided.
- the non-transitory computer-readable storage medium may be configured in the robot 10 shown in FIG. 1 or in the device shown in FIG. 12 .
- the non-transitory computer-readable storage medium stores executable computer programs, and when the programs are executed by the one or more processors of the robot 10 , the human-object scene recognition method described in the embodiments above is implemented.
- the division of the above-mentioned functional units and modules is merely an example for illustration.
- the above-mentioned functions may be allocated to be performed by different functional units according to requirements, that is, the internal structure of the device may be divided into different functional units and modules to complete all or part of the above-mentioned functions.
- the functional units and modules in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
- the above-mentioned integrated unit may be implemented in the form of hardware or in the form of software functional unit.
- each functional unit and module is merely for the convenience of distinguishing each other and are not intended to limit the scope of protection of the present disclosure.
- For the specific operation process of the units and modules in the above-mentioned system, reference may be made to the corresponding processes in the above-mentioned method embodiments, which are not repeated herein.
- a non-transitory computer-readable storage medium may be configured in the robot 10 or the mobile robot control device as described above.
- the non-transitory computer-readable storage medium may be the storage unit configured in the main control chip and the data acquisition chip in the foregoing embodiments.
- One or more computer programs are stored on the non-transitory computer-readable storage medium, and when the computer programs are executed by one or more processors, the robot control method described in the embodiment above is implemented.
- the disclosed apparatus (device)/terminal device and method may be implemented in other manners.
- the above-mentioned apparatus (device)/terminal device embodiment is merely exemplary.
- the division of modules or units is merely a logical functional division, and other division manners may be used in actual implementations; that is, multiple units or components may be combined or integrated into another system, or some of the features may be ignored or not performed.
- the shown or discussed mutual coupling may be direct coupling or communication connection, and may also be indirect coupling or communication connection through some interfaces, devices or units, and may also be electrical, mechanical or other forms.
- the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual requirements to achieve the objectives of the solutions of the embodiments.
- the functional units and modules in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
- the above-mentioned integrated unit may be implemented in the form of hardware or in the form of software functional unit.
- When the integrated module/unit is implemented in the form of a software functional unit and is sold or used as an independent product, the integrated module/unit may be stored in a non-transitory computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above-mentioned embodiments of the present disclosure may also be implemented by instructing relevant hardware through a computer program.
- the computer program may be stored in a non-transitory computer-readable storage medium, which may implement the steps of each of the above-mentioned method embodiments when executed by a processor.
- the computer program includes computer program codes, which may be in the form of source code, object code, executable files, certain intermediate forms, and the like.
- the computer-readable medium may include any entity or device capable of carrying the computer program codes, such as a recording medium, a USB flash drive, a portable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random-access memory (RAM), electric carrier signals, telecommunication signals, and software distribution media.
- the content included in the computer-readable medium could be appropriately increased or decreased according to the requirements of legislation and patent practice in the relevant jurisdiction.
- For example, in some jurisdictions, according to legislation and patent practice, a computer-readable medium does not include electric carrier signals and telecommunication signals.
Abstract
Description
- The present disclosure generally relates to field of object recognition, and particularly to a human-object scene recognition method, device and computer-readable storage medium.
- Scene understanding is a deeper level of object detection, recognition and reasoning based on image analysis. On the basis of image understanding, image data is processed to obtain an understanding of the content of the scene reflected in the image.
- Conventional image resource utilization typically analyzes low-level visual features, such as color, shape, and texture. However, low-level visual features only represent visual information. With the semantic information contained in the image content ignored, the positioning errors of objects and/or humans are large, and there is a deviation in the understanding of the scene in the images.
- Therefore, there is a need to provide a method and a device to overcome the above-mentioned problem.
- Many aspects of the present embodiments can be better understood with reference to the following drawings. The components in the drawings are not necessarily drawn to scale, the emphasis instead being placed upon clearly illustrating the principles of the present embodiments. Moreover, in the drawings, all the views are schematic, and like reference numerals designate corresponding parts throughout the several views.
-
FIG. 1 is a schematic diagram of a robot according to one embodiment. -
FIG. 2 is a schematic block diagram of the robot according to one embodiment. -
FIG. 3 shows an image of an exemplary scene including a person standing away from a chair. -
FIG. 4 shows an image of an exemplary scene including a person sitting on a chair. -
FIG. 5 shows an image of an exemplary scene including a bed and a chair standing away from the bed. -
FIG. 6 shows an image of an exemplary scene including a bed and a nightstand near the bed. -
FIG. 7 shows an image of an exemplary scene including a table and two chairs. -
FIG. 8 is an exemplary flowchart of a human-object scene recognition method according to one embodiment. -
FIG. 9 is an exemplary flowchart of a human-object scene recognition method according to another embodiment. -
FIG. 10 is an exemplary flowchart of step S98 of the method of FIG. 9 . -
FIG. 11 is a processing logic flowchart of computer programs in a method for a robot to recognize a human-object scene. -
FIG. 12 is a schematic block diagram of a human-object recognition device according to one embodiment. - The disclosure is illustrated by way of example and not by way of limitation in the figures of the accompanying drawings, in which like reference numerals indicate similar elements. It should be noted that references to "an" or "one" embodiment in this disclosure are not necessarily to the same embodiment, and such references can mean "at least one" embodiment.
- Although the features and elements of the present disclosure are described as embodiments in particular combinations, each feature or element can be used alone or in other various combinations within the principles of the present disclosure to the full extent indicated by the broad general meaning of the terms in which the appended claims are expressed.
-
FIG. 1 is a schematic diagram of a robot 10 according to one embodiment. FIG. 2 is a schematic block diagram of the robot 10 according to one embodiment. The robot 10 may be a mobile robot (e.g., a wheeled robot). The robot 10 can operate in various application environments, such as hospitals, factories, warehouses, malls, streets, airports, homes, elder care centers, museums, restaurants, hotels, and even wild fields, etc. However, the example of FIG. 1 is merely an illustrative example. The robot 10 may be other types of robots. - In one embodiment, the
robot 10 may include a camera 101, an actuator 102, a mobility mechanism 103, a processor 104, a storage 105, and a communication interface module 106. The camera 101 may be, for example, an RGB-D three-dimensional sensor arranged on the body of the robot 10. The camera 101 is electrically connected to the processor 104 for transmitting the captured image data to the processor 104. The actuator 102 may be a motor or a servo. The mobility mechanism 103 may include one or more wheels and/or tracks, and wheels are illustrated in FIG. 1 as an example. The actuator 102 is electrically coupled to the mobility mechanism 103 and the processor 104, and can actuate movement of the mobility mechanism 103 according to commands from the processor 104. - The
storage 105 may include a non-transitory computer-readable storage medium. One or more executable computer programs 107 are stored in the storage 105. The processor 104 is electrically connected to the storage 105, and performs corresponding operations by executing the executable computer programs stored in the storage 105. The communication interface module 106 may include a wireless transmitter, a wireless receiver, and computer programs executable by the processor 104. The communication interface module 106 is electrically connected to the processor 104 and is configured for communication between the processor 104 and external devices. In one embodiment, the camera 101, the actuator 102, the mobility mechanism 103, the processor 104, the storage 105, and the communication interface module 106 may be connected to one another by a bus. - When the
processor 104 executes the computer programs 107, the steps in the embodiments of the method for controlling the robot 10, such as steps S81 through S86 in FIG. 8 , steps S91 through S98 in FIG. 9 , and steps S981 through S987 in FIG. 10 , are implemented. - The
processor 104 may be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a programmable logic device, a discrete gate, a transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor or any conventional processor or the like. - The
storage 105 may be an internal storage unit of the robot 10, such as a hard disk or a memory. The storage 105 may also be an external storage device of the robot 10, such as a plug-in hard disk, a smart memory card (SMC), a secure digital (SD) card, or any suitable flash cards. Furthermore, the storage 105 may also include both an internal storage unit and an external storage device. The storage 105 is used to store computer programs, other programs, and data required by the robot. The storage 105 can also be used to temporarily store data that have been output or are about to be output. - Exemplarily, the one or
more computer programs 107 may be divided into one or more modules/units, and the one or more modules/units are stored in the storage 105 and executable by the processor 104. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the one or more computer programs 107 in the robot 10. For example, the one or more computer programs 107 may be divided into an acquiring unit, a detecting unit, a recognizing unit and a control unit. The acquiring unit is configured to acquire an input RGB image and a depth image corresponding to the RGB image. The detecting unit is configured to detect objects and humans in the RGB image using a segmentation classification algorithm based on a sample database. The recognizing unit is configured to, in response to detection of objects and/or humans, determine a position of each of the detected objects and/or humans by performing a segment detection to each of the detected objects and/or humans based on the RGB image and the depth image and performing a 3D bounding box calculation to each of the detected objects and/or humans. The control unit is configured to control the robot to perform predetermined tasks according to the determined positions of the detected objects and/or humans. - A method for a robot to recognize a human-object scene according to embodiments of the present disclosure allows a robot to automatically set a target position and navigate while avoiding collisions. In addition, the method can also provide application scenarios such as whether a target object is in the scene, the position of the target object, and semantic information about whether a human or other humans are near the target object. Specifically, referring to
FIG. 11 , an RGB image and a corresponding depth image are inputted. The RGB image first goes through a segmentation classification algorithm for the detection of common objects and humans in the scene. Before the final 3D bounding boxes are generated, the method detects whether separate segments should be merged as one object. The final information of the 3D bounding boxes of each detected object/human is generated and set as independent output, which can be directly used for robot target position setup and/or collision avoidance in the navigation process as needed. Customer-assigned object(s) of interest can be taken as the target object(s) for the human-object or object-object relationship calculation. The analysis of whether a detected object/human is near the target object(s) can only be performed when the target object(s) (and a person, if only one target object is defined) are present in the scene. With the information of the 3D bounding boxes of the target object(s) and a person, a stereo-based calculation step is performed for the "near" check. An output of whether the person is near the target object(s), or whether two or more target objects are near each other, is generated. With the help of this human/object-environment interaction information, a guide for the robot-human-environment interaction can be achieved. - The representative results of the understanding of the human-object relationship scene are shown in
FIGS. 3 and 4 . Specifically, FIGS. 3 and 4 show images containing a person and a chair, which are taken by the camera 101 of the robot. In FIG. 3 , the person is standing away from the chair, and in FIG. 4 , the person is standing behind the chair. In each image, the upper left corner shows the recognition results of a target object (i.e., the chair) present in the scene, and information about whether the person is near the target object. The representative results of the understanding of the object-object relationship scene are shown in FIGS. 5-7 . Specifically, FIG. 5 shows a chair away from a bed, FIG. 6 shows a nightstand in contact with the bed, and FIG. 7 shows two chairs near the table. In each image, the upper left corner shows the recognition results of target objects and information about whether the target objects are near each other. - In one embodiment, the recognized humans/objects in each image are surrounded by 3D bounding boxes. For example, the recognized human and chair in
FIGS. 3 and 5-7 are surrounded by 3D bounding boxes, while the boxes shown in FIG. 4 are only for representation purposes. - The robot captures images through the
camera 101 while moving, and sends the captured images to the processor 104. The processor 104 processes the captured images by executing the executable computer programs 107 to complete the recognition of the human-object scene. Specifically, the processing process is as follows: acquiring an input RGB image and a depth image corresponding to the RGB image; detecting objects and humans in the RGB image using a segmentation classification algorithm based on a sample database; and in response to detection of objects and/or humans, determining a position of each of the detected objects and/or humans by performing a segment detection to each of the detected objects and/or humans based on the RGB image and the depth image and performing a 3D bounding box calculation to each of the detected objects and/or humans. -
FIG. 8 shows an exemplary flowchart of a method for recognizing a human-object scene according to one embodiment. The method can be implemented to control the movement of the robot 10 shown in FIGS. 1 and 2 , and can be specifically implemented by the robot 10 shown in FIG. 2 or other control devices electrically coupled to the robot 10. The control devices may include, but are not limited to: desktop computers, tablet computers, laptop computers, multimedia players, servers, smart mobile devices (such as smart phones, handheld phones, etc.) and smart wearable devices (such as smart watches, smart glasses, smart cameras, smart bands, etc.) and other computing devices with computing and control capabilities. In one embodiment, the method may include steps S81 to S86.
- In one embodiment, the RGB-D three-dimensional sensor equipped on the
robot 10 captures the scene image in front of the robot to obtain the RGB image and the depth image corresponding to the RGB image. - Step S82: Detecting objects and humans in the RGB image using a segmentation classification algorithm based on a sample database.
- In one embodiment, the segmentation detection of the image is to detect the objects and humans in the input single RGB image by using a deep learning method. It should be noted that there may be only objects in the RGB image, only humans in the RGB image, or humans and objects in the RGB image. In one embodiment, the objects and humans refer to common objects and humans that are objects and humans in the ordinary sense and do not specifically refer to certain persons or certain objects. The image characteristics of various common objects and humans that may appear in each scene are pre-stored, which can serve as a basis for determining the characteristics of common objects and humans in image detection.
- Step S83: In response to detection of objects and/or humans, performing a segment detection to each of the detected objects and/or humans based on the RGB image and the depth image, and acquiring a result of the segment detection.
- For each segment of the detected objects and/or humans, with the camera parameters taken into consideration, the depth values of the pixels of the segment can be used for three-dimensional coordinate calculation. The depth values can be obtained from the depth image corresponding to the RGB image.
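The per-pixel calculation can be sketched with the standard pinhole camera model. The intrinsic parameter names fx, fy, cx, cy below are generic camera parameters assumed for illustration, not values given in the disclosure:

```python
def pixel_to_camera_3d(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with its depth value into 3D camera
    coordinates using the pinhole model; depth comes from the depth image."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def segment_to_points(pixels, depth_image, fx, fy, cx, cy):
    """Back-project every pixel (u, v) of one detected segment into a
    3D point group, reading depths from the aligned depth image."""
    return [pixel_to_camera_3d(u, v, depth_image[v][u], fx, fy, cx, cy)
            for (u, v) in pixels]
```

The resulting point group per segment is what the later Convex Hull and 3D bounding box steps operate on.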
- Step S84: Calculating 3D bounding boxes for each of the detected objects and/or humans according to the result of the segment detection.
- Step S85: Determining a position of each of the detected objects and/or humans according to the 3D bounding boxes.
- Step S86: Controlling the robot to perform predetermined tasks according to the determined positions of the detected objects and/or humans.
- The predetermined tasks correspond to the positions of the detected objects and humans. The robot can select pre-set tasks corresponding to the positions of the detected objects and humans according to the position distribution of the objects and humans in the recognized scene. The predetermined tasks may include bypassing obstacles, slow movement, interactions, and the like.
- It should be noted that the method shown in
FIG. 8 can be implemented by other devices, such as a computer equipped with a depth camera. In this case, the computer may output the determined positions of the detected objects and/or humans to a user after step S83. -
FIG. 9 shows an exemplary flowchart of a method for a robot to recognize a human-object scene according to one embodiment. The method can be implemented to control the movement of the robot 10 shown in FIGS. 1 and 2 , and can be specifically implemented by the robot 10 shown in FIG. 2 or other control devices electrically coupled to the robot 10. - In one embodiment, the method may include steps S91 to S98.
- Step S91: Setting an object of interest as a target object.
- In one embodiment, a user may input the name, shape, contour, size and other data of objects through a robot or computer to define the objects of interest. One or more objects inputted by the user as the objects of interest serve as a basis for determining the human-object or object-object relationship. As shown in
FIG. 3 , the chair is set as the target object, and it is determined whether the human is near the chair in each frame of the image. - In one embodiment, "being near" means that the one or more objects of interest are in contact with at least one surface of another object or human. When the one or more objects of interest are not in contact with any surface of the object or human, it is determined as "being not near." In one embodiment, only when the target objects (and a person, if only one target object is defined) are present in the scene can it be analyzed whether the target objects are near another object or human. A distance threshold can be preset as the criterion for "being near."
- Step S92: Acquiring an input RGB image and a depth image corresponding to the RGB image.
- In one embodiment, the RGB-D three-dimensional sensor equipped on the
robot 10 captures the scene image in front of the robot to obtain the RGB image and the depth image corresponding to the RGB image. - Step S93: Detecting objects and humans in the RGB image using a segmentation classification algorithm based on a sample database.
- In one embodiment, the segmentation classification algorithm is to detect common objects and humans in the scene. A deep learning method (e.g., the Mask-RCNN algorithm) can be used to perform the segmentation detection of the image. The algorithm detects objects and humans in the RGB image, and the result of the detection is a segmentation mask for the common objects and humans in the RGB image, together with the coordinates of the pixels of each of the common objects and humans. All of or a portion of the objects and humans in the image can be detected.
- Step S94: In response to detection of no objects and humans, outputting the detection result.
- Step S95: In response to detection of the objects and humans, performing a segment detection to each of the detected objects and/or humans based on the RGB image and the depth image and performing a 3D bounding box calculation to each of the detected objects and/or humans.
- For each segment of the detected objects and/or humans, with the camera parameters taken into consideration, the depth values of the pixels of the segment can be used for three-dimensional coordinate calculation.
- In one embodiment, performing the segment detection to each of the detected objects and/or humans based on the RGB image and the depth image may include shrinking the contours of the objects and/or humans in each segment of the RGB image and the depth image inwardly using an erode algorithm, to acquire confident segments of the objects and/or humans in each segment of the RGB image and the depth image; and calculating 3D bounding boxes corresponding to the shrunk data using a Convex Hull algorithm to compensate for the volume of the objects and/or humans in each segment of the RGB image and the depth image.
- The contour pixels in each segment have the highest possibility of misclassification, such as the pixels between the person and the background segment in
FIG. 4 . In order to eliminate this misclassification problem and improve robustness, a method is required to shrink the contour of the segment and compensate for the volume. This method uses the erode algorithm to inwardly shrink the contour of the detected objects/humans, and the amount of shrinkage is changed by defining the number of iterations. It is worth noting that the number of iterations is an adjustable parameter and can be different for different objects/humans. The shrinking leads to a reliable segmentation of the objects/humans. Then the Convex Hull algorithm is used to calculate the 3D bounding boxes corresponding to the shrunk data. The values of the 3D bounding boxes, which are adjustable variables, are increased by a certain amount. This process is called compensation for the volume. It should be noted that the above-mentioned calculation is performed for each segmentation. Later, it will be determined whether to perform the merge operation based on the relative positions of the same objects/humans. - The pixels that shrink along the contour of the line segment and the volume value to be added are parameters that can be adjusted to achieve the best balance. Specifically, based on the camera mounting height and quaternion information, the point group of each segment can be expressed using base frame X-, Y-, and Z-coordinates, where the X-Y plane is the ground in the real world and Z is the height. With the assumption that all objects (especially furniture) and humans discussed here are dynamically stable in the base frame, all 3D bounding boxes discussed later have at least one plane parallel to the X-Y plane.
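A minimal pure-Python sketch of the shrink-and-compensate idea follows. The disclosure does not give an implementation (in practice OpenCV's erode would likely be used); the 4-neighborhood and the uniform margin below are illustrative choices of mine:

```python
def erode(mask, iterations=1):
    """Shrink a binary segmentation mask inward: a pixel survives one
    iteration only if it and all four neighbors are set. The iteration
    count is the adjustable shrinkage parameter mentioned above."""
    h, w = len(mask), len(mask[0])
    for _ in range(iterations):
        out = [[0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                if mask[i][j] and all(
                        0 <= i + di < h and 0 <= j + dj < w and mask[i + di][j + dj]
                        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))):
                    out[i][j] = 1
        mask = out
    return mask

def compensate_volume(box, margin):
    """Grow a 3D bounding box (xmin, ymin, zmin, xmax, ymax, zmax) outward
    by an adjustable margin to compensate for the shrunk contour."""
    xmin, ymin, zmin, xmax, ymax, zmax = box
    return (xmin - margin, ymin - margin, zmin - margin,
            xmax + margin, ymax + margin, zmax + margin)
```

The erode step drops the unreliable contour pixels; the compensation step restores the lost volume on the final box rather than on the mask.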
- To maintain the minimum memory/CPU cost of each calculation step, a Convex Hull calculation is applied to the point group of each segment. The Convex Hull calculation saves the shape data of the target objects with the least data, where the target objects refer to the objects currently being analyzed. The Convex Hull calculation method is specifically a method based on the coordinates of the vertices of the outermost contour of the objects. At the algorithm level, the Convex Hull calculates whether each point is contained in the closed graph formed by the rest of the points. If it is contained in the closed graph, the point is discarded. If it is not contained in the closed graph, the point is used as a new contribution point to form a closed graph, until no point can be surrounded by the closed graph formed by the rest of the points.
- It should be noted that the Convex Hull is applied only to the coordinates of each point group projected onto the X-Y plane; for the Z-values, only the minimum/maximum values are needed. Instead of the thousands of points initially in the point group of each segment, 30 points may be extracted as the Convex Hull points, which preserve all useful information for the 3D bounding box calculation. The useful information here refers to the coordinates, shape, size, and pose of the objects/humans being processed. The Convex Hull points are the output of the Convex Hull algorithm. The projection of these Convex Hull points on the ground plane gives the vertices of the outer boundary of the projection of the objects/humans on the ground plane. The heights of the Convex Hull points are the height values of the upper and lower planes of the objects/humans, and either the upper surface height or the lower surface height is selected here.
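The reduced representation above (footprint hull plus Z extremes) can be sketched in a few lines; the algorithm choice (Andrew's monotone chain) and the function names are illustrative assumptions, not taken from the disclosure:

```python
def convex_hull_xy(points):
    """Andrew's monotone chain over the X-Y projection of a 3D point group.
    Returns the hull vertices in counter-clockwise order."""
    pts = sorted(set((x, y) for x, y, _ in points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def segment_summary(points):
    """Keep only what the 3D bounding-box step needs: the footprint hull
    vertices and the minimum/maximum height of the point group."""
    zs = [z for _, _, z in points]
    return convex_hull_xy(points), min(zs), max(zs)
```

Interior points of the projection are discarded, so thousands of depth points collapse to a handful of hull vertices plus two height values, matching the memory/CPU argument made above.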
- It should be noted that the method used in detecting a target human is the same as the method of detecting a target object described above, and the target human refers to the human currently being analyzed. Through the calculation above, a three-dimensional position/orientation with a minimum-volume bounding box can be generated for each analyzed object/human in the scene in the RGB image.
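The disclosure does not spell out how the minimum-volume box is computed; one common approach, assumed here for illustration, is to fit a minimum-area rectangle to the footprint hull (one rectangle side is always collinear with some hull edge for a convex polygon) and extrude it between the Z extremes:

```python
import math

def min_area_rect(hull):
    """Minimum-area enclosing rectangle of a convex footprint polygon.
    Tries each hull edge as the rectangle orientation and keeps the best.
    Returns (area, orientation_angle_radians)."""
    best_area, best_theta = float("inf"), 0.0
    n = len(hull)
    for i in range(n):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % n]
        theta = math.atan2(y2 - y1, x2 - x1)
        c, s = math.cos(-theta), math.sin(-theta)
        # Rotate the hull so the candidate edge is axis-aligned,
        # then measure the axis-aligned extent.
        xs = [c * x - s * y for x, y in hull]
        ys = [s * x + c * y for x, y in hull]
        area = (max(xs) - min(xs)) * (max(ys) - min(ys))
        if area < best_area:
            best_area, best_theta = area, theta
    return best_area, best_theta
```

The box volume would then be `area * (zmax - zmin)`, and the returned angle gives the yaw of the box about the Z axis, consistent with the assumption that one box face is parallel to the X-Y plane.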
- Step S96: Determining whether two or more segments of the same object category need to be merged as one of the objects or humans.
- In one embodiment, it is first determined whether the two or more segments are a portion of one of the objects or humans according to the three-dimensional positions, directions, sizes, and tolerance threshold distances of the 3D bounding boxes of the two or more segments. One object/human may comprise multiple discontinuous segments due to occlusion. Therefore, it is necessary to determine whether two or more segments are a portion of one object/human. In response to the two or more segments being a portion of the one of the objects or humans, the two or more segments are merged as the one of the objects or humans. In response to the two or more segments not being a portion of the one of the objects or humans, it is determined not to merge the two or more segments as one of the objects or humans.
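A minimal sketch of one possible merge test, under the simplifying assumption (not from the disclosure) of axis-aligned boxes: two same-category segments are merged when the per-axis gap between their 3D bounding boxes is within the tolerance threshold on all three axes. The box layout and function names are illustrative:

```python
def boxes_within_tolerance(box_a, box_b, tol):
    """box = (xmin, ymin, zmin, xmax, ymax, zmax). Returns True when the
    gap between the two boxes is within `tol` on all three axes, i.e.
    the segments plausibly belong to one occluded object/human."""
    for i in range(3):
        gap = max(box_a[i] - box_b[i + 3], box_b[i] - box_a[i + 3], 0.0)
        if gap > tol:
            return False
    return True

def merge_boxes(box_a, box_b):
    """Union bounding box of two segments judged to be one object."""
    return tuple(min(box_a[i], box_b[i]) for i in range(3)) + \
           tuple(max(box_a[i + 3], box_b[i + 3]) for i in range(3))
```

For example, two table-top fragments separated by a few centimetres of occluding chair would pass the tolerance test and be merged into one union box.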
- Specifically, due to occlusion, the segments of the same object category may be multiple segments of the same object. For example, due to the existence of the chairs, the table in
FIG. 7 is separated into three segments. - Step S97: Outputting each detected object and/or human with corresponding classification names, and 3D bounding boxes of the detected objects and/or humans.
- After Step S96, the information of the 3D bounding boxes of each object/human is generated and set as an independent output, which can be directly used for automatic robotics target position setup and/or collision avoidance during navigation when needed.
- Step S98: Determining whether the detected objects in the RGB image comprise the target object according to the 3D bounding boxes; in response to detection of the target object, acquiring the three-dimensional position and orientation with minimum-volume 3D bounding boxes of the detected objects and/or humans and the detected target object; determining the positional relationship between the one or more objects or humans and the objects of interest according to the three-dimensional position and orientation, and determining a predetermined task according to the positional relationship. - In one embodiment, determining the positional relationship between the one or more objects or humans and the objects of interest according to the three-dimensional position and orientation may include determining whether the one or more of the detected objects and/or humans are near the detected target object by performing a stereo-based calculation based on the information of the 3D bounding boxes of the detected object and the one or more of the detected objects and/or humans.
- In one embodiment, determining the positional relationship between the one or more of the detected objects and/or humans and the detected target object according to the three-dimensional position and orientation may include determining whether the one or more of the detected objects and/or humans are near the detected target object by performing a stereo-based calculation based on the information of the 3D bounding boxes of the detected objects and the one or more of the detected objects and/or humans. Referring to
FIG. 10 , step S98 may include the following steps. - Step S981: Comparing positions of first 2D bounding boxes formed by projection of the 3D bounding boxes of the detected objects or humans on a supporting surface (e.g., floor, ground, etc.), with positions of second 2D bounding boxes formed by projection of the 3D bounding boxes of the target object on the supporting surface.
- Specifically, the objects or humans other than the target object are compared with the target object to determine the positional relationship between the objects or humans and the target object. The positional relationship includes “near” and “not near”.
- Step S982: In response to the positions of the first 2D bounding boxes partly overlapping the positions of the second 2D bounding boxes, determining that the one or more of the detected objects and/or humans are near the detected target object.
- Step S983: In response to the positions of the first 2D bounding boxes not overlapping the positions of the second 2D bounding boxes, determining whether the positions of the first 2D bounding boxes overlap the positions of the second 2D bounding boxes after the first 2D bounding boxes and the second 2D bounding boxes are rotated.
- Step S984: In response to the positions of the first 2D bounding boxes overlapping the positions of the second 2D bounding boxes after the first 2D bounding boxes and the second 2D bounding boxes are rotated, determining that the one or more of the detected objects and/or humans are near the detected target object.
- Step S985: In response to the positions of the first 2D bounding boxes not overlapping the positions of the second 2D bounding boxes after the first 2D bounding boxes and the second 2D bounding boxes are rotated, determining whether a shortest distance between the positions of the first 2D bounding boxes and the second 2D bounding boxes is less than a variable threshold.
- In one embodiment, the variable threshold is variable for each target object.
- Step S986: In response to the shortest distance between the positions of the first 2D bounding boxes and the second 2D bounding boxes being less than the variable threshold, determining that the one or more of the detected objects and/or humans are near the detected target object.
- Step S987: In response to the shortest distance between the positions of the first 2D bounding boxes and the second 2D bounding boxes being greater than the variable threshold, determining that the one or more of the detected objects and/or humans are not near the detected target object.
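Steps S981–S987 can be sketched under a simplifying assumption: here the projected footprints are treated as axis-aligned rectangles (the rotation of oriented boxes in steps S983–S985 is omitted for brevity), and the names and rectangle layout are illustrative, not from the disclosure:

```python
def footprint_gap(r1, r2):
    """Shortest distance between two axis-aligned footprint rectangles
    (xmin, ymin, xmax, ymax); zero when they overlap or touch."""
    dx = max(r1[0] - r2[2], r2[0] - r1[2], 0.0)
    dy = max(r1[1] - r2[3], r2[1] - r1[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5

def is_near(obj_rect, target_rect, threshold):
    """Steps S982 and S986 collapsed for axis-aligned footprints: "near"
    when the projections overlap (gap is zero), or when their shortest
    distance is below the per-target variable threshold."""
    return footprint_gap(obj_rect, target_rect) < threshold
```

A full implementation would first apply the rotation of step S983 (e.g. an oriented-rectangle overlap test) before falling back to the distance comparison, and would look up a different `threshold` for each target object as step S985 describes.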
- By outputting whether the objects or humans are near the target object, or whether multiple target objects (for example, two target objects) are near each other, it can realize the guidance of robot-human-environment interaction.
- When implemented by a robot, the method according to the aforementioned embodiments can provide scene understanding information based on the relationship between the robot and the objects/humans in the RGB image. The scene understanding information may include the positional relationship between the target object and other detected objects and/or humans, which serves as a basis for the next operation to be performed. This can be critical in various daily situations: when a human reaches a target object, the robot is able to react quickly and perform the corresponding assistance. For example, when an elderly person sits on a chair, a robot can detect this scene, approach the person, and provide water/food/other assistance as needed.
- The method according to the aforementioned embodiments has the following advantages. By combining the segmentation and classification results with depth information, the positions and orientations of objects and humans in three-dimensional space are detected; the position of every custom input object can be determined, and its orientation can be determined according to its presence in the current scene. This can further be used for robotics target position setup as well as occlusion avoidance during navigation. Note that the position and orientation can be dynamically updated based on position changes of the robot. Shrinking contours and compensating for volume are introduced to remove misclassified values. The Convex Hull is used for minimum memory/CPU cost while preserving all useful information. The stereo-based calculation method is introduced to merge occlusion-caused segmentation pieces into one object. The semantic scene understanding system allows a user to set target objects and is easy to apply to any scene or objects of interest.
- The method according to the aforementioned embodiments can be used for object stereo information calculation, finding target objects in the current scene, and scene understanding of human-object and object-object relationships. The RGBD camera sensor is economical and can be arranged at various positions on the robot with different quaternion angles. With knowledge of the camera mounting height and quaternion values, the relative position/orientation angle of each object near the robot and the relationships between objects can be generated.
-
FIG. 12 is a schematic block diagram of a human-object recognizing device according to one embodiment. The human-object recognizing device may include, but is not limited to: cellular phones, smart phones, other wireless communication devices, personal digital assistants, audio players, other media players, music recorders, video recorders, cameras, other media recorders, radios, vehicle transportation equipment, laptop computers, desktop computers, netbook computers, Personal Digital Assistants (PDA), Portable Multimedia Players (PMP), Moving Picture Experts Group (MPEG-1 or MPEG-2) Audio Layer 3 (MP3) players, portable gaming devices (such as Nintendo DS™, PlayStation Portable™, Gameboy Advance™, iPhone™), portable Internet devices, data storage devices, smart wearable devices (for example, head mounted devices (HMD) such as smart glasses, smart clothes, smart bracelets, smart necklaces, or smart watches), digital cameras, and combinations thereof. According to actual needs, the device can be installed on the robot, or it can be the robot itself. In some cases, the device can perform multiple functions, such as playing music, displaying videos, storing pictures, and receiving and sending phone calls. - In one embodiment, the device may include a
processor 110, a storage 111 and one or more executable computer programs 112 that are stored in the storage 111 and executable by the processor 110. When the processor 110 executes the computer programs 112, the steps in the embodiments of the method for controlling the robot 10, such as steps S81 to S86 in FIG. 8 , are implemented. - Exemplarily, the one or
more computer programs 112 may be divided into one or more modules/units, and the one or more modules/units are stored in the storage 111 and executable by the processor 110. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the one or more computer programs 112 in the device. For example, the one or more computer programs 112 may be divided into an acquiring unit, a detecting unit, a recognizing unit and a control unit. - The acquiring unit is configured to acquire an input RGB image and a depth image corresponding to the RGB image. The detecting unit is configured to detect objects and humans in the RGB image using a segmentation classification algorithm based on a sample database. The recognizing unit is configured to, in response to detection of objects and/or humans, determine a position of each of the detected objects and/or humans by performing a segment detection on each of the detected objects and/or humans based on the RGB image and the depth image and performing a 3D bounding box calculation for each of the detected objects and/or humans. The control unit is configured to control the robot to perform predetermined tasks according to the determined positions of the detected objects and/or humans.
- Those skilled in the art can understand that
FIG. 12 is only an example of the device 11, and does not constitute a limitation on the device 11. In practical applications, it may include more or fewer components, or a combination of certain components, or different components. For example, the device 11 may also include: input/output devices (such as keyboards, microphones, cameras, speakers, display screens, etc.), network access equipment, buses, sensors, etc. - The
processor 110 may be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a programmable logic device, a discrete gate, a transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor or any conventional processor or the like. - The
storage 111 may be an internal storage unit, such as a hard disk or a memory. The storage 111 may also be an external storage device, such as a plug-in hard disk, a smart memory card (SMC), a secure digital (SD) card, or any suitable flash card. Furthermore, the storage 111 may also include both an internal storage unit and an external storage device. The storage 111 is used to store computer programs, other programs, and data required by the robot. The storage 111 can also be used to temporarily store data that has been output or is about to be output. - In one embodiment, a non-transitory computer-readable storage medium is provided. The non-transitory computer-readable storage medium may be configured in the
robot 10 shown in FIG. 1 or in the device shown in FIG. 12 . The non-transitory computer-readable storage medium stores executable computer programs, and when the programs are executed by the one or more processors of the robot 10, the human-object scene recognition method described in the embodiments above is implemented. - A person having ordinary skill in the art may clearly understand that, for the convenience and simplicity of description, the division of the above-mentioned functional units and modules is merely an example for illustration. In actual applications, the above-mentioned functions may be allocated to be performed by different functional units according to requirements, that is, the internal structure of the device may be divided into different functional units and modules to complete all or part of the above-mentioned functions. The functional units and modules in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The above-mentioned integrated unit may be implemented in the form of hardware or in the form of software functional unit. In addition, the specific name of each functional unit and module is merely for the convenience of distinguishing each other and is not intended to limit the scope of protection of the present disclosure. For the specific operation process of the units and modules in the above-mentioned system, reference may be made to the corresponding processes in the above-mentioned method embodiments, which are not described herein.
- In one embodiment, a non-transitory computer-readable storage medium is provided that may be configured in the
robot 10 or the mobile robot control device as described above. The non-transitory computer-readable storage medium may be the storage unit configured in the main control chip and the data acquisition chip in the foregoing embodiments. One or more computer programs are stored on the non-transitory computer-readable storage medium, and when the computer programs are executed by one or more processors, the robot control method described in the embodiment above is implemented. - In the embodiments above, the description of each embodiment has its own emphasis. For parts that are not detailed or described in one embodiment, reference may be made to related descriptions of other embodiments.
- A person having ordinary skill in the art may clearly understand that, the exemplificative units and steps described in the embodiments disclosed herein may be implemented through electronic hardware or a combination of computer software and electronic hardware. Whether these functions are implemented through hardware or software depends on the specific application and design constraints of the technical schemes. Those ordinary skilled in the art may implement the described functions in different manners for each particular application, while such implementation should not be considered as beyond the scope of the present disclosure.
- In the embodiments provided by the present disclosure, it should be understood that the disclosed apparatus (device)/terminal device and method may be implemented in other manners. For example, the above-mentioned apparatus (device)/terminal device embodiment is merely exemplary. For example, the division of modules or units is merely a logical functional division, and other division manner may be used in actual implementations, that is, multiple units or components may be combined or be integrated into another system, or some of the features may be ignored or not performed. In addition, the shown or discussed mutual coupling may be direct coupling or communication connection, and may also be indirect coupling or communication connection through some interfaces, devices or units, and may also be electrical, mechanical or other forms.
- The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual requirements to achieve the objectives of the solutions of the embodiments.
- The functional units and modules in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The above-mentioned integrated unit may be implemented in the form of hardware or in the form of software functional unit.
- When the integrated module/unit is implemented in the form of a software functional unit and is sold or used as an independent product, the integrated module/unit may be stored in a non-transitory computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above-mentioned embodiments of the present disclosure may also be implemented by instructing relevant hardware through a computer program. The computer program may be stored in a non-transitory computer-readable storage medium, which may implement the steps of each of the above-mentioned method embodiments when executed by a processor. The computer program includes computer program codes, which may be in the form of source codes, object codes, executable files, certain intermediate forms, and the like. The computer-readable medium may include any entity or device capable of carrying the computer program codes, a recording medium, a USB flash drive, a portable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random-access memory (RAM), electric carrier signals, telecommunication signals, and software distribution media. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to the legislation and patent practice, a computer-readable medium does not include electric carrier signals and telecommunication signals.
- The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the present disclosure to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the present disclosure and its practical applications, to thereby enable others skilled in the art to best utilize the present disclosure and various embodiments with various modifications as are suited to the particular use contemplated.
Claims (20)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/386,531 US11854255B2 (en) | 2021-07-27 | 2021-07-27 | Human-object scene recognition method, device and computer-readable storage medium |
PCT/CN2022/107908 WO2023005922A1 (en) | 2021-07-27 | 2022-07-26 | Human-object scene recognition method and apparatus, and computer-readable storage medium |
CN202280004525.5A CN115777117A (en) | 2021-07-27 | 2022-07-26 | Human-object scene recognition method, device and computer-readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/386,531 US11854255B2 (en) | 2021-07-27 | 2021-07-27 | Human-object scene recognition method, device and computer-readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
US20230030837A1 true US20230030837A1 (en) | 2023-02-02 |
US11854255B2 US11854255B2 (en) | 2023-12-26 |
Family
ID=85039202
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/386,531 Active 2042-05-12 US11854255B2 (en) | 2021-07-27 | 2021-07-27 | Human-object scene recognition method, device and computer-readable storage medium |
Country Status (3)
Country | Link |
---|---|
US (1) | US11854255B2 (en) |
CN (1) | CN115777117A (en) |
WO (1) | WO2023005922A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116061187A (en) * | 2023-03-07 | 2023-05-05 | 睿尔曼智能科技(江苏)有限公司 | Method for identifying, positioning and grabbing goods on goods shelves by composite robot |
US20230326048A1 (en) * | 2022-03-24 | 2023-10-12 | Honda Motor Co., Ltd. | System, information processing apparatus, vehicle, and method |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9639084B2 (en) * | 2014-08-27 | 2017-05-02 | Honda Motor., Ltd. | Autonomous action robot, and control method for autonomous action robot |
US20180108137A1 (en) * | 2016-10-18 | 2018-04-19 | Adobe Systems Incorporated | Instance-Level Semantic Segmentation System |
US20200066036A1 (en) * | 2018-08-21 | 2020-02-27 | Samsung Electronics Co., Ltd. | Method and apparatus for training object detection model |
CN111753638A (en) * | 2020-05-03 | 2020-10-09 | 深圳奥比中光科技有限公司 | Pedestrian tracking method and system based on RGBD image |
CN111844101A (en) * | 2020-07-31 | 2020-10-30 | 中国科学技术大学 | Multi-finger dexterous hand sorting planning method |
US20200394848A1 (en) * | 2019-06-14 | 2020-12-17 | Magic Leap, Inc. | Scalable three-dimensional object recognition in a cross reality system |
US20220270327A1 (en) * | 2021-02-24 | 2022-08-25 | Denso International America, Inc. | Systems and methods for bounding box proposal generation |
US11436743B2 (en) * | 2019-07-06 | 2022-09-06 | Toyota Research Institute, Inc. | Systems and methods for semi-supervised depth estimation according to an arbitrary camera |
US11587338B2 (en) * | 2020-05-15 | 2023-02-21 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Three-dimensional object detection method, electronic device and readable storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10482681B2 (en) * | 2016-02-09 | 2019-11-19 | Intel Corporation | Recognition-based object segmentation of a 3-dimensional image |
CN110745140B (en) | 2019-10-28 | 2021-01-01 | 清华大学 | Vehicle lane change early warning method based on continuous image constraint pose estimation |
CN111126269B (en) * | 2019-12-24 | 2022-09-30 | 京东科技控股股份有限公司 | Three-dimensional target detection method, device and storage medium |
-
2021
- 2021-07-27 US US17/386,531 patent/US11854255B2/en active Active
-
2022
- 2022-07-26 WO PCT/CN2022/107908 patent/WO2023005922A1/en unknown
- 2022-07-26 CN CN202280004525.5A patent/CN115777117A/en active Pending
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9639084B2 (en) * | 2014-08-27 | 2017-05-02 | Honda Motor., Ltd. | Autonomous action robot, and control method for autonomous action robot |
US20180108137A1 (en) * | 2016-10-18 | 2018-04-19 | Adobe Systems Incorporated | Instance-Level Semantic Segmentation System |
US20200066036A1 (en) * | 2018-08-21 | 2020-02-27 | Samsung Electronics Co., Ltd. | Method and apparatus for training object detection model |
US20200394848A1 (en) * | 2019-06-14 | 2020-12-17 | Magic Leap, Inc. | Scalable three-dimensional object recognition in a cross reality system |
US11436743B2 (en) * | 2019-07-06 | 2022-09-06 | Toyota Research Institute, Inc. | Systems and methods for semi-supervised depth estimation according to an arbitrary camera |
CN111753638A (en) * | 2020-05-03 | 2020-10-09 | 深圳奥比中光科技有限公司 | Pedestrian tracking method and system based on RGBD image |
US11587338B2 (en) * | 2020-05-15 | 2023-02-21 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Three-dimensional object detection method, electronic device and readable storage medium |
CN111844101A (en) * | 2020-07-31 | 2020-10-30 | 中国科学技术大学 | Multi-finger dexterous hand sorting planning method |
US20220270327A1 (en) * | 2021-02-24 | 2022-08-25 | Denso International America, Inc. | Systems and methods for bounding box proposal generation |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230326048A1 (en) * | 2022-03-24 | 2023-10-12 | Honda Motor Co., Ltd. | System, information processing apparatus, vehicle, and method |
CN116061187A (en) * | 2023-03-07 | 2023-05-05 | 睿尔曼智能科技(江苏)有限公司 | Method for identifying, positioning and grabbing goods on goods shelves by composite robot |
Also Published As
Publication number | Publication date |
---|---|
US11854255B2 (en) | 2023-12-26 |
WO2023005922A1 (en) | 2023-02-02 |
CN115777117A (en) | 2023-03-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11308347B2 (en) | Method of determining a similarity transformation between first and second coordinates of 3D features | |
US11216971B2 (en) | Three-dimensional bounding box from two-dimensional image and point cloud data | |
US10346684B2 (en) | Visual search utilizing color descriptors | |
US10217195B1 (en) | Generation of semantic depth of field effect | |
US20190147221A1 (en) | Pose estimation and model retrieval for objects in images | |
US20180137644A1 (en) | Methods and systems of performing object pose estimation | |
US9417700B2 (en) | Gesture recognition systems and related methods | |
US11854255B2 (en) | Human-object scene recognition method, device and computer-readable storage medium | |
US9792491B1 (en) | Approaches for object tracking | |
US9292927B2 (en) | Adaptive support windows for stereoscopic image correlation | |
CN110060205B (en) | Image processing method and device, storage medium and electronic equipment | |
US20230245373A1 (en) | System and method for generating a three-dimensional photographic image | |
JPWO2010073929A1 (en) | Person determination device, method and program | |
US20200111233A1 (en) | Adaptive virtual camera for indirect-sparse simultaneous localization and mapping systems | |
US11461921B2 (en) | Program, system, electronic device, and method for recognizing three-dimensional object | |
US20150262362A1 (en) | Image Processor Comprising Gesture Recognition System with Hand Pose Matching Based on Contour Features | |
EP4107650A1 (en) | Systems and methods for object detection including pose and size estimation | |
US20190340773A1 (en) | Method and apparatus for a synchronous motion of a human body model | |
KR20210018114A (en) | Cross-domain metric learning system and method | |
US20220068024A1 (en) | Determining a three-dimensional representation of a scene | |
KR102467010B1 (en) | Method and system for product search based on image restoration | |
CN113112398A (en) | Image processing method and device | |
KR102401626B1 (en) | Method and system for image-based product search | |
KR102287478B1 (en) | Electronic apparatus and method for identifying false detection of object by reflection in indoor environment | |
KR101215569B1 (en) | Method of segmenting for label region |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: UBTECH ROBOTICS CORP LTD, CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DONG, CHUQIAO;SHAO, DAN;XIU, ZHEN;AND OTHERS;REEL/FRAME:056997/0036 Effective date: 20210727 Owner name: UBTECH NORTH AMERICA RESEARCH AND DEVELOPMENT CENTER CORP, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DONG, CHUQIAO;SHAO, DAN;XIU, ZHEN;AND OTHERS;REEL/FRAME:056997/0036 Effective date: 20210727 |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
AS | Assignment |
Owner name: UBKANG (QINGDAO) TECHNOLOGY CO., LTD., CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:UBTECH NORTH AMERICA RESEARCH AND DEVELOPMENT CENTER CORP;UBTECH ROBOTICS CORP LTD;REEL/FRAME:062319/0268 Effective date: 20230105 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |