CN111369688A - Cognitive navigation method and system expressed by structured scene - Google Patents

Cognitive navigation method and system expressed by structured scene

Info

Publication number
CN111369688A
CN111369688A
Authority
CN
China
Prior art keywords
target
image
scene
information
scene graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010166282.8A
Other languages
Chinese (zh)
Other versions
CN111369688B (en)
Inventor
陈崇雨
于帮国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DMAI Guangzhou Co Ltd
Original Assignee
DMAI Guangzhou Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DMAI Guangzhou Co Ltd filed Critical DMAI Guangzhou Co Ltd
Priority to CN202010166282.8A
Publication of CN111369688A
Application granted
Publication of CN111369688B
Status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/003Navigation within 3D models or images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a cognitive navigation method and system based on structured scene expression. The method comprises the following steps: obtaining two-dimensional and three-dimensional information of each target in each frame of image by using the acquired target scene image sequence and the parameters of the image acquisition equipment; obtaining optimal scene graph information according to the target two-dimensional information, the three-dimensional information and the target prior common knowledge information; processing scene graphs formed from consecutive multi-frame images to generate local scene graphs, and merging and updating the local scene graphs to generate a global scene graph; and acquiring target coordinates according to the target information in the global scene graph, then planning a path according to the target coordinates and navigating. By combining the target scene image sequence with target prior knowledge to obtain local scene graphs, and merging and updating the local scene graphs into a global scene graph, the invention can construct three-dimensional scene graphs for multiple scenes, including scenes of finer granularity, and improves both the ordering of targets in the three-dimensional scene graph and the navigation accuracy.

Description

Cognitive navigation method and system expressed by structured scene
Technical Field
The invention relates to the technical field of image processing, and in particular to a cognitive navigation method and system for structured scene expression.
Background
At present, with the rapid development of computer graphics and image processing technology, a three-dimensional virtual scene can vividly and realistically reproduce a real scene from planar scene pictures, bringing good visual effects and visual experience. The demand for three-dimensional visualization technology shows an obvious growth trend, so how to create the required three-dimensional scene graph has attracted increasingly wide attention and research, and the technology is widely applied across industries. Existing scene graph generation technology falls into three categories: first, generating a scene graph from a single image using deep learning; second, identifying objects with a target detection algorithm, positioning them with a mapping technique, and directly extracting the relationships of the objects in the three-dimensional scene map; third, identifying objects with a target detection algorithm, positioning them with a mapping technique, and extracting the relationships of the objects in the three-dimensional scene map using a practical data set as a statistical basis. However, the relationships between objects in scene graphs generated by the prior art are disordered, and the relationships between the objects cannot be well represented in a structured manner.
Disclosure of Invention
Therefore, the technical problem to be solved by the invention is to overcome the defects of the prior art that the relationships of objects in a generated scene graph are disordered and cannot be well structurally represented, and thereby to provide a cognitive navigation method and system based on structured scene expression.
In order to achieve the purpose, the invention provides the following technical scheme:
in a first aspect, an embodiment of the present invention provides a cognitive navigation method expressed in a structured scene, including: acquiring a target scene image by using image acquisition equipment to obtain a corresponding image sequence, wherein the image comprises a depth image and a color image, and the image sequence comprises a depth image sequence and a color image sequence; obtaining two-dimensional information and three-dimensional information of each target in each frame of image by using the target scene image, the image sequence and the parameters of the image acquisition equipment; obtaining optimal scene graph information according to two-dimensional information, three-dimensional information and target prior common knowledge information of each target in each frame of image; processing scene graphs formed by a preset number of frames to be optimized to generate local scene graphs, and merging and updating the local scene graphs to generate a global scene graph; and acquiring target coordinates according to the target information in the global scene graph, planning a path according to the target coordinates and navigating.
In one embodiment, the target prior knowledge information is obtained before obtaining the target scene image, and the process of obtaining the target prior knowledge information includes: screening and cleaning the preset target data set, classifying the screened and cleaned preset target data set according to a preset scene image classification method, and generating various types of scene images; counting the occurrence probability of the target in each type of scene image, the probability of the target attribute and the probability of the relationship; constructing an and-or graph structure according to the attributes of the targets and the relationship between the targets; and filling the probability of the target in each type of scene image, the probability of the target attribute and the probability of the relationship into an and-or graph structure to generate target prior general knowledge information.
In an embodiment, the step of obtaining two-dimensional information and three-dimensional information of each target in each frame of image by using the target scene image, the image sequence and the parameters of the image acquisition device includes: estimating the pose of the image acquisition equipment by using an image sequence and an SLAM method, and acquiring the pose of the image acquisition equipment by using the position of the first frame image as a coordinate origin; detecting all targets in each frame of image by using a color image and target detection method, and acquiring two-dimensional information of each target in each frame of image; and acquiring three-dimensional information of the targets in each frame of image according to the depth image, the pose of the image acquisition equipment, the parameters of the image acquisition equipment and the two-dimensional information of each target, wherein the three-dimensional information comprises three-dimensional coordinate information and three-dimensional bounding box information of the targets.
In an embodiment, the step of obtaining the optimal scene graph information according to the two-dimensional information, the three-dimensional information and the prior knowledge information of each target in each frame of image includes: estimating the relation between the targets and the probability thereof according to the three-dimensional information of each target in each frame of image, wherein the probability comprises the probability of the occurrence of the target in each frame of image, the probability of the attribute of the target and the probability of the relation of the target; generating a scene graph of each frame of image according to the relation between corresponding targets in each frame of image and the two-dimensional information of the targets; and optimizing each frame of image according to the prior common sense information of the targets, the relationship between the targets and the probability of the targets to obtain the optimal scene graph information.
In an embodiment, the step of processing a scene graph formed by a preset number of frames to be optimized to generate a local scene graph, and merging and updating the local scene graph to generate a global scene graph includes: storing initial scene graph information of a preset number of frames to be optimized into a group to be optimized; when the occurrence frequency of any target in the initial scene graph of the group to be optimized is smaller than a preset occurrence frequency threshold value, filtering the target from the initial scene graph to generate a filtered scene graph set; recalculating the average value of the three-dimensional information of all targets in the filtered scene graph set, and generating a local scene graph according to the recalculated target three-dimensional information average value; and merging and updating the plurality of local scene graphs according to the target coordinate information, the target category information and the target information in the generated global scene graph to generate the global scene graph.
In an embodiment, the step of processing a scene graph formed by a preset number of frames to be optimized to generate a local scene graph, and merging and updating the local scene graph to generate a global scene graph further includes: and when the number of the frames in the group to be optimized exceeds the preset number of the frames to be optimized, filtering and optimizing the scene graph of the key frame group to generate a local scene graph.
In one embodiment, the step of updating the plurality of local scene graphs includes: performing target relation calculation again by using the targets in the generated global scene graph, and generating a target relation calculation result; and, according to the target relation calculation result, adding a new target or a new target relation that is not yet in the generated global scene graph, or updating an old target relation.
In a second aspect, an embodiment of the present invention provides a cognitive navigation system expressed in a structured scene, including: the image and image sequence acquisition module is used for acquiring a target scene image by using image acquisition equipment to obtain a corresponding image sequence, wherein the image comprises a depth image and a color image, and the image sequence comprises a depth image sequence and a color image sequence; the two-dimensional information and three-dimensional information acquisition module is used for acquiring two-dimensional information and three-dimensional information of each target in each frame of image by using the target scene image, the image sequence and the parameters of the image acquisition equipment; the optimal scene image information acquisition module is used for acquiring optimal scene image information according to two-dimensional information, three-dimensional information and target prior common knowledge information of each target in each frame of image; the global scene graph generating module is used for processing scene graphs formed by a preset number of frames to be optimized to generate local scene graphs, and merging and updating the local scene graphs to generate global scene graphs; and the path planning and navigation module is used for acquiring target coordinates according to the target information in the global scene graph, planning a path according to the target coordinates and navigating.
In a third aspect, an embodiment of the present invention provides a terminal device, including: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to cause the at least one processor to perform the cognitive navigation method of structured scene expression according to the first aspect of the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where computer instructions are stored, and the computer instructions are configured to cause a computer to execute the cognitive navigation method expressed in a structured scene according to the first aspect of the embodiment of the present invention.
The technical scheme of the invention has the following advantages:
1. according to the cognitive navigation method and system for structural scene expression, the target scene graph, the image sequence and the target prior knowledge are combined to obtain the local scene graph, and the local scene graphs are combined and updated to obtain the global scene graph, so that three-dimensional scene graph construction can be performed on multiple scenes and scenes containing finer granularity, and the ordering of the targets in the three-dimensional scene graph and the navigation accuracy are improved; and the structural expression of the target prior knowledge is introduced, so that more scene information can be acquired in a limited perception information range, and the image construction effect is further optimized for the scene image generated by detection.
2. According to the cognitive navigation method and system for structural scene expression, the two-dimensional information of the target and the pose of the image acquisition equipment are acquired, the structural information comprising the relationship among the object attribute, the three-dimensional coordinate and the target is estimated by combining the depth image, the optimal scene graph information is obtained according to the prior common knowledge information of the target, and the accuracy of scene graph construction is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a specific example of a cognitive navigation method expressed in a structured scene according to an embodiment of the present invention;
FIG. 2 is a flowchart of a specific example of generating target prior knowledge information according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a specific example of an AND-OR structure provided by an embodiment of the present invention;
fig. 4 is a flowchart of a specific example of acquiring three-dimensional information according to an embodiment of the present invention;
fig. 5 is a flowchart of a specific example of obtaining optimal scene graph information according to an embodiment of the present invention;
fig. 6 is a flowchart of a specific example of generating a global scene graph according to an embodiment of the present invention;
fig. 7 is a module composition diagram of a specific example of a cognitive navigation system expressed by a structured scene according to an embodiment of the present invention;
fig. 8 is a composition diagram of a specific example of a terminal device according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example 1
The embodiment of the invention provides a cognitive navigation method expressed by a structured scene, which is applicable to fields such as scene graph construction, three-dimensional reconstruction and automation control technology. As shown in fig. 1, the cognitive navigation method expressed by the structured scene comprises the following steps:
step S11: and acquiring a target scene image by using image acquisition equipment to obtain a corresponding image sequence, wherein the image comprises a depth image and a color image, and the image sequence comprises a depth image sequence and a color image sequence.
In the embodiment of the invention, the image acquisition equipment can sequentially and continuously acquire target scene images of all targets in the scene at different times and from different directions, obtaining the corresponding depth image sequence and color image sequence. Depth images, which are less easily affected by illumination, shadow and the like, are used in place of the grayscale images of the prior art and are combined with the color images, which improves the accuracy of the constructed three-dimensional scene graph and the orderliness of the targets in the scene graph.
Step S12: and obtaining two-dimensional information and three-dimensional information of each target in each frame of image by using the target scene image, the image sequence and the parameters of the image acquisition equipment.
The embodiment of the invention detects the two-dimensional information of all targets in each frame of the target scene image sequence by using a preset target detection and identification method. The two-dimensional information includes the type and attributes of the target, the probability of the target appearing in the scene image, the two-dimensional bounding box of the target, the target ID and the like. The target scene images and image sequence here are mainly the color images and color image sequence, and the preset target detection and identification method may be, for example, the YOLOv3 target detection method, as in the sketch below.
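A hedged sketch of this 2D detection step using YOLOv3 through OpenCV's DNN module. The configuration/weight file names, the 416x416 input size and the 0.5 confidence threshold are assumptions for the example; the patent names YOLOv3 only as one possible detector.

```python
# Hedged sketch: 2D target detection with YOLOv3 via OpenCV's DNN module.
# File paths and the confidence threshold below are assumptions.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
layer_names = net.getUnconnectedOutLayersNames()

def detect_targets_2d(color_image, conf_threshold=0.5):
    """Return (class_id, confidence, 2D bounding box) for each detection."""
    h, w = color_image.shape[:2]
    blob = cv2.dnn.blobFromImage(color_image, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    detections = []
    for output in net.forward(layer_names):
        for row in output:            # row = [cx, cy, bw, bh, obj, class scores...]
            scores = row[5:]
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id] * row[4])
            if confidence < conf_threshold:
                continue
            cx, cy, bw, bh = row[:4] * np.array([w, h, w, h])
            box = (int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh))
            detections.append((class_id, confidence, box))
    return detections
```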
According to the image sequence, the image acquisition equipment is positioned by adopting a laser or visual simultaneous localization and mapping (SLAM) algorithm, and the initial pose of the image acquisition equipment is estimated; for example, the ORB-SLAM algorithm can be used to track corner points. The coordinates and three-dimensional bounding box of each target in three-dimensional space are then obtained from the initial camera pose, the depth image, the two-dimensional information and the parameters of the image acquisition equipment.
It should be noted that the various algorithms mentioned in the embodiments of the present invention are only used for illustration, but not for limitation.
Step S13: and obtaining optimal scene graph information according to the two-dimensional information and the three-dimensional information of each target in each frame of image and the prior common knowledge information of the targets.
The embodiment of the invention performs relation estimation between targets using the three-dimensional coordinate information and the three-dimensional bounding box information in each frame of image information, obtains the class probabilities and relation probabilities of the targets, and performs maximum a posteriori reasoning over the structured expression of the target prior common knowledge information to obtain the optimal scene graph information, where the optimal scene graph information comprises the optimized target classes, the optimized attributes and the relationships between targets.
Step S14: processing scene graphs formed by a preset number of frames to be optimized to generate local scene graphs, and merging and updating the local scene graphs to generate a global scene graph.
To further improve the accuracy of three-dimensional scene graph construction, the embodiment of the invention optimizes, in real time, the initial scene graphs formed by a preset number of frames to be optimized. The preset number of frames to be optimized is placed into a group to be optimized; when the number of frames exceeds the preset number, optimization filtering is applied directly to the initial scene graphs. For the frames in this group, when the occurrence frequency of a target is less than a preset occurrence frequency threshold, the target is filtered out of the initial scene graph. Meanwhile, the average of the three-dimensional information of each remaining target in the optimized initial scene graphs is recalculated, and a local scene graph is generated. Finally, the plurality of local scene graphs are merged and updated according to the target coordinate information, the target category information and the target information already in the generated global scene graph, so as to generate the global scene graph.
Step S15: and acquiring target coordinates according to the target information in the global scene graph, planning a path according to the target coordinates and navigating.
When the terminal or device receives a navigation instruction, it can search the global scene graph according to the type, attributes and relations of the object, find the corresponding object, query its coordinates, and then plan a path according to the target coordinates and navigate, as sketched below.
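As a hedged illustration of this query-and-navigate step, the sketch below looks a target up in the global scene graph by type and attributes, then plans a path with breadth-first search on an occupancy grid. The graph layout, node fields, and the use of BFS are assumptions; the patent does not fix a concrete data structure or planner.

```python
# Hedged sketch: querying the global scene graph for a target and planning a
# path on an occupancy grid with breadth-first search.
from collections import deque

def find_target(global_graph, target_type, **attributes):
    """Return the 3D coordinates of the first node matching type/attributes."""
    for node in global_graph["nodes"]:
        if node["type"] == target_type and all(
                node.get("attributes", {}).get(k) == v for k, v in attributes.items()):
            return node["coordinates"]
    return None

def plan_path(grid, start, goal):
    """BFS over a 2D occupancy grid (0 = free, 1 = occupied)."""
    queue, came_from = deque([start]), {start: None}
    while queue:
        cell = queue.popleft()
        if cell == goal:                      # reconstruct path back to start
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0 and nxt not in came_from):
                came_from[nxt] = cell
                queue.append(nxt)
    return None
```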
According to the cognitive navigation method expressed by the structured scene, the target scene graph, the image sequence and the target prior knowledge are combined to obtain the local scene graph, and the local scene graphs are combined and updated to obtain the global scene graph, so that the three-dimensional scene graph construction can be carried out on multiple scenes and scenes containing finer granularity, and the ordering of the targets in the three-dimensional scene graph and the navigation accuracy are improved; and the structural expression of the target prior knowledge is introduced, so that more scene information can be acquired in a limited perception information range, and the image construction effect is further optimized for the scene image generated by detection.
In one embodiment, as shown in fig. 2, the target prior knowledge information is obtained before obtaining the target scene image, and the process of obtaining the target prior knowledge information includes:
step S21: and screening and cleaning the preset target data set, classifying the screened and cleaned preset target data set according to a preset scene image classification method, and generating various types of scene images.
The embodiment of the invention takes a target data set as the basis for feature extraction. For example, the data set is screened and cleaned to remove all images containing people, and the images of the screened and cleaned preset target data set are then classified according to functional scenes such as kitchen, living room, bedroom, conference room, office, restaurant and washroom; the classification rules can be customized as needed.
Step S22: and counting the probability of the appearance of the target, the probability of the attribute of the target and the probability of the relationship in each type of scene image.
Step S23: and constructing an and-or graph structure according to the attributes of the objects and the relationship between the objects.
The probability of the occurrence of each target, the probability of its attributes and the probability of the relationships between targets are counted in each frame of image of each scene, and an and-or graph structure, as shown in fig. 3, is constructed according to the target attributes and the relationships between targets.
The and-or graph structure shown in fig. 3 mainly includes AND nodes and OR nodes: AND nodes represent the decomposition of an object, and OR nodes represent alternative choices for a target. The terminal nodes include address nodes and attribute nodes, where the attribute nodes carry target attributes such as color and coordinates.
Step S24: and filling the probability of the target in each type of scene image, the probability of the target attribute and the probability of the relationship into an and-or graph structure to generate target prior general knowledge information.
The statistical results (object types, attributes, relationships between objects, and the probabilities of each of these) are filled into the OR-node structure of the corresponding object in the and-or graph structure, so that many kinds of target prior common knowledge information can be obtained from the and-or graph, and this prior common knowledge information covers the actual scene situations. A sketch of such a filled structure follows.
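As a hedged illustration, the following minimal sketch shows one way to store the counted statistics as an and-or-graph-style prior. The node layout, field names (p_occur, attributes, relations) and all numbers are invented for illustration; the patent only specifies that occurrence, attribute and relation probabilities are filled into the OR nodes.

```python
# Hedged sketch: a minimal and-or graph prior filled with statistics counted
# from a classified data set. All values below are toy numbers.
scene_prior = {
    "kitchen": {                       # one OR branch per scene category
        "objects": {
            # P(object appears in a kitchen image) and attribute probabilities
            "cup":   {"p_occur": 0.62, "attributes": {"color=white": 0.4}},
            "table": {"p_occur": 0.81, "attributes": {"color=brown": 0.5}},
        },
        "relations": {
            # P(relation | both objects present)
            ("cup", "supported_by", "table"): 0.73,
        },
    },
}

def relation_prior(scene, subj, rel, obj):
    """Look up the prior probability of a relation in a given scene type."""
    return scene_prior.get(scene, {}).get("relations", {}).get((subj, rel, obj), 0.0)
```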
In a specific embodiment, as shown in fig. 4, the step of obtaining the two-dimensional information and the three-dimensional information of each target in each frame of image by using the target scene image, the image sequence, and the parameters of the image acquisition device includes:
step S121: and estimating the pose of the image acquisition equipment by using an image sequence and an SLAM method, and acquiring the pose of the image acquisition equipment by using the position of the first frame image as a coordinate origin.
In the embodiment of the invention, the pose of the image acquisition equipment is estimated from the image sequence using a preset pose estimation algorithm, and the pose of the image acquisition equipment is obtained with the position of the first frame image as the coordinate origin. For example, when a robot carries the camera equipment, the real-time pose of the mobile robot can be estimated with a laser or visual SLAM algorithm, and the initial pose of the camera can be estimated with the ORB-SLAM algorithm via corner-point tracking. The sketch below shows the re-anchoring to the first frame.
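A minimal sketch of anchoring the pose sequence to the first frame, assuming the SLAM front end returns each pose as a 4x4 homogeneous matrix (an assumption about its output format; ORB-SLAM itself is a C++ system and is not called here).

```python
# Hedged sketch: expressing every camera pose in the coordinate system of the
# first frame, as the patent prescribes (first frame = coordinate origin).
import numpy as np

def poses_relative_to_first(slam_poses):
    """Re-anchor 4x4 SLAM poses so that frame 0 becomes the coordinate origin."""
    T0_inv = np.linalg.inv(slam_poses[0])
    return [T0_inv @ T for T in slam_poses]
```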
Step S122: and detecting all targets in each frame of image by using a color image and target detection method, and acquiring two-dimensional information of each target in each frame of image.
All the targets appearing in each frame of color image are detected by using a target detection algorithm, and two-dimensional information of each target is obtained, where the two-dimensional information may include a target type, a probability of appearance in each frame of image, two-dimensional bounding box information of the target, two-dimensional coordinates, and the like.
Step S123: and acquiring three-dimensional information of the targets in each frame of image according to the depth image, the pose of the image acquisition equipment, the parameters of the image acquisition equipment and the two-dimensional information of each target, wherein the three-dimensional information comprises three-dimensional coordinate information and three-dimensional bounding box information of the targets.
The coordinates of the target in three-dimensional space and its three-dimensional bounding box information are calculated using the target type and two-dimensional bounding box identified by target detection, combined with the depth image, the pose of the image acquisition equipment and the parameters of the image acquisition equipment. In the embodiment of the invention, calculating the three-dimensional bounding box information can be divided into the following two cases: for a target of small volume, the three-dimensional bounding box can be approximated from the two-dimensional bounding box; for a target of large volume, the three-dimensional bounding box can be set through two-dimensional-code labeling. A sketch of the back-projection for the small-volume case follows.
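A minimal sketch of this back-projection, assuming a pinhole camera model with intrinsics (fx, fy, cx, cy), a 4x4 camera pose T_world_cam in the first-frame coordinate system, and a median-depth heuristic; the box-extent approximation from the 2D box size is likewise an assumption, not the patent's prescribed formula.

```python
# Hedged sketch: back-projecting a detected 2D bounding box into 3D using the
# depth image, pinhole intrinsics, and the camera pose.
import numpy as np

def bbox_to_3d(bbox, depth_image, intrinsics, T_world_cam):
    """bbox = (u, v, w, h) in pixels; depth in meters; T_world_cam is 4x4."""
    fx, fy, cx, cy = intrinsics
    u, v, w, h = bbox
    patch = depth_image[v:v + h, u:u + w]
    valid = patch[patch > 0]
    if valid.size == 0:
        return None
    z = float(np.median(valid))                # robust depth of the target
    uc, vc = u + w / 2.0, v + h / 2.0          # bbox center pixel
    p_cam = np.array([(uc - cx) * z / fx, (vc - cy) * z / fy, z, 1.0])
    p_world = T_world_cam @ p_cam              # into first-frame coordinates
    # Approximate 3D box extents from the 2D box size at depth z.
    extents = (w * z / fx, h * z / fy, float(valid.max() - valid.min()))
    return p_world[:3], extents
```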
In a specific embodiment, as shown in fig. 5, the step of obtaining the optimal scene graph information according to the two-dimensional information, the three-dimensional information, and the prior knowledge information of each target in each frame of image includes:
step S131: and estimating the relation between the objects and the probability thereof according to the three-dimensional information of each object in each frame of image, wherein the probability comprises the probability of the object appearing in each frame of image, the probability of the object attribute and the probability of the object relation.
The relationships between the objects in each frame of image, and their probabilities, are estimated using the obtained three-dimensional coordinate information and three-dimensional bounding box information of the targets. The estimation of the relationships between targets can be divided into the following cases: 1. judging whether a supported target is above a supporting target, so as to judge whether a support relationship exists between the targets; 2. judging whether a contained target is inside a containing target, so as to judge whether a containment relationship exists between the targets; 3. judging whether targets are unrelated by checking whether neither a support relationship nor a containment relationship exists between them. The sketch below illustrates such tests.
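A hedged sketch of these relation tests on axis-aligned 3D bounding boxes, each given as (center, extents) in world coordinates; the tolerance values are assumptions, since the patent does not specify thresholds.

```python
# Hedged sketch: support/containment tests on axis-aligned 3D boxes.
def _bounds(box):
    (cx, cy, cz), (ex, ey, ez) = box
    return (cx - ex/2, cx + ex/2, cy - ey/2, cy + ey/2, cz - ez/2, cz + ez/2)

def supports(lower, upper, z_tol=0.05):
    """True if `upper` rests on top of `lower` with overlapping footprints."""
    lx0, lx1, ly0, ly1, _, lz1 = _bounds(lower)
    ux0, ux1, uy0, uy1, uz0, _ = _bounds(upper)
    footprint_overlap = ux0 < lx1 and lx0 < ux1 and uy0 < ly1 and ly0 < uy1
    return footprint_overlap and abs(uz0 - lz1) <= z_tol

def contains(outer, inner, margin=0.02):
    """True if `inner` lies entirely within `outer` (with a small margin)."""
    ob, ib = _bounds(outer), _bounds(inner)
    return all(ib[i] >= ob[i] - margin for i in (0, 2, 4)) and \
           all(ib[i] <= ob[i] + margin for i in (1, 3, 5))

def unrelated(a, b):
    """Neither support nor containment relation holds in either order."""
    return not (supports(a, b) or supports(b, a)
                or contains(a, b) or contains(b, a))
```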
Step S132: and generating a scene graph of each frame of image according to the relation between the corresponding targets in each frame of image and the two-dimensional information of the targets.
Step S133: and optimizing each frame of image according to the prior common sense information of the targets, the relationship between the targets and the probability of the targets to obtain the optimal scene graph information.
And generating a scene graph of each frame of image according to the relationship among the corresponding targets in each frame of image, the two-dimensional information (the attributes and the types of the targets) of the targets and the structural expression of the prior common sense information of the targets.
Maximum a posteriori reasoning is carried out on the target relationships, attributes and the like in each frame's scene graph to obtain the optimized object types, attributes and relationships; the maximum a posteriori reasoning can be implemented according to formula (1):
$PG^{*} = \arg\max_{PG} \; p(PG \mid \Gamma, g_{\varepsilon})$ (1)

where $PG$ is the parse graph corresponding to the image information, $p(PG \mid \Gamma, g_{\varepsilon})$ is the posterior probability, $g_{\varepsilon}$ is the and-or graph stochastic grammar, and $\Gamma$ is the input image data, comprising the object types $\Gamma_{O}$, the three-dimensional spatial relationships $\Gamma_{S}$, and the object attributes $\Gamma_{A}$.
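As a hedged illustration of this inference step (reusing the toy scene_prior and relation_prior from the and-or graph sketch above), the brute-force sketch below scores a small set of candidate parse graphs and returns the maximizer of the posterior. Treating the posterior as a product of detection likelihoods and prior terms is an assumption; the patent only states that $p(PG \mid \Gamma, g_{\varepsilon})$ is maximized.

```python
# Hedged sketch: brute-force MAP inference over candidate parse graphs.
import math

def log_posterior(parse_graph, scene, detections):
    """parse_graph: {'labels': {det_id: class}, 'relations': [(s, rel, o)]}."""
    score = 0.0
    for det_id, cls in parse_graph["labels"].items():
        # detection likelihood from the 2D detector
        score += math.log(max(detections[det_id]["class_probs"].get(cls, 1e-9), 1e-9))
        # occurrence prior from the and-or graph sketch above
        score += math.log(max(scene_prior[scene]["objects"]
                              .get(cls, {}).get("p_occur", 1e-9), 1e-9))
    for s, rel, o in parse_graph["relations"]:
        score += math.log(max(relation_prior(scene, s, rel, o), 1e-9))
    return score

def map_inference(candidates, scene, detections):
    """Return the candidate parse graph with the highest posterior score."""
    return max(candidates, key=lambda pg: log_posterior(pg, scene, detections))
```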
In a specific embodiment, as shown in fig. 6, the step of processing a scene graph formed by a preset number of frames to be optimized to generate a local scene graph, and merging and updating the local scene graph to generate a global scene graph includes:
step S141: and storing the initial scene graph information of the frames to be optimized in a preset number into the group to be optimized.
In the embodiment of the invention, the initial scene graph information of a preset number of consecutive frames to be optimized is stored in the group to be optimized, and the capacity of the group to be optimized is set to the preset number; when the number of frames in the group to be optimized exceeds the preset number of frames to be optimized, the initial scene graphs are optimized.
Step S142: and when the occurrence frequency of any target in the initial scene graph of the group to be optimized is smaller than a preset occurrence frequency threshold value, filtering the target from the initial scene graph to generate a filtered scene graph set.
In the embodiment of the invention, when the occurrence frequency of any target in the initial scene graphs of the group to be optimized is less than the preset occurrence frequency threshold, objects that appear only briefly in the scene due to target detection misrecognition are removed, which improves the mapping precision.
Step S143: and recalculating the average value of the three-dimensional information of all the targets in the filtered scene graph set, and generating a local scene graph according to the recalculated target three-dimensional information average value.
In the embodiment of the invention, after the briefly appearing objects are removed, the mean values of the three-dimensional coordinates and bounding boxes of each target in the group to be optimized are recalculated over the filtered scene graph set and taken as the optimized three-dimensional coordinates and three-dimensional bounding boxes of the targets, and the local scene graph is generated from these optimized three-dimensional coordinates and bounding boxes, as in the sketch below.
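A minimal sketch of this frame-group filtering and averaging, assuming each per-frame scene graph maps a tracked target ID to its class and 3D center; the group size and frequency threshold are assumptions.

```python
# Hedged sketch: filtering briefly-seen targets out of a group of per-frame
# scene graphs and averaging the survivors' 3D information.
import numpy as np
from collections import defaultdict

def build_local_scene_graph(frame_graphs, min_occurrences=3):
    """frame_graphs: list of {target_id: {'class': str, 'center': (x, y, z)}}."""
    observations = defaultdict(list)
    for graph in frame_graphs:
        for tid, info in graph.items():
            observations[tid].append(info)
    local = {}
    for tid, obs in observations.items():
        if len(obs) < min_occurrences:        # likely a misdetection: drop it
            continue
        centers = np.array([o["center"] for o in obs])
        local[tid] = {"class": obs[0]["class"],
                      "center": centers.mean(axis=0)}   # averaged 3D position
    return local
```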
Step S144: and merging and updating the plurality of local scene graphs according to the target coordinate information, the target category information and the target information in the generated global scene graph to generate the global scene graph.
The embodiment of the invention performs target relation calculation again using the targets already in the generated global scene graph, and generates a target relation calculation result; according to this result, targets and target relations not yet in the generated global scene graph are added, and old target relations are updated, so that the global scene graph is generated in real time. A sketch of one possible merge rule follows.
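A hedged sketch of merging a local scene graph into the global one by matching target category plus nearby coordinates; the 0.3 m association radius and the center-averaging update are assumptions, since the patent states only that merging uses target coordinate information, target category information and the targets already in the global scene graph.

```python
# Hedged sketch: merging a local scene graph into the global scene graph.
import numpy as np

def merge_into_global(global_graph, local_graph, match_radius=0.3):
    for tid, tgt in local_graph.items():
        match = None
        for gid, gtgt in global_graph.items():
            same_class = gtgt["class"] == tgt["class"]
            close = np.linalg.norm(np.asarray(gtgt["center"])
                                   - np.asarray(tgt["center"])) < match_radius
            if same_class and close:
                match = gid
                break
        if match is None:
            global_graph[f"g_{tid}"] = dict(tgt)       # new target
        else:                                          # update existing target
            old = np.asarray(global_graph[match]["center"])
            new = np.asarray(tgt["center"])
            global_graph[match]["center"] = (old + new) / 2.0
    return global_graph
```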
In a specific embodiment, the step of processing a scene graph formed by a preset number of frames to be optimized to generate a local scene graph, and merging and updating the local scene graph to generate a global scene graph further includes:
and when the number of the frames in the group to be optimized exceeds the preset number of the frames to be optimized, filtering and optimizing the scene graph of the key frame group to generate a local scene graph.
According to the cognitive navigation method expressed by the structured scene, the target scene graph, the image sequence and the target prior knowledge are combined to obtain the local scene graph, and the local scene graphs are combined and updated to obtain the global scene graph, so that the three-dimensional scene graph construction can be carried out on multiple scenes and scenes containing finer granularity, and the ordering of the targets in the three-dimensional scene graph and the navigation accuracy are improved; the structural expression of the target prior knowledge is introduced, so that more scene information can be acquired in a limited perception information range, and the image construction effect is further optimized for the scene image generated by detection; by acquiring the two-dimensional information of the target and the pose of the image acquisition equipment and combining the depth image, the method estimates the structural information containing the relationship among the object attribute, the three-dimensional coordinate and the target, and obtains the optimal scene graph information according to the prior common knowledge information of the target, thereby improving the accuracy of scene graph construction.
Example 2
An embodiment of the present invention provides a cognitive navigation system expressed in a structured scene, as shown in fig. 7, including:
the image and image sequence acquisition module 1 is used for acquiring a target scene image by using image acquisition equipment to obtain a corresponding image sequence, wherein the image comprises a depth image and a color image, and the image sequence comprises a depth image sequence and a color image sequence; this module executes the method described in step S1 in embodiment 1, and is not described herein again.
The two-dimensional information and three-dimensional information acquisition module 2 is used for acquiring two-dimensional information and three-dimensional information of each target in each frame of image by using the target scene image, the image sequence and the parameters of the image acquisition equipment; this module executes the method described in step S2 in embodiment 1, and is not described herein again.
The optimal scene graph information acquisition module 3 is used for acquiring optimal scene graph information according to two-dimensional information, three-dimensional information and target prior common knowledge information of each target in each frame of image; this module executes the method described in step S3 in embodiment 1, and is not described herein again.
The global scene graph generating module 4 is configured to process scene graphs formed by a preset number of frames to be optimized to generate local scene graphs, and merge and update the local scene graphs to generate a global scene graph; this module executes the method described in step S4 in embodiment 1, and is not described herein again.
The path planning and navigation module 5 is used for acquiring target coordinates according to target information in the global scene graph, planning a path according to the target coordinates and navigating; this module executes the method described in step S5 in embodiment 1, and is not described herein again.
The cognitive navigation system expressed by the structured scene combines the target scene graph, the image sequence and the target prior knowledge to obtain the local scene graph, and combines and updates the local scene graph to obtain the global scene graph, so that the three-dimensional scene graph construction can be carried out on multiple scenes and scenes containing finer granularity, and the ordering of the targets in the three-dimensional scene graph and the navigation accuracy are improved; the structural expression of the target prior knowledge is introduced, so that more scene information can be acquired in a limited perception information range, and the image construction effect is further optimized for the scene image generated by detection; by acquiring the two-dimensional information of the target and the pose of the image acquisition equipment and combining the depth image, the method estimates the structural information containing the relationship among the object attribute, the three-dimensional coordinate and the target, and obtains the optimal scene graph information according to the prior common knowledge information of the target, thereby improving the accuracy of scene graph construction.
Example 3
An embodiment of the present invention provides a terminal device, as shown in fig. 8, including: at least one processor 401, such as a CPU (Central Processing Unit), at least one communication interface 403, a memory 404 and at least one communication bus 402, where the communication bus 402 is used to enable communication between these components. The communication interface 403 may include a display and a keyboard, and optionally may also include a standard wired interface and a standard wireless interface. The memory 404 may be a RAM (Random Access Memory) or a non-volatile memory, such as at least one disk memory; optionally, it may also be at least one storage device located remotely from the processor 401. A set of program codes is stored in the memory 404, and the processor 401 invokes the program codes stored in the memory 404 to perform the cognitive navigation method of the structured scene expression of embodiment 1.
The communication bus 402 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The communication bus 402 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one line is shown in FIG. 8, but this does not represent only one bus or one type of bus.
The memory 404 may include a volatile memory, such as a random-access memory (RAM); it may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory 404 may also comprise a combination of the above kinds of memory.
The processor 401 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of a CPU and an NP.
The processor 401 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The aforementioned PLD may be a Complex Programmable Logic Device (CPLD), a field-programmable gate array (FPGA), a General Array Logic (GAL), or any combination thereof.
Optionally, the memory 404 is also used to store program instructions. The processor 401 may call a program instruction to implement the cognitive navigation method of implementing the structured scene representation in embodiment 1.
The embodiment of the invention also provides a computer-readable storage medium on which computer-executable instructions are stored; the computer-executable instructions can execute the cognitive navigation method expressed by the structured scene in embodiment 1. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the storage medium may also comprise a combination of the above kinds of memory.
It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaust all embodiments here. Obvious variations or modifications derived from the above remain within the protection scope of the invention.

Claims (10)

1. A cognitive navigation method expressed by a structured scene is characterized by comprising the following steps:
acquiring a target scene image by using image acquisition equipment to obtain a corresponding image sequence, wherein the image comprises a depth image and a color image, and the image sequence comprises a depth image sequence and a color image sequence;
obtaining two-dimensional information and three-dimensional information of each target in each frame of image by using the target scene image, the image sequence and the parameters of the image acquisition equipment;
obtaining optimal scene graph information according to two-dimensional information, three-dimensional information and target prior common knowledge information of each target in each frame of image;
processing scene graphs formed by a preset number of frames to be optimized to generate local scene graphs, and merging and updating the local scene graphs to generate a global scene graph;
and acquiring target coordinates according to the target information in the global scene graph, planning a path according to the target coordinates and navigating.
2. The method of cognitive navigation expressed in a structured scene according to claim 1, wherein the target prior knowledge information is obtained before obtaining the target scene image, and the process of obtaining the target prior knowledge information comprises:
screening and cleaning the preset target data set, classifying the screened and cleaned preset target data set according to a preset scene image classification method, and generating various types of scene images;
counting the occurrence probability of the target in each type of scene image, the probability of the target attribute and the probability of the relationship;
constructing an and-or graph structure according to the attributes of the targets and the relationship between the targets;
and filling the probability of the target in each type of scene image, the probability of the target attribute and the probability of the relationship into an and-or graph structure to generate target prior general knowledge information.
3. The cognitive navigation method expressed in a structured scene according to claim 1, wherein the step of obtaining the two-dimensional information and the three-dimensional information of each target in each frame of image by using the images of the target scene, the image sequence and the parameters of the image acquisition device comprises:
estimating the pose of the image acquisition equipment by using an image sequence and an SLAM method, and acquiring the pose of the image acquisition equipment by using the position of the first frame image as a coordinate origin;
detecting all targets in each frame of image by using a color image and target detection method, and acquiring two-dimensional information of each target in each frame of image;
and acquiring three-dimensional information of the target in each frame of image according to the depth image, the pose of the image acquisition equipment, the parameter of the image acquisition equipment and the two-dimensional information of each target, wherein the three-dimensional information comprises three-dimensional coordinate information and three-dimensional bounding box information of the target.
4. The cognitive navigation method expressed in the structured scene according to claim 1, wherein the step of obtaining the optimal scene graph information according to the two-dimensional information, the three-dimensional information and the prior common sense information of each target in each frame of image comprises:
estimating the relation between the targets and the probability thereof according to the three-dimensional information of each target in each frame of image, wherein the probability comprises the probability of the occurrence of the target in each frame of image, the probability of the attribute of the target and the probability of the relation of the target;
generating a scene graph of each frame of image according to the relation between corresponding targets in each frame of image and the two-dimensional information of the targets;
and optimizing each frame of image according to the prior common sense information of the targets, the relationship between the targets and the probability of the targets to obtain the optimal scene graph information.
5. The cognitive navigation method expressed by a structured scene according to claim 1, wherein the step of processing the scene graph formed by a preset number of frames to be optimized to generate a local scene graph, and merging and updating the local scene graph to generate a global scene graph includes:
storing initial scene graph information of a preset number of frames to be optimized into a group to be optimized;
when the occurrence frequency of any target in the initial scene graph of the group to be optimized is smaller than a preset occurrence frequency threshold value, filtering the target from the initial scene graph to generate a filtered scene graph set;
recalculating the average value of the three-dimensional information of all targets in the filtered scene graph set, and generating a local scene graph according to the recalculated target three-dimensional information average value;
and merging and updating the plurality of local scene graphs according to the target coordinate information, the target category information and the target information in the generated global scene graph to generate the global scene graph.
6. The cognitive navigation method expressed by a structured scene according to claim 1, wherein the step of processing the scene graph formed by a preset number of frames to be optimized to generate a local scene graph, and merging and updating the local scene graph to generate a global scene graph further comprises:
and when the number of the frames in the group to be optimized exceeds the preset number of the frames to be optimized, filtering and optimizing the scene graph of the key frame group to generate a local scene graph.
7. The cognitive navigation method of a structured scene representation according to claim 1, wherein the step of updating the plurality of local scene graphs comprises:
carrying out target relation calculation again by using the generated targets in the global scene graph, and generating a target relation calculation result;
and adding, according to the target relation calculation result, a new target or a new target relation that is not yet in the generated global scene graph, or updating an old target relation.
8. A cognitive navigation system for structured scene representation, comprising:
the image and image sequence acquisition module is used for acquiring a target scene image by using image acquisition equipment to obtain a corresponding image sequence, wherein the image comprises a depth image and a color image, and the image sequence comprises a depth image sequence and a color image sequence;
the two-dimensional information and three-dimensional information acquisition module is used for acquiring two-dimensional information and three-dimensional information of each target in each frame of image by using the target scene image, the image sequence and the parameters of the image acquisition equipment;
the optimal scene image information acquisition module is used for acquiring optimal scene image information according to two-dimensional information, three-dimensional information and target prior common knowledge information of each target in each frame of image;
the global scene graph generating module is used for processing scene graphs formed by a preset number of frames to be optimized to generate local scene graphs, and merging and updating the local scene graphs to generate global scene graphs;
and the path planning and navigation module is used for acquiring target coordinates according to the target information in the global scene graph, planning a path according to the target coordinates and navigating.
9. A terminal device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for cognitive navigation of a structured scene representation as recited in any one of claims 1-8.
10. A computer-readable storage medium having stored thereon computer instructions for causing a computer to execute the method for cognitive navigation of a structured scene representation according to any one of claims 1 to 8.
CN202010166282.8A 2020-03-11 2020-03-11 Cognitive navigation method and system for structured scene expression Active CN111369688B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010166282.8A CN111369688B (en) 2020-03-11 2020-03-11 Cognitive navigation method and system for structured scene expression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010166282.8A CN111369688B (en) 2020-03-11 2020-03-11 Cognitive navigation method and system for structured scene expression

Publications (2)

Publication Number Publication Date
CN111369688A 2020-07-03
CN111369688B 2023-05-09

Family

ID=71211743

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010166282.8A Active CN111369688B (en) 2020-03-11 2020-03-11 Cognitive navigation method and system for structured scene expression

Country Status (1)

Country Link
CN (1) CN111369688B

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120105602A1 (en) * 2010-11-03 2012-05-03 3Dmedia Corporation Methods, systems, and computer program products for creating three-dimensional video sequences
CN102521128A (en) * 2011-12-08 2012-06-27 华中科技大学 Software fault tolerance method facing cloud platform
CN103487059A (en) * 2013-09-25 2014-01-01 中国科学院深圳先进技术研究院 Positioning and navigation system, device and method
CN103984696A (en) * 2014-03-28 2014-08-13 广东顺德中山大学卡内基梅隆大学国际联合研究院 Expression storage method of animation data based on graph grammar
CN105844292A (en) * 2016-03-18 2016-08-10 南京邮电大学 Image scene labeling method based on conditional random field and secondary dictionary study
US20180232947A1 (en) * 2017-02-11 2018-08-16 Vayavision, Ltd. Method and system for generating multidimensional maps of a scene using a plurality of sensors of various types
US20190051056A1 (en) * 2017-08-11 2019-02-14 Sri International Augmenting reality using semantic segmentation
CN108873908A (en) * 2018-07-12 2018-11-23 重庆大学 The robot city navigation system that view-based access control model SLAM and network map combine
CN109359564A (en) * 2018-09-29 2019-02-19 中山大学 A kind of image scene drawing generating method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115311424A (en) * 2022-08-02 2022-11-08 深圳市华赛睿飞智能科技有限公司 Three-dimensional reconstruction method and device of target scene, unmanned aerial vehicle and storage medium
CN115311424B (en) * 2022-08-02 2023-04-07 深圳市华赛睿飞智能科技有限公司 Three-dimensional reconstruction method and device of target scene, unmanned aerial vehicle and storage medium
CN115187667A (en) * 2022-09-08 2022-10-14 中国科学院合肥物质科学研究院 Cognitive understanding-based large-scene accurate positioning method and system
CN115187667B (en) * 2022-09-08 2022-12-20 中国科学院合肥物质科学研究院 Cognitive understanding-based large scene accurate positioning method and system

Also Published As

Publication number Publication date
CN111369688B 2023-05-09

Similar Documents

Publication Publication Date Title
CN109737974B (en) 3D navigation semantic map updating method, device and equipment
CN111563442A (en) Slam method and system for fusing point cloud and camera image data based on laser radar
CN111735446B (en) Laser and visual positioning fusion method and device
JP7201909B2 (en) DATASET CREATION METHOD, DATASET CREATION DEVICE, AND DATASET CREATION PROGRAM
CN111695622A (en) Identification model training method, identification method and device for power transformation operation scene
US11468261B2 (en) Information processing apparatus, image processing method, and computer-readable recording medium recording image processing program
CN110226186A (en) A kind of method and apparatus for indicating map element and vehicle/robot method and apparatus are positioned based on this
CN112883820A (en) Road target 3D detection method and system based on laser radar point cloud
CN111369688B (en) Cognitive navigation method and system for structured scene expression
US20220067012A1 (en) Methods, systems and apparatus to improve spatial-temporal data management
CN112462758B (en) Drawing establishing method and device, computer readable storage medium and robot
CN113724388B (en) High-precision map generation method, device, equipment and storage medium
US20220335635A1 (en) Method and system for location detection of photographs using topographic techniques
WO2023045798A1 (en) Method and apparatus for identifying aisle area
CN115719436A (en) Model training method, target detection method, device, equipment and storage medium
CN115908988B (en) Defect detection model generation method, device, equipment and storage medium
US20220165056A1 (en) Fall detection and assistance
Fielding et al. Weighted matchings for dense stereo correspondence
CN113822996A (en) Pose estimation method and device for robot, electronic device and storage medium
CN111813882B (en) Robot map construction method, device and storage medium
JP2019185487A (en) Learning data generation device, change region detection method, and computer program
CN117292076A (en) Dynamic three-dimensional reconstruction method and system for local operation scene of engineering machinery
CN111538918A (en) Recommendation method and device, electronic equipment and storage medium
CN112819700A (en) Denoising method and device for point cloud data and readable storage medium
CN111784579B (en) Drawing method and device

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant