CN115170637A - Virtual visual angle image construction method and device, control equipment and readable storage medium

Virtual visual angle image construction method and device, control equipment and readable storage medium

Info

Publication number
CN115170637A
Authority
CN
China
Prior art keywords
virtual
scene
binocular
eye
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210804941.5A
Other languages
Chinese (zh)
Inventor
潘柏宇
庞建新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ubtech Technology Co ltd
Original Assignee
Shenzhen Ubtech Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Ubtech Technology Co ltd filed Critical Shenzhen Ubtech Technology Co ltd
Priority to CN202210804941.5A priority Critical patent/CN115170637A/en
Publication of CN115170637A publication Critical patent/CN115170637A/en
Pending legal-status Critical Current

Classifications

    • G06T 7/97: Image analysis; determining parameters from multiple pictures
    • G06N 3/02: Computing arrangements based on biological models; neural networks
    • G06N 3/08: Neural networks; learning methods
    • G06T 3/4038: Geometric image transformations in the plane of the image; scaling of whole images or parts thereof; image mosaicing, e.g. composing plane images from plane sub-images
    • G06T 5/77: Image enhancement or restoration; retouching; inpainting; scratch removal
    • G06T 7/593: Depth or shape recovery from multiple images; from stereo images
    • G06T 2200/32: Indexing scheme for image data processing or generation, in general; involving image mosaicing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The application provides a virtual visual angle image construction method and device, control equipment and a readable storage medium, and relates to the technical field of computer vision. After a stereoscopic depth distribution map between a target robot and a target scene is determined from the real binocular scene graph shot by the target robot at the current real visual angle, the stereoscopic depth distribution map is converted, according to the current camera focal length of the target robot and the virtual camera baseline distance corresponding to a target virtual visual angle, into a virtual binocular disparity map. Image offset is then performed on the basis of the real binocular scene graph according to the virtual binocular disparity map to obtain an initial virtual binocular scene graph, and the initial virtual binocular scene graph is restored according to the stereoscopic depth distribution map to obtain a target virtual binocular scene graph. In this way, binocular scene images with good image texture consistency at different visual angles are quickly constructed without adjusting the real camera visual angle, and a binocular training image set is quickly constructed at low cost.

Description

Virtual visual angle image construction method and device, control equipment and readable storage medium
Technical Field
The application relates to the technical field of computer vision, in particular to a virtual visual angle image construction method and device, control equipment and a readable storage medium.
Background
With the continuous development of science and technology, computer vision technology has great research and application value and is widely adopted across industries to realize various business functions (for example, robot obstacle avoidance during movement, three-dimensional video construction, and three-dimensional model construction). In practical applications, such a business function is usually realized by a neural network model obtained through model training on a large training image set, and whether the data volume of the training image set is sufficient seriously affects the performance of the neural network model.
For a binocular robot, the required business functions are usually implemented on the basis of the binocular scene images acquired by the robot's binocular camera, and a large binocular training image set needs to be obtained for model training of the corresponding business function. However, a binocular robot usually has only one binocular camera position configuration (i.e., the observation visual angle of the binocular robot is fixed), whereas the binocular training image set needs to consist of binocular training images from different observation visual angles. Constructing the binocular training image set through binocular image acquisition therefore requires a large cost to adjust the observation visual angle of the robot, and after each adjustment an epipolar-alignment image rectification operation has to be performed on the captured binocular scene images before the binocular training images at that observation visual angle can be obtained. As a result, the acquisition-based training set construction scheme cannot quickly construct the desired binocular training image set, and the construction cost of the training image set is extremely high.
Disclosure of Invention
In view of this, an object of the present application is to provide a virtual perspective image construction method and apparatus, a control device, and a readable storage medium, which can quickly construct binocular scene images of a robot with good image texture consistency at different perspectives without adjusting the position configuration of the robot's real camera, so as to quickly construct a desired binocular training image set and reduce the construction cost of the training image set.
In order to achieve the above purpose, the embodiments of the present application employ the following technical solutions:
in a first aspect, the present application provides a method for constructing a virtual perspective image, where the method includes:
acquiring a real binocular scene graph shot by a target robot aiming at a target scene under a current real visual angle;
determining a stereoscopic depth distribution map between the target robot and the target scene according to the real binocular scene map;
according to the current camera focal length of the target robot and the virtual camera baseline distance corresponding to the target virtual visual angle, performing depth parallax conversion on the stereoscopic depth distribution diagram to obtain a virtual binocular parallax diagram of the target robot for a target scene under the target virtual visual angle;
performing image offset on the real binocular scene graph according to the virtual binocular disparity map to obtain an initial virtual binocular scene graph matched with the target virtual visual angle;
and performing image restoration on the initial virtual binocular scene graph according to the stereo depth distribution graph to obtain a target virtual binocular scene graph.
In an optional embodiment, the real binocular scene graph includes a real left-eye scene graph and a real right-eye scene graph, and the step of determining the stereoscopic depth distribution map between the target robot and the target scene according to the real binocular scene graph includes:
respectively extracting features of the real left-eye scene graph and the real right-eye scene graph to obtain a corresponding left-eye scene feature graph and a corresponding right-eye scene feature graph;
determining a disparity estimation distribution diagram of a plurality of pre-stored disparity values between the left-eye scene characteristic diagram and the right-eye scene characteristic diagram;
performing image splicing on the parallax estimation distribution maps corresponding to all the parallax values, and calling a pre-stored parallax regression model to perform parallax regression on the binocular parallax distribution map obtained through splicing, to obtain a real binocular parallax map corresponding to the current real visual angle;
and performing parallax depth conversion on the real binocular parallax map according to the current camera focal length of the target robot and the real camera baseline distance corresponding to the current real visual angle to obtain the stereo depth distribution map.
In an optional embodiment, the step of determining a disparity estimation distribution map of each of the pre-stored multiple disparity values between the left-eye scene feature map and the right-eye scene feature map includes:
aiming at each preset parallax value, performing characteristic offset on the right-eye scene characteristic diagram according to the parallax value under a characteristic capture window of the right-eye scene characteristic diagram to obtain a corresponding offset characteristic diagram;
image splicing is carried out on the offset characteristic image corresponding to the parallax value and the real left-eye scene image, and a binocular parallax characteristic image corresponding to the parallax value is obtained;
and calling a pre-stored parallax matching network model based on a Transformer model, and performing pixel level feature matching on the binocular parallax feature map corresponding to the parallax value to obtain a parallax estimation distribution map corresponding to the parallax value.
In an optional embodiment, the step of calling a pre-stored parallax regression model to perform parallax regression on the binocular parallax distribution map obtained through splicing to obtain a real binocular parallax map corresponding to the current real viewing angle includes:
inputting the binocular disparity distribution map into the disparity regression model to perform disparity distribution redistribution processing, so as to obtain, at each pixel point in the real binocular disparity map, all possible disparity values and the respective occurrence probabilities of those disparity values;
and for each pixel point in the real binocular disparity map, performing a weighted summation over all possible disparity values corresponding to the pixel point and their respective occurrence probabilities, to obtain the real disparity value of the real binocular disparity map at that pixel point.
In an alternative embodiment, the disparity depth correlation required to perform the disparity depth conversion operation or the depth disparity conversion operation is expressed by the following equation:
Z = b × f / D, that is, D = b × f / Z
wherein D is used for representing the parallax value of the photographed object between the left-eye camera and the right-eye camera of the target robot which are at the same base line, b is used for representing the camera base line distance between the left-eye camera and the right-eye camera of the target robot, f is used for representing the camera focal length of the target robot, and Z is used for representing the stereo depth value between the photographed object and the target robot.
In an optional embodiment, the initial virtual binocular scene graph includes an initial virtual left-eye scene graph, the target virtual binocular scene graph includes a target virtual left-eye scene graph, and the step of performing image restoration on the initial virtual left-eye scene graph according to the stereoscopic depth distribution map to obtain the target virtual left-eye scene graph includes:
determining a stereoscopic depth value of each effective pixel point in the initial virtual left-eye scene image according to the stereoscopic depth distribution map;
for each first pixel point to be repaired in the initial virtual left-eye scene graph, calling a left-eye repairing template to determine a plurality of first effective reference pixel points around the first pixel point to be repaired in the initial virtual left-eye scene graph;
screening out, in the initial virtual left-eye scene graph, a plurality of first calibration pixel points whose relative position distribution, depth levels and color features are consistent with those of the plurality of first effective reference pixel points, according to the stereoscopic depth value and pixel position of each effective pixel point in the initial virtual left-eye scene graph, wherein two effective pixel points belong to different depth levels if the absolute value of the difference between their stereoscopic depth values is greater than a preset distance threshold;
calling the left-eye repairing template to search a first effective replacement pixel point in the initial virtual left-eye scene graph, wherein the first effective replacement pixel point is surrounded by the first calibration pixel points, and adopting the first effective replacement pixel point to perform pixel replacement processing on the first pixel point to be repaired to obtain an effective pixel point corresponding to the first pixel point to be repaired;
and taking the initial virtual left-eye scene graph after the pixel replacement operation of all the first pixel points to be repaired is completed as the target virtual left-eye scene graph.
In an optional embodiment, the initial virtual binocular scene graph further includes an initial virtual right-eye scene graph, the target virtual binocular scene graph further includes a target virtual right-eye scene graph, and the step of performing image restoration on the initial virtual right-eye scene graph according to the stereoscopic depth distribution map to obtain the target virtual right-eye scene graph includes:
determining the stereoscopic depth value of each effective pixel point in the initial virtual right-eye scene graph according to the stereoscopic depth distribution graph;
for each second pixel point to be repaired in the initial virtual right-eye scene graph, calling a right-eye repairing template to determine a plurality of second effective reference pixel points around the second pixel point to be repaired in the initial virtual right-eye scene graph;
according to the three-dimensional depth value and the pixel position of each effective pixel point in the initial virtual right-eye scene image, screening a plurality of second calibration pixel points of which the relative position distribution, the depth level and the color characteristics are consistent with those of the second effective reference pixel points in the initial virtual right-eye scene image;
calling the right-eye repairing template to search second effective replacement pixel points in the initial virtual right-eye scene graph, wherein the second effective replacement pixel points are surrounded by the plurality of second calibration pixel points, and adopting the second effective replacement pixel points to perform pixel replacement processing on the second pixel points to be repaired to obtain effective pixel points corresponding to the second pixel points to be repaired;
and taking the initial virtual right-eye scene graph after the pixel replacement operation of all the second pixel points to be repaired is completed as the target virtual right-eye scene graph.
In a second aspect, the present application provides a virtual perspective image construction apparatus, the apparatus comprising:
the real scene graph acquisition module is used for acquiring a real binocular scene graph shot by the target robot aiming at a target scene under a current real visual angle;
the stereoscopic depth map determining module is used for determining a stereoscopic depth distribution map between the target robot and the target scene according to the real binocular scene map;
the virtual parallax image conversion module is used for performing depth parallax conversion on the stereoscopic depth distribution map according to the current camera focal length of the target robot and the virtual camera baseline distance corresponding to the target virtual visual angle to obtain a virtual binocular parallax image of the target robot for a target scene under the target virtual visual angle;
the virtual scene graph building module is used for carrying out image offset on the real binocular scene graph according to the virtual binocular disparity map to obtain an initial virtual binocular scene graph matched with a target virtual visual angle;
and the virtual scene graph restoration module is used for carrying out image restoration on the initial virtual binocular scene graph according to the stereo depth distribution graph to obtain a target virtual binocular scene graph.
In a third aspect, the present application provides a control device, which includes a processor and a memory, where the memory stores a computer program executable by the processor, and the processor can execute the computer program to implement the virtual perspective image constructing method described in any one of the foregoing embodiments.
In a fourth aspect, the present application provides a readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the virtual perspective image construction method described in any one of the foregoing embodiments is implemented.
The beneficial effects of the embodiments of the present application may include the following:
After a stereoscopic depth distribution map between a target robot and a target scene is determined from the real binocular scene graph shot by the target robot at the current real visual angle, depth parallax conversion is performed on the stereoscopic depth distribution map according to the current camera focal length of the target robot and the virtual camera baseline distance corresponding to a target virtual visual angle, so as to obtain a virtual binocular parallax map of the target robot for the target scene at the target virtual visual angle. Image offset is then performed on the basis of the real binocular scene graph according to the virtual binocular parallax map to obtain an initial virtual binocular scene graph matched with the target virtual visual angle, and image restoration is performed on the initial virtual binocular scene graph according to the stereoscopic depth distribution map to obtain a target virtual binocular scene graph with good image texture consistency. In this way, binocular scene images of the corresponding robot with good image texture consistency at different visual angles are quickly constructed without adjusting the position configuration of the robot's real camera, a desired binocular training image set is quickly constructed, and the construction cost of the training image set is reduced.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a schematic composition diagram of a control device provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of a virtual perspective image construction method according to an embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating the sub-steps included in step S220 in FIG. 2;
FIG. 4 is a schematic flow chart of the sub-steps included in the sub-step S222 in FIG. 3;
FIG. 5 is a schematic flow chart of the substeps involved in substep S223 of FIG. 3;
fig. 6 is an effect display diagram of a parallax depth correlation provided in the embodiment of the present application;
FIG. 7 is a flowchart illustrating one of the sub-steps included in step S250 of FIG. 2;
FIG. 8 is a second schematic flowchart of the sub-steps included in step S250 in FIG. 2;
FIG. 9 is a schematic diagram illustrating the image inpainting effect of step S250 in FIG. 2;
fig. 10 is a schematic composition diagram of a virtual perspective image constructing apparatus according to an embodiment of the present application.
Icon: 10-a control device; 11-a memory; 12-a processor; 13-a communication unit; 100-virtual perspective image construction means; 110-a real scene graph acquisition module; 120-a stereoscopic depth map determination module; 130-virtual disparity map conversion module; 140-virtual scene graph building module; 150-virtual scenegraph repair module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, as presented in the figures, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present application, it is to be understood that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a/an ..." does not exclude the presence of another identical element in the process, method, article, or apparatus that comprises the element. The specific meanings of the above terms in the present application can be understood by those of ordinary skill in the art according to the specific circumstances.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a schematic composition diagram of a control device 10 according to an embodiment of the present application. In this embodiment, the control device 10 may be in communication connection with at least one binocular robot and control the at least one binocular robot to perform image acquisition on its current scene at its real camera view angle (i.e., while maintaining the real camera baseline distance), so that, for the real binocular scene graph acquired by each binocular robot, the control device 10 simulates and constructs virtual binocular scene images of that binocular robot at different view angles, thereby rapidly constructing a desired binocular training image set and effectively reducing the construction cost of the training image set. The real camera visual angles of different binocular robots may be the same or different, and the scenes faced by different binocular robots may be the same or different.
In addition, the control device 10 may also be deployed in a single binocular robot, in which case it controls only the binocular robot in which it is located and quickly constructs, for that robot alone, binocular scene images with good image texture consistency at different viewing angles.
In this embodiment, the control device 10 may include a memory 11, a processor 12, a communication unit 13, and a virtual perspective image constructing apparatus 100. Wherein, the respective elements of the memory 11, the processor 12 and the communication unit 13 are electrically connected to each other directly or indirectly to realize the transmission or interaction of data. For example, the memory 11, the processor 12 and the communication unit 13 may be electrically connected to each other through one or more communication buses or signal lines.
In this embodiment, the memory 11 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 11 is used for storing a computer program, and the processor 12 can execute the computer program accordingly after receiving an execution instruction.
In this embodiment, the processor 12 may be an integrated circuit chip having signal processing capability. The processor 12 may be a general-purpose processor, including at least one of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Network Processor (NP), a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, and discrete hardware components. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like, that implements or executes the methods, steps and logic blocks disclosed in the embodiments of the present application.
In this embodiment, the communication unit 13 is configured to establish a communication connection between the control device 10 and one or more binocular robots through a network and to transmit and receive data through the network, where the network includes a wired communication network and a wireless communication network. For example, the control device 10 may control a corresponding binocular robot through the communication unit 13 to perform an image acquisition operation.
In the present embodiment, the virtual perspective image constructing apparatus 100 includes at least one software function module that can be stored in the memory 11 in the form of software or firmware or solidified in the operating system of the control device 10. The processor 12 may be used to execute executable modules stored in the memory 11, such as software functional modules and computer programs included in the virtual perspective image construction apparatus 100. The control device 10 rapidly simulates and constructs binocular scene images corresponding to the robot with good image texture consistency at different visual angles through the virtual visual angle image construction apparatus 100 without adjusting the position configuration of the real camera of the robot, so as to rapidly construct an expected binocular training image set and reduce the construction cost of the training image set.
It is to be understood that the block diagram shown in fig. 1 is only one constituent schematic diagram of the control device 10, and that the control device 10 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
In the present application, in order to ensure that the control device 10 can quickly construct binocular scene images corresponding to a robot with good image texture consistency under different viewing angles without adjusting the position configuration of a real camera of the robot, so as to quickly construct an expected binocular training image set and reduce the construction cost of the training image set, the embodiments of the present application achieve the foregoing objects by providing a virtual perspective image construction method. The following describes in detail a virtual perspective image construction method provided by the present application.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating a virtual perspective image construction method according to an embodiment of the present disclosure. In the embodiment of the present application, the virtual perspective image construction method may include steps S210 to S250.
Step S210, acquiring a real binocular scene image shot by the target robot aiming at the target scene under the current real visual angle.
In this embodiment, the real binocular scene graph includes a real left-eye scene graph and a real right-eye scene graph, and the control device 10 may control the target robot to go to the target scene, and control the target robot to perform image acquisition on the target scene by using its own binocular camera at the current real viewing angle (i.e., the current real camera baseline distance), so as to obtain the real binocular scene graph.
And step S220, determining a three-dimensional depth distribution diagram between the target robot and the target scene according to the real binocular scene diagram.
In this embodiment, after obtaining the real binocular scene graph matched with the current real viewing angle of the target robot, the control device 10 may determine a binocular disparity map between the real right-eye scene graph and the real left-eye scene graph in the real binocular scene graph by calling a pre-stored disparity construction neural network model based on the cost volume, and perform disparity depth conversion on the binocular disparity map in combination with the current camera focal length of the target robot and the real camera baseline distance corresponding to the current real viewing angle, so as to obtain the stereoscopic depth distribution map. And the stereoscopic depth distribution map is used for representing the specific distribution condition of the distance information between the target robot and each object in the target scene.
It is worth noting that a cost-volume-based disparity construction neural network model usually requires a large amount of memory, and a large amount of computing resources is consumed at run time to obtain the final result, so such a model suffers from high computational complexity and high memory consumption.
Therefore, the embodiments of the present application further provide a scheme for determining the stereoscopic depth distribution map that has low computational complexity and low memory consumption and thus avoids the problems of the cost-volume-based disparity construction neural network model. The specific implementation of this stereoscopic depth distribution map determination scheme is as follows:
referring to fig. 3, fig. 3 is a flowchart illustrating sub-steps included in step S220 in fig. 2. In the embodiment of the present application, the step S220 may include a sub-step S221 to a sub-step S224.
In the substep S221, feature extraction is performed on the real left-eye scene image and the real right-eye scene image, respectively, to obtain a corresponding left-eye scene feature image and a corresponding right-eye scene feature image.
In this embodiment, the control device 10 may perform feature extraction on the real left-eye scene graph and the real right-eye scene graph along the batch (number of image frames), channels (number of image channels), height (number of pixels in the image vertical direction) and width (number of pixels in the image horizontal direction) dimensions, to obtain a right-eye scene feature map matching the real right-eye scene graph and a left-eye scene feature map matching the real left-eye scene graph.
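For illustration only, the following Python sketch (not part of the patent disclosure) shows one possible form of such a shared feature extractor; the use of PyTorch, the layer sizes and the image resolution are assumptions made for the example rather than the patent's actual network.

```python
# Minimal sketch of substep S221 (illustrative assumptions only): a small shared
# convolutional encoder applied to the left-eye and right-eye scene images.
# Tensors follow the batch / channels / height / width layout mentioned above.
import torch
import torch.nn as nn

class SceneFeatureExtractor(nn.Module):
    def __init__(self, in_channels: int = 3, feat_channels: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, feat_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, feat_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        # img: (batch, channels, height, width) -> feature map of the same spatial size
        return self.encoder(img)

extractor = SceneFeatureExtractor()
left_img = torch.rand(1, 3, 240, 320)   # real left-eye scene image (illustrative size)
right_img = torch.rand(1, 3, 240, 320)  # real right-eye scene image
left_feat = extractor(left_img)         # left-eye scene feature map
right_feat = extractor(right_img)       # right-eye scene feature map
```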
In the sub-step S222, a disparity estimation distribution diagram of the pre-stored disparity values between the left-eye scene feature map and the right-eye scene feature map is determined.
In this embodiment, the pre-stored multiple parallax values are used to represent all parallax values that can be supported by the current real viewing angle of the target robot, and the control device 10 may construct a corresponding parallax estimation distribution map for each parallax value between the left-eye scene feature map and the right-eye scene feature map, so as to intuitively know the distribution possibility of the corresponding parallax value between the left-eye scene feature map and the right-eye scene feature map through the parallax estimation distribution map corresponding to each parallax value.
Optionally, referring to fig. 4, fig. 4 is a flowchart illustrating the sub-steps included in the sub-step S222 in fig. 3. In the embodiment of the present application, the sub-step S222 may include sub-steps S2221 to S2223.
And a substep S2221, for each preset disparity value, performing feature offset on the right-eye scene feature map according to the disparity value in the feature capture window of the right-eye scene feature map to obtain a corresponding offset feature map.
In this embodiment, when performing feature offset on a right-eye scene feature map according to a certain disparity value, the control device 10 needs to ensure that the position of a feature capture window corresponding to the right-eye scene feature map is not changed, and at the same time, integrally offsets image features of the right-eye scene feature map located under the feature capture window in the image horizontal direction, and performs a feature capture operation on the right-eye scene feature map after the feature offset by using the feature capture window, so as to obtain an offset feature map matched with the disparity value.
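A minimal sketch of this feature offset, assuming PyTorch tensors in the batch/channels/height/width layout, a rightward shift and zero padding (all assumptions for illustration, since the text does not fix these details), might look as follows.

```python
# Hedged sketch of substep S2221: shift the right-eye scene feature map by a
# candidate disparity value d along the horizontal axis while the feature
# capture window itself stays in place.
import torch

def shift_features(right_feat: torch.Tensor, d: int) -> torch.Tensor:
    """right_feat: (batch, channels, height, width); returns the offset feature map."""
    shifted = torch.zeros_like(right_feat)
    if d == 0:
        shifted.copy_(right_feat)
    else:
        # Move right-eye features d pixels to the right; columns pushed outside
        # the capture window are discarded and the vacated columns stay zero.
        shifted[..., d:] = right_feat[..., :-d]
    return shifted
```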
In the substep S2222, the offset feature map corresponding to the disparity value is image-stitched with the real left-eye scene map, so as to obtain the binocular disparity feature map corresponding to the disparity value.
In the substep S2223, a pre-stored disparity matching network model based on the transform model is called, and pixel-level feature matching is performed on the binocular disparity feature map corresponding to the disparity value, so as to obtain a disparity estimation distribution map corresponding to the disparity value.
The Transformer-based disparity matching network model may divide the input binocular disparity feature map into feature blocks, convert the channel-dimension feature content of each feature block into mutually associated query, key and value representations through fully connected layers, and then perform pixel-level feature matching between each feature block and the other feature blocks using an attention mechanism, so as to determine, from the final pixel-level feature matching result, the distribution possibility of the corresponding disparity value between the left-eye scene feature map and the right-eye scene feature map (namely, the disparity estimation distribution map corresponding to that disparity value).
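The following hedged sketch illustrates the general idea of attention-based pixel-level matching for a single candidate disparity; the block partitioning of the real model is omitted, and the embedding size, head count and scoring head are illustrative assumptions rather than the patent's actual network.

```python
# Hedged sketch of substep S2223: score one candidate disparity value with a
# single self-attention layer over the concatenated binocular disparity feature map.
import torch
import torch.nn as nn

class DisparityMatcher(nn.Module):
    def __init__(self, in_channels: int, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(in_channels, embed_dim)   # channel features -> token embedding
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.score = nn.Linear(embed_dim, 1)             # per-pixel matching score

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        # fused: (batch, channels, height, width) binocular disparity feature map
        b, c, h, w = fused.shape
        tokens = fused.flatten(2).transpose(1, 2)        # (batch, h*w, channels)
        tokens = self.embed(tokens)
        attended, _ = self.attn(tokens, tokens, tokens)  # pixel-level feature matching
        scores = self.score(attended)                    # (batch, h*w, 1)
        return scores.transpose(1, 2).reshape(b, 1, h, w)  # disparity estimation map
```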
Therefore, by executing substeps S2221 to S2223, the present application avoids running a cost-volume-based disparity construction neural network on the control device 10 and, through the idea of disparity decoupling, considers the distribution possibility of each disparity value over the real binocular scene graph separately, which effectively reduces the memory consumption and computing resources required to estimate the disparity distribution and lowers the computational complexity of the whole stereoscopic depth calculation process.
And a substep S223, performing image splicing on the parallax estimation distribution maps corresponding to all the parallax values, and calling a pre-stored parallax regression model to perform parallax regression on the binocular parallax distribution map obtained through splicing, to obtain a real binocular parallax map corresponding to the current real visual angle.
In this embodiment, the disparity regression model pre-stored in the control device 10 may be constructed and formed by using a 3D convolution operation as a core, or may be obtained by training based on a Softargmin algorithm and a smoothed L1 norm (smooth L1) loss function, where the Softargmin algorithm can effectively improve the disparity regression accuracy of the disparity regression model, so as to ensure that the corresponding disparity regression model outputs a true binocular disparity map between the true left-eye scene graph and the true right-eye scene graph with high accuracy.
Optionally, referring to fig. 5, fig. 5 is a flowchart illustrating the sub-steps included in the sub-step S223 in fig. 3. In the embodiment of the present application, the disparity regression model is obtained by training based on a Softargmin algorithm, and the substep S223 may include substeps S2231 to substep S2232.
In the sub-step S2231, the binocular disparity distribution map is input into the disparity regression model for disparity distribution redistribution processing, so as to obtain, at each pixel point in the real binocular disparity map, all possible disparity values and the respective occurrence probabilities of those disparity values.
In the sub-step S2232, for each pixel point in the real binocular disparity map, a weighted summation is performed over all possible disparity values corresponding to the pixel point and their respective occurrence probabilities, so as to obtain the real disparity value of the real binocular disparity map at that pixel point.
Therefore, by executing substeps S2231 to S2232, the present application uses the Softargmin algorithm to comprehensively consider the estimated distribution of the pre-stored disparity values over the real binocular disparity map, and accurately calculates the real distribution of the disparity values actually present between the real left-eye scene graph and the real right-eye scene graph.
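A compact sketch of such a soft-argmin style regression is given below, assuming the per-disparity estimation maps have been stacked along a disparity dimension (an assumption made for illustration, not the patent's exact model).

```python
# Hedged sketch of substeps S2231 to S2232: per pixel, a softmax over the stacked
# per-disparity score maps gives occurrence probabilities, and the probability-
# weighted sum of the candidate disparity values is the regressed real disparity.
import torch
import torch.nn.functional as F

def soft_argmin_disparity(score_volume: torch.Tensor, max_disp: int) -> torch.Tensor:
    """score_volume: (batch, max_disp, height, width) stacked disparity estimation maps."""
    prob = F.softmax(score_volume, dim=1)            # occurrence probability of each disparity value
    disp_values = torch.arange(max_disp, dtype=prob.dtype,
                               device=prob.device).view(1, max_disp, 1, 1)
    return (prob * disp_values).sum(dim=1)           # (batch, height, width) real disparity map
```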
And a substep S224 of performing parallax depth conversion on the real binocular parallax image according to the current camera focal length of the target robot and the real camera baseline distance corresponding to the current real visual angle to obtain a stereo depth distribution diagram.
In this embodiment, referring to the effect display diagram of the parallax depth correlation shown in fig. 6, P denotes the photographed object, O_L denotes the projection center of the left-eye camera of the target robot, O_R denotes the projection center of the right-eye camera of the target robot, b denotes the distance between the respective projection centers of the left-eye camera and the right-eye camera (i.e., the camera baseline distance corresponding to the current real visual angle), f denotes the camera focal length of a single camera of the target robot, L denotes the number of imageable pixels of the camera plane of each camera (including the left-eye camera and the right-eye camera) in the imaging horizontal direction, P_L denotes the imaging point of the photographed object on the camera plane of the left-eye camera, P_R denotes the imaging point of the photographed object on the camera plane of the right-eye camera, Z denotes the stereoscopic depth value between the photographed object and the target robot, X_L denotes the distance between the left-eye camera imaging point P_L and the left imaging boundary of the left-eye camera, and X_R denotes the distance between the right-eye camera imaging point P_R and the left imaging boundary of the right-eye camera. The parallax value of the photographed object between the left-eye camera and the right-eye camera is X_L - X_R, and the parallax depth correlation may be expressed by the following equation:
Z = b × f / D, that is, D = X_L - X_R = b × f / Z
wherein D is used for representing the parallax value of the photographed object between the left-eye camera and the right-eye camera of the target robot which are at the same base line, b is used for representing the camera base line distance between the left-eye camera and the right-eye camera of the target robot, f is used for representing the camera focal length of the target robot, and Z is used for representing the stereo depth value between the photographed object and the target robot.
Therefore, the control device 10 may determine, based on the parallax depth correlation, a stereoscopic depth value corresponding to each pixel point in the real binocular parallax image at the stereoscopic depth distribution map by combining the current camera focal length of the target robot and the real camera baseline distance corresponding to the current real visual angle, and at this time, may convert the real binocular parallax image into the stereoscopic depth distribution map.
Therefore, by executing substeps S221 to S224, the present application can effectively reduce the computational complexity and memory consumption of determining the stereoscopic depth distribution map, and improve the efficiency of the whole virtual visual angle image construction operation.
And step S230, performing depth parallax conversion on the stereoscopic depth distribution map according to the current camera focal length of the target robot and the virtual camera baseline distance corresponding to the target virtual visual angle to obtain a virtual binocular parallax map of the target robot for the target scene under the target virtual visual angle.
In this embodiment, the target virtual views are views that need to be simulated and constructed and are specified by the user, and the number of the target virtual views may be multiple, and the multiple target virtual views are different from each other. At this time, the control device 10 may determine, based on the expression of the parallax depth association relationship, a virtual parallax value corresponding to each pixel point in the stereoscopic depth distribution map at a position corresponding to the virtual binocular parallax map by combining the current camera focal length of the target robot and the virtual camera baseline distance corresponding to the target virtual perspective, and at this time, may convert the stereoscopic depth distribution map into the virtual binocular parallax map corresponding to the target virtual perspective.
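Both conversion directions follow directly from the relation Z = b × f / D given above. The following minimal sketch (illustrative only; the units and the epsilon guard are assumptions) shows the parallax-to-depth conversion of substep S224 and the depth-to-parallax conversion of step S230 with a virtual camera baseline distance.

```python
# Minimal sketch of substep S224 and step S230 using Z = b * f / D.
# Units (pixels for f and D, metres for b and Z) are illustrative assumptions.
import numpy as np

def disparity_to_depth(disp: np.ndarray, baseline: float, focal: float,
                       eps: float = 1e-6) -> np.ndarray:
    """Real binocular disparity map -> stereoscopic depth distribution map."""
    return baseline * focal / np.maximum(disp, eps)

def depth_to_disparity(depth: np.ndarray, baseline: float, focal: float,
                       eps: float = 1e-6) -> np.ndarray:
    """Stereoscopic depth distribution map -> (virtual) binocular disparity map."""
    return baseline * focal / np.maximum(depth, eps)

# Example: re-project the same depth map to a wider virtual baseline (step S230).
# depth = disparity_to_depth(real_disp, baseline=0.06, focal=500.0)
# virtual_disp = depth_to_disparity(depth, baseline=0.12, focal=500.0)
```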
And step S240, carrying out image offset on the real binocular scene graph according to the virtual binocular disparity map to obtain an initial virtual binocular scene graph matched with the target virtual visual angle.
In this embodiment, when the virtual binocular disparity map of the target robot for the target scene at the target virtual viewing angle has been calculated, the control device 10 may take the real left-eye scene graph in the real binocular scene graph as a reference and equally allocate the disparity values in the entire virtual binocular disparity map to the left-eye camera and the right-eye camera at the target virtual viewing angle for scene image reconstruction processing (i.e., image offset processing), so as to obtain the initial virtual left-eye scene graph and the initial virtual right-eye scene graph included in the initial virtual binocular scene graph.
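One possible, simplified realization of this image offset step is a forward warp of the real left-eye scene graph by half of the virtual disparity per eye; the shift signs, the rounding, the hole marker value and the omission of occlusion handling below are all assumptions made for illustration.

```python
# Hedged sketch of step S240: forward-warp the real left-eye scene image with the
# virtual binocular disparity map, splitting the disparity equally between the
# virtual left-eye and right-eye cameras. Pixels never written to remain holes
# (marked -1 here) and are repaired in step S250.
import numpy as np

def warp_with_disparity(ref_img: np.ndarray, disp: np.ndarray, sign: float) -> np.ndarray:
    """ref_img: (H, W, 3) reference view; disp: (H, W) virtual disparity map."""
    h, w = disp.shape
    out = np.full(ref_img.shape, -1.0)                 # -1 marks hole pixels
    xs = np.arange(w)
    for y in range(h):
        x_new = np.round(xs + sign * disp[y] / 2.0).astype(int)  # half the disparity per eye
        valid = (x_new >= 0) & (x_new < w)
        out[y, x_new[valid]] = ref_img[y, xs[valid]]
    return out

# init_virtual_left  = warp_with_disparity(real_left_img, virtual_disp, sign=+1.0)
# init_virtual_right = warp_with_disparity(real_left_img, virtual_disp, sign=-1.0)
```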
And step S250, carrying out image restoration on the initial virtual binocular scene graph according to the stereo depth distribution graph to obtain a target virtual binocular scene graph.
In this embodiment, after the control device 10 has initially constructed, by simulation and without adjusting the position configuration of the robot's real camera, the binocular scene graph of the corresponding robot at the target virtual viewing angle, the control device 10 needs to perform image restoration on the initially constructed initial virtual binocular scene graph in order to avoid obvious holes in the finally output binocular scene graph and to ensure that the target virtual binocular scene graph obtained after restoration has good image texture consistency.
Optionally, referring to fig. 7, fig. 7 is a flowchart illustrating one of the sub-steps included in step S250 in fig. 2. In this embodiment of the present application, the target virtual binocular scene graph may include a target virtual left-eye scene graph, and the step, in step S250, of performing image restoration on the initial virtual left-eye scene graph according to the stereoscopic depth distribution map to obtain the target virtual left-eye scene graph may include substeps S251 to S255, so as to distinguish foreground from background during the restoration of the initial virtual left-eye scene graph and avoid the foreground edge blurring caused by the repaired foreground texture mixing foreground and background content.
And a substep S251, determining a stereoscopic depth value of each effective pixel point in the initial virtual left-eye scene image according to the stereoscopic depth distribution map.
In this embodiment, the effective pixel points in the initial virtual left-eye scene graph are non-hole pixel points with depth information, which may be obtained after the original pixel points belonging to the hole are repaired, or may be pixel points always belonging to the non-hole. The control device 10 may determine, based on the mapping relationship between the stereoscopic depth map and the pixel content of the initial virtual left-eye scene image, a stereoscopic depth value corresponding to each effective pixel point in the initial virtual left-eye scene image at the stereoscopic depth map.
In the substep S252, for each first pixel to be repaired in the initial virtual left-eye scene image, a left-eye repairing template is called to determine a plurality of first effective reference pixels around the first pixel to be repaired in the initial virtual left-eye scene image.
In this embodiment, the first pixel to be repaired is a pixel belonging to a hole that has not been repaired in the initial virtual left-eye scene graph. The left-eye repairing template is used for describing how to select a plurality of effective pixel points around a first pixel point to be repaired so as to locate and repair the effective pixel points required by the first pixel point to be repaired, wherein the left-eye repairing template records the relative position relationship between the corresponding first pixel point to be repaired and the selected effective pixel points.
In the sub-step S253, according to the stereoscopic depth value and the pixel position of each effective pixel point in the initial virtual left-eye scene image, a plurality of first calibration pixel points whose relative position distribution, depth level and color characteristics are consistent with the plurality of first effective reference pixel points are screened out from the initial virtual left-eye scene image.
In this embodiment, if the absolute value of the difference between the stereo depth values of two effective pixels is greater than the preset distance threshold, it indicates that the two effective pixels are at different depth levels. After determining all the first effective reference pixels corresponding to a certain first pixel to be repaired, the control device 10 determines, for each first effective reference pixel, a plurality of first effective pixels to be screened, corresponding to the first effective reference pixels, of which the depth levels and color characteristics are consistent with those of the first effective reference pixels, and then performs comprehensive screening on the plurality of first effective pixels to be screened, corresponding to the first effective reference pixels, according to the left-eye repairing template, so as to ensure that the plurality of screened first effective pixels to be screened correspond to one first effective reference pixel, and the relative position distribution among the plurality of screened first effective pixels to be screened is consistent with the relative position distribution among all the first effective reference pixels, and at this time, the plurality of screened first effective pixels to be screened are the plurality of first calibration pixels.
In the substep S254, the left-eye patch template is called to search for a first effective replacement pixel point in the initial virtual left-eye scene image, where the first effective replacement pixel point is located in the enclosure of the plurality of first calibration pixel points, and the first effective replacement pixel point is used to perform pixel replacement processing on the first pixel point to be repaired, so as to obtain an effective pixel point corresponding to the first pixel point to be repaired.
In this embodiment, the relative position relationship between the first valid replacement pixel point and the plurality of first calibration pixel points is consistent with the relative position relationship between the first pixel point to be repaired and the corresponding plurality of first valid reference pixel points. At this time, the foreground and background levels of the first valid replacement pixel point and the first pixel point to be repaired are basically consistent and have similar textures.
And in the substep S255, the initial virtual left-eye scene image after the pixel replacement operation of all the first pixel points to be repaired is completed is used as the target virtual left-eye scene image.
In this embodiment, after the pixel replacement operation is performed on all the first pixel points to be repaired in the initial virtual left-eye scene graph, it can be ensured that texture effects of all the first pixel points to be repaired after being repaired are substantially consistent, and a situation that foreground textures to be repaired are mixed with foreground and background contents does not occur, so that a foreground edge blurring phenomenon does not occur.
Therefore, by executing substeps S251 to S255, foreground and background can be distinguished during the restoration of the initial virtual left-eye scene graph, avoiding the foreground edge blurring caused by the repaired foreground texture mixing foreground and background content.
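For illustration, a heavily simplified, depth-aware hole-filling sketch in the spirit of substeps S251 to S255 is shown below; the repair template, the thresholds, the brute-force candidate search and the assumption that the stereoscopic depth map is aligned with the virtual view are illustrative choices, not the patent's exact procedure.

```python
# Hedged sketch of the left-eye repair: exemplar-style hole filling that respects
# depth levels so repaired foreground texture is not mixed with background content.
import numpy as np

TEMPLATE = [(-2, 0), (2, 0), (0, -2), (0, 2)]   # assumed repair template (relative offsets)

def same_depth_level(z1: float, z2: float, depth_gap: float = 0.3) -> bool:
    # Two valid pixels share a depth level when their stereoscopic depth values
    # differ by no more than the preset threshold (value is an assumption).
    return abs(z1 - z2) <= depth_gap

def repair_view(img: np.ndarray, depth: np.ndarray, color_tol: float = 20.0) -> np.ndarray:
    """img: (H, W, 3) view with holes marked by -1; depth: (H, W) depth map (modified in place)."""
    h, w, _ = img.shape
    out = img.copy()
    holes = np.argwhere(out[..., 0] < 0)
    for y, x in holes:
        # Valid reference pixels around the pixel to be repaired (template positions only).
        refs = [(dy, dx) for dy, dx in TEMPLATE
                if 0 <= y + dy < h and 0 <= x + dx < w and out[y + dy, x + dx, 0] >= 0]
        if not refs:
            continue
        best = None
        # Brute-force search for calibration pixels with matching depth level and colour.
        for cy in range(2, h - 2):
            for cx in range(2, w - 2):
                if out[cy, cx, 0] < 0 or (cy == y and cx == x):
                    continue
                ok = all(
                    out[cy + dy, cx + dx, 0] >= 0
                    and same_depth_level(depth[cy + dy, cx + dx], depth[y + dy, x + dx])
                    and np.abs(out[cy + dy, cx + dx] - out[y + dy, x + dx]).max() <= color_tol
                    for dy, dx in refs
                )
                if ok:
                    best = (cy, cx)
                    break
            if best:
                break
        if best:
            out[y, x] = out[best[0], best[1]]      # pixel replacement with the found valid pixel
            depth[y, x] = depth[best[0], best[1]]
    return out
```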
Optionally, referring to fig. 8, fig. 8 is a second flowchart illustrating the sub-steps included in step S250 in fig. 2. In this embodiment, the target virtual binocular scene graph may include a target virtual right-eye scene graph, and the step, in step S250, of performing image restoration on the initial virtual right-eye scene graph according to the stereoscopic depth distribution map to obtain the target virtual right-eye scene graph may include substeps S256 to S2510, so as to distinguish foreground from background during the restoration of the initial virtual right-eye scene graph and avoid the foreground edge blurring caused by the repaired foreground texture mixing foreground and background content.
And a substep S256 of determining a stereoscopic depth value of each effective pixel point in the initial virtual right-eye scene image according to the stereoscopic depth distribution map.
In this embodiment, the control device 10 may determine, based on a pixel content mapping relationship between the stereoscopic depth distribution map and the initial virtual right-eye scene map, a stereoscopic depth value corresponding to each effective pixel point in the initial virtual right-eye scene map at the stereoscopic depth distribution map.
In the substep S257, for each second pixel to be repaired in the initial virtual right-eye scene graph, a right-eye repairing template is called to determine a plurality of second effective reference pixels around the second pixel to be repaired in the initial virtual right-eye scene graph.
In this embodiment, the second pixel to be repaired is a pixel belonging to a hole that has not been repaired in the initial virtual right-eye scene graph. The right-eye repairing template is used for describing how to select a plurality of effective pixel points around a second pixel point to be repaired so as to locate and repair the effective pixel points required by the second pixel point to be repaired, wherein the right-eye repairing template records the relative position relationship between the corresponding second pixel point to be repaired and the selected effective pixel points.
And a substep S258, screening a plurality of second calibration pixel points with the relative position distribution, the depth level and the color characteristics consistent with the plurality of second effective reference pixel points in the initial virtual right-eye scene image according to the three-dimensional depth value and the pixel position of each effective pixel point in the initial virtual right-eye scene image.
In this embodiment, after determining all the second effective reference pixels corresponding to a certain second pixel to be repaired, the control device 10 determines, for each second effective reference pixel, a plurality of second effective pixels to be screened, whose corresponding depth levels and color characteristics are consistent with those of the second effective reference pixels, at the position of the initial virtual right-eye scene graph, and then performs comprehensive screening on a plurality of second effective pixels to be screened, which correspond to the second effective reference pixels, according to the right-eye repairing template, so as to ensure that the plurality of screened second effective pixels to be screened correspond to one second effective reference pixel, and the relative position distribution among the plurality of screened second effective pixels to be screened is consistent with the relative position distribution among all the second effective reference pixels, and the plurality of screened second effective pixels to be screened at this time are the plurality of second calibration pixels.
In the substep S259, the right-eye repairing template is called to search the initial virtual right-eye scene graph for a second effective replacement pixel point located within the region enclosed by the plurality of second calibration pixel points, and the second effective replacement pixel point is used to perform pixel replacement processing on the second pixel to be repaired, so as to obtain the effective pixel point corresponding to the second pixel to be repaired.
In this embodiment, the relative position relationship between the second effective replacement pixel point and the plurality of second calibration pixel points is consistent with the relative position relationship between the second pixel to be repaired and its plurality of second effective reference pixel points. Consequently, the second effective replacement pixel point and the second pixel to be repaired lie on essentially the same foreground/background level and have similar textures.
In the substep S2510, the initial virtual right-eye scene graph, after the pixel replacement operation has been completed for all the second pixels to be repaired, is used as the target virtual right-eye scene graph.
In this embodiment, after the pixel replacement operation has been performed on all the second pixels to be repaired in the initial virtual right-eye scene graph, the texture of every repaired pixel remains substantially consistent with its surroundings, and the repaired foreground texture does not mix foreground and background contents, so no foreground edge blurring occurs.
Therefore, by executing the substeps S256 to S2510, foreground and background can be distinguished while repairing the initial virtual right-eye scene graph, and the foreground edge blurring caused by a repaired foreground texture mixing foreground and background contents is avoided.
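Purely as an illustration of substeps S256 to S2510, the following Python sketch shows one way such template-guided hole filling could be organised. It is a simplified sketch under stated assumptions, not the claimed implementation: the template offsets, the exhaustive full-image search, the depth and color tolerances, and every function and variable name are introduced here for illustration only.

import numpy as np

# A right-eye repairing template given as relative offsets (dy, dx): it records where,
# relative to a second pixel to be repaired, the effective reference pixels are taken
# from. The concrete offsets are an illustrative assumption.
RIGHT_EYE_TEMPLATE = [(-2, -2), (-2, 2), (2, -2), (2, 2)]

def repair_right_view(right_view, depth_map, hole_mask, depth_tol=0.05, color_tol=12.0):
    # right_view: HxWx3 float array, initial virtual right-eye scene graph with holes
    # depth_map:  HxW float array, stereoscopic depth distribution map
    # hole_mask:  HxW bool array, True where a second pixel to be repaired remains
    h, w = hole_mask.shape
    repaired = right_view.copy()
    valid = ~hole_mask

    def in_bounds(y, x):
        return 0 <= y < h and 0 <= x < w

    for y, x in zip(*np.nonzero(hole_mask)):
        # Substep S257: collect the effective reference pixels named by the template.
        refs = []
        for dy, dx in RIGHT_EYE_TEMPLATE:
            ry, rx = y + dy, x + dx
            if in_bounds(ry, rx) and valid[ry, rx]:
                refs.append((dy, dx, repaired[ry, rx].copy(), depth_map[ry, rx]))
        if len(refs) < len(RIGHT_EYE_TEMPLATE):
            continue  # not enough effective reference pixels; leave for a later pass

        # Substep S258: look for calibration pixels whose relative layout, depth level
        # and color agree with the reference pixels.
        best = None
        for cy in range(h):
            for cx in range(w):
                if not valid[cy, cx]:
                    continue  # the replacement candidate itself must be effective
                ok = True
                for dy, dx, ref_color, ref_z in refs:
                    ky, kx = cy + dy, cx + dx
                    if (not in_bounds(ky, kx) or not valid[ky, kx]
                            or abs(depth_map[ky, kx] - ref_z) > depth_tol
                            or np.linalg.norm(repaired[ky, kx] - ref_color) > color_tol):
                        ok = False
                        break
                if ok:
                    best = (cy, cx)
                    break
            if best is not None:
                break

        # Substep S259: copy the effective replacement pixel enclosed by the matching
        # calibration pixels into the second pixel to be repaired.
        if best is not None:
            repaired[y, x] = repaired[best[0], best[1]]
            valid[y, x] = True

    # Substep S2510: the image after all replacements is taken as the target virtual
    # right-eye scene graph.
    return repaired

In practice the exhaustive scan would be bounded to a search window around the hole and terminated early; the sketch keeps the brute-force loop only so that the three matching criteria (relative position, depth level, color) stay explicit.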
Therefore, by executing steps S210 to S250, binocular scene images with good image texture consistency can be constructed quickly for the robot under different viewing angles without adjusting the position configuration of the robot's real cameras, so that an expected binocular training image set is built quickly and the construction cost of the training image set is reduced.
The virtual perspective image construction method is briefly illustrated below, taking the image restoration effect for the target scene "aloe vera potted plant" shown in fig. 9 as an example. The control device 10 controls the target robot to capture a real binocular scene graph of the scene "aloe vera potted plant" at the current real viewing angle (the corresponding real left-eye scene graph is the image with the non-dark-gray background in the dashed frame of fig. 9). The control device 10 then determines the stereoscopic depth distribution map between the target robot and the scene "aloe vera potted plant" (the image with the dark-gray background in the dashed frame of fig. 9) according to step S220 and substeps S221 to S224. Next, on the basis of the parallax-depth correlation, the control device 10 constructs the initial virtual left-eye scene graph under the target virtual perspective (the centrally placed image with a black block in its lower right corner in fig. 9) and the initial virtual right-eye scene graph under the target virtual perspective (the centrally placed image with a black block in its lower left corner in fig. 9). Finally, by executing step S250 and its substeps S251 to S2510, the control device 10 repairs the two initial virtual scene graphs to obtain the target virtual binocular scene graph representing the target scene under the target virtual perspective.
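For orientation, the end-to-end flow that this example walks through can be sketched in Python as follows. The sketch is only an assumption-laden outline: it uses the parallax-depth relation D = b × f / Z between disparity D, camera baseline b, focal length f and stereoscopic depth Z stated later in the claims, warps by integer pixel shifts, and leaves uncovered positions as holes for the repair routine sketched above; all names and the random stand-in data are introduced here for illustration.

import numpy as np

def depth_to_disparity(depth_map, focal_length, virtual_baseline):
    # Depth-parallax conversion: D = f * b / Z.
    return focal_length * virtual_baseline / np.maximum(depth_map, 1e-6)

def migrate_view(reference_view, disparity, direction=-1):
    # Shift each pixel of the real reference view by its virtual disparity to form an
    # initial virtual scene graph; positions nothing maps to remain holes (mask True).
    # A real implementation would also resolve occlusions by keeping the nearer pixel.
    h, w, _ = reference_view.shape
    virtual = np.zeros_like(reference_view)
    hole_mask = np.ones((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            nx = x + direction * int(round(disparity[y, x]))
            if 0 <= nx < w:
                virtual[y, nx] = reference_view[y, x]
                hole_mask[y, nx] = False
    return virtual, hole_mask

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.random((120, 160, 3))            # stand-in for the real left-eye scene graph
    depth = rng.uniform(0.5, 3.0, (120, 160))   # stand-in for the stereoscopic depth map
    disp = depth_to_disparity(depth, focal_length=400.0, virtual_baseline=0.06)
    virtual_right, holes = migrate_view(left, disp, direction=-1)
    # The holes would then be filled by the template-guided repair sketched earlier,
    # e.g. repaired = repair_right_view(virtual_right, depth, holes)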
In this application, to enable the control device 10 to execute the above virtual perspective image construction method through the virtual perspective image construction apparatus 100, the apparatus 100 is divided into functional modules that together implement the foregoing functions. The specific components of the virtual perspective image construction apparatus 100 provided in the present application are described below.
Referring to fig. 10, fig. 10 is a schematic diagram illustrating a virtual perspective image constructing apparatus 100 according to an embodiment of the present disclosure. In this embodiment, the virtual perspective image constructing apparatus 100 may include a real scene map acquiring module 110, a stereoscopic depth map determining module 120, a virtual disparity map converting module 130, a virtual scene map constructing module 140, and a virtual scene map repairing module 150.
And a real scene graph acquiring module 110, configured to acquire a real binocular scene graph captured by the target robot for the target scene at the current real viewing angle.
And a stereo depth map determining module 120, configured to determine a stereo depth distribution map between the target robot and the target scene according to the real binocular scene map.
And the virtual disparity map conversion module 130 is configured to perform depth disparity conversion on the stereoscopic depth distribution map according to the current camera focal length of the target robot and the virtual camera baseline distance corresponding to the target virtual perspective, so as to obtain a virtual binocular disparity map of the target robot for a target scene under the target virtual perspective.
And the virtual scene graph building module 140 is configured to perform image migration on the real binocular scene graph according to the virtual binocular disparity map to obtain an initial virtual binocular scene graph matched with a target virtual perspective.
And the virtual scene graph restoration module 150 is configured to perform image restoration on the initial virtual binocular scene graph according to the stereoscopic depth distribution map to obtain a target virtual binocular scene graph.
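As a rough structural sketch only, the module division listed above can be expressed as a class skeleton in which one method stands in for each module; the class and method names are invented here and do not come from the application, and each body is deliberately left unimplemented.

class VirtualPerspectiveImageConstructionApparatus:
    # Illustrative skeleton of apparatus 100: one method per functional module.

    def acquire_real_scene_graph(self, robot):                                          # module 110
        # Real binocular scene graph shot at the current real viewing angle.
        raise NotImplementedError

    def determine_stereo_depth_map(self, real_pair):                                    # module 120
        # Stereoscopic depth distribution map between the robot and the target scene.
        raise NotImplementedError

    def convert_to_virtual_disparity(self, depth_map, focal_length, virtual_baseline):  # module 130
        # Depth-parallax conversion for the target virtual perspective.
        raise NotImplementedError

    def build_initial_virtual_pair(self, real_pair, virtual_disparity):                 # module 140
        # Image migration to the initial virtual binocular scene graph.
        raise NotImplementedError

    def repair_virtual_pair(self, initial_pair, depth_map):                             # module 150
        # Image restoration to the target virtual binocular scene graph.
        raise NotImplementedError

    def construct(self, robot, focal_length, virtual_baseline):
        # Chains the five modules in the order of steps S210 to S250.
        real_pair = self.acquire_real_scene_graph(robot)
        depth_map = self.determine_stereo_depth_map(real_pair)
        disparity = self.convert_to_virtual_disparity(depth_map, focal_length, virtual_baseline)
        initial_pair = self.build_initial_virtual_pair(real_pair, disparity)
        return self.repair_virtual_pair(initial_pair, depth_map)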
It should be noted that the basic principle and the resulting technical effect of the virtual perspective image constructing apparatus 100 provided in the embodiment of the present application are the same as those of the virtual perspective image constructing method described above. For a brief description, where not mentioned in this embodiment section, reference may be made to the above description of the virtual perspective image construction method.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part. The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a readable storage medium. Based on such understanding, the technical solutions of the present application or portions thereof that contribute to the prior art may be embodied in the form of a software product, which is stored in a readable storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned readable storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In summary, in the virtual perspective image construction method and apparatus, the control device, and the readable storage medium provided by the present application, a stereoscopic depth distribution map between the target robot and the target scene is first determined from the real binocular scene graph captured by the target robot at the current real viewing angle. Depth-parallax conversion is then performed on the stereoscopic depth distribution map according to the current camera focal length of the target robot and the virtual camera baseline distance corresponding to the target virtual perspective, yielding a virtual binocular disparity map of the target robot for the target scene under the target virtual perspective. Image migration is next performed on the real binocular scene graph according to the virtual binocular disparity map to obtain an initial virtual binocular scene graph matched with the target virtual perspective, and image restoration is finally performed on the initial virtual binocular scene graph according to the stereoscopic depth distribution map to obtain a target virtual binocular scene graph with good image texture consistency. In this way, binocular scene images with good texture consistency can be constructed quickly for the robot under different viewing angles without adjusting the position configuration of the robot's real cameras, an expected binocular training image set can be built quickly, and the construction cost of the training image set is reduced.
The above description covers only some embodiments of the present application and does not limit its scope; any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A virtual perspective image construction method is characterized by comprising the following steps:
acquiring a real binocular scene graph shot by a target robot aiming at a target scene under a current real visual angle;
determining a stereoscopic depth distribution map between the target robot and the target scene according to the real binocular scene map;
according to the current camera focal length of the target robot and the virtual camera baseline distance corresponding to the target virtual visual angle, performing depth parallax conversion on the stereoscopic depth distribution map to obtain a virtual binocular disparity map of the target robot for a target scene under the target virtual visual angle;
performing image migration on the real binocular scene graph according to the virtual binocular disparity map to obtain an initial virtual binocular scene graph matched with a target virtual visual angle;
and performing image restoration on the initial virtual binocular scene graph according to the stereo depth distribution graph to obtain a target virtual binocular scene graph.
2. The method of claim 1, wherein the real binocular scene graph comprises a real left-eye scene graph and a real right-eye scene graph, and the step of determining the stereoscopic depth distribution map between the target robot and the target scene from the real binocular scene graph comprises:
respectively extracting features of the real left-eye scene graph and the real right-eye scene graph to obtain a corresponding left-eye scene feature graph and a corresponding right-eye scene feature graph;
determining a disparity estimation distribution diagram of a plurality of pre-stored disparity values between the left-eye scene characteristic diagram and the right-eye scene characteristic diagram;
performing image splicing on the parallax estimation distribution maps respectively corresponding to all the parallax values, and calling a prestored parallax regression model to perform parallax regression on the binocular disparity distribution map obtained through splicing, so as to obtain a real binocular parallax map corresponding to the current real visual angle;
and performing parallax depth conversion on the real binocular parallax map according to the current camera focal length of the target robot and the real camera baseline distance corresponding to the current real visual angle to obtain the stereo depth distribution map.
3. The method according to claim 2, wherein the step of determining a disparity estimation distribution map of each of the plurality of pre-stored disparity values between the left-eye scene feature map and the right-eye scene feature map comprises:
for each preset parallax value, performing characteristic offset on the right-eye scene characteristic diagram according to the parallax value under a characteristic capture window of the right-eye scene characteristic diagram to obtain a corresponding offset characteristic diagram;
carrying out image splicing on the offset characteristic image corresponding to the parallax value and the real left-eye scene image to obtain a binocular parallax characteristic image corresponding to the parallax value;
and calling a pre-stored parallax matching network model based on a Transformer model, and performing pixel level feature matching on the binocular parallax feature map corresponding to the parallax value to obtain a parallax estimation distribution map corresponding to the parallax value.
4. The method according to claim 2, wherein the step of calling a pre-stored parallax regression model to perform parallax regression on the spliced binocular disparity distribution map to obtain a real binocular disparity map corresponding to the current real viewing angle comprises:
inputting the binocular disparity distribution map into the parallax regression model to perform disparity distribution redistribution processing, so as to obtain, at each pixel point in the real binocular disparity map, all the possible disparity values and their respective occurrence probabilities;
and for each pixel point in the real binocular disparity map, carrying out a weighted summation of all the possible disparity values corresponding to the pixel point with their respective occurrence probabilities to obtain the real disparity value of the real binocular disparity map at the pixel point.
5. The method according to any one of claims 1 to 4, wherein the disparity depth correlation required for performing the disparity depth conversion operation or the depth disparity conversion operation is expressed by the following equation:
D = b × f / Z, which is equivalently written as Z = b × f / D
wherein D is used for representing the parallax value of the photographed object between the left-eye camera and the right-eye camera of the target robot which are at the same base line, b is used for representing the camera base line distance between the left-eye camera and the right-eye camera of the target robot, f is used for representing the camera focal length of the target robot, and Z is used for representing the stereo depth value between the photographed object and the target robot.
6. The method according to claim 1, wherein the initial virtual binocular scene map includes an initial virtual left-eye scene map, and the target virtual binocular scene map includes a target virtual left-eye scene map, and the step of performing image restoration on the initial virtual left-eye scene map according to the stereoscopic depth distribution map to obtain the target virtual left-eye scene map includes:
determining a stereoscopic depth value of each effective pixel point in the initial virtual left-eye scene image according to the stereoscopic depth distribution map;
for each first pixel point to be repaired in the initial virtual left-eye scene graph, calling a left-eye repairing template to determine a plurality of first effective reference pixel points around the first pixel point to be repaired in the initial virtual left-eye scene graph;
according to the three-dimensional depth value and the pixel position of each effective pixel point in the initial virtual left-eye scene image, a plurality of first calibration pixel points are screened out in the initial virtual left-eye scene image, wherein the relative position distribution, the depth level and the color characteristics of the first calibration pixel points are consistent with those of the plurality of first effective reference pixel points, and if the absolute value of the difference value between the three-dimensional depth values of the two effective pixel points is larger than a preset interval threshold value, the two effective pixel points are located at different depth levels;
calling the left-eye repairing template to search a first effective replacement pixel point in the initial virtual left-eye scene graph, wherein the first effective replacement pixel point is surrounded by the first calibration pixel points, and adopting the first effective replacement pixel point to perform pixel replacement processing on the first pixel point to be repaired to obtain an effective pixel point corresponding to the first pixel point to be repaired;
and taking the initial virtual left-eye scene graph after the pixel replacement operation of all the first pixel points to be repaired is completed as the target virtual left-eye scene graph.
7. The method according to claim 6, wherein the initial virtual binocular scene graph further includes an initial virtual right-eye scene graph, the target virtual binocular scene graph further includes a target virtual right-eye scene graph, and the step of performing image restoration on the initial virtual right-eye scene graph according to the stereoscopic depth distribution map to obtain the target virtual right-eye scene graph includes:
determining the stereoscopic depth value of each effective pixel point in the initial virtual right-eye scene graph according to the stereoscopic depth distribution graph;
for each second pixel point to be repaired in the initial virtual right-eye scene graph, calling a right-eye repairing template to determine a plurality of second effective reference pixel points around the second pixel point to be repaired in the initial virtual right-eye scene graph;
according to the three-dimensional depth value and the pixel position of each effective pixel point in the initial virtual right-eye scene image, screening a plurality of second calibration pixel points of which the relative position distribution, the depth level and the color characteristics are consistent with those of the second effective reference pixel points in the initial virtual right-eye scene image;
calling the right-eye repairing template to search second effective replacement pixel points in the initial virtual right-eye scene graph, wherein the second effective replacement pixel points are surrounded by the plurality of second calibration pixel points, and adopting the second effective replacement pixel points to perform pixel replacement processing on the second pixel points to be repaired to obtain effective pixel points corresponding to the second pixel points to be repaired;
and taking the initial virtual right-eye scene graph after the pixel replacement operation of all the second pixel points to be repaired is completed as the target virtual right-eye scene graph.
8. An apparatus for constructing a virtual perspective image, the apparatus comprising:
the real scene graph acquisition module is used for acquiring a real binocular scene graph shot by the target robot aiming at a target scene under a current real visual angle;
the stereoscopic depth map determining module is used for determining a stereoscopic depth distribution map between the target robot and the target scene according to the real binocular scene map;
the virtual parallax image conversion module is used for carrying out depth parallax conversion on the three-dimensional depth distribution map according to the current camera focal length of the target robot and the virtual camera baseline distance corresponding to the target virtual visual angle to obtain a virtual binocular parallax image of the target robot for a target scene under the target virtual visual angle;
the virtual scene graph building module is used for carrying out image migration on the real binocular scene graph according to the virtual binocular disparity map to obtain an initial virtual binocular scene graph matched with a target virtual visual angle;
and the virtual scene graph restoration module is used for carrying out image restoration on the initial virtual binocular scene graph according to the stereo depth distribution map to obtain a target virtual binocular scene graph.
9. A control apparatus comprising a processor and a memory, the memory storing a computer program executable by the processor, the processor being capable of executing the computer program to implement the virtual perspective image construction method of any one of claims 1 to 7.
10. A readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the virtual perspective image construction method of any one of claims 1 to 7.
CN202210804941.5A 2022-07-08 2022-07-08 Virtual visual angle image construction method and device, control equipment and readable storage medium Pending CN115170637A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210804941.5A CN115170637A (en) 2022-07-08 2022-07-08 Virtual visual angle image construction method and device, control equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210804941.5A CN115170637A (en) 2022-07-08 2022-07-08 Virtual visual angle image construction method and device, control equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN115170637A true CN115170637A (en) 2022-10-11

Family

ID=83492427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210804941.5A Pending CN115170637A (en) 2022-07-08 2022-07-08 Virtual visual angle image construction method and device, control equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN115170637A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117291954A (en) * 2023-09-21 2023-12-26 小红书科技有限公司 Method for generating optical flow data set, related method and related product
CN117372647A (en) * 2023-10-26 2024-01-09 天宫开物(深圳)科技有限公司 Rapid construction method and system of three-dimensional model for building
CN117974742A (en) * 2022-10-19 2024-05-03 摩尔线程智能科技(北京)有限责任公司 Binocular image generation method, binocular image generation device, binocular image generation apparatus, binocular image generation storage medium, and binocular image generation program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination