CN115512044A - Visual perception method and device, readable storage medium and electronic equipment


Info

Publication number: CN115512044A
Application number: CN202211162444.6A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 隋伟 (Sui Wei), 陈腾 (Chen Teng), 张骞 (Zhang Qian)
Applicant and current assignee: Horizon Shanghai Artificial Intelligence Technology Co., Ltd.
Legal status: Pending
Prior art keywords: image, feature, three-dimensional space, view angle


Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06T: Image data processing or generation, in general
    • G06T 17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 3/067
    • G06T 5/00: Image enhancement or restoration
    • G06T 5/50: Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T 9/00: Image coding
    • G06T 2200/08: Indexing scheme involving all processing steps from image acquisition to 3D model generation
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20212: Image combination
    • G06T 2207/20221: Image fusion; image merging

Abstract

Disclosed is a visual perception method comprising: acquiring a plurality of images, each corresponding to a different view angle, captured by a multi-camera system of a vehicle at the same position; encoding each of the images to obtain the image features corresponding to each image; decomposing the image features by remapping to obtain the top-view features and surround-view features corresponding to each image; fusing the top-view features and the surround-view features to obtain three-dimensional spatial features; decoding the three-dimensional spatial features to obtain a three-dimensional spatial structure; and performing visual perception according to the three-dimensional spatial structure. With the visual perception method and device, readable storage medium and electronic equipment disclosed herein, each image is decomposed into a surround-view feature and a top-view feature; after fusion, accurate three-dimensional spatial features can be obtained, improving the perception effect.

Description

Visual perception method and device, readable storage medium and electronic equipment
Technical Field
The present application relates to the field of machine vision technologies, and in particular, to a visual perception method and apparatus, a readable storage medium, and an electronic device.
Background
360-degree perception based on surround-view images is a key technology for autonomous driving, and making full use of the information in these images is a key problem to be solved. In the prior art, surround-view images are fused mainly through inverse perspective transformation, in early-fusion, mid-fusion or late-fusion schemes. However, existing stitching schemes based on inverse perspective transformation assume that the ground is a plane: they perceive objects near the road surface well, but severely distort objects at a height above it, such as traffic lights and signboards, and the resulting loss of information in the vertical space degrades the perception effect.
Disclosure of Invention
The present application is proposed to solve the above technical problems. Embodiments of the present application provide a visual perception method, an apparatus, a readable storage medium, and an electronic device, which decompose an image into a surround-view feature and a top-view feature; after fusion, accurate three-dimensional spatial features can be obtained, improving the perception effect.
According to a first aspect of the present application, there is provided a visual perception method, including:
acquiring a plurality of images, each corresponding to a different view angle, captured by a multi-camera system of a vehicle at the same position;
encoding each of the plurality of images to obtain the image features corresponding to each image;
decomposing the image features by remapping to obtain the top-view features and surround-view features corresponding to each image;
fusing the top-view features and the surround-view features to obtain three-dimensional spatial features;
decoding the three-dimensional spatial features to obtain a three-dimensional spatial structure;
and performing visual perception according to the three-dimensional spatial structure.
According to a second aspect of the present application, there is provided a visual perception device, comprising:
an acquisition module for acquiring a plurality of images, each corresponding to a different view angle, captured by a multi-camera system of a vehicle at the same position;
an encoding module for encoding each of the plurality of images to obtain the image features corresponding to each image;
a transformation module for decomposing the image features by remapping to obtain the top-view features and surround-view features corresponding to each image;
a fusion module for fusing the top-view features and the surround-view features to obtain three-dimensional spatial features;
a decoding module for decoding the three-dimensional spatial features to obtain a three-dimensional spatial structure;
and a perception module for performing visual perception according to the three-dimensional spatial structure.
According to a third aspect of the present application, there is provided a computer-readable storage medium storing a computer program for performing any of the visual perception methods described above.
According to a fourth aspect of the present application, there is provided an electronic apparatus comprising:
a processor;
a memory for storing instructions executable by the processor;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement any of the above-mentioned visual perception methods.
In the technical solution provided by this application, the captured images are decomposed to obtain top-view features and surround-view features, where the top-view features capture feature information in the horizontal direction and the surround-view features capture feature information in the vertical direction (that is, feature information with height). The top-view features and surround-view features are fused into three-dimensional spatial features, which greatly reduces the loss of height information in the vertical direction. With this technical solution, three-dimensional features can be reconstructed from a single frame of images, the loss of height information in the vertical direction is reduced, and the accuracy of machine vision perception is improved.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally indicate like parts or steps.
Fig. 1 is a diagram of a system to which the present application is applicable.
Fig. 2 is a schematic diagram of image feature decomposition of a visual perception method provided by an exemplary embodiment of the present application.
Fig. 3 is a schematic diagram of image feature fusion of a visual perception method according to another exemplary embodiment of the present application.
Fig. 4 is a flow chart of a visual perception method provided by an exemplary embodiment of the present application.
Fig. 5 is a flow chart of obtaining multi-scale features in a visual perception method provided by an exemplary embodiment of the present application.
Fig. 6 is a flow chart of image feature decomposition in a visual perception method provided by an exemplary embodiment of the present application.
Fig. 7 is a flow chart of image feature fusion in a visual perception method provided by an exemplary embodiment of the present application.
Fig. 8 is a flow chart of determining a feature decomposition plane in a visual perception method provided by an exemplary embodiment of the present application.
Fig. 9 is a schematic view of a visual perception device provided by an exemplary embodiment of the present application.
Fig. 10 is a schematic diagram of the encoding module of a visual perception device provided by an exemplary embodiment of the present application.
Fig. 11 is a schematic diagram of the transformation module of a visual perception device provided by an exemplary embodiment of the present application.
Fig. 12 is a schematic diagram of the fusion module of a visual perception device provided by an exemplary embodiment of the present application.
Fig. 13 is a schematic diagram of another transformation module of a visual perception device provided by an exemplary embodiment of the present application.
Fig. 14 is a block diagram of an electronic device provided in an exemplary embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only a few embodiments of the present application, and not all embodiments of the present application, and it should be understood that the present application is not limited to the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
It will be understood by those skilled in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another, and imply neither any particular technical meaning nor any necessary logical order between them.
It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more and "at least one" may refer to one, two or more.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the present disclosure may be generally understood as one or more, unless explicitly defined otherwise or indicated to the contrary hereinafter.
In addition, the term "and/or" in the present disclosure merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may indicate: A alone, both A and B, or B alone. The character "/" in the present disclosure generally indicates that the associated objects before and after it are in an "or" relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be discussed further in subsequent figures.
The disclosed embodiments may be applied to electronic devices such as terminal devices, computer systems, and servers, which are operational with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Summary of the Disclosure
In the prior art, acquired images are usually reconstructed into three-dimensional features by inverse perspective transformation, a transformation that assumes the ground is a plane. Inverse perspective transformation perceives objects close to the road surface well, but objects at a height above it, such as traffic lights and signboards, are generally severely distorted during the transformation, causing a loss of information in the vertical space and degrading the perception effect.
The present disclosure provides a visual perception method, comprising: acquiring a plurality of images, each corresponding to a different view angle, captured by a multi-camera system of a vehicle at the same position; encoding each of the plurality of images to obtain the image features corresponding to each image;
decomposing the image features by remapping to obtain the top-view features and surround-view features corresponding to each image;
fusing the top-view features and the surround-view features to obtain three-dimensional spatial features;
decoding the three-dimensional spatial features to obtain a three-dimensional spatial structure;
and performing visual perception according to the three-dimensional spatial structure.
In this technical solution, the captured images are decomposed to obtain top-view features and surround-view features, where the top-view features capture feature information in the horizontal direction and the surround-view features capture feature information in the vertical direction. The top-view features and surround-view features are fused into three-dimensional spatial features, which greatly reduces the loss of height information in the vertical direction. With this technical solution, reconstruction can be carried out from the three-dimensional features of a single frame of images, the loss of height information in the vertical direction is reduced, and the accuracy of machine vision perception is improved.
Exemplary System
As shown in Fig. 1, an exemplary system for implementing the visual perception method of the present application includes a plurality of image acquisition modules that capture images from different view angles at the same position, such as, but not limited to, a front image acquisition module (Front in Fig. 1), a rear image acquisition module (Rear in Fig. 1), and a front-right image acquisition module (Front Right in Fig. 1).
The captured images are encoded by an encoding module (Encoder) to obtain the image features (Feature maps) corresponding to each image; the encoding module may, for example, obtain these image features through convolution.
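As an illustration only, a minimal encoding module might look like the following sketch; the class name, layer sizes and channel counts are assumptions made for the example, not the network specified by this application.

```python
import torch
import torch.nn as nn

class ImageEncoder(nn.Module):
    """Illustrative encoder: stacked strided convolutions that turn one
    camera image into a feature map (the application does not fix an
    architecture, so every layer choice here is an assumption)."""
    def __init__(self, out_channels: int = 64):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, out_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W) -> feature map: (B, C, H/4, W/4)
        return self.layers(image)

# One feature map per camera view, as in Fig. 1:
encoder = ImageEncoder()
front_image = torch.randn(1, 3, 256, 512)   # hypothetical front-camera frame
front_features = encoder(front_image)       # -> (1, 64, 64, 128)
```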
Each image feature can then be remapped into a surround-view feature (Feature map Range View in Fig. 1) and a top-view feature (Feature map Bev View in Fig. 1): the surround-view feature provides information in the vertical direction, and the top-view feature provides information in the horizontal direction. The process of remapping an image feature into a surround-view feature and a top-view feature is shown in Fig. 2, where Current View represents an image captured by an image acquisition module, Range View represents the decomposed surround-view feature, and Bev View represents the decomposed top-view feature. The intersection of the radial lines connecting the four corners of the Current View indicates the position of the camera, and the dashed arrows indicate the decomposition directions.
A fusion module fuses the top-view features and surround-view features into three-dimensional spatial features (Feature Fusion). When forming the three-dimensional spatial features, the top-view feature determines the position of each point in the horizontal direction, and the surround-view feature determines its position in the vertical direction. The fusion of the top-view and surround-view features is shown in Fig. 3, where Range View represents the surround-view feature, Bev View represents the top-view feature, and the pillar cell represents a point of the three-dimensional spatial feature formed by fusing the corresponding points in the surround-view and top-view features.
Decoding the three-dimensional spatial features with a decoding module yields a three-dimensional spatial structure, which can then be used for visual perception.
Exemplary Method
An embodiment of the present application provides a visual perception method, as shown in Fig. 4, including:
Step 100: acquiring a plurality of images, each corresponding to a different view angle, captured by a multi-camera system of a vehicle at the same position.
In some embodiments, the plurality of images captured at the same position and corresponding to different view angles are the images captured by the individual image acquisition modules, each at its own view angle, while the vehicle is at one position. Each image corresponds to one image acquisition module. The image acquisition modules may, for example, capture images corresponding to a front view, a rear view, a front-left view, a rear-left view, a front-right view, and a rear-right view, respectively.
Step 200: encoding each of the plurality of images to obtain the image features corresponding to each image.
In some embodiments, encoding an image means convolving the image with feature-extraction filters to obtain the features in the image. Since different filters respond to different features, different filters may be applied during encoding to extract different features from the image. In addition, because the region a filter can cover is limited, in this embodiment the image may be scaled to different sizes before being convolved, so that features of different scales are extracted.
Step 300: decomposing the image features by remapping to obtain the top-view features and surround-view features corresponding to each image.
In some embodiments, remapping means projecting an image from its current image plane onto one or more other planes or surfaces according to a mathematical relationship. The top-view feature is the image feature formed on the horizontal plane on which the vehicle stands. The surround-view feature is the image feature formed on an annular curved surface surrounding the vehicle.
Step 400: fusing the top-view features and the surround-view features to obtain three-dimensional spatial features.
In some embodiments, because the top-view feature is the image feature on the plane where the vehicle stands, it characterizes position information within that plane; and because the surround-view feature is the image feature on an annular curved surface perpendicular to that plane, it characterizes position information within the curved surface. Fusing the two determines the position of each point of the three-dimensional spatial feature in three-dimensional space.
Step 500: decoding the three-dimensional spatial features to obtain a three-dimensional spatial structure.
In some embodiments, the three-dimensional spatial structure includes the shapes, positions and poses of the objects within the three-dimensional space, such as the shapes, positions and poses of vehicles, pedestrians, signboards, traffic lights, and the like. Decoding the three-dimensional spatial features means applying a plurality of filters to deconvolve the features, restoring them to a three-dimensional spatial structure.
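Purely as an illustration of such deconvolution-based decoding, the following sketch upsamples a fused voxel feature with transposed 3D convolutions; the channel counts and the single-channel output are assumptions, not values given in this application.

```python
import torch
import torch.nn as nn

class SpatialDecoder(nn.Module):
    """Illustrative decoder: transposed 3D convolutions ("deconvolutions")
    that restore a fused three-dimensional spatial feature to a dense
    three-dimensional structure. All sizes here are assumed."""
    def __init__(self, in_channels: int = 64):
        super().__init__()
        self.layers = nn.Sequential(
            nn.ConvTranspose3d(in_channels, 32, kernel_size=2, stride=2),
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(32, 1, kernel_size=2, stride=2),
        )

    def forward(self, voxel_features: torch.Tensor) -> torch.Tensor:
        # voxel_features: (B, C, Z, Y, X) -> structure: (B, 1, 4Z, 4Y, 4X)
        return self.layers(voxel_features)

decoder = SpatialDecoder()
fused = torch.randn(1, 64, 8, 32, 32)   # hypothetical fused 3D feature
structure = decoder(fused)              # -> (1, 1, 32, 128, 128)
```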
Step 600: performing visual perception according to the three-dimensional spatial structure.
In some embodiments, the three-dimensional structure is the structure formed by reconstructing the image features in three-dimensional space. Within that space, every target in the image features, such as other vehicles, pedestrians, signboards and traffic lights, is represented in three-dimensional form, so the relative displacement and relative rotation between any two targets, between the current vehicle and any target, and the height of each target can all be obtained; accurate visual perception can then be carried out from this relative displacement, relative rotation and height information.
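For example, once each target's pose in the reconstructed space is known, its relative displacement and rotation with respect to another target reduce to a change of reference frame. The sketch below assumes poses given as a rotation matrix plus a translation, which is one common convention and not a representation prescribed by this application.

```python
import numpy as np

def relative_pose(R_a, t_a, R_b, t_b):
    """Relative rotation and displacement of target B as seen from target A,
    given each target's pose (rotation matrix, translation vector) in the
    shared 3D frame recovered by the decoder. Purely illustrative."""
    R_rel = R_a.T @ R_b                    # rotation of B in A's frame
    t_rel = R_a.T @ (t_b - t_a)            # displacement of B in A's frame
    return R_rel, t_rel

# e.g. the ego vehicle at the origin and a sign 10 m ahead, 2 m up:
R_ego, t_ego = np.eye(3), np.zeros(3)
R_sign, t_sign = np.eye(3), np.array([10.0, 0.0, 2.0])
R_rel, t_rel = relative_pose(R_ego, t_ego, R_sign, t_sign)
height = t_sign[2]                         # height information of the target
```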
In the technical solution provided by this embodiment, the captured images are decomposed to obtain top-view features and surround-view features, where the top-view features capture feature information in the horizontal direction and the surround-view features capture feature information in the vertical direction. The top-view and surround-view features are fused into three-dimensional features, which greatly reduces the loss of height information in the vertical direction. With this technical solution, three-dimensional features can be reconstructed from the images, the loss of height information in the vertical direction is reduced, and the accuracy of machine vision perception is improved. In some preferred embodiments, the horizontal plane may be, for example, a plane parallel to the ground, and the vertical direction may be, for example, a direction perpendicular to the ground.
On the basis of the embodiment shown in Fig. 4, as shown in Fig. 5, encoding each of the plurality of images in step 200 to obtain the image features corresponding to each image includes:
Step 110: scaling each image to obtain the image pyramid corresponding to each image.
In some embodiments, to capture features of different scales in an image, the image is scaled before feature extraction. An image pyramid is a structure formed by stacking several copies of an image at different scales: larger-scale copies sit at lower levels and smaller-scale copies at higher levels, forming a pyramid-like structure.
Step 120: extracting features from each image pyramid with a plurality of feature-extraction filters to obtain the multi-scale image features corresponding to each image.
In some embodiments, when extracting features from the image pyramid, the feature-extraction filters convolve the pyramid level by level, starting from the top, to obtain image features of the image at different scales.
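A minimal sketch of this pyramid construction and per-level filtering, assuming OpenCV-style Gaussian downsampling and a hand-written Sobel kernel as a stand-in feature-extraction filter (both are assumptions for illustration):

```python
import numpy as np
import cv2

def build_pyramid(image, num_levels=4):
    """Stack of progressively downscaled copies of the image; larger scales
    sit at the bottom, smaller scales at the top (a standard Gaussian
    pyramid, used here as an illustrative reading of this step)."""
    levels = [image]
    for _ in range(num_levels - 1):
        levels.append(cv2.pyrDown(levels[-1]))  # halve each dimension
    return levels

def extract_multiscale_features(pyramid, filters):
    """Convolve every pyramid level with every feature-extraction filter,
    starting from the top (smallest) level and working downward."""
    features = []
    for level in reversed(pyramid):             # top of pyramid first
        per_scale = [cv2.filter2D(level, -1, f) for f in filters]
        features.append(per_scale)
    return features

image = np.random.rand(256, 512).astype(np.float32)
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], np.float32)
multiscale = extract_multiscale_features(build_pyramid(image), [sobel_x])
```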
On the basis of the embodiment shown in Fig. 4, as shown in Fig. 6, decomposing the image features by remapping in step 300 to obtain the top-view features and surround-view features corresponding to each image includes:
Step 310: using the extrinsic parameters of each camera in the multi-camera system to calculate, for each camera, a first remapping vector that decomposes its image feature into the corresponding top-view feature and a second remapping vector that decomposes its image feature into the corresponding surround-view feature.
In some embodiments, the extrinsic parameters of a camera are its translation and rotation relative to the vehicle, e.g., the translation and rotation of the camera center relative to the vehicle center. The first remapping vector is the transformation applied when a point in the image captured by the camera is projected onto the plane where the vehicle stands, and the second remapping vector is the transformation applied when that point is projected onto an annular curved surface perpendicular to that plane. In some preferred embodiments, the extrinsic parameters characterize the translation vector and rotation vector of the camera relative to the vehicle; to compute the first and second remapping vectors, the translation and rotation of the target coordinate system relative to the camera coordinate system are multiplied together and then multiplied by the camera's extrinsic parameters, which yields the corresponding remapping vector. For the first remapping vector, for example, the translation and rotation of the coordinate system of the top-view feature relative to the camera coordinate system are multiplied and then multiplied by the camera's extrinsic parameters. The second remapping vector is calculated in a similar way.
Step 320: remapping each image feature based on the first and second remapping vectors to obtain the top-view feature and surround-view feature corresponding to each image feature.
In some embodiments, remapping an image feature means multiplying each point in the image feature by the first remapping vector to obtain its position on the plane where the vehicle stands, and multiplying each point by the second remapping vector to obtain its position on the annular curved surface perpendicular to that plane.
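The geometry behind these two remappings can be sketched as ray casting: back-project a pixel through the camera intrinsics, rotate it into the vehicle frame with the extrinsics, then intersect the ray with the ground plane (first remapping) or with a cylinder around the vehicle (second remapping). The intrinsics, the axis convention and the cylinder radius below are all assumptions for the example, not parameters fixed by this application.

```python
import numpy as np

def pixel_ray_in_vehicle_frame(u, v, K, R_cam2veh, t_cam2veh):
    """Back-project pixel (u, v) through the intrinsics K, then use the
    extrinsics (rotation R, translation t of the camera relative to the
    vehicle) to express the viewing ray in the vehicle frame."""
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return t_cam2veh, R_cam2veh @ ray_cam        # ray origin, ray direction

def remap_to_top_view(origin, direction):
    """First remapping: intersect the ray with the vehicle's bottom plane
    (z = 0 in the vehicle frame) to place the point in the top view."""
    s = -origin[2] / direction[2]
    p = origin + s * direction
    return p[0], p[1]                            # (x, y) on the ground plane

def remap_to_surround_view(origin, direction, radius=10.0):
    """Second remapping: intersect the ray with a cylinder of assumed
    radius around the vehicle to place the point in the surround view."""
    a = direction[0]**2 + direction[1]**2
    b = 2.0 * (origin[0]*direction[0] + origin[1]*direction[1])
    c = origin[0]**2 + origin[1]**2 - radius**2
    s = (-b + np.sqrt(b*b - 4*a*c)) / (2*a)      # outward intersection
    p = origin + s * direction
    return np.arctan2(p[1], p[0]), p[2]          # (azimuth, height)

K = np.array([[500., 0., 256.], [0., 500., 128.], [0., 0., 1.]])
R = np.array([[0., 0., 1.], [-1., 0., 0.], [0., -1., 0.]])  # camera looks along +x
t = np.array([1.5, 0.0, 1.2])                    # hypothetical front camera
o, d = pixel_ray_in_vehicle_frame(300, 150, K, R, t)
print(remap_to_top_view(o, d), remap_to_surround_view(o, d))
```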
On the basis of the embodiment shown in Fig. 4, decomposing the image features by remapping in step 300 to obtain the top-view features and surround-view features corresponding to each image includes:
decomposing the multi-scale image features of each image by remapping to obtain the multi-scale top-view features and multi-scale surround-view features of each image.
In some embodiments, the feature-extraction process obtains image features at multiple scales by scaling the image; to make the subsequent fusion more accurate, the image features at each scale are decomposed to obtain top-view and surround-view features at each scale.
On the basis of the embodiment shown in Fig. 4, fusing the top-view features and the surround-view features in step 400 to obtain the three-dimensional spatial features includes:
fusing the multi-scale top-view features and multi-scale surround-view features of each image scale by scale to obtain the three-dimensional spatial features, where fusing scale by scale means fusing, one scale at a time, the top-view feature and surround-view feature that share the same scale, as shown in the sketch after this paragraph.
In some embodiments, when performing feature fusion, the top-view feature of the current scale is fused with the surround-view feature of the same scale; once that fusion is complete, the top-view feature of the next scale is fused with the surround-view feature of that scale. In some preferred embodiments, the current scale is smaller than the next scale; in other preferred embodiments, the current scale may instead be larger than the next scale.
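A sketch of this scale-by-scale loop; the `fuse_one_scale` callback stands in for the coordinate-correspondence fusion sketched further below, and the coarse-to-fine ordering is just one of the two orderings the text allows.

```python
def fuse_multiscale(top_view_feats, surround_view_feats, fuse_one_scale):
    """top_view_feats / surround_view_feats: lists indexed by scale, with
    matching scales at matching indices. Each pair of same-scale features
    is fused completely before the next scale is touched."""
    fused = []
    for bev, rng in zip(top_view_feats, surround_view_feats):
        fused.append(fuse_one_scale(bev, rng))   # one scale at a time
    return fused
```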
On the basis of the embodiment shown in Fig. 4, as shown in Fig. 7, fusing the top-view features and the surround-view features in step 400 to obtain the three-dimensional spatial features includes:
Step 410: determining the coordinates of each point of the three-dimensional spatial feature in a horizontal-plane coordinate system based on the top-view feature.
In some embodiments, because the top-view feature is the image feature projected onto the top-view plane, it clearly characterizes the position of each point of the three-dimensional spatial feature within the horizontal-plane coordinate system. Determining these horizontal coordinates from the top-view feature allows accurate positioning and avoids loss of distance information.
Step 420: determining the angular coordinate and vertical coordinate of each point of the three-dimensional spatial feature in a cylindrical coordinate system based on the surround-view feature.
In some embodiments, because the surround-view feature is the image feature projected onto a curved surface surrounding the vehicle, it clearly characterizes the height of each point of the three-dimensional spatial feature within the angular coordinate system. Determining the cylindrical-surface coordinates of each point from the surround-view feature allows accurate positioning and avoids loss of height information.
Step 430: fusing the top-view feature and the surround-view feature according to the correspondence between the horizontal-plane coordinate system and the cylindrical coordinate system to obtain the three-dimensional spatial feature.
In some embodiments, the coordinates of each point of the three-dimensional spatial feature are first determined accurately from the top-view and surround-view features, and then the two features are fused to determine the three-dimensional spatial feature; this preserves both the height information of the vertical space and the information of the horizontal space, improving perception precision and performance. In some preferred embodiments, the surround-view feature carries height and angle information, while the top-view feature carries coordinates along two mutually perpendicular in-plane directions. During fusion, take any point in the surround-view feature: its height and angle are known. Connect it with the point of equal height on the vertical axis of the surround-view feature, extend that line outward, and drop a perpendicular from any point on the extension onto the top-view feature; the perpendicular determines one point in the top-view feature. That top-view point corresponds to the initially selected surround-view point, and fusing the two determines one point of the three-dimensional spatial feature.
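One concrete reading of this correspondence is that a horizontal cell at (x, y) lies at azimuth theta = atan2(y, x) on the cylinder, so its vertical column comes from the surround-view feature at that azimuth. The sketch below builds a pillar-style voxel grid this way; the nearest-neighbour azimuth lookup and channel concatenation are assumptions for the example, not choices made in this application.

```python
import numpy as np

def fuse_top_and_surround(bev, rng, xs, ys, thetas, zs):
    """Build a 3D (pillar) feature grid from a top-view feature map `bev`,
    shaped (C_bev, len(ys), len(xs)), and a surround-view feature map
    `rng`, shaped (C_rng, len(zs), len(thetas))."""
    C_bev, C_rng = bev.shape[0], rng.shape[0]
    voxels = np.zeros((C_bev + C_rng, len(zs), len(ys), len(xs)), np.float32)
    for ix, x in enumerate(xs):
        for iy, y in enumerate(ys):
            theta = np.arctan2(y, x)                 # horizontal cell -> azimuth
            it = np.abs(thetas - theta).argmin()     # nearest azimuth bin
            # top view fixes the horizontal position, copied to every height:
            voxels[:C_bev, :, iy, ix] = bev[:, iy, ix][:, None]
            # surround view supplies the vertical column at that azimuth:
            voxels[C_bev:, :, iy, ix] = rng[:, :, it]
    return voxels

xs = ys = np.linspace(-20, 20, 32)
zs = np.linspace(0, 4, 8)
thetas = np.linspace(-np.pi, np.pi, 64)
bev = np.random.rand(16, 32, 32).astype(np.float32)   # (C, y, x)
rng = np.random.rand(16, 8, 64).astype(np.float32)    # (C, z, theta)
pillars = fuse_top_and_surround(bev, rng, xs, ys, thetas, zs)
```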
On the basis of the embodiment shown in Fig. 4, as shown in Fig. 8, decomposing the image features by remapping in step 300 to obtain the top-view feature and surround-view feature corresponding to each image includes:
Step 330: using the bottom surface of the vehicle as the decomposition plane of the top-view feature.
In some embodiments, because the plane on which the vehicle stands is not always horizontal, for example when the vehicle travels on a slope, the bottom surface of the vehicle is chosen as the decomposition plane of the top-view features; each image feature is then decomposed along the plane of the vehicle's bottom surface, which prevents the vehicle's inclination from reducing perception accuracy.
Step 340: using a cylindrical surface perpendicular to the bottom surface of the vehicle as the decomposition surface of the surround-view feature.
In some embodiments, because the surface onto which the surround-view feature is decomposed affects the height information of objects in the image, a cylindrical surface perpendicular to the vehicle's bottom surface is used as the decomposition surface of the surround-view feature so that object heights are determined accurately. In some preferred embodiments, the vehicle may be, for example, an automobile.
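To make the slope case concrete: tying both decomposition surfaces to the vehicle's own up axis, rather than to the world vertical, keeps them consistent when the vehicle pitches. A small sketch of that geometry, with the pose representation assumed purely for illustration:

```python
import numpy as np

def decomposition_geometry(R_vehicle_to_world, t_vehicle):
    """Decomposition surfaces tied to the vehicle, not the world: the
    top-view plane is the vehicle's bottom surface (its normal is the
    vehicle's own up axis, correct even on a slope), and the surround-view
    surface is a cylinder sharing that axis. Illustrative only."""
    plane_normal = R_vehicle_to_world @ np.array([0.0, 0.0, 1.0])
    plane_point = t_vehicle                  # a point on the bottom surface
    cylinder_axis = plane_normal             # cylinder perpendicular to plane
    return plane_normal, plane_point, cylinder_axis

# On a 10-degree slope the plane normal tilts with the vehicle:
pitch = np.deg2rad(10)
R = np.array([[np.cos(pitch), 0, np.sin(pitch)],
              [0, 1, 0],
              [-np.sin(pitch), 0, np.cos(pitch)]])
normal, point, axis = decomposition_geometry(R, np.zeros(3))
```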
Exemplary Devices
An embodiment of the present application further provides a visual perception device, as shown in Fig. 9, including:
an acquisition module for acquiring a plurality of images, each corresponding to a different view angle, captured by a multi-camera system of a vehicle at the same position.
In some embodiments, the images captured at the same position and corresponding to different view angles are the images captured by the individual image acquisition modules, each at its own view angle, while the vehicle is at one position. Each image corresponds to one image acquisition module. The image acquisition modules may, for example, capture images corresponding to a front view, a rear view, a front-left view, a rear-left view, a front-right view, and a rear-right view, respectively.
An encoding module for encoding each of the plurality of images to obtain the image features corresponding to each image.
In some embodiments, encoding an image means convolving the image with feature-extraction filters to obtain the features in the image. Since different filters respond to different features, different filters may be applied during encoding to extract different features from the image. In addition, because the region a filter can cover is limited, in this embodiment the image may be scaled to different sizes before being convolved, so that features of different scales are extracted.
A transformation module for decomposing the image features by remapping to obtain the top-view features and surround-view features corresponding to each image.
In some embodiments, remapping means projecting an image from its current image plane onto one or more other planes or surfaces according to a mathematical relationship. The top-view feature is the image feature formed on the horizontal plane on which the vehicle stands. The surround-view feature is the image feature formed on an annular curved surface surrounding the vehicle.
A fusion module for fusing the top-view features and the surround-view features to obtain three-dimensional spatial features.
In some embodiments, because the top-view feature is the image feature on the plane where the vehicle stands, it characterizes position information within that plane; and because the surround-view feature is the image feature on an annular curved surface perpendicular to that plane, it characterizes position information within the curved surface. Fusing the two determines the position of each point of the three-dimensional spatial feature in three-dimensional space.
A decoding module for decoding the three-dimensional spatial features to obtain a three-dimensional spatial structure.
In some embodiments, the three-dimensional spatial structure includes the shapes, positions and poses of the objects within the three-dimensional space, such as the shapes, positions and poses of vehicles, pedestrians, signboards, traffic lights, and the like. Decoding the three-dimensional spatial features means applying a plurality of filters to deconvolve the features, restoring them to a three-dimensional spatial structure.
And a perception module for performing visual perception according to the three-dimensional spatial structure.
In some embodiments, the three-dimensional structure is the structure formed by reconstructing the image features in three-dimensional space. Within that space, every target in the image features, such as other vehicles, pedestrians, signboards and traffic lights, is represented in three-dimensional form, so the relative displacement and relative rotation between any two targets, between the current vehicle and any target, and the height of each target can all be obtained; accurate visual perception can then be carried out from this relative displacement, relative rotation and height information.
In the technical solution provided by this embodiment, the captured images are decomposed to obtain top-view features and surround-view features, where the top-view features capture feature information in the horizontal direction and the surround-view features capture feature information in the vertical direction. The top-view and surround-view features are fused into three-dimensional features, which greatly reduces the loss of height information in the vertical direction. With this technical solution, three-dimensional features can be reconstructed from a single frame of images, the loss of height information in the vertical direction is reduced, and the accuracy of machine vision perception is improved.
On the basis of the embodiment shown in Fig. 9, as shown in Fig. 10, the encoding module includes:
a scaling unit for scaling each image to obtain the image pyramid corresponding to each image.
In some embodiments, to capture features of different scales in an image, the image is scaled before feature extraction. An image pyramid is a structure formed by stacking several copies of an image at different scales: larger-scale copies sit at lower levels and smaller-scale copies at higher levels, forming a pyramid-like structure.
And an extraction unit for extracting features from each image pyramid with a plurality of feature-extraction filters to obtain the multi-scale image features corresponding to each image.
In some embodiments, when extracting features from the image pyramid, the feature-extraction filters convolve the pyramid level by level, starting from the top, to obtain image features of the image at different scales.
On the basis of the embodiment shown in Fig. 9, as shown in Fig. 11, the transformation module includes:
a calculation unit for using the extrinsic parameters of each camera to calculate, for each camera, a first remapping vector that decomposes its image feature into the corresponding top-view feature and a second remapping vector that decomposes its image feature into the corresponding surround-view feature.
In some embodiments, the extrinsic parameters of a camera are its translation and rotation relative to the vehicle, e.g., the translation and rotation of the camera center relative to the vehicle center. The first remapping vector is the transformation applied when a point in the image captured by the camera is projected onto the plane where the vehicle stands, and the second remapping vector is the transformation applied when that point is projected onto an annular curved surface perpendicular to that plane. In some preferred embodiments, the extrinsic parameters characterize the translation vector and rotation vector of the camera relative to the vehicle; to compute the first and second remapping vectors, the translation and rotation of the target coordinate system relative to the camera coordinate system are multiplied together and then multiplied by the camera's extrinsic parameters, which yields the corresponding remapping vector. For the first remapping vector, for example, the translation and rotation of the coordinate system of the top-view feature relative to the camera coordinate system are multiplied and then multiplied by the camera's extrinsic parameters. The second remapping vector is calculated in a similar way.
And a mapping unit for remapping each image feature based on the first and second remapping vectors to obtain the top-view feature and surround-view feature corresponding to each image feature.
In some embodiments, remapping an image feature means multiplying each point in the image feature by the first remapping vector to obtain its position on the plane where the vehicle stands, and multiplying each point by the second remapping vector to obtain its position on the annular curved surface perpendicular to that plane.
On the basis of the embodiment shown in Fig. 9, the transformation module is further configured to decompose the multi-scale image features of each image by remapping to obtain the multi-scale top-view features and multi-scale surround-view features of each image.
In some embodiments, the feature-extraction process obtains image features at multiple scales by scaling the image; to make the subsequent fusion more accurate, the image features at each scale are decomposed to obtain top-view and surround-view features at each scale.
On the basis of the embodiment shown in Fig. 9, the fusion module is further configured to fuse the multi-scale top-view features and multi-scale surround-view features of each image scale by scale to obtain the three-dimensional spatial features, where fusing scale by scale means fusing, one scale at a time, the top-view feature and surround-view feature that share the same scale.
In some embodiments, when performing feature fusion, the top-view feature of the current scale is fused with the surround-view feature of the same scale; once that fusion is complete, the top-view feature of the next scale is fused with the surround-view feature of that scale. In some preferred embodiments, the current scale is smaller than the next scale; in other preferred embodiments, the current scale may instead be larger than the next scale.
On the basis of the embodiment shown in Fig. 9, as shown in Fig. 12, the fusion module includes:
a horizontal calculation unit for determining the coordinates of each point of the three-dimensional spatial feature in a horizontal-plane coordinate system based on the top-view feature.
In some embodiments, because the top-view feature is the image feature projected onto the top-view plane, it clearly characterizes the position of each point of the three-dimensional spatial feature within the horizontal-plane coordinate system. Determining these horizontal coordinates from the top-view feature allows accurate positioning and avoids loss of distance information.
A vertical calculation unit for determining the angular coordinate and vertical coordinate of each point of the three-dimensional spatial feature in a cylindrical coordinate system based on the surround-view feature.
In some embodiments, because the surround-view feature is the image feature projected onto a curved surface surrounding the vehicle, it clearly characterizes the height of each point of the three-dimensional spatial feature within the angular coordinate system. Determining the cylindrical-surface coordinates of each point from the surround-view feature allows accurate positioning and avoids loss of height information.
And a fusion calculation unit for fusing the top-view feature and the surround-view feature according to the correspondence between the horizontal-plane coordinate system and the cylindrical coordinate system to obtain the three-dimensional spatial feature.
In some embodiments, the coordinates of each point of the three-dimensional spatial feature are first determined accurately from the top-view and surround-view features, and then the two features are fused to determine the three-dimensional spatial feature; this preserves both the height information of the vertical space and the information of the horizontal space, improving perception precision and performance. In some preferred embodiments, the surround-view feature carries height and angle information, while the top-view feature carries coordinates along two mutually perpendicular in-plane directions. During fusion, take any point in the surround-view feature: its height and angle are known. Connect it with the point of equal height on the vertical axis of the surround-view feature, extend that line outward, and drop a perpendicular from any point on the extension onto the top-view feature; the perpendicular determines one point in the top-view feature. That top-view point corresponds to the initially selected surround-view point, and fusing the two determines one point of the three-dimensional spatial feature.
On the basis of the embodiment shown in Fig. 9, as shown in Fig. 13, the transformation module includes:
a first determining unit for using the bottom surface of the vehicle as the decomposition plane of the top-view feature.
In some embodiments, because the plane on which the vehicle stands is not always horizontal, for example when the vehicle travels on a slope, the bottom surface of the vehicle is chosen as the decomposition plane of the top-view features; each image feature is then decomposed along the plane of the vehicle's bottom surface, which prevents the vehicle's inclination from reducing perception accuracy.
And a second determining unit for using a cylindrical surface perpendicular to the bottom surface of the vehicle as the decomposition surface of the surround-view feature.
In some embodiments, because the surface onto which the surround-view feature is decomposed affects the height information of objects in the image, a cylindrical surface perpendicular to the vehicle's bottom surface is used as the decomposition surface of the surround-view feature so that object heights are determined accurately. In some preferred embodiments, the vehicle may be, for example, an automobile.
Exemplary Electronic Device
Next, an electronic device according to an embodiment of the present application is described with reference to Fig. 14. The electronic device may be the first device 100, the second device 200, or both, or a stand-alone device separate from them that can communicate with the first and second devices to receive the collected input signals from them.
FIG. 14 illustrates a block diagram of an electronic device in accordance with an embodiment of the application.
As shown in fig. 14, the electronic device 10 includes one or more processors 11 and a memory 12.
The processor 11 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 10 to perform desired functions.
The memory 12 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory. The non-volatile memory may include, for example, Read-Only Memory (ROM), a hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 11 to implement the visual perception methods of the various embodiments of the present application described above and/or other desired functions. Various content, such as input signals, signal components, and noise components, may also be stored in the computer-readable storage medium.
In one example, the electronic device 10 may further include: an input device 13 and an output device 14, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
For example, when the electronic device is the first device 100 or the second device 200, the input device 13 may be the camera described above for capturing images. When the electronic device is a stand-alone device, the input means 13 may be a communication network connector for receiving the acquired input signals from the first device 100 and the second device 200.
The input device 13 may also include, for example, a keyboard, a mouse, and the like.
The output device 14 may output various information including the determined distance information, direction information, and the like to the outside. The output devices 14 may include, for example, a display, speakers, printer, and a communication network and its connected remote output devices, among others.
Of course, for simplicity, only some of the components of the electronic device 10 relevant to the present application are shown in fig. 14, and components such as buses, input/output interfaces, and the like are omitted. In addition, the electronic device 10 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer-readable storage medium
In addition to the above-described methods and apparatus, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the visual perception method according to various embodiments of the present application described in the "exemplary methods" section of this specification, supra.
Program code for performing the operations of embodiments of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++, and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the visual perception method according to various embodiments of the present application described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present application in conjunction with specific embodiments. It should be noted, however, that the advantages and effects mentioned in this application are merely examples, not limitations, and should not be considered essential to the various embodiments of the application. The specific details disclosed above are provided for the purposes of illustration and description only; they are not intended to be exhaustive or to limit the application to the precise details disclosed.
The block diagrams of the devices, apparatuses, and systems referred to in this application are given only as illustrative examples and are not intended to require or imply that connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by those skilled in the art, these devices, apparatuses, and systems may be connected, arranged, and configured in any manner. Words such as "including", "comprising", and "having" are open-ended and mean "including but not limited to"; they may be used interchangeably with that phrase. As used herein, the words "or" and "and" refer to, and may be used interchangeably with, the word "and/or", unless the context clearly dictates otherwise. The word "such as" is used herein to mean, and may be used interchangeably with, the phrase "such as but not limited to".
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (10)

1. A visual perception method, comprising:
acquiring a plurality of images, each corresponding to one of a plurality of viewing angles, captured by a multi-camera system of a carrier at the same position;
encoding the plurality of images respectively to obtain an image feature corresponding to each image;
decomposing each image feature in a remapping manner to obtain a top-view perspective feature and a surround-view perspective feature corresponding to each image;
fusing the top-view perspective features and the surround-view perspective features to obtain a three-dimensional spatial feature;
decoding the three-dimensional spatial feature to obtain a three-dimensional spatial structure; and
realizing visual perception according to the three-dimensional spatial structure.
2. The method of claim 1, wherein encoding the plurality of images respectively to obtain the image feature corresponding to each image comprises:
scaling each image to obtain an image pyramid corresponding to that image; and
performing feature extraction on each image pyramid with a plurality of feature extraction filters to obtain multi-scale image features corresponding to each image.
3. The method of claim 2, wherein decomposing each image feature in a remapping manner to obtain the top-view perspective feature and the surround-view perspective feature corresponding to each image comprises:
calculating, from the extrinsic parameters of each camera in the multi-camera system, a first remapping vector for decomposing the image feature of that camera into the corresponding top-view perspective feature and a second remapping vector for decomposing it into the corresponding surround-view perspective feature; and
remapping each image feature based on the first remapping vector and the second remapping vector to obtain the top-view perspective feature and the surround-view perspective feature corresponding to that image feature.
4. The method of claim 2, wherein decomposing each image feature in a remapping manner to obtain the top-view perspective feature and the surround-view perspective feature corresponding to each image comprises:
decomposing the multi-scale image features of each image in a remapping manner to obtain multi-scale top-view perspective features and multi-scale surround-view perspective features of each image.
5. The method of claim 4, wherein fusing the top-view perspective features and the surround-view perspective features to obtain the three-dimensional spatial feature comprises:
fusing the multi-scale top-view perspective features and the multi-scale surround-view perspective features of each image scale by scale to obtain the three-dimensional spatial feature, wherein fusing scale by scale comprises fusing the top-view perspective features and the surround-view perspective features of the same scale.
6. The method of claim 1, wherein fusing the top-view perspective features and the surround-view perspective features to obtain the three-dimensional spatial feature comprises:
determining the coordinates of each point of the three-dimensional spatial feature in a horizontal-plane coordinate system based on the top-view perspective feature;
determining the angular coordinate and the vertical coordinate of each point of the three-dimensional spatial feature in a cylindrical coordinate system according to the surround-view perspective feature; and
fusing the top-view perspective feature and the surround-view perspective feature according to the correspondence between the horizontal-plane coordinate system and the cylindrical coordinate system to obtain the three-dimensional spatial feature.
7. The method of claim 1, wherein decomposing each image feature in a remapping manner to obtain the top-view perspective feature and the surround-view perspective feature corresponding to each image comprises:
taking the bottom surface of the carrier as the decomposition surface of the top-view perspective feature; and
taking a cylindrical surface perpendicular to the bottom surface of the carrier as the decomposition surface of the surround-view perspective feature.
8. A visual perception device, comprising:
an acquisition module, configured to acquire a plurality of images, each corresponding to one of a plurality of viewing angles, captured by a multi-camera system of a carrier at the same position;
an encoding module, configured to encode the plurality of images respectively to obtain an image feature corresponding to each image;
a transformation module, configured to decompose each image feature in a remapping manner to obtain a top-view perspective feature and a surround-view perspective feature corresponding to each image;
a fusion module, configured to fuse the top-view perspective features and the surround-view perspective features to obtain a three-dimensional spatial feature;
a decoding module, configured to decode the three-dimensional spatial feature to obtain a three-dimensional spatial structure; and
a perception module, configured to realize visual perception according to the three-dimensional spatial structure.
9. A computer-readable storage medium storing a computer program for performing the visual perception method of any one of claims 1-7.
10. An electronic device, comprising:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to read the executable instructions from the memory and execute the instructions to implement the visual perception method of any one of claims 1-7.
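To picture the fusion described in claim 6 concretely: each voxel of the three-dimensional spatial feature takes its (x, y) coordinates from the horizontal-plane coordinate system of the top-view perspective feature and its (theta, z) coordinates from the cylindrical coordinate system of the surround-view perspective feature. The sketch below is a minimal nearest-neighbour illustration, assuming additive fusion and illustrative grid ranges and resolutions; it is not the patented implementation.

    import numpy as np

    def fuse(top, ring, x_range=(-10.0, 10.0), y_range=(-10.0, 10.0),
             z_range=(0.0, 3.0)):
        # top:  (Hx, Wy, C) feature sampled on the carrier's bottom plane
        # ring: (Nz, Nt, C) feature sampled on the cylindrical surface
        Hx, Wy, _ = top.shape
        Nz, Nt, _ = ring.shape
        xs = np.linspace(*x_range, Hx)
        ys = np.linspace(*y_range, Wy)
        zs = np.linspace(*z_range, Nz)
        xg, yg, zg = np.meshgrid(xs, ys, zs, indexing="ij")    # voxel centres
        theta = np.mod(np.arctan2(yg, xg), 2.0 * np.pi)        # cylinder angle
        ti = np.round(theta / (2.0 * np.pi) * (Nt - 1)).astype(int)
        zi = np.round((zg - zs[0]) / (zs[-1] - zs[0]) * (Nz - 1)).astype(int)
        xi = np.arange(Hx)[:, None, None]                      # broadcast indices
        yi = np.arange(Wy)[None, :, None]
        # Each voxel sums the plane feature at (x, y) with the cylinder
        # feature at (theta, z), per the plane/cylinder correspondence.
        return top[xi, yi] + ring[zi, ti]                      # (Hx, Wy, Nz, C)

For example, fuse(np.zeros((64, 64, 8)), np.zeros((16, 360, 8))) returns a volume of shape (64, 64, 16, 8); a learned fusion, such as concatenation followed by convolution, could replace the sum without changing the coordinate correspondence.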
CN202211162444.6A 2022-09-22 2022-09-22 Visual perception method and device, readable storage medium and electronic equipment Pending CN115512044A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211162444.6A CN115512044A (en) 2022-09-22 2022-09-22 Visual perception method and device, readable storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211162444.6A CN115512044A (en) 2022-09-22 2022-09-22 Visual perception method and device, readable storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN115512044A true CN115512044A (en) 2022-12-23

Family

ID=84506123

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211162444.6A Pending CN115512044A (en) 2022-09-22 2022-09-22 Visual perception method and device, readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115512044A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116758518A (en) * 2023-08-22 2023-09-15 安徽蔚来智驾科技有限公司 Environment sensing method, computer device, computer-readable storage medium and vehicle
CN116758518B (en) * 2023-08-22 2023-12-01 安徽蔚来智驾科技有限公司 Environment sensing method, computer device, computer-readable storage medium and vehicle

Similar Documents

Publication Publication Date Title
US11379699B2 (en) Object detection method and apparatus for object detection
CN112102411B (en) Visual positioning method and device based on semantic error image
CN109214980B (en) Three-dimensional attitude estimation method, three-dimensional attitude estimation device, three-dimensional attitude estimation equipment and computer storage medium
CN109544629B (en) Camera position and posture determining method and device and electronic equipment
JP6902122B2 (en) Double viewing angle Image calibration and image processing methods, equipment, storage media and electronics
CN112444242A (en) Pose optimization method and device
CN111415387A (en) Camera pose determining method and device, electronic equipment and storage medium
CN110998671B (en) Three-dimensional reconstruction method, device, system and storage medium
WO2013176894A2 (en) Combining narrow-baseline and wide-baseline stereo for three-dimensional modeling
CN112097732A (en) Binocular camera-based three-dimensional distance measurement method, system, equipment and readable storage medium
CN114913290A (en) Multi-view-angle fusion scene reconstruction method, perception network training method and device
CN113393514A (en) Three-dimensional disordered point cloud data processing method, system and equipment
CN115512044A (en) Visual perception method and device, readable storage medium and electronic equipment
CN113470112A (en) Image processing method, image processing device, storage medium and terminal
CN114821505A (en) Multi-view 3D target detection method, memory and system based on aerial view
CN111402404A (en) Panorama complementing method and device, computer readable storage medium and electronic equipment
CN112116655A (en) Method and device for determining position information of image of target object
CN113689508A (en) Point cloud marking method and device, storage medium and electronic equipment
CN113297958A (en) Automatic labeling method and device, electronic equipment and storage medium
CN117315372A (en) Three-dimensional perception method based on feature enhancement
CN115512046B (en) Panorama display method and device for points outside model, equipment and medium
CN113421217A (en) Method and device for detecting travelable area
Song et al. Real-time terrain reconstruction using 3D flag map for point clouds
CN114882465A (en) Visual perception method and device, storage medium and electronic equipment
CN113132708B (en) Method and apparatus for acquiring three-dimensional scene image using fisheye camera, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination