WO2021164269A1 - Attention-mechanism-based disparity map acquisition method and apparatus - Google Patents

Attention-mechanism-based disparity map acquisition method and apparatus

Info

Publication number
WO2021164269A1
Authority
WO
WIPO (PCT)
Prior art keywords
original image
feature matrix
matrix
left original
matching cost
Prior art date
Application number
PCT/CN2020/119379
Other languages
English (en)
French (fr)
Inventor
周宸
周宝
陈远旭
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021164269A1 publication Critical patent/WO2021164269A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method and device for acquiring a disparity map based on an attention mechanism.
  • Parallax refers to the difference in direction when observing the same target from two points at a certain distance.
  • The inventor realized that when an image pair with a parallax relationship exhibits affine distortion, radiation distortion, or ill-conditioned regions such as occlusions, weak textures, repeated textures, and reflective surfaces, the disparity values are computed with low accuracy, and a high-accuracy disparity map cannot be obtained.
  • The embodiments of the present application provide a method, apparatus, computer device, and storage medium for acquiring a disparity map based on an attention mechanism, to solve the problem that a high-accuracy disparity map cannot be obtained when an image pair with a disparity relationship exhibits affine distortion, radiation distortion, or ill-conditioned regions.
  • In a first aspect, an embodiment of the present application provides a method for obtaining a disparity map based on an attention mechanism, including:
  • obtaining a left original image and a right original image, where the left original image and the right original image are an image pair having a parallax relationship;
  • using a pre-trained feature extraction model to extract a feature matrix of the left original image from the left original image and a feature matrix of the right original image from the right original image, where the features of the left original image include a low-level feature matrix and a high-level feature matrix of the left original image, and the features of the right original image include a low-level feature matrix and a high-level feature matrix of the right original image;
  • filtering the feature matrix of the left original image and the feature matrix of the right original image with a preset attention mechanism module, where the preset attention mechanism module performs feature selection on the low-level and high-level feature matrices of the left original image, and on the low-level and high-level feature matrices of the right original image;
  • obtaining a matching cost matrix according to the filtered feature matrix of the left original image and the filtered feature matrix of the right original image;
  • inputting the matching cost matrix into a pre-trained convolutional neural network to obtain a target matching cost matrix;
  • obtaining a disparity map according to the target matching cost matrix.
  • an embodiment of the present application provides a disparity map acquisition device based on an attention mechanism, including:
  • An original image acquisition module configured to obtain a left original image and a right original image, wherein the left original image and the right original image are image pairs having a parallax relationship;
  • a feature extraction module, configured to use a pre-trained feature extraction model to extract a feature matrix of the left original image from the left original image and a feature matrix of the right original image from the right original image, where the features of the left original image include a low-level feature matrix and a high-level feature matrix of the left original image, and the features of the right original image include a low-level feature matrix and a high-level feature matrix of the right original image;
  • a filtering module, configured to filter the feature matrix of the left original image and the feature matrix of the right original image with a preset attention mechanism module, where the preset attention mechanism module performs feature selection on the low-level and high-level feature matrices of the left original image, and on the low-level and high-level feature matrices of the right original image;
  • a matching cost matrix obtaining module configured to obtain a matching cost matrix according to the filtered feature matrix of the left original image and the filtered feature matrix of the right original image;
  • the target matching cost matrix acquisition module is configured to input the matching cost matrix into a pre-trained convolutional neural network to obtain a target matching cost matrix;
  • the disparity map obtaining module is used to obtain the disparity map according to the target matching cost matrix.
  • In a third aspect, a computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the following steps of an attention-mechanism-based disparity map acquisition method are implemented:
  • obtaining a left original image and a right original image, where the left original image and the right original image are an image pair having a parallax relationship;
  • using a pre-trained feature extraction model to extract a feature matrix of the left original image from the left original image and a feature matrix of the right original image from the right original image, where the features of the left original image include a low-level feature matrix and a high-level feature matrix of the left original image, and the features of the right original image include a low-level feature matrix and a high-level feature matrix of the right original image;
  • filtering the feature matrix of the left original image and the feature matrix of the right original image with a preset attention mechanism module, where the preset attention mechanism module performs feature selection on the low-level and high-level feature matrices of the left original image, and on the low-level and high-level feature matrices of the right original image;
  • obtaining a matching cost matrix according to the filtered feature matrix of the left original image and the filtered feature matrix of the right original image;
  • inputting the matching cost matrix into a pre-trained convolutional neural network to obtain a target matching cost matrix;
  • obtaining a disparity map according to the target matching cost matrix.
  • In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of an attention-mechanism-based disparity map acquisition method:
  • obtaining a left original image and a right original image, where the left original image and the right original image are an image pair having a parallax relationship;
  • using a pre-trained feature extraction model to extract a feature matrix of the left original image from the left original image and a feature matrix of the right original image from the right original image, where the features of the left original image include a low-level feature matrix and a high-level feature matrix of the left original image, and the features of the right original image include a low-level feature matrix and a high-level feature matrix of the right original image;
  • filtering the feature matrix of the left original image and the feature matrix of the right original image with a preset attention mechanism module, where the preset attention mechanism module performs feature selection on the low-level and high-level feature matrices of the left original image, and on the low-level and high-level feature matrices of the right original image;
  • obtaining a matching cost matrix according to the filtered feature matrix of the left original image and the filtered feature matrix of the right original image;
  • inputting the matching cost matrix into a pre-trained convolutional neural network to obtain a target matching cost matrix;
  • obtaining a disparity map according to the target matching cost matrix.
  • In the embodiments of the present application, the left original image and the right original image, which have a parallax relationship, are first obtained; a pre-trained feature extraction model then performs feature extraction on the left and right original images, after which an attention mechanism module filters the feature matrix of the left original image and the feature matrix of the right original image.
  • The attention mechanism can filter out the useless and negative information contained in the two feature matrices, which helps to improve the accuracy of the disparity map.
  • Next, a matching cost matrix is obtained from the filtered left and right feature matrices; it represents the similarity between every two pixels of the left and right original images, and the more similar two points are, the higher the probability that they are corresponding points between the left and right original images.
  • Because the attention mechanism is used for feature selection, a more accurate matching cost matrix can be obtained, which in turn helps to improve the accuracy of the disparity map.
  • Finally, the matching cost matrix is input into a pre-trained convolutional neural network to obtain the target matching cost matrix, and the disparity map is obtained according to the target matching cost matrix.
  • In this embodiment, the attention mechanism performs feature selection on the feature matrices of the left and right original images and filters out the useless and negative information they contain, thereby improving the accuracy of the disparity map.
  • FIG. 1 is a flowchart of a method for acquiring a disparity map based on an attention mechanism in an embodiment of the present application
  • FIG. 2 is a functional block diagram of an apparatus for acquiring a disparity map based on an attention mechanism in an embodiment of the present application
  • Fig. 3 is a schematic diagram of a computer device in an embodiment of the present application.
  • first, second, third, etc. may be used in the embodiments of the present application to describe the preset range, etc., these preset ranges should not be limited to these terms. These terms are only used to distinguish the preset ranges from each other.
  • the first preset range may also be referred to as the second preset range, and similarly, the second preset range may also be referred to as the first preset range.
  • Depending on the context, the word "if" as used herein may be interpreted as "when", "once", "in response to determining", or "in response to detecting".
  • Similarly, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
  • FIG. 1 shows a flow chart of the method for acquiring a disparity map based on the attention mechanism in this embodiment.
  • The attention-mechanism-based disparity map acquisition method can be applied in a disparity map acquisition system, and can be used whenever a disparity map is to be obtained for an image pair with a disparity relationship.
  • the disparity map acquisition system can be specifically applied to a computer device, where the computer device is a device that can perform human-computer interaction with a user, including but not limited to devices such as computers, smart phones, and tablets.
  • the method for acquiring a disparity map based on the attention mechanism includes:
  • S10 Acquire a left original image and a right original image, where the left original image and the right original image are image pairs having a parallax relationship.
  • Parallax refers to the difference in direction when observing the same target from two points at a certain distance. Understandably, for example, when a person observes the same object, the objects observed by the person's left and right eyes are different, and this difference is called parallax.
  • In an embodiment, a device such as a binocular camera may be used to obtain the left and right original images. Because the two lenses of a binocular camera do not capture the image from the same point, the left original image and the right original image it produces have a parallax relationship.
  • S20: Use the pre-trained feature extraction model to extract the feature matrix of the left original image from the left original image and the feature matrix of the right original image from the right original image, where the features of the left original image include the low-level feature matrix and the high-level feature matrix of the left original image,
  • and the features of the right original image include the low-level feature matrix and the high-level feature matrix of the right original image.
  • the high-level feature matrix of the left original image refers to the output of the n-th convolutional layer in the feature extraction model;
  • the low-level feature matrix of the left original image refers to the output of the m-th convolutional layer in the feature extraction model, where 0 < m < n;
  • the high-level feature matrix of the right original image refers to the output of the q-th convolutional layer in the feature extraction model;
  • the low-level feature matrix of the right original image refers to the output of the p-th convolutional layer in the feature extraction model, where 0 < p < q.
  • The pre-trained feature extraction model includes convolutional layers used to extract feature matrices from the input left and right original images. Understandably, when a convolutional neural network is used for feature extraction, the more convolutional layers the network contains, the deeper the image features represented by the extracted feature matrix. It should be noted that the low-level and high-level feature matrices mentioned in this embodiment are relative concepts: the low-level feature matrix is extracted with fewer convolutional layers, and the high-level feature matrix is extracted with more convolutional layers.
  • the high-level feature matrix of the left original image is the output of the nth convolutional layer in the feature extraction model
  • the low-level feature matrix of the left original image is the output of the mth convolutional layer in the feature extraction model.
  • The image features expressed by the low-level and high-level feature matrices of an image differ.
  • In general, the image features expressed by the high-level feature matrix are more concise.
  • Although the low-level feature matrix contains some useless information, it still retains image features that the high-level feature matrix lacks; in fact, the low-level feature matrix still includes valuable image features.
  • In an embodiment, m may specifically be 5 and n may specifically be 16.
  • the number of layers represented by p and q may be the same as m and n, respectively, or different, and is not limited here.
  • Understandably, feature extraction is in effect performed twice on each of the left and right original images in this embodiment, extracting both the low-level and the high-level feature matrix of the image; this lays an important foundation for subsequent feature selection and for improving the expressive power of the feature matrices.
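  • As a minimal illustration, the following PyTorch sketch taps a plain stack of 3×3 convolutions at the m-th and n-th layers (m = 5, n = 16, following the example above). The class name, channel counts, and the absence of striding are assumptions for illustration, not the patent's exact architecture.

```python
import torch
import torch.nn as nn

class TwoLevelFeatureExtractor(nn.Module):
    """Stack of 3x3 conv layers exposing both the m-th layer output
    (low-level feature matrix) and the n-th layer output (high-level
    feature matrix)."""

    def __init__(self, in_ch=3, ch=32, m=5, n=16):
        super().__init__()
        assert 0 < m < n
        self.m, self.n = m, n
        self.layers = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch if i == 0 else ch, ch, 3, padding=1),
                nn.ReLU(inplace=True),
            )
            for i in range(n)
        ])

    def forward(self, x):
        low = high = None
        for i, layer in enumerate(self.layers, start=1):
            x = layer(x)
            if i == self.m:
                low = x   # low-level feature matrix (fewer conv layers)
            if i == self.n:
                high = x  # high-level feature matrix (more conv layers)
        return low, high

# Usage: the same extractor is applied to both images of the pair.
# extractor = TwoLevelFeatureExtractor()
# left_low, left_high = extractor(left_img)    # left_img: (B, 3, H, W)
# right_low, right_high = extractor(right_img)
```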
  • S30: Use a preset attention mechanism module to filter the feature matrix of the left original image and the feature matrix of the right original image, where the preset attention mechanism module performs feature selection on the low-level and high-level feature matrices of the left original image, and on the low-level and high-level feature matrices of the right original image.
  • The attention mechanism module can be regarded as a feature selector or feature filter: it combines the low-level feature matrix and the high-level feature matrix so that each improves the attention of feature selection for the other.
  • The low-level and high-level feature matrices are compared together, and valid features are selected from them.
  • Understandably, the attention mechanism module can be implemented in a variety of ways.
  • The key point of the attention mechanism is to combine the low-level and high-level feature matrices so that each improves the other's attention to feature selection; the attention mechanism module can therefore use different preset model structures to process the low-level and high-level feature matrices of the left and right original images.
  • Further, the following model structure can be used to filter the feature matrix of the left original image and the feature matrix of the right original image:
  • the attention mechanism module includes a first branch and a second branch.
  • In step S30, filtering the feature matrix of the left original image and the feature matrix of the right original image with the preset attention mechanism module specifically includes the following steps for filtering the feature matrix of the left original image:
  • S31: Input the high-level feature matrix of the left original image into the first branch of the attention mechanism module to obtain the first output feature matrix of the left original image, where the first branch includes a convolutional layer with a 1×1 convolution kernel, a batch normalization layer, a nonlinear layer, and a transformation layer.
  • The convolutional layer with the 1×1 kernel is used to adjust the size of the high-level feature matrix of the left original image;
  • the batch normalization layer effectively improves the accuracy of the first branch's extraction;
  • the nonlinear layer may be implemented with ReLU (Rectified Linear Unit);
  • and the transformation layer may be implemented with a sigmoid function.
  • After extraction by this first branch, the elements of the first output feature matrix of the left original image lie in the interval (0, 1) (because of the sigmoid function); the matrix therefore expresses image features as weights, and this weight matrix can subsequently be combined with the low-level feature matrix of the left original image to improve the attention of feature selection.
  • S32: Input the low-level feature matrix of the left original image into the second branch of the attention mechanism module to obtain the second output feature matrix of the left original image, where the second branch includes a convolutional layer with a 1×1 convolution kernel.
  • This 1×1 convolutional layer adjusts the size of the low-level feature matrix of the left original image so that it matches the size of the first output feature matrix of the left original image, to facilitate the subsequent calculation.
  • S33: Multiply the first output feature matrix of the left original image and the second output feature matrix of the left original image element by element to obtain the integrated output feature matrix of the left original image.
  • Understandably, the size of the low-level feature matrix of the left original image is the same as the size of the integrated output feature matrix, so the multiplication is performed on the elements at corresponding positions, yielding the integrated output feature matrix of the left original image.
  • This process can be understood as re-weighting the elements of the low-level feature matrix of the left original image: multiplying by the first output feature matrix brings in the image features of the high-level feature matrix of the left original image.
  • The high-level feature matrix of the left original image is thus combined with the low-level feature matrix for the first time, producing the integrated output feature matrix of the left original image.
  • S34: Add the integrated output feature matrix of the left original image and the low-level feature matrix of the left original image to obtain the filtered feature matrix of the left original image. This combines the high-level feature matrix of the left original image with its low-level feature matrix a second time, further improving the attention of feature selection.
  • It should be noted that the integrated output feature matrix of the left original image is obtained by multiplying the low-level feature matrix by a weight matrix representing the high-level image features; although it mainly reflects the high-level feature matrix, it is also built on the low-level feature matrix, which makes the result of the addition in S34 more accurate.
  • Steps S31-S34 provide a specific embodiment of filtering the feature matrix of the left original image with a preset attention mechanism module.
  • By combining the high-level and low-level feature matrices of the left original image twice, the attention mechanism is fully exploited and a better filtering effect is achieved.
  • Further, the attention mechanism module can also adopt other model structures: for example, keep the model structure of steps S31-S34 and add a second, parallel path with the same structure whose inputs are exactly the reverse of those in S31-S34, and finally add the output of S34 and the output of the new path once more. This makes full use of the attention mechanism and further improves the filtering effect.
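  • A minimal PyTorch sketch of the S31-S34 structure follows. Channel counts and the requirement low_ch == out_ch are assumptions so the S34 addition with the raw low-level matrix is well defined (the text states the two sizes match); the spatial sizes of the two inputs are also assumed equal.

```python
import torch
import torch.nn as nn

class AttentionFilter(nn.Module):
    """Two-branch attention module following steps S31-S34 (a sketch,
    not the patent's exact implementation)."""

    def __init__(self, low_ch, high_ch, out_ch):
        super().__init__()
        assert low_ch == out_ch  # so the raw low-level matrix can be added in S34
        # S31 branch: 1x1 conv -> batch normalization -> ReLU -> sigmoid
        self.first_branch = nn.Sequential(
            nn.Conv2d(high_ch, out_ch, kernel_size=1),  # adjusts high-level feature size
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Sigmoid(),                               # weights in (0, 1)
        )
        # S32 branch: 1x1 conv that resizes the low-level feature matrix
        self.second_branch = nn.Conv2d(low_ch, out_ch, kernel_size=1)

    def forward(self, low, high):
        w = self.first_branch(high)      # S31: first output feature matrix (weight matrix)
        low2 = self.second_branch(low)   # S32: second output feature matrix
        combined = w * low2              # S33: element-wise product -> integrated output
        return combined + low            # S34: add the low-level feature matrix back
```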
  • S40: Obtain a matching cost matrix according to the filtered feature matrix of the left original image and the filtered feature matrix of the right original image.
  • Understandably, the filtered left and right feature matrices represent the effective image features of the left and right original images, so the matching cost matrix obtained from them has high accuracy.
  • In this embodiment, the matching cost matrix is a prerequisite for computing the disparity map; it represents the similarity between every two pixels of the left original image and the right original image.
  • Further, in step S40, obtaining the matching cost matrix according to the filtered left and right feature matrices specifically includes:
  • S41: Use a preset stereo matching algorithm to determine the maximum disparity range.
  • A stereo matching algorithm is a method of computing disparity values; it computes disparity using a matching cost (the three most common matching costs are the sum of absolute differences (SAD), the sum of truncated absolute differences (STAD), and the sum of squared differences (SSD)), thereby determining the maximum disparity range.
  • S42: Within the maximum disparity range, cascade the filtered feature matrix of the left original image and the filtered feature matrix of the right original image to obtain the matching cost matrix.
  • Here, cascading refers to the operation of concatenating matrices. The matching cost matrix obtained at this point represents the similarity between every two pixels of the left and right original images: the more similar two pixels are, the higher the probability that they are corresponding pixels.
  • Specifically, within the maximum disparity range, a 4-dimensional matching cost matrix can be obtained by cascading the filtered left and right feature matrices. If the size of the left and right original images is W×H, the maximum disparity between them is D, and the dimension after feature filtering is (H, W, c), then the size after the cascade operation is (H, W, 2c), and the size of the final matching cost matrix is (D+1, H, W, 2c).
  • Steps S41-S42 provide a specific embodiment of obtaining the matching cost matrix.
  • The matching cost matrix obtained through the cascade operation retains the image features of the filtered left and right feature matrices, which ensures the accuracy of the matching cost matrix.
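  • A sketch of the cascade operation in PyTorch follows. The function name and the batched, channel-first layout (B, D+1, 2c, H, W) are assumptions; per the text the per-sample result has size (D+1, H, W, 2c).

```python
import torch

def build_matching_cost_matrix(left_feat, right_feat, max_disp):
    """Cascade (concatenate) the filtered left and right feature matrices
    at every candidate disparity d in [0, D]."""
    b, c, h, w = left_feat.shape
    volume = left_feat.new_zeros(b, max_disp + 1, 2 * c, h, w)
    for d in range(max_disp + 1):
        if d == 0:
            volume[:, d, :c] = left_feat
            volume[:, d, c:] = right_feat
        else:
            # shift right-image features by d pixels so pixel x of the left
            # image is paired with pixel (x - d) of the right image
            volume[:, d, :c, :, d:] = left_feat[:, :, :, d:]
            volume[:, d, c:, :, d:] = right_feat[:, :, :, :-d]
    return volume
```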
  • S50 Input the matching cost matrix into the pre-trained convolutional neural network to obtain the target matching cost matrix.
  • Specifically, feature extraction can be performed on the matching cost matrix once more, using a pre-trained convolutional neural network; this further improves the feature expression ability of the matching cost matrix and yields the target matching cost matrix.
  • Further, in step S60, the disparity map is obtained according to the target matching cost matrix, which specifically includes:
  • S61: Upsample the target matching cost matrix. The size of the target matching cost matrix may differ from that of the left and right original images, so upsampling can be used to bring the target matching cost matrix to the same size as the left and right original images.
  • S62: Perform a regression calculation on the upsampled target matching cost matrix to obtain the regression value of the disparity, expressed as $\hat{d} = \sum_{d=0}^{D_{max}} d \cdot \sigma(-c_d)$, where Dmax denotes the maximum disparity value, d denotes a disparity value, σ(·) denotes the softmax function, and c_d denotes the loss (cost) value of disparity d, obtained with a preset loss function.
  • Specifically, a preset stereo matching algorithm can be applied to the target matching cost matrix to obtain the disparity values, and a regression calculation is then performed on them to obtain the regression value of the disparity. Understandably, introducing the regression operation reduces errors in the calculation process and further improves the accuracy of the disparity map.
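  • The following sketch applies the reconstructed regression formula to a cost volume; the function name and the (B, Dmax+1, H, W) layout are assumptions.

```python
import torch
import torch.nn.functional as F

def disparity_regression(cost, max_disp):
    """Soft regression of the disparity from the upsampled target matching
    cost matrix: d_hat = sum_{d=0..Dmax} d * softmax(-c_d).
    `cost` holds one matching cost per candidate disparity,
    shaped (B, Dmax + 1, H, W); the result is a (B, H, W) disparity map."""
    prob = F.softmax(-cost, dim=1)  # sigma(-c_d): low cost -> high probability
    disp = torch.arange(max_disp + 1, dtype=cost.dtype, device=cost.device)
    return (prob * disp.view(1, -1, 1, 1)).sum(dim=1)  # regression value of disparity
```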
  • S63: Obtain the disparity map according to the regression value of the disparity.
  • The disparity map may be obtained on the basis of either the left original image or the right original image;
  • once the regression values of the disparity at the corresponding pixels of the left and right original images are determined, the disparity map can be determined and obtained.
  • Further, the regression calculation produces a regression loss during the training phase; the regression loss is obtained with a matching cost loss function built on smooth L1, expressed as $L = \frac{1}{N}\sum_{i=1}^{N} \mathrm{smooth}_{L1}(d_i - \hat{d}_i)$, with $\mathrm{smooth}_{L1}(x) = 0.5x^2$ if $|x| < 1$ and $|x| - 0.5$ otherwise, where N denotes the total number of pixels, $d_i$ denotes the i-th disparity value, and $\hat{d}_i$ denotes the regression value of the i-th disparity value; when applied to the matching cost loss function, x denotes $d_i - \hat{d}_i$.
  • The entire system for obtaining the disparity map can be regarded as one model containing several neural networks, so the model also needs a training process to make the disparity map obtained through steps S10-S60 more accurate. Specifically, since the regression calculation produces a regression loss during the training phase, the above computation of the regression loss can be used to update the network parameters during training.
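  • In code, the reconstructed matching cost loss corresponds to PyTorch's built-in smooth L1 loss (with its default threshold of 1); the function name is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def matching_cost_loss(pred_disp, gt_disp):
    """Smooth-L1 matching cost loss over N pixels:
    L = (1/N) * sum_i smoothL1(d_i - d_hat_i), where
    smoothL1(x) = 0.5 * x^2 if |x| < 1, else |x| - 0.5."""
    return F.smooth_l1_loss(pred_disp, gt_disp, reduction='mean')
```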
  • In the embodiments of the present application, the left original image and the right original image, which have a parallax relationship, are first obtained; a pre-trained feature extraction model then performs feature extraction on the left and right original images, after which an attention mechanism module filters the feature matrix of the left original image and the feature matrix of the right original image.
  • The attention mechanism can filter out the useless and negative information contained in the two feature matrices, which helps to improve the accuracy of the disparity map.
  • Next, a matching cost matrix is obtained from the filtered left and right feature matrices; it represents the similarity between every two pixels of the left and right original images, and the more similar two points are, the higher the probability that they are corresponding points between the left and right original images.
  • Because the attention mechanism is used for feature selection, a more accurate matching cost matrix can be obtained, which in turn helps to improve the accuracy of the disparity map.
  • Finally, the matching cost matrix is input into a pre-trained convolutional neural network to obtain the target matching cost matrix, and the disparity map is obtained according to the target matching cost matrix.
  • In this embodiment, the attention mechanism performs feature selection on the feature matrices of the left and right original images and filters out the useless and negative information they contain, thereby improving the accuracy of the disparity map.
  • the embodiment of the present application further provides an embodiment of a device that implements each step and method in the above method embodiment.
  • FIG. 2 shows the functional block diagram of the attention-mechanism-based disparity map acquisition apparatus that corresponds one-to-one with the attention-mechanism-based disparity map acquisition method of the embodiment.
  • The apparatus includes an original image acquisition module 10, a feature extraction module 20, a filtering module 30, a matching cost matrix acquisition module 40, a target matching cost matrix acquisition module 50, and a disparity map acquisition module 60.
  • The functions implemented by these modules correspond one-to-one with the steps of the attention-mechanism-based disparity map acquisition method of the embodiment; to avoid repetition, this embodiment does not describe them one by one.
  • the original image acquisition module 10 is configured to obtain a left original image and a right original image, where the left original image and the right original image are image pairs having a parallax relationship.
  • The feature extraction module 20 is configured to use a pre-trained feature extraction model to extract a feature matrix of the left original image from the left original image and a feature matrix of the right original image from the right original image, where the features of the left original image include the low-level feature matrix and the high-level feature matrix of the left original image,
  • and the features of the right original image include the low-level feature matrix and the high-level feature matrix of the right original image.
  • The filtering module 30 is configured to filter the feature matrix of the left original image and the feature matrix of the right original image with a preset attention mechanism module, where the preset attention mechanism module performs feature selection on the low-level and high-level feature matrices of the left original image,
  • and on the low-level and high-level feature matrices of the right original image.
  • the matching cost matrix obtaining module 40 is configured to obtain a matching cost matrix according to the filtered left original image feature matrix and the filtered right original image feature matrix.
  • the target matching cost matrix obtaining module 50 is used to input the matching cost matrix into the pre-trained convolutional neural network to obtain the target matching cost matrix.
  • the disparity map obtaining module 60 is configured to obtain the disparity map according to the target matching cost matrix.
  • the attention mechanism module includes a first branch and a second branch.
  • the filtering module includes:
  • The first acquisition unit is configured to input the high-level feature matrix of the left original image into the first branch of the attention mechanism module to obtain the first output feature matrix of the left original image, where the first branch includes a convolutional layer with a 1×1 convolution kernel,
  • a batch normalization layer, a nonlinear layer, and a transformation layer.
  • The convolutional layer with the 1×1 kernel is used to adjust the size of the high-level feature matrix of the left original image;
  • the batch normalization layer effectively improves the accuracy of the first branch's extraction;
  • the nonlinear layer may be implemented with ReLU (Rectified Linear Unit);
  • and the transformation layer may be implemented with a sigmoid function.
  • After extraction by this first branch, the elements of the first output feature matrix of the left original image lie in the interval (0, 1) (because of the sigmoid function); the matrix therefore expresses image features as weights, and this weight matrix can subsequently be combined with the low-level feature matrix of the left original image to improve the attention of feature selection.
  • The second acquisition unit is configured to input the low-level feature matrix of the left original image into the second branch of the attention mechanism module to obtain the second output feature matrix of the left original image, where the second branch includes a convolutional layer with a 1×1 convolution kernel.
  • This 1×1 convolutional layer adjusts the size of the low-level feature matrix of the left original image so that it matches the size of the first output feature matrix of the left original image, to facilitate the subsequent calculation.
  • The third acquisition unit is configured to multiply the first output feature matrix of the left original image and the second output feature matrix of the left original image element by element to obtain the integrated output feature matrix of the left original image.
  • Understandably, the size of the low-level feature matrix of the left original image is the same as the size of the integrated output feature matrix, so the multiplication is performed on the elements at corresponding positions, yielding the integrated output feature matrix of the left original image.
  • This process can be understood as re-weighting the elements of the low-level feature matrix of the left original image: multiplying by the first output feature matrix brings in the image features of the high-level feature matrix,
  • combining the high-level and low-level feature matrices of the left original image for the first time to obtain the integrated output feature matrix of the left original image.
  • The fourth acquisition unit is configured to add the integrated output feature matrix of the left original image and the low-level feature matrix of the left original image to obtain the filtered feature matrix of the left original image.
  • Understandably, the integrated output feature matrix of the left original image is obtained by multiplying the low-level feature matrix by a weight matrix representing the high-level image features, and mainly reflects the high-level feature matrix of the left original image.
  • Adding the integrated output feature matrix and the low-level feature matrix combines the high-level and low-level feature matrices of the left original image once more, further improving the attention of feature selection.
  • Optionally, the high-level feature matrix of the left original image refers to the output of the n-th convolutional layer in the feature extraction model,
  • the low-level feature matrix of the left original image refers to the output of the m-th convolutional layer in the feature extraction model, where 0 < m < n,
  • the high-level feature matrix of the right original image refers to the output of the q-th convolutional layer in the feature extraction model,
  • and the low-level feature matrix of the right original image refers to the output of the p-th convolutional layer in the feature extraction model, where 0 < p < q.
  • Optionally, the matching cost matrix acquisition module is specifically configured to:
  • determine the maximum disparity range using a preset stereo matching algorithm; and,
  • within the maximum disparity range, cascade the filtered feature matrix of the left original image and the filtered feature matrix of the right original image to obtain the matching cost matrix.
  • Optionally, the disparity map acquisition module is specifically configured to:
  • upsample the target matching cost matrix;
  • perform a regression calculation on the upsampled target matching cost matrix to obtain the regression value of the disparity, expressed as $\hat{d} = \sum_{d=0}^{D_{max}} d \cdot \sigma(-c_d)$, where Dmax denotes the maximum disparity value, d denotes a disparity value, σ(·) denotes the softmax function, and c_d denotes the loss (cost) value of disparity d, obtained with a preset loss function; and
  • obtain the disparity map according to the regression value of the disparity.
  • Optionally, the regression calculation produces a regression loss during the training phase; the regression loss is obtained with a matching cost loss function built on smooth L1, expressed as $L = \frac{1}{N}\sum_{i=1}^{N} \mathrm{smooth}_{L1}(d_i - \hat{d}_i)$, with $\mathrm{smooth}_{L1}(x) = 0.5x^2$ if $|x| < 1$ and $|x| - 0.5$ otherwise, where N denotes the total number of pixels, $d_i$ denotes the i-th disparity value, and $\hat{d}_i$ denotes the regression value of the i-th disparity value; when applied to the matching cost loss function, x denotes $d_i - \hat{d}_i$.
  • the network parameters in the training phase are updated according to the regression loss value.
  • In the embodiments of the present application, the left original image and the right original image, which have a parallax relationship, are first obtained; a pre-trained feature extraction model then performs feature extraction on the left and right original images, after which an attention mechanism module filters the feature matrix of the left original image and the feature matrix of the right original image.
  • The attention mechanism can filter out the useless and negative information contained in the two feature matrices, which helps to improve the accuracy of the disparity map.
  • Next, a matching cost matrix is obtained from the filtered left and right feature matrices; it represents the similarity between every two pixels of the left and right original images, and the more similar two points are, the higher the probability that they are corresponding points between the left and right original images.
  • Because the attention mechanism is used for feature selection, a more accurate matching cost matrix can be obtained, which in turn helps to improve the accuracy of the disparity map.
  • Finally, the matching cost matrix is input into a pre-trained convolutional neural network to obtain the target matching cost matrix, and the disparity map is obtained according to the target matching cost matrix.
  • In this embodiment, the attention mechanism performs feature selection on the feature matrices of the left and right original images and filters out the useless and negative information they contain, thereby improving the accuracy of the disparity map.
  • This embodiment provides a computer-readable storage medium.
  • the above-mentioned storage medium may be a non-volatile storage medium or a volatile storage medium.
  • a computer program is stored on the computer-readable storage medium, and when the computer program is executed by the processor, the method for obtaining the disparity map based on the attention mechanism in the embodiment is implemented. In order to avoid repetition, details are not repeated here.
  • the computer program is executed by the processor, the function of each module/unit in the disparity map acquisition apparatus based on the attention mechanism in the embodiment is realized. In order to avoid repetition, it will not be repeated here.
  • Fig. 3 is a schematic diagram of a computer device provided by an embodiment of the present application.
  • As shown in FIG. 3, the computer device 70 of this embodiment includes a processor 71, a memory 72, and a computer program 73 stored in the memory 72 and executable on the processor 71.
  • When the computer program 73 is executed by the processor 71,
  • the attention-mechanism-based disparity map acquisition method of the embodiment is implemented.
  • Alternatively, when the computer program 73 is executed by the processor 71, the functions of the modules/units of the attention-mechanism-based disparity map acquisition apparatus that correspond one-to-one with the method of the embodiment are implemented.
  • the computer device 70 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the computer device 70 may include, but is not limited to, a processor 71 and a memory 72.
  • FIG. 3 is only an example of the computer device 70 and does not constitute a limitation on the computer device 70; the device may include more or fewer components than shown, or combine certain components, or use different components.
  • computer equipment may also include input and output devices, network access devices, buses, and so on.
  • The processor 71 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory 72 may be an internal storage unit of the computer device 70, such as a hard disk or memory of the computer device 70.
  • The memory 72 may also be an external storage device of the computer device 70, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card equipped on the computer device 70.
  • the memory 72 may also include both an internal storage unit of the computer device 70 and an external storage device.
  • the memory 72 is used to store computer programs and other programs and data required by the computer equipment.
  • the memory 72 can also be used to temporarily store data that has been output or will be output.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

This application discloses an attention-mechanism-based disparity map acquisition method, apparatus, computer device, and storage medium, relating to the field of artificial intelligence. The attention-mechanism-based disparity map acquisition method includes: obtaining a left original image and a right original image; using a pre-trained feature extraction model to extract a feature matrix of the left original image from the left original image and a feature matrix of the right original image from the right original image; filtering the feature matrix of the left original image and the feature matrix of the right original image with a preset attention mechanism module; obtaining a matching cost matrix according to the filtered feature matrix of the left original image and the filtered feature matrix of the right original image; inputting the matching cost matrix into a pre-trained convolutional neural network to obtain a target matching cost matrix; and obtaining a disparity map according to the target matching cost matrix. This attention-mechanism-based method yields a disparity map with high accuracy.

Description

Attention-mechanism-based disparity map acquisition method and apparatus
[Corrected under Rule 91, 26.10.2020]
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on February 18, 2020 under application No. 202010097878.7 and entitled "Attention-mechanism-based disparity map acquisition method and apparatus", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence technology, and in particular to an attention-mechanism-based disparity map acquisition method and apparatus.
Background
Parallax refers to the difference in direction that arises when the same target is observed from two points separated by a certain distance. At present, the inventor has realized that when an image pair with a parallax relationship exhibits affine distortion or radiation distortion, or is affected by ill-conditioned regions such as occlusions, weak textures, repeated textures, and reflective surfaces, disparity values are computed with low accuracy and a high-accuracy disparity map cannot be obtained.
Technical Problem
In view of this, the embodiments of this application provide an attention-mechanism-based disparity map acquisition method, apparatus, computer device, and storage medium, to solve the problem that a high-accuracy disparity map cannot currently be obtained when an image pair with a parallax relationship exhibits affine distortion, radiation distortion, or ill-conditioned regions.
Technical Solution
In a first aspect, an embodiment of this application provides an attention-mechanism-based disparity map acquisition method, including:
obtaining a left original image and a right original image, where the left original image and the right original image are an image pair having a parallax relationship;
using a pre-trained feature extraction model to extract a feature matrix of the left original image from the left original image and a feature matrix of the right original image from the right original image, where the features of the left original image include a low-level feature matrix and a high-level feature matrix of the left original image, and the features of the right original image include a low-level feature matrix and a high-level feature matrix of the right original image;
filtering the feature matrix of the left original image and the feature matrix of the right original image with a preset attention mechanism module, where the preset attention mechanism module performs feature selection on the low-level and high-level feature matrices of the left original image, and on the low-level and high-level feature matrices of the right original image;
obtaining a matching cost matrix according to the filtered feature matrix of the left original image and the filtered feature matrix of the right original image;
inputting the matching cost matrix into a pre-trained convolutional neural network to obtain a target matching cost matrix;
obtaining a disparity map according to the target matching cost matrix.
In a second aspect, an embodiment of this application provides an attention-mechanism-based disparity map acquisition apparatus, including:
an original image acquisition module, configured to obtain a left original image and a right original image, where the left original image and the right original image are an image pair having a parallax relationship;
a feature extraction module, configured to use a pre-trained feature extraction model to extract a feature matrix of the left original image from the left original image and a feature matrix of the right original image from the right original image, where the features of the left original image include a low-level feature matrix and a high-level feature matrix of the left original image, and the features of the right original image include a low-level feature matrix and a high-level feature matrix of the right original image;
a filtering module, configured to filter the feature matrix of the left original image and the feature matrix of the right original image with a preset attention mechanism module, where the preset attention mechanism module performs feature selection on the low-level and high-level feature matrices of the left original image, and on the low-level and high-level feature matrices of the right original image;
a matching cost matrix acquisition module, configured to obtain a matching cost matrix according to the filtered feature matrix of the left original image and the filtered feature matrix of the right original image;
a target matching cost matrix acquisition module, configured to input the matching cost matrix into a pre-trained convolutional neural network to obtain a target matching cost matrix;
a disparity map acquisition module, configured to obtain a disparity map according to the target matching cost matrix.
In a third aspect, a computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the following steps of an attention-mechanism-based disparity map acquisition method are implemented:
obtaining a left original image and a right original image, where the left original image and the right original image are an image pair having a parallax relationship;
using a pre-trained feature extraction model to extract a feature matrix of the left original image from the left original image and a feature matrix of the right original image from the right original image, where the features of the left original image include a low-level feature matrix and a high-level feature matrix of the left original image, and the features of the right original image include a low-level feature matrix and a high-level feature matrix of the right original image;
filtering the feature matrix of the left original image and the feature matrix of the right original image with a preset attention mechanism module, where the preset attention mechanism module performs feature selection on the low-level and high-level feature matrices of the left original image, and on the low-level and high-level feature matrices of the right original image;
obtaining a matching cost matrix according to the filtered feature matrix of the left original image and the filtered feature matrix of the right original image;
inputting the matching cost matrix into a pre-trained convolutional neural network to obtain a target matching cost matrix;
obtaining a disparity map according to the target matching cost matrix.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of an attention-mechanism-based disparity map acquisition method:
obtaining a left original image and a right original image, where the left original image and the right original image are an image pair having a parallax relationship;
using a pre-trained feature extraction model to extract a feature matrix of the left original image from the left original image and a feature matrix of the right original image from the right original image, where the features of the left original image include a low-level feature matrix and a high-level feature matrix of the left original image, and the features of the right original image include a low-level feature matrix and a high-level feature matrix of the right original image;
filtering the feature matrix of the left original image and the feature matrix of the right original image with a preset attention mechanism module, where the preset attention mechanism module performs feature selection on the low-level and high-level feature matrices of the left original image, and on the low-level and high-level feature matrices of the right original image;
obtaining a matching cost matrix according to the filtered feature matrix of the left original image and the filtered feature matrix of the right original image;
inputting the matching cost matrix into a pre-trained convolutional neural network to obtain a target matching cost matrix;
obtaining a disparity map according to the target matching cost matrix.
Beneficial Effects
In the embodiments of this application, the left original image and the right original image, which have a parallax relationship, are first obtained; a pre-trained feature extraction model then performs feature extraction on the left and right original images, after which an attention mechanism module filters the feature matrix of the left original image and the feature matrix of the right original image. The attention mechanism can filter out the useless and negative information contained in the two feature matrices, which helps to improve the accuracy of the disparity map. Next, a matching cost matrix is obtained from the filtered left and right feature matrices; it represents the similarity between every two pixels of the left and right original images, and the more similar two points are, the higher the probability that they are corresponding points between the two images. Because the attention mechanism is used for feature selection, a more accurate matching cost matrix can be obtained, which helps to improve the accuracy of the disparity map. Finally, the matching cost matrix is input into a pre-trained convolutional neural network to obtain the target matching cost matrix, and the disparity map is obtained according to the target matching cost matrix. In this embodiment, the attention mechanism performs feature selection on the feature matrices of the left and right original images and filters out the useless and negative information they contain, thereby improving the accuracy of the disparity map.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of the attention-mechanism-based disparity map acquisition method in an embodiment of this application;
FIG. 2 is a functional block diagram of the attention-mechanism-based disparity map acquisition apparatus in an embodiment of this application;
FIG. 3 is a schematic diagram of a computer device in an embodiment of this application.
Best Mode for Carrying Out the Invention
For a better understanding of the technical solutions of this application, the embodiments of this application are described in detail below with reference to the drawings.
It should be clear that the described embodiments are only some of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of this application.
The terms used in the embodiments of this application are for the purpose of describing particular embodiments only and are not intended to limit this application. The singular forms "a", "said", and "the" used in the embodiments and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" used herein merely describes the association relationship of associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following objects.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of this application to describe preset ranges and the like, these preset ranges should not be limited to these terms. These terms are only used to distinguish the preset ranges from one another. For example, without departing from the scope of the embodiments of this application, a first preset range may also be called a second preset range, and similarly, a second preset range may also be called a first preset range.
Depending on the context, the word "if" as used herein may be interpreted as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
FIG. 1 shows a flowchart of the attention-mechanism-based disparity map acquisition method in this embodiment. The method can be applied in a disparity map acquisition system and used whenever a disparity map is to be obtained for an image pair with a parallax relationship. The disparity map acquisition system can specifically be applied on a computer device, where the computer device is a device capable of human-computer interaction with a user, including but not limited to computers, smartphones, and tablets. As shown in FIG. 1, the attention-mechanism-based disparity map acquisition method includes:
S10: Obtain a left original image and a right original image, where the left original image and the right original image are an image pair having a parallax relationship.
Parallax refers to the difference in direction that arises when the same target is observed from two points separated by a certain distance. Understandably, when a person observes the same target, for example, the target observed by the left eye differs from that observed by the right eye; this difference is called parallax.
In an embodiment, a device such as a binocular camera may be used to obtain the left and right original images. Because the two lenses of a binocular camera do not capture the image from the same point, the left original image and the right original image it produces have a parallax relationship.
S20: Use a pre-trained feature extraction model to extract a feature matrix of the left original image from the left original image and a feature matrix of the right original image from the right original image, where the features of the left original image include a low-level feature matrix and a high-level feature matrix of the left original image, and the features of the right original image include a low-level feature matrix and a high-level feature matrix of the right original image.
Further, the high-level feature matrix of the left original image refers to the output of the n-th convolutional layer in the feature extraction model, and the low-level feature matrix of the left original image refers to the output of the m-th convolutional layer, where 0 < m < n; the high-level feature matrix of the right original image refers to the output of the q-th convolutional layer, and the low-level feature matrix of the right original image refers to the output of the p-th convolutional layer, where 0 < p < q.
The pre-trained feature extraction model includes convolutional layers used to extract feature matrices from the input left and right original images. Understandably, when a convolutional neural network is used for feature extraction, the more convolutional layers the network contains, the deeper the image features represented by the extracted feature matrix. It should be noted that the low-level and high-level feature matrices mentioned in this embodiment are relative concepts: a low-level feature matrix is extracted with fewer convolutional layers and a high-level feature matrix with more. For example, the high-level feature matrix of the left original image is the output of the n-th convolutional layer of the feature extraction model, and the low-level feature matrix is the output of the m-th convolutional layer, where 0 < m < n. The image features expressed by the low-level and high-level feature matrices differ: in general, the high-level feature matrix expresses more concise image features, while the low-level feature matrix, although it contains some useless information, still retains image features that the high-level feature matrix lacks; in fact, the low-level feature matrix still includes valuable image features.
In an embodiment, m may specifically be 5 and n may specifically be 16. The layer indices p and q may be the same as m and n, respectively, or different; this is not limited here.
Understandably, feature extraction is in effect performed twice on each of the left and right original images in this embodiment, extracting both the low-level and high-level feature matrices of the image; this lays an important foundation for subsequent feature selection and for improving the expressive power of the feature matrices.
S30: Filter the feature matrix of the left original image and the feature matrix of the right original image with a preset attention mechanism module, where the preset attention mechanism module performs feature selection on the low-level and high-level feature matrices of the left original image, and on the low-level and high-level feature matrices of the right original image.
The attention mechanism module can be regarded as a feature selector or feature filter: it combines the low-level feature matrix and the high-level feature matrix so that each improves the attention of feature selection for the other; the low-level and high-level feature matrices are compared together, and valid features are selected from them.
Understandably, the attention mechanism module can be implemented in a variety of ways. The key point of the attention mechanism is to combine the low-level and high-level feature matrices so that each improves the other's attention to feature selection; the attention mechanism module can therefore use different preset model structures to process the low-level and high-level feature matrices of the left and right original images.
Further, the following model structure can be used to filter the feature matrix of the left original image and the feature matrix of the right original image:
First, the attention mechanism module includes a first branch and a second branch.
In step S30, filtering the feature matrix of the left original image and the feature matrix of the right original image with the preset attention mechanism module specifically includes the following steps for filtering the feature matrix of the left original image:
S31: Input the high-level feature matrix of the left original image into the first branch of the attention mechanism module to obtain the first output feature matrix of the left original image, where the first branch includes a convolutional layer with a 1×1 convolution kernel, a batch normalization layer, a nonlinear layer, and a transformation layer.
The convolutional layer with the 1×1 kernel is used to adjust the size of the high-level feature matrix of the left original image; the batch normalization layer effectively improves the accuracy of the first branch's extraction; the nonlinear layer may be implemented with ReLU (Rectified Linear Unit); and the transformation layer may be implemented with a sigmoid function. After extraction by this first branch, the elements of the first output feature matrix of the left original image lie in the interval (0, 1) (because of the sigmoid function); the matrix therefore expresses image features as weights, and this weight matrix can subsequently be combined with the low-level feature matrix of the left original image to improve the attention of feature selection.
S32: Input the low-level feature matrix of the left original image into the second branch of the attention mechanism module to obtain the second output feature matrix of the left original image, where the second branch includes a convolutional layer with a 1×1 convolution kernel.
This 1×1 convolutional layer adjusts the size of the low-level feature matrix of the left original image so that it matches the size of the first output feature matrix of the left original image, to facilitate the subsequent calculation.
S33: Multiply the first output feature matrix of the left original image and the second output feature matrix of the left original image element by element to obtain the integrated output feature matrix of the left original image.
Understandably, the size of the low-level feature matrix of the left original image is the same as the size of the integrated output feature matrix, so the multiplication is performed on the elements at corresponding positions, yielding the integrated output feature matrix of the left original image.
In effect, this process can be understood as re-weighting the elements of the low-level feature matrix of the left original image: multiplying by the first output feature matrix brings in the image features of the high-level feature matrix, combining the high-level and low-level feature matrices of the left original image for the first time to obtain the integrated output feature matrix.
S34: Add the integrated output feature matrix of the left original image and the low-level feature matrix of the left original image to obtain the filtered feature matrix of the left original image.
Understandably, the integrated output feature matrix of the left original image is obtained by multiplying the low-level feature matrix by a weight matrix representing the high-level image features and mainly reflects the high-level feature matrix; adding it to the low-level feature matrix in this embodiment combines the high-level and low-level feature matrices of the left original image once more, further improving the attention of feature selection.
It should be noted that, although the integrated output feature matrix mainly reflects the high-level feature matrix of the left original image, it is also built on the low-level feature matrix, which makes the result of the addition with the low-level feature matrix in S34 more accurate.
Steps S31-S34 provide a specific embodiment of filtering the feature matrix of the left original image with a preset attention mechanism module: by combining the high-level and low-level feature matrices of the left original image twice, the attention mechanism is fully exploited and a better filtering effect is achieved.
Further, the attention mechanism module can also adopt other model structures: for example, keep the model structure of steps S31-S34 and add a second, parallel path with the same structure whose inputs are exactly the reverse of those in S31-S34, and finally add the output of S34 and the output of the new path once more; this makes full use of the attention mechanism and further improves the filtering effect. A sketch of this variant follows.
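The following PyTorch sketch illustrates the dual-path variant just described, assuming a single channel count `ch` for both feature matrices so all additions are well defined; the class and function names are illustrative, not the patent's.

```python
import torch.nn as nn

def gate_branch(in_ch, out_ch):
    """First-branch structure of S31: 1x1 conv -> batch norm -> ReLU -> sigmoid."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Sigmoid(),
    )

class DualPathAttentionFilter(nn.Module):
    """Keeps the S31-S34 path (high-level features gate the low-level ones)
    and adds a parallel, identical path with the inputs reversed, then adds
    the two outputs once more."""

    def __init__(self, ch):
        super().__init__()
        self.gate_from_high = gate_branch(ch, ch)   # path 1: S31 branch
        self.gate_from_low = gate_branch(ch, ch)    # path 2: S31 branch, swapped input
        self.resize_low = nn.Conv2d(ch, ch, 1)      # path 1: S32 branch
        self.resize_high = nn.Conv2d(ch, ch, 1)     # path 2: S32 branch, swapped input

    def forward(self, low, high):
        out1 = self.gate_from_high(high) * self.resize_low(low) + low    # S31-S34
        out2 = self.gate_from_low(low) * self.resize_high(high) + high   # reversed path
        return out1 + out2                                               # final addition
```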
S40: Obtain a matching cost matrix according to the filtered feature matrix of the left original image and the filtered feature matrix of the right original image.
Understandably, the filtered left and right feature matrices represent the effective image features of the left and right original images, so the matching cost matrix obtained from them has high accuracy.
In this embodiment, the matching cost matrix is a prerequisite for computing the disparity map; it represents the similarity between every two pixels of the left original image and the right original image.
Further, in step S40, obtaining the matching cost matrix according to the filtered left and right feature matrices specifically includes:
S41: Use a preset stereo matching algorithm to determine the maximum disparity range.
A stereo matching algorithm is a method of computing disparity values; it computes disparity using a matching cost (the three most common matching costs are the sum of absolute differences (SAD), the sum of truncated absolute differences (STAD), and the sum of squared differences (SSD)), thereby determining the maximum disparity range.
S42: Within the maximum disparity range, cascade the filtered feature matrix of the left original image and the filtered feature matrix of the right original image to obtain the matching cost matrix.
Here, cascading refers to the operation of concatenating matrices. The matching cost matrix obtained at this point represents the similarity between every two pixels of the left and right original images: the more similar two pixels are, the higher the probability that they are corresponding pixels.
Specifically, within the maximum disparity range, a 4-dimensional matching cost matrix can be obtained by cascading the filtered left and right feature matrices. If the size of the left and right original images is W×H, the maximum disparity between them is D, and the dimension after feature filtering is (H, W, c), then the size after the cascade operation is (H, W, 2c), and the size of the final matching cost matrix is (D+1, H, W, 2c).
Steps S41-S42 provide a specific embodiment of obtaining the matching cost matrix: the matrix obtained through the cascade operation retains the image features of the filtered left and right feature matrices, which ensures the accuracy of the matching cost matrix.
S50: Input the matching cost matrix into the pre-trained convolutional neural network to obtain the target matching cost matrix.
Specifically, feature extraction can be performed on the matching cost matrix once more, using a pre-trained convolutional neural network; this further improves the feature expression ability of the matching cost matrix and yields the target matching cost matrix.
S60: Obtain the disparity map according to the target matching cost matrix.
Further, in step S60, the disparity map is obtained according to the target matching cost matrix, which specifically includes:
S61: Upsample the target matching cost matrix.
Understandably, the size of the target matching cost matrix may differ from that of the left and right original images, so upsampling can be used to bring the target matching cost matrix to the same size as the left and right original images.
S62: Perform a regression calculation on the upsampled target matching cost matrix to obtain the regression value of the disparity, where the regression value of the disparity is expressed as
$\hat{d} = \sum_{d=0}^{D_{max}} d \cdot \sigma(-c_d)$
where Dmax denotes the maximum disparity value, d denotes a disparity value, σ(·) denotes the softmax function, and c_d denotes the loss (cost) value of disparity d, obtained with a preset loss function.
Specifically, a preset stereo matching algorithm can be applied to the target matching cost matrix to obtain the disparity values, and a regression calculation is then performed on them to obtain the regression value of the disparity. Understandably, introducing the regression operation reduces errors in the calculation process and further improves the accuracy of the disparity map.
S63: Obtain the disparity map according to the regression value of the disparity.
Understandably, the disparity map may be obtained on the basis of either the left original image or the right original image; once the regression values of the disparity at the corresponding pixels of the left and right original images are determined, the disparity map can be determined and obtained.
Further, the regression calculation produces a regression loss during the training phase; the regression loss is obtained with a matching cost loss function built on smooth L1, where the matching cost loss function is expressed as
$L = \frac{1}{N}\sum_{i=1}^{N} \mathrm{smooth}_{L1}(d_i - \hat{d}_i)$, with
$\mathrm{smooth}_{L1}(x) = 0.5x^2$ if $|x| < 1$, and $|x| - 0.5$ otherwise,
where N denotes the total number of pixels, $d_i$ denotes the i-th disparity value, and $\hat{d}_i$ denotes the regression value of the i-th disparity value; when applied to the matching cost loss function, x denotes $d_i - \hat{d}_i$.
The entire system for obtaining the disparity map can be regarded as one model containing several neural networks, so the model also needs a training process to make the disparity map obtained through steps S10-S60 more accurate. Specifically, since the regression calculation produces a regression loss during the training phase, the above computation of the regression loss can be used to update the network parameters during training.
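A hypothetical training update for such an end-to-end model is sketched below; `model`, `optimizer`, and the tensor names are assumptions, with `model` standing for the full pipeline mapping an image pair to a disparity map.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, left_img, right_img, gt_disp):
    """One training update: the regression loss produced in the training
    phase is used to update the network parameters of the whole model."""
    pred_disp = model(left_img, right_img)       # predicted disparity map (steps S10-S62)
    loss = F.smooth_l1_loss(pred_disp, gt_disp)  # regression (smooth L1) loss
    optimizer.zero_grad()
    loss.backward()                              # back-propagate the regression loss
    optimizer.step()                             # update network parameters
    return loss.item()
```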
In the embodiments of this application, the left original image and the right original image, which have a parallax relationship, are first obtained; a pre-trained feature extraction model then performs feature extraction on the left and right original images, after which an attention mechanism module filters the feature matrix of the left original image and the feature matrix of the right original image. The attention mechanism can filter out the useless and negative information contained in the two feature matrices, which helps to improve the accuracy of the disparity map. Next, a matching cost matrix is obtained from the filtered left and right feature matrices; it represents the similarity between every two pixels of the left and right original images, and the more similar two points are, the higher the probability that they are corresponding points between the two images. Because the attention mechanism is used for feature selection, a more accurate matching cost matrix can be obtained, which helps to improve the accuracy of the disparity map. Finally, the matching cost matrix is input into a pre-trained convolutional neural network to obtain the target matching cost matrix, and the disparity map is obtained according to the target matching cost matrix. In this embodiment, the attention mechanism performs feature selection on the feature matrices of the left and right original images and filters out the useless and negative information they contain, thereby improving the accuracy of the disparity map.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
Based on the attention-mechanism-based disparity map acquisition method provided in the embodiment, an embodiment of this application further provides an apparatus embodiment implementing the steps and methods of the above method embodiment.
FIG. 2 shows the functional block diagram of the attention-mechanism-based disparity map acquisition apparatus corresponding one-to-one with the method of the embodiment. As shown in FIG. 2, the apparatus includes an original image acquisition module 10, a feature extraction module 20, a filtering module 30, a matching cost matrix acquisition module 40, a target matching cost matrix acquisition module 50, and a disparity map acquisition module 60. The functions implemented by these modules correspond one-to-one with the steps of the method of the embodiment; to avoid repetition, this embodiment does not describe them in detail one by one.
The original image acquisition module 10 is configured to obtain a left original image and a right original image, where the left original image and the right original image are an image pair having a parallax relationship.
The feature extraction module 20 is configured to use a pre-trained feature extraction model to extract a feature matrix of the left original image from the left original image and a feature matrix of the right original image from the right original image, where the features of the left original image include a low-level feature matrix and a high-level feature matrix of the left original image, and the features of the right original image include a low-level feature matrix and a high-level feature matrix of the right original image.
The filtering module 30 is configured to filter the feature matrix of the left original image and the feature matrix of the right original image with a preset attention mechanism module, where the preset attention mechanism module performs feature selection on the low-level and high-level feature matrices of the left original image, and on the low-level and high-level feature matrices of the right original image.
The matching cost matrix acquisition module 40 is configured to obtain a matching cost matrix according to the filtered feature matrix of the left original image and the filtered feature matrix of the right original image.
The target matching cost matrix acquisition module 50 is configured to input the matching cost matrix into a pre-trained convolutional neural network to obtain a target matching cost matrix.
The disparity map acquisition module 60 is configured to obtain the disparity map according to the target matching cost matrix.
可选地,注意力机制模块包括第一分支和第二分支。
可选地,过滤模块包括:
第一获取单元,用于将左原图高层特征矩阵输入到注意力机制模块的第一分支上,得到左原图第一输出特征矩阵,其中,第一分支上包括采用1×1卷积核的卷积层、批规范层、非线性层和变换层。
其中,1×1卷积核的卷积层可用来调整左原图高层特征矩阵的尺寸,批规范层能够有效提高第一分支提取的准确性,非线性层具体可以采用Relu(Rectified Linear Unit,线性整流函数)实现,变换层可以采用sigmoid函数实现。通过该第一分支的提取,得到的左原图第一输出特征矩阵的矩阵元素的区间在(0,1)之间(由sigmoid函数实现),此时实际上左原图第一输出特征矩阵是采用权重的方式表达图像特征,并且,采用该权重的方式表达图像特征的左原图第一输出特征矩阵(权重矩阵)还能够后续用于与左原图低层特征矩阵进行结合处理,从而提高特征选择的注意力。
第二获取单元,用于将左原图低层特征矩阵输入到注意力机制模块的第二分支上,得到左原图第二输出特征矩阵,其中,第二分支上包括采用1×1卷积核的卷积层。
其中,1×1卷积核的卷积层可用来调整左原图低层特征矩阵的尺寸,使其能够与左原图第一输出特征矩阵的尺寸相同,以便于进行后续的计算。
第三获取单元,用于将左原图第一输出特征矩阵和左原图第二输出特征矩阵在对应元素上进行相乘,得到左原图综合输出特征矩阵。
可以理解地,左原图低层特征矩阵的尺寸和左原图综合输出特征矩阵的尺寸相同,则两者进行相乘时是按对应位置的元素进行相乘,从而得到左原图综合输出特征矩阵。
可以理解地,实际上该过程可以理解为给左原图低层特征矩阵作了一次元素权重变化的处理,通过与左原图第一输出特征矩阵相乘,体现了左原图高层特征矩阵的图像特征,将左原图高层特征矩阵与左原图低层特征矩阵作了初次结合,得到左原图综合输出特征矩阵。
a fourth acquisition unit, configured to add the combined output feature matrix of the left original image and the low-level feature matrix of the left original image to obtain the filtered feature matrix of the left original image.
Understandably, the combined output feature matrix of the left original image is obtained by multiplying the low-level feature matrix by a weight matrix representing the high-level image features of the left original image, so it mainly reflects the high-level feature matrix. In this embodiment, adding the combined output feature matrix and the low-level feature matrix combines the high-level and low-level feature matrices once more, further increasing the attention of feature selection.
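Gathering the four units into code, a minimal sketch of the two-branch attention filter might read as follows. Channel counts are assumptions, the two inputs are assumed to share spatial resolution, and the residual addition here uses the size-adjusted low-level features from the second branch, since the raw low-level feature matrix must first be brought to a matching size:

    import torch.nn as nn

    class AttentionFilter(nn.Module):
        # Two-branch attention filter of the filtering module (a sketch).
        def __init__(self, c_high, c_low, c_out):
            super().__init__()
            # First branch: 1x1 conv, batch norm, ReLU, then sigmoid, so the
            # first output feature matrix holds weights in the interval (0, 1).
            self.branch1 = nn.Sequential(
                nn.Conv2d(c_high, c_out, kernel_size=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
                nn.Sigmoid(),
            )
            # Second branch: a 1x1 conv that adjusts the low-level feature
            # matrix to the same size as the first output feature matrix.
            self.branch2 = nn.Conv2d(c_low, c_out, kernel_size=1)

        def forward(self, high_feat, low_feat):
            w = self.branch1(high_feat)    # first output feature matrix
            low = self.branch2(low_feat)   # second output feature matrix
            combined = w * low             # element-wise multiplication
            return combined + low          # addition yields the filtered features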
Optionally, the high-level feature matrix of the left original image refers to the output of the n-th convolutional layer of the feature extraction model, and the low-level feature matrix of the left original image refers to the output of the m-th convolutional layer, where 0 < m < n; the high-level feature matrix of the right original image refers to the output of the q-th convolutional layer of the feature extraction model, and the low-level feature matrix of the right original image refers to the output of the p-th convolutional layer, where 0 < p < q.
Optionally, the matching cost matrix acquisition module is specifically configured to:
determine the maximum disparity range using a preset stereo matching algorithm; and
within the maximum disparity range, concatenate the filtered feature matrix of the left original image and the filtered feature matrix of the right original image to obtain the matching cost matrix.
Optionally, the disparity map acquisition module is specifically configured to:
upsample the target matching cost matrix;
perform a regression computation based on the upsampled target matching cost matrix to obtain a regression value of the disparity, where the regression value of the disparity is expressed as

$\hat{d} = \sum_{d=0}^{D_{max}} d \times \sigma(-c_d)$

where $D_{max}$ denotes the maximum disparity value, d denotes a disparity value, σ(·) denotes the softmax function, and $c_d$ denotes the cost value of the disparity value, which is obtained using a preset loss function; and
obtain the disparity map from the regression values of the disparity.
Optionally, the regression computation produces a regression loss value during the training phase. The regression loss value is obtained with a matching cost loss function built with smooth L1, where the matching cost loss function is expressed as

$L = \frac{1}{N}\sum_{i=1}^{N} \mathrm{smooth}_{L_1}\left(d_i - \hat{d}_i\right)$

$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$

where N denotes the total number of pixels, $d_i$ denotes the i-th disparity value, and $\hat{d}_i$ denotes the regression value of the i-th disparity value; when applied in computing the matching cost loss function, x denotes $d_i - \hat{d}_i$.
The network parameters of the training phase are updated according to the regression loss value.
In the embodiments of the present application, a left original image and a right original image having a disparity relationship are first acquired. A pre-trained feature extraction model is then used to extract features from the left and right original images, and after feature extraction an attention mechanism module filters the feature matrices of the left and right original images; this attention mechanism filters out the useless and negative information contained in those feature matrices, helping to improve the accuracy of the disparity map. Next, a matching cost matrix is obtained from the filtered feature matrices of the left and right original images. The matching cost matrix represents the similarity between every pair of pixels in the left and right original images: the more similar two points are, the higher the probability that they are corresponding points between the two images. Using the attention mechanism for feature selection helps obtain a more accurate matching cost matrix, which in turn helps improve the accuracy of the disparity map. Finally, the matching cost matrix is input into a pre-trained convolutional neural network to obtain the target matching cost matrix, and the disparity map is obtained from the target matching cost matrix. In this embodiment, the attention mechanism performs feature selection on the feature matrices of the left and right original images, filtering out the useless and negative information they contain, thereby improving the accuracy of the disparity map.
This embodiment provides a computer-readable storage medium, which may be a non-volatile storage medium or a volatile storage medium. A computer program is stored on the computer-readable storage medium, and when executed by a processor the computer program implements the attention mechanism-based disparity map acquisition method of the embodiments; to avoid repetition, details are not repeated here. Alternatively, when executed by a processor, the computer program implements the functions of the modules/units of the attention mechanism-based disparity map acquisition device of the embodiments; to avoid repetition, details are not repeated here.
Fig. 3 is a schematic diagram of a computer device provided by an embodiment of the present application. As shown in Fig. 3, the computer device 70 of this embodiment includes a processor 71, a memory 72, and a computer program 73 stored in the memory 72 and executable on the processor 71. When executed by the processor 71, the computer program 73 implements the attention mechanism-based disparity map acquisition method of the embodiments. Alternatively, when executed by the processor 71, the computer program 73 implements the functions of the models/units of the attention mechanism-based disparity map acquisition device corresponding one-to-one with the method of the embodiments.
The computer device 70 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The computer device 70 may include, but is not limited to, the processor 71 and the memory 72. Those skilled in the art will understand that Fig. 3 is merely an example of the computer device 70 and does not constitute a limitation on it; the device may include more or fewer components than shown, or combine certain components, or have different components; for example, the computer device may also include input/output devices, network access devices, buses, and the like.
The processor 71 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 72 may be an internal storage unit of the computer device 70, such as a hard disk or memory of the computer device 70. The memory 72 may also be an external storage device of the computer device 70, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card equipped on the computer device 70. Further, the memory 72 may include both an internal storage unit and an external storage device of the computer device 70. The memory 72 is used to store the computer program and other programs and data required by the computer device. The memory 72 may also be used to temporarily store data that has been output or is to be output.
Those skilled in the art can clearly understand that, for convenience and brevity of description, only the above division of functional units and modules is given as an example; in practical applications, the above functions may be allocated to different functional units and modules as needed, that is, the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments, or make equivalent substitutions for some of their technical features; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be included within the protection scope of the present application.

Claims (20)

  1. A method for acquiring a disparity map based on an attention mechanism, comprising:
    acquiring a left original image and a right original image, wherein the left original image and the right original image are an image pair having a disparity relationship;
    using a pre-trained feature extraction model to extract a feature matrix of the left original image from the left original image, and to extract a feature matrix of the right original image from the right original image, wherein the features of the left original image comprise a low-level feature matrix of the left original image and a high-level feature matrix of the left original image, and the features of the right original image comprise a low-level feature matrix of the right original image and a high-level feature matrix of the right original image;
    using a preset attention mechanism module to filter the feature matrix of the left original image and the feature matrix of the right original image, wherein the preset attention mechanism module is used to perform feature selection on the low-level feature matrix of the left original image and the high-level feature matrix of the left original image, and to perform feature selection on the low-level feature matrix of the right original image and the high-level feature matrix of the right original image;
    obtaining a matching cost matrix according to the filtered feature matrix of the left original image and the filtered feature matrix of the right original image;
    inputting the matching cost matrix into a pre-trained convolutional neural network to obtain a target matching cost matrix; and
    obtaining a disparity map according to the target matching cost matrix.
  2. The method according to claim 1, wherein the high-level feature matrix of the left original image refers to the output of the n-th convolutional layer of the feature extraction model, and the low-level feature matrix of the left original image refers to the output of the m-th convolutional layer of the feature extraction model, wherein 0 < m < n; the high-level feature matrix of the right original image refers to the output of the q-th convolutional layer of the feature extraction model, and the low-level feature matrix of the right original image refers to the output of the p-th convolutional layer of the feature extraction model, wherein 0 < p < q.
  3. The method according to claim 1, wherein the attention mechanism module comprises a first branch and a second branch, and the using a preset attention mechanism module to filter the feature matrix of the left original image and the feature matrix of the right original image comprises filtering the feature matrix of the left original image as follows:
    inputting the high-level feature matrix of the left original image into the first branch of the attention mechanism module to obtain a first output feature matrix of the left original image, wherein the first branch comprises a convolutional layer with 1×1 convolution kernels, a batch normalization layer, a non-linear layer, and a transformation layer;
    inputting the low-level feature matrix of the left original image into the second branch of the attention mechanism module to obtain a second output feature matrix of the left original image, wherein the second branch comprises a convolutional layer with 1×1 convolution kernels;
    multiplying the first output feature matrix of the left original image and the second output feature matrix of the left original image at corresponding elements to obtain a combined output feature matrix of the left original image; and
    adding the combined output feature matrix of the left original image and the low-level feature matrix of the left original image to obtain the filtered feature matrix of the left original image.
  4. The method according to claim 1, wherein the obtaining a matching cost matrix according to the filtered feature matrix of the left original image and the filtered feature matrix of the right original image comprises:
    determining a maximum disparity range using a preset stereo matching algorithm; and
    within the maximum disparity range, concatenating the filtered feature matrix of the left original image and the filtered feature matrix of the right original image to obtain the matching cost matrix.
  5. The method according to any one of claims 1-4, wherein the obtaining a disparity map according to the target matching cost matrix comprises:
    upsampling the target matching cost matrix;
    performing a regression computation based on the upsampled target matching cost matrix to obtain a regression value of the disparity, wherein the regression value of the disparity is expressed as
    $\hat{d} = \sum_{d=0}^{D_{max}} d \times \sigma(-c_d)$
    wherein $D_{max}$ denotes the maximum disparity value, d denotes a disparity value, σ(·) denotes the softmax function, and $c_d$ denotes the cost value of the disparity value, the cost value being obtained using a preset loss function; and
    obtaining the disparity map according to the regression values of the disparity.
  6. The method according to claim 5, wherein the regression computation produces a regression loss value during a training phase, and the regression loss value is obtained with a matching cost loss function built with smooth L1, wherein the matching cost loss function is expressed as
    $L = \frac{1}{N}\sum_{i=1}^{N} \mathrm{smooth}_{L_1}\left(d_i - \hat{d}_i\right)$
    $\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$
    wherein N denotes the total number of pixels, $d_i$ denotes the i-th disparity value, and $\hat{d}_i$ denotes the regression value of the i-th disparity value, wherein, when applied in computing the matching cost loss function, x denotes $d_i - \hat{d}_i$; and
    the network parameters of the training phase are updated according to the regression loss value.
  7. A device for acquiring a disparity map based on an attention mechanism, wherein the device comprises:
    an original image acquisition module, configured to acquire a left original image and a right original image, wherein the left original image and the right original image are an image pair having a disparity relationship;
    a feature extraction module, configured to use a pre-trained feature extraction model to extract a feature matrix of the left original image from the left original image, and to extract a feature matrix of the right original image from the right original image, wherein the features of the left original image comprise a low-level feature matrix of the left original image and a high-level feature matrix of the left original image, and the features of the right original image comprise a low-level feature matrix of the right original image and a high-level feature matrix of the right original image;
    a filtering module, configured to use a preset attention mechanism module to filter the feature matrix of the left original image and the feature matrix of the right original image, wherein the preset attention mechanism module is used to perform feature selection on the low-level feature matrix of the left original image and the high-level feature matrix of the left original image, and to perform feature selection on the low-level feature matrix of the right original image and the high-level feature matrix of the right original image;
    a matching cost matrix acquisition module, configured to obtain a matching cost matrix according to the filtered feature matrix of the left original image and the filtered feature matrix of the right original image;
    a target matching cost matrix acquisition module, configured to input the matching cost matrix into a pre-trained convolutional neural network to obtain a target matching cost matrix; and
    a disparity map acquisition module, configured to obtain a disparity map according to the target matching cost matrix.
  8. The device according to claim 7, wherein the attention mechanism module comprises a first branch and a second branch, and the filtering module comprises:
    a first acquisition unit, configured to input the high-level feature matrix of the left original image into the first branch of the attention mechanism module to obtain a first output feature matrix of the left original image, wherein the first branch comprises a convolutional layer with 1×1 convolution kernels, a batch normalization layer, a non-linear layer, and a transformation layer;
    a second acquisition unit, configured to input the low-level feature matrix of the left original image into the second branch of the attention mechanism module to obtain a second output feature matrix of the left original image, wherein the second branch comprises a convolutional layer with 1×1 convolution kernels;
    a third acquisition unit, configured to multiply the first output feature matrix of the left original image and the second output feature matrix of the left original image at corresponding elements to obtain a combined output feature matrix of the left original image; and
    a fourth acquisition unit, configured to add the combined output feature matrix of the left original image and the low-level feature matrix of the left original image to obtain the filtered feature matrix of the left original image.
  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein, when executing the computer program, the processor implements the steps of a method for acquiring a disparity map based on an attention mechanism:
    acquiring a left original image and a right original image, wherein the left original image and the right original image are an image pair having a disparity relationship;
    using a pre-trained feature extraction model to extract a feature matrix of the left original image from the left original image, and to extract a feature matrix of the right original image from the right original image, wherein the features of the left original image comprise a low-level feature matrix of the left original image and a high-level feature matrix of the left original image, and the features of the right original image comprise a low-level feature matrix of the right original image and a high-level feature matrix of the right original image;
    using a preset attention mechanism module to filter the feature matrix of the left original image and the feature matrix of the right original image, wherein the preset attention mechanism module is used to perform feature selection on the low-level feature matrix of the left original image and the high-level feature matrix of the left original image, and to perform feature selection on the low-level feature matrix of the right original image and the high-level feature matrix of the right original image;
    obtaining a matching cost matrix according to the filtered feature matrix of the left original image and the filtered feature matrix of the right original image;
    inputting the matching cost matrix into a pre-trained convolutional neural network to obtain a target matching cost matrix; and
    obtaining a disparity map according to the target matching cost matrix.
  10. The computer device according to claim 9, wherein the high-level feature matrix of the left original image refers to the output of the n-th convolutional layer of the feature extraction model, and the low-level feature matrix of the left original image refers to the output of the m-th convolutional layer of the feature extraction model, wherein 0 < m < n; the high-level feature matrix of the right original image refers to the output of the q-th convolutional layer of the feature extraction model, and the low-level feature matrix of the right original image refers to the output of the p-th convolutional layer of the feature extraction model, wherein 0 < p < q.
  11. The computer device according to claim 9, wherein the attention mechanism module comprises a first branch and a second branch, and the using a preset attention mechanism module to filter the feature matrix of the left original image and the feature matrix of the right original image comprises filtering the feature matrix of the left original image as follows:
    inputting the high-level feature matrix of the left original image into the first branch of the attention mechanism module to obtain a first output feature matrix of the left original image, wherein the first branch comprises a convolutional layer with 1×1 convolution kernels, a batch normalization layer, a non-linear layer, and a transformation layer;
    inputting the low-level feature matrix of the left original image into the second branch of the attention mechanism module to obtain a second output feature matrix of the left original image, wherein the second branch comprises a convolutional layer with 1×1 convolution kernels;
    multiplying the first output feature matrix of the left original image and the second output feature matrix of the left original image at corresponding elements to obtain a combined output feature matrix of the left original image; and
    adding the combined output feature matrix of the left original image and the low-level feature matrix of the left original image to obtain the filtered feature matrix of the left original image.
  12. The computer device according to claim 9, wherein the obtaining a matching cost matrix according to the filtered feature matrix of the left original image and the filtered feature matrix of the right original image comprises:
    determining a maximum disparity range using a preset stereo matching algorithm; and
    within the maximum disparity range, concatenating the filtered feature matrix of the left original image and the filtered feature matrix of the right original image to obtain the matching cost matrix.
  13. The computer device according to any one of claims 9-12, wherein the obtaining a disparity map according to the target matching cost matrix comprises:
    upsampling the target matching cost matrix;
    performing a regression computation based on the upsampled target matching cost matrix to obtain a regression value of the disparity, wherein the regression value of the disparity is expressed as
    $\hat{d} = \sum_{d=0}^{D_{max}} d \times \sigma(-c_d)$
    wherein $D_{max}$ denotes the maximum disparity value, d denotes a disparity value, σ(·) denotes the softmax function, and $c_d$ denotes the cost value of the disparity value, the cost value being obtained using a preset loss function; and
    obtaining the disparity map according to the regression values of the disparity.
  14. The computer device according to claim 13, wherein the regression computation produces a regression loss value during a training phase, and the regression loss value is obtained with a matching cost loss function built with smooth L1, wherein the matching cost loss function is expressed as
    $L = \frac{1}{N}\sum_{i=1}^{N} \mathrm{smooth}_{L_1}\left(d_i - \hat{d}_i\right)$
    $\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$
    wherein N denotes the total number of pixels, $d_i$ denotes the i-th disparity value, and $\hat{d}_i$ denotes the regression value of the i-th disparity value, wherein, when applied in computing the matching cost loss function, x denotes $d_i - \hat{d}_i$; and
    the network parameters of the training phase are updated according to the regression loss value.
  15. A computer-readable storage medium storing a computer program, wherein, when executed by a processor, the computer program implements the steps of a method for acquiring a disparity map based on an attention mechanism:
    acquiring a left original image and a right original image, wherein the left original image and the right original image are an image pair having a disparity relationship;
    using a pre-trained feature extraction model to extract a feature matrix of the left original image from the left original image, and to extract a feature matrix of the right original image from the right original image, wherein the features of the left original image comprise a low-level feature matrix of the left original image and a high-level feature matrix of the left original image, and the features of the right original image comprise a low-level feature matrix of the right original image and a high-level feature matrix of the right original image;
    using a preset attention mechanism module to filter the feature matrix of the left original image and the feature matrix of the right original image, wherein the preset attention mechanism module is used to perform feature selection on the low-level feature matrix of the left original image and the high-level feature matrix of the left original image, and to perform feature selection on the low-level feature matrix of the right original image and the high-level feature matrix of the right original image;
    obtaining a matching cost matrix according to the filtered feature matrix of the left original image and the filtered feature matrix of the right original image;
    inputting the matching cost matrix into a pre-trained convolutional neural network to obtain a target matching cost matrix; and
    obtaining a disparity map according to the target matching cost matrix.
  16. The computer-readable storage medium according to claim 15, wherein the high-level feature matrix of the left original image refers to the output of the n-th convolutional layer of the feature extraction model, and the low-level feature matrix of the left original image refers to the output of the m-th convolutional layer of the feature extraction model, wherein 0 < m < n; the high-level feature matrix of the right original image refers to the output of the q-th convolutional layer of the feature extraction model, and the low-level feature matrix of the right original image refers to the output of the p-th convolutional layer of the feature extraction model, wherein 0 < p < q.
  17. The computer-readable storage medium according to claim 15, wherein the attention mechanism module comprises a first branch and a second branch, and the using a preset attention mechanism module to filter the feature matrix of the left original image and the feature matrix of the right original image comprises filtering the feature matrix of the left original image as follows:
    inputting the high-level feature matrix of the left original image into the first branch of the attention mechanism module to obtain a first output feature matrix of the left original image, wherein the first branch comprises a convolutional layer with 1×1 convolution kernels, a batch normalization layer, a non-linear layer, and a transformation layer;
    inputting the low-level feature matrix of the left original image into the second branch of the attention mechanism module to obtain a second output feature matrix of the left original image, wherein the second branch comprises a convolutional layer with 1×1 convolution kernels;
    multiplying the first output feature matrix of the left original image and the second output feature matrix of the left original image at corresponding elements to obtain a combined output feature matrix of the left original image; and
    adding the combined output feature matrix of the left original image and the low-level feature matrix of the left original image to obtain the filtered feature matrix of the left original image.
  18. The computer-readable storage medium according to claim 15, wherein the obtaining a matching cost matrix according to the filtered feature matrix of the left original image and the filtered feature matrix of the right original image comprises:
    determining a maximum disparity range using a preset stereo matching algorithm; and
    within the maximum disparity range, concatenating the filtered feature matrix of the left original image and the filtered feature matrix of the right original image to obtain the matching cost matrix.
  19. The computer-readable storage medium according to any one of claims 15-18, wherein the obtaining a disparity map according to the target matching cost matrix comprises:
    upsampling the target matching cost matrix;
    performing a regression computation based on the upsampled target matching cost matrix to obtain a regression value of the disparity, wherein the regression value of the disparity is expressed as
    $\hat{d} = \sum_{d=0}^{D_{max}} d \times \sigma(-c_d)$
    wherein $D_{max}$ denotes the maximum disparity value, d denotes a disparity value, σ(·) denotes the softmax function, and $c_d$ denotes the cost value of the disparity value, the cost value being obtained using a preset loss function; and
    obtaining the disparity map according to the regression values of the disparity.
  20. The computer-readable storage medium according to claim 19, wherein the regression computation produces a regression loss value during a training phase, and the regression loss value is obtained with a matching cost loss function built with smooth L1, wherein the matching cost loss function is expressed as
    $L = \frac{1}{N}\sum_{i=1}^{N} \mathrm{smooth}_{L_1}\left(d_i - \hat{d}_i\right)$
    $\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$
    wherein N denotes the total number of pixels, $d_i$ denotes the i-th disparity value, and $\hat{d}_i$ denotes the regression value of the i-th disparity value, wherein, when applied in computing the matching cost loss function, x denotes $d_i - \hat{d}_i$; and
    the network parameters of the training phase are updated according to the regression loss value.
PCT/CN2020/119379 2020-02-18 2020-09-30 Attention mechanism-based disparity map acquisition method and device WO2021164269A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010097878.7A CN111340077B (zh) 2020-02-18 2020-02-18 Attention mechanism-based disparity map acquisition method and device
CN202010097878.7 2020-02-18

Publications (1)

Publication Number Publication Date
WO2021164269A1 true WO2021164269A1 (zh) 2021-08-26

Family

ID=71183509

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/119379 WO2021164269A1 (zh) 2020-02-18 2020-09-30 Attention mechanism-based disparity map acquisition method and device

Country Status (2)

Country Link
CN (1) CN111340077B (zh)
WO (1) WO2021164269A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340077B (zh) * 2020-02-18 2024-04-12 平安科技(深圳)有限公司 Attention mechanism-based disparity map acquisition method and device
CN111985551B (zh) * 2020-08-14 2023-10-27 湖南理工学院 Stereo matching algorithm based on multiple attention networks
CN112581517B (zh) * 2020-12-16 2022-02-18 电子科技大学中山学院 Binocular stereo matching device and method
CN113470099B (zh) * 2021-07-09 2022-03-25 北京的卢深视科技有限公司 Depth imaging method, electronic device, and storage medium
WO2023231173A1 (zh) * 2022-06-01 2023-12-07 五邑大学 Binocular stereo matching method and device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070489A (zh) * 2019-04-30 2019-07-30 中国人民解放军国防科技大学 Binocular image super-resolution method based on a parallax attention mechanism
CN110084742A (zh) * 2019-05-08 2019-08-02 北京奇艺世纪科技有限公司 Disparity map prediction method and device, and electronic device
US20190253625A1 (en) * 2017-01-04 2019-08-15 Texas Instruments Incorporated Rear-stitched view panorama for rear-view visualization
CN110188685A (zh) * 2019-05-30 2019-08-30 燕山大学 Object counting method and system based on a dual-attention multi-scale cascade network
CN111340077A (zh) * 2020-02-18 2020-06-26 平安科技(深圳)有限公司 Attention mechanism-based disparity map acquisition method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750731B (zh) * 2012-07-05 2016-03-23 北京大学 Stereo visual saliency computation method based on left and right monocular receptive fields and binocular fusion
CN106197417B (zh) * 2016-06-22 2017-11-10 平安科技(深圳)有限公司 Indoor navigation method for a handheld terminal, and handheld terminal
CN109086653B (zh) * 2018-06-04 2023-04-18 平安科技(深圳)有限公司 Handwriting model training method, handwritten character recognition method, device, equipment, and medium
KR102013649B1 (ko) * 2018-12-20 2019-08-23 아주대학교산학협력단 Image processing method for stereo matching and program using the same

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190253625A1 (en) * 2017-01-04 2019-08-15 Texas Instruments Incorporated Rear-stitched view panorama for rear-view visualization
CN110070489A (zh) * 2019-04-30 2019-07-30 中国人民解放军国防科技大学 Binocular image super-resolution method based on a parallax attention mechanism
CN110084742A (zh) * 2019-05-08 2019-08-02 北京奇艺世纪科技有限公司 Disparity map prediction method and device, and electronic device
CN110188685A (zh) * 2019-05-30 2019-08-30 燕山大学 Object counting method and system based on a dual-attention multi-scale cascade network
CN111340077A (zh) * 2020-02-18 2020-06-26 平安科技(深圳)有限公司 Attention mechanism-based disparity map acquisition method and device

Also Published As

Publication number Publication date
CN111340077B (zh) 2024-04-12
CN111340077A (zh) 2020-06-26

Similar Documents

Publication Publication Date Title
WO2021164269A1 (zh) Attention mechanism-based disparity map acquisition method and device
WO2021057848A1 (zh) Network training method, image processing method, network, terminal device, and medium
CN109766925B (zh) Feature fusion method and apparatus, electronic device, and storage medium
WO2021018163A1 (zh) Neural network search method and apparatus
CN112001914A (zh) Depth image completion method and apparatus
WO2020228522A1 (zh) Target tracking method and apparatus, storage medium, and electronic device
CN111860398B (zh) Remote sensing image target detection method and system, and terminal device
WO2019238029A1 (zh) Convolutional neural network system and method for quantizing a convolutional neural network
EP4163831A1 (en) Neural network distillation method and device
WO2021189733A1 (zh) Image processing method and apparatus, electronic device, and storage medium
WO2023206944A1 (zh) Semantic segmentation method and apparatus, computer device, and storage medium
WO2023151511A1 (zh) Model training method, image demoiréing method and apparatus, and electronic device
WO2023124040A1 (zh) Face recognition method and apparatus
CN113379627A (zh) Training method for an image enhancement model and method for enhancing an image
WO2019128248A1 (zh) Signal processing method and apparatus
CN113963072B (zh) Binocular camera calibration method and apparatus, computer device, and storage medium
CN113033448B (zh) Residual neural network system, method, device, and storage medium for cloud removal from remote sensing images based on multi-scale convolution and attention
CN113298931B (zh) Object model reconstruction method and apparatus, terminal device, and storage medium
CN112528978B (zh) Face keypoint detection method and apparatus, electronic device, and storage medium
WO2021109863A1 (zh) Photo processing method and photo processing apparatus
WO2024021504A1 (zh) Face recognition model training method, recognition method, apparatus, device, and medium
TWI803243B (zh) Image augmentation method, computer device, and storage medium
WO2023060575A1 (zh) Image recognition method and apparatus, electronic device, and storage medium
CN113139490B (zh) Image feature matching method and apparatus, computer device, and storage medium
CN113361602B (zh) Neural network model training method and apparatus, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20920447

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20920447

Country of ref document: EP

Kind code of ref document: A1