WO2020232666A1 - Image processing method, terminal, system, and computer-readable storage medium - Google Patents

Image processing method, terminal, system, and computer-readable storage medium

Info

Publication number
WO2020232666A1
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
feature
target
vector
feature maps
Prior art date
Application number
PCT/CN2019/087935
Other languages
English (en)
French (fr)
Inventor
李力达
曹子晟
胡攀
Original Assignee
SZ DJI Technology Co., Ltd. (深圳市大疆创新科技有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SZ DJI Technology Co., Ltd. (深圳市大疆创新科技有限公司)
Priority to PCT/CN2019/087935 priority Critical patent/WO2020232666A1/zh
Priority to CN201980007770.XA priority patent/CN111656359A/zh
Publication of WO2020232666A1 publication Critical patent/WO2020232666A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • the present invention relates to the field of data processing technology, and in particular to an image processing method, terminal, system and computer-readable storage medium.
  • the above method is only applicable when the set of feature maps T can be linearly represented by a small number of its feature maps; it does not apply when T cannot be so represented.
  • the above method also introduces additional parameters, namely the C LOW 1×1 convolutions, the batch normalization layer, and the rectified linear unit, which occupy additional storage space; in addition, it adds extra convolution operations, which parallelize poorly on most devices, lengthening the computation time of the pooling process and lowering pooling efficiency.
  • the embodiment of the present invention discloses an image processing method, terminal, system, and computer-readable storage medium, which can pool feature maps based on grouping, thereby effectively improving pooling efficiency.
  • the first aspect of the embodiments of the present invention discloses an image processing method, the method including:
  • Acquire multiple feature maps and divide the multiple feature maps into at least two groups, where each of the at least two groups includes at least one feature map and no two of the at least two groups include the same feature maps; separately calculate the feature vector corresponding to each group of feature maps; and pool the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps.
  • the second aspect of the embodiments of the present invention discloses a terminal, which includes a memory and a processor,
  • the memory is used to store program instructions
  • the processor is configured to execute program instructions stored in the memory, and when the program instructions are executed, the processor is configured to:
  • Acquire multiple feature maps and divide the multiple feature maps into at least two groups, where each of the at least two groups includes at least one feature map and no two of the at least two groups include the same feature maps; separately calculate the feature vector corresponding to each group of feature maps; and pool the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps.
  • the third aspect of the embodiments of the present invention discloses a system including a drone and a mobile terminal, where the drone is equipped with a photographing device and a stabilizer for the photographing device, the photographing device being mounted on the stabilizer; while flying along a route, the drone controls the photographing device to take pictures, obtaining multiple images, and sends the multiple images to the mobile terminal;
  • after receiving the multiple images sent by the drone, the mobile terminal processes the multiple images to obtain multiple feature maps, divides the feature maps into at least two groups, separately calculates the feature vector corresponding to each group of feature maps, and pools the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps; each of the at least two groups includes at least one feature map, and no two of the at least two groups include the same feature maps.
  • the fourth aspect of the embodiments of the present invention discloses an unmanned aerial vehicle, which is used to perform the steps of the method described in the first aspect.
  • a fifth aspect of the embodiments of the present invention discloses a photographing device, characterized in that the photographing device is used to execute the steps of the method described in the first aspect.
  • a sixth aspect of the embodiments of the present invention discloses a vehicle, which is characterized in that the vehicle is used to execute the steps of the method described in the first aspect.
  • a seventh aspect of the embodiments of the present invention discloses a mobile terminal, which is characterized in that the mobile terminal is used to execute the steps of the method described in the first aspect.
  • An eighth aspect of the embodiments of the present invention discloses a stabilizer with a photographing device, wherein the stabilizer is used to perform the steps of the method described in the first aspect.
  • a ninth aspect of the embodiments of the present invention discloses a computer-readable storage medium in which a computer program is stored; when the computer program is executed by a processor, the steps of the method described in the first aspect are implemented.
  • in the embodiments of the present invention, multiple feature maps output by a neural network are obtained and divided into at least two groups; the feature vector corresponding to each group of feature maps is then calculated separately, and the multiple calculated feature vectors are pooled to obtain the feature vector corresponding to the multiple feature maps, so that the feature maps can be pooled on the basis of grouping.
  • Grouping keeps the number of feature maps in each group small, effectively reducing the amount of calculation for each group, and computing the groups in parallel can effectively improve pooling efficiency.
  • FIG. 1 is a schematic diagram of a pooling process disclosed in an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of an image processing method disclosed in an embodiment of the present invention.
  • FIG. 3 is a geometric schematic diagram of a three-dimensional tensor disclosed in an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a terminal disclosed in an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a system disclosed in an embodiment of the present invention.
  • FIG. 1 is a schematic diagram of a pooling process according to an embodiment of the present invention.
  • the terminal first obtains the multiple feature maps T ∈ R^(H×W×C) output by the neural network, where H represents the number of rows of each feature map, W represents the number of columns, and C represents the number of feature maps; typically H×W is far smaller than C.
  • the multiple feature maps output by the neural network are divided into K groups C_1, C_2, …, C_K, where K is a positive integer greater than 1, each of the K groups includes at least one feature map, and no two of the K groups include the same feature maps.
  • the terminal separately calculates the feature vector corresponding to each group of feature maps, that is, feature vector 1 for the feature maps of group C_1, feature vector 2 for group C_2, and so on up to feature vector K for group C_K, K feature vectors in total. The K calculated feature vectors are then pooled to obtain the target feature vector corresponding to the multiple feature maps output by the neural network.
  • in the embodiments of the present invention, multiple feature maps output by a neural network are obtained and divided into at least two groups; the feature vector corresponding to each group is calculated separately, and the calculated feature vectors are pooled to obtain the feature vector corresponding to the multiple feature maps, so that the feature maps can be pooled group by group. Grouped pooling keeps the number of feature maps in each group far smaller than the total number of feature maps, which greatly reduces the amount of calculation for each group, effectively reduces the time required for pooling, and improves pooling efficiency, as described in detail below.
  • FIG. 2 is a schematic flowchart of an image processing method according to an embodiment of the present invention.
  • the image processing method described in the embodiment of the present invention may include:
  • S201 The terminal obtains multiple feature maps, and divides the multiple feature maps into at least two groups.
  • the terminal may first use its configured neural network to process multiple input images, and then obtain multiple feature maps output by the neural network after processing the multiple input images.
  • the terminal may also obtain multiple feature maps from the network or other terminals, which are output by the neural network after processing multiple images.
  • the terminal groups the acquired multiple feature maps to obtain at least two groups.
  • Each of the at least two groups includes at least one feature map, and any two of the at least two groups include different feature maps.
  • each feature map corresponds to a channel of the neural network; the feature maps included in any two of the at least two groups are not the same, that is, the feature maps included in any two groups have different channels.
  • the groups may all contain the same number of feature maps, or the numbers of feature maps in different groups may differ partly or entirely.
  • the terminal may group the multiple feature maps according to the channels corresponding to the feature maps to obtain the at least two groups. Specifically, the terminal may group the feature maps according to the sorted order of their corresponding channels, or randomly group the feature maps over their corresponding channels, to obtain the at least two groups.
  • before dividing the multiple feature maps into at least two groups, the terminal determines the number of groups K from the number of feature maps, where K is a positive integer greater than 1, and then divides the feature maps into K groups according to the determined K.
  • the terminal may determine the number of groups K from a pre-stored mapping between the number of feature maps and the number of groups.
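  • As a concrete illustration of the grouping step above, the following Python sketch splits a stack of C feature maps into K groups along the channel axis. It is a minimal sketch: the numpy arrays, the shapes, and the GROUP_COUNT_BY_CHANNELS lookup table (standing in for the pre-stored mapping the text mentions) are illustrative assumptions, not part of the disclosure.

```python
import numpy as np

def split_into_groups(feature_maps, num_groups):
    """Split a stack of C feature maps (shape C x H x W) into num_groups
    disjoint groups along the channel axis, following channel order."""
    assert num_groups > 1
    # np.array_split tolerates a C that is not divisible by num_groups,
    # so the groups need not all contain the same number of feature maps.
    return np.array_split(feature_maps, num_groups, axis=0)

# Hypothetical pre-stored mapping from feature-map count to group count K.
GROUP_COUNT_BY_CHANNELS = {512: 4, 1024: 8, 2048: 16}

feature_maps = np.random.rand(2048, 7, 7)    # C=2048 maps with H=7, W=7
K = GROUP_COUNT_BY_CHANNELS[feature_maps.shape[0]]
groups = split_into_groups(feature_maps, K)
print(len(groups), groups[0].shape)          # 16 (128, 7, 7)
```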
  • S202 The terminal separately calculates the feature vector corresponding to each group of feature maps.
  • the terminal first calculates the α-th power of the covariance matrix corresponding to the feature maps of the target group to obtain the target matrix, and then determines the feature vector corresponding to the feature maps of the target group according to the target matrix.
  • the target group is any one of the above at least two groups, and α is a positive rational number.
  • α may specifically be 0.5.
  • the terminal calculates the α-th power of the covariance matrix corresponding to the feature maps of the target group as follows: it first obtains the three-dimensional tensor corresponding to the feature maps of the target group and merges the elements of the first dimension with the elements of the second dimension to obtain the two-dimensional tensor corresponding to the feature maps of the target group.
  • the first dimension is the dimension corresponding to the row direction of the three-dimensional tensor, and the second dimension is the dimension corresponding to the column direction; the three-dimensional tensor also includes elements of a third dimension, corresponding to the depth direction.
  • the covariance matrix of the two-dimensional tensor is then calculated and raised to the α-th power to obtain the target matrix.
  • by analogy, the α-th power of the covariance matrix corresponding to each group of feature maps can be calculated.
  • the covariance matrix can be used to indicate the correlation between the feature maps of the target group; since correlation is mutual, the elements at the upper-triangular and lower-triangular positions of the target matrix determined above are usually symmetric.
  • using the covariance matrix to compute the features of the target group's feature maps makes better use of the correlation information between feature maps, which can make subsequent image classification tasks more accurate.
  • the terminal determines the feature vector corresponding to the feature maps of the target group from the target matrix as follows: it obtains the target elements of the target matrix, the target elements being the elements located at the upper-triangular or lower-triangular positions of the target matrix, and then arranges the target elements according to their positions in the target matrix to generate the feature vector corresponding to the feature maps of the target group.
  • the terminal may first arrange the target elements by the row they occupy in the target matrix and then, for target elements in the same row, arrange them by the column they occupy; alternatively, it may first arrange the target elements by column and then, for target elements in the same column, arrange them by row.
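  • A minimal sketch of this arrangement step, assuming the target matrix is a numpy array; np.triu_indices yields exactly the row-by-row (and, within a row, column-by-column) ordering described above.

```python
import numpy as np

def upper_triangular_vector(V):
    """Arrange the elements on and above the diagonal of the target matrix
    V (C_i x C_i) into a vector, row by row and, within a row, column by
    column; the result has C_i * (C_i + 1) / 2 elements."""
    rows, cols = np.triu_indices(V.shape[0])   # row-major order
    return V[rows, cols]

V = np.array([[1.0, 2.0],
              [2.0, 3.0]])                     # symmetric target matrix
print(upper_triangular_vector(V))              # [1. 2. 3.]
```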
  • S203 The terminal pools the multiple feature vectors obtained by calculation to obtain feature vectors corresponding to the multiple feature maps.
  • the terminal obtains the first vector and the second vector corresponding to a target feature vector, the target feature vector being any one of the multiple calculated feature vectors; it then determines the first matrix from the first vectors and the second matrix from the second vectors; finally, it determines the feature vector corresponding to the multiple feature maps output by the neural network from the first matrix and the second matrix.
  • the first vector includes first elements corresponding to the elements of the target feature vector, and the elements of the first vector other than the first elements take the value 0; the second elements of the second vector corresponding to the first elements take the value 1, and the elements of the second vector other than the second elements take the value 0.
  • the first vectors of all target feature vectors have the same number of elements, and the first vector of a target feature vector has the same number of elements as the corresponding second vector.
  • the first matrix is determined from the multiple first vectors corresponding to the calculated feature vectors, one column of the first matrix corresponding to one first vector; the second matrix is determined from the corresponding second vectors, one column of the second matrix corresponding to one second vector.
  • zero elements are appended to the first and second vectors corresponding to each target feature vector to ensure that all first vectors have the same number of elements and that each second vector has the same number of elements as its first vector, which is convenient for subsequent calculation.
  • if the number of feature maps in each of the at least two groups is the same, the feature vectors of all groups have the same number of elements.
  • in that case, the first vector may include only the first elements corresponding to the elements of the target feature vector, and the second vector only the second elements corresponding to the first elements, each second element taking the value 1.
  • alternatively, the first vector includes only the first elements corresponding to the elements of the target feature vector, no second vector or second matrix is determined, the first matrix is determined directly from the first vectors corresponding to the groups' feature maps, and the feature vector corresponding to the multiple feature maps output by the neural network is determined directly from the first matrix.
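  • The zero-padding of the first and second vectors can be sketched as follows; the function name and the use of numpy are assumptions for illustration.

```python
import numpy as np

def pad_vectors(feature_vectors):
    """Build the first vectors p_i (values, zero-padded) and the second
    vectors m_i (1 where p_i holds a real value, 0 in the padding), so
    that every vector has the common length l = max(l_i)."""
    l = max(len(v) for v in feature_vectors)
    P_cols, M_cols = [], []
    for v in feature_vectors:
        pad = l - len(v)
        P_cols.append(np.concatenate([v, np.zeros(pad)]))
        M_cols.append(np.concatenate([np.ones(len(v)), np.zeros(pad)]))
    # One column per group: the first matrix P and the second matrix M
    # are both l x K.
    return np.stack(P_cols, axis=1), np.stack(M_cols, axis=1)
```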
  • the terminal determines the feature vectors corresponding to the multiple feature maps output by the neural network from the first matrix and the second matrix in one of the following ways: it computes the average of the third elements in each row of the first matrix and generates the feature vector from the computed row averages; or it computes the sum of the third elements in each row and generates the feature vector from the row sums; or it computes the product of the third elements in each row and generates the feature vector from the row products; or it obtains the maximum of the third elements in each row and generates the feature vector from the row maxima; or it obtains the minimum of the third elements in each row and generates the feature vector from the row minima. A third element is an element of the first matrix whose counterpart in the second matrix is non-zero.
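  • All five variants can be written as masked row-wise reductions over the first matrix P, with the second matrix M acting as the mask; a hedged sketch in numpy:

```python
import numpy as np

def masked_row_pool(P, M, mode="mean"):
    """For each row j, reduce the 'third elements' (entries of P whose
    counterparts in M are non-zero) with the chosen reduction."""
    mask = M != 0
    if mode == "mean":
        return (P * mask).sum(axis=1) / mask.sum(axis=1)
    if mode == "sum":
        return (P * mask).sum(axis=1)
    if mode == "prod":
        return np.where(mask, P, 1.0).prod(axis=1)
    if mode == "max":
        return np.where(mask, P, -np.inf).max(axis=1)
    if mode == "min":
        return np.where(mask, P, np.inf).min(axis=1)
    raise ValueError(mode)
```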
  • the terminal first obtains the multiple feature maps T ∈ R^(H×W×C) output by the neural network, where H represents the number of rows of a feature map (the number of pixels in one of its columns), W represents the number of columns (the number of pixels in one of its rows), and C is the number of feature maps; typically H×W is far smaller than C.
  • the terminal then divides the multiple feature maps into K groups according to the channels corresponding to the feature maps; the i-th group contains C_i feature maps, i ranges over [1, K], and C = Σ_i C_i. The group sizes C_1, …, C_K may all be equal (when K divides C) or may differ.
  • the terminal separately calculates the feature vector of each group of feature maps. For the i-th group, it first obtains the corresponding three-dimensional tensor and merges the elements of the first and second dimensions into the two-dimensional tensor U_i, a d×C_i matrix with d = H×W.
  • the first dimension is the dimension corresponding to the row direction of the three-dimensional tensor, and the second dimension is the dimension corresponding to the column direction.
  • for ease of explanation, suppose the i-th group includes two feature maps
  • whose matrix forms are

    a1,1 a1,2 a1,3        b1,1 b1,2 b1,3
    a2,1 a2,2 a2,3  and   b2,1 b2,2 b2,3
    a3,1 a3,2 a3,3        b3,1 b3,2 b3,3

  • the feature maps have 3 rows and 3 columns; the element a1,1 represents the pixel value of the pixel in the first row and first column of the feature map, and the other elements of the matrices follow by analogy.
  • stacking the two matrices along the depth direction gives the three-dimensional tensor corresponding to the two feature maps,
  • a 3×3×2 matrix, that is, a three-dimensional matrix with 3 rows, 3 columns, and depth 2.
  • FIG. 3 shows the geometric representation of the three-dimensional tensor corresponding to the two feature maps.
  • 301 and 302 respectively represent a feature map
  • the two feature maps represented by 301 and 302 form a three-dimensional tensor.
  • the three-dimensional tensor includes three dimensions: the first corresponds to its row direction, the second to its column direction, and the third to its depth direction.
  • in a three-dimensional coordinate system, the row direction corresponds to the longitudinal axis, the column direction to the horizontal axis, and the depth direction to the vertical axis.
  • merging the elements of the first and second dimensions of the three-dimensional tensor yields the two-dimensional tensor

    a1,1 a1,2 a1,3 a2,1 a2,2 a2,3 a3,1 a3,2 a3,3
    b1,1 b1,2 b1,3 b2,1 b2,2 b2,3 b3,1 b3,2 b3,3

    a 2×9 matrix (2 rows, 9 columns), or equivalently its transpose, a 9×2 matrix (9 rows, 2 columns).
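  • The merge of the first and second dimensions can be reproduced with a single reshape; the concrete values below merely stand in for the a- and b-entries of the example.

```python
import numpy as np

# Two 3x3 feature maps stacked along the depth axis: a 3x3x2 tensor.
A = np.arange(1, 10).reshape(3, 3)     # stands in for the a entries
B = np.arange(11, 20).reshape(3, 3)    # stands in for the b entries
tensor = np.stack([A, B], axis=-1)     # shape (3, 3, 2): rows, cols, depth

# Merging the row and column dimensions flattens each map into one axis
# of length H*W = 9, giving the 9x2 matrix (or its 2x9 transpose).
U = tensor.reshape(-1, tensor.shape[-1])
print(U.shape, U.T.shape)              # (9, 2) (2, 9)
```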
  • after computing the two-dimensional tensor U_i, the terminal calculates the covariance matrix of U_i as in Equation 1:

    Σ_i = U_iᵀ · Ī · U_i,  with Ī = (1/d)(I − (1/d) · 1)    (Equation 1)

  • where Σ_i is the covariance matrix of the two-dimensional tensor U_i and is a C_i×C_i matrix; C_i is the number of feature maps in the i-th group; U_iᵀ is the transpose of U_i; I is the identity matrix and 1 is the square matrix whose entries are all 1, both of size d×d with d = H×W, H and W being the numbers of rows and columns of the feature maps in the i-th group.
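  • A direct numpy transcription of Equation 1, under the reconstructed reading given above:

```python
import numpy as np

def group_covariance(group):
    """Covariance matrix of one group of feature maps per Equation 1.
    group: array of shape (C_i, H, W); returns a C_i x C_i matrix."""
    C_i, H, W = group.shape
    d = H * W
    U = group.reshape(C_i, d).T                  # U_i is d x C_i
    # I_bar = (1/d) * (I - (1/d) * 1), with 1 the all-ones d x d matrix.
    I_bar = (np.eye(d) - np.ones((d, d)) / d) / d
    return U.T @ I_bar @ U                       # Sigma_i, C_i x C_i
```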
  • after computing the covariance matrix Σ_i of the two-dimensional tensor U_i, the terminal calculates the α-th power of Σ_i as in Equation 2:

    V_i = Σ_i^α    (Equation 2)

  • where V_i is the α-th power of the covariance matrix Σ_i, V_i is a C_i×C_i matrix, and α is a positive rational number.
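  • For a symmetric positive semi-definite Σ_i, Equation 2 can be evaluated by eigendecomposition; in the sketch below, clipping tiny negative eigenvalues is a numerical safeguard added here, not part of the disclosure.

```python
import numpy as np

def matrix_power_sym(sigma, alpha=0.5):
    """alpha-th power of a symmetric PSD covariance matrix (Equation 2:
    V_i = Sigma_i ** alpha) via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(sigma)
    eigvals = np.clip(eigvals, 0.0, None)   # guard against round-off
    return (eigvecs * eigvals**alpha) @ eigvecs.T
```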
  • continuing the example, suppose the target matrix V_i of the i-th group has the matrix form

    C1,1 C1,2
    C2,1 C2,2

    with C1,2 = C2,1 by symmetry. The target elements at the upper-triangular positions of V_i are C1,1, C1,2, and C2,2, and the generated feature vector v_i of the i-th group is [C1,1, C1,2, C2,2].
  • the target elements at the lower-triangular positions of V_i are C1,1, C2,1, and C2,2, in which case the generated feature vector v_i is [C1,1, C2,1, C2,2]. In either case v_i has l_i = C_i × (C_i + 1) / 2 = 3 elements.
  • after determining the first vector p_i and the second vector m_i corresponding to each group's feature vector, the terminal determines the first matrix P from the first vectors p_i and the second matrix M from the second vectors m_i.
  • the first matrix P = [p_1, p_2, …, p_K] and the second matrix M = [m_1, m_2, …, m_K] are both l×K matrices; one column of P corresponds to one first vector, and one column of M corresponds to one second vector. The terminal then determines the feature vector v of the multiple feature maps output by the neural network from the first and second matrices; for the averaging variant, v[j] = (Σ_i P_j,i × (M_j,i ≠ 0)) / (Σ_i (M_j,i ≠ 0)),
  • where P_j,i and M_j,i denote the elements in row j and column i of P and M respectively; j ranges over [1, l] and i over [1, K].
  • suppose the feature vector v_1 of the first group of feature maps is [8, 10, 6],
  • the feature vector v_2 of the second group is [12, 0, 7, 10],
  • and the feature vector v_3 of the third group is [4, 11, 5, 13, 5].
  • the maximum number of elements among the three feature vectors is 5, so the first vector p_1 corresponding to v_1 is [8, 10, 6, 0, 0] and the corresponding second vector m_1 is [1, 1, 1, 0, 0]; the first vector p_2 corresponding to v_2 is [12, 0, 7, 10, 0] and the corresponding second vector m_2 is [1, 1, 1, 1, 0]; the first vector p_3 corresponding to v_3 is [4, 11, 5, 13, 5] and the corresponding second vector m_3 is [1, 1, 1, 1, 1].
  • the first matrix P determined from the first vectors p_1, p_2, p_3 and the second matrix M determined from the second vectors m_1, m_2, m_3 are

    P = | 8 12  4 |      M = | 1 1 1 |
        |10  0 11 |          | 1 1 1 |
        | 6  7  5 |          | 1 1 1 |
        | 0 10 13 |          | 0 1 1 |
        | 0  0  5 |          | 0 0 1 |

  • the average of the elements in the first row of P whose counterparts in M are non-zero is (8+12+4)/3 = 8; for the second row it is (10+0+11)/3 = 7; for the third row, (6+7+5)/3 = 6; for the fourth row, (10+13)/2 = 11.5; and for the fifth row, 5/1 = 5. The feature vector v of the multiple feature maps T output by the neural network is therefore [8, 7, 6, 11.5, 5].
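  • The numeric example can be checked mechanically; the snippet below reproduces v = [8, 7, 6, 11.5, 5] from the matrices P and M given above.

```python
import numpy as np

P = np.array([[ 8, 12,  4],
              [10,  0, 11],
              [ 6,  7,  5],
              [ 0, 10, 13],
              [ 0,  0,  5]], dtype=float)
M = np.array([[1, 1, 1],
              [1, 1, 1],
              [1, 1, 1],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)

mask = M != 0
v = (P * mask).sum(axis=1) / mask.sum(axis=1)   # row-wise masked average
print(v)                                        # [ 8.   7.   6.  11.5  5. ]
```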
  • f may also compute, row by row, the sum, maximum, minimum, or product of all elements of P whose counterparts in M are non-zero.
  • the specific calculations follow the description above and are not repeated here.
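  • Putting the steps together, the following end-to-end sketch is one possible reading of the grouped covariance pooling pipeline, assembled from the sketches above; it is illustrative, not the patent's reference implementation.

```python
import numpy as np

def grouped_covariance_pooling(feature_maps, num_groups, alpha=0.5):
    """Group the C feature maps, raise each group's covariance to alpha,
    read out the upper-triangular elements, zero-pad, and average the
    groups row by row."""
    groups = np.array_split(feature_maps, num_groups, axis=0)
    vectors = []
    for g in groups:
        C_i, H, W = g.shape
        d = H * W
        U = g.reshape(C_i, d).T                        # d x C_i
        I_bar = (np.eye(d) - np.ones((d, d)) / d) / d
        sigma = U.T @ I_bar @ U                        # Equation 1
        w, Q = np.linalg.eigh(sigma)
        V = (Q * np.clip(w, 0, None)**alpha) @ Q.T     # Equation 2
        r, c = np.triu_indices(C_i)
        vectors.append(V[r, c])                        # v_i, length l_i
    l = max(len(v) for v in vectors)
    P = np.stack([np.pad(v, (0, l - len(v))) for v in vectors], axis=1)
    M = np.stack([np.pad(np.ones(len(v)), (0, l - len(v)))
                  for v in vectors], axis=1)
    mask = M != 0
    return (P * mask).sum(axis=1) / mask.sum(axis=1)

out = grouped_covariance_pooling(np.random.rand(32, 7, 7), num_groups=4)
print(out.shape)    # (36,): each group has C_i = 8, so l = 8 * 9 / 2 = 36
```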
  • with the image processing method provided by the embodiments of the present invention, the multiple feature maps output by the neural network can first be grouped, and the feature vector of each group can then be computed in parallel; grouping makes the number of feature maps in each group far smaller than the total number of feature maps, which greatly reduces the amount of calculation for each group, effectively reduces the time and computing resources required for pooling, and improves pooling efficiency.
  • the use of the covariance matrix to calculate the features of the feature maps of the target group can make better use of the associated information between the feature maps, which can make the subsequent image classification tasks more accurate.
  • since the image processing method of this embodiment introduces no additional parameters, it occupies no additional storage space.
  • since the image processing method of this embodiment introduces no convolution operation, and the grouped computation of each group's covariance-based features involves only matrix multiplication, the entire algorithm flow can be executed with a high degree of parallelism, effectively saving computation time. Because the method computes the features of each group of feature maps separately, it improves the utilization of the information between feature maps and makes better use of the information among the large number of feature maps output at the end of the neural network. Because the method places no restriction on the input set of feature maps, the backbone neural network needs no special structural design,
  • such as two branches whose outputs must have the same size.
  • in addition, the method supports dividing the multiple feature maps output by the neural network into two or more groups, which removes the limitation that fusion requirements would otherwise place on the input size of the backbone network. Because the method exploits higher-order statistics of the multiple feature maps output by the neural network, it can achieve higher accuracy on subsequent image classification tasks than image processing methods that apply only to first-order information.
  • FIG. 4 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
  • the terminal described in the embodiment of the present invention includes: a processor 401, a communication interface 402, and a memory 403.
  • the processor 401, the communication interface 402, and the memory 403 may be connected through a bus or in other ways.
  • the embodiment of the present invention takes the connection through a bus as an example.
  • the processor 401 may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.
  • the processor 401 may also be a multi-core CPU or a core used to implement communication identification binding in a multi-core NP.
  • the processor 401 may be a hardware chip.
  • the hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or a combination thereof.
  • the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a general array logic (generic array logic, GAL) or any combination thereof.
  • the communication interface 402 can be used to exchange information or signaling, and to receive and transmit signals.
  • the memory 403 may mainly include a program storage area and a data storage area.
  • the program storage area may store an operating system and the program required by at least one function (such as a text storage function or a location storage function); the data storage area may store data created through use of the device (such as image data or text data), application programs, and so on.
  • in addition, the memory 403 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device or flash memory device, or another solid-state storage device.
  • the memory 403 is also used to store program instructions.
  • the processor 401 is configured to execute program instructions stored in the memory 403, and when the program instructions are executed, the processor 401 is configured to:
  • Acquire multiple feature maps and divide the multiple feature maps into at least two groups, where each of the at least two groups includes at least one feature map and no two of the at least two groups include the same feature maps; separately calculate the feature vector corresponding to each group of feature maps; and pool the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps.
  • each of the at least two groups includes the same number of feature maps.
  • when the processor 401 separately calculates the feature vector corresponding to each group of feature maps, it is specifically configured to: calculate the α-th power of the covariance matrix corresponding to the feature maps of the target group to obtain the target matrix,
  • where the target group is any one of the at least two groups and α is a positive rational number; and determine the feature vector corresponding to the feature maps of the target group according to the target matrix.
  • when the processor 401 calculates the α-th power of the covariance matrix corresponding to the feature maps of the target group to obtain the target matrix, it is specifically configured to: obtain the three-dimensional tensor corresponding to the feature maps of the target group; merge the elements of the first dimension and the elements of the second dimension of the three-dimensional tensor to obtain a two-dimensional tensor, where the first dimension is the dimension corresponding to the row direction of the three-dimensional tensor and the second dimension is the dimension corresponding to its column direction; and calculate the covariance matrix corresponding to the two-dimensional tensor and calculate its α-th power to obtain the target matrix.
  • α is 0.5.
  • when the processor 401 determines the feature vector corresponding to the feature maps of the target group according to the target matrix, it is specifically configured to: obtain the target elements of the target matrix, the target elements being the elements located at the upper-triangular or lower-triangular positions of the target matrix; and arrange the target elements according to their positions in the target matrix to generate the feature vector corresponding to the feature maps of the target group.
  • when the processor 401 pools the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps, it is specifically configured to: obtain the first vector and the second vector corresponding to a target feature vector,
  • the target feature vector being any one of the multiple calculated feature vectors; determine the first matrix according to the first vectors and the second matrix according to the second vectors; and determine the feature vector corresponding to the multiple feature maps according to the first matrix and the second matrix.
  • the first vector includes a first element corresponding to an element in the target feature vector, and elements in the first vector other than the first element take a value of 0;
  • the second element in the second vector corresponding to the first element takes the value 1, and the element other than the second element in the second vector takes the value 0.
  • a column of elements in the first matrix corresponds to a first vector
  • a column of elements in the second matrix corresponds to a second vector
  • when the processor 401 determines the feature vectors corresponding to the multiple feature maps according to the first matrix and the second matrix, it is specifically configured to: calculate the average of the third elements in each row of the first matrix and generate the feature vector corresponding to the multiple feature maps from the computed row averages, where a third element is an element of the first matrix corresponding to a non-zero element of the second matrix.
  • alternatively, the processor 401 is specifically configured to: calculate the sum of the third elements in each row of the first matrix and generate the feature vector corresponding to the multiple feature maps from the computed row sums.
  • alternatively, the processor 401 is specifically configured to: calculate the product of the third elements in each row of the first matrix and generate the feature vector corresponding to the multiple feature maps from the computed row products.
  • alternatively, the processor 401 is specifically configured to: obtain the maximum or minimum of the third elements in each row of the first matrix and generate the feature vector corresponding to the multiple feature maps from the obtained row maxima or minima.
  • the processor 401, the communication interface 402, and the memory 403 described in the embodiment of the present invention can execute the implementation manner described in the image processing method provided in the embodiment of the present invention, and details are not described herein again.
  • multiple feature maps output by a neural network are obtained, and the multiple feature maps are divided into at least two groups, and then the feature vectors corresponding to each group of feature maps are calculated respectively, and the calculated multiple feature vectors Pooling is performed to obtain the feature vectors corresponding to the multiple feature maps, so that the feature maps can be pooled based on grouping, which effectively improves the pooling efficiency.
  • FIG. 5 is a schematic structural diagram of a system provided by an embodiment of the present invention.
  • the system includes a drone 500 and a mobile terminal 600, a communication connection is established between the drone 500 and the mobile terminal 600, and the mobile terminal 600 corresponds to the aforementioned terminal.
  • the drone 500 is equipped with a camera 502 and a stabilizer 501 for the camera, the camera 502 being mounted on the stabilizer 501, wherein:
  • the drone 500 controls the camera 502 to take pictures to obtain multiple images, and sends the multiple images to the mobile terminal 600.
  • the mobile terminal 600 may control the drone 500 to fly along the route, and during the flight of the drone 500 along the route, control the camera 502 to take pictures to obtain multiple images.
  • after receiving the multiple images sent by the drone, the mobile terminal 600 first processes the multiple images to obtain multiple feature maps and divides the feature maps into at least two groups; it then separately calculates the feature vector corresponding to each group of feature maps and pools the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps, where each of the at least two groups includes at least one feature map and no two of the at least two groups include the same feature maps.
  • the mobile terminal 600 described in the embodiment of the present invention can execute the implementation manner described in the image processing method provided in the embodiment of the present invention, which is not repeated here.
  • An embodiment of the present invention also provides a drone, which corresponds to the aforementioned terminal.
  • the drone is configured to: obtain multiple feature maps and divide them into at least two groups, where each of the at least two groups includes at least one feature map and no two of the at least two groups include the same feature maps; separately calculate the feature vector corresponding to each group of feature maps; and pool the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps.
  • the multiple feature maps may be obtained by the drone processing the images collected by its equipped camera; the multiple feature maps may also be obtained by the drone from other terminals.
  • the drone described in the embodiment of the present invention can execute the implementation described in the image processing method provided in the embodiment of the present invention, which will not be repeated here.
  • the embodiment of the present invention also provides a photographing device corresponding to the aforementioned terminal.
  • the photographing device is configured to: acquire multiple feature maps and divide them into at least two groups, where each of the at least two groups includes at least one feature map and no two of the at least two groups include the same feature maps; separately calculate the feature vector corresponding to each group of feature maps; and pool the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps.
  • the multiple feature maps may be obtained by processing the images collected by the camera; the multiple feature maps may also be obtained by the camera from other terminals.
  • the photographing device described in the embodiment of the present invention can execute the implementation manner described in the image processing method provided in the embodiment of the present invention, which is not repeated here.
  • the embodiment of the present invention also provides a vehicle, the vehicle corresponding to the aforementioned terminal, and the vehicle may be a car, a bicycle, a boat, or the like.
  • the vehicle is configured to: obtain multiple feature maps and divide them into at least two groups, where each of the at least two groups includes at least one feature map and no two of the at least two groups include the same feature maps; separately calculate the feature vector corresponding to each group of feature maps; and pool the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps.
  • the multiple feature maps may be obtained by the vehicle processing the images collected by its configured photographing device; the multiple feature maps may also be obtained by the vehicle from other terminals.
  • the vehicle described in the embodiment of the present invention can execute the implementation manner described in the image processing method provided in the embodiment of the present invention, which is not repeated here.
  • the embodiment of the present invention also provides a mobile terminal corresponding to the aforementioned terminal.
  • the mobile terminal is configured to: acquire multiple feature maps and divide them into at least two groups, where each of the at least two groups includes at least one feature map and no two of the at least two groups include the same feature maps; separately calculate the feature vector corresponding to each group of feature maps; and pool the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps.
  • the multiple feature maps may be obtained by the mobile terminal processing images collected by its configured photographing device; the multiple feature maps may also be obtained by the mobile terminal from other terminals.
  • the mobile terminal described in the embodiment of the present invention can execute the implementation manner described in the image processing method provided in the embodiment of the present invention, which will not be repeated here.
  • An embodiment of the present invention also provides a stabilizer with a photographing device, and the stabilizer with a photographing device corresponds to the aforementioned terminal.
  • the stabilizer with a photographing device is configured to: obtain multiple feature maps and divide them into at least two groups, where each of the at least two groups includes at least one feature map and no two of the at least two groups include the same feature maps; separately calculate the feature vector corresponding to each group of feature maps; and pool the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps.
  • the multiple feature maps may be obtained by the stabilizer processing images collected by its configured shooting device; the multiple feature maps may also be obtained by the stabilizer from other terminals.
  • the stabilizer with a photographing device described in the embodiment of the present invention can execute the implementation manner described in an image processing method provided by the embodiment of the present invention, which is not repeated here.
  • the embodiment of the present invention also provides a computer-readable storage medium in which a computer program is stored, and when the computer program is executed by a processor, the image processing method described in the foregoing method embodiment is implemented.
  • the embodiment of the present invention also provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the image processing method described in the foregoing method embodiment.
  • the program may be stored in a computer-readable storage medium, and the storage medium may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

An image processing method, terminal, system, and computer-readable storage medium, the method comprising: acquiring multiple feature maps output by a neural network and dividing the multiple feature maps into at least two groups, where each of the at least two groups includes at least one feature map and no two of the at least two groups include the same feature maps; separately calculating the feature vector corresponding to each group of feature maps; and pooling the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps. The embodiments of the present invention allow feature maps to be pooled on the basis of grouping; grouping keeps the number of feature maps in each group small, effectively reducing the computation for each group, and computing the groups in parallel can effectively improve pooling efficiency.

Description

Image processing method, terminal, system, and computer-readable storage medium

Technical Field

The present invention relates to the field of data processing technology, and in particular to an image processing method, terminal, system, and computer-readable storage medium.

Background Art

In recent years, convolutional neural networks have been widely applied in many fields, and in image classification in particular they achieve very good results. How to pool a set of feature maps T output by a convolutional neural network into a discriminative feature vector is a current research focus.

Because the number C of feature maps in the set T output by a convolutional neural network is generally very large, usually on the order of several thousand, pooling the set directly consumes a large amount of computing resources and is inefficient. At present, before the set of feature maps is pooled, three layers are first applied in sequence: C_LOW 1×1 convolutions (C_LOW is usually on the order of 10^1 to 10^2, far smaller than C), a batch normalization layer, and a rectified linear unit, converting the set of feature maps T into a set T' of much lower order. The converted set T' is then pooled to obtain the feature vector.

However, this approach applies only when the set of feature maps T can be linearly represented by a small number of its feature maps; it does not apply when T cannot be so represented. Moreover, it introduces additional parameters, namely the C_LOW 1×1 convolutions, the batch normalization layer, and the rectified linear unit, which occupy extra storage space; it also adds convolution operations, which parallelize poorly on most devices, lengthening the computation time of the pooling process and lowering pooling efficiency.
Summary of the Invention

The embodiments of the present invention disclose an image processing method, terminal, system, and computer-readable storage medium that can pool feature maps on the basis of grouping, effectively improving pooling efficiency.

A first aspect of the embodiments of the present invention discloses an image processing method, the method comprising:

acquiring multiple feature maps and dividing the multiple feature maps into at least two groups, where each of the at least two groups includes at least one feature map and no two of the at least two groups include the same feature maps;

separately calculating the feature vector corresponding to each group of feature maps; and

pooling the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps.

A second aspect of the embodiments of the present invention discloses a terminal comprising a memory and a processor,

the memory being configured to store program instructions, and

the processor being configured to execute the program instructions stored in the memory; when the program instructions are executed, the processor is configured to:

acquire multiple feature maps and divide the multiple feature maps into at least two groups, where each of the at least two groups includes at least one feature map and no two of the at least two groups include the same feature maps;

separately calculate the feature vector corresponding to each group of feature maps; and

pool the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps.

A third aspect of the embodiments of the present invention discloses a system comprising a drone and a mobile terminal, the drone being equipped with a photographing device and a stabilizer for the photographing device, the photographing device being mounted on the stabilizer;

while flying along a route, the drone controls the photographing device to take pictures, obtaining multiple images, and sends the multiple images to the mobile terminal;

after receiving the multiple images sent by the drone, the mobile terminal processes the multiple images to obtain multiple feature maps, divides the multiple feature maps into at least two groups, separately calculates the feature vector corresponding to each group of feature maps, and pools the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps, where each of the at least two groups includes at least one feature map and no two of the at least two groups include the same feature maps.

A fourth aspect of the embodiments of the present invention discloses a drone configured to perform the steps of the method of the first aspect.

A fifth aspect of the embodiments of the present invention discloses a photographing device configured to perform the steps of the method of the first aspect.

A sixth aspect of the embodiments of the present invention discloses a vehicle configured to perform the steps of the method of the first aspect.

A seventh aspect of the embodiments of the present invention discloses a mobile terminal configured to perform the steps of the method of the first aspect.

An eighth aspect of the embodiments of the present invention discloses a stabilizer carrying a photographing device, the stabilizer being configured to perform the steps of the method of the first aspect.

A ninth aspect of the embodiments of the present invention discloses a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
In the embodiments of the present invention, multiple feature maps output by a neural network are acquired and divided into at least two groups; the feature vector corresponding to each group of feature maps is then calculated separately, and the multiple calculated feature vectors are pooled to obtain the feature vector corresponding to the multiple feature maps. Feature maps can thus be pooled on the basis of grouping; grouping keeps the number of feature maps in each group small, effectively reducing the computation for each group, and computing the groups in parallel can effectively improve pooling efficiency.

Brief Description of the Drawings

To explain the technical solutions of the embodiments of the present invention more clearly, the drawings required by the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of a pooling process disclosed in an embodiment of the present invention;

FIG. 2 is a schematic flowchart of an image processing method disclosed in an embodiment of the present invention;

FIG. 3 is a geometric schematic diagram of a three-dimensional tensor disclosed in an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a terminal disclosed in an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a system disclosed in an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings of the embodiments.
Referring to FIG. 1, FIG. 1 is a schematic diagram of a pooling process provided by an embodiment of the present invention. As shown in FIG. 1, the terminal first obtains the multiple feature maps T ∈ R^(H×W×C) output by the neural network, where H represents the number of rows of each feature map, W the number of columns, and C the number of feature maps; typically H×W is far smaller than C. The feature maps output by the neural network are then divided into K groups C_1, C_2, …, C_K, where K is a positive integer greater than 1, each of the K groups includes at least one feature map, and no two of the K groups include the same feature maps. Further, the terminal separately calculates the feature vector corresponding to each group of feature maps, that is, feature vector 1 for group C_1, feature vector 2 for group C_2, and so on up to feature vector K for group C_K, K feature vectors in total. The K calculated feature vectors are then pooled to obtain the target feature vector corresponding to the multiple feature maps output by the neural network. In the embodiments of the present invention, multiple feature maps output by a neural network are acquired and divided into at least two groups, the feature vector corresponding to each group is calculated separately, and the calculated feature vectors are pooled to obtain the feature vector corresponding to the multiple feature maps, so that the feature maps can be pooled group by group. Grouped pooling keeps the number of feature maps in each group far smaller than the total number of feature maps, which greatly reduces the computation for each group, effectively shortens the time required for pooling, and improves pooling efficiency, as described in detail below.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of an image processing method provided by an embodiment of the present invention. The image processing method described in this embodiment may include the following steps.

S201: The terminal acquires multiple feature maps and divides the multiple feature maps into at least two groups.

In this embodiment, the terminal may first process multiple input images with its configured neural network and then obtain the multiple feature maps that the neural network outputs after processing the input images. The terminal may also obtain, from the network or from other terminals, multiple feature maps output by a neural network after processing multiple images. The terminal then groups the acquired feature maps to obtain at least two groups. Each of the at least two groups includes at least one feature map, and no two of the at least two groups include the same feature maps. Each feature map corresponds to one channel of the neural network; saying that no two groups include the same feature maps means that the feature maps in any two groups correspond to different channels. The groups may all contain the same number of feature maps, or the numbers of feature maps in different groups may differ partly or entirely.

In one implementation, the terminal may group the multiple feature maps according to the channels corresponding to the feature maps to obtain the at least two groups. Specifically, the terminal may group the feature maps according to the sorted order of their corresponding channels, or randomly group the feature maps over their corresponding channels, to obtain the at least two groups.

In one implementation, before dividing the multiple feature maps into at least two groups, the terminal determines the number of groups K from the number of feature maps, K being a positive integer greater than 1, and then divides the feature maps into K groups according to the determined K. The terminal may determine K from a pre-stored mapping between the number of feature maps and the number of groups.
S202: The terminal separately calculates the feature vector corresponding to each group of feature maps.

In this embodiment, the terminal first calculates the α-th power of the covariance matrix corresponding to the feature maps of a target group to obtain a target matrix, and then determines from the target matrix the feature vector corresponding to the feature maps of the target group. The target group is any one of the at least two groups, and α is a positive rational number; in one implementation, α may specifically be 0.5.

In one implementation, the terminal calculates the α-th power of the covariance matrix of the target group as follows. It first obtains the three-dimensional tensor corresponding to the feature maps of the target group and merges the elements of the first dimension with the elements of the second dimension to obtain the two-dimensional tensor corresponding to the feature maps of the target group. The first dimension is the dimension corresponding to the row direction of the three-dimensional tensor, and the second dimension is the dimension corresponding to the column direction; the three-dimensional tensor also contains elements of a third dimension, corresponding to the depth direction. The terminal then calculates the covariance matrix of the two-dimensional tensor and raises it to the α-th power to obtain the target matrix. By analogy, the α-th power of the covariance matrix corresponding to every group of feature maps can be calculated. It should be noted that the covariance matrix can be used to indicate the correlation between the feature maps of the target group; since correlation is mutual, the elements at the upper-triangular and lower-triangular positions of the target matrix determined above are usually symmetric. Moreover, using the covariance matrix to compute the features of the target group's feature maps makes better use of the correlation information between feature maps, which can make subsequent image classification tasks more accurate.

In one implementation, the terminal determines the feature vector corresponding to the feature maps of the target group from the target matrix as follows: it obtains the target elements of the target matrix, the target elements being the elements located at the upper-triangular or lower-triangular positions of the target matrix, and then arranges the target elements according to their positions in the target matrix to generate the feature vector corresponding to the feature maps of the target group. Specifically, the terminal may first arrange the target elements by the row they occupy in the target matrix and then, for target elements in the same row, arrange them by the column they occupy; alternatively, it may first arrange the target elements by column and then, for target elements in the same column, arrange them by row.
S203: The terminal pools the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps.

In this embodiment, the terminal obtains the first vector and the second vector corresponding to a target feature vector, the target feature vector being any one of the multiple calculated feature vectors; it then determines a first matrix from the first vectors and a second matrix from the second vectors; finally, it determines the feature vector corresponding to the multiple feature maps output by the neural network from the first matrix and the second matrix. The first vector includes first elements corresponding to the elements of the target feature vector, and the elements of the first vector other than the first elements take the value 0; the second elements of the second vector corresponding to the first elements take the value 1, and the elements of the second vector other than the second elements take the value 0. The first vectors of all target feature vectors have the same number of elements, and the first vector of a target feature vector has the same number of elements as the corresponding second vector. The first matrix is determined from the multiple first vectors corresponding to the calculated feature vectors, one column of the first matrix corresponding to one first vector; the second matrix is determined from the corresponding second vectors, one column of the second matrix corresponding to one second vector.

It should be noted that zero elements are appended to the first and second vectors of each target feature vector to ensure that all first vectors have the same number of elements and that each second vector has the same number of elements as its first vector, which is convenient for subsequent calculation. When every one of the at least two groups contains the same number of feature maps, the feature vectors of all groups have the same number of elements; in that case the first vector may include only the first elements corresponding to the elements of the target feature vector, and the second vector only the corresponding second elements with value 1. Alternatively, the first vector includes only the first elements corresponding to the elements of the target feature vector, no second vector or second matrix is determined, the first matrix is determined directly from the first vectors of the groups, and the feature vector corresponding to the multiple feature maps output by the neural network is determined directly from the first matrix.

In one implementation, the terminal determines the feature vector corresponding to the multiple feature maps output by the neural network from the first matrix and the second matrix in one of the following ways: it computes the average of the third elements in each row of the first matrix and generates the feature vector from the row averages; or it computes the sum of the third elements in each row and generates the feature vector from the row sums; or it computes the product of the third elements in each row and generates the feature vector from the row products; or it obtains the maximum of the third elements in each row and generates the feature vector from the row maxima; or it obtains the minimum of the third elements in each row and generates the feature vector from the row minima. A third element is an element of the first matrix corresponding to a non-zero element of the second matrix.
To better understand the image processing method of this embodiment, it is described in detail below with the corresponding formulas and an example. The terminal first obtains the multiple feature maps T ∈ R^(H×W×C) output by the neural network, where H represents the number of rows of a feature map (the number of pixels in one of its columns), W represents the number of columns (the number of pixels in one of its rows), and C is the number of feature maps; typically H×W is far smaller than C. The terminal then divides the feature maps into K groups according to their corresponding channels, K being a positive integer greater than 1. The i-th group contains C_i feature maps, i ranges over [1, K], and C = Σ_i C_i (Σ denotes summation). In one implementation, C is divisible by K and C_1 = C_2 = … = C_K; in another implementation, C_1, C_2, …, C_K are not equal or not all equal.
Further, the terminal calculates the feature vector of each group of feature maps separately. For the i-th group, the terminal first obtains the three-dimensional tensor corresponding to the group and merges the elements of the first dimension with the elements of the second dimension to obtain the two-dimensional tensor U_i, a d×C_i matrix with d = H×W. The first dimension is the dimension corresponding to the row direction of the three-dimensional tensor, and the second dimension is the dimension corresponding to the column direction. For ease of explanation, suppose the i-th group contains two feature maps whose matrix forms are

a1,1 a1,2 a1,3        b1,1 b1,2 b1,3
a2,1 a2,2 a2,3  and   b2,1 b2,2 b2,3
a3,1 a3,2 a3,3        b3,1 b3,2 b3,3

where the feature maps have 3 rows and 3 columns, the element a1,1 represents the pixel value of the pixel in the first row and first column of the feature map, and the other elements follow by analogy. Stacking the two matrices along the depth direction gives the three-dimensional tensor corresponding to the two feature maps, a 3×3×2 matrix, that is, a three-dimensional matrix with 3 rows, 3 columns, and depth 2. FIG. 3 shows the geometric representation of this three-dimensional tensor. In FIG. 3, 301 and 302 each represent one feature map, and together the two feature maps form a three-dimensional tensor. The tensor has three dimensions: the first corresponds to its row direction, the second to its column direction, and the third to its depth direction. In a three-dimensional coordinate system, the row direction corresponds to the longitudinal axis, the column direction to the horizontal axis, and the depth direction to the vertical axis. Merging the elements of the first and second dimensions of this tensor yields the two-dimensional tensor

a1,1 a1,2 a1,3 a2,1 a2,2 a2,3 a3,1 a3,2 a3,3
b1,1 b1,2 b1,3 b2,1 b2,2 b2,3 b3,1 b3,2 b3,3

a 2×9 matrix (2 rows, 9 columns), or equivalently its transpose, a 9×2 matrix (9 rows, 2 columns).
终端计算得到二维张量U i之后,计算二维张量U i的协方差矩阵。二维张量U i的协方差矩阵的计算方式如式一所示:
Figure PCTCN2019087935-appb-000010
其中,Σ i为二维张量U i的协方差矩阵,Σ i为C i×C i矩阵,C i为第i组特征图中特征图的数量;
Figure PCTCN2019087935-appb-000011
为二维张量U i的转置矩阵;
Figure PCTCN2019087935-appb-000012
I为单位阵,1为各项元素全为1的方阵,I和1均为d×d矩阵,d=H×W,H与W分别为第i组特征图中特征图的行列数。
终端计算得到二维张量U i的协方差矩阵Σ i之后,计算协方差矩阵Σ i的α次幂。协方差矩阵Σ i的α次幂的计算方式如式二所示:
V i=Σ i α
Figure PCTCN2019087935-appb-000013
其中,V i为协方差矩阵Σ i的α次幂,V i为C i×C i矩阵;α为正有理数,在一实施方式中,α=0.5;可以采用迭代法针对式二进行迭代求解,因迭代过程只包含简单的矩阵加法与乘法,整个求解过程便于并行计算,求解效率高;也可以采用本征分解的方式对式二进行求解。
After computing the α-th power V_i of the covariance matrix Σ_i, the terminal obtains the target elements of the matrix V_i located at the upper-triangular or lower-triangular positions and arranges them, according to their positions in the matrix, into the feature vector v_i of the i-th group of feature maps, where the number of elements of v_i is l_i = C_i × (C_i + 1) / 2 and C_i is the number of feature maps in the i-th group. Continuing the example above, suppose the matrix form of V_i is

C1,1 C1,2
C2,1 C2,2

where the elements of V_i are symmetric, that is, C1,2 equals C2,1. The target elements at the upper-triangular positions of V_i are then C1,1, C1,2, and C2,2, and the generated feature vector v_i of the i-th group is [C1,1, C1,2, C2,2]; the target elements at the lower-triangular positions are C1,1, C2,1, and C2,2, in which case v_i is [C1,1, C2,1, C2,2]. In this way the feature vectors v_i, i = 1, …, K, of all groups can be calculated.
Further, after calculating the feature vectors of all groups, the terminal pools the feature vectors v_i, i = 1, …, K, to obtain the feature vector v of the multiple feature maps T ∈ R^(H×W×C) output by the neural network. In one implementation, the terminal first computes the maximum l = max{l_i} of l_i, i = 1, …, K, where l_i is the number of elements of v_i. It then determines the first and second vectors corresponding to each group's feature vector: p_i = [v_i; 0_i] is the first vector corresponding to v_i, and m_i = [1_i; 0_i] is the corresponding second vector, where 1_i is the vector whose l_i elements are all 1 and 0_i is the vector whose l − l_i elements are all 0; both the first and second vectors thus have l elements. Continuing the example above, if the feature vector v_i of the i-th group is [C1,1, C1,2, C2,2] and the maximum l of l_i, i = 1, …, K, is 5, the first vector corresponding to v_i is [C1,1, C1,2, C2,2, 0, 0] and the corresponding second vector is [1, 1, 1, 0, 0].
After determining the first vectors p_i and second vectors m_i corresponding to the feature vectors of the groups, the terminal determines the first matrix P from the p_i and the second matrix M from the m_i, where P = [p_1, p_2, …, p_K] and M = [m_1, m_2, …, m_K]; both P and M are l×K matrices, one column of P corresponding to one first vector and one column of M to one second vector. The terminal then determines the feature vector v of the multiple feature maps T ∈ R^(H×W×C) output by the neural network as v = f(P, M), where v has l elements and f denotes taking, row by row, the average of all elements of P whose corresponding positions in M are non-zero, that is,

v[j] = (Σ_i P_j,i × (M_j,i ≠ 0)) / (Σ_i (M_j,i ≠ 0))

where P_j,i denotes the element in row j and column i of the first matrix P and M_j,i the element in row j and column i of the second matrix M; j ranges over [1, l] and i over [1, K].
Suppose the multiple feature maps output by the neural network are divided into three groups in total, with the feature vector v_1 of the first group being [8, 10, 6], the feature vector v_2 of the second group [12, 0, 7, 10], and the feature vector v_3 of the third group [4, 11, 5, 13, 5]. The maximum number of elements among v_1, v_2, v_3 is 5, so the first vector p_1 corresponding to v_1 is [8, 10, 6, 0, 0] and the corresponding second vector m_1 is [1, 1, 1, 0, 0]; p_2 is [12, 0, 7, 10, 0] and m_2 is [1, 1, 1, 1, 0]; p_3 is [4, 11, 5, 13, 5] and m_3 is [1, 1, 1, 1, 1]. The first matrix P determined from p_1, p_2, p_3 and the second matrix M determined from m_1, m_2, m_3 are

P = | 8 12  4 |      M = | 1 1 1 |
    |10  0 11 |          | 1 1 1 |
    | 6  7  5 |          | 1 1 1 |
    | 0 10 13 |          | 0 1 1 |
    | 0  0  5 |          | 0 0 1 |

The average of the elements in the first row of P whose corresponding positions in M are non-zero is (8+12+4)/3 = 8; for the second row it is (10+0+11)/3 = 7; for the third row, (6+7+5)/3 = 6; for the fourth row, (10+13)/2 = 11.5; and for the fifth row, 5/1 = 5. From these results, the feature vector v of the multiple feature maps T output by the neural network is [8, 7, 6, 11.5, 5].
It should be noted that f may also denote computing, row by row, the sum, maximum, minimum, or product of all elements of P whose corresponding positions in M are non-zero; the specific calculations follow the description above and are not repeated here.

With the image processing method provided by the embodiments of the present invention, the multiple feature maps output by the neural network can first be grouped and the feature vector of each group then computed in parallel; grouping makes the number of feature maps in each group far smaller than the total number, greatly reducing the per-group computation, effectively reducing the time and computing resources required for pooling, and improving pooling efficiency. In addition, using the covariance matrix to compute the features of each group's feature maps makes better use of the correlation information between feature maps, which can make subsequent image classification tasks more accurate. Because the method introduces no additional parameters, it occupies no additional storage space. Because it introduces no convolution operation, and the grouped computation of each group's covariance-based features involves only matrix multiplication, the whole algorithm can run with a high degree of parallelism, effectively saving computation time. Because it computes the features of each group separately, it raises the utilization of the information between feature maps and makes better use of the information among the large number of feature maps output at the end of the neural network. Because it places no restriction on the input set of feature maps, the backbone neural network needs no special structural design, for example two branches whose outputs must have the same size. Moreover, the method supports dividing the feature maps output by the neural network into two or even more groups, removing the limitation that fusion requirements would otherwise place on the input size of the backbone network. Finally, because the method exploits higher-order statistics of the multiple feature maps output by the neural network, it can achieve higher accuracy on subsequent image classification tasks than image processing methods that apply only to first-order information.
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of a terminal provided by an embodiment of the present invention. The terminal described in this embodiment includes a processor 401, a communication interface 402, and a memory 403, which may be connected by a bus or in other ways; this embodiment takes connection by a bus as an example.

The processor 401 may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP. The processor 401 may also be a multi-core CPU, or a core in a multi-core NP used to implement communication identity binding.

The processor 401 may be a hardware chip, which may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.

The communication interface 402 may be used for exchanging information or signaling and for receiving and passing on signals. The memory 403 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the program required by at least one function (such as a text storage function or a location storage function); the data storage area may store data created through use of the device (such as image data or text data), application programs, and so on. In addition, the memory 403 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device or flash memory device, or another solid-state storage device.

The memory 403 is further configured to store program instructions. The processor 401 is configured to execute the program instructions stored in the memory 403; when the program instructions are executed, the processor 401 is configured to:
获取多张特征图,并将所述多张特征图分成至少两个分组,所述至少两个分组中的每一个分组包括至少一张特征图,所述至少两个分组中的任意两个分 组包括的特征图不相同;
分别计算每一组特征图对应的特征向量;
将计算得到的多个特征向量进行池化,得到所述多张特征图对应的特征向量。
本发明实施例中处理器执行的方法均从处理器的角度来描述,可以理解的是,本发明实施例中处理器要执行上述方法需要其他硬件结构的配合。本发明实施例对具体的实现过程不作详细描述和限制。
在一实施方式中,所述至少两个分组中的每一个分组包括的特征图的数量相同。
在一实施方式中,所述处理器401分别计算每一组特征图对应的特征向量时,具体用于:计算目标分组的特征图对应的协方差矩阵的α次幂,得到目标矩阵,所述目标分组为所述至少两个分组中的任意一个,所述α为正有理数;根据所述目标矩阵确定所述目标分组的特征图对应的特征向量。
在一实施方式中,所述处理器401计算目标分组的特征图对应的协方差矩阵的α次幂,得到目标矩阵时,具体用于:获取目标分组的特征图对应的三维张量;将所述三维张量中的第一维度的元素和第二维度的元素进行合并,得到二维张量,所述第一维度为所述三维张量中的行方向对应的维度,所述第二维度为所述三维张量中的列方向对应的维度;计算所述二维张量对应的协方差矩阵,并计算所述协方差矩阵的α次幂,得到目标矩阵。
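A sketch of this step, assuming the three-dimensional tensor is an H×W×C array and that the α-th power of the symmetric covariance matrix is taken via eigendecomposition — one standard realization; the embodiments do not prescribe a particular factorization:

```python
import numpy as np

def covariance_power(t3d, alpha=0.5):
    """Merge the row (first) and column (second) dimensions of the
    three-dimensional tensor into one, compute the covariance matrix
    of the resulting two-dimensional tensor, and raise it to the
    power alpha via eigendecomposition."""
    h, w, c = t3d.shape
    x = t3d.reshape(h * w, c)             # two-dimensional tensor
    cov = np.cov(x, rowvar=False)         # c x c covariance matrix
    eigval, eigvec = np.linalg.eigh(cov)  # cov is symmetric PSD
    eigval = np.clip(eigval, 0.0, None)   # guard against tiny negatives
    return (eigvec * eigval ** alpha) @ eigvec.T
```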
In an embodiment, α is 0.5.
In an embodiment, when determining the feature vector corresponding to the feature maps of the target group according to the target matrix, the processor 401 is specifically configured to: obtain target elements in the target matrix, where the target elements are the elements located at upper triangular positions or lower triangular positions in the target matrix; and arrange the target elements according to their positions in the target matrix to generate the feature vector corresponding to the feature maps of the target group.
In an embodiment, when pooling the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps, the processor 401 is specifically configured to: obtain a first vector and a second vector corresponding to a target feature vector, where the target feature vector is any one of the multiple calculated feature vectors; determine a first matrix according to the first vector and a second matrix according to the second vector; and determine the feature vector corresponding to the multiple feature maps according to the first matrix and the second matrix.
In an embodiment, the first vector includes first elements corresponding to the elements in the target feature vector, and the elements in the first vector other than the first elements take the value 0; the second elements in the second vector corresponding to the first elements take the value 1, and the elements in the second vector other than the second elements take the value 0.
In an embodiment, one column of elements in the first matrix corresponds to one first vector, and one column of elements in the second matrix corresponds to one second vector.
In an embodiment, when determining the feature vector corresponding to the multiple feature maps according to the first matrix and the second matrix, the processor 401 is specifically configured to: calculate the average of the third elements in each row of the first matrix, and generate the feature vector corresponding to the multiple feature maps according to the calculated averages of the third elements in the rows; where the third elements are the elements in the first matrix corresponding to the non-zero elements in the second matrix.
In an embodiment, when determining the feature vector corresponding to the multiple feature maps according to the first matrix and the second matrix, the processor 401 is specifically configured to: calculate the sum of the third elements in each row of the first matrix, and generate the feature vector corresponding to the multiple feature maps according to the calculated sums of the third elements in the rows; where the third elements are the elements in the first matrix corresponding to the non-zero elements in the second matrix.
In an embodiment, when determining the feature vector corresponding to the multiple feature maps according to the first matrix and the second matrix, the processor 401 is specifically configured to: calculate the product of the third elements in each row of the first matrix, and generate the feature vector corresponding to the multiple feature maps according to the calculated products of the third elements in the rows; where the third elements are the elements in the first matrix corresponding to the non-zero elements in the second matrix.
In an embodiment, when determining the feature vector corresponding to the multiple feature maps according to the first matrix and the second matrix, the processor 401 is specifically configured to: obtain the maximum or minimum of the third elements in each row of the first matrix, and generate the feature vector corresponding to the multiple feature maps according to the obtained maxima or minima of the third elements in the rows; where the third elements are the elements in the first matrix corresponding to the non-zero elements in the second matrix.
In a specific implementation, the processor 401, the communication interface 402, and the memory 403 described in the embodiments of the present invention can perform the implementations described in the image processing method provided by the embodiments of the present invention, which are not repeated here.
In the embodiments of the present invention, multiple feature maps output by a neural network are obtained and divided into at least two groups; the feature vector corresponding to each group of feature maps is then calculated separately, and the multiple calculated feature vectors are pooled to obtain the feature vector corresponding to the multiple feature maps, so that the feature maps can be pooled on the basis of grouping, effectively improving pooling efficiency.
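Combining the illustrative helpers sketched earlier, a hypothetical end-to-end rendering of this grouped pooling, assuming the C feature maps arrive as an H×W×C array with C divisible by K and that grouping is done along the channel axis (one possible grouping choice):

```python
import numpy as np

def grouped_pooling(feature_maps, K, alpha=0.5):
    """feature_maps: H x W x C array holding C feature maps;
    K: number of groups. Returns the pooled feature vector v."""
    groups = np.split(feature_maps, K, axis=2)   # K groups of C/K maps each
    vs = [triangular_feature_vector(covariance_power(g, alpha))
          for g in groups]                       # per-group vectors v_i
    l = max(v.size for v in vs)
    cols = [pad_and_mask(v, l) for v in vs]
    P = np.stack([p for p, _ in cols], axis=1)
    M = np.stack([m for _, m in cols], axis=1)
    return masked_row_mean(P, M)
```

With equal-sized groups every l_i equals l, so the mask is all ones; the padding machinery matters when the groups (and hence the v_i) differ in length.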
Referring to FIG. 5, FIG. 5 is a schematic architectural diagram of a system provided by an embodiment of the present invention. As shown in FIG. 5, the system includes an unmanned aerial vehicle (UAV) 500 and a mobile terminal 600; a communication connection is established between the UAV 500 and the mobile terminal 600, and the mobile terminal 600 corresponds to the terminal described above. The UAV 500 is equipped with a photographing device 502 and a stabilizer 501 for the photographing device, and the photographing device 502 is mounted on the stabilizer 501. Specifically:
While flying along a route, the UAV 500 controls the photographing device 502 to take multiple images and sends the multiple images to the mobile terminal 600. The mobile terminal 600 may control the UAV 500 to fly along the route and, while the UAV 500 flies along the route, control the photographing device 502 to take the multiple images. After receiving the multiple images sent by the UAV, the mobile terminal 600 first processes the multiple images to obtain multiple feature maps and divides the multiple feature maps into at least two groups; it then calculates the feature vector corresponding to each group of feature maps separately and pools the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps, where each of the at least two groups includes at least one feature map and any two of the at least two groups include different feature maps. In a specific implementation, the mobile terminal 600 described in this embodiment can perform the implementations described in the image processing method provided by the embodiments of the present invention, which are not repeated here.
An embodiment of the present invention further provides an unmanned aerial vehicle, which corresponds to the terminal described above. The unmanned aerial vehicle is configured to: obtain multiple feature maps and divide the multiple feature maps into at least two groups, where each of the at least two groups includes at least one feature map and any two of the at least two groups include different feature maps; calculate the feature vector corresponding to each group of feature maps separately; and pool the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps. The multiple feature maps may be obtained by the unmanned aerial vehicle processing images captured by a photographing device configured on it, or may be obtained by the unmanned aerial vehicle from another terminal. In a specific implementation, the unmanned aerial vehicle described in this embodiment can perform the implementations described in the image processing method provided by the embodiments of the present invention, which are not repeated here.
An embodiment of the present invention further provides a photographing device, which corresponds to the terminal described above. The photographing device is configured to: obtain multiple feature maps and divide the multiple feature maps into at least two groups, where each of the at least two groups includes at least one feature map and any two of the at least two groups include different feature maps; calculate the feature vector corresponding to each group of feature maps separately; and pool the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps. The multiple feature maps may be obtained by the photographing device processing images it has captured, or may be obtained by the photographing device from another terminal. In a specific implementation, the photographing device described in this embodiment can perform the implementations described in the image processing method provided by the embodiments of the present invention, which are not repeated here.
An embodiment of the present invention further provides a vehicle, which corresponds to the terminal described above; the vehicle may be a car, a bicycle, a boat, or the like. The vehicle is configured to: obtain multiple feature maps and divide the multiple feature maps into at least two groups, where each of the at least two groups includes at least one feature map and any two of the at least two groups include different feature maps; calculate the feature vector corresponding to each group of feature maps separately; and pool the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps. The multiple feature maps may be obtained by the vehicle processing images captured by a photographing device configured on it, or may be obtained by the vehicle from another terminal. In a specific implementation, the vehicle described in this embodiment can perform the implementations described in the image processing method provided by the embodiments of the present invention, which are not repeated here.
An embodiment of the present invention further provides a mobile terminal, which corresponds to the terminal described above. The mobile terminal is configured to: obtain multiple feature maps and divide the multiple feature maps into at least two groups, where each of the at least two groups includes at least one feature map and any two of the at least two groups include different feature maps; calculate the feature vector corresponding to each group of feature maps separately; and pool the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps. The multiple feature maps may be obtained by the mobile terminal processing images captured by a photographing device configured on it, or may be obtained by the mobile terminal from another terminal. In a specific implementation, the mobile terminal described in this embodiment can perform the implementations described in the image processing method provided by the embodiments of the present invention, which are not repeated here.
An embodiment of the present invention further provides a stabilizer with a photographing device, which corresponds to the terminal described above. The stabilizer with a photographing device is configured to: obtain multiple feature maps and divide the multiple feature maps into at least two groups, where each of the at least two groups includes at least one feature map and any two of the at least two groups include different feature maps; calculate the feature vector corresponding to each group of feature maps separately; and pool the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps. The multiple feature maps may be obtained by the stabilizer processing images captured by the photographing device configured on it, or may be obtained by the stabilizer from another terminal. In a specific implementation, the stabilizer with a photographing device described in this embodiment can perform the implementations described in the image processing method provided by the embodiments of the present invention, which are not repeated here.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the image processing method described in the above method embodiments.
An embodiment of the present invention further provides a computer program product containing instructions that, when run on a computer, cause the computer to execute the image processing method described in the above method embodiments.
It should be noted that, for brevity of description, each of the foregoing method embodiments is expressed as a series of action combinations. However, those skilled in the art should know that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, and the storage medium may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The image processing method, terminal, and system provided by the embodiments of the present invention have been introduced in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in specific implementations and application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (33)

  1. An image processing method, characterized in that the method comprises:
    obtaining multiple feature maps and dividing the multiple feature maps into at least two groups, wherein each of the at least two groups comprises at least one feature map, and any two of the at least two groups comprise different feature maps;
    calculating a feature vector corresponding to each group of feature maps separately; and
    pooling the multiple calculated feature vectors to obtain a feature vector corresponding to the multiple feature maps.
  2. The method according to claim 1, characterized in that each of the at least two groups comprises the same number of feature maps.
  3. The method according to claim 1 or 2, characterized in that the calculating a feature vector corresponding to each group of feature maps separately comprises:
    calculating the α-th power of a covariance matrix corresponding to the feature maps of a target group to obtain a target matrix, wherein the target group is any one of the at least two groups, and α is a positive rational number; and
    determining the feature vector corresponding to the feature maps of the target group according to the target matrix.
  4. The method according to claim 3, characterized in that the calculating the α-th power of the covariance matrix corresponding to the feature maps of the target group to obtain the target matrix comprises:
    obtaining a three-dimensional tensor corresponding to the feature maps of the target group;
    merging elements of a first dimension and elements of a second dimension of the three-dimensional tensor to obtain a two-dimensional tensor, wherein the first dimension is the dimension corresponding to the row direction of the three-dimensional tensor, and the second dimension is the dimension corresponding to the column direction of the three-dimensional tensor; and
    calculating the covariance matrix corresponding to the two-dimensional tensor, and calculating the α-th power of the covariance matrix to obtain the target matrix.
  5. The method according to claim 3, characterized in that α is 0.5.
  6. The method according to claim 3, characterized in that the determining the feature vector corresponding to the feature maps of the target group according to the target matrix comprises:
    obtaining target elements in the target matrix, wherein the target elements are elements located at upper triangular positions or lower triangular positions in the target matrix; and
    arranging the target elements according to their positions in the target matrix to generate the feature vector corresponding to the feature maps of the target group.
  7. The method according to claim 1, characterized in that the pooling the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps comprises:
    obtaining a first vector and a second vector corresponding to a target feature vector, wherein the target feature vector is any one of the multiple calculated feature vectors;
    determining a first matrix according to the first vector, and determining a second matrix according to the second vector; and
    determining the feature vector corresponding to the multiple feature maps according to the first matrix and the second matrix.
  8. The method according to claim 7, characterized in that the first vector comprises first elements corresponding to the elements in the target feature vector, and elements in the first vector other than the first elements take the value 0; second elements in the second vector corresponding to the first elements take the value 1, and elements in the second vector other than the second elements take the value 0.
  9. The method according to claim 7, characterized in that one column of elements in the first matrix corresponds to one first vector, and one column of elements in the second matrix corresponds to one second vector.
  10. The method according to any one of claims 7 to 9, characterized in that the determining the feature vector corresponding to the multiple feature maps according to the first matrix and the second matrix comprises:
    calculating an average of third elements in each row of the first matrix, and generating the feature vector corresponding to the multiple feature maps according to the calculated averages of the third elements in the rows;
    wherein the third elements are elements in the first matrix corresponding to non-zero elements in the second matrix.
  11. The method according to any one of claims 7 to 9, characterized in that the determining the feature vector corresponding to the multiple feature maps according to the first matrix and the second matrix comprises:
    calculating a sum of third elements in each row of the first matrix, and generating the feature vector corresponding to the multiple feature maps according to the calculated sums of the third elements in the rows;
    wherein the third elements are elements in the first matrix corresponding to non-zero elements in the second matrix.
  12. The method according to any one of claims 7 to 9, characterized in that the determining the feature vector corresponding to the multiple feature maps according to the first matrix and the second matrix comprises:
    calculating a product of third elements in each row of the first matrix, and generating the feature vector corresponding to the multiple feature maps according to the calculated products of the third elements in the rows;
    wherein the third elements are elements in the first matrix corresponding to non-zero elements in the second matrix.
  13. The method according to any one of claims 7 to 9, characterized in that the determining the feature vector corresponding to the multiple feature maps according to the first matrix and the second matrix comprises:
    obtaining a maximum or minimum of third elements in each row of the first matrix, and generating the feature vector corresponding to the multiple feature maps according to the obtained maxima or minima of the third elements in the rows;
    wherein the third elements are elements in the first matrix corresponding to non-zero elements in the second matrix.
  14. A terminal, characterized by comprising a memory and a processor, wherein
    the memory is configured to store program instructions; and
    the processor is configured to execute the program instructions stored in the memory, and when the program instructions are executed, the processor is configured to:
    obtain multiple feature maps and divide the multiple feature maps into at least two groups, wherein each of the at least two groups comprises at least one feature map, and any two of the at least two groups comprise different feature maps;
    calculate a feature vector corresponding to each group of feature maps separately; and
    pool the multiple calculated feature vectors to obtain a feature vector corresponding to the multiple feature maps.
  15. The terminal according to claim 14, characterized in that each of the at least two groups comprises the same number of feature maps.
  16. The terminal according to claim 14 or 15, characterized in that, when calculating the feature vector corresponding to each group of feature maps separately, the processor is specifically configured to:
    calculate the α-th power of a covariance matrix corresponding to the feature maps of a target group to obtain a target matrix, wherein the target group is any one of the at least two groups, and α is a positive rational number; and
    determine the feature vector corresponding to the feature maps of the target group according to the target matrix.
  17. The terminal according to claim 16, characterized in that, when calculating the α-th power of the covariance matrix corresponding to the feature maps of the target group to obtain the target matrix, the processor is specifically configured to:
    obtain a three-dimensional tensor corresponding to the feature maps of the target group;
    merge elements of a first dimension and elements of a second dimension of the three-dimensional tensor to obtain a two-dimensional tensor, wherein the first dimension is the dimension corresponding to the row direction of the three-dimensional tensor, and the second dimension is the dimension corresponding to the column direction of the three-dimensional tensor; and
    calculate the covariance matrix corresponding to the two-dimensional tensor, and calculate the α-th power of the covariance matrix to obtain the target matrix.
  18. The terminal according to claim 16, characterized in that α is 0.5.
  19. The terminal according to claim 16, characterized in that, when determining the feature vector corresponding to the feature maps of the target group according to the target matrix, the processor is specifically configured to:
    obtain target elements in the target matrix, wherein the target elements are elements located at upper triangular positions or lower triangular positions in the target matrix; and
    arrange the target elements according to their positions in the target matrix to generate the feature vector corresponding to the feature maps of the target group.
  20. The terminal according to claim 14, characterized in that, when pooling the multiple calculated feature vectors to obtain the feature vector corresponding to the multiple feature maps, the processor is specifically configured to:
    obtain a first vector and a second vector corresponding to a target feature vector, wherein the target feature vector is any one of the multiple calculated feature vectors;
    determine a first matrix according to the first vector, and determine a second matrix according to the second vector; and
    determine the feature vector corresponding to the multiple feature maps according to the first matrix and the second matrix.
  21. The terminal according to claim 20, characterized in that the first vector comprises first elements corresponding to the elements in the target feature vector, and elements in the first vector other than the first elements take the value 0; second elements in the second vector corresponding to the first elements take the value 1, and elements in the second vector other than the second elements take the value 0.
  22. The terminal according to claim 20, characterized in that one column of elements in the first matrix corresponds to one first vector, and one column of elements in the second matrix corresponds to one second vector.
  23. The terminal according to any one of claims 20 to 22, characterized in that, when determining the feature vector corresponding to the multiple feature maps according to the first matrix and the second matrix, the processor is specifically configured to:
    calculate an average of third elements in each row of the first matrix, and generate the feature vector corresponding to the multiple feature maps according to the calculated averages of the third elements in the rows;
    wherein the third elements are elements in the first matrix corresponding to non-zero elements in the second matrix.
  24. The terminal according to any one of claims 20 to 22, characterized in that, when determining the feature vector corresponding to the multiple feature maps according to the first matrix and the second matrix, the processor is specifically configured to:
    calculate a sum of third elements in each row of the first matrix, and generate the feature vector corresponding to the multiple feature maps according to the calculated sums of the third elements in the rows;
    wherein the third elements are elements in the first matrix corresponding to non-zero elements in the second matrix.
  25. The terminal according to any one of claims 20 to 22, characterized in that, when determining the feature vector corresponding to the multiple feature maps according to the first matrix and the second matrix, the processor is specifically configured to:
    calculate a product of third elements in each row of the first matrix, and generate the feature vector corresponding to the multiple feature maps according to the calculated products of the third elements in the rows;
    wherein the third elements are elements in the first matrix corresponding to non-zero elements in the second matrix.
  26. The terminal according to any one of claims 20 to 22, characterized in that, when determining the feature vector corresponding to the multiple feature maps according to the first matrix and the second matrix, the processor is specifically configured to:
    obtain a maximum or minimum of third elements in each row of the first matrix, and generate the feature vector corresponding to the multiple feature maps according to the obtained maxima or minima of the third elements in the rows;
    wherein the third elements are elements in the first matrix corresponding to non-zero elements in the second matrix.
  27. A system, characterized by comprising an unmanned aerial vehicle and a mobile terminal, wherein the unmanned aerial vehicle is equipped with a photographing device and a stabilizer for the photographing device, and the photographing device is mounted on the stabilizer;
    while flying along a route, the unmanned aerial vehicle controls the photographing device to take multiple images, and sends the multiple images to the mobile terminal; and
    after receiving the multiple images sent by the unmanned aerial vehicle, the mobile terminal processes the multiple images to obtain multiple feature maps, divides the multiple feature maps into at least two groups, calculates a feature vector corresponding to each group of feature maps separately, and pools the multiple calculated feature vectors to obtain a feature vector corresponding to the multiple feature maps; wherein each of the at least two groups comprises at least one feature map, and any two of the at least two groups comprise different feature maps.
  28. An unmanned aerial vehicle, characterized in that the unmanned aerial vehicle is configured to perform the steps of the method according to any one of claims 1 to 13.
  29. A photographing device, characterized in that the photographing device is configured to perform the steps of the method according to any one of claims 1 to 13.
  30. A vehicle, characterized in that the vehicle is configured to perform the steps of the method according to any one of claims 1 to 13.
  31. A mobile terminal, characterized in that the mobile terminal is configured to perform the steps of the method according to any one of claims 1 to 13.
  32. A stabilizer with a photographing device, characterized in that the stabilizer is configured to perform the steps of the method according to any one of claims 1 to 13.
  33. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 13 are implemented.
PCT/CN2019/087935 2019-05-22 2019-05-22 Image processing method, terminal and system, and computer-readable storage medium WO2020232666A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2019/087935 WO2020232666A1 (zh) 2019-05-22 2019-05-22 Image processing method, terminal and system, and computer-readable storage medium
CN201980007770.XA CN111656359A (zh) 2019-05-22 2019-05-22 Image processing method, terminal and system, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/087935 WO2020232666A1 (zh) 2019-05-22 2019-05-22 Image processing method, terminal and system, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2020232666A1 true WO2020232666A1 (zh) 2020-11-26

Family

ID=72348584

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/087935 WO2020232666A1 (zh) 2019-05-22 2019-05-22 Image processing method, terminal and system, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN111656359A (zh)
WO (1) WO2020232666A1 (zh)

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN112363844B 2021-01-12 2021-04-09 Zhejiang Lab — Convolutional neural network vertical partitioning method for image processing

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
CN104142995B 2014-07-30 2017-09-26 Institute of Automation, Chinese Academy of Sciences — Social event recognition method based on visual attributes
US10043101B2 2014-11-07 2018-08-07 Adobe Systems Incorporated — Local feature representation for image recognition
US20160267111A1 2015-03-11 2016-09-15 Microsoft Technology Licensing, LLC — Two-stage vector reduction using two-dimensional and one-dimensional systolic arrays
JP7113657B2 2017-05-22 2022-08-05 Canon Inc. — Information processing apparatus, information processing method, and program
CN107292352B 2017-08-07 2020-06-02 Beijing Vimicro AI Chip Technology Co., Ltd. — Image classification method and device based on convolutional neural network
US10007865B1 2017-10-16 2018-06-26 StradVision, Inc. — Learning method and learning device for adjusting parameters of CNN by using multi-scale feature maps and testing method and testing device using the same

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
WO2019040168A1 2017-08-25 2019-02-28 Microsoft Technology Licensing, LLC — Object detection based on deep neural network
CN108764336A 2018-05-28 2018-11-06 Beijing Moshanghua Technology Co., Ltd. — Deep learning method and device for image recognition, client, and server
CN109241880A 2018-08-22 2019-01-18 Beijing Megvii Technology Co., Ltd. — Image processing method, image processing apparatus, and computer-readable storage medium

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN114723922A 2022-02-24 2022-07-08 Beijing DP Technology Co., Ltd. — Method and device for comparative presentation of three-dimensional structure data based on data dimensionality reduction
CN114723922B 2022-02-24 2023-04-18 Beijing DP Technology Co., Ltd. — Method and device for comparative presentation of three-dimensional structure data based on data dimensionality reduction

Also Published As

Publication number Publication date
CN111656359A (zh) 2020-09-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application — Ref document number: 19929466; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase — Ref country code: DE
122 Ep: pct application non-entry in european phase — Ref document number: 19929466; Country of ref document: EP; Kind code of ref document: A1