WO2021139156A1 - Convolution calculation method and related device - Google Patents

Convolution calculation method and related device (卷积计算方法及相关设备)

Info

Publication number
WO2021139156A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
row
network layer
input data
matrix
Application number
PCT/CN2020/109062
Other languages
English (en)
French (fr)
Inventor
曹庆新 (Cao Qingxin)
李炜 (Li Wei)
Original Assignee
深圳云天励飞技术股份有限公司 (Shenzhen Intellifusion Technologies Co., Ltd.)
Application filed by 深圳云天励飞技术股份有限公司 (Shenzhen Intellifusion Technologies Co., Ltd.)
Priority to US 17/623,605 (US11551438B2)
Publication of WO2021139156A1

Classifications

    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045: Combinations of networks
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
    • G06N 3/08: Learning methods
    • G06V 20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V 20/625: License plates
    • G06V 40/168: Feature extraction; Face representation
    • G06V 40/172: Classification, e.g. identification

Definitions

  • This application relates to the technical field of neural networks, and in particular to an image analysis method based on a convolutional neural network and related equipment.
  • The convolutional neural network model includes multiple network layers, and different network layers correspond to different convolution step sizes.
  • The output matrix of a network layer (representing multiple features included in the image) is obtained by the neural network processor by performing multiple convolution calculations on the input matrix of the network layer (obtained based on the input image) and the convolution kernel. Each convolution calculation proceeds as follows: first, the neural network processor selects an operation matrix from the input matrix of the network layer according to the convolution step size; then, the neural network processor performs the convolution calculation on the operation matrix and the convolution kernel.
  • This convolution calculation method leads to low calculation efficiency of the neural network processor under different convolution step sizes, which indirectly reduces the efficiency of image analysis.
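  • For illustration, a minimal sketch (not from the patent) of this baseline method in Python/NumPy, assuming a square F×F kernel and the same stride s in both directions:

```python
import numpy as np

def baseline_conv(x: np.ndarray, k: np.ndarray, s: int) -> np.ndarray:
    # Prior method sketched above: for every output position, select an
    # FxF operation matrix from the input according to the stride, then
    # convolve it with the kernel.
    f = k.shape[0]
    rows = (x.shape[0] - f) // s + 1
    cols = (x.shape[1] - f) // s + 1
    out = np.empty((rows, cols))
    for m in range(rows):
        for n in range(cols):
            op = x[m * s:m * s + f, n * s:n * s + f]  # operation matrix
            out[m, n] = np.sum(op * k)
    return out
```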
  • the embodiments of the present application provide an image analysis method and related equipment based on a convolutional neural network, which are used to improve the calculation efficiency of a neural network processor under different convolution step sizes, thereby indirectly improving the efficiency of image analysis.
  • an embodiment of the present application provides an image analysis method based on a convolutional neural network, which is applied to a neural network processor, and includes:
  • obtaining the input matrix of network layer A, where network layer A is one of the multiple network layers included in the convolutional neural network model, and the input matrix of network layer A is obtained based on the target category image;
  • obtaining the target convolution kernel and the target convolution step size corresponding to network layer A, where different network layers correspond to different convolution step sizes;
  • performing the convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain the output matrix of network layer A, where the target convolution step size is used to filter, from the input matrix of network layer A, the multiple rows of input data required for the convolution calculation, and the output matrix of network layer A is used to represent multiple features included in the target category image;
  • determining the target preset operation corresponding to the target category image according to the mapping relationship between prestored category images and preset operations;
  • performing the target preset operation according to the multiple features included in the target category image.
  • In one possible example, the target category image is a face image, and the multiple features included in the target category image are multiple face features.
  • In one possible example, performing the target preset operation according to the output matrix of network layer A includes: judging whether the target face feature set composed of the multiple face features matches a face feature library; if the target face feature set belongs to the face feature library, determining the target person information corresponding to the target face feature set according to the mapping relationship between prestored face feature sets and person information; and performing an output operation on the target person information.
  • In one possible example, the target category image is a license plate image, and the multiple features included in the target category image are the target license plate number.
  • In one possible example, performing the target preset operation according to the output matrix of network layer A includes: judging whether the target license plate number matches a license plate number library; if the target license plate number belongs to the license plate number library, determining the target license plate registration information corresponding to the target license plate number according to the mapping relationship between prestored license plate numbers and vehicle registration information; and performing an output operation on the target license plate registration information.
  • In one possible example, the target convolution step size is S1×S2, the size of the input matrix of network layer A is R1×R2, and the size of the target convolution kernel is F×F. Performing the convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain the output matrix of network layer A includes: obtaining the input data of the i-th row to the (i+F-1)-th row of the input matrix of network layer A, where i is any one of 1 to (R1-F+1); performing the convolution calculation on the input data of the i-th row to the (i+F-1)-th row and the target convolution kernel to obtain the output data of the i-th row of the output matrix of network layer A; and obtaining the output matrix of network layer A according to the (R1-F+1) rows of output data, where the (R1-F+1) rows of output data include the output data of the i-th row.
  • In one possible example, the target convolution step size is S3×S4, the size of the input matrix of network layer A is R3×R4, and the size of the target convolution kernel is F×F. Performing the convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain the output matrix of network layer A includes: obtaining the input data of the (2j-1)-th row to the (2j+1)-th row of the input matrix of network layer A, where j is any one of 1 to [(R3-F)/S4+1]; filtering the input data of the (2j-1)-th row to the (2j+1)-th row according to the target convolution step size; performing the convolution calculation on the filtered input data of the (2j-1)-th row to the (2j+1)-th row and the target convolution kernel to obtain the output data of the j-th row of the output matrix of network layer A; and obtaining the output matrix of network layer A according to the [(R3-F)/S4+1] rows of output data, where the [(R3-F)/S4+1] rows of output data include the output data of the j-th row.
  • an embodiment of the present application provides an image analysis device based on a convolutional neural network, which is applied to a neural network processor, and includes:
  • the first obtaining unit is used to obtain the input matrix of the network layer A, the network layer A is one of the multiple network layers included in the convolutional neural network model, and the input matrix of the network layer A is obtained based on the target category image;
  • the second obtaining unit is used to obtain the target convolution kernel and the target convolution step size corresponding to network layer A, where different network layers correspond to different convolution step sizes;
  • the calculation unit is used to perform the convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain the output matrix of network layer A, where the target convolution step size is used to filter, from the input matrix of network layer A, the multiple rows of input data required for the convolution calculation, and the output matrix of network layer A is used to represent multiple features included in the target category image;
  • the determining unit is configured to determine the target preset operation corresponding to the target category image according to the mapping relationship between prestored category images and preset operations;
  • the execution unit is configured to execute the target preset operation according to the multiple features included in the target category image.
  • an embodiment of the present application provides a neural network processor, and the foregoing neural network processor is used to implement part or all of the steps of the method in the first aspect of the embodiment of the present application.
  • an embodiment of the present application provides a neural network processor, and the foregoing neural network processor includes the image analysis device of the second aspect of the embodiments of the present application.
  • embodiments of the present application provide an electronic device, including a processor, a memory, a communication interface, and one or more programs.
  • the one or more programs are stored in the memory and configured to be executed by the processor.
  • the above-mentioned program includes instructions for executing part or all of the steps in the method of the first aspect of the embodiments of the present application.
  • an embodiment of the present application provides a computer-readable storage medium.
  • the foregoing computer-readable storage medium is used to store a computer program, and the foregoing computer program is executed by a processor to implement some or all of the steps described in the method of the first aspect of the embodiments of the present application.
  • an embodiment of the present application provides a computer program product.
  • the foregoing computer program product includes a non-transitory computer-readable storage medium storing a computer program.
  • the foregoing computer program is operable to cause a computer to execute some or all of the steps described in the method of the first aspect of the embodiments of the present application.
  • the computer program product may be a software installation package.
  • It can be seen that, in the existing solution, the neural network processor needs to select an operation matrix from the input matrix of the network layer according to the convolution step size and then perform the convolution calculation on the operation matrix and the convolution kernel. In the embodiments of the present application, the convolution step size is used to filter, from the input matrix of network layer A (obtained based on the target category image), the multiple rows of input data required for the convolution calculation, and the neural network processor performs the convolution calculation on the required multiple rows of input data and the convolution kernel to obtain the output matrix of network layer A (used to represent the multiple features included in the target category image). This helps to improve the computational efficiency of the neural network processor under different convolution step sizes. Since the time to obtain the multiple features included in the target category image is greatly shortened, the target preset operation corresponding to the target category image can be performed faster based on these features, thereby indirectly improving the efficiency of image analysis.
  • FIG. 1 is a schematic diagram of the architecture of an image analysis system based on a convolutional neural network provided by an embodiment of the present application;
  • FIG. 2A is a schematic flowchart of an image analysis method based on a convolutional neural network provided by an embodiment of the present application;
  • FIG. 2B is a schematic diagram of filling provided by an embodiment of the present application;
  • FIG. 2C is a schematic diagram of determining the P1 processing elements required to perform the convolution calculation on the first input matrix of network layer A provided by an embodiment of the present application;
  • FIG. 2D is a schematic diagram of multiple processing elements provided by an embodiment of the present application;
  • FIG. 2E is a schematic diagram of determining the output data of the i-th row of the output matrix of network layer A provided by an embodiment of the present application;
  • FIG. 2F is a schematic diagram of determining the output matrix of network layer A provided by an embodiment of the present application;
  • FIG. 2G is a schematic diagram of filtering the input data of row 1 to row 3 provided by an embodiment of the present application;
  • FIG. 2H is another schematic diagram of determining the output matrix of network layer A provided by an embodiment of the present application;
  • FIG. 3 is a schematic flowchart of another image analysis method based on convolutional neural network provided by an embodiment of the present application.
  • FIG. 4 is a block diagram of functional units of a convolutional neural network-based image analysis device provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 1 is a schematic diagram of the architecture of an image analysis system based on a convolutional neural network provided by an embodiment of the present application.
  • the image analysis system based on a convolutional neural network includes a neural network processor, wherein:
  • the neural network processor is used to obtain the input matrix of network layer A, where network layer A is one of the multiple network layers included in the convolutional neural network model, and the input matrix of network layer A is obtained based on the target category image;
  • the neural network processor is also used to obtain the target convolution kernel and target convolution step size corresponding to network layer A, and different network layers correspond to different convolution step sizes;
  • the neural network processor is also used to perform the convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain the output matrix of network layer A, where the target convolution step size is used to filter, from the input matrix of network layer A, the multiple rows of input data required for the convolution calculation, and the output matrix of network layer A is used to represent multiple features included in the target category image;
  • the neural network processor is further configured to determine the target preset operation corresponding to the target category image according to the mapping relationship between prestored category images and preset operations;
  • the neural network processor is also used to perform the target preset operation according to the multiple features included in the target category image.
  • FIG. 2A is a schematic flowchart of a convolutional neural network-based image analysis method provided by an embodiment of the present application, which is applied to a neural network processor.
  • the image analysis method based on a convolutional neural network includes steps 201-205, as follows:
  • 201: The neural network processor obtains the input matrix of network layer A, where network layer A is one of the multiple network layers included in the convolutional neural network model, and the input matrix of network layer A is obtained based on the target category image.
  • Among them, the multiple network layers included in the convolutional neural network model include an input layer, a convolutional layer, a pooling layer, a fully connected layer, and an output layer.
  • The input matrix of network layer A may be obtained based on a face image or a license plate image, which is not limited here; the face image or the license plate image is collected by a camera.
  • 202: The neural network processor obtains the target convolution kernel and the target convolution step size corresponding to network layer A, where different network layers correspond to different convolution step sizes.
  • the neural network processor obtains the target convolution kernel and target convolution step size corresponding to network layer A, including:
  • the neural network processor obtains the target convolution kernel corresponding to the network layer A according to the mapping relationship between the network layer and the convolution kernel;
  • the neural network processor obtains the target convolution step size corresponding to the network layer A according to the mapping relationship between the network layer and the convolution step size.
  • mapping relationship between the network layer and the convolution kernel is stored in the neural network processor in advance, and the mapping relationship between the network layer and the convolution kernel is shown in Table 1 below:
  • Table 1:
        Network layer            Convolution kernel
        Input layer              First convolution kernel
        Convolutional layer      Second convolution kernel
        Pooling layer            Third convolution kernel
        Fully connected layer    Fourth convolution kernel
        Output layer             Fifth convolution kernel
  • mapping relationship between the network layer and the convolution step size is stored in the neural network processor in advance, and the mapping relationship between the network layer and the convolution step size is shown in Table 2 below:
  • In one possible example, the neural network processor can obtain the target convolution step size corresponding to network layer A by sending, to the central processing unit, a convolution step size acquisition request carrying network layer A, where the convolution step size acquisition request is used to instruct the central processing unit to feed back the convolution step size corresponding to network layer A.
  • 203: The neural network processor performs the convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain the output matrix of network layer A, where the target convolution step size is used to filter, from the input matrix of network layer A, the multiple rows of input data required for the convolution calculation, and the output matrix of network layer A is used to represent multiple features included in the target category image.
  • In one possible example, the neural network processor includes P2 processing elements, each of the P2 processing elements includes Q multiplication and accumulation units, and P2 and Q are both integers greater than 1. Before the neural network processor performs the convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain the output matrix of network layer A, the method also includes:
  • the neural network processor fills the input matrix of network layer A according to the target convolution kernel and the target convolution step size to obtain the first input matrix of network layer A;
  • the neural network processor determines, according to the first input matrix of network layer A, P2 and Q, the P1 processing elements required to perform the convolution calculation on the first input matrix of network layer A.
  • In one possible example, the size of the target convolution kernel is F×F and the target convolution step size is S5×S6. The neural network processor filling the input matrix of network layer A according to the target convolution kernel and the target convolution step size to obtain the first input matrix of network layer A includes:
  • the neural network processor obtains the size R5×R6 of the input matrix of network layer A;
  • the neural network processor calculates (R5-F)/S6 to obtain the first remainder, and determines the row filling data corresponding to the input matrix of the network layer A according to the first remainder and S6;
  • the neural network processor calculates (R6-F)/S5 to obtain the second remainder, and determines the column filling data corresponding to the input matrix of the network layer A according to the second remainder and S5;
  • the neural network processor performs a filling operation on the input matrix of the network layer A according to the row filling data and the column filling data to obtain the first input matrix of the network layer A.
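  • For illustration, a minimal sketch (not from the patent) of the two remainder computations above, assuming they are simply the remainders of the stated divisions:

```python
def padding_remainders(r5: int, r6: int, f: int, s5: int, s6: int):
    # First remainder: remainder of (R5 - F) / S6, drives the row filling.
    # Second remainder: remainder of (R6 - F) / S5, drives the column filling.
    first_remainder = (r5 - f) % s6
    second_remainder = (r6 - f) % s5
    return first_remainder, second_remainder

# FIG. 2B parameters: 8x8 input, 3x3 kernel, 2x2 stride.
assert padding_remainders(8, 8, 3, 2, 2) == (1, 1)
```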
  • the target convolution step size includes the horizontal convolution step size and the vertical convolution step size.
  • the horizontal convolution step size is S5 and the vertical convolution step size is S6.
  • the input matrix of network layer A includes R5 rows of input data and R6 columns of input data.
  • The implementation manner in which the neural network processor determines the row filling data corresponding to the input matrix of network layer A according to the first remainder and S6 may be:
  • the neural network processor determines that the row filling data corresponding to the input matrix of network layer A are the 0th row input data and the (R5+1)-th row input data;
  • or, the neural network processor determines that the row filling data corresponding to the input matrix of network layer A are the [-(S6+1)/2+2]-th row input data to the 0th row input data and the (R5+1)-th row input data to the [R5+(S6+1)/2]-th row input data;
  • or, the neural network processor determines that the row filling data corresponding to the input matrix of network layer A are the (-S6/2+1)-th row input data to the 0th row input data and the (R5+1)-th row input data to the (R5+S6/2)-th row input data;
  • or, the neural network processor determines that the row filling data corresponding to the input matrix of network layer A are the 0th row input data, the (R5+1)-th row input data, and the (R5+2)-th row input data;
  • or, the neural network processor determines that the row filling data corresponding to the input matrix of network layer A are the [-(T1+1)/2+1]-th row input data to the 0th row input data and the (R5+1)-th row input data to the [R5+(T1+1)/2]-th row input data;
  • or, the neural network processor determines that the row filling data corresponding to the input matrix of network layer A are the [-(T1+2)/2+2]-th row input data to the 0th row input data and the (R5+1)-th row input data to the [R5+(T1+2)/2]-th row input data;
  • or, the neural network processor determines that the row filling data corresponding to the input matrix of network layer A are the [-(T1+1)/2+2]-th row input data to the 0th row input data and the (R5+1)-th row input data to the [R5+(T1+1)/2]-th row input data;
  • or, the neural network processor determines that the row filling data corresponding to the input matrix of network layer A are the (-T1/2+1)-th row input data to the 0th row input data and the (R5+1)-th row input data to the (R5+T1/2)-th row input data.
  • The implementation manner in which the neural network processor determines the column filling data corresponding to the input matrix of network layer A according to the second remainder and S5 may be:
  • the neural network processor determines that the column filling data corresponding to the input matrix of network layer A are the 0th column input data and the (R6+1)-th column input data;
  • or, the neural network processor determines that the column filling data corresponding to the input matrix of network layer A are the [-(S5+1)/2+2]-th column input data to the 0th column input data and the (R6+1)-th column input data to the [R6+(S5+1)/2]-th column input data;
  • or, the neural network processor determines that the column filling data corresponding to the input matrix of network layer A are the (-S5/2+1)-th column input data to the 0th column input data and the (R6+1)-th column input data to the (R6+S5/2)-th column input data;
  • or, the neural network processor determines that the column filling data corresponding to the input matrix of network layer A are the 0th column input data, the (R6+1)-th column input data, and the (R6+2)-th column input data;
  • or, the neural network processor determines that the column filling data corresponding to the input matrix of network layer A are the [-(S5+1)/2+1]-th column input data to the 0th column input data and the (R6+1)-th column input data to the [R6+(S5+1)/2]-th column input data;
  • or, the neural network processor determines that the column filling data corresponding to the input matrix of network layer A are the [-(S5+2)/2+2]-th column input data to the 0th column input data and the (R6+1)-th column input data to the [R6+(S5+2)/2]-th column input data;
  • or, the neural network processor determines that the column filling data corresponding to the input matrix of network layer A are the [-(T2+1)/2+2]-th column input data to the 0th column input data and the (R6+1)-th column input data to the [R6+(T2+1)/2]-th column input data;
  • or, the neural network processor determines that the column filling data corresponding to the input matrix of network layer A are the (-T2/2+1)-th column input data to the 0th column input data and the (R6+1)-th column input data to the (R6+T2/2)-th column input data.
  • As shown in FIG. 2B, which is a schematic diagram of filling provided by an embodiment of the present application, the size of the input matrix of network layer A is 8×8, the size of the target convolution kernel is 3×3, and the target convolution step size is 2×2. The neural network processor determines that the row filling data corresponding to the input matrix of network layer A are the 0th row input data, the 9th row input data, and the 10th row input data, and that the column filling data corresponding to the input matrix of network layer A are the 0th column input data, the 9th column input data, and the 10th column input data. The neural network processor fills the input matrix of network layer A according to the row filling data and the column filling data corresponding to the input matrix of network layer A to obtain the first input matrix of network layer A.
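  • A minimal sketch of the FIG. 2B example, assuming the filled rows and columns contain zeros (the patent does not state the fill values):

```python
import numpy as np

x = np.arange(1, 65).reshape(8, 8)      # stand-in for the 8x8 input matrix of layer A
# Row filling: row 0 before and rows 9-10 after; column filling likewise.
x_padded = np.pad(x, ((1, 2), (1, 2)))  # 1 row/column before, 2 after, on both axes
assert x_padded.shape == (11, 11)       # the first input matrix of network layer A
# With a 3x3 kernel and a 2x2 stride, this yields (11 - 3) / 2 + 1 = 5 output rows.
```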
  • In one possible example, the neural network processor determining, according to the first input matrix of network layer A, P2 and Q, the P1 processing elements required to perform the convolution calculation on the first input matrix of network layer A includes:
  • the neural network processor obtains the size R7×R8 of the first input matrix of network layer A;
  • the neural network processor calculates R8/Q to obtain a quotient and a third remainder;
  • if the third remainder is 0, the neural network processor determines the quotient as P1, the number of processing elements required to perform the convolution calculation on the first input matrix of network layer A;
  • if the third remainder is not 0, the neural network processor determines the quotient plus 1 as P1, the number of processing elements required to perform the convolution calculation on the first input matrix of network layer A.
  • As shown in FIG. 2C, which is a schematic diagram of determining the P1 processing elements required to perform the convolution calculation on the first input matrix of network layer A provided by an embodiment of the present application, the neural network processor determines that 10 processing elements are required to perform the convolution calculation on the first input matrix of network layer A, and each of the 10 processing elements includes 14 multiplication and accumulation units.
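  • A minimal sketch of this round-up rule; the FIG. 2C numbers are consistent with a first input matrix that is 140 columns wide, which is an assumption since R8 is not stated:

```python
def processing_elements_needed(r8: int, q: int) -> int:
    # P1 = R8 / Q rounded up: Q multiply-accumulate units per processing
    # element must cover all R8 columns of the first input matrix.
    quotient, remainder = divmod(r8, q)
    return quotient if remainder == 0 else quotient + 1

assert processing_elements_needed(140, 14) == 10  # matches the FIG. 2C example
```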
  • In one possible example, the target convolution step size is S1×S2, the size of the input matrix of network layer A is R1×R2, and the size of the target convolution kernel is F×F. The neural network processor performing the convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain the output matrix of network layer A includes:
  • the neural network processor obtains the input data of the i-th row to the (i+F-1)-th row of the input matrix of network layer A, where i is any one of 1 to (R1-F+1);
  • the neural network processor performs the convolution calculation on the input data of the i-th row to the (i+F-1)-th row and the target convolution kernel to obtain the output data of the i-th row of the output matrix of network layer A;
  • the neural network processor obtains the output matrix of network layer A according to the (R1-F+1) rows of output data, where the (R1-F+1) rows of output data include the output data of the i-th row.
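  • For illustration, a minimal sketch (not from the patent) of this row-wise organization, assuming unit stride in both directions; the S1 > 1 case is handled by the filtering described below:

```python
import numpy as np

def conv_by_rows(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    # Output row i is produced from input rows i .. i+F-1 and the kernel
    # (0-based here; the text counts rows from 1).
    r1, r2 = x.shape
    f = k.shape[0]
    out = np.empty((r1 - f + 1, r2 - f + 1))
    for i in range(r1 - f + 1):
        rows = x[i:i + f, :]             # input rows i .. i+F-1
        for n in range(r2 - f + 1):
            out[i, n] = np.sum(rows[:, n:n + f] * k)
    return out
```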
  • In one possible example, when F is 3, the nine element values included in the target convolution kernel are (c, b, a, f, e, d, i, h, g). The neural network processor uses the P1 processing elements to perform the convolution calculation operation on the input data of the i-th row to the (i+F-1)-th row and the target convolution kernel to obtain the output data of the i-th row of the output matrix of network layer A. The implementation manner may be:
  • the neural network processor uses the P1 processing elements to multiply the i-th row input data by b to obtain R2 first intermediate values;
  • the neural network processor uses the P1 processing elements to shift the i-th row input data to the left, and multiplies the left-shifted i-th row input data by a to obtain R2 second intermediate values, and the R2 second intermediate values are respectively accumulated with the R2 first intermediate values to obtain R2 third intermediate values;
  • the neural network processor uses the P1 processing elements to shift the i-th row input data to the right, and multiplies the right-shifted i-th row input data by c to obtain R2 fourth intermediate values, and the R2 fourth intermediate values are respectively accumulated with the R2 third intermediate values to obtain R2 fifth intermediate values;
  • the neural network processor uses the P1 processing elements to multiply the (i+1)-th row input data by e to obtain R2 sixth intermediate values, and the R2 sixth intermediate values are respectively accumulated with the R2 fifth intermediate values to obtain R2 seventh intermediate values;
  • the neural network processor uses the P1 processing elements to shift the (i+1)-th row input data to the left, and multiplies the left-shifted (i+1)-th row input data by d to obtain R2 eighth intermediate values, and the R2 eighth intermediate values are respectively accumulated with the R2 seventh intermediate values to obtain R2 ninth intermediate values;
  • the neural network processor uses the P1 processing elements to shift the (i+1)-th row input data to the right, and multiplies the right-shifted (i+1)-th row input data by f to obtain R2 tenth intermediate values, and the R2 tenth intermediate values are respectively accumulated with the R2 ninth intermediate values to obtain R2 eleventh intermediate values;
  • the neural network processor uses the P1 processing elements to multiply the (i+F-1)-th row input data by h to obtain R2 twelfth intermediate values, and the R2 twelfth intermediate values are respectively accumulated with the R2 eleventh intermediate values to obtain R2 thirteenth intermediate values;
  • the neural network processor uses the P1 processing elements to shift the (i+F-1)-th row input data to the left, and multiplies the left-shifted (i+F-1)-th row input data by g to obtain R2 fourteenth intermediate values, and the R2 fourteenth intermediate values are respectively accumulated with the R2 thirteenth intermediate values to obtain R2 fifteenth intermediate values;
  • the neural network processor uses the P1 processing elements to shift the (i+F-1)-th row input data to the right, and multiplies the right-shifted (i+F-1)-th row input data by i to obtain R2 sixteenth intermediate values, and the R2 sixteenth intermediate values are respectively accumulated with the R2 fifteenth intermediate values to obtain R2 seventeenth intermediate values.
  • Among them, at least one multiplication and accumulation unit included in each of the P1 processing elements performs operations in parallel.
  • Among them, the left shift of each row of input data from the i-th row to the (i+F-1)-th row is realized by a left shift program, and the right shift of each row of input data from the i-th row to the (i+F-1)-th row is realized by a right shift program; the left shift program and the right shift program are stored in the neural network processor in advance.
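  • For illustration, a minimal simulation (not from the patent) of this shift-and-accumulate scheme for F = 3, with the nine kernel element values laid out as (c, b, a, f, e, d, i, h, g); zeros stand in for whatever the padding bus supplies at the boundaries, which is an assumption:

```python
import numpy as np

def shift(v: np.ndarray, direction: str) -> np.ndarray:
    # Models the left/right shift programs: each MAC takes its neighbour's
    # value; the boundary value comes from the padding bus (zero here).
    if direction == "left":
        return np.append(v[1:], 0.0)
    return np.insert(v[:-1], 0, 0.0)

def row_output(r_i, r_i1, r_i2, c, b, a, f, e, d, i, h, g):
    acc = b * r_i                        # R2 first intermediate values
    acc += a * shift(r_i, "left")        # ... third intermediate values
    acc += c * shift(r_i, "right")       # ... fifth
    acc += e * r_i1                      # ... seventh
    acc += d * shift(r_i1, "left")       # ... ninth
    acc += f * shift(r_i1, "right")      # ... eleventh
    acc += h * r_i2                      # ... thirteenth
    acc += g * shift(r_i2, "left")       # ... fifteenth
    acc += i * shift(r_i2, "right")      # R2 seventeenth intermediate values
    return acc
```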
  • As shown in FIG. 2D, which is a schematic diagram of multiple processing elements provided by an embodiment of the present application, the multiple processing elements include one High PE, multiple Middle PEs, and one Low PE. Within each PE, the left MAC is the high-order MAC and the right MAC is the low-order MAC. A processing element is abbreviated PE (Processing Element), and a multiplication and accumulation unit is abbreviated MAC (Multiply Accumulate unit).
  • For the High PE: when shifting to the left, a high-order MAC obtains data from the MAC on its right within the PE, and the lowest-order MAC obtains data from the high-order MAC of the PE on the right; when shifting to the right, the highest-order MAC obtains data from the padding bus, and a low-order MAC obtains data from the MAC on its left within the PE. The padding bus is used for data transmission between the padding data and the processing elements.
  • For the Low PE: when shifting to the left, a high-order MAC obtains data from the MAC on its right within the PE, and the lowest-order MAC obtains data from the padding bus; when shifting to the right, the highest-order MAC obtains data from the low-order MAC of the PE on the left, and a low-order MAC obtains data from the MAC on its left within the PE.
  • For the Middle PEs: when shifting to the left, a high-order MAC obtains data from the MAC on its right within the PE, and the lowest-order MAC obtains data from the high-order MAC of the PE on the right; when shifting to the right, the highest-order MAC obtains data from the low-order MAC of the PE on the left, and a low-order MAC obtains data from the MAC on its left within the PE.
  • The P1 processing elements included in a processing element group process one row of input data in parallel, and data between adjacent processing elements can be shifted to the left or to the right. In contrast, when the existing convolution calculation method is used, each time the processing element group performs the convolution calculation on the input matrix and the convolution kernel matrix, data cannot be moved between adjacent processing elements.
  • When F is greater than 3, the implementation manner in which the neural network processor uses the P1 processing elements to perform the convolution calculation on the input data of the i-th row to the (i+F-1)-th row and the target convolution kernel to obtain the output data of the i-th row of the output matrix of network layer A refers to the implementation manner when F is 3, and will not be described here.
  • For example, the nine element values included in the target convolution kernel are (c, b, a, f, e, d, i, h, g); the neural network processor uses 7 multiplication and accumulation units to perform the convolution calculation on the input data of row 1 to row 3 of the input matrix of network layer A and the target convolution kernel, and the output data of the first row of the output matrix of network layer A is obtained as (U1, U2, U3, U4, U5, U6, U7).
  • It can be seen that the neural network processor uses the P1 processing elements to perform the convolution calculation on the input data of the i-th row to the (i+F-1)-th row of the input matrix of network layer A and the target convolution kernel to obtain the output data of the i-th row of the output matrix of network layer A. Since the P1 processing elements operate in parallel, this helps to improve the computational efficiency of the neural network processor.
  • In one possible example, the target convolution kernel also includes a bias value, and the neural network processor obtaining the output matrix of network layer A according to the (R1-F+1) rows of output data may be implemented as follows: the neural network processor determines the to-be-output matrix of network layer A according to the (R1-F+1) rows of output data; the neural network processor determines the sum of the element value in the m-th row and n-th column of the to-be-output matrix of network layer A and the bias value as the element value in the m-th row and n-th column of the output matrix of network layer A; and the neural network processor performs the same operation on the [(R1-F+1)×(R2-F+1)-1] element values of the to-be-output matrix of network layer A other than the element value in the m-th row and n-th column to obtain the [(R1-F+1)×(R2-F+1)] element values of the output matrix of network layer A.
  • For example, the 3 element values included in the output data of the first row of the to-be-output matrix of network layer A are (2, 4, 3), the 3 element values included in the output data of the second row are (5, 7, 8), the 3 element values included in the output data of the third row are (9, 1, 6), and the bias value is 1. The neural network processor determines that the 9 element values included in the output matrix of network layer A are (3, 5, 4, 6, 8, 9, 10, 2, 7).
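  • The worked example above as a short sketch:

```python
import numpy as np

to_be_output = np.array([[2, 4, 3],
                         [5, 7, 8],
                         [9, 1, 6]])
bias = 1
output = to_be_output + bias             # add the bias to every element value
assert output.ravel().tolist() == [3, 5, 4, 6, 8, 9, 10, 2, 7]
```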
  • In one possible example, when the output data of the i-th row of the output matrix of network layer A is (U1, U2, U3, U4, U5, U6, U7, U8, U9) and S1 is 3, the filtered output data of the i-th row of the output matrix of network layer A is (U1, U4, U7); when the output data of the i-th row of the output matrix of network layer A is (U1, U2, U3, U4, U5, U6, U7, U8, U9, U10, U11, U12, U13, U14, U15) and S1 is 5, the filtered output data of the i-th row is (U1, U6, U11); and when the output data of the i-th row of the output matrix of network layer A is (U1, U2, U3, U4, U5, U6, U7, U8, U9, U10, U11, U12, U13, U14, U15, U16, U17, U18, U19, U20, U21) and S1 is 7, the filtered output data of the i-th row is (U1, U8, U15).
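  • A minimal sketch of this filtering, assuming it simply keeps every S1-th value of a computed output row:

```python
def filter_row(row, s1):
    return row[::s1]                     # keep every S1-th value

assert filter_row([f"U{n}" for n in range(1, 10)], 3) == ["U1", "U4", "U7"]
assert filter_row([f"U{n}" for n in range(1, 16)], 5) == ["U1", "U6", "U11"]
assert filter_row([f"U{n}" for n in range(1, 22)], 7) == ["U1", "U8", "U15"]
```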
  • In one possible example, the target convolution step size is S3×S4, the size of the input matrix of network layer A is R3×R4, and the size of the target convolution kernel is F×F. The neural network processor performing the convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain the output matrix of network layer A includes:
  • the neural network processor obtains the input data of the (2j-1)-th row to the (2j+1)-th row of the input matrix of network layer A, where j is any one of 1 to [(R3-F)/S4+1];
  • the neural network processor filters the input data of the (2j-1)-th row to the (2j+1)-th row according to the target convolution step size;
  • the neural network processor performs the convolution calculation on the filtered input data of the (2j-1)-th row to the (2j+1)-th row and the target convolution kernel to obtain the output data of the j-th row of the output matrix of network layer A;
  • the neural network processor obtains the output matrix of network layer A according to the [(R3-F)/S4+1] rows of output data, where the [(R3-F)/S4+1] rows of output data include the output data of the j-th row.
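  • The row-triple indexing above corresponds to S4 = 2 and F = 3; a short sketch of the output-row count, checked against the padded 11-row matrix of the FIG. 2B example:

```python
def output_row_count(r3: int, f: int, s4: int) -> int:
    return (r3 - f) // s4 + 1            # [(R3 - F) / S4 + 1] output rows

# j = 1..5 reads input-row triples (1-3), (3-5), (5-7), (7-9), (9-11).
assert output_row_count(11, 3, 2) == 5
```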
  • In one possible example, the neural network processor filtering the input data of the (2j-1)-th row to the (2j+1)-th row according to the target convolution step size to obtain the filtered input data of the (2j-1)-th row to the (2j+1)-th row includes:
  • the neural network processor filters the input data of the (2j-1)-th row F times according to S3 to obtain the filtered input data of the (2j-1)-th row, where the filtered input data of the (2j-1)-th row includes F sub-rows of the (2j-1)-th row input data, and the number of data in each sub-row is half of the number of input data in the (2j-1)-th row;
  • the neural network processor filters the input data of the 2j-th row F times according to S3 to obtain the filtered input data of the 2j-th row, where the filtered input data of the 2j-th row includes F sub-rows of the 2j-th row input data, and the number of data in each sub-row is half of the number of input data in the 2j-th row;
  • the neural network processor filters the input data of the (2j+1)-th row F times according to S3 to obtain the filtered input data of the (2j+1)-th row, where the filtered input data of the (2j+1)-th row includes F sub-rows of the (2j+1)-th row input data, and the number of data in each sub-row is half of the number of input data in the (2j+1)-th row.
  • As shown in FIG. 2G, which is a schematic diagram of filtering the input data of row 1 to row 3 provided by an embodiment of the present application, the number of data in each row of input data is 15; 0 and 14 in the input data of the first row are padding data, 16 and 30 in the input data of the second row are padding data, and 32 and 46 in the input data of the third row are padding data. The horizontal convolution step size S3 is 2. The input data of the first row is filtered three times to obtain 3 sub-rows of the first row; the input data of the second row is filtered three times to obtain 3 sub-rows of the second row; and the input data of the third row is filtered three times to obtain 3 sub-rows of the third row.
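  • For illustration, a minimal sketch (not from the patent) of this per-row filtering, assuming each of the F passes keeps every S3-th value starting from a different offset:

```python
def filter_f_times(row, s3, f):
    # One pass per kernel column: pass p keeps values p, p+S3, p+2*S3, ...
    return [row[p::s3] for p in range(f)]

row_1 = list(range(0, 15))               # 15 values; 0 and 14 model padding data
sub_rows = filter_f_times(row_1, s3=2, f=3)
assert len(sub_rows) == 3                 # 3 sub-rows of the first row
assert [len(s) for s in sub_rows] == [8, 7, 7]   # each roughly half of 15
```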
  • In one possible example, when F is 3, the nine element values included in the target convolution kernel are (c, b, a, f, e, d, i, h, g). The neural network processor uses the P1 processing elements to perform the convolution calculation operation on the filtered input data of the (2j-1)-th row to the (2j+1)-th row and the target convolution kernel to obtain the output data of the j-th row of the output matrix of network layer A. The implementation may be:
  • the neural network processor selects R4/S3 first element values to be multiplied from the input data of the (2j-1)-th row according to S3, and uses the P1 processing elements to multiply the R4/S3 first element values to be multiplied by b respectively to obtain R4/S3 eighteenth intermediate values;
  • the neural network processor selects R4/S3 second element values to be multiplied from the input data of the (2j-1)-th row according to S3, and uses the P1 processing elements to multiply the R4/S3 second element values to be multiplied by a respectively to obtain R4/S3 nineteenth intermediate values, and the R4/S3 nineteenth intermediate values are respectively accumulated with the R4/S3 eighteenth intermediate values to obtain R4/S3 twentieth intermediate values;
  • the neural network processor selects R4/S3 third element values to be multiplied from the input data of the (2j-1)-th row according to S3, and uses the P1 processing elements to multiply the R4/S3 third element values to be multiplied by c respectively to obtain R4/S3 twenty-first intermediate values, and the R4/S3 twenty-first intermediate values are respectively accumulated with the R4/S3 twentieth intermediate values to obtain R4/S3 twenty-second intermediate values;
  • the neural network processor selects R4/S3 fourth element values to be multiplied from the input data of the 2j-th row according to S3, and uses the P1 processing elements to multiply the R4/S3 fourth element values to be multiplied by e respectively to obtain R4/S3 twenty-third intermediate values, and the R4/S3 twenty-third intermediate values are respectively accumulated with the R4/S3 twenty-second intermediate values to obtain R4/S3 twenty-fourth intermediate values;
  • the neural network processor selects R4/S3 fifth element values to be multiplied from the input data of the 2j-th row according to S3, and uses the P1 processing elements to multiply the R4/S3 fifth element values to be multiplied by d respectively to obtain R4/S3 twenty-fifth intermediate values, and the R4/S3 twenty-fifth intermediate values are respectively accumulated with the R4/S3 twenty-fourth intermediate values to obtain R4/S3 twenty-sixth intermediate values;
  • the neural network processor selects R4/S3 sixth element values to be multiplied from the input data of the 2j-th row according to S3, and uses the P1 processing elements to multiply the R4/S3 sixth element values to be multiplied by f respectively to obtain R4/S3 twenty-seventh intermediate values, and the R4/S3 twenty-seventh intermediate values are respectively accumulated with the R4/S3 twenty-sixth intermediate values to obtain R4/S3 twenty-eighth intermediate values;
  • the neural network processor selects R4/S3 seventh element values to be multiplied from the input data of the (2j+1)-th row according to S3, and uses the P1 processing elements to multiply the R4/S3 seventh element values to be multiplied by h respectively to obtain R4/S3 twenty-ninth intermediate values, and the R4/S3 twenty-ninth intermediate values are respectively accumulated with the R4/S3 twenty-eighth intermediate values to obtain R4/S3 thirtieth intermediate values;
  • the neural network processor selects R4/S3 eighth element values to be multiplied from the input data of the (2j+1)-th row according to S3, and uses the P1 processing elements to multiply the R4/S3 eighth element values to be multiplied by g respectively to obtain R4/S3 thirty-first intermediate values, and the R4/S3 thirty-first intermediate values are respectively accumulated with the R4/S3 thirtieth intermediate values to obtain R4/S3 thirty-second intermediate values;
  • the neural network processor selects R4/S3 ninth element values to be multiplied from the input data of the (2j+1)-th row according to S3, and uses the P1 processing elements to multiply the R4/S3 ninth element values to be multiplied by i respectively to obtain R4/S3 thirty-third intermediate values, and the R4/S3 thirty-third intermediate values are respectively accumulated with the R4/S3 thirty-second intermediate values to obtain R4/S3 thirty-fourth intermediate values.
  • When F is greater than 3, the implementation manner in which the neural network processor uses the P1 processing element groups to perform the convolution calculation on the input data of the (2j-1)-th row to the (2j+1)-th row and the target convolution kernel to obtain the output data of the j-th row of the output matrix of network layer A refers to the implementation manner when F is 3, and will not be described here.
  • For example, 0 and 14 in the input data of the first row are padding data, 16 and 30 in the input data of the second row are padding data, and 32 and 46 in the input data of the third row are padding data; the nine element values included in the target convolution kernel are (c, b, a, f, e, d, i, h, g). The neural network processor uses 7 multiplication and accumulation units to perform the convolution calculation on the input data of row 1 to row 3 of the input matrix of network layer A and the target convolution kernel to obtain the output data of the first row of the output matrix of network layer A, and the output data of the j-th row of the output matrix of network layer A is (V1, V4, V7).
  • It can be seen that the neural network processor uses the P1 processing elements to perform the convolution calculation on the input data of the (2j-1)-th row to the (2j+1)-th row of the input matrix of network layer A and the target convolution kernel to obtain the output data of the j-th row of the output matrix of network layer A. Since the P1 processing elements operate in parallel, this helps to improve the computational efficiency of the neural network processor.
  • The implementation manner in which the neural network processor determines the output matrix of network layer A according to the [(R3-F)/S4+1] rows of output data and the bias value refers to the implementation manner in which the neural network processor determines the output matrix of network layer A according to the (R1-F+1) rows of output data and the bias value, and will not be described here.
  • 204: The neural network processor determines the target preset operation corresponding to the target category image according to the mapping relationship between prestored category images and preset operations.
  • Among them, category images correspond to preset operations one-to-one; if the category image is a face image, the preset operation is to obtain person information based on the face image; if the category image is a license plate image, the preset operation is to obtain license plate registration information based on the license plate image.
  • 205: The neural network processor performs the target preset operation according to the multiple features included in the target category image.
  • It can be seen that, in the existing solution, the neural network processor needs to select an operation matrix from the input matrix of the network layer according to the convolution step size and then perform the convolution calculation on the operation matrix and the convolution kernel. In the embodiments of the present application, the convolution step size is used to filter, from the input matrix of network layer A (obtained based on the target category image), the multiple rows of input data required for the convolution calculation, and the neural network processor performs the convolution calculation on the required multiple rows of input data and the convolution kernel to obtain the output matrix of network layer A (used to represent the multiple features included in the target category image). This helps to improve the computational efficiency of the neural network processor under different convolution step sizes. Since the time to obtain the multiple features included in the target category image is greatly shortened, the target preset operation corresponding to the target category image can be performed faster based on these features, thereby indirectly improving the efficiency of image analysis.
  • In one possible example, the target category image is a face image, and the multiple features included in the target category image are multiple face features. The neural network processor performing the target preset operation according to the output matrix of network layer A includes:
  • the neural network processor judges whether the target face feature set composed of the multiple face features matches a face feature library;
  • if the target face feature set belongs to the face feature library, the neural network processor determines the target person information corresponding to the target face feature set according to the mapping relationship between prestored face feature sets and person information;
  • the neural network processor performs an output operation on the target person information.
  • Among them, the mapping relationship between face feature sets and person information is stored in the neural network processor in advance, and the mapping relationship between face feature sets and person information is shown in Table 3 below:
  • Table 3:
        Face feature set            Person information
        First face feature set      First person information
        Second face feature set     Second person information
        Third face feature set      Third person information
        ...                         ...
  • Among them, face feature sets correspond to person information one-to-one; if the target face feature set is the first face feature set, the target person information is the first person information.
  • It can be seen that the person information corresponding to the face feature set composed of multiple face features can be determined more quickly, thereby indirectly improving the efficiency of obtaining the corresponding person information based on face image analysis.
  • In one possible example, the target category image is a license plate image, and the multiple features included in the target category image are the target license plate number. The neural network processor performing the target preset operation according to the output matrix of network layer A includes:
  • the neural network processor judges whether the target license plate number matches a license plate number library;
  • if the target license plate number belongs to the license plate number library, the neural network processor determines the target license plate registration information corresponding to the target license plate number according to the mapping relationship between prestored license plate numbers and vehicle registration information;
  • the neural network processor performs an output operation on the target license plate registration information.
  • Among them, the mapping relationship between license plate numbers and vehicle registration information is stored in the neural network processor in advance, and the mapping relationship between license plate numbers and vehicle registration information is shown in Table 4 below:
  • Table 4:
        License plate number          Vehicle registration information
        First license plate number    First vehicle registration information
        Second license plate number   Second vehicle registration information
        Third license plate number    Third vehicle registration information
        ...                           ...
  • Among them, license plate numbers correspond to vehicle registration information one-to-one; if the target license plate number is the first license plate number, the target vehicle registration information is the first vehicle registration information.
  • It can be seen that the vehicle registration information corresponding to the license plate number can be determined more quickly, thereby indirectly improving the efficiency of obtaining the corresponding license plate registration information based on license plate image analysis.
  • FIG. 3 is a schematic flowchart of another image analysis method based on a convolutional neural network provided by an embodiment of the present application, which is applied to a neural network processor.
  • the image analysis method based on convolutional neural network includes steps 301-311, which are specifically as follows:
  • 301: The neural network processor obtains the input matrix of network layer A, where the size of the input matrix of network layer A is R3×R4, network layer A is one of the multiple network layers included in the convolutional neural network model, and the input matrix of network layer A is obtained based on a face image.
  • 302: The neural network processor determines the target convolution kernel corresponding to network layer A according to the mapping relationship between network layers and convolution kernels, where the size of the target convolution kernel is F×F.
  • 303: The neural network processor obtains the target convolution step size corresponding to network layer A according to the mapping relationship between network layers and convolution step sizes, where the target convolution step size is S3×S4, and different network layers correspond to different convolution step sizes.
  • 304: The neural network processor obtains the input data of the (2j-1)-th row to the (2j+1)-th row of the input matrix of network layer A, where j is any one of 1 to [(R3-F)/S4+1].
  • 305: The neural network processor filters the input data of the (2j-1)-th row F times according to S3 to obtain the filtered input data of the (2j-1)-th row, where the filtered input data of the (2j-1)-th row includes F sub-rows of the (2j-1)-th row input data, and the number of data in each sub-row is half of the number of input data in the (2j-1)-th row.
  • the neural network processor filters the 2jth row input data F times according to S3 to obtain the filtered 2jth row input data.
  • the filtered 2jth row input data includes F pieces of 2jth-row sub-input data.
  • the data count of each piece of 2jth-row sub-input data is half the data count of the 2jth row input data.
  • the neural network processor filters the (2j+1)th row input data F times according to S3 to obtain the filtered (2j+1)th row input data; the filtered (2j+1)th row input data includes F pieces of (2j+1)th-row sub-input data, and the data count of each piece is half the data count of the (2j+1)th row input data.
  • the neural network processor performs convolution calculation on the filtered (2j-1)th to (2j+1)th row input data and the target convolution kernel to obtain the jth row output data of the output matrix of network layer A.
  • the neural network processor obtains the output matrix of network layer A according to [(R3-F)/S4+1] rows of output data, and the [(R3-F)/S4+1] rows of output data include the jth row output data.
  • the output matrix of network layer A is used to characterize multiple facial features.
  • the neural network processor determines, according to the mapping relationship between the prestored face feature set and the person information, the target person information corresponding to the target face feature set.
  • the neural network processor performs an output operation on the target character information.
  • FIG. 4 is a block diagram of functional units of a convolutional neural network-based image analysis device provided by an embodiment of the application, which is applied to a neural network processor.
  • the convolutional neural network-based image analysis device 400 includes:
  • the first obtaining unit 401 is configured to obtain the input matrix of the network layer A, the network layer A is one of the multiple network layers included in the convolutional neural network model, and the input matrix of the network layer A is obtained based on the target category image;
  • the second obtaining unit 402 is configured to obtain the target convolution kernel and the target convolution step size corresponding to the network layer A, and different network layers correspond to different convolution step sizes;
  • the calculation unit 403 is configured to perform convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain the output matrix of network layer A.
  • the target convolution step size is used to filter, from the input matrix of network layer A, the multiple rows of input data required for the convolution calculation, and the output matrix of network layer A is used to represent multiple features included in the target category image;
  • the determining unit 404 is configured to determine the target preset operation corresponding to the target category image according to the mapping relationship between the pre-stored category image and the preset operation;
  • the execution unit 405 is configured to execute a target preset operation according to multiple features included in the target category image.
  • in the existing scheme, for each convolution calculation the neural network processor needs to select an operation matrix from the input matrix of the network layer according to the convolution step size and then perform convolution calculation on the operation matrix and the convolution kernel.
  • in this application, by contrast, the convolution step size is used to filter, from the input matrix of network layer A (obtained based on the target type image), the multiple rows of input data required for the convolution calculation.
  • the neural network processor performs convolution calculation on the multiple rows of input data required for the convolution calculation and the convolution kernel to obtain the output matrix of network layer A (used to characterize the multiple features included in the target category image), which helps to improve the computational efficiency of the neural network processor under different convolution step sizes. Since the time to obtain the multiple features included in the target category image is greatly shortened, the target preset operation corresponding to the target category image can be performed faster based on the multiple features included in the target category image, thereby indirectly improving the efficiency of image analysis.
  • the target category image is a face image
  • the multiple features included in the target category image are multiple face features.
  • the execution unit 405 is specifically configured to:
  • determine, according to the mapping relationship between the prestored face feature set and the person information, the target person information corresponding to the target face feature set, where the target face feature set belongs to the face feature database;
  • the target type image is a license plate image
  • the multiple features included in the target type image are the target license plate number.
  • the execution unit 405 is specifically configured to:
  • the target license plate registration information corresponding to the target license plate number is determined according to the mapping relationship between the prestored license plate number and the vehicle registration information;
  • the above-mentioned second obtaining unit 402 is specifically configured to:
  • the target convolution step size corresponding to the network layer A is obtained.
  • the target convolution step size is S1 ⁇ S2
  • the size of the input matrix of network layer A is R1 ⁇ R2
  • the size of the target convolution kernel is F ⁇ F.
  • the input matrix of network layer A and the target convolution kernel are subjected to convolution calculation to obtain the output matrix of network layer A.
  • the calculation unit 403 is specifically used for:
  • the output matrix of network layer A is obtained according to (R1-F+1) rows of output data, and the (R1-F+1) rows of output data include the ith row output data.
  • the target convolution step size is S3 ⁇ S4
  • the size of the input matrix of network layer A is R3 ⁇ R4
  • the size of the target convolution kernel is F ⁇ F.
  • the input matrix of network layer A and the target convolution kernel are subjected to convolution calculation to obtain the output matrix of network layer A.
  • the calculation unit 403 is specifically used for:
  • the output matrix of network layer A is obtained according to [(R3-F)/S4+1] rows of output data, and the [(R3-F)/S4+1] rows of output data include the jth row output data.
  • the input data from the (2j-1)th row to the (2j+1)th row is filtered according to the target convolution step size, and the filtered (2j-1)th to (2j+1)th row input data is obtained;
  • the calculation unit 403 is specifically configured to:
  • the filtered (2j-1)th row input data includes F pieces of (2j-1)th-row sub-input data, and the data count of each piece is half the data count of the (2j-1)th row input data;
  • the filtered 2jth row input data includes F pieces of 2jth-row sub-input data, and the data count of each piece is half the data count of the 2jth row input data;
  • the filtered (2j+1)th row input data includes F pieces of (2j+1)th-row sub-input data, and the data count of each piece is half the data count of the (2j+1)th row input data.
  • FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 500 includes a processor, a memory, a communication interface, and one or more programs; the one or more programs are stored in the memory and are configured to be executed by the processor, and the programs include instructions for executing the following steps:
  • obtain the input matrix of network layer A, where network layer A is one of the multiple network layers included in the convolutional neural network model, and the input matrix of network layer A is obtained based on the target type image;
  • perform convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain the output matrix of network layer A.
  • the target convolution step size is used to filter, from the input matrix of network layer A, the multiple rows of input data required for the convolution calculation, and the output matrix of network layer A is used to represent multiple features included in the target category image;
  • the target preset operation is performed according to the multiple features included in the target category image.
  • in the existing scheme, for each convolution calculation the neural network processor needs to select an operation matrix from the input matrix of the network layer according to the convolution step size and then perform convolution calculation on the operation matrix and the convolution kernel.
  • in this application, by contrast, the convolution step size is used to filter, from the input matrix of network layer A (obtained based on the target type image), the multiple rows of input data required for the convolution calculation.
  • the neural network processor performs convolution calculation on the multiple rows of input data required for the convolution calculation and the convolution kernel to obtain the output matrix of network layer A (used to characterize the multiple features included in the target category image), which helps to improve the computational efficiency of the neural network processor under different convolution step sizes. Since the time to obtain the multiple features included in the target category image is greatly shortened, the target preset operation corresponding to the target category image can be performed faster based on the multiple features included in the target category image, thereby indirectly improving the efficiency of image analysis.
  • the target category image is a face image
  • the multiple features included in the target category image are multiple face features.
  • the target person information corresponding to the target face feature set is determined according to the mapping relationship between the prestored face feature set and the person information, where the target face feature set belongs to the face feature database;
  • the target type image is a license plate image
  • the multiple features included in the target type image are the target license plate number.
  • the target license plate registration information corresponding to the target license plate number is determined according to the mapping relationship between the prestored license plate number and the vehicle registration information;
  • the above program includes instructions specifically for executing the following steps:
  • the target convolution step size corresponding to the network layer A is obtained.
  • the target convolution step size is S1 ⁇ S2
  • the size of the input matrix of network layer A is R1 ⁇ R2
  • the size of the target convolution kernel is F ⁇ F.
  • convolution calculation is performed on the input matrix of network layer A and the target convolution kernel to obtain the output matrix of network layer A.
  • the output matrix of network layer A is obtained according to (R1-F+1) rows of output data, and the (R1-F+1) rows of output data include the ith row output data.
  • the target convolution step size is S3 ⁇ S4
  • the size of the input matrix of network layer A is R3 ⁇ R4
  • the size of the target convolution kernel is F ⁇ F.
  • convolution calculation is performed on the input matrix of network layer A and the target convolution kernel to obtain the output matrix of network layer A.
  • the output matrix of network layer A is obtained according to [(R3-F)/S4+1] rows of output data, and the [(R3-F)/S4+1] rows of output data include the jth row output data.
  • the input data from the (2j-1)th row to the (2j+1)th row is filtered according to the target convolution step size, and the filtered (2j-1)th to (2j+1)th row input data is obtained;
  • the above program includes instructions specifically for executing the following steps:
  • the filtered (2j-1)th row input data includes F pieces of (2j-1)th-row sub-input data, and the data count of each piece is half the data count of the (2j-1)th row input data;
  • the filtered 2jth row input data includes F pieces of 2jth-row sub-input data, and the data count of each piece is half the data count of the 2jth row input data;
  • the filtered (2j+1)th row input data includes F pieces of (2j+1)th-row sub-input data, and the data count of each piece is half the data count of the (2j+1)th row input data.
  • the embodiment of the present application also provides a neural network processor, which is configured to implement part or all of the steps of any method as recorded in the above method embodiment.
  • An embodiment of the present application also provides a neural network processor.
  • the neural network processor includes any convolution calculation device as described in the foregoing device embodiment.
  • the embodiments of the present application also provide a computer-readable storage medium, which is used to store a computer program, and the computer program causes a computer to execute part or all of the steps of any method as described in the above-mentioned method embodiments.
  • the computer includes an electronic device.
  • the embodiments of the present application also provide a computer program product.
  • the above-mentioned computer program product includes a non-transitory computer-readable storage medium storing a computer program.
  • the above-mentioned computer program is operable to cause a computer to execute part or all of the steps of any method described in the above-mentioned method embodiments.
  • the computer program product may be a software installation package, and the above-mentioned computer includes electronic equipment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Neurology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

An image analysis method based on a convolutional neural network and a related device. The method includes: obtaining an input matrix of network layer A, the input matrix of network layer A being obtained based on a target category image; obtaining a target convolution kernel and a target convolution step size corresponding to network layer A, different network layers corresponding to different convolution step sizes; performing convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain an output matrix of network layer A, the output matrix of network layer A being used to characterize multiple features included in the target category image; determining a target preset operation corresponding to the target category image according to a prestored mapping relationship between category images and preset operations; and performing the target preset operation according to the multiple features included in the target category image. The method helps to improve the computational efficiency of a neural network processor under different convolution step sizes, and thereby indirectly improves the efficiency of image analysis.

Description

Convolution calculation method and related device
This application claims priority to Chinese patent application No. 202010015744.6, entitled "Convolution calculation method and related device" and filed with the China Patent Office on January 7, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of neural networks, and in particular to an image analysis method based on a convolutional neural network and a related device.
Background
At present, a neural network processor includes multiple network layers, and different network layers correspond to different convolution step sizes. The output matrix of a network layer (the multiple features included in an image) is obtained by the neural network processor through multiple convolution calculations based on the input matrix of the network layer and a convolution kernel. Each convolution calculation based on the input matrix of the network layer (obtained based on the input image) and the convolution kernel proceeds as follows: first, the neural network processor selects an operation matrix from the input matrix of the network layer according to the convolution step size; then, the neural network processor performs convolution calculation on the operation matrix and the convolution kernel. This way of computing convolutions results in low computational efficiency of the neural network processor under different convolution step sizes, which in turn indirectly reduces the efficiency of image analysis.
Summary
Embodiments of this application provide an image analysis method based on a convolutional neural network and a related device, which are used to improve the computational efficiency of a neural network processor under different convolution step sizes and thereby indirectly improve the efficiency of image analysis.
In a first aspect, an embodiment of this application provides an image analysis method based on a convolutional neural network, applied to a neural network processor, including:
obtaining an input matrix of network layer A, where network layer A is one of multiple network layers included in a convolutional neural network model, and the input matrix of network layer A is obtained based on a target category image;
obtaining a target convolution kernel and a target convolution step size corresponding to network layer A, where different network layers correspond to different convolution step sizes;
performing convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain an output matrix of network layer A, where the target convolution step size is used to filter, from the input matrix of network layer A, the multiple rows of input data required for the convolution calculation, and the output matrix of network layer A is used to characterize multiple features included in the target category image;
determining a target preset operation corresponding to the target category image according to a prestored mapping relationship between category images and preset operations;
performing the target preset operation according to the multiple features included in the target category image.
In a possible example, the target category image is a face image, the multiple features included in the target category image are multiple face features, and performing the target preset operation according to the output matrix of network layer A includes:
judging whether the face feature set composed of the multiple face features matches a face feature database;
if the face feature set composed of the multiple face features matches a target face feature set, determining target person information corresponding to the target face feature set according to a prestored mapping relationship between face feature sets and person information, where the target face feature set belongs to the face feature database;
performing an output operation on the target person information.
In a possible example, the target category image is a license plate image, the multiple features included in the target category image are a target license plate number, and performing the target preset operation according to the output matrix of network layer A includes:
judging whether the target license plate number matches a license plate number database;
if the target license plate number matches the license plate number database, determining target license plate registration information corresponding to the target license plate number according to a prestored mapping relationship between license plate numbers and vehicle registration information;
performing an output operation on the target license plate registration information.
In a possible example, the target convolution step size is S1×S2, the size of the input matrix of network layer A is R1×R2, and the size of the target convolution kernel is F×F; performing convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain the output matrix of network layer A includes:
when S1 and S2 are both 1, obtaining the ith row input data to the (i+F-1)th row input data of the input matrix of network layer A, where i is any one of 1 to (R1-F+1);
performing convolution calculation on the ith row input data to the (i+F-1)th row input data and the target convolution kernel to obtain the ith row output data of the output matrix of network layer A;
obtaining the output matrix of network layer A according to (R1-F+1) rows of output data, where the (R1-F+1) rows of output data include the ith row output data.
In a possible example, the target convolution step size is S3×S4, the size of the input matrix of network layer A is R3×R4, and the size of the target convolution kernel is F×F; performing convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain the output matrix of network layer A includes:
when S3 and S4 are both 2, obtaining the (2j-1)th row input data to the (2j+1)th row input data of the input matrix of network layer A, where j is any one of 1 to [(R3-F)/S4+1];
filtering the (2j-1)th row input data to the (2j+1)th row input data according to the target convolution step size to obtain the filtered (2j-1)th row input data to (2j+1)th row input data;
performing convolution calculation on the filtered (2j-1)th row input data to (2j+1)th row input data and the target convolution kernel to obtain the jth row output data of the output matrix of network layer A;
obtaining the output matrix of network layer A according to [(R3-F)/S4+1] rows of output data, where the [(R3-F)/S4+1] rows of output data include the jth row output data.
In a second aspect, an embodiment of this application provides an image analysis device based on a convolutional neural network, applied to a neural network processor, including:
a first obtaining unit, configured to obtain an input matrix of network layer A, where network layer A is one of multiple network layers included in a convolutional neural network model, and the input matrix of network layer A is obtained based on a target category image;
a second obtaining unit, configured to obtain a target convolution kernel and a target convolution step size corresponding to network layer A, where different network layers correspond to different convolution step sizes;
a calculation unit, configured to perform convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain an output matrix of network layer A, where the target convolution step size is used to filter, from the input matrix of network layer A, the multiple rows of input data required for the convolution calculation, and the output matrix of network layer A is used to characterize multiple features included in the target category image;
a determining unit, configured to determine a target preset operation corresponding to the target category image according to a prestored mapping relationship between category images and preset operations;
an execution unit, configured to perform the target preset operation according to the multiple features included in the target category image.
In a third aspect, an embodiment of this application provides a neural network processor, which is configured to implement part or all of the steps of the method in the first aspect of the embodiments of this application.
In a fourth aspect, an embodiment of this application provides a neural network processor, which includes the convolution calculation device in the second aspect of the embodiments of this application.
In a fifth aspect, an embodiment of this application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for executing part or all of the steps of the method in the first aspect of the embodiments of this application.
In a sixth aspect, an embodiment of this application provides a computer-readable storage medium for storing a computer program, where the computer program is executed by a processor to implement part or all of the steps described in the method of the first aspect of the embodiments of this application.
In a seventh aspect, an embodiment of this application provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute part or all of the steps described in the method of the first aspect of the embodiments of this application. The computer program product may be a software installation package.
It can be seen that, compared with the case where each convolution calculation based on the input matrix of a network layer (obtained based on an input image) and a convolution kernel requires the neural network processor to select an operation matrix from the input matrix of the network layer according to the convolution step size and then perform convolution calculation on the operation matrix and the convolution kernel, in the embodiments of this application, for different convolution step sizes, the convolution step size is used to filter, from the input matrix of network layer A (obtained based on the target category image), the multiple rows of input data required for the convolution calculation, and the neural network processor then performs convolution calculation on those rows of input data and the convolution kernel to obtain the output matrix of network layer A (used to characterize the multiple features included in the target category image). This helps to improve the computational efficiency of the neural network processor under different convolution step sizes. Since the time to obtain the multiple features included in the target category image is greatly shortened, the target preset operation corresponding to the target category image can be performed more quickly based on those features, thereby indirectly improving the efficiency of image analysis.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of this application or the background art more clearly, the drawings required in the embodiments of this application or the background art are described below.
FIG. 1 is a schematic architecture diagram of an image analysis system based on a convolutional neural network provided by an embodiment of this application;
FIG. 2A is a schematic flowchart of an image analysis method based on a convolutional neural network provided by an embodiment of this application;
FIG. 2B is a schematic diagram of padding provided by an embodiment of this application;
FIG. 2C is a schematic diagram of determining the P1 processing elements required for performing convolution calculation on the first input matrix of network layer A, provided by an embodiment of this application;
FIG. 2D is a schematic diagram of multiple processing elements provided by an embodiment of this application;
FIG. 2E is a schematic diagram of determining the ith row output data of the output matrix of network layer A, provided by an embodiment of this application;
FIG. 2F is a schematic diagram of determining the output matrix of network layer A, provided by an embodiment of this application;
FIG. 2G is a schematic diagram of filtering the 1st row input data to the 3rd row input data, provided by an embodiment of this application;
FIG. 2H is another schematic diagram of determining the output matrix of network layer A, provided by an embodiment of this application;
FIG. 3 is a schematic flowchart of another image analysis method based on a convolutional neural network provided by an embodiment of this application;
FIG. 4 is a block diagram of the functional units of an image analysis device based on a convolutional neural network provided by an embodiment of this application;
FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of this application.
Detailed Description
Referring to FIG. 1, FIG. 1 is a schematic architecture diagram of an image analysis system based on a convolutional neural network provided by an embodiment of this application. The image analysis system based on a convolutional neural network includes a neural network processor, where:
the neural network processor is configured to obtain an input matrix of network layer A, where network layer A is one of multiple network layers included in a convolutional neural network model, and the input matrix of network layer A is obtained based on a target category image;
the neural network processor is further configured to obtain a target convolution kernel and a target convolution step size corresponding to network layer A, where different network layers correspond to different convolution step sizes;
the neural network processor is further configured to perform convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain an output matrix of network layer A, where the target convolution step size is used to filter, from the input matrix of network layer A, the multiple rows of input data required for the convolution calculation, and the output matrix of network layer A is used to characterize multiple features included in the target category image;
the neural network processor is further configured to determine a target preset operation corresponding to the target category image according to a prestored mapping relationship between category images and preset operations;
the neural network processor is further configured to perform the target preset operation according to the multiple features included in the target category image.
Referring to FIG. 2A, FIG. 2A is a schematic flowchart of an image analysis method based on a convolutional neural network provided by an embodiment of this application, applied to a neural network processor. The image analysis method based on a convolutional neural network includes steps 201-205, which are specifically as follows:
201. The neural network processor obtains an input matrix of network layer A, where network layer A is one of multiple network layers included in a convolutional neural network model, and the input matrix of network layer A is obtained based on a target category image.
The N network layers include an input layer, a convolutional layer, a pooling layer, a fully connected layer, and an output layer.
The input matrix of network layer A may be obtained based on a face image or based on a license plate image, which is not limited here. The face image or the license plate image is captured by a camera.
202. The neural network processor obtains a target convolution kernel and a target convolution step size corresponding to network layer A, where different network layers correspond to different convolution step sizes.
In a possible example, the neural network processor obtaining the target convolution kernel and the target convolution step size corresponding to network layer A includes:
the neural network processor obtains the target convolution kernel corresponding to network layer A according to a mapping relationship between network layers and convolution kernels;
the neural network processor obtains the target convolution step size corresponding to network layer A according to a mapping relationship between network layers and convolution step sizes.
The mapping relationship between network layers and convolution kernels is prestored in the neural network processor and is shown in Table 1 below:
Table 1
Network layer | Convolution kernel
Input layer | First convolution kernel
Convolutional layer | Second convolution kernel
Pooling layer | Third convolution kernel
Fully connected layer | Fourth convolution kernel
Output layer | Fifth convolution kernel
The mapping relationship between network layers and convolution step sizes is prestored in the neural network processor and is shown in Table 2 below:
Table 2
Network layer | Convolution step size
Input layer | First convolution step size
Convolutional layer | Second convolution step size
Pooling layer | Third convolution step size
Fully connected layer | Fourth convolution step size
Output layer | Fifth convolution step size
The neural network processor may also obtain the target convolution step size corresponding to network layer A by sending, to a central processing unit, a convolution step size acquisition request carrying network layer A, where the request is used to instruct the central processing unit to feed back the convolution step size of network layer A, and by receiving the target convolution step size of network layer A sent by the central processing unit in response to the request.
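For illustration only (this sketch is not part of the original disclosure), the lookups in Tables 1 and 2 can be modeled as two dictionaries keyed by network layer; the names layer_to_kernel, layer_to_stride, and the placeholder kernel values below are our own assumptions:

```python
import numpy as np

# Illustrative stand-ins for Tables 1 and 2: each network layer maps to
# its own convolution kernel and its own convolution step size.
layer_to_kernel = {
    "input_layer": np.ones((3, 3)),  # "first convolution kernel" (placeholder values)
    "conv_layer":  np.eye(3),        # "second convolution kernel" (placeholder values)
}
layer_to_stride = {
    "input_layer": (1, 1),           # "first convolution step size"
    "conv_layer":  (2, 2),           # "second convolution step size"
}

def get_kernel_and_stride(layer_name):
    # Mirrors step 202: look up the kernel and the step size by network layer.
    return layer_to_kernel[layer_name], layer_to_stride[layer_name]

kernel, stride = get_kernel_and_stride("conv_layer")
```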
203. The neural network processor performs convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain an output matrix of network layer A, where the target convolution step size is used to filter, from the input matrix of network layer A, the multiple rows of input data required for the convolution calculation, and the output matrix of network layer A is used to characterize multiple features included in the target category image.
In a possible example, the neural network processor includes P2 processing elements, each of the P2 processing elements includes Q multiply-accumulate units, and P2 and Q are both integers greater than 1. Before the neural network processor performs convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain the output matrix of network layer A, the method further includes:
the neural network processor pads the input matrix of network layer A according to the target convolution kernel and the target convolution step size to obtain a first input matrix of network layer A;
the neural network processor determines, according to the first input matrix of network layer A, P2, and Q, the P1 processing elements required for performing convolution calculation on the first input matrix of network layer A.
In a possible example, the size of the target convolution kernel is F×F and the target convolution step size is S5×S6; the neural network processor padding the input matrix of network layer A according to the target convolution kernel and the target convolution step size to obtain the first input matrix of network layer A includes:
the neural network processor obtains the size R5×R6 of the input matrix of network layer A;
the neural network processor calculates (R5-F)/S6 to obtain a first remainder, and determines the row padding data corresponding to the input matrix of network layer A according to the first remainder and S6;
the neural network processor calculates (R6-F)/S5 to obtain a second remainder, and determines the column padding data corresponding to the input matrix of network layer A according to the second remainder and S5;
the neural network processor performs a padding operation on the input matrix of network layer A according to the row padding data and the column padding data to obtain the first input matrix of network layer A.
The target convolution step size includes a horizontal convolution step size and a vertical convolution step size; the horizontal convolution step size is S5 and the vertical convolution step size is S6.
The input matrix of network layer A includes R5 rows of input data and R6 columns of input data.
Specifically, the neural network processor determining the row padding data corresponding to the input matrix of network layer A according to the first remainder and S6 may be implemented as follows:
if the first remainder is 0 and S6 = 1 or 2, the neural network processor determines that the row padding data corresponding to the input matrix of network layer A are the 0th row input data and the (R5+1)th row input data;
if the first remainder is 0 and S6 is an odd number greater than 1, the neural network processor determines that the row padding data corresponding to the input matrix of network layer A are the [-(S6+1)/2+2]th row input data to the 0th row input data and the (R5+1)th row input data to the [R5+(S6+1)/2]th row input data;
if the first remainder is 0 and S6 is an even number greater than 2, the neural network processor determines that the row padding data corresponding to the input matrix of network layer A are the (-S6/2+1)th row input data to the 0th row input data and the (R5+1)th row input data to the (R5+S6/2)th row input data;
if the first remainder is not 0 and S6 = 2, the neural network processor determines that the row padding data corresponding to the input matrix of network layer A are the 0th row input data, the (R5+1)th row input data, and the (R5+2)th row input data;
if the first remainder is not 0, the difference T1 between S6 and the first remainder is 1, and S6 is an odd number greater than 2, the neural network processor determines that the row padding data corresponding to the input matrix of network layer A are the [-(T1+1)/2+1]th row input data to the 0th row input data and the (R5+1)th row input data to the [R5+(T1+1)/2]th row input data;
if the first remainder is not 0, the difference T1 between S6 and the first remainder is 1, and S6 is an even number greater than 2, the neural network processor determines that the row padding data corresponding to the input matrix of network layer A are the [-(T1+2)/2+2]th row input data to the 0th row input data and the (R5+1)th row input data to the [R5+(T1+2)/2]th row input data;
if the first remainder is not 0 and the difference T1 between S6 and the first remainder is an odd number greater than 1, the neural network processor determines that the row padding data corresponding to the input matrix of network layer A are the [-(T1+1)/2+2]th row input data to the 0th row input data and the (R5+1)th row input data to the [R5+(T1+1)/2]th row input data;
if the first remainder is not 0 and the difference T1 between S6 and the first remainder is an even number greater than 1, the neural network processor determines that the row padding data corresponding to the input matrix of network layer A are the (-T1/2+1)th row input data to the 0th row input data and the (R5+1)th row input data to the (R5+T1/2)th row input data.
Specifically, the neural network processor determining the column padding data corresponding to the input matrix of network layer A according to the second remainder and S5 may be implemented as follows:
if the second remainder is 0 and S5 = 1 or 2, the neural network processor determines that the column padding data corresponding to the input matrix of network layer A are the 0th column input data and the (R6+1)th column input data;
if the second remainder is 0 and S5 is an odd number greater than 1, the neural network processor determines that the column padding data corresponding to the input matrix of network layer A are the [-(S5+1)/2+2]th column input data to the 0th column input data and the (R6+1)th column input data to the [R6+(S5+1)/2]th column input data;
if the second remainder is 0 and S5 is an even number greater than 2, the neural network processor determines that the column padding data corresponding to the input matrix of network layer A are the (-S5/2+1)th column input data to the 0th column input data and the (R6+1)th column input data to the (R6+S5/2)th column input data;
if the second remainder is not 0 and S5 = 2, the neural network processor determines that the column padding data corresponding to the input matrix of network layer A are the 0th column input data, the (R6+1)th column input data, and the (R6+2)th column input data;
if the second remainder is not 0, the difference T2 between S5 and the second remainder is 1, and S5 is an odd number greater than 2, the neural network processor determines that the column padding data corresponding to the input matrix of network layer A are the [-(S5+1)/2+1]th column input data to the 0th column input data and the (R6+1)th column input data to the [R6+(S5+1)/2]th column input data;
if the second remainder is not 0, the difference T2 between S5 and the second remainder is 1, and S5 is an even number greater than 2, the neural network processor determines that the column padding data corresponding to the input matrix of network layer A are the [-(S5+2)/2+2]th column input data to the 0th column input data and the (R6+1)th column input data to the [R6+(S5+2)/2]th column input data;
if the second remainder is not 0 and the difference T2 between S5 and the second remainder is an odd number greater than 1, the neural network processor determines that the column padding data corresponding to the input matrix of network layer A are the [-(T2+1)/2+2]th column input data to the 0th column input data and the (R6+1)th column input data to the [R6+(T2+1)/2]th column input data;
if the second remainder is not 0 and the difference T2 between S5 and the second remainder is an even number greater than 1, the neural network processor determines that the column padding data corresponding to the input matrix of network layer A are the (-T2/2+1)th column input data to the 0th column input data and the (R6+1)th column input data to the (R6+T2/2)th column input data.
For example, as shown in FIG. 2B, FIG. 2B is a schematic diagram of padding provided by an embodiment of this application. The size of the input matrix of network layer A is 8×8, the size of the target convolution kernel is 3×3, and the target convolution step size is 2×2. The neural network processor determines that the row padding data corresponding to the input matrix of network layer A are the 0th row input data, the 9th row input data, and the 10th row input data, and that the column padding data corresponding to the input matrix of network layer A are the 0th column input data, the 9th column input data, and the 10th column input data. The neural network processor pads the input matrix of network layer A according to these row padding data and column padding data to obtain the first input matrix of network layer A.
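For illustration only (not part of the original disclosure), the FIG. 2B case can be reproduced with a short NumPy sketch; since (8-3) divided by 2 leaves a remainder of 1 and the step size is 2, one padding row/column is added before the matrix and two after it. Zero-valued padding is our assumption, as the patent does not state the padding values:

```python
import numpy as np

def pad_like_fig_2b(inp: np.ndarray) -> np.ndarray:
    """Padding for the FIG. 2B case only (3x3 kernel, 2x2 step size):
    one padding row/column before, two after, i.e. rows 0, R+1, R+2."""
    return np.pad(inp, ((1, 2), (1, 2)))  # zero padding assumed

x = np.arange(64, dtype=float).reshape(8, 8)
x1 = pad_like_fig_2b(x)
assert x1.shape == (11, 11)  # 8 rows become rows 0..10, as in FIG. 2B
```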
In a possible example, the neural network processor determining, according to the first input matrix of network layer A, P2, and Q, the P1 processing elements required for performing convolution calculation on the first input matrix of network layer A includes:
the neural network processor obtains the size R7×R8 of the first input matrix of network layer A;
the neural network processor calculates R8/Q to obtain a quotient and a third remainder;
if the third remainder is 0, the neural network processor determines the quotient as P1, where P1 is the number of processing elements required for performing convolution calculation on the first input matrix of network layer A, and determines that each of the P1 processing elements includes Q multiply-accumulate units;
if the third remainder is not 0, the neural network processor determines the quotient plus 1 as P1, where P1 is the number of processing elements required for performing convolution calculation on the first input matrix of network layer A, and determines that each of the 1st to (P1-1)th processing elements among the P1 processing elements includes Q multiply-accumulate units and that the number of multiply-accumulate units included in the P1th processing element is the third remainder.
For example, as shown in FIG. 2C, FIG. 2C is a schematic diagram of determining the P1 processing elements required for performing convolution calculation on the first input matrix of network layer A, provided by an embodiment of this application. The first input matrix of network layer A includes 140 columns of input data, P2 = 32, and Q = 14. The neural network processor calculates 140/14 to obtain a quotient of 10 and a third remainder of 0, and determines that 10 processing elements are required for performing convolution calculation on the first input matrix of network layer A, each of which includes 14 multiply-accumulate units.
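For illustration only (not part of the original disclosure), this allocation is a ceiling division of the padded width R8 by the Q multiply-accumulate units per processing element; the function and variable names below are our own:

```python
def plan_processing_elements(r8: int, q: int):
    """Return (p1, macs_per_pe): how many PEs are needed for a padded
    width of r8 columns, and how many MACs each PE contributes."""
    quotient, remainder = divmod(r8, q)
    if remainder == 0:
        p1 = quotient
        macs_per_pe = [q] * p1                       # every PE uses all Q MACs
    else:
        p1 = quotient + 1
        macs_per_pe = [q] * quotient + [remainder]   # last PE uses only the remainder
    return p1, macs_per_pe

# FIG. 2C example: 140 columns, 14 MACs per PE -> 10 fully used PEs.
assert plan_processing_elements(140, 14) == (10, [14] * 10)
```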
The size of the output matrix of network layer A is R9×R10, where R9 = (R7-F)/S6+1 and R10 = (R8-F)/S5+1.
In a possible example, the target convolution step size is S1×S2, the size of the input matrix of network layer A is R1×R2, and the size of the target convolution kernel is F×F; the neural network processor performing convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain the output matrix of network layer A includes:
when S1 and S2 are both 1, the neural network processor obtains the ith row input data to the (i+F-1)th row input data of the input matrix of network layer A, where i is any one of 1 to (R1-F+1);
the neural network processor performs convolution calculation on the ith row input data to the (i+F-1)th row input data and the target convolution kernel to obtain the ith row output data of the output matrix of network layer A;
the neural network processor obtains the output matrix of network layer A according to (R1-F+1) rows of output data, where the (R1-F+1) rows of output data include the ith row output data.
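For illustration only (not part of the original disclosure), the stride-1 scheme above can be sketched as a reference computation in NumPy; this is a plain software rendering, not the hardware implementation, and all names are ours:

```python
import numpy as np

def conv_stride1(inp: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Stride-1 convolution: output row i is produced from input rows
    i..i+F-1 convolved with the FxF kernel (input assumed already padded)."""
    r1, r2 = inp.shape
    f = kernel.shape[0]
    out = np.zeros((r1 - f + 1, r2 - f + 1))
    for i in range(r1 - f + 1):          # one output row per window of F input rows
        rows = inp[i:i + f, :]           # the F rows feeding output row i
        for x in range(r2 - f + 1):
            out[i, x] = np.sum(rows[:, x:x + f] * kernel)
    return out
```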
Specifically, the 9 element values included in the target convolution kernel are (c, b, a, f, e, d, i, h, g). The neural network processor using the P1 processing elements to perform a convolution calculation operation on the ith row input data to the (i+F-1)th row input data and the target convolution kernel to obtain the ith row output data of the output matrix of network layer A may be implemented as follows:
B1: the neural network processor uses the P1 processing elements to multiply the ith row input data by b to obtain R2 first intermediate values;
B2: the neural network processor uses the P1 processing elements to left-shift the ith row input data, multiplies the left-shifted ith row data by a to obtain R2 second intermediate values, and accumulates the R2 second intermediate values with the R2 first intermediate values respectively to obtain R2 third intermediate values;
B3: the neural network processor uses the P1 processing elements to right-shift the ith row input data, multiplies the right-shifted ith row data by c to obtain R2 fourth intermediate values, and accumulates the R2 fourth intermediate values with the R2 third intermediate values respectively to obtain R2 fifth intermediate values;
B4: the neural network processor uses the P1 processing elements to multiply the (i+1)th row input data by e to obtain R2 sixth intermediate values, and accumulates the R2 sixth intermediate values with the R2 fifth intermediate values respectively to obtain R2 seventh intermediate values;
B5: the neural network processor uses the P1 processing elements to left-shift the (i+1)th row input data, multiplies the left-shifted (i+1)th row data by d to obtain R2 eighth intermediate values, and accumulates the R2 eighth intermediate values with the R2 seventh intermediate values respectively to obtain R2 ninth intermediate values;
B6: the neural network processor uses the P1 processing elements to right-shift the (i+1)th row input data, multiplies the right-shifted (i+1)th row data by f to obtain R2 tenth intermediate values, and accumulates the R2 tenth intermediate values with the R2 ninth intermediate values respectively to obtain R2 eleventh intermediate values;
B7: the neural network processor uses the P1 processing elements to multiply the (i+F-1)th row input data by h to obtain R2 twelfth intermediate values, and accumulates the R2 twelfth intermediate values with the R2 eleventh intermediate values respectively to obtain R2 thirteenth intermediate values;
B8: the neural network processor uses the P1 processing elements to left-shift the (i+F-1)th row input data, multiplies the left-shifted (i+F-1)th row data by g to obtain R2 fourteenth intermediate values, and accumulates the R2 fourteenth intermediate values with the R2 thirteenth intermediate values respectively to obtain R2 fifteenth intermediate values;
B9: the neural network processor uses the P1 processing elements to right-shift the (i+F-1)th row input data, multiplies the right-shifted (i+F-1)th row data by i to obtain R2 sixteenth intermediate values, and accumulates the R2 sixteenth intermediate values with the R2 fifteenth intermediate values respectively to obtain R2 seventeenth intermediate values.
The at least one multiply-accumulate unit included in each of the P1 processing elements operates in parallel.
The left shift of each row among the ith row input data to the (i+F-1)th row input data is implemented by a left-shift program, and the right shift of each row among the ith row input data to the (i+F-1)th row input data is implemented by a right-shift program; the left-shift program and the right-shift program are prestored in the neural network processor.
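For illustration only (not part of the original disclosure), steps B1-B9 amount to nine shifted multiply-accumulate passes over three rows. A minimal NumPy sketch of one output row, under two assumptions of ours: the kernel listing (c, b, a, f, e, d, i, h, g) is row-major, i.e. [[c, b, a], [f, e, d], [i, h, g]], and a "left shift" makes each position take its right neighbor's value (in hardware the vacated end is fed by the padding bus; here we use 0):

```python
import numpy as np

def shift_left(v):   # each MAC takes the value of its right neighbor
    return np.concatenate([v[1:], [0.0]])

def shift_right(v):  # each MAC takes the value of its left neighbor
    return np.concatenate([[0.0], v[:-1]])

def output_row_3x3(r0, r1, r2, k):
    """One output row from three input rows, following B1-B9.
    k = (c, b, a, f, e, d, i, h, g) as listed in the patent."""
    c, b, a, f, e, d, i_, h, g = k
    acc  = b * r0                  # B1
    acc += a * shift_left(r0)      # B2
    acc += c * shift_right(r0)     # B3
    acc += e * r1                  # B4
    acc += d * shift_left(r1)      # B5
    acc += f * shift_right(r1)     # B6
    acc += h * r2                  # B7
    acc += g * shift_left(r2)      # B8
    acc += i_ * shift_right(r2)    # B9
    return acc
```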
Referring to FIG. 2D, FIG. 2D is a schematic diagram of multiple processing elements provided by an embodiment of this application. The multiple processing elements include one High PE, multiple Middle PEs, and one Low PE; for any two adjacent MACs within each PE, the left MAC is the higher-order MAC and the right MAC is the lower-order MAC.
A processing element is a Processing Element, PE for short; a multiply-accumulate unit is a Multiply Accumulate unit, MAC for short.
For the High PE, when shifting left, a higher-order MAC obtains data from the MAC to its right within the PE, and the lowest-order MAC obtains data from the high-order MAC of the PE to its right; when shifting right, the highest-order MAC obtains data from the padding bus, and a lower-order MAC obtains data from the MAC to its left within the PE; the padding bus is used for data transmission between the padding data and the processing elements.
For the Low PE, when shifting left, a higher-order MAC obtains data from the MAC to its right within the PE, and the lowest-order MAC obtains data from the padding bus; when shifting right, the highest-order MAC obtains data from the low-order MAC of the PE to its left, and a lower-order MAC obtains data from the MAC to its left within the PE.
For a Middle PE, when shifting left, a higher-order MAC obtains data from the MAC to its right within the PE, and the lowest-order MAC obtains data from the high-order MAC of the PE to its right; when shifting right, the highest-order MAC obtains data from the low-order MAC of the PE to its left, and a lower-order MAC obtains data from the MAC to its left within the PE.
In the embodiments of this application, the P1 processing elements included in one processing element group process one row of input data in parallel, and data can be shifted left or right between adjacent processing elements; in the existing convolution calculation method, by contrast, each time a processing element group performs convolution calculation on the input matrix and the convolution kernel matrix, data cannot be moved between adjacent processing elements.
When F is not 3, the implementation in which the neural network processor uses the P1 processing elements to perform convolution calculation on the ith row input data to the (i+F-1)th row input data and the target convolution kernel to obtain the ith row output data of the output matrix of network layer A follows, by analogy, the implementation when F is 3, and is not repeated here.
For example, as shown in FIG. 2E, FIG. 2E is a schematic diagram of determining the ith row output data of the output matrix of network layer A provided by an embodiment of this application. P1 = 1, Q = 7, R2 = 7, F = 3, S1 = 1, i = 1. Each of the 1st to 3rd row input data of the input matrix of network layer A includes 7 element values; in the first row input data, 0 and 8 are padding data; in the second row input data, 16 and 12 are padding data; in the third row input data, 32 and 40 are padding data. The 9 element values included in the target convolution kernel are (c, b, a, f, e, d, i, h, g). The neural network processor uses 7 multiply-accumulate units to perform convolution calculation on the 1st to 3rd row input data of the input matrix of network layer A and the target convolution kernel, and obtains the 1st row output data of the output matrix of network layer A as (U1, U2, U3, U4, U5, U6, U7).
It can be seen that, in this example, the neural network processor uses the P1 processing elements to perform convolution calculation on the ith row input data to the (i+F-1)th row input data of the input matrix of network layer A and the target convolution kernel to obtain the ith row output data of the output matrix of network layer A. Since the P1 processing elements operate in parallel, this helps to improve the computational efficiency of the neural network processor.
Specifically, the target convolution kernel further includes a bias value. The neural network processor obtaining the output matrix of network layer A according to the (R1-F+1) rows of output data may be implemented as follows: the neural network processor determines a to-be-output matrix of network layer A according to the (R1-F+1) rows of output data; the neural network processor determines the sum of the element value in the mth row and nth column of the to-be-output matrix of network layer A and the bias value as the element value in the mth row and nth column of the output matrix of network layer A; the neural network processor performs the same operation on the [(R1-F+1)×(R2-F+1)-1] element values of the to-be-output matrix of network layer A other than the element value in the mth row and nth column to obtain [(R1-F+1)×(R2-F+1)-1] element values of the output matrix of network layer A, which correspond one to one to the [(R1-F+1)×(R2-F+1)-1] element values of the to-be-output matrix of network layer A other than the element value in the mth row and nth column; the neural network processor determines the output matrix of network layer A according to the element value in the mth row and nth column of the output matrix of network layer A and the [(R1-F+1)×(R2-F+1)-1] element values of the output matrix of network layer A.
For example, as shown in FIG. 2F, FIG. 2F is a schematic diagram of determining the output matrix of network layer A provided by an embodiment of this application. R1 = 5, F = 3; the 3 element values included in the 1st row output data of the output matrix of network layer A are (2, 4, 3), the 3 element values included in the 2nd row output data are (5, 7, 8), and the 3 element values included in the 3rd row output data are (9, 1, 6); the bias value is 1. The neural network processor determines that the 9 element values included in the output matrix of network layer A are (3, 5, 4, 6, 8, 9, 10, 2, 7).
When S1 is 1, the ith row output data of the output matrix of network layer A is (U1, U2, U3, U4, U5, U6, U7, U8, U9); when S1 is 3, the ith row output data of the output matrix of network layer A is (U1, U4, U7).
When S1 is 1, the ith row output data of the output matrix of network layer A is (U1, U2, U3, U4, U5, U6, U7, U8, U9, U10, U11, U12, U13, U14, U15); when S1 is 5, the ith row output data of the output matrix of network layer A is (U1, U6, U11).
When S1 is 1, the ith row output data of the output matrix of network layer A is (U1, U2, U3, U4, U5, U6, U7, U8, U9, U10, U11, U12, U13, U14, U15, U16, U17, U18, U19, U20, U21); when S1 is 7, the ith row output data of the output matrix of network layer A is (U1, U8, U15).
In a possible example, the target convolution step size is S3×S4, the size of the input matrix of network layer A is R3×R4, and the size of the target convolution kernel is F×F; the neural network processor performing convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain the output matrix of network layer A includes:
when S3 and S4 are both 2, the neural network processor obtains the (2j-1)th row input data to the (2j+1)th row input data of the input matrix of network layer A, where j is any one of 1 to [(R3-F)/S4+1];
the neural network processor filters the (2j-1)th row input data to the (2j+1)th row input data according to the target convolution step size to obtain the filtered (2j-1)th row input data to (2j+1)th row input data;
the neural network processor performs convolution calculation on the filtered (2j-1)th row input data to (2j+1)th row input data and the target convolution kernel to obtain the jth row output data of the output matrix of network layer A;
the neural network processor obtains the output matrix of network layer A according to [(R3-F)/S4+1] rows of output data, where the [(R3-F)/S4+1] rows of output data include the jth row output data.
In a possible example, the neural network processor filtering the (2j-1)th row input data to the (2j+1)th row input data according to the target convolution step size to obtain the filtered (2j-1)th row input data to (2j+1)th row input data includes:
the neural network processor filters the (2j-1)th row input data F times according to S3 to obtain the filtered (2j-1)th row input data, where the filtered (2j-1)th row input data includes F pieces of (2j-1)th-row sub-input data and the data count of each piece is half the data count of the (2j-1)th row input data;
the neural network processor filters the 2jth row input data F times according to S3 to obtain the filtered 2jth row input data, where the filtered 2jth row input data includes F pieces of 2jth-row sub-input data and the data count of each piece is half the data count of the 2jth row input data;
the neural network processor filters the (2j+1)th row input data F times according to S3 to obtain the filtered (2j+1)th row input data, where the filtered (2j+1)th row input data includes F pieces of (2j+1)th-row sub-input data and the data count of each piece is half the data count of the (2j+1)th row input data.
For example, as shown in FIG. 2G, FIG. 2G is a schematic diagram of filtering the 1st row input data to the 3rd row input data provided by an embodiment of this application. The data count of each of the 1st to 3rd row input data is 15; in the 1st row input data, 0 and 14 are padding data; in the 2nd row input data, 16 and 30 are padding data; in the 3rd row input data, 32 and 46 are padding data; the horizontal convolution step size S3 is 2. The 1st row input data is filtered 3 times to obtain 3 pieces of 1st-row sub-input data; the 2nd row input data is filtered 3 times to obtain 3 pieces of 2nd-row sub-input data; the 3rd row input data is filtered 3 times to obtain 3 pieces of 3rd-row sub-input data.
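For illustration only (not part of the original disclosure), one way to read the F-fold filtering is that the kth filter keeps the entries of a row at offsets k, k+S3, k+2·S3, and so on; under that reading, a minimal Python sketch (names are ours) is:

```python
import numpy as np

def filter_row(row: np.ndarray, f: int, s3: int):
    """Split one padded input row into f sub-rows; sub-row k keeps the
    elements at positions k, k+s3, k+2*s3, ... (roughly half of the row
    when s3 == 2, matching the 'half the data count' description)."""
    return [row[k::s3] for k in range(f)]

row = np.arange(15)              # one 15-element row, as in FIG. 2G
subs = filter_row(row, f=3, s3=2)
# subs[0] -> positions 0,2,4,...; subs[1] -> 1,3,5,...; subs[2] -> 2,4,6,...
```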
Specifically, the 9 element values included in the target convolution kernel are (c, b, a, f, e, d, i, h, g). The neural network processor using the P1 processing elements to perform a convolution calculation operation on the (2j-1)th row input data to the (2j+1)th row input data and the target convolution kernel to obtain the jth row output data of the output matrix of network layer A may be implemented as follows:
C1: the neural network processor selects R4/S3 first to-be-multiplied element values from the (2j-1)th row input data according to S3, and uses the P1 processing elements to multiply the R4/S3 first to-be-multiplied element values by b respectively to obtain R4/S3 eighteenth intermediate values;
C2: the neural network processor selects R4/S3 second to-be-multiplied element values from the (2j-1)th row input data according to S3, uses the P1 processing elements to multiply them by a respectively to obtain R4/S3 nineteenth intermediate values, and accumulates the R4/S3 nineteenth intermediate values with the R4/S3 eighteenth intermediate values respectively to obtain R4/S3 twentieth intermediate values;
C3: the neural network processor selects R4/S3 third to-be-multiplied element values from the (2j-1)th row input data according to S3, uses the P1 processing elements to multiply them by c respectively to obtain R4/S3 twenty-first intermediate values, and accumulates them with the R4/S3 twentieth intermediate values respectively to obtain R4/S3 twenty-second intermediate values;
C4: the neural network processor selects R4/S3 fourth to-be-multiplied element values from the 2jth row input data according to S3, uses the P1 processing elements to multiply them by e respectively to obtain R4/S3 twenty-third intermediate values, and accumulates them with the R4/S3 twenty-second intermediate values respectively to obtain R4/S3 twenty-fourth intermediate values;
C5: the neural network processor selects R4/S3 fifth to-be-multiplied element values from the 2jth row input data according to S3, uses the P1 processing elements to multiply them by d respectively to obtain R4/S3 twenty-fifth intermediate values, and accumulates them with the R4/S3 twenty-fourth intermediate values respectively to obtain R4/S3 twenty-sixth intermediate values;
C6: the neural network processor selects R4/S3 sixth to-be-multiplied element values from the 2jth row input data according to S3, uses the P1 processing elements to multiply them by f respectively to obtain R4/S3 twenty-seventh intermediate values, and accumulates them with the R4/S3 twenty-sixth intermediate values respectively to obtain R4/S3 twenty-eighth intermediate values;
C7: the neural network processor selects R4/S3 seventh to-be-multiplied element values from the (2j+1)th row input data according to S3, uses the P1 processing elements to multiply them by h respectively to obtain R4/S3 twenty-ninth intermediate values, and accumulates them with the R4/S3 twenty-eighth intermediate values respectively to obtain R4/S3 thirtieth intermediate values;
C8: the neural network processor selects R4/S3 eighth to-be-multiplied element values from the (2j+1)th row input data according to S3, uses the P1 processing elements to multiply them by g respectively to obtain R4/S3 thirty-first intermediate values, and accumulates them with the R4/S3 thirtieth intermediate values respectively to obtain R4/S3 thirty-second intermediate values;
C9: the neural network processor selects R4/S3 ninth to-be-multiplied element values from the (2j+1)th row input data according to S3, uses the P1 processing elements to multiply them by i respectively to obtain R4/S3 thirty-third intermediate values, and accumulates them with the R4/S3 thirty-second intermediate values respectively to obtain R4/S3 thirty-fourth intermediate values.
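For illustration only (not part of the original disclosure), steps C1-C9 can be rendered as nine elementwise multiply-accumulates over the filtered sub-rows, one per step. A minimal NumPy sketch, under the same row-major kernel-layout assumption as the stride-1 sketch above and with our own names:

```python
import numpy as np

def output_row_stride2(r_top, r_mid, r_bot, k, s3=2):
    """jth output row from padded rows 2j-1, 2j, 2j+1 following C1-C9.
    k = (c, b, a, f, e, d, i, h, g); kernel layout assumed to be
    row-major [[c, b, a], [f, e, d], [i, h, g]]."""
    c, b, a, f, e, d, i_, h, g = k
    n = (len(r_top) - 3) // s3 + 1               # output width for a 3x3 kernel
    sub = lambda row, off: row[off::s3][:n]      # filtered sub-row, trimmed to n
    acc  = b * sub(r_top, 1)     # C1: center elements of row 2j-1
    acc += a * sub(r_top, 2)     # C2
    acc += c * sub(r_top, 0)     # C3
    acc += e * sub(r_mid, 1)     # C4
    acc += d * sub(r_mid, 2)     # C5
    acc += f * sub(r_mid, 0)     # C6
    acc += h * sub(r_bot, 1)     # C7
    acc += g * sub(r_bot, 2)     # C8
    acc += i_ * sub(r_bot, 0)    # C9
    return acc
```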
When F is not 3, the implementation in which the neural network processor uses the P1 processing element groups to perform convolution calculation on the (2j-1)th row input data to the (2j+1)th row input data and the target convolution kernel to obtain the jth row output data of the output matrix of network layer A follows, by analogy, the implementation when F is 3, and is not repeated here.
When S3 = 2, the jth row output data of the output matrix of network layer A is (V1, V2, V3, V4, V5, V6, V7, V8); when S3 = 4, the jth row output data of the output matrix of network layer A is (V1, V3, V5, V7).
When S3 = 2, the jth row output data of the output matrix of network layer A is (V1, V2, V3, V4, V5, V6, V7, V8, V9); when S3 = 6, the jth row output data of the output matrix of network layer A is (V1, V4, V7).
For example, as shown in FIG. 2H, FIG. 2H is another schematic diagram of determining the output matrix of network layer A provided by an embodiment of this application. P1 = 1, Q = 7, R4 = 13, F = 3, S3 = 2, j = 1. Each of the 1st to 3rd row input data of the input matrix of network layer A includes 13 element values; in the first row input data, 0 and 14 are padding data; in the second row input data, 16 and 30 are padding data; in the third row input data, 32 and 46 are padding data. The 9 element values included in the target convolution kernel are (c, b, a, f, e, d, i, h, g). The neural network processor uses 7 multiply-accumulate units to perform convolution calculation on the 1st to 3rd row input data of the input matrix of network layer A and the target convolution kernel to obtain the 1st row output data of the output matrix of network layer A.
It can be seen that, in this example, the neural network processor uses the P1 processing elements to perform convolution calculation on the (2j-1)th row input data to the (2j+1)th row input data of the input matrix of network layer A and the target convolution kernel to obtain the jth row output data of the output matrix of network layer A. Since the P1 processing elements operate in parallel, this helps to improve the computational efficiency of the neural network processor.
The implementation in which the neural network processor determines the output matrix of network layer A according to the [(R3-F)/S4+1] rows of output data and the bias value follows the implementation in which the neural network processor determines the output matrix of network layer A according to the (R1-F+1) rows of output data and the bias value, and is not repeated here.
204. The neural network processor determines a target preset operation corresponding to the target category image according to a prestored mapping relationship between category images and preset operations.
Category images correspond to preset operations one to one; if the category image is a face image, the preset operation is obtaining person information based on the face image; if the category image is a license plate image, the preset operation is obtaining license plate registration information based on the license plate image.
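For illustration only (not part of the original disclosure), this one-to-one dispatch can be modeled as a dictionary from category to handler; the handler names and placeholder return values below are our own:

```python
def analyze_face(features):
    return "person information derived from face features"      # placeholder

def analyze_plate(features):
    return "registration information derived from plate number"  # placeholder

# Prestored mapping between category images and preset operations (step 204).
preset_operations = {
    "face_image": analyze_face,
    "license_plate_image": analyze_plate,
}

def run_target_preset_operation(category: str, features):
    # Step 205: perform the target preset operation on the extracted features.
    return preset_operations[category](features)
```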
205. The neural network processor performs the target preset operation according to the multiple features included in the target category image.
It can be seen that, compared with the case where each convolution calculation based on the input matrix of a network layer (obtained based on an input image) and a convolution kernel requires the neural network processor to select an operation matrix from the input matrix of the network layer according to the convolution step size and then perform convolution calculation on the operation matrix and the convolution kernel, in the embodiments of this application, for different convolution step sizes, the convolution step size is used to filter, from the input matrix of network layer A (obtained based on the target category image), the multiple rows of input data required for the convolution calculation, and the neural network processor then performs convolution calculation on those rows of input data and the convolution kernel to obtain the output matrix of network layer A (used to characterize the multiple features included in the target category image). This helps to improve the computational efficiency of the neural network processor under different convolution step sizes. Since the time to obtain the multiple features included in the target category image is greatly shortened, the target preset operation corresponding to the target category image can be performed more quickly based on those features, thereby indirectly improving the efficiency of image analysis.
In a possible example, the target category image is a face image, the multiple features included in the target category image are multiple face features, and the neural network processor performing the target preset operation according to the output matrix of network layer A includes:
the neural network processor judges whether the face feature set composed of the multiple face features matches a face feature database;
if the face feature set composed of the multiple face features matches a target face feature set, the neural network processor determines target person information corresponding to the target face feature set according to a prestored mapping relationship between face feature sets and person information, where the target face feature set belongs to the face feature database;
the neural network processor performs an output operation on the target person information.
The mapping relationship between face feature sets and person information is prestored in the neural network processor and is shown in Table 3 below:
Table 3
Face feature set | Person information
First face feature set | First person information
Second face feature set | Second person information
Third face feature set | Third person information
... | ...
Face feature sets correspond to person information one to one; if the face feature set is the first face feature set, the person information is the first person information.
It can be seen that, in this example, since the time to obtain the multiple face features is greatly shortened, the person information corresponding to the face feature set composed of the multiple face features can be determined more quickly, thereby indirectly improving the efficiency of obtaining the corresponding person information based on face image analysis.
In a possible example, the target category image is a license plate image, the multiple features included in the target category image are a target license plate number, and the neural network processor performing the target preset operation according to the output matrix of network layer A includes:
the neural network processor judges whether the target license plate number matches a license plate number database;
if the target license plate number matches the license plate number database, the neural network processor determines target license plate registration information corresponding to the target license plate number according to a prestored mapping relationship between license plate numbers and vehicle registration information;
the neural network processor performs an output operation on the target license plate registration information.
The mapping relationship between license plate numbers and vehicle registration information is prestored in the neural network processor and is shown in Table 4 below:
Table 4
License plate number | Vehicle registration information
First license plate number | First vehicle registration information
Second license plate number | Second vehicle registration information
Third license plate number | Third vehicle registration information
... | ...
License plate numbers correspond to vehicle registration information one to one; if the license plate number is the first license plate number, the vehicle registration information is the first vehicle registration information.
It can be seen that, in this example, since the time to obtain the license plate number is greatly shortened, the vehicle registration information corresponding to the license plate number can be determined more quickly, thereby indirectly improving the efficiency of obtaining the corresponding license plate registration information based on license plate image analysis.
Consistent with the embodiment shown in FIG. 2A above, referring to FIG. 3, FIG. 3 is a schematic flowchart of another image analysis method based on a convolutional neural network provided by an embodiment of this application, applied to a neural network processor. The image analysis method based on a convolutional neural network includes steps 301-311, which are specifically as follows:
301. The neural network processor obtains an input matrix of network layer A, where the size of the input matrix of network layer A is R3×R4, network layer A is one of the multiple network layers included in the convolutional neural network model, and the input matrix of network layer A is obtained based on a face image.
302. The neural network processor determines a target convolution kernel corresponding to network layer A according to a mapping relationship between network layers and convolution kernels, where the size of the target convolution kernel is F×F.
303. The neural network processor obtains a target convolution step size corresponding to network layer A according to a mapping relationship between network layers and convolution step sizes, where the target convolution step size is S3×S4 and different network layers correspond to different convolution step sizes.
304. When S3 and S4 are both 2, the neural network processor obtains the (2j-1)th row input data to the (2j+1)th row input data of the input matrix of network layer A, where j is any one of 1 to [(R3-F)/S4+1].
305. The neural network processor filters the (2j-1)th row input data F times according to S3 to obtain the filtered (2j-1)th row input data, where the filtered (2j-1)th row input data includes F pieces of (2j-1)th-row sub-input data and the data count of each piece is half the data count of the (2j-1)th row input data.
306. The neural network processor filters the 2jth row input data F times according to S3 to obtain the filtered 2jth row input data, where the filtered 2jth row input data includes F pieces of 2jth-row sub-input data and the data count of each piece is half the data count of the 2jth row input data.
307. The neural network processor filters the (2j+1)th row input data F times according to S3 to obtain the filtered (2j+1)th row input data, where the filtered (2j+1)th row input data includes F pieces of (2j+1)th-row sub-input data and the data count of each piece is half the data count of the (2j+1)th row input data.
308. The neural network processor performs convolution calculation on the filtered (2j-1)th row input data to (2j+1)th row input data and the target convolution kernel to obtain the jth row output data of the output matrix of network layer A.
309. The neural network processor obtains the output matrix of network layer A according to [(R3-F)/S4+1] rows of output data, where the [(R3-F)/S4+1] rows of output data include the jth row output data, and the output matrix of network layer A is used to characterize multiple face features.
310. If the face feature set composed of the multiple face features matches a target face feature set in the face feature database, the neural network processor determines target person information corresponding to the target face feature set according to a prestored mapping relationship between face feature sets and person information.
311. The neural network processor performs an output operation on the target person information.
It should be noted that, for the specific implementation process of each step of the method shown in FIG. 3, reference may be made to the specific implementation process of the foregoing method, which is not repeated here.
Referring to FIG. 4, FIG. 4 is a block diagram of the functional units of an image analysis device based on a convolutional neural network provided by an embodiment of this application, applied to a neural network processor. The image analysis device 400 based on a convolutional neural network includes:
a first obtaining unit 401, configured to obtain an input matrix of network layer A, where network layer A is one of multiple network layers included in a convolutional neural network model, and the input matrix of network layer A is obtained based on a target category image;
a second obtaining unit 402, configured to obtain a target convolution kernel and a target convolution step size corresponding to network layer A, where different network layers correspond to different convolution step sizes;
a calculation unit 403, configured to perform convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain an output matrix of network layer A, where the target convolution step size is used to filter, from the input matrix of network layer A, the multiple rows of input data required for the convolution calculation, and the output matrix of network layer A is used to characterize multiple features included in the target category image;
a determining unit 404, configured to determine a target preset operation corresponding to the target category image according to a prestored mapping relationship between category images and preset operations;
an execution unit 405, configured to perform the target preset operation according to the multiple features included in the target category image.
It can be seen that, compared with the case where each convolution calculation based on the input matrix of a network layer (obtained based on an input image) and a convolution kernel requires the neural network processor to select an operation matrix from the input matrix of the network layer according to the convolution step size and then perform convolution calculation on the operation matrix and the convolution kernel, in the embodiments of this application, for different convolution step sizes, the convolution step size is used to filter, from the input matrix of network layer A (obtained based on the target category image), the multiple rows of input data required for the convolution calculation, and the neural network processor then performs convolution calculation on those rows of input data and the convolution kernel to obtain the output matrix of network layer A (used to characterize the multiple features included in the target category image). This helps to improve the computational efficiency of the neural network processor under different convolution step sizes. Since the time to obtain the multiple features included in the target category image is greatly shortened, the target preset operation corresponding to the target category image can be performed more quickly based on those features, thereby indirectly improving the efficiency of image analysis.
In a possible example, the target category image is a face image, the multiple features included in the target category image are multiple face features, and in terms of performing the target preset operation according to the output matrix of network layer A, the execution unit 405 is specifically configured to:
judge whether the face feature set composed of the multiple face features matches a face feature database;
if the face feature set composed of the multiple face features matches a target face feature set, determine target person information corresponding to the target face feature set according to a prestored mapping relationship between face feature sets and person information, where the target face feature set belongs to the face feature database;
perform an output operation on the target person information.
In a possible example, the target category image is a license plate image, the multiple features included in the target category image are a target license plate number, and in terms of performing the target preset operation according to the output matrix of network layer A, the execution unit 405 is specifically configured to:
judge whether the target license plate number matches a license plate number database;
if the target license plate number matches the license plate number database, determine target license plate registration information corresponding to the target license plate number according to a prestored mapping relationship between license plate numbers and vehicle registration information;
perform an output operation on the target license plate registration information.
In a possible example, in terms of obtaining the target convolution kernel and the target convolution step size corresponding to network layer A, the second obtaining unit 402 is specifically configured to:
obtain the target convolution kernel corresponding to network layer A according to a mapping relationship between network layers and convolution kernels;
obtain the target convolution step size corresponding to network layer A according to a mapping relationship between network layers and convolution step sizes.
In a possible example, the target convolution step size is S1×S2, the size of the input matrix of network layer A is R1×R2, and the size of the target convolution kernel is F×F; in terms of performing convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain the output matrix of network layer A, the calculation unit 403 is specifically configured to:
when S1 and S2 are both 1, obtain the ith row input data to the (i+F-1)th row input data of the input matrix of network layer A, where i is any one of 1 to (R1-F+1);
perform convolution calculation on the ith row input data to the (i+F-1)th row input data and the target convolution kernel to obtain the ith row output data of the output matrix of network layer A;
obtain the output matrix of network layer A according to (R1-F+1) rows of output data, where the (R1-F+1) rows of output data include the ith row output data.
In a possible example, the target convolution step size is S3×S4, the size of the input matrix of network layer A is R3×R4, and the size of the target convolution kernel is F×F; in terms of performing convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain the output matrix of network layer A, the calculation unit 403 is specifically configured to:
when S3 and S4 are both 2, obtain the (2j-1)th row input data to the (2j+1)th row input data of the input matrix of network layer A, where j is any one of 1 to [(R3-F)/S4+1];
filter the (2j-1)th row input data to the (2j+1)th row input data according to the target convolution step size to obtain the filtered (2j-1)th row input data to (2j+1)th row input data;
perform convolution calculation on the filtered (2j-1)th row input data to (2j+1)th row input data and the target convolution kernel to obtain the jth row output data of the output matrix of network layer A;
obtain the output matrix of network layer A according to [(R3-F)/S4+1] rows of output data, where the [(R3-F)/S4+1] rows of output data include the jth row output data.
In a possible example, in terms of filtering the (2j-1)th row input data to the (2j+1)th row input data according to the target convolution step size to obtain the filtered (2j-1)th row input data to (2j+1)th row input data, the calculation unit 403 is specifically configured to:
filter the (2j-1)th row input data F times according to S3 to obtain the filtered (2j-1)th row input data, where the filtered (2j-1)th row input data includes F pieces of (2j-1)th-row sub-input data and the data count of each piece is half the data count of the (2j-1)th row input data;
filter the 2jth row input data F times according to S3 to obtain the filtered 2jth row input data, where the filtered 2jth row input data includes F pieces of 2jth-row sub-input data and the data count of each piece is half the data count of the 2jth row input data;
filter the (2j+1)th row input data F times according to S3 to obtain the filtered (2j+1)th row input data, where the filtered (2j+1)th row input data includes F pieces of (2j+1)th-row sub-input data and the data count of each piece is half the data count of the (2j+1)th row input data.
Consistent with the embodiments shown in FIG. 2A and FIG. 3 above, referring to FIG. 5, FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of this application. The electronic device 500 includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for executing the following steps:
obtaining an input matrix of network layer A, where network layer A is one of multiple network layers included in a convolutional neural network model, and the input matrix of network layer A is obtained based on a target category image;
obtaining a target convolution kernel and a target convolution step size corresponding to network layer A, where different network layers correspond to different convolution step sizes;
performing convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain an output matrix of network layer A, where the target convolution step size is used to filter, from the input matrix of network layer A, the multiple rows of input data required for the convolution calculation, and the output matrix of network layer A is used to characterize multiple features included in the target category image;
determining a target preset operation corresponding to the target category image according to a prestored mapping relationship between category images and preset operations;
performing the target preset operation according to the multiple features included in the target category image.
It can be seen that, compared with the case where each convolution calculation based on the input matrix of a network layer (obtained based on an input image) and a convolution kernel requires the neural network processor to select an operation matrix from the input matrix of the network layer according to the convolution step size and then perform convolution calculation on the operation matrix and the convolution kernel, in the embodiments of this application, for different convolution step sizes, the convolution step size is used to filter, from the input matrix of network layer A (obtained based on the target category image), the multiple rows of input data required for the convolution calculation, and the neural network processor then performs convolution calculation on those rows of input data and the convolution kernel to obtain the output matrix of network layer A (used to characterize the multiple features included in the target category image). This helps to improve the computational efficiency of the neural network processor under different convolution step sizes. Since the time to obtain the multiple features included in the target category image is greatly shortened, the target preset operation corresponding to the target category image can be performed more quickly based on those features, thereby indirectly improving the efficiency of image analysis.
In a possible example, the target category image is a face image, the multiple features included in the target category image are multiple face features, and in terms of performing the target preset operation according to the output matrix of network layer A, the programs include instructions specifically for executing the following steps:
judging whether the face feature set composed of the multiple face features matches a face feature database;
if the face feature set composed of the multiple face features matches a target face feature set, determining target person information corresponding to the target face feature set according to a prestored mapping relationship between face feature sets and person information, where the target face feature set belongs to the face feature database;
performing an output operation on the target person information.
In a possible example, the target category image is a license plate image, the multiple features included in the target category image are a target license plate number, and in terms of performing the target preset operation according to the output matrix of network layer A, the programs include instructions specifically for executing the following steps:
judging whether the target license plate number matches a license plate number database;
if the target license plate number matches the license plate number database, determining target license plate registration information corresponding to the target license plate number according to a prestored mapping relationship between license plate numbers and vehicle registration information;
performing an output operation on the target license plate registration information.
In a possible example, in terms of obtaining the target convolution kernel and the target convolution step size corresponding to network layer A, the programs include instructions specifically for executing the following steps:
obtaining the target convolution kernel corresponding to network layer A according to a mapping relationship between network layers and convolution kernels;
obtaining the target convolution step size corresponding to network layer A according to a mapping relationship between network layers and convolution step sizes.
In a possible example, the target convolution step size is S1×S2, the size of the input matrix of network layer A is R1×R2, and the size of the target convolution kernel is F×F; in terms of performing convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain the output matrix of network layer A, the programs include instructions specifically for executing the following steps:
when S1 and S2 are both 1, obtaining the ith row input data to the (i+F-1)th row input data of the input matrix of network layer A, where i is any one of 1 to (R1-F+1);
performing convolution calculation on the ith row input data to the (i+F-1)th row input data and the target convolution kernel to obtain the ith row output data of the output matrix of network layer A;
obtaining the output matrix of network layer A according to (R1-F+1) rows of output data, where the (R1-F+1) rows of output data include the ith row output data.
In a possible example, the target convolution step size is S3×S4, the size of the input matrix of network layer A is R3×R4, and the size of the target convolution kernel is F×F; in terms of performing convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain the output matrix of network layer A, the programs include instructions specifically for executing the following steps:
when S3 and S4 are both 2, obtaining the (2j-1)th row input data to the (2j+1)th row input data of the input matrix of network layer A, where j is any one of 1 to [(R3-F)/S4+1];
filtering the (2j-1)th row input data to the (2j+1)th row input data according to the target convolution step size to obtain the filtered (2j-1)th row input data to (2j+1)th row input data;
performing convolution calculation on the filtered (2j-1)th row input data to (2j+1)th row input data and the target convolution kernel to obtain the jth row output data of the output matrix of network layer A;
obtaining the output matrix of network layer A according to [(R3-F)/S4+1] rows of output data, where the [(R3-F)/S4+1] rows of output data include the jth row output data.
In a possible example, in terms of filtering the (2j-1)th row input data to the (2j+1)th row input data according to the target convolution step size to obtain the filtered (2j-1)th row input data to (2j+1)th row input data, the programs include instructions specifically for executing the following steps:
filtering the (2j-1)th row input data F times according to S3 to obtain the filtered (2j-1)th row input data, where the filtered (2j-1)th row input data includes F pieces of (2j-1)th-row sub-input data and the data count of each piece is half the data count of the (2j-1)th row input data;
filtering the 2jth row input data F times according to S3 to obtain the filtered 2jth row input data, where the filtered 2jth row input data includes F pieces of 2jth-row sub-input data and the data count of each piece is half the data count of the 2jth row input data;
filtering the (2j+1)th row input data F times according to S3 to obtain the filtered (2j+1)th row input data, where the filtered (2j+1)th row input data includes F pieces of (2j+1)th-row sub-input data and the data count of each piece is half the data count of the (2j+1)th row input data.
An embodiment of this application further provides a neural network processor, which is configured to implement part or all of the steps of any method described in the foregoing method embodiments.
An embodiment of this application further provides a neural network processor, which includes any convolution calculation device described in the foregoing device embodiments.
An embodiment of this application further provides a computer-readable storage medium for storing a computer program, where the computer program causes a computer to execute part or all of the steps of any method described in the foregoing method embodiments; the computer includes an electronic device.
An embodiment of this application further provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute part or all of the steps of any method described in the foregoing method embodiments. The computer program product may be a software installation package, and the computer includes an electronic device.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations; however, those skilled in the art should know that this application is not limited by the described order of actions, because according to this application, some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by this application.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, reference may be made to the related descriptions of other embodiments.

Claims (12)

  1. An image analysis method based on a convolutional neural network, applied to a neural network processor, comprising:
    obtaining an input matrix of network layer A, wherein network layer A is one of multiple network layers included in a convolutional neural network model, and the input matrix of network layer A is obtained based on a target category image;
    obtaining a target convolution kernel and a target convolution step size corresponding to network layer A, wherein different network layers correspond to different convolution step sizes;
    performing convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain an output matrix of network layer A, wherein the target convolution step size is used to filter, from the input matrix of network layer A, the multiple rows of input data required for the convolution calculation, and the output matrix of network layer A is used to characterize multiple features included in the target category image;
    determining a target preset operation corresponding to the target category image according to a prestored mapping relationship between category images and preset operations;
    performing the target preset operation according to the multiple features included in the target category image.
  2. The method according to claim 1, wherein the target category image is a face image, the multiple features included in the target category image are multiple face features, and performing the target preset operation according to the output matrix of network layer A comprises:
    judging whether the face feature set composed of the multiple face features matches a face feature database;
    if the face feature set composed of the multiple face features matches a target face feature set, determining target person information corresponding to the target face feature set according to a prestored mapping relationship between face feature sets and person information, wherein the target face feature set belongs to the face feature database;
    performing an output operation on the target person information.
  3. The method according to claim 1, wherein the target category image is a license plate image, the multiple features included in the target category image are a target license plate number, and performing the target preset operation according to the output matrix of network layer A comprises:
    judging whether the target license plate number matches a license plate number database;
    if the target license plate number matches the license plate number database, determining target license plate registration information corresponding to the target license plate number according to a prestored mapping relationship between license plate numbers and vehicle registration information;
    performing an output operation on the target license plate registration information.
  4. The method according to claim 2 or 3, wherein obtaining the target convolution kernel and the target convolution step size corresponding to network layer A comprises:
    obtaining the target convolution kernel corresponding to network layer A according to a mapping relationship between network layers and convolution kernels;
    obtaining the target convolution step size corresponding to network layer A according to a mapping relationship between network layers and convolution step sizes.
  5. The method according to claim 4, wherein the target convolution step size is S1×S2, the size of the input matrix of network layer A is R1×R2, the size of the target convolution kernel is F×F, and performing convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain the output matrix of network layer A comprises:
    when S1 and S2 are both 1, obtaining the ith row input data to the (i+F-1)th row input data of the input matrix of network layer A, wherein i is any one of 1 to (R1-F+1);
    performing convolution calculation on the ith row input data to the (i+F-1)th row input data and the target convolution kernel to obtain the ith row output data of the output matrix of network layer A;
    obtaining the output matrix of network layer A according to (R1-F+1) rows of output data, wherein the (R1-F+1) rows of output data include the ith row output data.
  6. The method according to claim 4, wherein the target convolution step size is S3×S4, the size of the input matrix of network layer A is R3×R4, the size of the target convolution kernel is F×F, and performing convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain the output matrix of network layer A comprises:
    when S3 and S4 are both 2, obtaining the (2j-1)th row input data to the (2j+1)th row input data of the input matrix of network layer A, wherein j is any one of 1 to [(R3-F)/S4+1];
    filtering the (2j-1)th row input data to the (2j+1)th row input data according to the target convolution step size to obtain the filtered (2j-1)th row input data to (2j+1)th row input data;
    performing convolution calculation on the filtered (2j-1)th row input data to (2j+1)th row input data and the target convolution kernel to obtain the jth row output data of the output matrix of network layer A;
    obtaining the output matrix of network layer A according to [(R3-F)/S4+1] rows of output data, wherein the [(R3-F)/S4+1] rows of output data include the jth row output data.
  7. The method according to claim 6, wherein filtering the (2j-1)th row input data to the (2j+1)th row input data according to the target convolution step size to obtain the filtered (2j-1)th row input data to (2j+1)th row input data comprises:
    filtering the (2j-1)th row input data F times according to S3 to obtain the filtered (2j-1)th row input data, wherein the filtered (2j-1)th row input data includes F pieces of (2j-1)th-row sub-input data, and the data count of each piece of (2j-1)th-row sub-input data is half the data count of the (2j-1)th row input data;
    filtering the 2jth row input data F times according to S3 to obtain the filtered 2jth row input data, wherein the filtered 2jth row input data includes F pieces of 2jth-row sub-input data, and the data count of each piece of 2jth-row sub-input data is half the data count of the 2jth row input data;
    filtering the (2j+1)th row input data F times according to S3 to obtain the filtered (2j+1)th row input data, wherein the filtered (2j+1)th row input data includes F pieces of (2j+1)th-row sub-input data, and the data count of each piece of (2j+1)th-row sub-input data is half the data count of the (2j+1)th row input data.
  8. An image analysis device based on a convolutional neural network, applied to a neural network processor, comprising:
    a first obtaining unit, configured to obtain an input matrix of network layer A, wherein network layer A is one of multiple network layers included in a convolutional neural network model, and the input matrix of network layer A is obtained based on a target category image;
    a second obtaining unit, configured to obtain a target convolution kernel and a target convolution step size corresponding to network layer A, wherein different network layers correspond to different convolution step sizes;
    a calculation unit, configured to perform convolution calculation on the input matrix of network layer A and the target convolution kernel according to the target convolution step size to obtain an output matrix of network layer A, wherein the target convolution step size is used to filter, from the input matrix of network layer A, the multiple rows of input data required for the convolution calculation, and the output matrix of network layer A is used to characterize multiple features included in the target category image;
    a determining unit, configured to determine a target preset operation corresponding to the target category image according to a prestored mapping relationship between category images and preset operations;
    an execution unit, configured to perform the target preset operation according to the multiple features included in the target category image.
  9. A neural network processor, wherein the neural network processor is configured to implement part or all of the steps of the method according to any one of claims 1-7.
  10. A neural network processor, wherein the neural network processor comprises the convolution calculation device according to claim 8.
  11. An electronic device, comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the programs comprise instructions for executing part or all of the steps of the method according to any one of claims 1-7.
  12. A computer-readable storage medium, wherein the computer-readable storage medium is used to store a computer program, and the computer program is executed by a processor to implement the method according to any one of claims 1-7.
PCT/CN2020/109062 2019-11-07 2020-08-14 Convolution calculation method and related device WO2021139156A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/623,605 US11551438B2 (en) 2019-11-07 2020-08-14 Image analysis method and related device

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201911080933 2019-11-07
CN202010015744.6 2020-01-07
CN202010015744.6A CN111222465B (zh) 2019-11-07 2020-01-07 Image analysis method based on convolutional neural network and related device

Publications (1)

Publication Number Publication Date
WO2021139156A1 true WO2021139156A1 (zh) 2021-07-15

Family

ID=70831037

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/109062 WO2021139156A1 (zh) 2019-11-07 2020-08-14 卷积计算方法及相关设备

Country Status (3)

Country Link
US (1) US11551438B2 (zh)
CN (1) CN111222465B (zh)
WO (1) WO2021139156A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111222465B (zh) * 2019-11-07 2023-06-13 深圳云天励飞技术股份有限公司 Image analysis method based on convolutional neural network and related device
US11972348B2 (en) 2020-10-30 2024-04-30 Apple Inc. Texture unit circuit in neural network processor
CN112734827B (zh) * 2021-01-07 2024-06-18 京东鲲鹏(江苏)科技有限公司 Target detection method and device, electronic device, and storage medium
US11823490B2 (en) * 2021-06-08 2023-11-21 Adobe, Inc. Non-linear latent to latent model for multi-attribute face editing

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107886164A (zh) * 2017-12-20 2018-04-06 东软集团股份有限公司 Convolutional neural network training and testing method and training and testing device
CN108108711A (zh) * 2017-12-29 2018-06-01 深圳云天励飞技术有限公司 Face monitoring and control method, electronic device, and storage medium
US20180276527A1 (en) * 2017-03-23 2018-09-27 Hitachi, Ltd. Processing Method Using Convolutional Neural Network, Convolutional Neural Network Learning Method, and Processing Device Including Convolutional Neural Network
CN108765319A (zh) * 2018-05-09 2018-11-06 大连理工大学 Image denoising method based on generative adversarial network
CN109359726A (zh) * 2018-11-27 2019-02-19 华中科技大学 Convolutional neural network optimization method based on the Winograd algorithm
CN110580522A (zh) * 2019-11-07 2019-12-17 深圳云天励飞技术有限公司 Convolution calculation method and related device
CN111222465A (zh) * 2019-11-07 2020-06-02 深圳云天励飞技术有限公司 Image analysis method based on convolutional neural network and related device

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10860879B2 (en) * 2016-05-16 2020-12-08 Raytheon Technologies Corporation Deep convolutional neural networks for crack detection from image data
US10567248B2 (en) * 2016-11-29 2020-02-18 Intel Corporation Distributed assignment of video analytics tasks in cloud computing environments to reduce bandwidth utilization
CN107578055B (zh) * 2017-06-20 2020-04-14 北京陌上花科技有限公司 Image prediction method and device
US11176403B1 (en) * 2018-09-06 2021-11-16 Amazon Technologies, Inc. Filtering detected objects from an object recognition index according to extracted features
CN109146000B (zh) * 2018-09-07 2022-03-08 电子科技大学 Method and device for improving a convolutional neural network based on frozen weights
US11823033B2 (en) * 2018-09-13 2023-11-21 Intel Corporation Condense-expansion-depth-wise convolutional neural network for face recognition
CN109493287B (zh) * 2018-10-10 2022-03-15 浙江大学 Quantitative spectral data analysis and processing method based on deep learning
US20210279603A1 (en) * 2018-12-13 2021-09-09 SparkCognition, Inc. Security systems and methods
CN109784372B (zh) * 2018-12-17 2020-11-13 北京理工大学 Target classification method based on convolutional neural network
CN109977793B (zh) * 2019-03-04 2022-03-04 东南大学 Roadside image pedestrian segmentation method based on variable-scale multi-feature fusion convolutional network
CN110414305A (zh) * 2019-04-23 2019-11-05 苏州闪驰数控系统集成有限公司 Artificial intelligence convolutional neural network face recognition system
CN110119805B (zh) * 2019-05-10 2022-06-21 东南大学 Convolutional neural network algorithm based on echo state network classification
US11361552B2 (en) * 2019-08-21 2022-06-14 Micron Technology, Inc. Security operations of parked vehicles

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180276527A1 (en) * 2017-03-23 2018-09-27 Hitachi, Ltd. Processing Method Using Convolutional Neural Network, Convolutional Neural Network Learning Method, and Processing Device Including Convolutional Neural Network
CN107886164A (zh) * 2017-12-20 2018-04-06 东软集团股份有限公司 Convolutional neural network training and testing method and training and testing device
CN108108711A (zh) * 2017-12-29 2018-06-01 深圳云天励飞技术有限公司 Face monitoring and control method, electronic device, and storage medium
CN108765319A (zh) * 2018-05-09 2018-11-06 大连理工大学 Image denoising method based on generative adversarial network
CN109359726A (zh) * 2018-11-27 2019-02-19 华中科技大学 Convolutional neural network optimization method based on the Winograd algorithm
CN110580522A (zh) * 2019-11-07 2019-12-17 深圳云天励飞技术有限公司 Convolution calculation method and related device
CN111222465A (zh) * 2019-11-07 2020-06-02 深圳云天励飞技术有限公司 Image analysis method based on convolutional neural network and related device

Also Published As

Publication number Publication date
CN111222465B (zh) 2023-06-13
US20220215655A1 (en) 2022-07-07
CN111222465A (zh) 2020-06-02
US11551438B2 (en) 2023-01-10

Similar Documents

Publication Publication Date Title
WO2021139156A1 (zh) Convolution calculation method and related device
Xia et al. Automatic generation method of test scenario for ADAS based on complexity
CN111078488A (zh) Data collection method and device, storage medium, and system
Mirzaee et al. Solving singularly perturbed differential-difference equations arising in science and engineering with Fibonacci polynomials
CN103561123B (zh) IP segment attribution determination method and device
CN105493152A (zh) Image processing device and image processing program
CN107958349A (zh) Task allocation method and device, computer equipment, and storage medium
JP2020517002A5 (zh)
CN113572697A (zh) Load balancing method based on graph convolutional neural network and deep reinforcement learning
CN109002544B (zh) Data processing method and device and computer-readable medium
CN110580522A (zh) Convolution calculation method and related device
JP2020204894A5 (zh)
DE10357661A1 (de) Modular Montgomery multiplier and associated multiplication method
CN109359542B (zh) Method for determining vehicle damage level based on neural network and terminal device
CN114461858A (zh) Causal relationship analysis model construction and causal relationship analysis method
CN111859267B (zh) Operation method of privacy-preserving machine learning activation function based on the BGW protocol
CN106569734B (zh) Method and device for repairing memory overflow during data shuffling
CN116167425B (zh) Neural network acceleration method, device, equipment, and medium
Mercan et al. Computing sequence covering arrays using unified combinatorial interaction testing
CN116611812A (zh) Artificial-intelligence-based vehicle parts damage assessment method, device, equipment, and medium
CN111431977A (zh) Method and system for handling malicious nodes in a blockchain system
CN111324433A (zh) Data calculation method and related device
CN115938477A (zh) Method, device, equipment, and storage medium for determining multi-trait breeding values
CN113254996B (zh) Graph neural network training method, device, computing equipment, and storage medium
CN106022909B (zh) Account information maintenance method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20911905

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20911905

Country of ref document: EP

Kind code of ref document: A1