WO2018232615A1 - Signal processing method and apparatus - Google Patents

Signal processing method and apparatus

Info

Publication number
WO2018232615A1
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
real
input signal
complex
convolution kernel
Prior art date
Application number
PCT/CN2017/089302
Other languages
English (en)
French (fr)
Inventor
许若圣
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to PCT/CN2017/089302 (WO2018232615A1)
Priority to CN201780094036.2A (CN110998610B)
Publication of WO2018232615A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/06 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • the present application relates to the field of signal processing technologies, and in particular, to a signal processing method and apparatus.
  • ANN: artificial neural network
  • NN: neural network
  • the basic unit of a neural network is a "neuron" that can be thought of as a computation and storage unit.
  • Calculation means that the neuron computes on its input signal.
  • Storage means that the neuron temporarily stores the calculation result and passes it to the next layer of neurons.
  • the basic structure of a neural network is to link the signals of many input neurons as an output signal of an output neuron. The output signal of this output neuron can also be the input of another "neuron".
  • In the prior art, the input signal of the neural network is processed by a parameter sharing method, that is, a smaller parameter template is slid over the spatial domain of the input signal for filtering, which is similar to convolving the input signal with a convolution template. Therefore, this artificial neural network is also called a convolutional neural network (CNN).
  • CNN convolutional neural network
  • Mainstream convolutional neural networks are large in scale and extremely computationally intensive, and the huge computational requirement is the main obstacle to implementing CNN-based artificial intelligence algorithms. Therefore, how to improve the related algorithms and the processing efficiency of the CNN algorithm, so as to reduce the computational complexity of the CNN convolutional layers, is currently a hot issue of concern and research.
  • the technical problem to be solved by the embodiments of the present application is to provide a signal processing method and device, which can reduce the amount of signal processing operations to a certain extent, thereby improving signal processing efficiency.
  • An embodiment of the present application provides a signal processing method. The method includes: first acquiring at least two real input signal matrices and splicing the at least two real input signal matrices into a complex input signal matrix; then acquiring a complex convolution kernel matrix of the complex input signal matrix; next performing a Fourier transform on the complex input signal matrix and on the complex convolution kernel matrix respectively to obtain a first matrix of the complex input signal matrix and a second matrix of the complex convolution kernel matrix, and performing complex matrix point multiplication on the first matrix and the second matrix to obtain a third matrix; and finally obtaining a real output signal matrix by performing an inverse Fourier transform on the third matrix.
  • the real input signal matrix includes a plurality of real elements, each real element being a computer-processable initial signal;
  • the complex convolution kernel matrix includes a plurality of complex elements, each complex element being a complex convolution kernel coefficient, where the complex convolution kernel coefficients are obtained by splicing real convolution kernel coefficients, and the complex convolution kernel matrix is in one-to-one correspondence with the complex input signal matrix;
  • the real output signal matrix is a convolution operation result and includes a plurality of computer-processable output signals.
  • After splicing the received at least two real input signal matrices into a complex input signal matrix, the embodiment of the present application obtains the real output signal matrix by performing Fourier transforms on the complex input signal matrix and on the complex convolution kernel matrix corresponding to the complex input signal matrix. Compared with obtaining the real output signal matrix by directly performing a convolution operation on the at least two real input signal matrices and the real convolution kernel matrices corresponding to them, the calculation amount of the signal processing can be effectively reduced, thereby saving software and hardware resources to a certain extent and improving signal processing efficiency. A minimal numerical sketch of this idea follows.
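  • The following minimal NumPy sketch illustrates the idea (it is not the patented implementation itself): two real 2-D matrices are spliced into one complex matrix, the two corresponding kernels are spliced with the sign of the imaginary part inverted, and a single FFT / point-multiplication / IFFT pass recovers the sum of the two circular convolutions. The sizes, variable names, and random test data are illustrative assumptions.

```python
import numpy as np

N = 8                                          # FFT size (assumption for illustration)
rng = np.random.default_rng(0)
in0, in1 = rng.standard_normal((2, N, N))      # two real input signal matrices
f0, f1 = rng.standard_normal((2, N, N))        # two real convolution kernel matrices,
                                               # already padded to the FFT size

# Splice: real part = in0, imaginary part = in1; kernel imaginary part sign-inverted.
in_C = in0 + 1j * in1                          # complex input signal matrix
f_C = f0 - 1j * f1                             # complex convolution kernel matrix

# One 2-D FFT per complex matrix, one complex point multiplication, one IFFT.
third = np.fft.fft2(in_C) * np.fft.fft2(f_C)   # the "third matrix"
out = np.real(np.fft.ifft2(third))             # real output signal matrix

# Reference: transforming each real channel separately (4 FFTs + 1 IFFT).
ref = np.real(np.fft.ifft2(np.fft.fft2(in0) * np.fft.fft2(f0)
                           + np.fft.fft2(in1) * np.fft.fft2(f1)))
assert np.allclose(out, ref)
```

  • The spliced path uses two forward FFTs instead of four for the same pair of channels, which is the source of the claimed reduction in computation.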
  • In a possible embodiment, acquiring at least two real input signal matrices comprises: first receiving at least two real input signal matrices, and then grouping the at least two real input signal matrices, each group comprising two real input signal matrices.
  • Splicing the at least two real input signal matrices into a complex input signal matrix includes: for each group, splicing the two real input signal matrices included in the group into a complex input signal matrix, where each group correspondingly obtains one complex input signal matrix.
  • The real part of the complex input signal matrix is one real input signal matrix of the two real input signal matrices included in the group, and the imaginary part of the complex input signal matrix is the other real input signal matrix of the two real input signal matrices included in the group.
  • In a possible embodiment, the real part of the complex input signal matrix is one real input signal matrix of the two real input signal matrices included in each group, and the imaginary part of the complex input signal matrix is the real input signal matrix obtained by inverting the sign of the other real input signal matrix of the two real input signal matrices included in the group.
  • Acquiring the complex convolution kernel matrix of the complex input signal matrix comprises: first acquiring at least two real convolution kernel matrices, and then splicing the at least two real convolution kernel matrices into a complex convolution kernel matrix.
  • The complex convolution kernel matrix is in one-to-one correspondence with the complex input signal matrix; the real part of the complex convolution kernel matrix is the real convolution kernel matrix corresponding to the real part of the complex input signal matrix, and the imaginary part of the complex convolution kernel matrix is the real convolution kernel matrix obtained by inverting the sign of the real convolution kernel matrix corresponding to the imaginary part of the complex input signal matrix.
  • In a possible embodiment, the real part of the complex convolution kernel matrix is the real convolution kernel matrix corresponding to the real part of the complex input signal matrix, and the imaginary part of the complex convolution kernel matrix is the real convolution kernel matrix corresponding to the imaginary part of the complex input signal matrix.
  • If the received at least two real input signal matrices are two real input signal matrices, the two real input signal matrices can only be spliced into one complex input signal matrix, and correspondingly only one third matrix is obtained. Therefore, obtaining the real output signal matrix by performing an inverse Fourier transform on the third matrix includes: first performing an inverse Fourier transform on the third matrix to obtain a complex output signal matrix, and then taking the real part of the complex output signal matrix to obtain the real output signal matrix.
  • If the received at least two real input signal matrices include more than two real input signal matrices, they may be spliced into multiple complex input signal matrices, and correspondingly multiple third matrices are obtained.
  • In this case, obtaining the real output signal matrix by performing an inverse Fourier transform on the third matrices includes: first adding the third matrices of all groups to obtain a sum matrix; then performing an inverse Fourier transform on the sum matrix to obtain a complex output signal matrix; and finally taking the real part of the complex output signal matrix to obtain the real output signal matrix.
  • the initial signal is at least one of an image signal, an audio signal, a sensor signal, or a communication signal
  • the real input signal matrix may be the real output signal matrix of a previous stage, and the real input signal matrix is input through a circuit interface or a software logic interface;
  • the real convolution kernel matrix is obtained according to a preset convolution kernel coefficient, and the convolution kernel coefficient is stored in reverse order.
  • An embodiment of the present application provides a signal processing apparatus, where the apparatus includes: a first acquiring module, a splicing module, a second acquiring module, a first processing module, a second processing module, and a third acquiring module, and these modules are used to execute any one of the methods described in the above first aspect.
  • An embodiment of the present application provides a data processing apparatus, including a processor and a memory, where the processor and the memory are connected by a bus, the memory stores executable program code, and the processor is configured to invoke the executable program code to perform the signal processing method according to any one of claims 1 to 10.
  • an embodiment of the present application provides a computer readable storage medium having instructions stored therein that, when run on a computer, cause the computer to perform the methods described in the above aspects.
  • an embodiment of the present application provides a computer program product comprising instructions, which when executed on a computer, cause the computer to perform the method described in the above aspects.
  • FIG. 1 is a schematic structural diagram of a fully connected neural network according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a neural network implementation scenario provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of an FFT accelerated convolution algorithm according to an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a signal processing method according to an embodiment of the present application.
  • FIG. 5a is a schematic diagram of another FFT acceleration convolution algorithm according to an embodiment of the present application.
  • FIG. 5b is a schematic diagram of still another FFT accelerated convolution algorithm according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a method for implementing signal processing according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a signal processing apparatus according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of another signal processing apparatus according to an embodiment of the present application.
  • FIG. 1 is a schematic structural diagram of a full-connect (FC) neural network according to an embodiment of the present application.
  • a node of a neural network that is, a calculation and storage unit of a neuron, is represented by a circle, and its stored value is the signal value of the node.
  • the circle labeled "+1" in the figure is the offset node.
  • The layer formed by all the leftmost nodes of the neural network is called the input layer, Layer L1; a computer-processable input signal is input through the input layer, the input signal includes an image signal, an audio signal, a sensor signal, a communication signal, and the like, and the input signal can be input through a circuit interface or a software logic interface.
  • The layer formed by all the nodes in the middle of the neural network is called the hidden layer, Layer L2.
  • FIG. 1 takes a hidden layer with only one layer of nodes as an example; in practice, there may be multiple hidden layers.
  • The layer formed by all the rightmost nodes of the neural network is called the output layer, Layer L3.
  • FIG. 1 takes an output layer with only one node as an example; in practice, there may be multiple nodes.
  • the neural network links the signals of multiple input neurons as an output signal of the output neuron, and the output signal of this output neuron can also be the input of another neuron. Any layer in the neural network can be thought of as a first-level logical processing of the signal.
  • x_1, x_2, x_3 are input signals, which are input through the input layer of the neural network.
  • H_{W,b}(x) is the output signal, which is output through the output layer of the neural network.
  • the neural network non-linearly transforms the input signal through a large number of internal nodes to achieve the purpose of processing information.
  • The embodiment of the present application takes processing the input signal with a non-linear activation function f(x) as an example; the function f(x) may be, for example, a rectified linear unit (ReLU), f(x) = max(0, x).
  • the output signal can be directly output, or the output signal can be used as an input signal of the neuron of the next stage of the neural network, thereby continuing to process the signal.
  • the above process is an embodiment of the forward propagation of a neural network.
  • The forward propagation of the neural network can be used to make decisions on the signal, and applications of the convolutional neural network typically use forward propagation.
  • the number of parameters of the fully connected neural network is relatively large, which may cause the neural network to be too large in size. Therefore, the embodiment of the present application provides a method for parameter sharing, that is, sliding filtering is performed on a spatial domain of an input signal by using a smaller parameter template, which is similar to convolution of an input signal by using a convolution template.
  • This neural network is therefore also referred to as a convolutional neural network (CNN), which may comprise multiple convolutional layers. Its operation can be described by the following formula.
  • Let the input signal to be convolved be f(u), u = 0…N−1, and the convolution kernel be h(v), v = 0…n−1, n ≤ N, where N is the number of input signal samples and n is the number of convolution kernel coefficients; the linear convolution of the two is: y(u) = Σ_i f(i)·h(u−i), u = 0…N+n−2, where h(i) = 0 for i < 0 or i > n−1.
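  • As a quick check of the linear convolution formula above, the following sketch evaluates y(u) = Σ_i f(i)·h(u−i) directly and compares it with NumPy's built-in linear convolution; the signal values and lengths are illustrative assumptions.

```python
import numpy as np

N, n = 6, 3                                  # assumed signal and kernel lengths
f = np.arange(1.0, N + 1)                    # input signal f(u), u = 0..N-1
h = np.array([1.0, -2.0, 0.5])               # convolution kernel h(v), v = 0..n-1

y = np.zeros(N + n - 1)
for u in range(N + n - 1):
    for i in range(N):
        if 0 <= u - i < n:                   # h(i) = 0 outside 0..n-1
            y[u] += f[i] * h[u - i]

assert np.allclose(y, np.convolve(f, h))     # matches NumPy's linear convolution
```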
  • the convolutional neural network can be used for functions such as image processing, such as image object detection, image object classification, and the like.
  • the current mainstream convolutional neural network is still large. How to accelerate the convolutional layer operation is the key to deploying CNN.
  • The embodiment of the present application provides a method for implementing the convolution operation by using the Fast Fourier Transform (FFT), which can utilize the efficiency of the FFT operation to accelerate the convolution operation while ensuring that the prediction performance of the neural network does not change.
  • FFT Fast Fourier Transform
  • The circular convolution of two functions, for example f and h, can be implemented by FFT and Inverse Fast Fourier Transform (IFFT): f ⊛ h = IFFT(FFT(f) ⊙ FFT(h)), where the symbol ⊙ represents complex matrix point multiplication (element-wise multiplication).
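  • A minimal sketch of this identity, using an illustrative length and random data (assumptions, not values from the patent):

```python
import numpy as np

N = 8
rng = np.random.default_rng(1)
f = rng.standard_normal(N)
h = rng.standard_normal(N)

# Circular convolution via the frequency domain: IFFT(FFT(f) * FFT(h)).
via_fft = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(h)))

# Direct circular convolution for comparison.
direct = np.array([sum(f[i] * h[(u - i) % N] for i in range(N)) for u in range(N)])
assert np.allclose(via_fft, direct)
```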
  • the input signal of the neural network may be various forms of signals such as a voice signal, a text signal, an image signal, and a temperature signal
  • The voice signal may be a voice signal recorded by a recording device, a mobile phone, or a fixed-line telephone.
  • The text signal may be a TXT text signal, a Word text signal, a PDF text signal, and the like.
  • The image signal may be a landscape image captured by a camera, an image of a community environment captured by a monitoring device, a facial image acquired by an access control system, and the like; in addition, the input signals of the neural network include various other computer-processable engineering signals, which are not enumerated here.
  • the embodiment of the present application provides a specific implementation scenario of the neural network 100.
  • The mobile smart phone client 201 initiates a voice call to the mobile smart phone client 205; the voice signal is sent by the smart phone 202, forwarded by the base station 203, and received by the smart phone 204.
  • Because a sudden rainstorm with strong lightning and thunder occurs when the voice call is initiated, the input signal 206 is severely weakened and contains a large amount of noise.
  • The input signal may be, for example, a one-dimensional digital voice signal. The smart phone 204 is equipped with a neural network 100, which may be implemented as a dedicated circuit in a chip, or may be program instructions running on a central processing unit (CPU) or another processor.
  • CPU central processing unit
  • The input signal 206 is processed in the neural network in the smart phone 204; the processing includes noise removal, effective signal enhancement, and the like, resulting in an output signal 207 that completely preserves the voice information transmitted by the calling user and avoids the interference of the harsh natural environment on the signal.
  • An embodiment of the present application further provides an FFT accelerated convolution algorithm for each convolutional layer of a convolutional neural network.
  • FIG. 3 takes only two real input signals as an example; other cases can be deduced by analogy.
  • The input signal in_l and the convolution kernel f_{l,k} are first expanded into complex matrices whose imaginary parts are 0, the input signal and the convolution kernel are then respectively subjected to the FFT, and the FFT results of the two are subjected to complex point multiplication.
  • The point multiplication results of the L input channels are accumulated, that is, the complex matrices are summed, and the accumulated result is subjected to the IFFT to obtain the result of the k-th output channel of the convolutional layer: out_k = IFFT(Σ_{l=0…L−1} FFT(in_l) ⊙ FFT(f_{l,k})).
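  • A minimal sketch of this per-channel baseline (the channel count, matrix sizes, and random data are assumptions for illustration):

```python
import numpy as np

L, N = 4, 8                                   # input channels and FFT size (assumptions)
rng = np.random.default_rng(2)
ins = rng.standard_normal((L, N, N))          # real input signal matrices in_l
kers = rng.standard_normal((L, N, N))         # real convolution kernels f_{l,k}

acc = np.zeros((N, N), dtype=complex)
for l in range(L):
    # Each real matrix is treated as complex with zero imaginary part.
    acc += np.fft.fft2(ins[l] + 0j) * np.fft.fft2(kers[l] + 0j)   # L point products
out_k = np.real(np.fft.ifft2(acc))            # result of the k-th output channel
```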
  • the fast Fourier transform is implemented based on the complex operation.
  • the efficiency of the FFT operation can be used to further accelerate the convolution operation, and the prediction performance of the neural network is unchanged.
  • The embodiment of the present application provides a signal processing method, which can solve the problem that, for a convolutional neural network whose input signals are two-dimensional real matrices, the complex arithmetic capability of the FFT is underutilized when the FFT is used to accelerate the convolution operation, thereby further reducing the amount of signal processing computation and increasing signal processing efficiency.
  • a signal processing method provided by an embodiment of the present application can be applied to a convolutional neural network in the field of artificial intelligence.
  • The method can be applied to a computer system that includes one or more arithmetic units (such as a CPU) and one or more storage units (such as a hard disk).
  • the computer system includes, but is not limited to, a PC, a server, a graphics processing unit (GPU), a mobile phone, and a dedicated computing processing chip (eg, an artificial intelligence processing chip).
  • the neural network can be a program running on a CPU or other processor, or it can be implemented in a chip in the form of a dedicated circuit.
  • a signal processing method provided by an embodiment of the present application is described by taking a certain convolutional layer of a convolutional neural network as an example, and the signal processing process of other convolutional layers of the convolutional neural network is deduced by analogy.
  • FIG. 4 shows another signal processing method provided by the embodiment of the present application, in which the operation processing of the foregoing FIG. 3 is further optimized to improve operation efficiency; the method includes but is not limited to the following steps.
  • the real input signal matrix includes a plurality of real elements, each real element being a computer-processable initial signal, and the initial signal is at least one of an image signal, an audio signal, a sensor signal, or a communication signal.
  • the real-numbered input signal matrix may be an array, a one-dimensional vector, a two-dimensional vector, or the like, which is not limited in the embodiment of the present application.
  • the at least two real input signal matrices may be an initial input signal matrix of the convolutional neural network, that is, a signal value stored or received by an input layer of the convolutional neural network, or may be a real output signal of a previous stage of the convolutional neural network. matrix.
  • the real input signal matrix may be input through a circuit interface or may be input through a software logic interface.
  • The specific manner of obtaining at least two real input signal matrices includes: for the current convolutional layer of the convolutional neural network, if the current convolutional layer has only two input channels, directly receiving the two real input signal matrices input through the two input channels; if the current convolutional layer has only one input channel or more than two input channels, first receiving the at least two real input signal matrices input through the input channels, and then grouping the received at least two real input signal matrices, each group including two real input signal matrices.
  • The real input signal matrices correspond to the input channels, and the grouping may be performed according to the channel identifiers corresponding to the real input signal matrices.
  • the at least two real input signal matrices are spliced into a complex input signal matrix.
  • After the at least two real input signal matrices are grouped, each group includes two real input signal matrices.
  • Splicing the two real input signal matrices included in each group into a complex input signal matrix, where each group correspondingly obtains one complex input signal matrix, includes: taking one real input signal matrix of the two real input signal matrices as the real part of the complex input signal matrix, and taking the other real input signal matrix of the two real input signal matrices as the imaginary part of the complex input signal matrix.
  • two real input signal matrices included in each packet are spliced into a complex input signal matrix, and each packet corresponds to a complex input signal matrix.
  • The real part of the complex input signal matrix is one real input signal matrix of the two real input signal matrices included in each group, and the imaginary part of the complex input signal matrix is the real input signal matrix obtained by inverting the sign of the other real input signal matrix of the two real input signal matrices included in the group.
  • If the current convolutional layer has only two input channels, the received real input signal matrices do not need to be grouped. According to the input order of the real input signal matrices, the real input signal matrix input through one of the two input channels may be used as the real part of the complex input signal matrix, and the real input signal matrix input through the other of the two input channels may be used as the imaginary part of the complex input signal matrix.
  • Alternatively, the real input signal matrix input through one of the two input channels may be used as the real part of the complex input signal matrix, and the real input signal matrix obtained by inverting the sign of the real input signal matrix input through the other of the two input channels may be used as the imaginary part of the complex input signal matrix.
  • If the number of the received real input signal matrices is odd, the remaining real input signal matrix cannot be spliced with another real input signal matrix.
  • In this case, the remaining real input signal matrix may be used as the real part of a complex input signal matrix with the imaginary part of that complex input signal matrix set to 0, or the remaining real input signal matrix may be directly subjected to a convolution operation. After the at least two real input signal matrices are spliced, one complex input signal matrix or a plurality of complex input signal matrices may be obtained. A sketch of this grouping and splicing follows.
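  • The following sketch pairs the channels in index order and handles an odd leftover channel by giving it a zero imaginary part; the grouping rule and test data are assumptions for illustration.

```python
import numpy as np

def splice_channels(real_inputs):
    """real_inputs: list of 2-D real matrices, one per input channel."""
    complex_inputs = []
    for i in range(0, len(real_inputs) - 1, 2):          # pair channels i and i+1
        complex_inputs.append(real_inputs[i] + 1j * real_inputs[i + 1])
    if len(real_inputs) % 2 == 1:                        # odd channel count
        complex_inputs.append(real_inputs[-1] + 0j)      # imaginary part set to 0
    return complex_inputs

channels = [np.full((4, 4), float(c)) for c in range(5)]  # 5 channels
assert len(splice_channels(channels)) == 3                # -> 3 complex matrices
```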
  • the complex convolution kernel matrix includes a plurality of complex elements, each complex element is a complex convolution kernel coefficient, and the complex convolution kernel coefficients are obtained by splicing according to real convolution kernel coefficients.
  • the complex convolution kernel matrix has a one-to-one correspondence with the complex input signal matrix.
  • At least two real convolution kernel matrices are obtained, where each real convolution kernel matrix includes a plurality of real elements, each real element is a real convolution kernel coefficient, and the at least two real convolution kernel matrices are in one-to-one correspondence with the at least two real input signal matrices, for example, they may share the same channel identifiers. The at least two real convolution kernel matrices are then spliced into a complex convolution kernel matrix, where the real part of the complex convolution kernel matrix is the real convolution kernel matrix corresponding to the real part of the complex input signal matrix, and the imaginary part of the complex convolution kernel matrix is the real convolution kernel matrix obtained by inverting the sign of the real convolution kernel matrix corresponding to the imaginary part of the complex input signal matrix.
  • In a possible embodiment, splicing the at least two real convolution kernel matrices into a complex convolution kernel matrix includes: using the real convolution kernel matrix corresponding to the real part of the complex input signal matrix as the real part of the complex convolution kernel matrix, and using the real convolution kernel matrix corresponding to the imaginary part of the complex input signal matrix as the imaginary part of the complex convolution kernel matrix.
  • the complex convolution kernel matrix may be preset by the related device according to a preset rule, and the complex convolution kernel matrix corresponding to the spliced complex input signal matrix may be directly obtained. Therefore, the operation of merging the real convolution kernel matrix to obtain the complex convolution kernel matrix can be avoided, and the computational complexity of the convolutional neural network can be reduced to some extent, and the efficiency of signal processing is improved.
  • Performing a Fourier transform on the complex input signal matrix and the complex convolution kernel matrix respectively includes: performing a discrete Fourier transform (DFT) or a fast Fourier transform (FFT) on the complex input signal matrix and the complex convolution kernel matrix respectively.
  • DFT discrete Fourier transform
  • Each complex input signal matrix is subjected to a Fourier transform to obtain a first matrix, and each complex convolution kernel matrix is subjected to a Fourier transform to obtain a second matrix; the first matrix and the second matrix are complex matrices.
  • the real output signal matrix is a convolution operation result of the current convolutional layer of the convolutional neural network, and includes a plurality of computer-processable output signals.
  • If the received at least two real input signal matrices are two real input signal matrices, the two real input signal matrices can only be spliced into one complex input signal matrix; after the first matrix and the second matrix are subjected to complex matrix point multiplication, only one third matrix is obtained.
  • Obtaining the real output signal matrix then includes: first performing an inverse Fourier transform on the third matrix to obtain a complex output signal matrix, and then taking the real part of the complex output signal matrix to obtain the real output signal matrix. If the received at least two real input signal matrices include more than two real input signal matrices, they may be spliced into a plurality of complex input signal matrices, corresponding to a plurality of third matrices.
  • Performing complex matrix point multiplication on the first matrix and the second matrix to obtain a third matrix includes: for each group, performing complex matrix point multiplication on the first matrix and the second matrix of the group, where each group correspondingly obtains one third matrix.
  • Obtaining the real output signal matrix by performing an inverse Fourier transform on the third matrix includes: first adding the third matrices of all groups, that is, adding the elements at corresponding positions in the third matrices, to obtain a sum matrix; then performing an inverse Fourier transform on the sum matrix to obtain a complex output signal matrix; and finally taking the real part of the complex output signal matrix to obtain the real output signal matrix. A numerical sketch of this accumulation follows.
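  • A minimal sketch of the grouped accumulation with a single IFFT, checked against the per-channel scheme of FIG. 3; the channel count, matrix sizes, and random data are assumptions.

```python
import numpy as np

L, N = 6, 8                                   # even channel count and FFT size (assumptions)
rng = np.random.default_rng(4)
ins = rng.standard_normal((L, N, N))          # real input signal matrices
kers = rng.standard_normal((L, N, N))         # real convolution kernels for output channel k

acc = np.zeros((N, N), dtype=complex)
for l in range(0, L, 2):                      # one group = channels l and l+1
    first = np.fft.fft2(ins[l] + 1j * ins[l + 1])      # FFT of the complex input matrix
    second = np.fft.fft2(kers[l] - 1j * kers[l + 1])   # FFT of the complex kernel matrix
    acc += first * second                     # third matrix, accumulated into the sum matrix
out_k = np.real(np.fft.ifft2(acc))            # real output signal matrix

baseline = np.real(np.fft.ifft2(sum(np.fft.fft2(ins[l]) * np.fft.fft2(kers[l])
                                    for l in range(L))))
assert np.allclose(out_k, baseline)
```

  • Only L forward FFTs (L/2 for the inputs and L/2 for the kernels) and one IFFT are needed here, versus 2L forward FFTs and one IFFT in the per-channel scheme.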
  • The foregoing describes the specific steps of determining the real output signal matrix of one output channel of the current convolutional layer of the convolutional neural network.
  • The real output signal matrices of the other output channels of the current convolutional layer of the convolutional neural network can be deduced by analogy and are not described here.
  • The real output signal matrix may be directly output by the convolutional neural network, or may be used as the real input signal matrix of the next convolutional layer of the convolutional neural network.
  • The following takes a convolutional neural network used for image processing as an example.
  • the input signal of each convolutional layer of the convolutional neural network is a combination of L 2D images (or 2D real matrices).
  • The number of input channels L is taken to be an even number as an example; the same applies when other types of images are input.
  • FFT(in_C) = FFT(in_0) + FFT(in_1)⋅i    (1);
  • FFT(f_C) = FFT(f_{0,k}) - FFT(f_{1,k})⋅i    (2);
  • FIG. 5b also takes only two real input signals as an example.
  • FFT(in_C) = FFT(in_0) - FFT(in_1)⋅i    (4);
  • FFT(f_C) = FFT(f_{0,k}) + FFT(f_{1,k})⋅i    (5);
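  • The two splicing variants, equations (1)-(2) (FIG. 5a) and equations (4)-(5) (FIG. 5b), produce the same real output; the following sketch verifies this numerically with illustrative sizes and random data (assumptions).

```python
import numpy as np

N = 8
rng = np.random.default_rng(3)
in0, in1, f0, f1 = rng.standard_normal((4, N, N))

# Variant of equations (1)-(2): in_C = in0 + i*in1, f_C = f0 - i*f1.
out_a = np.real(np.fft.ifft2(np.fft.fft2(in0 + 1j * in1) * np.fft.fft2(f0 - 1j * f1)))

# Variant of equations (4)-(5): in_C = in0 - i*in1, f_C = f0 + i*f1.
out_b = np.real(np.fft.ifft2(np.fft.fft2(in0 - 1j * in1) * np.fft.fft2(f0 + 1j * f1)))

assert np.allclose(out_a, out_b)   # both equal conv(in0, f0) + conv(in1, f1)
```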
  • the signal processing method provided by the embodiment of the present application implements two convolution summations, and only requires two FFTs and one IFFT.
  • the other FFT-accelerated convolution algorithm mentioned above (Fig. 3) requires 4 FFTs and 1 IFFT for 2 convolutional summations.
  • The signal processing method provided by the embodiment of the present application can therefore realize the FFT-based convolution with only 50% of the FFT operations and the same amount of complex point multiplication. Accordingly, the signal processing method provided by the embodiment of the present application can effectively reduce the FFT operation amount and improve signal processing efficiency.
  • The signal processing method provided by the embodiment of the present application uses the complex FFT to unify the processed data types, which avoids the increase in processing-flow complexity that a real FFT exploiting the symmetry of Hermitian matrices would introduce in an application specific integrated circuit (ASIC) implementation, and is therefore beneficial to modular implementation.
  • ASIC application specific integrated circuit
  • The at least two real input signal matrices in(n_0, n_1, l) are first grouped according to the input channel identifier; taking grouping in the order of the channel index as an example, channels {0, 1}, {2, 3}, and so on form the groups, and each group is spliced into a complex input signal matrix in_C.
  • If in_C is smaller than the FFT size, zeros are added to the right of and below the matrix in_C so that the size of in_C is consistent with the FFT size.
  • A 2D FFT is performed on the complex input signal matrix of each group, and each group obtains a first matrix; the results are saved for later use. A 2D FFT is also performed on the complex convolution kernel matrix of each group, and each group obtains a second matrix. Further, the first matrix of each group is subjected to complex matrix point multiplication with the corresponding second matrix, each group correspondingly obtains a third matrix, and the third matrices of all groups are accumulated to obtain a sum matrix, that is, the accumulated convolution result in the frequency domain. A zero-padding sketch follows.
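  • A minimal sketch of the zero padding for one group; the matrix sizes, the FFT size, and the use of SciPy's convolve2d purely as a reference check are assumptions for illustration.

```python
import numpy as np
from scipy.signal import convolve2d           # used only to check the result

N, n = 5, 3
fft_size = N + n - 1                          # smallest size that avoids wrap-around
rng = np.random.default_rng(5)
in0, in1 = rng.standard_normal((2, N, N))     # one group of real input matrices
f0, f1 = rng.standard_normal((2, n, n))       # the corresponding real kernels

def pad(m, size):                             # zeros added to the right and below
    out = np.zeros((size, size))
    out[: m.shape[0], : m.shape[1]] = m
    return out

in_C = pad(in0, fft_size) + 1j * pad(in1, fft_size)
f_C = pad(f0, fft_size) - 1j * pad(f1, fft_size)
out = np.real(np.fft.ifft2(np.fft.fft2(in_C) * np.fft.fft2(f_C)))

ref = convolve2d(in0, f0, mode="full") + convolve2d(in1, f1, mode="full")
assert np.allclose(out, ref)
```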
  • the at least two real input signal matrices may be grouped in groups according to the order of the channel index, or the at least two real input signal matrices may be grouped according to the parity of the channel index.
  • the at least two real input signal matrices may be grouped in groups according to other rules, which are not limited in this embodiment.
  • The signal processing method provided by the embodiment of the present application splices the real input signal matrices of two input channels into a complex input signal matrix and splices their corresponding real convolution kernel matrices into a complex convolution kernel matrix, and then performs FFT operations on the complex input signal matrix and the complex convolution kernel matrix, thereby making full use of the complex computing capability of the FFT and reducing the number of FFTs used in the convolutional layer; this reduces the time for the processor to run the convolutional neural network, reduces system power consumption, and saves hardware and software resources.
  • the processor comprises a general purpose processor (such as a CPU) or a logic circuit processor.
  • a signal processing method provided by an embodiment of the present application can be applied to a convolutional neural network based on CPU operation, and can also be applied to a convolutional neural network based on ASIC implementation, and can also be applied to a convolutional neural network based on GPU operation.
  • The model file of the convolutional neural network is stored in an external storage medium, such as double data rate synchronous dynamic random access memory (DDR SDRAM).
  • DDR double data rate synchronous dynamic random access memory
  • When the convolutional neural network performs the operations of a certain convolutional layer, the computing unit of the CPU, GPU, or ASIC first reads the convolution kernel coefficients of the current convolutional layer from the model file in the storage medium together with the parameters of the current convolutional layer, which include the number of input channels, the number of output channels, the convolution kernel size, the convolution step size, and so on, and then performs the CNN operations according to the above convolution kernel coefficients and parameters.
  • The intermediate data generated during the processing of the arithmetic unit can also be temporarily stored in the storage medium and read back when needed. In some feasible implementations, please refer to FIG.
  • The embodiment of the present application may perform model preprocessing on the original model file of the convolutional neural network and store the preprocessed model file in an internal or external storage medium.
  • When the convolutional neural network performs the operations of a convolutional layer, the operation unit first reads the model coefficients of the current convolutional layer and the parameters of the current convolutional layer from the preprocessed model file in the storage medium, and then performs the CNN operations.
  • the intermediate data during the processing of the arithmetic unit can also be temporarily stored in the storage medium and read back when it needs to be read.
  • The convolution coefficients in a CNN are in the reverse order of classical convolution coefficients. Therefore, preprocessing the original model file includes: reversing the order of the original real convolution kernel coefficients and storing them in the model file. This makes it possible to avoid the reverse-order operation on the real convolution kernel matrix when the convolutional neural network performs real-time processing, and reduces the computational load of the convolutional neural network.
  • the real convolution kernel matrix may be determined according to the real convolution kernel coefficients stored in the reverse order, and the real convolution kernel matrix is stored in the storage medium. Therefore, the operation of determining the real convolution kernel matrix according to the real convolution kernel coefficient when the convolutional neural network performs real-time processing can be avoided, and the calculation amount of the convolutional neural network is further reduced.
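  • A minimal sketch of this preprocessing step; the kernel values and the model file name are hypothetical and used only for illustration.

```python
import numpy as np

kernel = np.arange(9.0).reshape(3, 3)            # original real convolution kernel coefficients

reversed_kernel = kernel[::-1, ::-1].copy()      # store the coefficients in reverse order
np.save("model_layer0_kernel.npy", reversed_kernel)   # hypothetical model file name

# At run time, the pre-reversed kernel is loaded and used directly,
# avoiding the reverse-order operation during real-time processing.
loaded = np.load("model_layer0_kernel.npy")
assert np.array_equal(loaded, kernel[::-1, ::-1])
```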
  • the real convolution kernel matrix of the convolutional neural network is stored in a storage medium in the following format:
  • Preprocessing the original model file further includes: inverting the sign of the real convolution kernel matrix that serves as the imaginary part of the complex convolution kernel matrix determined according to the preset rule, and storing the result in the storage medium.
  • the real convolution kernel matrix may be spliced into a complex convolution kernel matrix according to a preset rule, and the complex convolution kernel matrix is stored in the storage medium.
  • the storage format of the real convolution kernel matrix in the storage medium can be adjusted according to the complex format of the processor, so as to better read the real convolution kernel matrix.
  • Whether to use the FFT to implement the convolution is determined according to a preset rule. For example, the rule may be that the block size or the rank of the convolution kernel matrix corresponding to the current convolutional layer is greater than or equal to a preset value; if so, the signal processing method provided by the embodiment of the present application is used to perform the signal processing of the convolutional neural network, and if not, the signals of the convolutional neural network are processed using a conventional convolution operation. A sketch of such a rule follows.
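  • A minimal sketch of such a preset rule; the threshold value and the parameter names are assumptions, not values given in the patent.

```python
def choose_conv_path(kernel_size: int, block_size: int, threshold: int = 7) -> str:
    """Return 'fft' when the kernel/block is large enough to amortise the FFTs."""
    if kernel_size >= threshold or block_size >= threshold:
        return "fft"          # use the FFT-based signal processing method
    return "direct"           # use a conventional convolution operation

assert choose_conv_path(kernel_size=11, block_size=4) == "fft"
assert choose_conv_path(kernel_size=3, block_size=4) == "direct"
```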
  • In the embodiment of the present application, at least two real input signal matrices are first obtained and spliced into a complex input signal matrix; then a complex convolution kernel matrix of the complex input signal matrix is obtained, and a Fourier transform is performed on the complex input signal matrix and on the complex convolution kernel matrix respectively to obtain a first matrix of the complex input signal matrix and a second matrix of the complex convolution kernel matrix; finally, complex matrix point multiplication is performed on the first matrix and the second matrix to obtain a third matrix, and a real output signal matrix is obtained by performing an inverse Fourier transform on the third matrix. This can reduce the amount of signal processing operations to a certain extent, thereby improving signal processing efficiency.
  • FIG. 7 is a schematic structural diagram of a signal processing apparatus according to an embodiment of the present application.
  • The signal processing apparatus shown in FIG. 7 may include a first obtaining module 701, a splicing module 702, a second obtaining module 703, a first processing module 704, a second processing module 705, and a third obtaining module 706; the detailed description of each module is as follows.
  • the first obtaining module 701 is configured to obtain at least two real input signal matrices, where the real input signal matrix includes a plurality of real elements, each of the real elements being a computer-processable initial signal.
  • the splicing module 702 is configured to splicing the at least two real input signal matrices into a complex input signal matrix.
  • the second obtaining module 703 is configured to obtain a complex convolution kernel matrix of the complex input signal matrix, where the complex convolution kernel matrix includes a plurality of complex elements, and each complex element is a complex convolution kernel coefficient.
  • The first processing module 704 is configured to perform a Fourier transform on the complex input signal matrix and the complex convolution kernel matrix respectively to obtain a first matrix of the complex input signal matrix and a second matrix of the complex convolution kernel matrix.
  • the second processing module 705 is further configured to perform complex matrix dot multiplication on the first matrix and the second matrix to obtain a third matrix.
  • the third obtaining module 706 is further configured to obtain a real output signal matrix by performing inverse Fourier transform on the third matrix, where the real output signal matrix is a convolution operation result, and includes multiple computer processable outputs. signal.
  • the first obtaining module 701 specifically includes:
  • the receiving unit 7011 is configured to receive at least two real input signal matrices.
  • the grouping unit 7012 is configured to group the at least two real input signal matrices, and each group includes two real input signal matrices.
  • the splicing module 702 is specifically configured to splicing the two real input signal matrices into the complex input signal matrix for each of the groups.
  • The real part of the complex input signal matrix is one real input signal matrix of the two real input signal matrices, and the imaginary part of the complex input signal matrix is the other real input signal matrix of the two real input signal matrices.
  • Optionally, the real part of the complex input signal matrix is one real input signal matrix of the two real input signal matrices, and the imaginary part of the complex input signal matrix is the real input signal matrix obtained by inverting the sign of the other real input signal matrix of the two real input signal matrices.
  • the second obtaining module 703 specifically includes:
  • the first obtaining unit 7031 is configured to acquire at least two real convolution kernel matrices.
  • the splicing unit 7032 is further configured to splicing the at least two real convolution kernel matrices into the complex convolution kernel matrix.
  • The real part of the complex convolution kernel matrix is the real convolution kernel matrix corresponding to the real part of the complex input signal matrix, and the imaginary part of the complex convolution kernel matrix is the real convolution kernel matrix obtained by inverting the sign of the real convolution kernel matrix corresponding to the imaginary part of the complex input signal matrix.
  • Optionally, the real part of the complex convolution kernel matrix is the real convolution kernel matrix corresponding to the real part of the complex input signal matrix, and the imaginary part of the complex convolution kernel matrix is the real convolution kernel matrix corresponding to the imaginary part of the complex input signal matrix.
  • the third obtaining module 706 specifically includes:
  • the processing unit 7061 is configured to perform inverse Fourier transform on the third matrix to obtain a complex output signal matrix.
  • the second obtaining unit 7062 is configured to obtain a real part of the complex output signal matrix to obtain the real output signal matrix.
  • the second processing module 705 is specifically configured to perform complex matrix dot multiplication on the first matrix and the second matrix for each of the packets to obtain a third matrix.
  • the third obtaining module 706 specifically includes:
  • the adding unit 7063 is further configured to add the third matrix of each group to obtain a sum matrix.
  • the processing unit 7061 is further configured to perform inverse Fourier transform on the sum matrix to obtain a complex output signal matrix.
  • the second obtaining unit 7062 is configured to obtain a real part of the complex output signal matrix to obtain the real output signal matrix.
  • the initial signal is at least one of an image signal, an audio signal, a sensor signal, or a communication signal.
  • the real input signal matrix is a matrix of real output signals of the previous stage, and the real input signal matrix is input through a circuit interface or a software logic interface.
  • the real convolution kernel matrix is obtained according to a preset convolution kernel coefficient, and the convolution kernel coefficients are stored in reverse order.
  • In the signal processing apparatus, at least two real input signal matrices are first obtained and spliced into a complex input signal matrix; then a complex convolution kernel matrix of the complex input signal matrix is obtained, and a Fourier transform is performed on the complex input signal matrix and on the complex convolution kernel matrix respectively to obtain a first matrix of the complex input signal matrix and a second matrix of the complex convolution kernel matrix; finally, complex matrix point multiplication is performed on the first matrix and the second matrix to obtain a third matrix, and a real output signal matrix is obtained by performing an inverse Fourier transform on the third matrix. This can reduce the amount of signal processing operations to a certain extent, thereby improving signal processing efficiency.
  • FIG. 8 is a schematic structural diagram of a signal processing apparatus according to an embodiment of the present disclosure.
  • the signal processing apparatus described in the embodiment of the present application includes: a processor 801, a communication interface 802, and a memory 803.
  • the processor 801, the communication interface 802, and the memory 803 can be connected by using a bus or other means.
  • the embodiment of the present application is exemplified by a bus connection.
  • The processor 801 may be a central processing unit (CPU), a network processor (NP), a graphics processing unit (GPU), or a combination of a CPU, a GPU, and an NP.
  • the processor 801 can also be a core for implementing communication identity binding in a multi-core CPU, a multi-core GPU, or a multi-core NP.
  • the processor 801 described above may be a hardware chip.
  • the hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (abbreviated as PLD), or a combination thereof.
  • ASIC application-specific integrated circuit
  • PLD programmable logic device
  • The above PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
  • the above communication interface 802 can be used for transceiving information or signaling interactions, as well as receiving and transmitting signals.
  • The memory 803 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and a program required for at least one function (such as a text storage function or a location storage function); the data storage area may store data created according to the use of the device (such as image data and text data), and may include an application storage program, and so on.
  • the memory 803 may include a high speed random access memory, and may also include a nonvolatile memory such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
  • the above memory 803 is also used to store program instructions.
  • When the processor 801 is a processor other than the hardware chip, it may invoke the program instructions stored in the memory 803 to implement the signal processing method shown in the embodiments of the present application.
  • the processor 801 calls the program instructions stored in the memory 803 to perform the following steps:
  • acquiring at least two real input signal matrices, the real input signal matrix comprising a plurality of real elements, each of the real elements being a computer-processable initial signal;
  • splicing the at least two real input signal matrices into a complex input signal matrix;
  • acquiring a complex convolution kernel matrix of the complex input signal matrix, the complex convolution kernel matrix comprising a plurality of complex elements, each complex element being a complex convolution kernel coefficient;
  • performing a Fourier transform on the complex input signal matrix and the complex convolution kernel matrix respectively to obtain a first matrix of the complex input signal matrix and a second matrix of the complex convolution kernel matrix;
  • performing complex matrix point multiplication on the first matrix and the second matrix to obtain a third matrix;
  • obtaining a real output signal matrix by performing an inverse Fourier transform on the third matrix, the real output signal matrix being a convolution operation result and comprising a plurality of computer-processable output signals.
  • the method performed by the processor in the embodiment of the present application is described from the perspective of a processor. It can be understood that the processor in the embodiment of the present application needs to cooperate with other hardware structures to perform the foregoing method. The specific implementation process is not described and limited in detail in the embodiments of the present application.
  • the communication interface 802 is configured to receive at least two real input signal matrices.
  • the processor 801 is configured to group the at least two real input signal matrices, and each packet includes two real input signal matrices.
  • the processor 801 is specifically configured to splicing the two real input signal matrices into the complex input signal matrix for each of the packets.
  • the real part of the complex input signal matrix is a real input signal matrix of the two real input signal matrices, and the imaginary part of the complex input signal matrix is the other of the two real input signal matrices Real input signal matrix.
  • Optionally, the real part of the complex input signal matrix is one real input signal matrix of the two real input signal matrices, and the imaginary part of the complex input signal matrix is the real input signal matrix obtained by inverting the sign of the other real input signal matrix of the two real input signal matrices.
  • the processor 801 is further configured to acquire at least two real convolution kernel matrices.
  • the processor 801 is further configured to splicing the at least two real convolution kernel matrices into the complex convolution kernel matrix.
  • The real part of the complex convolution kernel matrix is the real convolution kernel matrix corresponding to the real part of the complex input signal matrix, and the imaginary part of the complex convolution kernel matrix is the real convolution kernel matrix obtained by inverting the sign of the real convolution kernel matrix corresponding to the imaginary part of the complex input signal matrix.
  • Optionally, the real part of the complex convolution kernel matrix is the real convolution kernel matrix corresponding to the real part of the complex input signal matrix, and the imaginary part of the complex convolution kernel matrix is the real convolution kernel matrix corresponding to the imaginary part of the complex input signal matrix.
  • the processor 801 is further configured to perform inverse Fourier transform on the third matrix to obtain a complex output signal matrix.
  • the processor 801 is further configured to obtain a real part of the complex output signal matrix to obtain the real output signal matrix.
  • the processor 801 is specifically configured to perform complex matrix dot multiplication on the first matrix and the second matrix for each of the packets to obtain a third matrix.
  • the processor 801 is further configured to add a third matrix of each group to obtain a sum matrix.
  • the processor 801 is further configured to perform inverse Fourier transform on the sum matrix to obtain a complex output signal matrix.
  • the processor 801 is further configured to obtain a real part of the complex output signal matrix to obtain the real output signal matrix.
  • the initial signal is at least one of an image signal, an audio signal, a sensor signal, or a communication signal.
  • the real input signal matrix is a matrix of real output signals of the previous stage, and the real input signal matrix is input through a circuit interface or a software logic interface.
  • the real convolution kernel matrix is obtained according to a preset convolution kernel coefficient, and the convolution kernel coefficients are stored in reverse order.
  • The processor 801, the communication interface 802, and the memory 803 described in the embodiment of the present application may perform the implementations described in the signal processing method provided by the embodiments of the present application, and may also perform the implementations described for the signal processing apparatus provided by the embodiments of the present application, which are not described herein again.
  • In the signal processing apparatus, at least two real input signal matrices are first obtained and spliced into a complex input signal matrix; then a complex convolution kernel matrix of the complex input signal matrix is obtained, and a Fourier transform is performed on the complex input signal matrix and on the complex convolution kernel matrix respectively to obtain a first matrix of the complex input signal matrix and a second matrix of the complex convolution kernel matrix; finally, complex matrix point multiplication is performed on the first matrix and the second matrix to obtain a third matrix, and a real output signal matrix is obtained by performing an inverse Fourier transform on the third matrix. This can reduce the amount of signal processing operations to a certain extent, thereby improving signal processing efficiency.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions can be stored in a computer readable storage medium or transferred from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions can be from a website site, computer, server or data center Transmission to another website site, computer, server or data center by wire (eg coaxial cable, fiber optic, digital subscriber line (DSL)) or wireless (eg infrared, microwave, etc.).
  • the computer readable storage medium can be any available media that can be accessed by a computer or a data storage device such as a server, data center, or the like that includes one or more available media.
  • The available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (SSD)).

Abstract

A signal processing method and apparatus, the method comprising: acquiring at least two real input signal matrices; splicing the at least two real input signal matrices into a complex input signal matrix; acquiring a complex convolution kernel matrix of the complex input signal matrix; performing a Fourier transform on the complex input signal matrix and on the complex convolution kernel matrix respectively to obtain a first matrix of the complex input signal matrix and a second matrix of the complex convolution kernel matrix; performing complex matrix point multiplication on the first matrix and the second matrix to obtain a third matrix; and obtaining a real output signal matrix by performing an inverse Fourier transform on the third matrix. The embodiments of the present application can reduce the amount of signal processing computation to a certain extent, thereby improving signal processing efficiency.

Description

Signal processing method and apparatus
Technical Field
The present application relates to the field of signal processing technologies, and in particular, to a signal processing method and apparatus.
Background
Artificial neural networks (ANN), also referred to as neural networks (NN) for short, are network structures that imitate the behavioral characteristics of animal neural networks to process information. A signal to be processed is input into such a network and is non-linearly transformed by a large number of internal nodes, thereby achieving the purpose of processing information.
The basic unit of a neural network is the "neuron", which can be regarded as a calculation and storage unit. Calculation means that the neuron computes on its input signal. Storage means that the neuron temporarily stores the calculation result and passes it to the next layer of neurons. The basic structure of a neural network links the signals of many input neurons together as the output signal of one output neuron, and the output signal of this output neuron can in turn be the input of another "neuron".
In the prior art, the input signal of the neural network is processed by a parameter sharing method, that is, a smaller parameter template is slid over the spatial domain of the input signal for filtering, which is similar to convolving the input signal with a convolution template. Therefore, this artificial neural network is also called a convolutional neural network (CNN). Mainstream convolutional neural networks are large in scale and extremely computationally intensive, and the huge computational requirement is the main obstacle to implementing CNN-based artificial intelligence algorithms. Therefore, how to improve the related algorithms and the processing efficiency of the CNN algorithm, so as to reduce the computational complexity of the CNN convolutional layers, is currently a hot issue of concern and research.
Summary
The technical problem to be solved by the embodiments of the present application is to provide a signal processing method and apparatus, which can reduce the amount of signal processing computation to a certain extent, thereby improving signal processing efficiency.
In a first aspect, an embodiment of the present application provides a signal processing method. The method includes: first acquiring at least two real input signal matrices and splicing the at least two real input signal matrices into a complex input signal matrix; then acquiring a complex convolution kernel matrix of the complex input signal matrix; next performing a Fourier transform on the complex input signal matrix and on the complex convolution kernel matrix respectively to obtain a first matrix of the complex input signal matrix and a second matrix of the complex convolution kernel matrix, and performing complex matrix point multiplication on the first matrix and the second matrix to obtain a third matrix; and finally obtaining a real output signal matrix by performing an inverse Fourier transform on the third matrix.
In the embodiments of the present application, the real input signal matrix includes a plurality of real elements, each real element being a computer-processable initial signal; the complex convolution kernel matrix includes a plurality of complex elements, each complex element being a complex convolution kernel coefficient, where the complex convolution kernel coefficients are obtained by splicing real convolution kernel coefficients and the complex convolution kernel matrix is in one-to-one correspondence with the complex input signal matrix; and the real output signal matrix is a convolution operation result and includes a plurality of computer-processable output signals.
After splicing the received at least two real input signal matrices into a complex input signal matrix, the embodiments of the present application obtain the real output signal matrix by performing Fourier transforms on the complex input signal matrix and on the complex convolution kernel matrix corresponding to the complex input signal matrix. Compared with obtaining the real output signal matrix by directly performing a convolution operation on the at least two real input signal matrices and the real convolution kernel matrices corresponding to them, the embodiments of the present application can effectively reduce the amount of signal processing computation, thereby saving software and hardware resources to a certain extent and improving signal processing efficiency.
In a possible embodiment, acquiring at least two real input signal matrices includes: first receiving at least two real input signal matrices, and then grouping the at least two real input signal matrices, each group including two real input signal matrices. Splicing the at least two real input signal matrices into a complex input signal matrix includes: for each group, splicing the two real input signal matrices included in the group into a complex input signal matrix, where each group correspondingly obtains one complex input signal matrix. The real part of the complex input signal matrix is one real input signal matrix of the two real input signal matrices included in the group, and the imaginary part of the complex input signal matrix is the other real input signal matrix of the two real input signal matrices included in the group.
In a possible embodiment, the real part of the complex input signal matrix is one real input signal matrix of the two real input signal matrices included in each group, and the imaginary part of the complex input signal matrix is the real input signal matrix obtained by inverting the sign of the other real input signal matrix of the two real input signal matrices included in the group.
In a possible embodiment, acquiring the complex convolution kernel matrix of the complex input signal matrix includes: first acquiring at least two real convolution kernel matrices, and then splicing the at least two real convolution kernel matrices into a complex convolution kernel matrix. The complex convolution kernel matrix is in one-to-one correspondence with the complex input signal matrix, the real part of the complex convolution kernel matrix is the real convolution kernel matrix corresponding to the real part of the complex input signal matrix, and the imaginary part of the complex convolution kernel matrix is the real convolution kernel matrix obtained by inverting the sign of the real convolution kernel matrix corresponding to the imaginary part of the complex input signal matrix.
In a possible embodiment, the real part of the complex convolution kernel matrix is the real convolution kernel matrix corresponding to the real part of the complex input signal matrix, and the imaginary part of the complex convolution kernel matrix is the real convolution kernel matrix corresponding to the imaginary part of the complex input signal matrix.
In a possible embodiment, if the received at least two real input signal matrices are two real input signal matrices, the two real input signal matrices can only be spliced into one complex input signal matrix, and correspondingly only one third matrix is obtained. Therefore, obtaining the real output signal matrix by performing an inverse Fourier transform on the third matrix includes: first performing an inverse Fourier transform on the third matrix to obtain a complex output signal matrix, and then taking the real part of the complex output signal matrix to obtain the real output signal matrix.
In a possible embodiment, if the received at least two real input signal matrices include more than two real input signal matrices, they may be spliced into a plurality of complex input signal matrices, and correspondingly a plurality of third matrices may be obtained. Therefore, performing complex matrix point multiplication on the first matrix and the second matrix to obtain a third matrix includes: for each group, performing complex matrix point multiplication on the first matrix and the second matrix of the group, where each group correspondingly obtains one third matrix. Obtaining the real output signal matrix by performing an inverse Fourier transform on the third matrix includes: first adding the third matrices of all groups to obtain a sum matrix; then performing an inverse Fourier transform on the sum matrix to obtain a complex output signal matrix; and finally taking the real part of the complex output signal matrix to obtain the real output signal matrix.
In the embodiments of the present application, the initial signal is at least one of an image signal, an audio signal, a sensor signal, or a communication signal; the real input signal matrix is the real output signal matrix of a previous stage, and the real input signal matrix is input through a circuit interface or a software logic interface; the real convolution kernel matrix is obtained according to preset convolution kernel coefficients, and the convolution kernel coefficients are stored in reverse order.
In a second aspect, an embodiment of the present application provides a signal processing apparatus. The apparatus includes a first acquiring module, a splicing module, a second acquiring module, a first processing module, a second processing module, and a third acquiring module, where the above modules are configured to perform any one of the methods described in the first aspect.
In a third aspect, an embodiment of the present application provides a data processing apparatus, including a processor and a memory, where the processor and the memory are connected by a bus, the memory stores executable program code, and the processor is configured to invoke the executable program code to perform the signal processing method according to any one of claims 1 to 10.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the methods described in the above aspects.
In a fifth aspect, an embodiment of the present application provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the methods described in the above aspects.
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例和现有技术中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例提供的一种全连接神经网络的结构示意图;
图2是本申请实施例提供的一种神经网络实施场景的示意图;
图3是本申请实施例提供的一种FFT加速卷积算法的示意图;
图4是本申请实施例提供的一种信号处理方法的流程示意图;
图5a是本申请实施例提供的另一种FFT加速卷积算法的示意图;
图5b是本申请实施例提供的又一种FFT加速卷积算法的示意图;
图6是本申请实施例提供的一种实现信号处理方法的示意图;
图7是本申请实施例提供的一种信号处理装置的结构示意图;
图8是本申请实施例提供的另一种信号处理装置的结构示意图。
具体实施方式
下面结合本申请实施例中的附图对本申请实施例进行描述。
本申请实施例描述的一种信号处理方法应用于人工智能领域的神经网络中,神经网络的基本单元是“神经元”,可以将“神经元”看作一个软件的或硬件的计算与存储单元。请参见图1,图1是本申请实施例提供的一种全连接(full-connect,FC)神经网络的结构示意图。如图1所示,用圆圈来表示神经网络的一个节点,即神经元的计算与存储单元,其存储值即为该节点的信号值。图中标上“+1”的圆圈为偏置节点。神经网络最左边所有节点组成的一层叫做输入层Layer L1,计算机可处理的输入信号通过输入层输入,该输入信号包括图像信号、音频信号、传感器信号、通信信号等,该输入信号可以通过电路接口或者软件逻辑接口输入。神经网络中间所有节点组成的一层叫做隐藏层Layer L2,图1仅以隐藏层只包括一层节点为例,实际上也可以由多个隐藏层。神经网络最右边所有节点组成的一层叫做输出层Layer L3,图1仅以输出层只包括一个节点为例,实际上可以是多个节点。神经网络将多个输入神经元的信号联结起来,作为一个输出神经元的输出信号,而这个输出 神经元的输出信号也可以是另一个神经元的输入。神经网络中的任一层可以被认为是对信号进行了一级逻辑上的运算处理。其中,x1,x2,x3为输入信号,通过神经网络的输入层输入;HW,b(X)为输出信号,通过神经网络的输出层输出。神经网络通过内部大量节点对输入信号进行非线性变换,从而达到处理信息的目的。本申请实施例以采用非线性激活函数f(x)对输入信号进行处理为例,函数f(x)例如可以是修正线性单元(Rectified Linear Units,
ReLU)，即f(x)=max(0,x)。隐藏层各神经元先对输入信号x1,x2,x3加权求和并加上偏置，再经过激活函数f(·)得到各自的输出a1、a2、a3；最后根据a1、a2、a3确定输出信号HW,b(X)，例如可以是将a1、a2、a3相加或者取最大值。神经网络确定输出信号后，可以将输出信号直接输出，也可以将输出信号作为神经网络下一级的神经元的输入信号，从而继续对其进行处理。
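Purely as an illustrative sketch (not part of the original disclosure), the fully-connected forward pass of FIG. 1 can be written in a few lines of NumPy. The weight shapes, the ReLU activation, and the use of a weighted output layer are assumptions chosen for the example:

```python
import numpy as np

def relu(x):
    # Rectified Linear Unit: f(x) = max(0, x)
    return np.maximum(0.0, x)

def fc_forward(x, W1, b1, W2, b2):
    """Forward pass of a small fully-connected network like FIG. 1.

    x  : (3,)   input signals x1, x2, x3
    W1 : (3, 3) input-to-hidden weights,  b1 : (3,) hidden biases
    W2 : (1, 3) hidden-to-output weights, b2 : (1,) output bias
    """
    a = relu(W1 @ x + b1)      # hidden-layer outputs a1, a2, a3
    return relu(W2 @ a + b2)   # output signal HW,b(X)

rng = np.random.default_rng(0)
x = rng.standard_normal(3)
out = fc_forward(x,
                 rng.standard_normal((3, 3)), rng.standard_normal(3),
                 rng.standard_normal((1, 3)), rng.standard_normal(1))
```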
上述处理过程为神经网络的前向传播的一个实施例,神经网络的前向传播可以用于对信号进行判决,而且卷积神经网络的应用通常用于进行前向传播。全连接神经网络的参数数量比较多,可能导致神经网络尺寸过于庞大。因此本申请实施例提供了一种参数共享的方法,即采用较小参数模板在输入信号空间域上滑动滤波,类似于采用卷积模板对输入信号进行卷积。因此这种神经网络又称为卷积神经网络CNN,卷积神经网络可以包括多个卷积层。其运算可以用以下公式来描述。
令待卷积输入信号为f(u),u=0~N-1,卷积核为h(v),v=0~n-1,n≤N;其中,N为输入信号的个数,n为卷积核的个数;两者的线性卷积为:
y(i)=Σ_{u=0}^{N-1} f(u)·h(i-u)；
其中，h(i)=0，i<0 或 i>n-1。
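The linear convolution defined above can be checked against numpy.convolve. The sketch below is illustrative only; the signal and kernel values are arbitrary assumptions:

```python
import numpy as np

def linear_conv(f, h):
    """y(i) = sum_u f(u) * h(i - u), with h(i) = 0 outside 0..n-1."""
    N, n = len(f), len(h)
    y = np.zeros(N + n - 1)
    for i in range(N + n - 1):
        for u in range(N):
            if 0 <= i - u < n:
                y[i] += f[u] * h[i - u]
    return y

f = np.array([1.0, 2.0, 3.0, 4.0])   # N = 4 input samples
h = np.array([0.5, -1.0, 2.0])       # n = 3 kernel taps
print(np.allclose(linear_conv(f, h), np.convolve(f, h)))   # True
```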
对于神经网络的某一个卷积层,
输入通道channel l的输入信号为in_l(n0,n1)∈R，n0,n1=0~N-1，l=0~L-1；L为输入通道数量，可以对应输入节点或输入信号数量。
输出通道channel k的输出信号为h_k(n0,n1)∈R，n0,n1=0~N-1，k=0~K-1；K为输出通道数量，可以对应输出节点或输出信号数量。
卷积核为f_{l,k}(n0,n1)∈R，n0,n1=0~N-1，l=0~L-1，k=0~K-1；
则，
h_k(n0,n1)=Σ_{l=0}^{L-1} Σ_{u=0}^{N-1} Σ_{v=0}^{N-1} in_l(n0+u,n1+v)·f_{l,k}(u,v)，k=0~K-1。
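A direct (non-FFT) implementation of this convolution-layer formula might look as follows. This is a sketch only; the 'valid'-style output size, unit stride, square shapes, and absence of padding are assumptions not spelled out in the text:

```python
import numpy as np

def conv_layer_direct(inputs, kernels):
    """inputs  : (L, N, N) real input channels in_l
    kernels : (L, K, n, n) real kernels f_{l,k}
    returns : (K, N-n+1, N-n+1) output channels h_k
    """
    L, N, _ = inputs.shape
    _, K, n, _ = kernels.shape
    out = np.zeros((K, N - n + 1, N - n + 1))
    for k in range(K):
        for l in range(L):
            for n0 in range(N - n + 1):
                for n1 in range(N - n + 1):
                    out[k, n0, n1] += np.sum(
                        inputs[l, n0:n0 + n, n1:n1 + n] * kernels[l, k])
    return out
```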
卷积神经网络可以用于进行图像处理等功能,例如图像物体检测,图像物体分类等。目前主流的卷积神经网络规模依旧很大。如何对卷积层运算加速,是部署CNN的关键。本申请实施例提供了一种利用快速傅里叶变换(Fast Fourier Transform,FFT)来实现卷积运算的方法,可以利用FFT运算的高效性来加速卷积运算,且保证神经网络的预测性能不变。具体运算过程可以用以下公式来描述:
将上述f(u)填0扩展为u=0~N+n-1,h(u)填0扩展为u=0~N+n-1;并计算f(u)与h(u)的循环卷积:
y_c(i)=Σ_{u=0}^{N+n-1} f_{N+n}(u)·h_{N+n}(i-u)，i=0~N+n-1；
其中，f_{N+n}(u)、h_{N+n}(u)为周期N+n的函数；y_c(i)=y(i)，i=0~N+n-1。
两个函数的循环卷积，如 y_c=f_{N+n}⊛h_{N+n}，可以通过FFT和快速傅里叶逆变换(Inverse Fast Fourier Transform,IFFT)实现：
y_c=IFFT(FFT(f_{N+n})⊙FFT(h_{N+n}))；
其中，符号⊙表示复数矩阵点乘。
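The zero-padding-plus-FFT identity above is easy to verify numerically. The following sketch is illustrative (signal lengths are arbitrary assumptions) and pads both sequences to length N+n before comparing against numpy.convolve:

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])           # N = 4
h = np.array([0.5, -1.0, 2.0])                # n = 3
M = len(f) + len(h)                           # padded period N + n

fp = np.concatenate([f, np.zeros(M - len(f))])
hp = np.concatenate([h, np.zeros(M - len(h))])

# circular convolution via FFT / IFFT: yc = IFFT(FFT(fp) ⊙ FFT(hp))
yc = np.real(np.fft.ifft(np.fft.fft(fp) * np.fft.fft(hp)))

y = np.convolve(f, h)                         # linear convolution, length N + n - 1
print(np.allclose(yc[:len(y)], y))            # True: yc(i) = y(i)
```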
以上所述的神经网络可应用于各类通信、语音、图像处理、计算处理等应用场景。在一些可行的实施例中,该神经网络的输入信号可以是语音信号、文本信号、图像信号、温度信号等各种形式的信号,该语音信号可以是录音设备录制的语音信号、移动手机或固定电话在通话过程中接收的语音信号、以及收音机接收的电台发送的语音信号等,文本信号可以是TXT文本信号、Word文本信号、以及PDF文本信号等,图像信号可以是相机拍摄的风景信号、监控设备捕捉的社区环境的图像信号以及门禁系统获取的人脸的面部信号等,该神经网络的输入信号包括其他各种计算机可处理的工程信号,在此不再一一列举。
本申请实施例提供一种该神经网络100具体的实施场景,如图2所示,移动智能手机客户201向移动智能手机客户205发起语音呼叫,语音信号经智能手机202发出,经基站203转送给智能手机204,由于发起语音呼叫时暴雨骤起且伴有强烈的电闪雷鸣,导致输入信号206被严重削弱且含有较大的噪声,该输入信号可以例如为一维数字语音信号,智能手机204中配备有神经网络100,该神经网络可以是以专用电路的形式在芯片中实现,也可以是运行在中央处理单元(Central Processing Unit,CPU)或其他处理器中的程序指令。输入信号206在智能手机204中的神经网络中经过处理,该处理包括噪声去除以及有效信号增强等,得到输出信号207,该输出信号完整的保留了主叫用户传送的语音信息,避免了恶劣自然环境对信号的干扰。
如之前实施例所述,提高卷积运算能力对神经网络很重要。本申请实施例进一步提供了一种针对卷积神经网络的每个卷积层的FFT加速卷积算法,如图3所示,图3仅以两个实数输入信号为例,其他情况以此类推。具体地,首先将输入信号inl,卷积核fl,k扩充为虚部为0的复数矩阵,然后将输入信号以及卷积核分别进行FFT,并对两者的FFT结果进行复数点乘,最后对L个输入channel的点乘结果进行累加,即复数矩阵求和,并将累加结果进行IFFT,得到卷积层第k个输出channel的结果:
h_k(n0,n1)=IFFT(Σ_{l=0}^{L-1} FFT(in_l)⊙FFT(f_{l,k}))
上述方法中，快速傅里叶变换是基于复数运算实现的，采用上述方法，可以利用FFT运算的高效性进一步加速卷积运算，且保证神经网络的预测性能不变。
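A rough sketch of the FFT-accelerated layer of FIG. 3, i.e. the baseline that the present application further optimizes, is given below. The choice of FFT size and the absence of output cropping are assumptions made for brevity:

```python
import numpy as np

def conv_layer_fft_baseline(inputs, kernels, fft_size):
    """inputs : (L, N, N) real input channels; kernels : (L, K, n, n) real.

    Each real matrix is treated as a complex matrix with zero imaginary
    part, so one 2-D FFT is spent per input channel and per kernel.
    """
    K = kernels.shape[1]
    F_in = np.fft.fft2(inputs, s=(fft_size, fft_size))     # L forward FFTs
    F_k = np.fft.fft2(kernels, s=(fft_size, fft_size))     # L*K forward FFTs
    outputs = []
    for k in range(K):
        acc = np.sum(F_in * F_k[:, k], axis=0)      # point-multiply, sum over L
        outputs.append(np.real(np.fft.ifft2(acc)))  # one IFFT per output channel
    return np.stack(outputs)                        # (K, fft_size, fft_size)
```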
传统卷积神经网络多用于图像处理,卷积神经网络用于图像处理时的输入信号为实数二维矩阵。因此上述方法将输入实数矩阵扩充为虚部为0的复数矩阵后进行FFT运算,存在一定的运算冗余。本申请实施例提供了一种信号处理方法,可以解决输入信号为二维实数矩阵的卷积神经网络,使用FFT对卷积运算进行加速时复数运算利用率过低的问题,从而进一步降低信号处理运算量,提高信号处理效率。
本申请实施例提供的一种信号处理方法,可以应用于人工智能领域的卷积神经网络中。该方法可以应用在包括一个或多个运算单元(例如CPU),一个或多个存储单元(例如硬盘) 的计算机系统上。所述计算机系统包括但不限于PC,服务器,图形处理器(graphics processing unit,GPU),手机,专用的计算处理芯片(例如人工智能处理芯片)。神经网络可以是运行在CPU或者其它处理器上的一段程序,也可以是以专用电路的形式在芯片中实现。
本申请实施例提供的一种信号处理方法以应用于卷积神经网络的某一卷积层为例进行说明,卷积神经网络的其他卷积层的信号处理过程则以此类推。
图3所针对的方法其实还有进一步优化的空间。有鉴于此,请参见图4,是本申请实施例提供的另一种信号处理方法,即对前面的图3的运算处理过程进行进一步的优化处理以提高运算效率,该方法包括但不限于如下步骤:
S401、获取至少两个实数输入信号矩阵。
本申请实施例中,该实数输入信号矩阵包括多个实数元素,每个实数元素为计算机可处理的初始信号,该初始信号为图像信号、音频信号、传感器信号或通信信号中的至少一项。该实数输入信号矩阵可以为数组,也可以为一维向量,还可以为二维向量等,本申请实施例不作限定。该至少两个实数输入信号矩阵可以为卷积神经网络的初始输入信号矩阵,即卷积神经网络的输入层存储或者接收到的信号值,也可以为卷积神经网络前一级的实数输出信号矩阵。其中,该实数输入信号矩阵可以是通过电路接口输入的,也可以是通过软件逻辑接口输入的。
在一些可行的实施方式中,获取至少两个实数输入信号矩阵的具体方式包括:针对卷积神经网络的当前卷积层,若当前卷积层只有两个输入通道,则直接接收两个输入通道输入的实数输入信号矩阵;若当前卷积层只有一路输入通道或者有多于两个的输入通道,则首先接收各个输入通道输入的至少两个实数输入信号矩阵,然后对接收到的至少两个实数输入信号矩阵进行分组,每个分组包括两个实数输入信号矩阵。其中,实数输入信号矩阵与其输入的通道相对应,分组过程可以是根据实数输入信号矩阵对应的通道标识进行分组的。
S402、将所述至少两个实数输入信号矩阵拼接为复数输入信号矩阵。
本申请实施例中,在对该至少两个实数输入信号矩阵进行分组之后,每个分组包括两个实数输入信号矩阵。针对每个分组,将每个分组包括的两个实数输入信号矩阵拼接为复数输入信号矩阵,每个分组对应得到一个复数输入信号矩阵,包括:将该两个实数输入信号矩阵中的一个实数输入信号矩阵作为复数输入信号矩阵的实部,将该两个实数输入信号矩阵中的另一个实数输入信号矩阵作为复数输入信号矩阵的虚部。
在一些可行的实施方式中,针对每个分组,将每个分组包括的两个实数输入信号矩阵拼接为复数输入信号矩阵,每个分组对应得到一个复数输入信号矩阵。其中,该复数输入信号矩阵的实部为每个分组包括的两个实数输入信号矩阵中的一个实数输入信号矩阵,该复数输入信号矩阵的虚部为每个分组包括的两个实数输入信号矩阵中的另一个实数输入信号矩阵符号取反后得到的实数输入信号矩阵。
在一些可行的实施方式中,若当前卷积层只有两个输入通道,则不需对接收到的至少两个实数输入信号进行分组,可以根据实数输入信号矩阵的输入顺序将从该两个输入通道中的某一个输入通道输入的实数输入信号矩阵作为复数输入信号矩阵的实部,将从该两个 输入通道中的另一个输入通道输入的实数输入信号矩阵作为复数输入信号矩阵的虚部。也可以是将从该两个输入通道中的某一个输入通道输入的实数输入信号矩阵作为复数输入信号矩阵的实部,将从该两个输入通道中的另一个输入通道输入的实数输入信号矩阵符号取反后得到的实数输入信号矩阵作为复数输入信号矩阵的虚部。
本申请实施例中,若该至少两个实数输入信号矩阵的个数为奇数,则在将该至少两个实数输入信号矩阵拼接为复数输入信号矩阵时,会剩余一个实数输入信号矩阵无法与其他实数输入信号矩阵进行拼接。针对该剩余的一个实数输入信号矩阵,可以将该剩余的一个实数输入信号矩阵作为复数输入信号矩阵的实部,且将该复数输入信号矩阵的虚部置0;也可以对该剩余的一个实数输入信号矩阵直接进行卷积运算。其中,将该至少两个实数输入信号矩阵拼接为复数输入信号矩阵之后,可以只得到一个复数输入信号矩阵或多个复数输入信号矩阵作为输出结果。
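A minimal sketch of the grouping and splicing of steps S401–S402, assuming an even number of channels paired in index order (the pairing rule is an assumption; the text allows other groupings):

```python
import numpy as np

def splice_pairs(real_inputs):
    """real_inputs : (L, N, N) real input signal matrices, L assumed even.

    Returns (L/2, N, N) complex input signal matrices whose real part is
    channel 2l and whose imaginary part is channel 2l+1.
    """
    assert real_inputs.shape[0] % 2 == 0, "odd counts need the fallback above"
    return real_inputs[0::2] + 1j * real_inputs[1::2]
```

The alternative splicing rule described above (sign of the second matrix inverted before it is used as the imaginary part) would simply replace `+ 1j` with `- 1j`.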
S403、获取所述复数输入信号矩阵的复数卷积核矩阵。
本申请实施例中,该复数卷积核矩阵包括多个复数元素,每个复数元素为复数卷积核系数,该复数卷积核系数是根据实数卷积核系数拼接得到的。该复数卷积核矩阵与该复数输入信号矩阵一一对应。具体地,首先获取至少两个实数卷积核矩阵,该实数卷积核矩阵包括多个实数元素,每个实数元素为实数卷积核系数,该至少两个实数卷积核矩阵与该至少两个实数输入信号矩阵一一对应,例如可以是包括相同的通道标识;然后将该至少两个实数卷积核矩阵拼接为复数卷积核矩阵,其中,复数卷积核矩阵的实部为与其对应的复数输入信号矩阵的实部对应的实数卷积核矩阵,复数卷积核矩阵的虚部为与其对应的复数输入信号矩阵的虚部对应的实数卷积核矩阵符号取反后得到的实数卷积核矩阵。
在一些可行的实施方式中,将该至少两个实数卷积核矩阵拼接为复数卷积核矩阵,包括:将与该复数输入信号矩阵的实部对应的实数卷积核矩阵作为该复数卷积核矩阵的实部,将与该复数输入信号矩阵的虚部对应的实数卷积核矩阵作为该复数卷积核矩阵的虚部。
在一些可行的实施方式中,该复数卷积核矩阵可以是相关设备根据预设规则预置的,可以直接获取与拼接得到的复数输入信号矩阵对应的复数卷积核矩阵。从而避免了通过对实数卷积核矩阵进行拼接得到复数卷积核矩阵的操作,可以在一定程度上降低卷积神经网络的运算量,提高信号处理的效率。
S404、对所述复数输入信号矩阵以及所述复数卷积核矩阵分别进行傅里叶变换,得到所述复数输入信号矩阵的第一矩阵,所述复数卷积核矩阵的第二矩阵。
本申请实施例中,对该复数输入信号矩阵以及该复数卷积核矩阵分别进行傅里叶变换,包括:对该复数输入信号矩阵以及该复数卷积核矩阵分别进行离散傅里叶变换(discrete fourier transform,DFT)或者快速傅里叶变换。每个复数输入信号矩阵进行傅里叶变换后对应得到一个第一矩阵,每个复数卷积核矩阵进行傅里叶变换后对应得到一个第二矩阵,该第一矩阵以及该第二矩阵为复数矩阵。
S405、对所述第一矩阵以及所述第二矩阵进行复数矩阵点乘,得到第三矩阵。
S406、通过对所述第三矩阵进行傅里叶逆变换,获取实数输出信号矩阵。
本申请实施例中,该实数输出信号矩阵为卷积神经网络当前卷积层的卷积运算结果,且包括多个计算机可处理的输出信号。其中,若获取到的至少两个实数输入信号矩阵为两 个实数输入信号矩阵,则只能将该两个实数输入信号矩阵拼接为一个复数输入信号矩阵,对该第一矩阵以及该第二矩阵进行复数矩阵点乘,对应只能得到一个第三矩阵。故而通过对该第三矩阵进行傅里叶逆变换,获取实数输出信号矩阵,包括:首先对该第三矩阵进行傅里叶逆变换,得到复数输出信号矩阵;然后获取该复数输出信号矩阵的实部,得到该实数输出信号矩阵。若接收到的至少两个实数输入信号矩阵包括多于两个的实数输入信号矩阵,则可以将多于两个的实数输入信号矩阵拼接为多个复数输入信号矩阵,对应可以得到多个第三矩阵。其中,对该第一矩阵以及该第二矩阵进行复数矩阵点乘,得到第三矩阵,包括:针对每个分组,对每个分组的第一矩阵以及第二矩阵进行复数矩阵点乘,每个分组对应得到一个第三矩阵。通过对该第三矩阵进行傅里叶逆变换,获取实数输出信号矩阵,包括:首先将每个分组的第三矩阵进行相加,即各个第三矩阵中对应位置的元素分别进行相加,得到和矩阵;然后对该和矩阵进行傅里叶逆变换,得到复数输出信号矩阵;最后获取该复数输出信号矩阵的实部,得到该实数输出信号矩阵。
需要说明的是,上述确定卷积神经网络当前卷积层的实数输出信号矩阵的具体实现方式,为确定卷积神经网络当前卷积层的某一输出通道的实数输出信号矩阵的具体步骤,确定卷积神经网络当前卷积层的其他输出通道的实数输出信号矩阵的具体方式则可以以此类推,在此不再赘述。获取到卷积神经网络当前卷积层的实数输出信号矩阵后,可以将该实数输出信号矩阵直接输出卷积神经网络,也可以将该实数输出信号矩阵作为卷积神经网络的下一卷积层的实数输入信号矩阵。
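Putting steps S401–S406 together for the simplest case of exactly two real input channels and one output channel, a hedged sketch might read as follows; the FFT size N + n − 1 and the full-size output are illustrative assumptions, and scipy is used only to check the result:

```python
import numpy as np
from scipy.signal import convolve2d   # used only to check the result

def conv_pair_via_complex_fft(in0, in1, f0, f1, fft_size):
    """Return in0*f0 + in1*f1 (2-D linear convolutions) using one complex
    FFT of the spliced input and one of the spliced kernel."""
    in_c = in0 + 1j * in1                                # S402: complex input
    f_c = f0 - 1j * f1                                   # S403: complex kernel
    first = np.fft.fft2(in_c, s=(fft_size, fft_size))    # S404: first matrix
    second = np.fft.fft2(f_c, s=(fft_size, fft_size))    # S404: second matrix
    third = first * second                               # S405: point multiply
    return np.real(np.fft.ifft2(third))                  # S406: real output

rng = np.random.default_rng(0)
N, n = 8, 3
in0, in1 = rng.standard_normal((2, N, N))
f0, f1 = rng.standard_normal((2, n, n))
out = conv_pair_via_complex_fft(in0, in1, f0, f1, N + n - 1)
ref = convolve2d(in0, f0) + convolve2d(in1, f1)          # direct reference
print(np.allclose(out, ref))                             # True
```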
下面将通过举例子的方式对本申请实施例提供的信号处理方法进行说明,以卷积神经网络用于进行图像处理为例。卷积神经网络用于进行图像处理时,卷积神经网络每个卷积层的输入信号为L个2D图像(或者说2D实数矩阵)的组合。卷积神经网络的卷积层除了输入红绿蓝RGB图像时输入channel数L=3为奇数,输入其它类型的图像时一般输入channel数L为偶数。对于卷积神经网络当前卷积层的某个输出channel,选择与其对应的两个输入channel l=0和l=1为例进行分析说明。请一并参见图5a,图5a仅以两个实数输入信号为例。首先将两个输入channel的实数输入信号2D矩阵按以下规则拼接为复数输入信号矩阵:in_C(n0,n1)=in0(n0,n1)+in1(n0,n1)·i;其中,n0,n1为实数输入信号2D矩阵的坐标,in0(n0,n1)/in1(n0,n1)对应实数输入信号2D矩阵上的某个点。
然后,将相应的两个实数卷积核2D矩阵按以下规则拼接为复数卷积核矩阵:f_C(n0,n1)=f0,k(n0,n1)-f1,k(n0,n1)·i;其中,实数卷积核矩阵f0,k(n0,n1)对应实数输入信号2D矩阵in0(n0,n1),实数卷积核矩阵f1,k(n0,n1)对应实数输入信号2D矩阵in1(n0,n1)。
接着,分别对上述拼接得到的复数输入信号矩阵和复数卷积核矩阵分别进行傅里叶变换,下面以傅里叶变换为FFT为例,得到下式:
FFT(in_C)=FFT(in0)+FFT(in1)·i        (1);
FFT(f_C)=FFT(f0,k)-FFT(f1,k)·i     (2);
进一步地,将式(1)以及式(2)进行复数点乘得到:
FFT(in_C)⊙FFT(f_C)=[FFT(in0)⊙FFT(f0,k)+FFT(in1)⊙FFT(f1,k)]+[FFT(in1)⊙FFT(f0,k)-FFT(in0)⊙FFT(f1,k)]·i        (3)；
最后对式(3)进行快速傅里叶逆变换IFFT,得到:
IFFT(FFT(in_C)⊙FFT(f_C))=IFFT(FFT(in0)⊙FFT(f0,k)+FFT(in1)⊙FFT(f1,k))+IFFT(FFT(in1)⊙FFT(f0,k)-FFT(in0)⊙FFT(f1,k))·i，
即复数输出信号矩阵。最后获取复数输出信号矩阵的实部,得到实数输出信号矩阵。
可以证明，上述复数输出信号矩阵的实部Real{IFFT(FFT(in_C)⊙FFT(f_C))}即等于两个实数输入信号矩阵的FFT分别与对应的实数卷积核的FFT进行点乘后相加，最后进行IFFT，即：
Real{IFFT(FFT(in_C)⊙FFT(f_C))}=IFFT(FFT(in0)⊙FFT(f0,k)+FFT(in1)⊙FFT(f1,k))。
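The identity just derived can be spot-checked numerically; the matrix size below and the use of equally-sized inputs and kernels are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
S = 16                                    # matrix / FFT size (arbitrary)
in0, in1 = rng.standard_normal((2, S, S))
f0, f1 = rng.standard_normal((2, S, S))

# proposed path: one FFT of the spliced input, one FFT of the spliced kernel
lhs = np.real(np.fft.ifft2(np.fft.fft2(in0 + 1j * in1) *
                           np.fft.fft2(f0 - 1j * f1)))

# baseline path of FIG. 3: one FFT per real matrix, point-multiply, then sum
rhs = np.real(np.fft.ifft2(np.fft.fft2(in0) * np.fft.fft2(f0) +
                           np.fft.fft2(in1) * np.fft.fft2(f1)))

print(np.allclose(lhs, rhs))              # True, matching the derivation above
```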
在一些可行的实施方式中,请一并参见图5b,图5b也仅以两个实数输入信号为例。首先将两个输入channel的实数输入信号2D矩阵按以下规则拼接为复数输入信号矩阵:in_C(n0,n1)=in0(n0,n1)-in1(n0,n1)·i;然后,将相应的两个实数卷积核2D矩阵按以下规则拼接为复数卷积核矩阵:f_C(n0,n1)=f0,k(n0,n1)+f1,k(n0,n1)·i。接着,分别对上述拼接得到的复数输入信号矩阵和复数卷积核矩阵分别进行傅里叶变换,下面以傅里叶变换为FFT为例,得到下式:
FFT(in_C)=FFT(in0)-FFT(in1)·i       (4);
FFT(f_C)=FFT(f0,k)+FFT(f1,k)·i        (5);
进一步地,将式(4)以及式(5)进行复数点乘得到:
FFT(in_C)⊙FFT(f_C)=[FFT(in0)⊙FFT(f0,k)+FFT(in1)⊙FFT(f1,k)]+[FFT(in0)⊙FFT(f1,k)-FFT(in1)⊙FFT(f0,k)]·i        (6)；
最后对式(6)进行快速傅里叶逆变换IFFT,得到:
IFFT(FFT(in_C)⊙FFT(f_C))=IFFT(FFT(in0)⊙FFT(f0,k)+FFT(in1)⊙FFT(f1,k))+IFFT(FFT(in0)⊙FFT(f1,k)-FFT(in1)⊙FFT(f0,k))·i，
即复数输出信号矩阵。最后获取复数输出信号矩阵的实部,得到实数输出信号矩阵。
可以证明，上述复数输出信号矩阵的实部Real{IFFT(FFT(in_C)⊙FFT(f_C))}即等于两个实数输入信号矩阵的FFT分别与对应的实数卷积核的FFT进行点乘后相加，最后进行IFFT，即：
Real{IFFT(FFT(in_C)⊙FFT(f_C))}=IFFT(FFT(in0)⊙FFT(f0,k)+FFT(in1)⊙FFT(f1,k))。
本申请实施例提供的信号处理方法实现2个卷积求和,只需要2次FFT,一次IFFT。而上述提到的另一种FFT加速卷积算法(如图3)实现2个卷积求和需要4次FFT,一次IFFT。在输入channel数较大的情况下,本申请实施例提供的信号处理方法只需要百分之五十的FFT运算量以及相同的复数点乘数,即可实现FFT卷积。故而本申请实施例提供的信号处理方法可以有效降低FFT运算量,提高信号处理效率。同时,本申请实施例提供的信号处理方法采用复数FFT统一了处理的数据类型,可以避免实数FFT在专用集成电路(application specific integrated circuit,ASIC)实现中采用厄米特Hermitian矩阵的对称性带来的处理流程复杂度增加,有利于模块实现。
进一步举例说明,在获取到至少两个实数输入信号矩阵之后,首先将该至少两个实数输入信号矩阵in(n0,n1,l)按输入通道标识(channel index)进行两两分组,下面以按照channel index的顺序进行分组为例,得到:
l=0/1;2/3;...;(In_chan_num-2)/(In_chan_num-1)；共In_chan_num/2个分组，每个分组包括两个实数输入信号矩阵in_l(n0,n1)。
与该至少两个实数输入信号矩阵对应的In_chan_num个实数卷积核矩阵也采用同样的方法进行分组。然后将每个分组包括的两个inl(n0,n1)拼接为复数输入信号矩阵,以l=0和l=1举例,令in_C(n0,n1)=in0(n0,n1)+in1(n0,n1)·i;in_C为复数输入信号2D矩阵。此时in_C比FFT的尺寸(size)要小,在矩阵in_C的右侧和下方填零,使得in_C的size与FFT的size一致。与每个分组包括的两个inl(n0,n1)对应的两个实数卷积核矩阵也拼接为复数卷积核矩阵,f_C(n0,n1)=f0,k(n0,n1)-f1,k(n0,n1)·i。接着,分别对每个分组的复数输入信号矩阵进行2D FFT,每个分组得到一个第一矩阵,结果保存备用;分别对每个分组的复数卷积核矩阵进行2D FFT,每个分组得到一个第二矩阵。进一步,将每个分组的第一矩阵 与相应的第二矩阵进行复数矩阵点乘,每个分组对应得到一个第三矩阵,将每个分组的第三矩阵进行累加,得到和矩阵,即卷积神经网络当前卷积层的某一输出channel k对应的FFT矩阵Freq_Sum:
Freq_Sum=Σ_{l=0}^{In_chan_num/2-1} FFT(in_C_l)⊙FFT(f_C_{l,k})；
其中，FFT(in_C_l)⊙FFT(f_C_{l,k})即为复数输入信号矩阵FFT(in_C_l)和复数卷积核矩阵FFT(f_C_{l,k})中对应位置的元素分别相乘。最后对Freq_Sum进行IFFT，得到复数输出信号矩阵，获取复数输出信号矩阵的实部，得到实数输出信号矩阵，即该输出channel k的实数输出信号矩阵。
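For more than two input channels, the grouping, zero-padding and frequency-domain accumulation described above might be sketched as follows. The pairing by channel index, the single output channel k, and the FFT size are assumptions for illustration:

```python
import numpy as np

def output_channel_k(inputs, kernels_k, fft_size):
    """inputs    : (L, N, N) real input channels, L assumed even.
    kernels_k : (L, n, n) real kernels f_{l,k} of one output channel k.
    Returns the (full-size) real output signal matrix of channel k.
    """
    in_c = inputs[0::2] + 1j * inputs[1::2]          # complex input matrices
    f_c = kernels_k[0::2] - 1j * kernels_k[1::2]     # complex kernel matrices
    # zero-pad on the right and bottom up to the FFT size, then 2-D FFT
    first = np.fft.fft2(in_c, s=(fft_size, fft_size))
    second = np.fft.fft2(f_c, s=(fft_size, fft_size))
    freq_sum = np.sum(first * second, axis=0)        # accumulate third matrices
    return np.real(np.fft.ifft2(freq_sum))           # IFFT, then take real part

rng = np.random.default_rng(2)
L, N, n = 4, 8, 3
x = rng.standard_normal((L, N, N))
w = rng.standard_normal((L, n, n))
out = output_channel_k(x, w, N + n - 1)
```

With fft_size ≥ N + n − 1, out should equal the sum over l of the full 2-D convolutions of x[l] with w[l].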
在一些可行的实施方式中,将每个分组包括的两个inl(n0,n1)拼接为复数输入信号矩阵也可以是,令in_C(n0,n1)=in0(n0,n1)-in1(n0,n1)·i;将与每个分组包括的两个inl(n0,n1)对应的两个实数卷积核矩阵拼接为复数卷积核矩阵,也可以是,令f_C(n0,n1)=f0,k(n0,n1)+f1,k(n0,n1)·i;其余步骤可以参考上述描述,此处不再赘述。
需要说明的是,本申请实施例可以按照channel index的顺序对该至少两个实数输入信号矩阵进行两两分组,也可以按照channel index的奇偶性对该至少两个实数输入信号矩阵进行两两分组,当然也可以按照其他规则对该至少两个实数输入信号矩阵进行两两分组,本申请实施例不作限定。
本申请实施例提供的信号处理方法通过将两个输入channel的实数输入信号矩阵拼接成复数输入信号矩阵,并将其相应的实数卷积核矩阵拼接成复数卷积核矩阵之后,将复数输入信号矩阵以及复数卷积核矩阵进行FFT运算,从而最大限度的利用了FFT的复数运算性能,降低了卷积层中FFT的使用量;从而减少处理器运行卷积神经网络的时间,降低系统功耗,节省了软硬件资源。其中,该处理器包括通用处理器(例如CPU)或者逻辑电路处理器。
本申请实施例提供的一种信号处理方法可以应用于基于CPU进行运算的卷积神经网络,也可以应用于基于ASIC实现的卷积神经网络,还可以应用于基于GPU进行运算的卷积神经网络。传统的基于CPU或者GPU或者ASIC实现CNN的方案中,卷积神经网络的模型文件存储在外接的存储介质,例如双倍速率同步动态随机存储器DDR内存中。当卷积神经网络进行某一层卷积层的运算时,CPU或者GPU或者ASIC的运算单元首先从存储介质中的模型文件读取当前卷积层的卷积核系数,以及当前卷积层的参数,包括:输入通道数,输出通道数,卷积核大小,卷积步长等,然后根据上述卷积核系数以及参数进行CNN运算。运算单元处理过程中的中间数据也可以在存储介质中暂存,并在需要读取时进行回读。在一些可行的实施方式中,请一并参见图6,本申请实施例可以将卷积神经网络的原始模型文件进行模型预处理后,将经过模型预处理的模型文件存储在内部或外接的存储介质中。当卷积神经网络进行某一层卷积层的运算时,运算单元首先从存储介质中的经过模型预处理的模型文件读取当前卷积层的模型系数,以及当前卷积层的参数,然后进行CNN运算。运算单元处理过程中的中间数据也可以在存储介质中暂存,并在需要读取时进行回读。
下面对卷积神经网络的原始模型文件的模型预处理过程进行详细描述。本申请实施例中，卷积神经网络参数的定义与传统的卷积运算略有差异，经典卷积运算的公式为：
h(n0,n1)=Σ_{u}Σ_{v} in(n0-u,n1-v)·f(u,v)；
而CNN中卷积层的运算公式为：
h_k(n0,n1)=Σ_{l=0}^{L-1} Σ_{u}Σ_{v} in_l(n0+u,n1+v)·f_{l,k}(u,v)；
可以看出,CNN中的卷积系数是经典卷积系数的反序。因此,对原始模型文件进行预处理包括:将原始实数卷积核系数反序后存放至模型文件中。从而可以避免在卷积神经网络进行实时处理时对实数卷积核矩阵进行倒序操作,减少卷积神经网络的运算量。进一步地,可以根据反序存放的实数卷积核系数确定实数卷积核矩阵,并将实数卷积核矩阵存放在存储介质中。从而可以避免在卷积神经网络进行实时处理时根据实数卷积核系数确定实数卷积核矩阵的操作,进一步减少卷积神经网络的运算量。
例如，令卷积神经网络的实数卷积核矩阵为：【原文此处为示例实数卷积核矩阵，图号PCTCN2017089302-appb-000027】；经过模型预处理过程之后，卷积神经网络的实数卷积核矩阵在存储介质中的存放格式为：【原文此处为上述矩阵系数反序存放后的示例，图号PCTCN2017089302-appb-000028】。
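The statement that the CNN coefficients are the classic convolution coefficients in reverse order can be illustrated numerically. Interpreting "reverse order" as a flip along both spatial axes is an assumption of this sketch:

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

rng = np.random.default_rng(3)
img = rng.standard_normal((6, 6))
ker = rng.standard_normal((3, 3))

# CNN-style sliding correlation equals classic convolution with the kernel
# coefficients stored in reverse order (flipped along both axes).
print(np.allclose(correlate2d(img, ker, mode='valid'),
                  convolve2d(img, ker[::-1, ::-1], mode='valid')))   # True
```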
另外,由于将获取到的至少两个实数卷积核矩阵拼接为复数卷积核矩阵的具体操作为:f_C(n0,n1)=f0,k(n0,n1)-f1,k(n0,n1)·i,可以看出复数卷积核矩阵虚部的实数卷积核矩阵需要符号取反。为了避免在卷积神经网络进行实时运算时进行符号取反操作,在一些可行的实施方式中,对原始模型文件进行预处理还包括:对于根据预设规则确定的将作为复数卷积核矩阵虚部的实数卷积核矩阵,将其符号取反后存放至存储介质中。进一步地,可以根据预设规则将实数卷积核矩阵拼接为复数卷积核矩阵,并将复数卷积核矩阵存放在存储介质中。从而可以进一步提高卷积神经网络的运算效率。需要说明的是,实数卷积核矩阵在存储介质中的存储格式可以根据处理器的复数格式进行相应调整,以便于更好的读取实数卷积核矩阵。
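A hedged sketch of the model pre-processing described above; the array layout, the pairing rule, and the function name are illustrative assumptions rather than the patent's actual file format:

```python
import numpy as np

def preprocess_kernels(raw_kernels):
    """raw_kernels : (L, K, n, n) real kernels as trained, L assumed even.

    Returns (L/2, K, n, n) complex kernel matrices ready for the FFT path:
    coefficients stored in reverse order (flipped on both spatial axes) and
    the kernel destined for the imaginary part already sign-negated.
    """
    flipped = raw_kernels[..., ::-1, ::-1]        # reverse-order storage
    return flipped[0::2] - 1j * flipped[1::2]     # pre-spliced complex kernels
```

The result can be written once to the model file so that no flipping, sign negation or splicing is needed at inference time.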
在一些可行的实施方式中,从存储介质中的模型文件读取到卷积神经网络当前卷积层的模型系数以及当前卷积层的其他参数之后,根据预设规则确定是否使用FFT实现卷积。例如可以是检测与当前卷积层对应的卷积核矩阵的分块大小或者秩是否大于或者等于预设值,若是,则采用本申请实施例提供的信号处理方法对卷积神经网络的信号进行处理;若否,则采用传统卷积运算对卷积神经网络的信号进行处理。
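The per-layer decision described above could be expressed as a simple dispatch. The threshold value, the reading of "block size" as the kernel's spatial extent, and the function name are purely illustrative assumptions:

```python
def choose_conv_impl(kernel_shape, threshold=3):
    """Pick the FFT path when the kernel's spatial block is large enough,
    otherwise fall back to the direct sliding-window convolution."""
    n0, n1 = kernel_shape[-2:]
    return "fft" if min(n0, n1) >= threshold else "direct"

# e.g. choose_conv_impl((64, 64, 5, 5)) -> "fft"; (64, 64, 1, 1) -> "direct"
```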
本申请实施例中,首先获取至少两个实数输入信号矩阵,并将该至少两个实数输入信号矩阵拼接为复数输入信号矩阵,然后获取该复数输入信号矩阵的复数卷积核矩阵,并对该复数输入信号矩阵以及该复数卷积核矩阵分别进行傅里叶变换,得到该复数输入信号矩阵的第一矩阵,该复数卷积核矩阵的第二矩阵;最后对该第一矩阵以及该第二矩阵进行复 数矩阵点乘,得到第三矩阵,并通过对该第三矩阵进行傅里叶逆变换,获取实数输出信号矩阵,可以在一定程度上降低信号处理运算量,从而提高信号处理效率。
上述详细阐述了本申请实施例的方法,下面提供了本申请实施例的装置。
请参见图7，图7是本申请实施例提供的一种信号处理装置的结构示意图。其中，图7所示的信号处理装置可以包括第一获取模块701、拼接模块702、第二获取模块703、第一处理模块704、第二处理模块705、第三获取模块706，其中，各个模块的详细描述如下。
第一获取模块701,用于获取至少两个实数输入信号矩阵,所述实数输入信号矩阵包括多个实数元素,每个所述实数元素为计算机可处理的初始信号。
拼接模块702,用于将所述至少两个实数输入信号矩阵拼接为复数输入信号矩阵。
第二获取模块703,用于获取所述复数输入信号矩阵的复数卷积核矩阵,所述复数卷积核矩阵包括多个复数元素,每个复数元素为复数卷积核系数。
第一处理模块704,用于对所述复数输入信号矩阵以及所述复数卷积核矩阵分别进行傅里叶变换,得到所述复数输入信号矩阵的第一矩阵,所述复数卷积核矩阵的第二矩阵。
第二处理模块705，用于对所述第一矩阵以及所述第二矩阵进行复数矩阵点乘，得到第三矩阵。
第三获取模块706，用于通过对所述第三矩阵进行傅里叶逆变换，获取实数输出信号矩阵，所述实数输出信号矩阵为卷积运算结果，且包括多个计算机可处理的输出信号。
在一些可行的实施方式中,所述第一获取模块701,具体包括:
接收单元7011,用于接收至少两个实数输入信号矩阵。
分组单元7012,用于对所述至少两个实数输入信号矩阵进行分组,每个分组包括两个实数输入信号矩阵。
其中,所述拼接模块702,具体用于针对所述每个分组,将所述两个实数输入信号矩阵拼接为所述复数输入信号矩阵。
所述复数输入信号矩阵的实部为所述两个实数输入信号矩阵中的一个实数输入信号矩阵,所述复数输入信号矩阵的虚部为所述两个实数输入信号矩阵中的另一个实数输入信号矩阵。
在一些可行的实施方式中,所述复数输入信号矩阵的实部为所述两个实数输入信号矩阵中的一个实数输入信号矩阵,所述复数输入信号矩阵的虚部为所述两个实数输入信号矩阵中的另一个实数输入信号矩阵符号取反后得到的实数输入信号矩阵。
在一些可行的实施方式中,所述第二获取模块703,具体包括:
第一获取单元7031,用于获取至少两个实数卷积核矩阵。
拼接单元7032,还用于将所述至少两个实数卷积核矩阵拼接为所述复数卷积核矩阵。
所述复数卷积核矩阵的实部为所述复数输入信号矩阵的实部对应的实数卷积核矩阵,所述复数卷积核矩阵的虚部为所述复数输入信号矩阵的虚部对应的实数卷积核矩阵符号取反后得到的实数卷积核矩阵。
在一些可行的实施方式中,所述复数卷积核矩阵的实部为所述复数输入信号矩阵的实部对应的实数卷积核矩阵,所述复数卷积核矩阵的虚部为所述复数输入信号矩阵的虚部对 应的实数卷积核矩阵。
在一些可行的实施方式中,所述第三获取模块706,具体包括:
处理单元7061,用于对所述第三矩阵进行傅里叶逆变换,得到复数输出信号矩阵。
第二获取单元7062,用于获取所述复数输出信号矩阵的实部,得到所述实数输出信号矩阵。
在一些可行的实施方式中,所述第二处理模块705,具体用于针对所述每个分组,对所述第一矩阵以及所述第二矩阵进行复数矩阵点乘,得到第三矩阵。
其中,所述第三获取模块706,具体包括:
相加单元7063,还用于将每个分组的第三矩阵进行相加,得到和矩阵。
处理单元7061,还用于对所述和矩阵进行傅里叶逆变换,得到复数输出信号矩阵。
第二获取单元7062,用于获取所述复数输出信号矩阵的实部,得到所述实数输出信号矩阵。
本申请实施例中,所述初始信号为图像信号、音频信号、传感器信号或通信信号中的至少一项。所述实数输入信号矩阵为前一级的实数输出信号矩阵,所述实数输入信号矩阵通过电路接口或者软件逻辑接口输入。所述实数卷积核矩阵是根据预置的卷积核系数得到的,所述卷积核系数是反序存储的。
需要说明的是,本申请实施例的数据处理装置的各功能模块的功能可根据上述方法实施例中的方法具体实现,其具体实现过程可以参照上述方法实施例的相关描述,此处不再赘述。
本申请实施例中,首先获取至少两个实数输入信号矩阵,并将该至少两个实数输入信号矩阵拼接为复数输入信号矩阵,然后获取该复数输入信号矩阵的复数卷积核矩阵,并对该复数输入信号矩阵以及该复数卷积核矩阵分别进行傅里叶变换,得到该复数输入信号矩阵的第一矩阵,该复数卷积核矩阵的第二矩阵;最后对该第一矩阵以及该第二矩阵进行复数矩阵点乘,得到第三矩阵,并通过对该第三矩阵进行傅里叶逆变换,获取实数输出信号矩阵,可以在一定程度上降低信号处理运算量,从而提高信号处理效率。
请参见图8,图8是本申请实施例提供的一种信号处理装置的结构示意图,本申请实施例中所描述的信号处理装置包括:处理器801、通信接口802、存储器803。其中,处理器801、通信接口802、存储器803可通过总线或其他方式连接,本申请实施例以通过总线连接为例。
处理器801可以是中央处理器(英文:central processing unit,缩写:CPU),网络处理器(英文:network processor,缩写:NP),图形处理器(英文:graphics processing unit,缩写:GPU),或者CPU、GPU和NP的组合。处理器801也可以是多核CPU、多核GPU或多核NP中用于实现通信标识绑定的核。
上述处理器801可以是硬件芯片。上述硬件芯片可以是专用集成电路(英文:application-specific integrated circuit,缩写:ASIC),可编程逻辑器件(英文:programmable logic device,缩写:PLD)或其组合。上述PLD可以是复杂可编程逻辑器件(英文:complex programmable logic device,缩写:CPLD),现场可编程逻辑门阵列(英文:field-programmable  gate array,缩写:FPGA),通用阵列逻辑(英文:generic array logic,缩写:GAL)或其任意组合。
上述通信接口802可用于收发信息或信令的交互,以及信号的接收和传递。上述存储器803可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的存储程序(比如文字存储功能、位置存储功能等);存储数据区可存储根据装置的使用所创建的数据(比如图像数据、文字数据)等,并可以包括应用存储程序等。此外,存储器803可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。
上述存储器803还用于存储程序指令。当上述处理器801是非硬件芯片的处理器时,可以调用上述存储器803存储的程序指令,实现如本申请实施例所示的信号处理方法。
具体的,上述处理器801调用存储在上述存储器803存储的程序指令执行以下步骤:
获取至少两个实数输入信号矩阵,所述实数输入信号矩阵包括多个实数元素,每个所述实数元素为计算机可处理的初始信号;
将所述至少两个实数输入信号矩阵拼接为复数输入信号矩阵;
获取所述复数输入信号矩阵的复数卷积核矩阵,所述复数卷积核矩阵包括多个复数元素,每个复数元素为复数卷积核系数;
对所述复数输入信号矩阵以及所述复数卷积核矩阵分别进行傅里叶变换,得到所述复数输入信号矩阵的第一矩阵,所述复数卷积核矩阵的第二矩阵;
对所述第一矩阵以及所述第二矩阵进行复数矩阵点乘,得到第三矩阵;
通过对所述第三矩阵进行傅里叶逆变换,获取实数输出信号矩阵,所述实数输出信号矩阵为卷积运算结果,且包括多个计算机可处理的输出信号。
本申请实施例中处理器执行的方法均从处理器的角度来描述,可以理解的是,本申请实施例中处理器要执行上述方法需要其他硬件结构的配合。本申请实施例对具体的实现过程不作详细描述和限制。
在一些可行的实施方式中,上述通信接口802,用于接收至少两个实数输入信号矩阵。
上述处理器801,用于对所述至少两个实数输入信号矩阵进行分组,每个分组包括两个实数输入信号矩阵。
上述处理器801,具体用于针对所述每个分组,将所述两个实数输入信号矩阵拼接为所述复数输入信号矩阵。
其中,所述复数输入信号矩阵的实部为所述两个实数输入信号矩阵中的一个实数输入信号矩阵,所述复数输入信号矩阵的虚部为所述两个实数输入信号矩阵中的另一个实数输入信号矩阵。
在一些可行的实施方式中,所述复数输入信号矩阵的实部为所述两个实数输入信号矩阵中的一个实数输入信号矩阵,所述复数输入信号矩阵的虚部为所述两个实数输入信号矩阵中的另一个实数输入信号矩阵符号取反后得到的实数输入信号矩阵。
在一些可行的实施方式中,上述处理器801,还用于获取至少两个实数卷积核矩阵。
上述处理器801,还用于将所述至少两个实数卷积核矩阵拼接为所述复数卷积核矩阵。
其中,所述复数卷积核矩阵的实部为所述复数输入信号矩阵的实部对应的实数卷积核 矩阵,所述复数卷积核矩阵的虚部为所述复数输入信号矩阵的虚部对应的实数卷积核矩阵符号取反后得到的实数卷积核矩阵。
在一些可行的实施方式中,所述复数卷积核矩阵的实部为所述复数输入信号矩阵的实部对应的实数卷积核矩阵,所述复数卷积核矩阵的虚部为所述复数输入信号矩阵的虚部对应的实数卷积核矩阵。
在一些可行的实施方式中,上述处理器801,还用于对所述第三矩阵进行傅里叶逆变换,得到复数输出信号矩阵。
上述处理器801,还用于获取所述复数输出信号矩阵的实部,得到所述实数输出信号矩阵。
在一些可行的实施方式中,上述处理器801,具体用于针对所述每个分组,对所述第一矩阵以及所述第二矩阵进行复数矩阵点乘,得到第三矩阵。
上述处理器801,还用于将每个分组的第三矩阵进行相加,得到和矩阵。
上述处理器801,还用于对所述和矩阵进行傅里叶逆变换,得到复数输出信号矩阵。
上述处理器801,还用于获取所述复数输出信号矩阵的实部,得到所述实数输出信号矩阵。
本申请实施例中,所述初始信号为图像信号、音频信号、传感器信号或通信信号中的至少一项。所述实数输入信号矩阵为前一级的实数输出信号矩阵,所述实数输入信号矩阵通过电路接口或者软件逻辑接口输入。所述实数卷积核矩阵是根据预置的卷积核系数得到的,所述卷积核系数是反序存储的。
具体实现中,本申请实施例中所描述的处理器801、通信接口802、存储器803可执行本申请实施例提供的一种信号处理方法中所描述的实现方式,也可执行本申请实施例图8提供的一种信号处理装置中所描述的实现方式,在此不再赘述。
本申请实施例中,首先获取至少两个实数输入信号矩阵,并将该至少两个实数输入信号矩阵拼接为复数输入信号矩阵,然后获取该复数输入信号矩阵的复数卷积核矩阵,并对该复数输入信号矩阵以及该复数卷积核矩阵分别进行傅里叶变换,得到该复数输入信号矩阵的第一矩阵,该复数卷积核矩阵的第二矩阵;最后对该第一矩阵以及该第二矩阵进行复数矩阵点乘,得到第三矩阵,并通过对该第三矩阵进行傅里叶逆变换,获取实数输出信号矩阵,可以在一定程度上降低信号处理运算量,从而提高信号处理效率。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质 可以是磁性介质(例如软盘、硬盘、磁带)、光介质(例如DVD)、或者半导体介质(例如固态硬盘(Solid State Disk,SSD))等。
综上,以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。

Claims (21)

  1. 一种信号处理方法,其特征在于,所述方法包括:
    获取至少两个实数输入信号矩阵,所述实数输入信号矩阵包括多个实数元素,每个所述实数元素为计算机可处理的初始信号;
    将所述至少两个实数输入信号矩阵拼接为复数输入信号矩阵;
    获取所述复数输入信号矩阵的复数卷积核矩阵,所述复数卷积核矩阵包括多个复数元素,每个复数元素为复数卷积核系数;
    对所述复数输入信号矩阵以及所述复数卷积核矩阵分别进行傅里叶变换,得到所述复数输入信号矩阵的第一矩阵,所述复数卷积核矩阵的第二矩阵;
    对所述第一矩阵以及所述第二矩阵进行复数矩阵点乘,得到第三矩阵;
    通过对所述第三矩阵进行傅里叶逆变换,获取实数输出信号矩阵,所述实数输出信号矩阵为卷积运算结果,且包括多个计算机可处理的输出信号。
  2. 根据权利要求1所述的方法,其特征在于,所述获取至少两个实数输入信号矩阵,包括:
    接收至少两个实数输入信号矩阵;
    对所述至少两个实数输入信号矩阵进行分组,每个分组包括两个实数输入信号矩阵;
    其中,所述将所述至少两个实数输入信号矩阵拼接为复数输入信号矩阵,包括:
    针对所述每个分组,将所述两个实数输入信号矩阵拼接为所述复数输入信号矩阵;
    所述复数输入信号矩阵的实部为所述两个实数输入信号矩阵中的一个实数输入信号矩阵,所述复数输入信号矩阵的虚部为所述两个实数输入信号矩阵中的另一个实数输入信号矩阵。
  3. 根据权利要求2所述的方法,其特征在于,所述获取所述复数输入信号矩阵的复数卷积核矩阵,包括:
    获取至少两个实数卷积核矩阵;
    将所述至少两个实数卷积核矩阵拼接为所述复数卷积核矩阵;
    所述复数卷积核矩阵的实部为所述复数输入信号矩阵的实部对应的实数卷积核矩阵,所述复数卷积核矩阵的虚部为所述复数输入信号矩阵的虚部对应的实数卷积核矩阵符号取反后得到的实数卷积核矩阵。
  4. 根据权利要求1所述的方法,其特征在于,所述获取至少两个实数输入信号矩阵,包括:
    接收至少两个实数输入信号矩阵;
    对所述至少两个实数输入信号矩阵进行分组,每个分组包括两个实数输入信号矩阵;
    其中,所述将所述至少两个实数输入信号矩阵拼接为复数输入信号矩阵,包括:
    针对所述每个分组,将所述两个实数输入信号矩阵拼接为所述复数输入信号矩阵;
    所述复数输入信号矩阵的实部为所述两个实数输入信号矩阵中的一个实数输入信号矩阵,所述复数输入信号矩阵的虚部为所述两个实数输入信号矩阵中的另一个实数输入信号矩阵符号取反后得到的实数输入信号矩阵。
  5. 根据权利要求4所述的方法,其特征在于,所述获取所述复数输入信号矩阵的复数卷积核矩阵,包括:
    获取至少两个实数卷积核矩阵;
    将所述至少两个实数卷积核矩阵拼接为所述复数卷积核矩阵;
    所述复数卷积核矩阵的实部为所述复数输入信号矩阵的实部对应的实数卷积核矩阵,所述复数卷积核矩阵的虚部为所述复数输入信号矩阵的虚部对应的实数卷积核矩阵。
  6. 根据权利要求1至5任一项所述的方法,其特征在于,所述通过对所述第三矩阵进行傅里叶逆变换,获取实数输出信号矩阵,包括:
    对所述第三矩阵进行傅里叶逆变换,得到复数输出信号矩阵;
    获取所述复数输出信号矩阵的实部,得到所述实数输出信号矩阵。
  7. 根据权利要求2至5中任一项所述的方法,其特征在于,所述对所述第一矩阵以及所述第二矩阵进行复数矩阵点乘,得到第三矩阵,包括:
    针对所述每个分组,对所述第一矩阵以及所述第二矩阵进行复数矩阵点乘,得到第三矩阵;
    其中,所述通过对所述第三矩阵进行傅里叶逆变换,获取实数输出信号矩阵,包括:
    将每个分组的第三矩阵进行相加,得到和矩阵;
    对所述和矩阵进行傅里叶逆变换,得到复数输出信号矩阵;
    获取所述复数输出信号矩阵的实部,得到所述实数输出信号矩阵。
  8. 根据权利要求1至7中任一项所述的方法,其特征在于,所述初始信号为图像信号、音频信号、传感器信号或通信信号中的至少一项。
  9. 根据权利要求1至8中任一项所述的方法,其特征在于,所述实数输入信号矩阵为前一级的实数输出信号矩阵,所述实数输入信号矩阵通过电路接口或者软件逻辑接口输入。
  10. 根据权利要求3或5所述的方法,其特征在于,所述实数卷积核矩阵是根据预置的卷积核系数得到的,所述卷积核系数是反序存储的。
  11. 一种信号处理装置,其特征在于,所述装置包括:
    第一获取模块,用于获取至少两个实数输入信号矩阵,所述实数输入信号矩阵包括多个实数元素,每个所述实数元素为计算机可处理的初始信号;
    拼接模块,用于将所述至少两个实数输入信号矩阵拼接为复数输入信号矩阵;
    第二获取模块,用于获取所述复数输入信号矩阵的复数卷积核矩阵,所述复数卷积核矩阵包括多个复数元素,每个复数元素为复数卷积核系数;
    第一处理模块,用于对所述复数输入信号矩阵以及所述复数卷积核矩阵分别进行傅里叶变换,得到所述复数输入信号矩阵的第一矩阵,所述复数卷积核矩阵的第二矩阵;
    第二处理模块,用于对所述第一矩阵以及所述第二矩阵进行复数矩阵点乘,得到第三矩阵;
    第三获取模块,用于通过对所述第三矩阵进行傅里叶逆变换,获取实数输出信号矩阵,所述实数输出信号矩阵为卷积运算结果,且包括多个计算机可处理的输出信号。
  12. 根据权利要求11所述的装置,其特征在于,所述第一获取模块,具体包括:
    接收单元,用于接收至少两个实数输入信号矩阵;
    分组单元,用于对所述至少两个实数输入信号矩阵进行分组,每个分组包括两个实数输入信号矩阵;
    其中,所述拼接模块,具体用于针对所述每个分组,将所述两个实数输入信号矩阵拼接为所述复数输入信号矩阵;
    所述复数输入信号矩阵的实部为所述两个实数输入信号矩阵中的一个实数输入信号矩阵,所述复数输入信号矩阵的虚部为所述两个实数输入信号矩阵中的另一个实数输入信号矩阵。
  13. 根据权利要求12所述的装置,其特征在于,所述第二获取模块,具体包括:
    第一获取单元,用于获取至少两个实数卷积核矩阵;
    拼接单元,用于将所述至少两个实数卷积核矩阵拼接为所述复数卷积核矩阵;
    所述复数卷积核矩阵的实部为所述复数输入信号矩阵的实部对应的实数卷积核矩阵,所述复数卷积核矩阵的虚部为所述复数输入信号矩阵的虚部对应的实数卷积核矩阵符号取反后得到的实数卷积核矩阵。
  14. 根据权利要求11所述的装置，其特征在于，所述第一获取模块，具体包括：
    接收单元,用于接收至少两个实数输入信号矩阵;
    分组单元,用于对所述至少两个实数输入信号矩阵进行分组,每个分组包括两个实数输入信号矩阵;
    其中,所述拼接模块,具体用于针对所述每个分组,将所述两个实数输入信号矩阵拼接为所述复数输入信号矩阵;
    所述复数输入信号矩阵的实部为所述两个实数输入信号矩阵中的一个实数输入信号矩阵,所述复数输入信号矩阵的虚部为所述两个实数输入信号矩阵中的另一个实数输入信号矩阵符号取反后得到的实数输入信号矩阵。
  15. 根据权利要求14所述的装置，其特征在于，所述第二获取模块，具体包括：
    第一获取单元,用于获取至少两个实数卷积核矩阵;
    拼接单元,还用于将所述至少两个实数卷积核矩阵拼接为所述复数卷积核矩阵;
    所述复数卷积核矩阵的实部为所述复数输入信号矩阵的实部对应的实数卷积核矩阵,所述复数卷积核矩阵的虚部为所述复数输入信号矩阵的虚部对应的实数卷积核矩阵。
  16. 根据权利要求11至15任一项所述的装置,其特征在于,所述第三获取模块,具体包括:
    处理单元,用于对所述第三矩阵进行傅里叶逆变换,得到复数输出信号矩阵;
    第二获取单元,用于获取所述复数输出信号矩阵的实部,得到所述实数输出信号矩阵。
  17. 根据权利要求12至15任一项所述的装置,其特征在于,
    所述第二处理模块,具体用于针对所述每个分组,对所述第一矩阵以及所述第二矩阵进行复数矩阵点乘,得到第三矩阵;
    其中,所述第三获取模块,具体包括:
    相加单元,用于将每个分组的第三矩阵进行相加,得到和矩阵;
    处理单元,用于对所述和矩阵进行傅里叶逆变换,得到复数输出信号矩阵;
    第二获取单元,用于获取所述复数输出信号矩阵的实部,得到所述实数输出信号矩阵。
  18. 根据权利要求11至17中任一项所述的装置,其特征在于,所述初始信号为图像信号、音频信号、传感器信号或通信信号中的至少一项。
  19. 根据权利要求11至18中任一项所述的装置,其特征在于,所述实数输入信号矩阵为前一级的实数输出信号矩阵,所述实数输入信号矩阵通过电路接口或者软件逻辑接口输入。
  20. 根据权利要求13或15所述的装置,其特征在于,所述实数卷积核矩阵是根据预置的卷积核系数得到的,所述卷积核系数是反序存储的。
  21. 一种信号处理装置,其特征在于,包括:处理器、存储器,所述处理器、所述存储器通过总线连接,所述存储器存储有可执行程序代码,所述处理器用于调用所述可执行程序代码,执行如权利要求1~10中任一项所述的信号处理方法。
PCT/CN2017/089302 2017-06-21 2017-06-21 一种信号处理方法及装置 WO2018232615A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2017/089302 WO2018232615A1 (zh) 2017-06-21 2017-06-21 一种信号处理方法及装置
CN201780094036.2A CN110998610B (zh) 2017-06-21 2017-06-21 一种信号处理方法及装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/089302 WO2018232615A1 (zh) 2017-06-21 2017-06-21 一种信号处理方法及装置

Publications (1)

Publication Number Publication Date
WO2018232615A1 true WO2018232615A1 (zh) 2018-12-27

Family

ID=64736220

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/089302 WO2018232615A1 (zh) 2017-06-21 2017-06-21 一种信号处理方法及装置

Country Status (2)

Country Link
CN (1) CN110998610B (zh)
WO (1) WO2018232615A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150170021A1 (en) * 2013-12-18 2015-06-18 Marc Lupon Reconfigurable processing unit
CN106203621A (zh) * 2016-07-11 2016-12-07 姚颂 用于卷积神经网络计算的处理器
CN106250103A (zh) * 2016-08-04 2016-12-21 东南大学 一种卷积神经网络循环卷积计算数据重用的系统
CN106557812A (zh) * 2016-11-21 2017-04-05 北京大学 基于dct变换的深度卷积神经网络压缩与加速方案
US20170103298A1 (en) * 2015-10-09 2017-04-13 Altera Corporation Method and Apparatus for Designing and Implementing a Convolution Neural Net Accelerator
CN106845635A (zh) * 2017-01-24 2017-06-13 东南大学 基于级联形式的cnn卷积核硬件设计方法

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3134747B1 (en) * 2014-04-25 2021-12-01 Mayo Foundation for Medical Education and Research Integrated image reconstruction and gradient non-linearity correction for magnetic resonance imaging
US9904874B2 (en) * 2015-11-05 2018-02-27 Microsoft Technology Licensing, Llc Hardware-efficient deep convolutional neural networks
CN105760825A (zh) * 2016-02-02 2016-07-13 深圳市广懋创新科技有限公司 一种基于切比雪夫前向神经网络的手势识别系统和方法
CN106680817B (zh) * 2016-12-26 2020-09-15 电子科技大学 一种实现前视雷达高分辨成像的方法

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150170021A1 (en) * 2013-12-18 2015-06-18 Marc Lupon Reconfigurable processing unit
US20170103298A1 (en) * 2015-10-09 2017-04-13 Altera Corporation Method and Apparatus for Designing and Implementing a Convolution Neural Net Accelerator
CN106203621A (zh) * 2016-07-11 2016-12-07 姚颂 用于卷积神经网络计算的处理器
CN106250103A (zh) * 2016-08-04 2016-12-21 东南大学 一种卷积神经网络循环卷积计算数据重用的系统
CN106557812A (zh) * 2016-11-21 2017-04-05 北京大学 基于dct变换的深度卷积神经网络压缩与加速方案
CN106845635A (zh) * 2017-01-24 2017-06-13 东南大学 基于级联形式的cnn卷积核硬件设计方法

Also Published As

Publication number Publication date
CN110998610A (zh) 2020-04-10
CN110998610B (zh) 2024-04-16

Similar Documents

Publication Publication Date Title
CN110263909B (zh) 图像识别方法及装置
US11960566B1 (en) Reducing computations for data including padding
CN111416743B (zh) 一种卷积网络加速器、配置方法及计算机可读存储介质
WO2019085709A1 (zh) 一种应用于卷积神经网络的池化处理的方法及系统
CN112884086B (zh) 模型训练方法、装置、设备、存储介质以及程序产品
US10249070B2 (en) Dynamic interaction graphs with probabilistic edge decay
US20190220316A1 (en) Method, device and computer program product for determining resource amount for dedicated processing resources
US20200389182A1 (en) Data conversion method and apparatus
US11842220B2 (en) Parallelization method and apparatus with processing of neural network model for manycore system
US11615607B2 (en) Convolution calculation method, convolution calculation apparatus, and terminal device
JP7452679B2 (ja) 処理システム、処理方法及び処理プログラム
WO2019001323A1 (zh) 信号处理的系统和方法
WO2022041188A1 (zh) 用于神经网络的加速器、方法、装置及计算机存储介质
JP2024508867A (ja) 画像クラスタリング方法、装置、コンピュータ機器及びコンピュータプログラム
CN109844774B (zh) 一种并行反卷积计算方法、单引擎计算方法及相关产品
CN110489955B (zh) 应用于电子设备的图像处理、装置、计算设备、介质
WO2021135572A1 (zh) 神经网络的卷积实现方法、卷积实现装置及终端设备
WO2018232615A1 (zh) 一种信号处理方法及装置
CN116129501A (zh) 人脸位姿估计方法及装置
US11531782B1 (en) Systems and methods for finding a value in a combined list of private values
CN115878949A (zh) 信号处理方法以及相关设备
CN111815654A (zh) 用于处理图像的方法、装置、设备和计算机可读介质
CN113269303A (zh) 用于深度学习推理框架的数据处理方法和数据处理装置
CN113542808B (zh) 视频处理方法、设备、装置以及计算机可读介质
US11689608B1 (en) Method, electronic device, and computer program product for data sharing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17914350

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17914350

Country of ref document: EP

Kind code of ref document: A1