CN110858323A - Convolution-based image processing method, convolution-based image processing device, convolution-based image processing medium and electronic equipment - Google Patents

Info

Publication number
CN110858323A
CN110858323A (application CN201810966605.4A)
Authority
CN
China
Prior art keywords
convolution
size
feature matrix
kernels
image
Prior art date
Legal status
Pending
Application number
CN201810966605.4A
Other languages
Chinese (zh)
Inventor
陈一凡
安耀祖
Current Assignee
Beijing Jingdong Financial Technology Holding Co Ltd
Original Assignee
Beijing Jingdong Financial Technology Holding Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jingdong Financial Technology Holding Co Ltd
Priority to CN201810966605.4A
Publication of CN110858323A

Classifications

    • G06N 3/045 — Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N 3/08 — Computing arrangements based on biological models; neural networks; learning methods
    • G06V 40/172 — Recognition of human faces in image or video data; classification, e.g. identification


Abstract

Embodiments of the invention relate to the technical field of image processing and provide a convolution-based image processing method, a convolution-based image processing apparatus, a computer-readable medium, and an electronic device. The method comprises the following steps: acquiring a feature matrix of an image to be processed and information about the original convolution kernels; decomposing the original convolution kernels according to that information to obtain at least two convolution kernels; and performing convolution calculation on the feature matrix with the decomposed convolution kernels, so as to complete the compression processing of the image to be processed. The technical scheme of the embodiments can reduce recognition time and improve recognition efficiency while maintaining a sufficient recognition effect.

Description

Convolution-based image processing method, convolution-based image processing device, convolution-based image processing medium and electronic equipment
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a convolution-based image processing method, device, medium, and electronic apparatus.
Background
With the development of image processing technology and the growing popularity of intelligent monitoring and mobile devices, object detection is applied ever more widely in fields such as human-computer interaction, intelligent monitoring, security inspection, digital entertainment, and digital cameras. Object detection refers to detecting an object of interest (e.g. a gesture, a human face, a car) in an image; face recognition, for example, is becoming increasingly common in daily life. The image referred to herein may be a photograph, a still picture, a video frame, or the like.
Recently, convolutional neural networks have been used for object detection, where convolution layers compress the features of the image to be detected. However, the convolution method in current convolutional neural networks suffers from slow computation.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present invention and therefore may include information that does not constitute prior art known to a person of ordinary skill in the art.
Disclosure of Invention
Embodiments of the present invention provide a convolution-based image processing method, an image processing apparatus, an image processing medium, and an electronic device, so as to overcome, at least to a certain extent, the problem of slow convolution computation in the related art.
Additional features and advantages of the invention will be set forth in the detailed description which follows, or may be learned by practice of the invention.
According to a first aspect of embodiments of the present invention, there is provided a convolution-based image processing method, including:
acquiring a feature matrix of an image to be processed and information about the original convolution kernels;
decomposing the original convolution kernels according to the original convolution kernel information to obtain at least two convolution kernels;
and performing convolution calculation on the feature matrix with the decomposed convolution kernels, so as to complete the compression processing of the image to be processed.
In some embodiments of the present invention, the feature matrix of the image to be processed has size h × w × c, and the original convolution kernel information includes: the size of each original convolution kernel, d × d × c, and the number of original convolution kernels, n;
decomposing the original convolution kernels according to the original convolution kernel information to obtain at least two convolution kernels includes:
decomposing the n original convolution kernels of size d × d × c into k first convolution kernels of size 1 × d × c and n second convolution kernels of size d × 1 × k, where h, w, d, c, n and k are positive numbers.
In some embodiments of the present invention, performing convolution calculation on the feature matrix by using the decomposed convolution kernel includes:
performing first convolution on a feature matrix of the to-be-processed image with the size h x w x c by using k first convolution kernels with the size 1 x d x c to obtain a first compressed feature matrix with the size h x w x k;
and performing a second convolution on the first compressed feature matrix of size h × w × k using the n second convolution kernels of size d × 1 × k, to obtain a second compressed feature matrix of size h × w × n.
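The two-step convolution above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the claimed method: the sizes h, w, c, d, n and k below are hypothetical, the convolution is a naive valid-mode loop, and the padding that would keep the output at the full h × w spatial size is omitted for brevity.

```python
import numpy as np

def conv2d(x, kernels):
    """Naive valid-mode convolution.
    x: feature matrix of shape (H, W, C);
    kernels: (num_kernels, kh, kw, C) -> output (H-kh+1, W-kw+1, num_kernels)."""
    H, W, C = x.shape
    num, kh, kw, _ = kernels.shape
    out = np.zeros((H - kh + 1, W - kw + 1, num))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kh, j:j + kw, :]
            out[i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

h, w, c = 8, 8, 4   # hypothetical feature-matrix size h x w x c
d, n, k = 3, 5, 2   # hypothetical kernel side length d, kernel count n, rank k

x = np.random.rand(h, w, c)
first = np.random.rand(k, 1, d, c)    # k first convolution kernels of size 1 x d x c
second = np.random.rand(n, d, 1, k)   # n second convolution kernels of size d x 1 x k

y1 = conv2d(x, first)    # first compressed feature matrix, k channels
y2 = conv2d(y1, second)  # second compressed feature matrix, n channels
print(y1.shape, y2.shape)
```

With "same" padding on both convolutions the outputs would keep the h × w spatial size stated in the claims; in this valid-mode sketch the spatial size shrinks by d − 1 along each convolved axis.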
In some embodiments of the present invention, before performing convolution calculation on the feature matrix by using the decomposed convolution kernel, the method further includes:
decomposing the n second convolution kernels of size d × 1 × k into n' third convolution kernels of size d × 1 × k and n fourth convolution kernels of size 1 × 1 × n';
performing convolution calculation on the feature matrix by using the decomposed convolution kernel, wherein the convolution calculation comprises the following steps:
performing first convolution on a feature matrix of the to-be-processed image with the size h x w x c by using k first convolution kernels with the size 1 x d x c to obtain a first compressed feature matrix with the size h x w x k;
performing a third convolution on the first compressed feature matrix of size h × w × k using the n' third convolution kernels of size d × 1 × k, to obtain a third compressed feature matrix of size h × w × n';
and performing a fourth convolution on the third compressed feature matrix of size h × w × n' using the n fourth convolution kernels of size 1 × 1 × n', to obtain a second compressed feature matrix of size h × w × n, where n' is a positive number.
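A rough illustration of why this further decomposition helps (the sizes are hypothetical, not from the patent): the n second kernels of size d × 1 × k hold n·d·k weights, while the third and fourth kernels together hold n'·d·k + n·n' weights, which is smaller whenever n' < n·d·k / (d·k + n).

```python
d, k, n = 3, 48, 64   # hypothetical sizes

def params_second(d, k, n):
    # n second convolution kernels of size d x 1 x k
    return n * d * 1 * k

def params_third_fourth(d, k, n, n_prime):
    # n' third kernels of d x 1 x k plus n fourth kernels of 1 x 1 x n'
    return n_prime * d * 1 * k + n * 1 * 1 * n_prime

print(params_second(d, k, n))   # 9216 weights
for n_prime in (16, 32, 44):
    print(n_prime, params_third_fourth(d, k, n, n_prime))
```

For these sizes the break-even point is n' < 9216 / 208 ≈ 44.3, so any n' up to 44 stores fewer weights than the undecomposed second convolution.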
In some embodiments of the present invention, before performing convolution calculation on the feature matrix by using the decomposed convolution kernel, the method further includes:
decomposing the k first convolution kernels of size 1 × d × c into p fifth convolution kernels of size 1 × d × c and k sixth convolution kernels of size 1 × 1 × p;
performing convolution calculation on the feature matrix by using the decomposed convolution kernel, wherein the convolution calculation comprises the following steps:
performing a fifth convolution on the feature matrix of the image to be processed of size h × w × c using the p fifth convolution kernels of size 1 × d × c, to obtain a fifth compressed feature matrix of size h × w × p;
performing a sixth convolution on the fifth compressed feature matrix of size h × w × p using the k sixth convolution kernels of size 1 × 1 × p, to obtain a first compressed feature matrix of size h × w × k, where p is a positive number;
and performing a second convolution on the first compressed feature matrix of size h × w × k using the n second convolution kernels of size d × 1 × k, to obtain a second compressed feature matrix of size h × w × n.
In some embodiments of the present invention, before performing convolution calculation on the feature matrix by using the decomposed convolution kernel, the method further includes:
decomposing the n second convolution kernels of size d × 1 × k into n' third convolution kernels of size d × 1 × k and n fourth convolution kernels of size 1 × 1 × n', and decomposing the k first convolution kernels of size 1 × d × c into p fifth convolution kernels of size 1 × d × c and k sixth convolution kernels of size 1 × 1 × p;
performing convolution calculation on the feature matrix by using the decomposed convolution kernel, wherein the convolution calculation comprises the following steps:
performing a fifth convolution on the feature matrix of the image to be processed of size h × w × c using the p fifth convolution kernels of size 1 × d × c, to obtain a fifth compressed feature matrix of size h × w × p;
performing a sixth convolution on the fifth compressed feature matrix of size h × w × p using the k sixth convolution kernels of size 1 × 1 × p, to obtain a first compressed feature matrix of size h × w × k;
performing a third convolution on the first compressed feature matrix of size h × w × k using the n' third convolution kernels of size d × 1 × k, to obtain a third compressed feature matrix of size h × w × n';
and performing a fourth convolution on the third compressed feature matrix of size h × w × n' using the n fourth convolution kernels of size 1 × 1 × n', to obtain a second compressed feature matrix of size h × w × n, where p and n' are positive numbers.
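The text does not tally the cost of this fully decomposed variant, but it follows the same counting as the other embodiments. The sketch below (with hypothetical ranks k, p and n') compares multiplications per output pixel for the original d × d × c convolution against the fifth/sixth/third/fourth chain.

```python
def mults_original(d, c, n):
    # n kernels of size d x d x c, per output pixel
    return d * d * c * n

def mults_fully_decomposed(d, c, n, k, p, n_prime):
    return (p * d * c          # fifth: p kernels of 1 x d x c
            + k * p            # sixth: k kernels of 1 x 1 x p
            + n_prime * d * k  # third: n' kernels of d x 1 x k
            + n * n_prime)     # fourth: n kernels of 1 x 1 x n'

d, c, n = 3, 64, 64            # VGGNet-16-like first layer, as in the text
k, p, n_prime = 48, 32, 32     # hypothetical decomposition ranks
print(mults_original(d, c, n))                          # 36864
print(mults_fully_decomposed(d, c, n, k, p, n_prime))   # 14336
```

Multiplying either count by h × w gives the total computation over the feature matrix, so the ratio between the two counts is the overall speedup factor.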
In some embodiments of the present invention, the values of k and n' are related to the compression rate of the image to be processed.
In some embodiments of the present invention, the values of k and p are related to the compression rate of the image to be processed.
According to a second aspect of embodiments of the present invention, there is provided a convolution-based image processing apparatus, including:
the acquisition module is used for acquiring a feature matrix and original convolution kernel information of an image to be processed;
the decomposition module is used for decomposing the original convolution kernel according to the original convolution kernel information to obtain at least two convolution kernels;
and a calculation module for performing convolution calculation on the feature matrix with the decomposed convolution kernels, so as to complete the compression processing of the image to be processed.
According to a third aspect of embodiments of the present invention, there is provided a computer readable medium, on which a computer program is stored, which program, when executed by a processor, implements the convolution-based image processing method as described in the first aspect of the embodiments above.
According to a fourth aspect of embodiments of the present invention, there is provided an electronic apparatus, including: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement a convolution-based image processing method as described in the first aspect of the embodiments above.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
in the technical solutions provided by some embodiments of the present invention, a feature matrix of an image to be processed and original convolution kernel information are obtained, and the original convolution kernels are decomposed according to that information into at least two convolution kernels, thereby reducing the rank of the original convolution kernels; convolution calculation is then performed on the feature matrix with the decomposed kernels. The technical scheme provided by the embodiments of the invention reduces model redundancy, shortens recognition time, and improves recognition efficiency while maintaining the recognition effect.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 shows a schematic flow diagram of a convolution-based image processing method according to an embodiment of the present invention;
FIG. 2 shows a schematic framework diagram of the convolution process before improvement, according to an embodiment of the present invention;
FIG. 3 shows a schematic flow diagram of a convolution-based image processing method according to another embodiment of the present invention;
FIG. 4 is a block diagram illustrating a framework for convolution processing an image based on the embodiment shown in FIG. 3;
FIG. 5 is a flow diagram illustrating a method of decomposing an original convolution kernel in accordance with an embodiment of the present invention;
FIG. 6 is a schematic flow chart diagram illustrating a method for performing convolution calculations using the decomposed convolution kernel obtained by the method shown in FIG. 5;
FIG. 7 is a schematic diagram of a frame of a convolution processed image based on the embodiment shown in FIG. 5;
FIG. 8 is a flow chart of a face comparison method according to the present invention;
FIG. 9 is a flow diagram illustrating a method of decomposing an original convolution kernel in accordance with another embodiment of the present invention;
FIG. 10 is a schematic flow chart diagram illustrating a method of performing convolution calculations using the decomposed convolution kernel obtained by the method of FIG. 9;
FIG. 11 is a flow diagram illustrating a method of decomposing an original convolution kernel in accordance with yet another embodiment of the present invention;
FIG. 12 is a schematic flow chart diagram illustrating a method for performing convolution calculations using the decomposed convolution kernel obtained by the method of FIG. 11;
fig. 13 is a schematic structural diagram showing a convolution-based image processing apparatus according to an embodiment of the present invention;
FIG. 14 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device to implement an embodiment of the invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations or operations have not been shown or described in detail to avoid obscuring aspects of the invention.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
In recent years, the rapid development of big data and deep learning techniques has enabled face recognition based on deep learning to surpass human recognition ability on test sets. Face recognition models trained on deep learning networks such as ResNet, VGG and Inception show good recognition ability. However, face recognition based on deep learning has the following problem in practical applications: the networks have too many parameters, so each prediction takes too long, which is especially noticeable on mobile devices and intelligent monitoring devices with limited computing power. It is therefore important to shorten the time required for network prediction.
Most mainstream face recognition schemes directly train a standard deep learning network model, such as VGG, ResNet or Inception. Training on these verified, mature network models yields good generalization ability and recognition effect. However, these standard models are designed for accurate classification or recognition without regard to model redundancy, so the resulting face recognition models achieve a high recognition rate at the cost of long prediction times.
Therefore, when the requirement on recognition accuracy is not extreme, for example when merely determining whether two face photos show the same person, using deep learning network models such as VGG, ResNet or Inception wastes computing resources and yields low recognition efficiency.
Fig. 1 shows a flow diagram of a convolution-based image processing method according to an embodiment of the present invention, which addresses, at least to some extent, the problem of low recognition efficiency. Referring to fig. 1, the method includes the following steps:
step S101, acquiring a feature matrix and original convolution kernel information of an image to be processed;
step S102, decomposing the original convolution kernel according to the original convolution kernel information to obtain at least two convolution kernels;
and step S103, performing convolution calculation on the feature matrix with the decomposed convolution kernels, so as to complete the compression processing of the image to be processed.
In the technical solution provided in the embodiment shown in fig. 1, a feature matrix of an image to be processed and original convolution kernel information are obtained, and the original convolution kernels are decomposed according to that information into at least two convolution kernels, thereby reducing the rank of the original convolution kernels; convolution calculation is then performed on the feature matrix with the decomposed kernels. The technical scheme provided by the embodiment of the invention reduces model redundancy, shortens recognition time, and improves recognition efficiency while maintaining the recognition effect.
Fig. 2 shows a schematic framework diagram of the convolution process before improvement, according to an embodiment of the present invention. With reference to fig. 2, a detailed implementation of each step of fig. 1 is described below:
in an exemplary embodiment, referring to fig. 2, in step S101, the size of the feature matrix 21 of the image to be processed is h × w × c, and the original convolution kernel information includes: the original convolution kernel 23 has a size of d × c, and the number of original convolution kernels is n. In addition, the convolution processing image process before improvement is that n original convolution kernels 23 with the size of d x c are subjected to convolution calculation with the feature matrix 21 of the image to be processed to obtain a processed feature matrix 22 with the size of h x w x n.
It should be noted that if the feature matrix 21 of the image to be processed has length h, width w, and c channels, its size is denoted h × w × c. The original convolution kernel 23 has length d, width d, and c channels, so its size is denoted d × d × c. The sizes of the other convolution kernels and feature matrices are denoted in the same way and will not be explained again below.
In an exemplary embodiment, in step S102, the original convolution kernels of size d × d × c in fig. 2 are decomposed, obtaining at least two convolution kernels and thereby reducing the rank of the original convolution kernels.
In an exemplary embodiment, in step S103, the decomposed convolution kernel is used to perform convolution calculation on the feature matrix with the size h × w × c, so as to complete the compression processing on the image to be processed. Meanwhile, the technical effects of shortening the processing time and improving the processing efficiency are achieved.
Fig. 3 shows a schematic flow diagram of another convolution-based image processing method according to an embodiment of the present invention, which likewise addresses, at least to some extent, the problem of low recognition efficiency. Referring to fig. 3, the method includes the following steps S301 to S304. Fig. 4 shows a schematic diagram of a framework for convolution processing an image based on the embodiment shown in fig. 3, and the embodiment of fig. 3 is explained below with reference to fig. 4.
In an exemplary embodiment, the specific implementation of step S301 is the same as that of step S101 and is not repeated here.
In an exemplary embodiment, step S302 is a specific implementation of step S102. Referring to fig. 2 and 4, the n original convolution kernels 23 of size d × d × c are decomposed into k first convolution kernels 44 of size 1 × d × c and n second convolution kernels 45 of size d × 1 × k, where h, w, d, c, n and k are positive numbers.
In the exemplary embodiment, steps S303-S304 are a specific implementation of step S103. Referring to fig. 3 and 4:
in step S303, performing a first convolution on the feature matrix 41 of the to-be-processed image with the size h × w × c by using k first convolution kernels 44 with the size 1 × d × c to obtain a first compressed feature matrix 42 with the size h × w × k; and
in step S304, the first compressed feature matrix 42 of size h × w × k is subjected to a second convolution using n second convolution kernels 45 of size d × 1 × k, to obtain a second compressed feature matrix 43 of size h × w × n.
Referring to fig. 2 and 4, it can be seen that the feature matrix 22 obtained by the pre-improvement convolution process has the same size as the second compressed feature matrix 43 obtained by the improved convolution process provided by the embodiment of fig. 3.
Meanwhile, in the pre-improvement convolution process shown in fig. 2, the computation of each original convolution kernel 23 is d × d × c × h × w, so the total computation is W1 = d × d × c × n × h × w. In the improved convolution process shown in fig. 4, the computation of the k first convolution kernels 44 is k × d × c × h × w, and the computation of the n second convolution kernels 45 is n × d × 1 × k × h × w. The total computation of the improved convolution process shown in fig. 4 is therefore W2 = (k × d × c × h × w) + (n × d × 1 × k × h × w) = d × k × h × w × (c + n).
Further, the ratio of the computation after the improvement to that before the improvement is

ratio1 = W2 / W1 = [d × k × h × w × (c + n)] / [d × d × c × n × h × w] = k(c + n) / (d × c × n).

Taking the first convolutional layer of the VGGNet-16 network as an example, where the number of channels c and the number of original convolution kernels n are both 64 and the kernel side length d is 3,

ratio1 = k × (64 + 64) / (3 × 64 × 64) = k / 96.

That is, when k is less than 96, the ratio ratio1 is less than 1 and the computation after the improvement is smaller than before. Moreover, the smaller the value of k, the greater the achievable model compression. At the same time, the value of k should both reduce the computation, improving processing efficiency, and preserve a certain classification or recognition accuracy, so as to meet different recognition requirements. Specifically, the value of k can be determined by the following procedure.
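The computation ratio above can be checked numerically. This snippet is illustrative only; the VGGNet-16 first-layer values (c = n = 64, d = 3) are taken from the text, and h × w cancels out of the ratio.

```python
def ratio1(d, c, n, k):
    # W2 / W1 = [d*k*h*w*(c+n)] / [d*d*c*n*h*w]; h*w cancels
    return k * (c + n) / (d * c * n)

d, c, n = 3, 64, 64   # first convolutional layer of VGGNet-16
for k in (32, 64, 96):
    print(k, ratio1(d, c, n, k))   # equals k/96 for these values
```

Any k below 96 gives a ratio under 1, i.e. a net reduction in computation, matching the bound derived in the text.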
After the original convolution kernels are decomposed, the error generated by the decomposed convolution is minimized, i.e. the factors are chosen to solve

min over the first and second convolution kernels of Σₙ ‖ Wₙ − Σₖ Hₙᵏ ∘ Vᵏ ‖²,

and the value of k is determined during this error-minimization process. Here Wₙ denotes the n-th original convolution kernel, and Vᵏ and Hₙᵏ denote the first and second convolution kernels, respectively. In this way, the feature matrix output by convolution calculation with the decomposed convolution kernels approximates the feature matrix output by convolution calculation with the original convolution kernels.
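The patent does not name the minimization algorithm. One common way to realize this kind of separable decomposition in low-rank filter-approximation work is a rank-k truncated SVD of the kernel tensor, flattened so that the kernel-index-and-vertical factors are separated from the horizontal-and-channel factors. The sketch below, with hypothetical sizes, is one such realization under that assumption, not the claimed procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
d, c, n, k = 3, 8, 6, 4                  # hypothetical sizes

W = rng.standard_normal((n, d, d, c))    # n original kernels of size d x d x c
# Group (kernel index, vertical axis) against (horizontal axis, channels).
M = W.reshape(n * d, d * c)

U, S, Vt = np.linalg.svd(M, full_matrices=False)
H = U[:, :k] * S[:k]   # (n*d, k): reshapes to n second kernels of size d x 1 x k
V = Vt[:k]             # (k, d*c): reshapes to k first kernels of size 1 x d x c

rel_err = np.linalg.norm(M - H @ V) / np.linalg.norm(M)
print(rel_err)   # Frobenius-optimal rank-k error; shrinks to 0 as k grows
```

By the Eckart-Young theorem this truncation is the best rank-k approximation in the Frobenius norm, so sweeping k and stopping when the relative error is acceptable is one way to pick k.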
Fig. 5 is a flow chart illustrating a method for decomposing an original convolution kernel according to an embodiment of the present invention, and referring to fig. 5, includes steps S501 to S502. Fig. 7 shows a schematic diagram of a frame of a convolution processed image based on the embodiment shown in fig. 5, and the embodiment provided in fig. 5 is explained below with reference to fig. 7.
In an exemplary embodiment, steps S501 to S502 are another specific implementation manner of step S102, and refer to fig. 5 and 7:
in step S501, the n original convolution kernels of size d × d × c are decomposed into k first convolution kernels of size 1 × d × c and n second convolution kernels of size d × 1 × k; and
in step S502, the n second convolution kernels of size d × 1 × k are decomposed into n' third convolution kernels 74 of size d × 1 × k and n fourth convolution kernels 75 of size 1 × 1 × n'.
Referring to fig. 2 and 7, the n original convolution kernels 23 of size d × d × c are first decomposed into k first convolution kernels 44 of size 1 × d × c and n second convolution kernels 45 of size d × 1 × k; the n second convolution kernels 45 of size d × 1 × k are then further decomposed into n' third convolution kernels 74 of size d × 1 × k and n fourth convolution kernels 75 of size 1 × 1 × n'.
Also, referring to fig. 4 and 7, the convolution kernel decomposition provided in fig. 7 can be viewed as a further decomposition of the n second convolution kernels 45 of size d × 1 × k in fig. 4.
Fig. 6 is a flow chart illustrating a method for performing convolution calculation using the decomposed convolution kernel obtained by the method shown in fig. 5, and referring to fig. 6, includes steps S601 to S603.
In an exemplary embodiment, steps S601 to S603 are another specific implementation manner of step S103, and refer to fig. 6 and 7:
in step S601, a first convolution is performed on the feature matrix of the image to be processed of size h × w × c using the k first convolution kernels of size 1 × d × c, to obtain a first compressed feature matrix 71 of size h × w × k;
in step S602, a third convolution is performed on the first compressed feature matrix 71 of size h × w × k using the n' third convolution kernels 74 of size d × 1 × k, to obtain a third compressed feature matrix 72 of size h × w × n'; and
in step S603, a fourth convolution is performed on the third compressed feature matrix 72 of size h × w × n' using the n fourth convolution kernels 75 of size 1 × 1 × n', to obtain a second compressed feature matrix 73 of size h × w × n, where n' is a positive number.
Referring to fig. 4 and 7, it can be seen that the second compressed feature matrix 43 obtained by the convolution process of fig. 4 has the same size as the second compressed feature matrix 73 obtained by the further-improved convolution process provided by the embodiment of fig. 7.
Meanwhile, in the second convolution of the image processing shown in fig. 4, the amount of computation is W3 = n × d × 1 × k × h × w. In the convolution processing of the image shown in fig. 7, the amount of computation of the third convolution kernels 74 is n' × d × 1 × k × h × w, and the amount of computation of the fourth convolution kernels 75 is n × 1 × 1 × n' × h × w. The total amount of computation of the convolution processing shown in fig. 7 is W4 = (n' × d × 1 × k × h × w) + (n × 1 × 1 × n' × h × w) = h × w × n' × (dk + n).
Further, the ratio of the amount of computation of the second convolution shown in fig. 4 to the sum of the third convolution and the fourth convolution shown in fig. 7 is:

ratio2 = W4 / W3 = [n' × (dk + n)] / (n × d × k)
Taking the first convolutional layer of the VGGNet-16 network as an example, the number n of original convolution kernels is 64, the side length d of the convolution kernels is 3, and the value of k is less than 96. It can be seen that when n' is small, ratio2 is less than 1; that is, after the n second convolution kernels 45 of size d × 1 × k are further decomposed, the amount of computation of the third convolution and the fourth convolution shown in fig. 7 is reduced relative to the second convolution shown in fig. 4. It can also be seen that the smaller the value of n', the greater the model compression ratio that can be obtained. Meanwhile, an appropriate value of n' can reduce the amount of computation and improve processing efficiency while still guaranteeing a certain classification or recognition accuracy, so as to meet different recognition requirements. Specifically, the value of n' can be determined by the following procedure.
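As a quick numerical check of the ratio just derived, W3, W4, and ratio2 can be evaluated for the VGGNet-16 first-layer example. The feature-map size h = w = 224 and the particular choice k = 32 are illustrative assumptions (the text only requires k < 96 here):

```python
# Computation counts: fig. 4 second convolution (W3) versus the
# fig. 7 third + fourth convolutions (W4), per the formulas above.
n, d, k = 64, 3, 32   # n = 64 and d = 3 from the VGGNet-16 example; k = 32 assumed
h = w = 224           # illustrative feature-map size (cancels in the ratio)

def ratio2(n_prime):
    w3 = n * d * 1 * k * h * w
    w4 = (n_prime * d * 1 * k * h * w) + (n * 1 * 1 * n_prime * h * w)
    return w4 / w3    # algebraically equals n_prime * (d*k + n) / (n*d*k)

for n_prime in (4, 8, 16, 32):
    print(n_prime, round(ratio2(n_prime), 3))
```

For all of these n' values the ratio is below 1, and it shrinks linearly as n' decreases, matching the compression-ratio discussion in the text.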
The n second convolution kernels 45 of size d × 1 × k are decomposed, and the error introduced by the convolution processing shown in fig. 7 is minimized through

min over M, b of Σi ‖ r(yi) − r(M yi + b) ‖²

and the value of n' is determined during the error-minimization process. This ensures that the feature matrix output by the decomposed convolution calculation shown in fig. 7 is close to the feature matrix output by the second convolution calculation shown in fig. 4.
Wherein r () represents the non-linear function relu, yiRepresents the output shown in fig. 4 obtained by convolution with n second convolution kernels 45 of size d x 1 x k; m is an n x n matrix with a rank of n and can be decomposed into an n x n 'matrix and an n' × n matrix; myi+ b represents the utilization shown in fig. 7.
In an exemplary embodiment, the technical solutions provided by the embodiments shown in fig. 5 to 7 can be applied to face-image processing. Referring to fig. 8, a live photograph and a certificate photograph (e.g., an identity-card photograph) to be compared are acquired in step S81 and step S82, respectively; in step S83, the faces in the images to be compared are detected and aligned. The method provided by this embodiment is applied in step S84: each layer of the SphereFace face recognition model is compressed by the method provided by this embodiment, and the model is retrained; in step S85, the compressed SphereFace face recognition model is obtained; further, the similarity comparison of step S86 is performed and the comparison result is returned. Through the two decompositions of the convolution kernels in step S84, the prediction time is shortened while the accuracy of the original model is preserved as much as possible. Specifically, the prediction time is shortened from 184 ms to 93 ms, while the accuracy on the face comparison data set LFW decreases only from 99.37% to 98.58%, without materially affecting the prediction results. Thus, with the prediction time roughly halved and the accuracy reduced by less than 1%, the efficiency of the face comparison model is greatly improved.
Fig. 9 is a flowchart illustrating a method of decomposing an original convolution kernel according to another embodiment of the present invention, and referring to fig. 9, includes steps S901 to S902.
In an exemplary embodiment, steps S901 to S902 are a specific implementation manner of step S102, and refer to fig. 9:
in step S901, decomposing the n original convolution kernels of size d × d × c into k first convolution kernels of size 1 × d × c and n second convolution kernels of size d × 1 × k; and
in step S902, k first convolution kernels of size 1 × d × c are decomposed into p fifth convolution kernels of size 1 × d × c and k sixth convolution kernels of size 1 × p.
Referring to fig. 2 and 4, the n original convolution kernels 23 of size d × d × c are decomposed into k first convolution kernels 44 of size 1 × d × c and n second convolution kernels 45 of size d × 1 × k; further, the k first convolution kernels 44 of size 1 × d × c are decomposed into p fifth convolution kernels of size 1 × d × c and k sixth convolution kernels of size 1 × p.
Meanwhile, referring to fig. 4, the method for decomposing the convolution kernel provided in this embodiment can be regarded as decomposing k first convolution kernels 44 with size 1 × d × c in fig. 4.
Fig. 10 is a flowchart illustrating a method of performing convolution calculation using the decomposed convolution kernel obtained by the method illustrated in fig. 9, and referring to fig. 10, includes steps S1001 to S1003.
In an exemplary embodiment, steps S1001 to S1003 are a specific implementation of step S103, and refer to fig. 10:
in step S1001, performing a fifth convolution on the feature matrix of the to-be-processed image with a size h × w × c by using p fifth convolution kernels with a size 1 × d × c, to obtain a fifth compressed feature matrix with a size h × w × p;
in step S1002, performing a sixth convolution on a fifth compressed feature matrix with a size h × w × p using k sixth convolution kernels with a size 1 × p, to obtain a first compressed feature matrix with a size h × w × k; and
in step S1003, a second convolution is performed on the first compressed feature matrix having a size h × w × k using n second convolution kernels having a size d × 1 × k, to obtain a second compressed feature matrix having a size h × w × n. Wherein p is a positive number.
Referring to fig. 4, it can be seen that the size of the first compressed feature matrix 42 obtained after the convolution processing of fig. 4 is the same as the size of the output obtained through step S1002 of the present embodiment.
Meanwhile, in the first convolution of the image processing shown in fig. 4, the amount of computation is W5 = k × 1 × d × c × h × w. In the convolution performed in steps S1001 to S1002, the amount of computation of the fifth convolution is p × 1 × d × c × h × w, and the amount of computation of the sixth convolution is k × 1 × 1 × p × h × w. The total amount of computation from step S1001 to step S1002 is W6 = (p × 1 × d × c × h × w) + (k × 1 × 1 × p × h × w) = h × w × p × (dc + k).
Further, the ratio of the amount of computation of the first convolution shown in fig. 4 to the sum of the fifth convolution and the sixth convolution performed in steps S1001 to S1002 is:

W6 / W5 = [p × (dc + k)] / (k × d × c)
It can be seen that the smaller the value of p, the greater the model compression ratio that can be obtained. Meanwhile, the value of p in this embodiment can be determined with reference to the procedures for choosing k and n' described in the embodiments above, which are not repeated here.
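The counts W5 and W6 derived above can likewise be compared numerically. The feature-map size h = w = 224 and the particular values of c, d, k are illustrative assumptions, not taken from the patent:

```python
# Computation counts: fig. 4 first convolution (W5) versus the
# fifth + sixth convolutions of steps S1001-S1002 (W6).
c, d, k = 3, 3, 64    # assumed input channels, kernel side, and k
h = w = 224           # illustrative feature-map size (cancels in the ratio)

def ratio(p):
    w5 = k * 1 * d * c * h * w
    w6 = (p * 1 * d * c * h * w) + (k * 1 * 1 * p * h * w)
    return w6 / w5    # algebraically equals p * (d*c + k) / (k*d*c)

for p in (2, 4, 6):
    print(p, round(ratio(p), 3))
```

For these values of p the ratio stays below 1; note that with c small (e.g., a first layer over RGB input), p must also be kept small for the further decomposition to pay off, consistent with the observation that a smaller p yields greater compression.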
Fig. 11 shows a flowchart of a method for decomposing an original convolution kernel according to still another embodiment of the present invention, and referring to fig. 11, includes steps S1101-S1102.
In an exemplary embodiment, steps S1101-S1102 are still another specific implementation manner of step S102, and refer to fig. 11:
in step S1101, decomposing the n original convolution kernels of size d × d × c into k first convolution kernels of size 1 × d × c and n second convolution kernels of size d × 1 × k; and
in step S1102, k first convolution kernels of size 1 × d × c are decomposed into p fifth convolution kernels of size 1 × d × c and k sixth convolution kernels of size 1 × p.
Referring to fig. 2 and 4, the n original convolution kernels 23 of size d × d × c are decomposed into k first convolution kernels 44 of size 1 × d × c and n second convolution kernels 45 of size d × 1 × k; further, the k first convolution kernels 44 of size 1 × d × c are decomposed into p fifth convolution kernels of size 1 × d × c and k sixth convolution kernels of size 1 × p, and the n second convolution kernels 45 of size d × 1 × k are decomposed into n' third convolution kernels 74 of size d × 1 × k and n fourth convolution kernels 75 of size 1 × 1 × n'.
As can be seen, in this embodiment, after the original convolution kernel is decomposed, both of the two decomposed convolution kernels are further decomposed. Thus, step S902 and step S502 are integrated.
Meanwhile, referring to fig. 4, the method for decomposing the convolution kernel provided in this embodiment can be regarded as decomposing k first convolution kernels 44 with a size of 1 × d × c in fig. 4 and decomposing n second convolution kernels 45 with a size of d × 1 × k.
Fig. 12 is a flowchart illustrating a method of performing convolution calculation using the decomposed convolution kernel obtained by the method illustrated in fig. 11, and referring to fig. 12, includes steps S1201 to S1204.
In an exemplary embodiment, steps S1201 to S1204 are a specific implementation manner of step S103, and refer to fig. 12:
in step S1201, performing a fifth convolution on the feature matrix of the to-be-processed image with a size h × w × c using p fifth convolution kernels with a size 1 × d × c, to obtain a fifth compressed feature matrix with a size h × w × p;
in step S1202, performing a sixth convolution on a fifth compressed feature matrix with a size h × w × p by using k sixth convolution kernels with a size 1 × p, to obtain a first compressed feature matrix with a size h × w × k;
in the exemplary embodiment, steps S1201 to S1202 are the same as the specific implementation process of steps S1001 and S1002, and are not described herein again.
In step S1203, performing a third convolution on the first compressed feature matrix with the size h × w × k by using n 'third convolution kernels with the size d × 1 × k, so as to obtain a third compressed feature matrix with the size h × w × n';
in step S1204, a fourth convolution is performed on the third compressed feature matrix having a size h × w × n 'using n fourth convolution kernels having a size 1 × n', to obtain a second compressed feature matrix having a size h × w × n.
In an exemplary embodiment, steps S1203 to S1204 are the same as the specific implementation processes of steps S602 and S603, and are not described herein again.
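Combining the per-stage formulas of the previous embodiments, the total cost of the four-convolution pipeline of steps S1201 to S1204 can be compared against applying the n original d × d × c kernels directly. All concrete values below are illustrative assumptions:

```python
# Total multiply count: n original kernels of size d x d x c versus the
# decomposed fifth/sixth/third/fourth convolutions (steps S1201-S1204).
h = w = 224
c, d, n = 3, 3, 64
k, p, n_prime = 32, 4, 8   # assumed decomposition widths

w_orig = n * d * d * c * h * w
# fifth + sixth: h*w*p*(dc + k); third + fourth: h*w*n'*(dk + n)
w_dec = h * w * (p * (d * c + k) + n_prime * (d * k + n))
print(w_dec / w_orig)
```

With these assumed widths the decomposed pipeline uses fewer multiplications than the original convolution; the actual savings depend on choosing k, p, and n' via the error-minimization procedures described above.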
Embodiments of the apparatus of the present invention will be described below, which can be used to perform the convolution-based image processing method of the present invention described above.
Fig. 13 is a schematic diagram showing a configuration of a convolution-based image processing apparatus according to an embodiment of the present invention, and referring to fig. 13, a convolution-based image processing apparatus 1300 includes: an acquisition module 1301, a decomposition module 1302 and a calculation module 1303.
The obtaining module 1301 is configured to obtain a feature matrix and original convolution kernel information of an image to be processed; the decomposition module 1302 is configured to decompose the original convolution kernel according to the original convolution kernel information to obtain at least two convolution kernels; the calculating module 1303 is configured to perform convolution calculation on the feature matrix by using the decomposed convolution kernel, so as to complete compression processing on the image to be processed.
In an exemplary embodiment, the feature matrix size of the image to be processed is h × w × c, and the original convolution kernel information includes: the size of the original convolution kernels is d × d × c, and the number of the original convolution kernels is n;
the decomposition module 1302 is specifically configured to: decompose the n original convolution kernels of size d × d × c into k first convolution kernels of size 1 × d × c and n second convolution kernels of size d × 1 × k, wherein h, w, d, c, n and k are positive numbers.
In an exemplary embodiment, the computing module 1303 includes: a first convolution unit and a second convolution unit.
Wherein the first convolution unit is configured to: performing first convolution on a feature matrix of the to-be-processed image with the size h x w x c by using k first convolution kernels with the size 1 x d x c to obtain a first compressed feature matrix with the size h x w x k; the second convolution unit is to: and performing second convolution on the first compression feature matrix with the size h x w x k by using n second convolution kernels with the size d x 1 x k to obtain a second compression feature matrix with the size h x w x n.
In an exemplary embodiment, the decomposition module 1302 is further configured to: decomposing n second convolution kernels of size d x 1 x k into n 'third convolution kernels of size d x 1 x k and n fourth convolution kernels of size 1 x n';
the calculation module 1303 further includes: a third convolution unit and a fourth convolution unit.
Wherein the first convolution unit is configured to: perform a first convolution on the feature matrix of the to-be-processed image with the size h x w x c by using k first convolution kernels with the size 1 x d x c, to obtain a first compressed feature matrix with the size h x w x k; the third convolution unit is configured to: perform a third convolution on the first compressed feature matrix with the size h x w x k by using n' third convolution kernels with the size d x 1 x k, to obtain a third compressed feature matrix with the size h x w x n'; and the fourth convolution unit is configured to: perform a fourth convolution on the third compressed feature matrix with the size h x w x n' by using n fourth convolution kernels with the size 1 x n', to obtain a second compressed feature matrix with the size h x w x n, wherein n' is a positive number.
In an exemplary embodiment, the decomposition module 1302 is further configured to: decomposing k first convolution kernels of size 1 x d c into p fifth convolution kernels of size 1 x d c and k sixth convolution kernels of size 1 x p;
the calculation module 1303 further includes: a fifth convolution unit and a sixth convolution unit.
Wherein the fifth convolution unit is configured to: perform a fifth convolution on the feature matrix of the image to be processed with the size h x w x c by using p fifth convolution kernels with the size 1 x d x c, to obtain a fifth compressed feature matrix with the size h x w x p; the sixth convolution unit is configured to perform a sixth convolution on the fifth compressed feature matrix with the size h x w x p by using k sixth convolution kernels with the size 1 x p, to obtain a first compressed feature matrix with the size h x w x k, wherein p is a positive number; and the second convolution unit is configured to perform a second convolution on the first compressed feature matrix with the size h x w x k by using n second convolution kernels with the size d x 1 x k, to obtain a second compressed feature matrix with the size h x w x n.
In an exemplary embodiment, the decomposition module 1302 is further configured to: decomposing n second convolution kernels of size d x 1 x k into n 'third convolution kernels of size d x 1 x k and n fourth convolution kernels of size 1 x n', and decomposing k first convolution kernels of size 1 x d c into p fifth convolution kernels of size 1 x d c and k sixth convolution kernels of size 1 x p;
the calculation module 1303 further includes: a fifth convolution unit, a sixth convolution unit, a third convolution unit and a fourth convolution unit.
The fifth convolution unit is configured to perform a fifth convolution on the feature matrix of the image to be processed with the size h x w x c by using p fifth convolution kernels with the size 1 x d x c, to obtain a fifth compressed feature matrix with the size h x w x p; the sixth convolution unit is configured to perform a sixth convolution on the fifth compressed feature matrix with the size h x w x p by using k sixth convolution kernels with the size 1 x p, to obtain a first compressed feature matrix with the size h x w x k; the third convolution unit is configured to perform a third convolution on the first compressed feature matrix with the size h x w x k by using n' third convolution kernels with the size d x 1 x k, to obtain a third compressed feature matrix with the size h x w x n'; and the fourth convolution unit is configured to perform a fourth convolution on the third compressed feature matrix with the size h x w x n' by using n fourth convolution kernels with the size 1 x n', to obtain a second compressed feature matrix with the size h x w x n, wherein p and n' are positive numbers.
In an exemplary embodiment, the values of k and n' are related to the compression rate of the image to be processed.
In an exemplary embodiment, the values of k and p are related to the compression rate of the image to be processed.
Since the functional blocks of the convolution-based image processing apparatus according to the exemplary embodiment of the present invention correspond to the steps of the exemplary embodiment of the convolution-based image processing method described above, for details that are not disclosed in the embodiments of the apparatus according to the present invention, please refer to the embodiments of the convolution-based image processing method described above according to the present invention.
Referring now to FIG. 14, shown is a block diagram of a computer system 1400 suitable for implementing the electronic device of an embodiment of the present invention. The computer system 1400 of the electronic device shown in fig. 14 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in fig. 14, the computer system 1400 includes a Central Processing Unit (CPU)1401, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)1402 or a program loaded from a storage portion 1408 into a Random Access Memory (RAM) 1403. In the RAM 1403, various programs and data necessary for system operation are also stored. The CPU 1401, ROM 1402, and RAM 1403 are connected to each other via a bus 1404. An input/output (I/O) interface 1405 is also connected to bus 1404.
The following components are connected to the I/O interface 1405: an input portion 1406 including a keyboard, a mouse, and the like; an output portion 1407 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker and the like; a storage portion 1408 including a hard disk and the like; and a communication portion 1409 including a network interface card such as a LAN card, a modem, or the like. The communication section 1409 performs communication processing via a network such as the internet. The driver 1410 is also connected to the I/O interface 1405 as necessary. A removable medium 1411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1410 as necessary, so that a computer program read out therefrom is installed into the storage section 1408 as necessary.
In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 1409 and/or installed from the removable medium 1411. The above-described functions defined in the system of the present application are executed when the computer program is executed by a Central Processing Unit (CPU) 1401.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The above-mentioned computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the convolution-based image processing method as described in the above-mentioned embodiments.
For example, the electronic device may implement the following as shown in fig. 1: step S101, acquiring a feature matrix and original convolution kernel information of an image to be processed; step S102, decomposing the original convolution kernel according to the original convolution kernel information to obtain at least two convolution kernels; and step S103, performing convolution calculation on the characteristic matrix by using the decomposed convolution kernel so as to complete compression processing on the image to be processed.
As another example, the electronic device may implement the steps shown in fig. 3.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the invention. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiment of the present invention.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (11)

1. A convolution-based image processing method, comprising:
acquiring a characteristic matrix and original convolution kernel information of an image to be processed;
decomposing the original convolution kernel according to the original convolution kernel information to obtain at least two convolution kernels;
and performing convolution calculation on the characteristic matrix by using the decomposed convolution kernel so as to finish compression processing on the image to be processed.
2. The method of claim 1, wherein the feature matrix size of the image to be processed is h x w x c, and wherein the original convolution kernel information comprises: the size of the original convolution kernels is d x d x c, and the number of the original convolution kernels is n;
decomposing the original convolution kernel according to the original convolution kernel information to obtain at least two convolution kernels, including:
decomposing n original convolution kernels with the size of d x d x c into k first convolution kernels with the size of 1 x d x c and n second convolution kernels with the size of d x 1 x k, wherein h, w, d, c, n and k are positive numbers.
3. The method of claim 2, wherein performing convolution calculations on the feature matrix using the decomposed convolution kernel comprises:
performing first convolution on a feature matrix of the to-be-processed image with the size h x w x c by using k first convolution kernels with the size 1 x d x c to obtain a first compressed feature matrix with the size h x w x k;
and performing second convolution on the first compression feature matrix with the size h x w x k by using n second convolution kernels with the size d x 1 x k to obtain a second compression feature matrix with the size h x w x n.
4. The method of claim 2, further comprising, prior to performing convolution calculations on the feature matrix using the decomposed convolution kernel:
decomposing n second convolution kernels of size d x 1 x k into n 'third convolution kernels of size d x 1 x k and n fourth convolution kernels of size 1 x n';
performing convolution calculation on the feature matrix by using the decomposed convolution kernel, wherein the convolution calculation comprises the following steps:
performing first convolution on a feature matrix of the to-be-processed image with the size h x w x c by using k first convolution kernels with the size 1 x d x c to obtain a first compressed feature matrix with the size h x w x k;
performing third convolution on the first compression feature matrix with the size h x w x k by using n 'third convolution kernels with the size d x 1 x k to obtain a third compression feature matrix with the size h x w x n';
and performing fourth convolution on the third compression feature matrix with the size h x w x n' by using n fourth convolution kernels with the size 1 x n', to obtain a second compression feature matrix with the size h x w x n, wherein n' is a positive number.
5. The method of claim 2, further comprising, prior to performing convolution calculations on the feature matrix using the decomposed convolution kernel:
decomposing k first convolution kernels of size 1 x d c into p fifth convolution kernels of size 1 x d c and k sixth convolution kernels of size 1 x p;
performing convolution calculation on the feature matrix by using the decomposed convolution kernel, wherein the convolution calculation comprises the following steps:
performing fifth convolution on the feature matrix of the image to be processed with the size h x w x c by using p fifth convolution kernels with the size 1 x d x c to obtain a fifth compressed feature matrix with the size h x w x p;
performing sixth convolution on a fifth compression feature matrix with the size h x w x p by using k sixth convolution kernels with the size 1 x p, and obtaining a first compression feature matrix with the size h x w x k, wherein p is a positive number;
and performing second convolution on the first compression feature matrix with the size h x w x k by using n second convolution kernels with the size d x 1 x k to obtain a second compression feature matrix with the size h x w x n.
6. The method of claim 2, further comprising, prior to performing convolution calculations on the feature matrix using the decomposed convolution kernel:
decomposing n second convolution kernels of size d x 1 x k into n 'third convolution kernels of size d x 1 x k and n fourth convolution kernels of size 1 x n', and decomposing k first convolution kernels of size 1 x d c into p fifth convolution kernels of size 1 x d c and k sixth convolution kernels of size 1 x p;
performing convolution calculation on the feature matrix by using the decomposed convolution kernel, wherein the convolution calculation comprises the following steps:
performing fifth convolution on the feature matrix of the image to be processed with the size h x w x c by using p fifth convolution kernels with the size 1 x d x c to obtain a fifth compressed feature matrix with the size h x w x p;
performing sixth convolution on a fifth compression feature matrix with the size h x w x p by using k sixth convolution kernels with the size 1 x p to obtain a first compression feature matrix with the size h x w x k;
performing third convolution on the first compression feature matrix with the size h x w x k by using n 'third convolution kernels with the size d x 1 x k to obtain a third compression feature matrix with the size h x w x n';
and performing fourth convolution on the third compression feature matrix with the size h x w x n ' by using n fourth convolution checks with the size 1 x n ', and obtaining a second compression feature matrix with the size h x w x n, wherein p and n ' are positive numbers.
7. The method according to claim 4 or 6, wherein the values of k and n' are related to the compression rate of the image to be processed.
8. The method according to claim 5 or 6, wherein the values of k and p are related to the compression rate of the image to be processed.
9. A convolution-based image processing apparatus, comprising:
the acquisition module is used for acquiring a feature matrix and original convolution kernel information of an image to be processed;
the decomposition module is used for decomposing the original convolution kernel according to the original convolution kernel information to obtain at least two convolution kernels;
and the calculation module is used for carrying out convolution calculation on the characteristic matrix by using the decomposed convolution kernel so as to finish the compression processing on the image to be processed.
10. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out a convolution-based image processing method according to any one of claims 1 to 8.
11. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the convolution-based image processing method of any one of claims 1 to 8.
CN201810966605.4A 2018-08-23 2018-08-23 Convolution-based image processing method, convolution-based image processing device, convolution-based image processing medium and electronic equipment Pending CN110858323A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810966605.4A CN110858323A (en) 2018-08-23 2018-08-23 Convolution-based image processing method, convolution-based image processing device, convolution-based image processing medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810966605.4A CN110858323A (en) 2018-08-23 2018-08-23 Convolution-based image processing method, convolution-based image processing device, convolution-based image processing medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN110858323A true CN110858323A (en) 2020-03-03

Family

ID=69635131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810966605.4A Pending CN110858323A (en) 2018-08-23 2018-08-23 Convolution-based image processing method, convolution-based image processing device, convolution-based image processing medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN110858323A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140112596A1 (en) * 2012-10-22 2014-04-24 Siemens Medical Solutions Usa, Inc. Parallel Image Convolution Processing with SVD Kernel Data
CN106326985A (en) * 2016-08-18 2017-01-11 北京旷视科技有限公司 Neural network training method, neural network training device, data processing method and data processing device
CN106682736A (en) * 2017-01-18 2017-05-17 北京小米移动软件有限公司 Image identification method and apparatus
CN106778550A (en) * 2016-11-30 2017-05-31 北京小米移动软件有限公司 A kind of method and apparatus of Face datection
CN107680044A (en) * 2017-09-30 2018-02-09 福建帝视信息科技有限公司 A kind of image super-resolution convolutional neural networks speed-up computation method
CN107844828A (en) * 2017-12-18 2018-03-27 北京地平线信息技术有限公司 Convolutional calculation method and electronic equipment in neutral net
CN107944545A (en) * 2017-11-10 2018-04-20 中国科学院计算技术研究所 Computational methods and computing device applied to neutral net
CN107967459A (en) * 2017-12-07 2018-04-27 北京小米移动软件有限公司 convolution processing method, device and storage medium
CN108154194A (en) * 2018-01-18 2018-06-12 北京工业大学 A kind of method with the convolutional network extraction high dimensional feature based on tensor
US20180189229A1 (en) * 2017-01-04 2018-07-05 Stmicroelectronics S.R.L. Deep convolutional network heterogeneous architecture
EP3346425A1 (en) * 2017-01-04 2018-07-11 STMicroelectronics Srl Hardware accelerator engine and method

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140112596A1 (en) * 2012-10-22 2014-04-24 Siemens Medical Solutions Usa, Inc. Parallel Image Convolution Processing with SVD Kernel Data
CN106326985A (en) * 2016-08-18 2017-01-11 北京旷视科技有限公司 Neural network training method, neural network training device, data processing method and data processing device
CN106778550A (en) * 2016-11-30 2017-05-31 北京小米移动软件有限公司 A kind of method and apparatus of Face datection
US20180189229A1 (en) * 2017-01-04 2018-07-05 Stmicroelectronics S.R.L. Deep convolutional network heterogeneous architecture
EP3346425A1 (en) * 2017-01-04 2018-07-11 STMicroelectronics Srl Hardware accelerator engine and method
CN106682736A (en) * 2017-01-18 2017-05-17 北京小米移动软件有限公司 Image identification method and apparatus
CN107680044A (en) * 2017-09-30 2018-02-09 福建帝视信息科技有限公司 A kind of image super-resolution convolutional neural networks speed-up computation method
CN107944545A (en) * 2017-11-10 2018-04-20 中国科学院计算技术研究所 Computational methods and computing device applied to neutral net
CN107967459A (en) * 2017-12-07 2018-04-27 北京小米移动软件有限公司 convolution processing method, device and storage medium
CN107844828A (en) * 2017-12-18 2018-03-27 北京地平线信息技术有限公司 Convolutional calculation method and electronic equipment in neutral net
CN108154194A (en) * 2018-01-18 2018-06-12 北京工业大学 A kind of method with the convolutional network extraction high dimensional feature based on tensor

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
廖辉: "基于轻量级卷积神经网络的人脸检测算法", 中国优秀硕士学位论文全文数据库 信息科技辑, no. 01 *
王磊;赵英海;杨国顺;王若琪;: "面向嵌入式应用的深度神经网络模型压缩技术综述", 北京交通大学学报, no. 06 *
罗富贵;李明珍;: "基于卷积核分解的深度CNN模型结构优化及其在小图像识别中的应用", 井冈山大学学报(自然科学版), no. 02, 15 March 2018 (2018-03-15) *
陈伟杰: "卷积神经网络的加速及压缩", 中国优秀硕士学位论文全文数据库 信息科技辑, no. 07, 15 July 2018 (2018-07-15) *

Similar Documents

Publication Publication Date Title
CN110321958B (en) Training method of neural network model and video similarity determination method
CN112699991A (en) Method, electronic device, and computer-readable medium for accelerating information processing for neural network training
CN110222220B (en) Image processing method, device, computer readable medium and electronic equipment
CN112396613B (en) Image segmentation method, device, computer equipment and storage medium
CN111414879B (en) Face shielding degree identification method and device, electronic equipment and readable storage medium
CN110413812B (en) Neural network model training method and device, electronic equipment and storage medium
WO2020207174A1 (en) Method and apparatus for generating quantized neural network
US20220329807A1 (en) Image compression method and apparatus thereof
US20220100576A1 (en) Video processing method and device, electronic equipment and storage medium
CN113327599B (en) Voice recognition method, device, medium and electronic equipment
CN110991373A (en) Image processing method, image processing apparatus, electronic device, and medium
CN113505848A (en) Model training method and device
JP2023526899A (en) Methods, devices, media and program products for generating image inpainting models
CN115514976A (en) Image encoding method, decoding method, device, readable medium and electronic equipment
CN113409307A (en) Image denoising method, device and medium based on heterogeneous noise characteristics
CN114420135A (en) Attention mechanism-based voiceprint recognition method and device
CN111815748B (en) Animation processing method and device, storage medium and electronic equipment
CN117171573A (en) Training method, device, equipment and storage medium for multi-modal model
CN115909009A (en) Image recognition method, image recognition device, storage medium and electronic equipment
CN110858323A (en) Convolution-based image processing method, convolution-based image processing device, convolution-based image processing medium and electronic equipment
CN109474826B (en) Picture compression method and device, electronic equipment and storage medium
CN111611420B (en) Method and device for generating image description information
CN110033413B (en) Image processing method, device, equipment and computer readable medium of client
CN113627556B (en) Method and device for realizing image classification, electronic equipment and storage medium
CN116974684B (en) Map page layout method, map page layout device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: Jingdong Digital Technology Holding Co.,Ltd.

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Digital Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: JINGDONG DIGITAL TECHNOLOGY HOLDINGS Co.,Ltd.

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: JINGDONG DIGITAL TECHNOLOGY HOLDINGS Co.,Ltd.

Address before: Room 221, 2nd floor, Block C, 18 Kechuang 11th Street, Beijing Economic and Technological Development Zone, 100176

Applicant before: BEIJING JINGDONG FINANCIAL TECHNOLOGY HOLDING Co.,Ltd.

CB02 Change of applicant information