WO2020233010A1

WO2020233010A1 - Image recognition method and apparatus based on segmentable convolutional network, and computer device

Info

Publication number: WO2020233010A1
Application number: PCT/CN2019/117743
Authority: WO
Inventors: 王健宗; 师燕妮; 王威; 韩茂琨
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-05-23
Filing date: 2019-11-13
Publication date: 2020-11-26
Also published as: CN110298346A

Abstract

Disclosed are an image recognition method and apparatus based on a segmentable convolutional network, and a computer device and a storage medium. The method comprises: receiving original image data; inputting a pixel matrix corresponding to the original image data into a pre-constructed first convolutional network in a convolutional layer for convolution, so as to obtain a first output matrix; inputting the first output matrix into a pre-constructed second convolutional network in the convolutional layer for convolution, so as to obtain a second output matrix; inputting the second output matrix into a pooling layer for pooling, so as to obtain a pooling result; and inputting the pooling result into a fully connected layer to obtain a recognition result corresponding to the original image data, and sending the recognition result to an upload end corresponding to the original image data.

Description

Image recognition method, device and computer equipment based on separable convolutional network

This application claims priority to Chinese patent application No. 201910433281.2, filed with the Chinese Patent Office on May 23, 2019 and entitled "Image recognition method, device and computer equipment based on a separable convolutional network", the entire content of which is incorporated in this application by reference.

Technical field

This application relates to the field of image recognition technology, and in particular to an image recognition method, device, computer equipment and storage medium based on a separable convolutional network.

Background

At present, when a standard convolutional network is used for image recognition, the input data is generally convolved and then input to a pooling layer for pooling. After one or more rounds of convolution and pooling, a dimensionality-reduced pooling result is obtained, on which subsequent calculations are performed. However, a standard convolutional network requires a large amount of calculation and a long training time on the data set, and can no longer meet the demand for better and faster model training and use.

Summary of the invention

The embodiments of the application provide an image recognition method, device, computer equipment, and storage medium based on a separable convolutional network, aiming to solve the problems in the prior art that image recognition with a standard convolutional network requires a large amount of calculation and a long training time on the data set.

In the first aspect, an embodiment of the present application provides an image recognition method based on a separable convolutional network, which includes:

receiving original image data;

inputting a pixel matrix corresponding to the original image data to a first convolutional network pre-built in a convolutional layer for convolution to obtain a first output matrix;

inputting the first output matrix to a second convolutional network pre-built in the convolutional layer for convolution to obtain a second output matrix;

inputting the second output matrix to a pooling layer for pooling to obtain a pooling result; and

inputting the pooling result to a fully connected layer to obtain a recognition result corresponding to the original image data, and sending the recognition result to an upload terminal corresponding to the original image data.

In the second aspect, an embodiment of the present application provides an image recognition device based on a separable convolutional network, which includes:

a picture receiving unit, configured to receive original image data;

a shallow convolution unit, configured to input a pixel matrix corresponding to the original image data to a first convolutional network pre-built in a convolutional layer for convolution to obtain a first output matrix;

a deep convolution unit, configured to input the first output matrix to a second convolutional network pre-built in the convolutional layer for convolution to obtain a second output matrix;

a pooling unit, configured to input the second output matrix to a pooling layer for pooling to obtain a pooling result; and

a recognition result obtaining unit, configured to input the pooling result to a fully connected layer to obtain a recognition result corresponding to the original image data, and send the recognition result to an upload terminal corresponding to the original image data.

In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the image recognition method based on a separable convolutional network described in the first aspect.

In a fourth aspect, the embodiments of the present application also provide a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor is caused to execute the image recognition method based on a separable convolutional network described in the first aspect.

Description of the drawings

In order to explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present application; those of ordinary skill in the art can obtain other drawings based on these drawings without creative work.

FIG. 1 is a schematic diagram of an application scenario of an image recognition method based on a separable convolutional network provided by an embodiment of the application;

FIG. 2 is a schematic flowchart of an image recognition method based on a separable convolutional network provided by an embodiment of the application;

FIG. 3 is a schematic diagram of a sub-process of an image recognition method based on a separable convolutional network provided by an embodiment of the application;

FIG. 4 is a schematic diagram of another sub-process of the image recognition method based on a separable convolutional network provided by an embodiment of the application;

FIG. 5 is a schematic diagram of another sub-flow of the image recognition method based on a separable convolutional network provided by an embodiment of the application;

FIG. 6 is a schematic block diagram of an image recognition device based on a separable convolutional network provided by an embodiment of the application;

FIG. 7 is a schematic block diagram of subunits of an image recognition device based on a separable convolutional network provided by an embodiment of the application;

FIG. 8 is a schematic block diagram of another subunit of the image recognition device based on a separable convolutional network according to an embodiment of the application;

FIG. 9 is a schematic block diagram of another sub-unit of the image recognition device based on a separable convolutional network provided by an embodiment of the application;

FIG. 10 is a schematic block diagram of a computer device provided by an embodiment of the application.

Detailed description

The technical solutions in the embodiments of the present application will be described clearly and completely in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.

It should be understood that when used in this specification and the appended claims, the terms "comprising" and "including" indicate the existence of the described features, wholes, steps, operations, elements and/or components, but do not exclude the existence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.

It should also be understood that the terms used in the specification of this application are only for the purpose of describing specific embodiments and are not intended to limit the application. As used in the specification of this application and the appended claims, unless the context clearly indicates other circumstances, the singular forms "a", "an" and "the" are intended to include plural forms.

It should be further understood that the term "and/or" used in the specification and appended claims of this application refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.

Please refer to FIGS. 1 and 2. FIG. 1 is a schematic diagram of an application scenario of the image recognition method based on a separable convolutional network provided by an embodiment of the application; FIG. 2 is a schematic flowchart of the image recognition method based on a separable convolutional network provided by an embodiment of the application. The image recognition method based on a separable convolutional network is applied to a server, and the method is executed by application software installed in the server.

As shown in Fig. 2, the method includes steps S110 to S150.

S110. Receive original image data.

In this embodiment, when a user needs to obtain the image recognition result of a target image, the user operates the user terminal (i.e., the upload terminal) to upload the original image data through the user interaction interface provided by the server, and the image recognition model in the server recognizes the original image data to obtain the recognition result.

S120. Input a pixel matrix corresponding to the original image data to a first convolutional network pre-built in the convolutional layer for convolution to obtain a first output matrix.

In this embodiment, after the original image data is acquired, it needs to be converted into a corresponding pixel matrix, and subsequent processing is performed on the pixel matrix. In the prior art, when a convolutional neural network is used, the pixel matrix of the original image data is directly input to the convolutional layer for convolution, then input to the pooling layer for pooling, and finally the pooling result is input to the fully connected layer to obtain the recognition result. However, when the pixel matrix of the original image is directly input to the convolutional layer for a single convolution, the degree of compression may not be sufficient. Therefore, this application adopts a separable convolutional network, an improvement on the standard convolutional network, which is not limited to performing a single convolution.

In an embodiment, as shown in FIG. 3, step S120 includes:

S121: Perform convolution on the pixel matrix through a 3*3 depthwise convolution kernel to obtain a first convolution result;

S122: Perform normalization processing on each value included in the first convolution result to obtain a first normalization result;

S123: Activate the first normalized result through a first activation function to obtain a first output matrix.

In this embodiment, the 3*3 depthwise convolution kernel performs depthwise convolution (Depthwise Convolution), a basic building block for constructing a model that can effectively reduce the computational complexity of a deep neural network. The process of convolution can be understood as using a filter (convolution kernel) to filter each small area of the image so as to obtain the feature values of these small areas. After the first convolution result is normalized and activated by the activation function, a shallow convolution is completed, which realizes the convolution of the depth dimension of the pixel matrix.

For each input channel, one D_k*D_k*1 convolution kernel is used for convolution. A total of M convolution kernels are used, and M operations are performed to obtain M feature maps of size D_f*D_f*1 (the first output matrix can be regarded as a feature map). These feature maps are learned from different input channels and are independent of each other.

In an embodiment, step S121 includes:

Obtain the number of input channels in the pixel matrix, and convolve the pixel matrix with 3*3 depthwise convolution kernels equal in number to the input channels, one kernel per channel, to obtain a first convolution result.

In this embodiment, when convolution is performed by the depthwise convolution kernels, for example, if the input picture is Dk*Dk*M (where Dk is the picture size and M is the number of input channels), then M depthwise convolution kernels of size Dw*Dw are convolved with the M channels respectively, and a D_f*D_f*M result is output. Expressed in the notation used elsewhere in this description, where D_k denotes the kernel size and D_f the size of the output feature map: for each input channel, one D_k*D_k*1 convolution kernel is used for convolution, a total of M convolution kernels are used, and M operations are performed to obtain M D_f*D_f*1 feature maps.
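As a minimal illustration of the per-channel convolution described above, the following Python (NumPy) sketch convolves each of the M input channels with its own kernel. The function name `depthwise_conv2d`, the stride of 1, the absence of padding, and the 8*8 input size are assumptions made only for this example and are not specified by this application.

```python
import numpy as np


def depthwise_conv2d(x, kernels):
    """Convolve each input channel with its own kernel (no cross-channel sum).

    x:       array of shape (H, W, M), the pixel matrix with M input channels
    kernels: array of shape (k, k, M), one k*k kernel per channel
    Returns an array of shape (H - k + 1, W - k + 1, M).
    Stride 1 and no padding are assumptions of this sketch. As in most
    deep-learning libraries, the kernel is applied without flipping.
    """
    h, w, m = x.shape
    k = kernels.shape[0]
    out_h, out_w = h - k + 1, w - k + 1
    out = np.zeros((out_h, out_w, m), dtype=np.float64)
    for c in range(m):                      # one independent pass per channel
        for i in range(out_h):
            for j in range(out_w):
                window = x[i:i + k, j:j + k, c]
                out[i, j, c] = np.sum(window * kernels[:, :, c])
    return out


rng = np.random.default_rng(0)
pixel_matrix = rng.random((8, 8, 3))        # hypothetical 8*8 input, M = 3
depth_kernels = rng.random((3, 3, 3))       # M = 3 kernels of size 3*3
first_conv = depthwise_conv2d(pixel_matrix, depth_kernels)
print(first_conv.shape)                     # (6, 6, 3)
```

Because the loop over `c` never sums across channels, the number of output channels equals the number of input channels, which matches the M independent feature maps described above.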

In the depthwise convolution calculation, unlike the standard convolution, each channel is processed independently, so the summation does not run over the M channels; instead, M separate operations are performed. The calculation amount of a standard convolution is D_k*D_k*M*N*D_f*D_f, where N is the number of output channels. That is to say, D_f*D_f values must be calculated for each output channel, and calculating each value requires multiplying all the values in the corresponding sliding window and then adding up the values of all channels. In this embodiment, the depthwise convolution needs to calculate D_f*D_f values, the calculation amount for each value is D_k*D_k, and looping M times gives D_k*D_k*M*D_f*D_f. By convolving the depth dimension of the pixel matrix, the pixel matrix is made thinner, so that the subsequent calculation amount is reduced.
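The two calculation amounts above can be compared numerically. The following sketch simply evaluates the two expressions; the function names and the sizes (D_k = 3, D_f = 32, M = 16, N = 64) are hypothetical values chosen for illustration only.

```python
def standard_conv_cost(d_k, d_f, m, n):
    """Multiplications for a standard convolution: D_k*D_k*M*N*D_f*D_f."""
    return d_k * d_k * m * n * d_f * d_f


def depthwise_conv_cost(d_k, d_f, m):
    """Multiplications for the depthwise step: D_k*D_k*M*D_f*D_f."""
    return d_k * d_k * m * d_f * d_f


# Hypothetical sizes chosen only for illustration.
d_k, d_f, m, n = 3, 32, 16, 64
print(standard_conv_cost(d_k, d_f, m, n))   # 9437184
print(depthwise_conv_cost(d_k, d_f, m))     # 147456
```

Note that the depthwise step alone does not combine information across channels, so the two figures are not costs of equivalent operations; the comparison only shows how much smaller the per-channel pass is.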

In an embodiment, as shown in FIG. 4, step S122 includes:

S1221. Obtain a first average value corresponding to all values in the first convolution result.

S1222: Acquire first variances corresponding to all values in the first convolution result;

S1223. Divide each difference obtained by subtracting the first average value from each value in the first convolution result by the first variance to obtain a first normalized result.

In this embodiment, normalization processing (Batch Normalization) is performed on the convolution result to address the problem that the data distribution in the intermediate layers changes during calculation, so as to prevent the gradient from vanishing or exploding and to speed up training. In the specific normalization process, the first average value corresponding to all values in the first convolution result is calculated first, then the first variance corresponding to all values in the first convolution result is calculated, and finally each difference obtained by subtracting the first average value from each value in the first convolution result is divided by the first variance to obtain the first normalized result.
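The normalization steps S1221 to S1223 may be sketched as follows. This sketch assumes that each value is centered by subtracting the first average value obtained in S1221 before the division. The `divide_by_std` option and the `eps` constant are additions of this example: they show the conventional batch-normalization form, which divides by the square root of the variance plus a small constant, and are not stated in this application.

```python
import numpy as np


def normalize(conv_result, divide_by_std=False, eps=1e-5):
    """Normalize all values of a convolution result.

    With divide_by_std=False the centered values are divided by the variance,
    following the wording of S1223. With divide_by_std=True they are divided
    by sqrt(variance + eps), the usual batch-normalization form (an
    assumption of this sketch, not stated in the text).
    A constant input (variance 0) is not handled when divide_by_std=False.
    """
    mean = conv_result.mean()                 # S1221: first average value
    var = conv_result.var()                   # S1222: first variance
    centered = conv_result - mean
    if divide_by_std:
        return centered / np.sqrt(var + eps)
    return centered / var                     # S1223: division by the variance


first_conv = np.array([[1.0, 2.0], [3.0, 6.0]])   # hypothetical values
first_norm = normalize(first_conv)
first_norm_std = normalize(first_conv, divide_by_std=True)
```

In both variants the normalized values have an average of zero; only the variant that divides by the square root of the variance also yields a variance close to one.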

In an embodiment, step S123 includes:

The negative values in the first normalized result are set to zero by the first activation function to obtain the first output matrix.

In this embodiment, the first normalized result is activated by the first activation function to obtain the first output matrix, which introduces a nonlinear relationship between the layers of the neural network. Without an activation function, there would be only a simple linear relationship between layers, each layer would be equivalent to a matrix multiplication, and the network could not complete the complex tasks required of it. In a specific implementation, the first activation function is the ReLU function (Rectified Linear Unit). The expression of the ReLU function is f(x)=max(0,x); that is, only the positive values in the first normalized result are retained, and the negative values in the first normalized result are set to zero to obtain the first output matrix.
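The activation f(x)=max(0,x) can be illustrated with a short NumPy sketch; the function name `relu` and the input values are hypothetical and serve only as an example.

```python
import numpy as np


def relu(x):
    """f(x) = max(0, x): keep positive values, set negative values to zero."""
    return np.maximum(0.0, x)


first_norm = np.array([[-1.5, 0.0], [0.5, 2.0]])   # hypothetical values
first_output = relu(first_norm)
print(first_output.tolist())   # [[0.0, 0.0], [0.5, 2.0]]
```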

S130. Input the first output matrix to a second convolutional network pre-built in the convolutional layer for convolution to obtain a second output matrix.

In this embodiment, after the shallow convolution is completed, convolution along the width dimension can be performed through the second convolutional network pre-built in the convolutional layer; this convolution process is regarded as the deep convolution.

In an embodiment, as shown in FIG. 5, step S130 includes:

S131: Perform convolution on the first output matrix through a 1*1 convolution kernel to obtain a second convolution result;

S132: Perform normalization processing on each value included in the second convolution result to obtain a second normalization result;

S133: Activate the second normalized result through a second activation function to obtain a second output matrix.

In this embodiment, the M feature maps obtained in step S120 are taken as the input of M channels, and standard convolution is performed with N convolution kernels of size 1*1*M to obtain an output of D_f*D_f*N. The calculation amount of this step is about 1*1*M*N*D_f*D_f. Adding the calculation amount of step S120 and dividing by that of the standard convolution gives (D_k*D_k*M*D_f*D_f + M*N*D_f*D_f)/(D_k*D_k*M*N*D_f*D_f) = 1/N + 1/D_k^2; that is, the calculation amount is reduced to 1/N + 1/D_k^2 of that of the standard convolution. A commonly used convolution kernel is 3*3, in which case the calculation amount is reduced by a factor approaching 9 when N is large. The normalization processing in step S132 is the same as that in step S122, and the activation function in step S133 is the same as that in step S123.
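The 1*1 convolution of step S131 and the ratio 1/N + 1/D_k^2 can be checked with the following sketch. The function names, the use of `np.einsum`, and all sizes are assumptions of this example rather than details given in this application.

```python
import numpy as np


def pointwise_conv2d(x, kernels):
    """Apply N 1*1*M kernels: mix the M channels at every spatial position.

    x:       array of shape (D_f, D_f, M)
    kernels: array of shape (M, N), one column per 1*1*M kernel
    Returns an array of shape (D_f, D_f, N).
    """
    return np.einsum("hwm,mn->hwn", x, kernels)


def separable_to_standard_ratio(d_k, m, n, d_f):
    """(depthwise + pointwise cost) / (standard convolution cost)."""
    depthwise = d_k * d_k * m * d_f * d_f
    pointwise = m * n * d_f * d_f
    standard = d_k * d_k * m * n * d_f * d_f
    return (depthwise + pointwise) / standard


rng = np.random.default_rng(0)
first_output = rng.random((6, 6, 3))          # hypothetical: D_f = 6, M = 3
point_kernels = rng.random((3, 4))            # hypothetical: N = 4
second_conv = pointwise_conv2d(first_output, point_kernels)
print(second_conv.shape)                      # (6, 6, 4)

ratio = separable_to_standard_ratio(d_k=3, m=16, n=64, d_f=32)
print(np.isclose(ratio, 1 / 64 + 1 / 9))      # True
```

With these hypothetical sizes the ratio is 73/576, so the two-step convolution needs a little under one eighth of the multiplications of the standard convolution; the saving approaches one ninth as N grows.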

S140. Input the second output matrix to a pooling layer for pooling, and obtain a pooling result.

In this embodiment, inputting the second output matrix to the pooling layer for pooling is to further sample the second output matrix to reduce dimensionality.

For example, if the original picture is 20*20 and the sampling window is 10*10, the picture is finally down-sampled into a 2*2 feature map.

Pooling is performed because even after convolution the image is still very large (the convolution kernel being relatively small), so downsampling is performed in order to reduce the data dimension. Even though pooling discards a lot of data, the statistical attributes of the features can still describe the image, and because the data dimension is reduced, overfitting is effectively mitigated.

In practical applications, pooling is divided into maximum down sampling (Max-Pooling) and average down sampling (Mean-Pooling) according to the down-sampling method. That is, the second output matrix is input to the pooling layer to perform pooling through maximum down sampling or average down sampling to obtain a pooling result.

For example, if the above original picture is 20*20 in size and the sampling window is 10*10, the 20*20 area of the original picture is divided into four 10*10 areas: upper left, upper right, lower left, and lower right. Taking the maximum value of each 10*10 area as the feature value of that area is maximum down sampling; taking the average value of each 10*10 area as the feature value of that area is average down sampling. After the above processing, not only are the key features of the image retained, but dimensionality reduction is also achieved.
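The 20*20 example with a 10*10 sampling window can be reproduced with the following sketch. The function name `pool2d`, the restriction to non-overlapping windows, and the test picture filled with the values 0 to 399 are assumptions made for illustration.

```python
import numpy as np


def pool2d(x, window, mode="max"):
    """Down-sample a 2-D matrix with non-overlapping window*window regions."""
    h, w = x.shape
    if h % window or w % window:
        raise ValueError("matrix size must be a multiple of the window size")
    # Reshape so that axes 1 and 3 index the positions inside each region.
    blocks = x.reshape(h // window, window, w // window, window)
    if mode == "max":
        return blocks.max(axis=(1, 3))       # maximum down sampling
    if mode == "mean":
        return blocks.mean(axis=(1, 3))      # average down sampling
    raise ValueError("mode must be 'max' or 'mean'")


picture = np.arange(400, dtype=np.float64).reshape(20, 20)
print(pool2d(picture, 10, "max").tolist())    # [[189.0, 199.0], [389.0, 399.0]]
print(pool2d(picture, 10, "mean").tolist())   # [[94.5, 104.5], [294.5, 304.5]]
```

Each of the four output values summarizes one 10*10 quadrant of the picture, so 400 values are reduced to 4.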

S150. Input the pooling result to a fully connected layer to obtain a recognition result corresponding to the original image data, and send the recognition result to an upload terminal corresponding to the original image data.

In this embodiment, the fully connected (FC) layer functions as a "classifier" in the entire convolutional neural network. If operations such as the convolutional layer, pooling layer, and activation function layer map the original data to the hidden-layer feature space, the fully connected layer maps the learned "distributed feature representation" to the sample label space. In actual use, the fully connected layer can be realized by a convolution operation: a fully connected layer whose previous layer is also fully connected can be converted into a convolution with a 1*1 convolution kernel, and a fully connected layer whose previous layer is a convolutional layer can be converted into a global convolution with an h*w convolution kernel, where h and w are the height and width of the previous convolution result. After the recognition result is obtained, it is sent to the upload terminal corresponding to the original image data to notify the user of the recognition result.
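The equivalence between a fully connected layer and a global h*w convolution can be illustrated as follows. The sizes, the bias term, the random weights, and the use of the highest score as the recognition result are all assumptions of this sketch; this application does not specify how the label is selected.

```python
import numpy as np


def fully_connected(pooled, weights, bias):
    """Flatten the pooling result and map it to one score per label."""
    return pooled.reshape(-1) @ weights + bias


def global_conv(pooled, kernels, bias):
    """Same mapping written as a convolution whose kernel covers all of h*w."""
    # kernels has shape (h, w, channels, labels); there is one output position.
    return np.einsum("hwc,hwcl->l", pooled, kernels) + bias


rng = np.random.default_rng(0)
h, w, channels, labels = 2, 2, 4, 3             # hypothetical sizes
pooled = rng.random((h, w, channels))
kernels = rng.random((h, w, channels, labels))
bias = rng.random(labels)

scores_fc = fully_connected(pooled, kernels.reshape(-1, labels), bias)
scores_conv = global_conv(pooled, kernels, bias)
print(np.allclose(scores_fc, scores_conv))      # True
recognition_result = int(np.argmax(scores_fc))  # index of the highest score
```

Both functions use the same weights, only arranged differently, which is why the scores agree.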

This method performs image recognition with a separable convolutional network, which reduces the amount of calculation in the image recognition process.

An embodiment of the present application also provides an image recognition device based on a separable convolutional network, and the image recognition device based on a separable convolutional network is used to execute any embodiment of the aforementioned image recognition method based on a separable convolutional network. Specifically, please refer to FIG. 6, which is a schematic block diagram of an image recognition apparatus based on a separable convolutional network provided by an embodiment of the present application. The image recognition device 100 based on a separable convolutional network can be configured in a server.

As shown in FIG. 6, the image recognition device 100 based on the separable convolutional network includes a picture receiving unit 110, a shallow convolution unit 120, a deep convolution unit 130, a pooling unit 140, and a recognition result obtaining unit 150.

The picture receiving unit 110 is used to receive original image data.

The shallow convolution unit 120 is configured to input the pixel matrix corresponding to the original image data to the first convolution network constructed in the convolution layer for convolution to obtain a first output matrix.

In an embodiment, as shown in FIG. 7, the shallow convolution unit 120 includes:

The first convolution unit 121 is configured to convolve the pixel matrix with a 3*3 depthwise convolution kernel to obtain a first convolution result;

The first normalization unit 122 is configured to perform normalization processing on each value included in the first convolution result to obtain a first normalization result;

The first activation unit 123 is configured to activate the first normalized result through a first activation function to obtain a first output matrix.

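In an embodiment, the first convolution unit 121 is further configured to:

Obtain the number of input channels in the pixel matrix, and convolve the pixel matrix with 3*3 depthwise convolution kernels equal in number to the input channels to obtain a first convolution result.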

In an embodiment, as shown in FIG. 8, the first normalization unit 122 includes:

The average value obtaining unit 1221 is configured to obtain the first average value corresponding to all the values in the first convolution result;

The variance obtaining unit 1222 is configured to obtain the first variance corresponding to all the values in the first convolution result;

The normalization calculation unit 1223 is configured to divide each difference obtained by subtracting the first average value from each value in the first convolution result by the first variance to obtain the first normalized result.

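In an embodiment, the first activation unit 123 is further configured to:

Set the negative values in the first normalized result to zero through the first activation function to obtain the first output matrix.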

The deep convolution unit 130 is configured to input the first output matrix into a second convolutional network pre-built in the convolution layer for convolution to obtain a second output matrix.

In an embodiment, as shown in FIG. 9, the deep convolution unit 130 includes:

The second convolution unit 131 is configured to convolve the first output matrix with a 1*1 convolution kernel to obtain a second convolution result;

The second normalization unit 132 is configured to normalize each value included in the second convolution result to obtain a second normalized result;

The second activation unit 133 is configured to activate the second normalized result through a second activation function to obtain a second output matrix.

The pooling unit 140 is configured to input the second output matrix to the pooling layer for pooling, and obtain a pooling result.

The recognition result obtaining unit 150 is configured to input the pooling result into the fully connected layer to obtain the recognition result corresponding to the original image data, and send the recognition result to the upload terminal corresponding to the original image data.

The device performs image recognition with a separable convolutional network, which reduces the amount of calculation in the image recognition process.

The above-mentioned image recognition apparatus based on a separable convolutional network may be implemented in the form of a computer program, and the computer program may be run on a computer device as shown in FIG. 10.

Please refer to FIG. 10, which is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 is a server, and the server may be an independent server or a server cluster composed of multiple servers.

Referring to FIG. 10, the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.

The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. When the computer program 5032 is executed, the processor 502 can execute an image recognition method based on a separable convolutional network.

The processor 502 is used to provide calculation and control capabilities, and support the operation of the entire computer device 500.

The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503. When the computer program 5032 is executed by the processor 502, the processor 502 is caused to execute the image recognition method based on a separable convolutional network.

The network interface 505 is used for network communication, such as providing data information transmission. Those skilled in the art can understand that the structure shown in FIG. 10 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device 500 to which the solution of the present application is applied. The specific computer device 500 may include more or fewer components than shown in the figure, or combine certain components, or have a different component arrangement.

Wherein, the processor 502 is configured to run a computer program 5032 stored in a memory to implement the image recognition method based on a separable convolutional network in the embodiment of the present application.

Those skilled in the art can understand that the embodiment of the computer device shown in FIG. 10 does not constitute a limitation on the specific configuration of the computer device. In other embodiments, the computer device may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement. For example, in some embodiments, the computer device may only include a memory and a processor. In such an embodiment, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 10, and will not be repeated here.

It should be understood that, in this embodiment of the application, the processor 502 may be a central processing unit (Central Processing Unit, CPU), and the processor 502 may also be other general-purpose processors, digital signal processors (Digital Signal Processors, DSPs), Application Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. Among them, the general-purpose processor may be a microprocessor or the processor may also be any conventional processor.

In another embodiment of the present application, a computer-readable storage medium is provided. The computer-readable storage medium may be a non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program, where the computer program is executed by a processor to implement the image recognition method based on a separable convolutional network in the embodiment of the present application.

The storage medium is a physical, non-transitory storage medium, such as a USB flash drive, a mobile hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, an optical disk, or any other medium that can store program code.

Those skilled in the art can clearly understand that, for the convenience and conciseness of description, the specific working process of the equipment, device and unit described above can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here.

The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Anyone familiar with the technical field can easily conceive of various equivalent modifications or replacements within the technical scope disclosed in this application, and these modifications or replacements shall be covered within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims

An image recognition method based on a separable convolutional network, including:

Receive original image data;

Input the pixel matrix corresponding to the original image data to the first convolutional network pre-built in the convolutional layer for convolution to obtain the first output matrix;

Inputting the first output matrix to a second convolutional network pre-built in the convolutional layer for convolution to obtain a second output matrix;

Input the second output matrix to the pooling layer for pooling to obtain a pooling result; and

The pooling result is input to the fully connected layer to obtain the recognition result corresponding to the original image data, and the recognition result is sent to the upload terminal corresponding to the original image data.
The image recognition method based on a separable convolutional network according to claim 1, wherein inputting the pixel matrix corresponding to the original image data to the first convolutional network pre-built in the convolutional layer for convolution to obtain the first output matrix includes:

Convolve the pixel matrix with a 3*3 deep convolution kernel to obtain a first convolution result;

Normalize each value included in the first convolution result to obtain a first normalized result;

The first normalized result is activated through a first activation function to obtain a first output matrix.
The image recognition method based on a separable convolutional network according to claim 1, wherein said inputting said first output matrix into a second convolutional network pre-built in the convolutional layer for convolution to obtain a second output matrix includes:

Convolve the first output matrix with a 1*1 convolution kernel to obtain a second convolution result;

Normalize each value included in the second convolution result to obtain a second normalized result;

The second normalized result is activated through a second activation function to obtain a second output matrix.
The image recognition method based on a separable convolutional network according to claim 2, wherein convolving the pixel matrix with the 3*3 depthwise convolution kernel to obtain the first convolution result comprises:

obtaining the number of input channels of the pixel matrix, and traversing the pixel matrix to perform convolution through 3*3 depthwise convolution kernels equal in number to the input channels, to obtain the first convolution result.
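The per-channel convolution described in this claim can be illustrated with a minimal, dependency-free Python sketch. It assumes a channel-first nested-list layout, a stride of 1, and no padding, none of which the claim specifies; the function name `depthwise_conv3x3` and the sample kernels are hypothetical. As in most deep-learning code, the kernel is applied without flipping (cross-correlation).

```python
def depthwise_conv3x3(pixels, kernels):
    """Filter each input channel with its own 3x3 kernel.

    pixels:  list of C channels, each an H x W nested list (assumed layout).
    kernels: list of C kernels, each a 3 x 3 nested list (one per channel).
    Assumes stride 1 and no padding, so each output channel is (H-2) x (W-2).
    """
    if len(pixels) != len(kernels):
        raise ValueError("need exactly one 3x3 kernel per input channel")
    result = []
    for channel, kernel in zip(pixels, kernels):
        height, width = len(channel), len(channel[0])
        out = [
            [
                sum(
                    channel[r + i][c + j] * kernel[i][j]
                    for i in range(3)
                    for j in range(3)
                )
                for c in range(width - 2)
            ]
            for r in range(height - 2)
        ]
        result.append(out)
    return result


# Two 4x4 channels, each filtered by its own (illustrative) kernel.
image = [
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
    [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]],
]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # keeps the centre pixel
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]       # sums the 3x3 window
first_conv = depthwise_conv3x3(image, [identity, box])
# first_conv == [[[6, 7], [10, 11]], [[5, 4], [4, 5]]]
```

Because every channel is filtered independently, the number of output channels equals the number of input channels, and no information is mixed across channels at this stage.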
The image recognition method based on a separable convolutional network according to claim 2, wherein normalizing each value included in the first convolution result to obtain the first normalized result comprises:

obtaining a first average value of all values in the first convolution result;

obtaining a first variance of all values in the first convolution result; and

dividing each difference, obtained by subtracting the first average value from each value in the first convolution result, by the first variance, to obtain the first normalized result.
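The normalization step can be sketched in Python as follows. This sketch reads the claim as subtracting the first average value from each value and dividing by the first variance; it assumes a single H x W matrix and the population variance, and the function name `normalize` is hypothetical. Conventional batch normalization divides by the square root of the variance plus a small constant instead, which is a different variant from the one worded here.

```python
from statistics import fmean, pvariance


def normalize(conv_result):
    """Normalize every value of an H x W nested list.

    Computes (value - mean) / variance, with the mean and the population
    variance taken over all values of the matrix (an assumed reading).
    """
    values = [v for row in conv_result for v in row]
    mean = fmean(values)
    variance = pvariance(values, mu=mean)
    if variance == 0:
        raise ValueError("all values are equal; variance is zero")
    return [[(v - mean) / variance for v in row] for row in conv_result]


# Mean is 4.0 and variance is 5.0 for this sample matrix.
normalized = normalize([[1.0, 3.0], [5.0, 7.0]])
# normalized is approximately [[-0.6, -0.2], [0.2, 0.6]]
```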
The image recognition method based on a separable convolutional network according to claim 2, wherein activating the first normalized result through the first activation function to obtain the first output matrix comprises:

setting each negative value in the first normalized result to zero through the first activation function, to obtain the first output matrix.
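Setting negative values to zero while leaving the others unchanged matches the behaviour of the rectified linear unit (ReLU). A minimal Python sketch, with the hypothetical function name `relu` and an assumed nested-list matrix, might look like this:

```python
def relu(matrix):
    """Set negative entries to zero and leave the rest unchanged."""
    return [[max(0.0, v) for v in row] for row in matrix]


activated = relu([[-0.6, -0.2], [0.2, 0.6]])
# activated == [[0.0, 0.0], [0.2, 0.6]]
```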
The image recognition method based on a separable convolutional network according to claim 1, wherein inputting the second output matrix into the pooling layer for pooling to obtain the pooling result comprises:

inputting the second output matrix into the pooling layer and pooling it through maximum downsampling or average downsampling, to obtain the pooling result.
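The two pooling options named in this claim can be compared with a short Python sketch. The 2x2 window and the stride of 2 are assumptions, since the claim only names maximum or average downsampling; the function name `pool2x2` is hypothetical.

```python
def pool2x2(matrix, mode="max"):
    """Downsample an H x W nested list with non-overlapping 2x2 windows.

    Window size and stride are assumed. Odd trailing rows or columns
    are dropped.
    """
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    pooled = []
    for r in range(0, len(matrix) - 1, 2):
        row = []
        for c in range(0, len(matrix[0]) - 1, 2):
            window = [matrix[r][c], matrix[r][c + 1],
                      matrix[r + 1][c], matrix[r + 1][c + 1]]
            row.append(max(window) if mode == "max" else sum(window) / 4)
        pooled.append(row)
    return pooled


feature_map = [[1, 2, 5, 6], [3, 4, 7, 8], [9, 10, 13, 14], [11, 12, 15, 16]]
max_pooled = pool2x2(feature_map, "max")  # [[4, 8], [12, 16]]
avg_pooled = pool2x2(feature_map, "avg")  # [[2.5, 6.5], [10.5, 14.5]]
```

Maximum downsampling keeps the strongest response in each window, whereas average downsampling smooths the window into a single value; both halve the height and width under the assumed window and stride.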
The image recognition method based on a separable convolutional network according to claim 3, wherein convolving the first output matrix with the 1*1 convolution kernel to obtain the second convolution result comprises:

obtaining the number of input channels of the first output matrix, and traversing the first output matrix to perform convolution through a 1*1 convolution kernel having the same number of input channels as the first output matrix, to obtain the second convolution result.
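A 1*1 convolution combines the channels at each pixel position without looking at neighbouring pixels. The following Python sketch illustrates this under stated assumptions: a channel-first nested-list layout, one weight per input channel in each kernel, and a freely chosen number of kernels (the claim does not fix the number of output channels). The function name `pointwise_conv` and the sample weights are hypothetical.

```python
def pointwise_conv(channels, weights):
    """Apply 1x1 kernels across channels.

    channels: list of C feature maps, each H x W (assumed channel-first).
    weights:  list of K kernels, each holding C weights (one per channel).
    Returns K feature maps of size H x W.
    """
    depth = len(channels)
    if any(len(kernel) != depth for kernel in weights):
        raise ValueError("each 1x1 kernel needs one weight per input channel")
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [
            [
                sum(kernel[k] * channels[k][r][c] for k in range(depth))
                for c in range(width)
            ]
            for r in range(height)
        ]
        for kernel in weights
    ]


first_output = [[[1, 2], [3, 4]], [[10, 20], [30, 40]]]  # C = 2 channels
second_conv = pointwise_conv(first_output, [[1, 1], [2, -1]])
# second_conv == [[[11, 22], [33, 44]], [[-8, -16], [-24, -32]]]
```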
The image recognition method based on a separable convolutional network according to claim 1, wherein inputting the pooling result into the fully connected layer to obtain the recognition result corresponding to the original image data comprises:

inputting the pooling result into the fully connected layer for global convolution, to obtain the recognition result corresponding to the original image data.
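One way to picture the fully connected step is as a kernel that spans the entire pooled input, which is equivalent to flattening the input and taking a weighted sum per class. The Python sketch below illustrates that reading; the weights, the biases, the function name `fully_connected`, and the choice of the highest-scoring class as the recognition result are all assumptions rather than details given in the claim.

```python
def fully_connected(pooled, weights, biases):
    """Flatten the pooled maps and compute one score per class.

    pooled:  list of C maps, each H x W.
    weights: one weight vector of length C*H*W per class (hypothetical).
    biases:  one bias per class (hypothetical).
    """
    flat = [v for channel in pooled for row in channel for v in row]
    if any(len(w) != len(flat) for w in weights):
        raise ValueError("each weight vector must match the flattened length")
    return [
        sum(w_i * x for w_i, x in zip(w, flat)) + b
        for w, b in zip(weights, biases)
    ]


pooled = [[[1, 2]], [[3, 4]]]  # flattens to [1, 2, 3, 4]
scores = fully_connected(pooled, [[1, 0, 0, 0], [0, 0, 0, 1]], [0, 1])
predicted_class = scores.index(max(scores))  # index of the highest score
# scores == [1, 5] and predicted_class == 1
```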
An image recognition device based on a separable convolutional network, comprising:

a picture receiving unit, configured to receive original image data;

a shallow convolution unit, configured to input a pixel matrix corresponding to the original image data into a first convolutional network pre-built in a convolutional layer for convolution, to obtain a first output matrix;

a deep convolution unit, configured to input the first output matrix into a second convolutional network pre-built in the convolutional layer for convolution, to obtain a second output matrix;

a pooling unit, configured to input the second output matrix into a pooling layer for pooling, to obtain a pooling result; and

a recognition result obtaining unit, configured to input the pooling result into a fully connected layer to obtain a recognition result corresponding to the original image data, and to send the recognition result to an upload terminal corresponding to the original image data.
A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, performs the following steps:

receiving original image data;

inputting a pixel matrix corresponding to the original image data into a first convolutional network pre-built in a convolutional layer for convolution, to obtain a first output matrix;

inputting the first output matrix into a second convolutional network pre-built in the convolutional layer for convolution, to obtain a second output matrix;

inputting the second output matrix into a pooling layer for pooling, to obtain a pooling result; and

inputting the pooling result into a fully connected layer to obtain a recognition result corresponding to the original image data, and sending the recognition result to an upload terminal corresponding to the original image data.
The computer device according to claim 11, wherein inputting the pixel matrix corresponding to the original image data into the first convolutional network pre-built in the convolutional layer for convolution, to obtain the first output matrix, comprises:

convolving the pixel matrix with a 3*3 depthwise convolution kernel to obtain a first convolution result;

normalizing each value included in the first convolution result to obtain a first normalized result; and

activating the first normalized result through a first activation function to obtain the first output matrix.
The computer device according to claim 11, wherein inputting the first output matrix into the second convolutional network pre-built in the convolutional layer for convolution, to obtain the second output matrix, comprises:

convolving the first output matrix with a 1*1 convolution kernel to obtain a second convolution result;

normalizing each value included in the second convolution result to obtain a second normalized result; and

activating the second normalized result through a second activation function to obtain the second output matrix.
The computer device according to claim 12, wherein convolving the pixel matrix with the 3*3 depthwise convolution kernel to obtain the first convolution result comprises:

obtaining the number of input channels of the pixel matrix, and traversing the pixel matrix to perform convolution through 3*3 depthwise convolution kernels equal in number to the input channels, to obtain the first convolution result.
The computer device according to claim 12, wherein normalizing each value included in the first convolution result to obtain the first normalized result comprises:

obtaining a first average value of all values in the first convolution result;

obtaining a first variance of all values in the first convolution result; and

dividing each difference, obtained by subtracting the first average value from each value in the first convolution result, by the first variance, to obtain the first normalized result.
The computer device according to claim 12, wherein activating the first normalized result through the first activation function to obtain the first output matrix comprises:

setting each negative value in the first normalized result to zero through the first activation function, to obtain the first output matrix.
The computer device according to claim 11, wherein inputting the second output matrix into the pooling layer for pooling to obtain the pooling result comprises:

inputting the second output matrix into the pooling layer and pooling it through maximum downsampling or average downsampling, to obtain the pooling result.
The computer device according to claim 13, wherein convolving the first output matrix with the 1*1 convolution kernel to obtain the second convolution result comprises:

obtaining the number of input channels of the first output matrix, and traversing the first output matrix to perform convolution through a 1*1 convolution kernel having the same number of input channels as the first output matrix, to obtain the second convolution result.
The computer device according to claim 11, wherein inputting the pooling result into the fully connected layer to obtain the recognition result corresponding to the original image data comprises:

inputting the pooling result into the fully connected layer for global convolution, to obtain the recognition result corresponding to the original image data.
A computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to perform the following operations:

receiving original image data;

inputting a pixel matrix corresponding to the original image data into a first convolutional network pre-built in a convolutional layer for convolution, to obtain a first output matrix;

inputting the first output matrix into a second convolutional network pre-built in the convolutional layer for convolution, to obtain a second output matrix;

inputting the second output matrix into a pooling layer for pooling, to obtain a pooling result; and

inputting the pooling result into a fully connected layer to obtain a recognition result corresponding to the original image data, and sending the recognition result to an upload terminal corresponding to the original image data.