CN112651451B - Image recognition method, device, electronic equipment and storage medium - Google Patents

Image recognition method, device, electronic equipment and storage medium

Info

Publication number
CN112651451B
CN112651451B · Application CN202011606881.3A
Authority
CN
China
Prior art keywords
image
feature
matrix
dimension
enhanced
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011606881.3A
Other languages
Chinese (zh)
Other versions
CN112651451A (en)
Inventor
宋希彬
周定富
方进
张良俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Baidu USA LLC
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Baidu USA LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd, Baidu USA LLC filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011606881.3A priority Critical patent/CN112651451B/en
Publication of CN112651451A publication Critical patent/CN112651451A/en
Application granted granted Critical
Publication of CN112651451B publication Critical patent/CN112651451B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image recognition method, an image recognition device, an electronic device and a storage medium, relating to the technical field of image processing and in particular to artificial intelligence fields such as computer vision and deep learning. The implementation scheme is as follows: an image to be recognized is obtained and its image features are extracted; dimension reduction is performed on the image features to obtain dimension-reduced image features; the dimension-reduced image features are enhanced on the feature extraction channel to obtain first enhanced image features; the dimension-reduced image features are enhanced on the pixels to obtain second enhanced image features; and the texture type of the image to be recognized is obtained based on the first enhanced image features and the second enhanced image features. By enhancing and fusing the image features, the method strengthens their expressive power and improves the accuracy of recognizing the texture type of an image.

Description

Image recognition method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to the field of artificial intelligence technologies such as computer vision and deep learning.
Background
Based on traditional machine learning or deep learning, additional training datasets are needed to predict the texture information of an image. However, the nonlinear expressive power of traditional machine learning is often limited, and deep learning suffers from insufficient image feature extraction, so the prediction accuracy for image texture information is not high.
Disclosure of Invention
The present disclosure provides a method, apparatus, electronic device, storage medium, and computer program product for image recognition.
According to an aspect of the present disclosure, an image recognition method is provided, including acquiring an image to be recognized, and extracting image features of the image to be recognized; performing dimension reduction processing on the image features to obtain dimension reduction image features; enhancing the dimension-reduced image features on a feature extraction channel to obtain first enhanced image features; enhancing the dimension-reduced image features on pixels to obtain second enhanced image features; and acquiring the texture type of the image to be identified based on the first enhanced image feature and the second enhanced image feature.
According to a second aspect of the present disclosure, there is provided an image recognition apparatus including: the feature extraction module is used for acquiring an image to be identified and extracting image features of the image to be identified; the dimension reduction module is used for carrying out dimension reduction processing on the image features to obtain dimension reduction image features; the first enhancement module is used for enhancing the image features on the feature extraction channel so as to obtain first enhanced image features; the second enhancement module is used for enhancing the image features on pixels so as to obtain second enhanced image features; and the texture recognition module is used for acquiring the texture type of the image to be recognized based on the first enhanced image feature and the second enhanced image feature.
According to a third aspect of the present disclosure, an electronic device is presented, wherein the electronic device comprises a processor and a memory; the processor implements the image recognition method presented in the first aspect above by reading executable program code stored in the memory and executing a program corresponding to that code.
According to a fourth aspect of the present disclosure, a computer-readable storage medium is presented, on which a computer program is stored, the program, when executed by a processor, implementing the image recognition method presented in the first aspect above.
According to a fifth aspect of the present disclosure, a computer program product is presented, which implements the image recognition method presented in the first aspect above when its instructions are executed by a processor.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of an image recognition method according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of an image recognition method according to another embodiment of the present disclosure;
FIG. 3 is a flow chart of an image recognition method according to another embodiment of the present disclosure;
FIG. 4 is a flow chart of an image recognition method according to another embodiment of the present disclosure;
FIG. 5 is a flow chart of an image recognition method according to another embodiment of the present disclosure;
FIG. 6 is a block diagram of an image recognition device of an embodiment of the present disclosure;
FIG. 7 is a block diagram of an image recognition device of an embodiment of the present disclosure;
fig. 8 is a schematic block diagram of an electronic device of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Image Processing is the technique of analyzing an image with a computer to achieve a desired result, and generally refers to digital image processing. A digital image is a large two-dimensional array obtained by photographing with equipment such as an industrial camera, a video camera or a scanner; the elements of the array are called pixels, and their values are called gray values. Image processing techniques generally comprise three parts: image compression; enhancement and restoration; and matching, description and recognition.
Deep Learning (DL) is a new research direction in the field of Machine Learning (ML); it was introduced into machine learning to bring the field closer to its original goal, artificial intelligence. Deep learning learns the inherent regularities and representation hierarchies of sample data, and the information obtained during such learning helps interpret data such as text, images and sounds. Its ultimate goal is to give machines analytical learning abilities like a person's, able to recognize text, image and sound data. Deep learning is a complex machine learning algorithm whose results in speech and image recognition far exceed those of earlier techniques.
Computer Vision is the science of studying how to make machines "see": using cameras and computers in place of human eyes to recognize, track and measure targets, and further performing graphics processing so that the result is an image more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies the theory and technology needed to build artificial intelligence systems that can obtain "information" from images or multidimensional data. The information referred to here is Shannon-defined information that can be used to assist in making a "decision". Because perception can be seen as extracting information from sensory signals, computer vision can also be seen as the science of how to make an artificial system "perceive" from images or multidimensional data.
Artificial Intelligence (AI) is the discipline of studying how to make a computer simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking and planning); it comprises technologies at both the hardware level and the software level. Artificial intelligence software technologies generally include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
Fig. 1 is a flowchart of an image recognition method according to an embodiment of the present disclosure. As shown in fig. 1, the image recognition method includes the following steps:
s101, acquiring an image to be identified, and extracting image features of the image to be identified.
In the embodiment of the disclosure, the image to be recognized may be a pre-acquired image or an image acquired in real time. Optionally, the image is a color image.
After the image to be recognized is acquired, its image features need to be extracted in order to recognize or classify it. The image features may include, but are not limited to, color features, texture features, shape features and spatial relationship features of the image.
Alternatively, the image features of the image to be recognized may be extracted by a deep learning or machine learning model; that is, the image to be recognized is input into a trained feature extraction network, and the image features are extracted by that network.
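For illustration, such a feature extraction step might look as follows in PyTorch; the ResNet-50 backbone and the input size are assumptions, since the disclosure does not name a specific feature extraction network:

```python
# Illustrative sketch only: a torchvision ResNet-50 trunk is assumed as the
# feature extraction network; any trained CNN backbone could play this role.
import torch
import torchvision.models as models

backbone = models.resnet50()  # pretrained weights would be loaded in practice
# Keep everything up to the last convolutional stage as the feature extractor.
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2])
feature_extractor.eval()

image = torch.randn(1, 3, 224, 224)      # stand-in for the image to be recognized
with torch.no_grad():
    features = feature_extractor(image)  # image feature F of shape (1, C, H, W)
print(features.shape)                    # here: torch.Size([1, 2048, 7, 7])
```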
S102, performing dimension reduction processing on the image features to obtain dimension reduction image features.
In the embodiment of the disclosure, the same feature information in the image features can be described from multiple dimensions; for example, one piece of feature information can be described from dimensions such as the feature extraction channel, the feature length and the feature width. In order to reduce the amount of data processing and to enable the subsequent matrix multiplications, in the embodiment of the disclosure, dimension reduction may be performed on the image features to obtain dimension-reduced image features.
And S103, enhancing the dimension-reduced image features on the feature extraction channel to obtain first enhanced image features.
In implementation, the features of the image to be recognized are extracted through a plurality of feature extraction channels. To address the problem of insufficient image feature extraction in the related art, in the embodiment of the present disclosure, feature enhancement may be performed on the feature extraction channel to obtain the first enhanced image feature. Optionally, the dimension-reduced image features are convolved by a plurality of convolution networks to obtain enhancement weights for the feature extraction channels, and the first enhanced image feature is obtained based on these channel-level enhancement weights and the image features.
S104, enhancing the dimension-reduced image features on pixels to obtain second enhanced image features.
An image is composed of a plurality of pixels, and each pixel contributes to the extraction of image features. To address the problem of insufficient image feature extraction in the related art, in the embodiment of the disclosure, pixel-level feature enhancement may be performed to obtain the second enhanced image feature. Optionally, the dimension-reduced image features are convolved by a plurality of convolution networks to obtain an enhancement weight for each pixel, and the second enhanced image feature is obtained based on these pixel-level enhancement weights and the image features.
S105, acquiring the texture type of the image to be identified based on the first enhanced image feature and the second enhanced image feature.
After the first enhanced image feature and the second enhanced image feature are obtained, the two enhanced image features are fused to obtain the final image feature. Optionally, the first enhanced image feature and the second enhanced image feature are weighted to obtain the final target image feature. The two enhanced image features make the image features stronger, which is more conducive to improving the accuracy of image recognition.
After the final target image feature is obtained, classification and recognition are performed on it, so that the texture type of the image to be recognized can be obtained. Optionally, the target image feature is classified and recognized by a trained texture classification model, which finally outputs the texture type corresponding to the image to be recognized. For example, texture types may include soil, road surface, foliage, and the like.
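As an illustrative sketch of this fusion-and-classification step (not the patent's exact implementation), the computation can be written in a few lines of PyTorch; the fusion weights, channel count and texture classes below are assumed values:

```python
# Hypothetical values throughout: shapes, weights a/b and the class set are
# illustrative, not taken from the disclosure.
import torch
import torch.nn as nn

f1 = torch.randn(1, 256, 32, 32)  # first enhanced image feature (channel level)
f2 = torch.randn(1, 256, 32, 32)  # second enhanced image feature (pixel level)
a, b = 0.6, 0.4                   # fusion weights; learned in practice
target = a * f1 + b * f2          # final target image feature

classifier = nn.Linear(256, 3)    # 3 assumed texture classes: soil, road, foliage
logits = classifier(target.mean(dim=(2, 3)))  # global average pool, then classify
texture_type = logits.argmax(dim=1)
```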
The image recognition method provided by the disclosure obtains an image to be recognized and extracts its image features; performs dimension reduction on the image features to obtain dimension-reduced image features; enhances the dimension-reduced image features on the feature extraction channel to obtain first enhanced image features; enhances the dimension-reduced image features on the pixels to obtain second enhanced image features; and obtains the texture type of the image to be recognized based on the first enhanced image features and the second enhanced image features. In this method, after the image features are acquired, feature enhancement is performed in these two respects to strengthen the expressive power of the features, providing sufficient image features and improving the accuracy of classifying and recognizing the texture type of the image.
Fig. 2 is a flow chart of an image recognition method according to another embodiment of the present disclosure. As shown in fig. 2, the image recognition method specifically includes the following steps:
s201, acquiring an image to be identified, and extracting image features of the image to be identified.
S202, performing dimension reduction processing on the image features to obtain a first dimension reduction feature matrix and a second dimension reduction feature matrix.
The feature elements in the same row of the first dimension-reduction feature matrix belong to the same feature extraction channel, each column corresponds to one pixel, and the second dimension-reduction feature matrix is the transpose of the first dimension-reduction feature matrix. In the present disclosure, the dimension-reduced image features may comprise the first dimension-reduction feature matrix and the second dimension-reduction feature matrix, and these two matrices are used to obtain the first enhanced image feature and the second enhanced image feature.
In implementation, the same feature information in the image features may be described from multiple dimensions; for example, one piece of feature information may be described from dimensions such as the feature extraction channel, the feature length and the feature width. In order to reduce the amount of data processing and to enable matrix multiplication, in embodiments of the present disclosure, dimension reduction may be performed on the image features. Optionally, the feature length and feature width dimensions of the image features may be fused to obtain the first dimension-reduction feature matrix and the second dimension-reduction feature matrix.
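A minimal sketch of this dimension-reduction step, assuming a feature map of size C×H×W (the concrete sizes below are illustrative):

```python
# Fuse the length and width dimensions: F (C, H, W) -> Q (C, H*W); the second
# matrix is the transpose of the first. Sizes are assumed for illustration.
import torch

F = torch.randn(256, 32, 32)   # image feature of size C x H x W
C, H, W = F.shape
Q = F.reshape(C, H * W)        # first dimension-reduction matrix: one row per channel
Ht = Q.transpose(0, 1)         # second matrix (H*W, C): its transpose, one row per pixel
M_c = Q @ Ht                   # (C, C) channel weight matrix used in the next step
```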
And S203, acquiring a first enhanced image feature and a second enhanced image feature based on the first dimension reduction feature matrix and the second dimension reduction feature matrix.
The process of acquiring the first enhanced image feature includes: multiplying the first dimension-reduction feature matrix by the second dimension-reduction feature matrix to obtain a first weight matrix corresponding to the feature extraction channel, and obtaining the first enhanced image feature based on the image features and the first weight matrix. Optionally, a convolution operation is performed on the image features to obtain a first intermediate feature matrix, the first weight matrix is multiplied by the first intermediate feature matrix to obtain a second intermediate feature matrix, and the first intermediate feature matrix and the second intermediate feature matrix are added to obtain the first enhanced image feature. In the embodiment of the disclosure, feature enhancement on the feature extraction channel strengthens the extraction capability of the feature channels, and the stronger extracted image features can improve the accuracy of image recognition.
The acquisition of the first enhanced image feature is explained below with reference to fig. 3. As shown in fig. 3, the channel-level feature enhancement module includes a convolution unit 31, a convolution unit 32, a convolution unit 33, a first matrix multiplication unit 34, a normalization unit 35, a second matrix multiplication unit 36, and an adder 37.
The image feature F, of size C×H×W, serves as the input to the channel-level feature enhancement module, where C denotes the feature extraction channels, W the feature width, and H the feature length.

Convolution unit 31 and convolution unit 32 each apply a convolution to the image feature F and perform dimension reduction, yielding the dimension-reduced image features: the first dimension-reduction feature matrix Q_c of size C×(H*W) and the second dimension-reduction feature matrix H_c of size (H*W)×C, where H_c is the transpose of Q_c. Q_c and H_c are then input to the first matrix multiplication unit 34, which multiplies them and outputs the weight matrix M_c of size C×C; after M_c passes through the normalization (softmax) unit 35, the first weight matrix M'_c corresponding to the feature extraction channels is obtained.

Convolution unit 33 applies a convolution to the image feature F to obtain the first intermediate feature matrix F_c1 of size C×H×W. The second matrix multiplication unit 36 then multiplies M'_c by F_c1 to obtain the enhanced second intermediate feature matrix F_h1 of size C×H×W.

Finally, adder 37 adds the second intermediate feature matrix F_h1 and the first intermediate feature matrix F_c1 to obtain the final first enhanced image feature F1.
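A minimal PyTorch sketch of this channel-level module follows; the 1×1 convolutions, the class name and the batch handling are assumptions not specified in the disclosure:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F_

class ChannelEnhance(nn.Module):
    """Channel-level feature enhancement in the spirit of fig. 3 (sketch)."""
    def __init__(self, channels):
        super().__init__()
        self.conv_q = nn.Conv2d(channels, channels, kernel_size=1)  # unit 31
        self.conv_h = nn.Conv2d(channels, channels, kernel_size=1)  # unit 32
        self.conv_f = nn.Conv2d(channels, channels, kernel_size=1)  # unit 33

    def forward(self, x):  # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = self.conv_q(x).reshape(b, c, h * w)                   # Q_c: (B, C, H*W)
        ht = self.conv_h(x).reshape(b, c, h * w).transpose(1, 2)  # H_c: (B, H*W, C)
        m = F_.softmax(torch.bmm(q, ht), dim=-1)  # first weight matrix M'_c: (B, C, C)
        f1 = self.conv_f(x).reshape(b, c, h * w)  # first intermediate feature matrix
        f2 = torch.bmm(m, f1)                     # second intermediate feature matrix
        return (f1 + f2).reshape(b, c, h, w)      # adder 37: first enhanced feature
```

Here the C×C weight matrix re-weights whole feature extraction channels before the residual addition, matching the flow of fig. 3.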
The process of acquiring the second enhanced image feature includes: multiplying the second dimension-reduction feature matrix by the first dimension-reduction feature matrix to obtain a second weight matrix corresponding to the pixels, and obtaining the second enhanced image feature based on the image features and the second weight matrix. Optionally, a convolution operation is performed on the image features to obtain a third intermediate feature matrix, the second weight matrix is multiplied by the third intermediate feature matrix to obtain a fourth intermediate feature matrix, and the third intermediate feature matrix and the fourth intermediate feature matrix are added to obtain the second enhanced image feature. In the embodiment of the disclosure, feature enhancement is performed on the pixels to improve the expressive power of the image features, which can improve the accuracy of image recognition.
The acquisition of the second enhanced image feature is explained below with reference to fig. 4. As shown in fig. 4, the pixel-level feature enhancement module includes a convolution unit 41, a convolution unit 42, a convolution unit 43, a first matrix multiplication unit 44, a normalization unit 45, a second matrix multiplication unit 46, and an adder 47.
The image feature F, of size C×H×W, likewise serves as the input to the pixel-level feature enhancement module.

Convolution unit 41 and convolution unit 42 each apply a convolution to the image feature F and perform dimension reduction, yielding the first dimension-reduction feature matrix Q_c of size C×(H*W) and the second dimension-reduction feature matrix H_c of size (H*W)×C, where H_c is the transpose of Q_c. H_c and Q_c are then input to the first matrix multiplication unit 44, which multiplies them to obtain the weight matrix M_p of size (H*W)×(H*W); after M_p passes through the normalization unit 45, the second weight matrix M'_p corresponding to the pixels is obtained.

Convolution unit 43 applies a convolution to the image feature F to obtain the third intermediate feature matrix F_c2 of size C×H×W. The second matrix multiplication unit 46 then multiplies M'_p by F_c2 to obtain the enhanced fourth intermediate feature matrix F_h2 of size C×H×W.

Finally, adder 47 adds the fourth intermediate feature matrix F_h2 and the third intermediate feature matrix F_c2 channel-wise to obtain the final second enhanced image feature F2.
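A matching PyTorch sketch of the pixel-level module; as before, the 1×1 convolutions and the class name are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F_

class PixelEnhance(nn.Module):
    """Pixel-level feature enhancement in the spirit of fig. 4 (sketch)."""
    def __init__(self, channels):
        super().__init__()
        self.conv_q = nn.Conv2d(channels, channels, kernel_size=1)  # unit 41
        self.conv_h = nn.Conv2d(channels, channels, kernel_size=1)  # unit 42
        self.conv_f = nn.Conv2d(channels, channels, kernel_size=1)  # unit 43

    def forward(self, x):  # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = self.conv_q(x).reshape(b, c, h * w)                   # Q_c: (B, C, H*W)
        ht = self.conv_h(x).reshape(b, c, h * w).transpose(1, 2)  # H_c: (B, H*W, C)
        m = F_.softmax(torch.bmm(ht, q), dim=-1)  # M'_p: (B, H*W, H*W), per-pixel weights
        f3 = self.conv_f(x).reshape(b, c, h * w)  # third intermediate feature matrix
        f4 = torch.bmm(f3, m.transpose(1, 2))     # fourth intermediate feature matrix
        return (f3 + f4).reshape(b, c, h, w)      # adder 47: second enhanced feature
```

The only structural difference from the channel-level sketch is the order of the matrix product, which yields an (H*W)×(H*W) weight matrix over pixels instead of a C×C matrix over channels.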
S204, weighting the first enhanced image feature and the second enhanced image feature to obtain target image features.
In the embodiment of the disclosure, the target image feature is obtained by a weighted computation based on the first enhanced image feature and the second enhanced image feature. Let the target image feature be F; as shown in fig. 3 and fig. 4, let F1 be the image feature obtained by channel-level enhancement and F2 the image feature obtained by pixel-level enhancement. After the enhanced features are obtained, F1 and F2 are fused by weighting, that is, F = a×F1 + b×F2, where a and b are learnable weight parameters. It can be understood that the weight parameters a and b are tuned during the training and testing of the image texture recognition model in the embodiment of the disclosure.
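A sketch of this learnable fusion; the initial values of a and b are assumptions, as the disclosure only states that they are learned:

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """F = a*F1 + b*F2 with learnable scalar weights (sketch)."""
    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.tensor(0.5))  # assumed initial value
        self.b = nn.Parameter(torch.tensor(0.5))  # assumed initial value

    def forward(self, f1, f2):
        return self.a * f1 + self.b * f2
```

Because a and b are nn.Parameter objects, they receive gradients and are adjusted together with the rest of the model during training.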
S205, based on the target image characteristics, acquiring the texture type of the image to be identified.
The image recognition method provided by the disclosure obtains an image to be recognized and extracts its image features; performs dimension reduction on the image features to obtain dimension-reduced image features; enhances the dimension-reduced image features on the feature extraction channel to obtain first enhanced image features; enhances the dimension-reduced image features on the pixels to obtain second enhanced image features; and obtains the texture type of the image to be recognized based on the first enhanced image features and the second enhanced image features. In this method, after the image features are acquired, feature enhancement is performed in these two respects to strengthen the expressive power of the features, providing sufficient image features and improving the accuracy of classifying and recognizing the texture type of the image.
The image texture recognition model referred to in the above embodiments is explained below. A nonlinear mapping model is first constructed, and then a training dataset is acquired, where the training dataset includes sample images and the texture categories labeled on the sample images. The constructed nonlinear mapping model is trained on the training dataset, finally yielding an image texture recognition model capable of recognizing image textures.
Alternatively, as shown in fig. 5, the network structure of the image classification and recognition model may include: a feature extraction layer 51 and a feature enhancement layer 52, where the feature enhancement layer comprises a channel-level feature enhancement sublayer 521 and a pixel-level feature enhancement sublayer 522, followed by a feature fusion layer 53, a Fully Connected (FC) layer 54, and an L2 norm normalization (L2 norm) layer 55. The image to be recognized is input into the image classification and recognition model shown in fig. 5: image features are extracted by the feature extraction layer 51, channel-level and pixel-level feature enhancement is performed by the feature enhancement layer 52, the features are fused by the feature fusion layer 53 and fully connected by the FC layer 54, and finally the L2 normalization layer 55 maps the fused image features to obtain the texture type of the image to be recognized.
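Assembling the sketches above into the fig. 5 pipeline might look as follows; the stand-in extraction layer, channel count and class count are assumptions, and ChannelEnhance, PixelEnhance and WeightedFusion are the sketch classes defined earlier:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F_

class TextureRecognitionModel(nn.Module):
    """End-to-end sketch of the fig. 5 pipeline (illustrative, not the patent's exact network)."""
    def __init__(self, channels=256, num_classes=3):
        super().__init__()
        self.extract = nn.Conv2d(3, channels, kernel_size=7, stride=4)  # stand-in for layer 51
        self.chan = ChannelEnhance(channels)        # sublayer 521
        self.pix = PixelEnhance(channels)           # sublayer 522
        self.fuse = WeightedFusion()                # layer 53
        self.fc = nn.Linear(channels, num_classes)  # layer 54

    def forward(self, img):
        f = self.extract(img)
        fused = self.fuse(self.chan(f), self.pix(f))
        logits = self.fc(fused.mean(dim=(2, 3)))    # pool spatially before the FC layer
        return F_.normalize(logits, p=2, dim=1)     # layer 55: L2 norm normalization

model = TextureRecognitionModel()
scores = model(torch.randn(1, 3, 128, 128))         # one score per assumed texture class
```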
Corresponding to the image recognition methods provided in the above embodiments, an embodiment of the present disclosure further provides an image recognition apparatus. Since the image recognition apparatus provided in the embodiment of the present disclosure corresponds to the image recognition methods provided in the above embodiments, the implementations of the image recognition method are also applicable to the image recognition apparatus provided in this embodiment and will not be described in detail below.
Fig. 6 is a schematic structural view of an image recognition apparatus according to another embodiment of the present disclosure. As shown in fig. 6, the image recognition apparatus 600 includes: a feature extraction module 61, a dimension reduction module 62, a first enhancement module 63, a second enhancement module 64, and a texture recognition module 65. Wherein:
the feature extraction module 61 is configured to obtain an image to be identified, and extract image features of the image to be identified;
the dimension reduction module 62 is configured to perform dimension reduction processing on the image feature to obtain a dimension-reduced image feature;
a first enhancement module 63, configured to enhance the image feature on the feature extraction channel to obtain a first enhanced image feature;
a second enhancement module 64 for enhancing the image features on pixels to obtain second enhanced image features;
the texture recognition module 65 is configured to obtain a texture type of the image to be recognized based on the first enhanced image feature and the second enhanced image feature.
The image recognition device provided by the disclosure obtains an image to be recognized, extracts image features of the image to be recognized, performs dimension reduction processing on the image features, obtains dimension reduction image features, enhances the dimension reduction image features on a feature extraction channel to obtain first enhanced image features, enhances the dimension reduction image features on pixels to obtain second enhanced image features, and obtains texture types of the image to be recognized based on the first enhanced image features and the second enhanced image features. In the method, after the image features are acquired, feature enhancement is performed in two aspects respectively to enhance the expression capability of the features, and sufficient image features can be provided to improve the accuracy of classifying and identifying the texture types of the images.
Fig. 7 is a schematic structural view of an image recognition apparatus according to another embodiment of the present disclosure. As shown in fig. 7, the image recognition apparatus 700 includes: a feature extraction module 71, a dimension reduction module 72, a first enhancement module 73, a second enhancement module 74, and a texture recognition module 75.
The feature extraction module 71, the dimension reduction module 72, the first enhancement module 73, the second enhancement module 74, and the texture recognition module 75 have the same structure and function as the feature extraction module 61, the dimension reduction module 62, the first enhancement module 63, the second enhancement module 64, and the texture recognition module 65, respectively.
In the embodiment of the present disclosure, the dimension reduction module 72 is configured to fuse two dimensions of a feature length and a feature width in an image feature to obtain a first dimension reduction feature matrix and a second dimension reduction feature matrix, where feature elements in the same row in the first dimension reduction feature matrix belong to the same feature extraction channel, one column element corresponds to one pixel, and the second dimension reduction feature matrix is a transpose matrix of the first dimension reduction feature matrix; the first dimension-reduction feature matrix and the second dimension-reduction feature matrix are used for acquiring first enhanced image features and second enhanced image features.
In the disclosed embodiment, the first enhancement module 73 includes a first matrix multiplication unit 731 and a first acquisition unit 732.
And the first matrix multiplication unit 731 is configured to multiply the first dimension-reduction feature matrix with the second dimension-reduction feature matrix, and obtain a first weight matrix corresponding to the feature extraction channel.
A first obtaining unit 732 is configured to obtain a first enhanced image feature based on the image feature and the first weight matrix.
The first obtaining unit 732 is further configured to perform a convolution operation on the image feature to obtain a first intermediate feature matrix; multiplying the first weight matrix with the first intermediate feature matrix to obtain a second intermediate feature matrix; and adding the first intermediate feature matrix and the second intermediate feature matrix to obtain a first enhanced image feature.
In the embodiment of the present disclosure, the second enhancement module 74 includes a second matrix multiplication unit 741 and a second acquisition unit 742.
And the second matrix multiplication unit 741 is configured to multiply the second dimension-reduction feature matrix with the first dimension-reduction feature matrix, and obtain a second weight matrix corresponding to the pixel.
A second obtaining unit 742 is configured to obtain a second enhanced image feature based on the image feature and the second weight matrix.
The second obtaining unit 742 is further configured to perform a convolution operation on the image features to obtain a third intermediate feature matrix; multiply the second weight matrix with the third intermediate feature matrix to obtain a fourth intermediate feature matrix; and add the third intermediate feature matrix and the fourth intermediate feature matrix to obtain the second enhanced image feature.
The texture recognition module 75 in the embodiment of the present disclosure includes: a weighting unit 751 and an identification unit 752.
The weighting unit 751 is used for weighting the first enhanced image feature and the second enhanced image feature to obtain a target image feature.
The identifying unit 752 is configured to identify a texture type of the image to be identified based on the target image feature.
In the method, after the image features are acquired, feature enhancement is performed in two aspects respectively to enhance the expression capability of the features, and sufficient image features can be provided to improve the accuracy of classifying and identifying the texture types of the images.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 8 illustrates a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the apparatus 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the respective methods and processes described above, for example, an image recognition method. For example, in some embodiments, the image recognition method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 800 via ROM 802 and/or communication unit 809. When a computer program is loaded into RAM 803 and executed by computing unit 801, one or more steps of the image recognition method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the image recognition method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), the internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host; it is a host product in the cloud computing service system and overcomes the defects of difficult management and weak service scalability found in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (10)

1. An image recognition method, comprising:
acquiring an image to be identified, and extracting image characteristics of the image to be identified;
performing dimension reduction processing on the image features to obtain dimension-reduced image features, wherein the dimension-reduced image features comprise a first dimension-reduced feature matrix and a second dimension-reduced feature matrix, and the dimension-reduced image features are used for obtaining first enhanced image features and second enhanced image features;
multiplying the first dimension-reduction feature matrix by the second dimension-reduction feature matrix to obtain a first weight matrix corresponding to the feature extraction channel; performing convolution operation on the image features to obtain a first intermediate feature matrix; multiplying the first weight matrix with the first intermediate feature matrix to obtain a second intermediate feature matrix; adding the first intermediate feature matrix and the second intermediate feature matrix to obtain the first enhanced image feature;
multiplying the second dimension-reduction feature matrix by the first dimension-reduction feature matrix to obtain a second weight matrix corresponding to the pixel; acquiring the second enhanced image feature based on the image feature and the second weight matrix;
and acquiring the texture type of the image to be identified based on the first enhanced image feature and the second enhanced image feature.
2. The image recognition method according to claim 1, wherein the performing the dimension reduction processing on the image feature to obtain a dimension-reduced image feature includes:
and fusing the feature length and the feature width in the image features to obtain the first dimension-reduction feature matrix and the second dimension-reduction feature matrix, wherein feature elements in the same row in the first dimension-reduction feature matrix belong to the same feature extraction channel, one column element corresponds to one pixel, and the second dimension-reduction feature matrix is a transposed matrix of the first dimension-reduction feature matrix.
3. The image recognition method of claim 1, wherein the acquiring the second enhanced image feature based on the image feature and the second weight matrix comprises:
performing convolution operation on the image features to obtain a third intermediate feature matrix;
multiplying the second weight matrix with the third intermediate feature matrix to obtain a fourth intermediate feature matrix;
and adding the third intermediate feature matrix and the fourth intermediate feature matrix to obtain the second enhanced image feature.
4. The image recognition method according to any one of claims 1-3, wherein the acquiring the texture type of the image to be recognized based on the first enhanced image feature and the second enhanced image feature includes:
weighting the first enhanced image feature and the second enhanced image feature to obtain a target image feature;
and identifying the texture type of the image to be identified based on the target image characteristics.
5. An image recognition apparatus comprising:
the feature extraction module is used for acquiring an image to be identified and extracting image features of the image to be identified;
the dimension reduction module is used for carrying out dimension reduction processing on the image features to obtain dimension-reduced image features, wherein the dimension-reduced image features comprise a first dimension-reduction feature matrix and a second dimension-reduction feature matrix, and the first dimension-reduction feature matrix and the second dimension-reduction feature matrix are used for acquiring first enhanced image features and second enhanced image features;
a first enhancement module, comprising:
the first matrix multiplication unit is used for multiplying the first dimension reduction feature matrix and the second dimension reduction feature matrix to obtain a first weight matrix corresponding to the feature extraction channel;
the first acquisition unit is used for carrying out convolution operation on the image features to acquire a first intermediate feature matrix; multiplying the first weight matrix with the first intermediate feature matrix to obtain a second intermediate feature matrix; adding the first intermediate feature matrix and the second intermediate feature matrix to obtain the first enhanced image feature;
a second enhancement module, comprising:
the second matrix multiplication unit is used for multiplying the second dimension reduction feature matrix with the first dimension reduction feature matrix to obtain a second weight matrix corresponding to the pixel;
a second obtaining unit, configured to obtain the second enhanced image feature based on the image feature and the second weight matrix;
and the texture recognition module is used for acquiring the texture type of the image to be recognized based on the first enhanced image feature and the second enhanced image feature.
6. The image recognition device according to claim 5, wherein the dimension reduction module is configured to fuse two dimensions of a feature length and a feature width in the image feature to obtain the first dimension reduction feature matrix and the second dimension reduction feature matrix, where feature elements in a same row in the first dimension reduction feature matrix belong to a same feature extraction channel, one column element corresponds to one pixel, and the second dimension reduction feature matrix is a transpose matrix of the first dimension reduction feature matrix.
7. The image recognition device of claim 5, wherein the second acquisition unit is further configured to:
performing convolution operation on the image features to obtain a third intermediate feature matrix;
multiplying the second weight matrix with the third intermediate feature matrix to obtain a fourth intermediate feature matrix;
and adding the third intermediate feature matrix and the fourth intermediate feature matrix to obtain the second enhanced image feature.
8. The image recognition device of any one of claims 5-7, wherein the texture recognition module comprises:
the weighting unit is used for weighting the first enhanced image feature and the second enhanced image feature to obtain a target image feature;
and the identification unit is used for identifying the texture type of the image to be identified based on the target image characteristics.
9. An electronic device, comprising:
at least one processor, and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the image recognition method of any one of claims 1-4.
10. A non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the image recognition method of any one of claims 1-4.
CN202011606881.3A 2020-12-30 2020-12-30 Image recognition method, device, electronic equipment and storage medium Active CN112651451B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011606881.3A CN112651451B (en) 2020-12-30 2020-12-30 Image recognition method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011606881.3A CN112651451B (en) 2020-12-30 2020-12-30 Image recognition method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112651451A CN112651451A (en) 2021-04-13
CN112651451B true CN112651451B (en) 2023-08-11

Family

ID=75364329

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011606881.3A Active CN112651451B (en) 2020-12-30 2020-12-30 Image recognition method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112651451B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343295B (en) * 2021-06-07 2023-01-24 支付宝(杭州)信息技术有限公司 Image processing method, device, equipment and storage medium based on privacy protection
CN114463584B (en) * 2022-01-29 2023-03-24 北京百度网讯科技有限公司 Image processing method, model training method, device, apparatus, storage medium, and program
CN117252449B (en) * 2023-11-20 2024-01-30 水润天府新材料有限公司 Full-penetration drainage low-noise pavement construction process and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110111313A (en) * 2019-04-22 2019-08-09 腾讯科技(深圳)有限公司 Medical image detection method and relevant device based on deep learning
CN111126254A (en) * 2019-12-23 2020-05-08 Oppo广东移动通信有限公司 Image recognition method, device, equipment and storage medium
CN111428807A (en) * 2020-04-03 2020-07-17 桂林电子科技大学 Image processing method and computer-readable storage medium
WO2020232886A1 (en) * 2019-05-21 2020-11-26 平安科技(深圳)有限公司 Video behavior identification method and apparatus, storage medium and server

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263909B (en) * 2018-03-30 2022-10-28 腾讯科技(深圳)有限公司 Image recognition method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110111313A (en) * 2019-04-22 2019-08-09 腾讯科技(深圳)有限公司 Medical image detection method and relevant device based on deep learning
WO2020232886A1 (en) * 2019-05-21 2020-11-26 平安科技(深圳)有限公司 Video behavior identification method and apparatus, storage medium and server
CN111126254A (en) * 2019-12-23 2020-05-08 Oppo广东移动通信有限公司 Image recognition method, device, equipment and storage medium
CN111428807A (en) * 2020-04-03 2020-07-17 桂林电子科技大学 Image processing method and computer-readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Channel Attention based Iterative Residual Learning for Depth Map Super-Resolution; Xibin Song et al.; arXiv; full text *

Also Published As

Publication number Publication date
CN112651451A (en) 2021-04-13

Similar Documents

Publication Publication Date Title
US10943145B2 (en) Image processing methods and apparatus, and electronic devices
CN112651451B (en) Image recognition method, device, electronic equipment and storage medium
CN112819007B (en) Image recognition method, device, electronic equipment and storage medium
CN112949767B (en) Sample image increment, image detection model training and image detection method
CN113393371B (en) Image processing method and device and electronic equipment
CN113177472A (en) Dynamic gesture recognition method, device, equipment and storage medium
CN112561879B (en) Ambiguity evaluation model training method, image ambiguity evaluation method and image ambiguity evaluation device
CN114092759A (en) Training method and device of image recognition model, electronic equipment and storage medium
CN115063875A (en) Model training method, image processing method, device and electronic equipment
CN113177451A (en) Training method and device of image processing model, electronic equipment and storage medium
CN113705361A (en) Method and device for detecting model in living body and electronic equipment
CN114495101A (en) Text detection method, and training method and device of text detection network
CN113569855A (en) Tongue picture segmentation method, equipment and storage medium
CN116721460A (en) Gesture recognition method, gesture recognition device, electronic equipment and storage medium
CN114863450B (en) Image processing method, device, electronic equipment and storage medium
CN114724144B (en) Text recognition method, training device, training equipment and training medium for model
CN116052288A (en) Living body detection model training method, living body detection device and electronic equipment
CN112560848B (en) Training method and device for POI (Point of interest) pre-training model and electronic equipment
CN115019057A (en) Image feature extraction model determining method and device and image identification method and device
CN113554550B (en) Training method and device for image processing model, electronic equipment and storage medium
CN115116111A (en) Anti-disturbance human face living body detection model training method and device and electronic equipment
CN114821116A (en) Method, device and equipment for extracting salient region of image and storage medium
CN114093006A (en) Training method, device and equipment of living human face detection model and storage medium
CN113903071A (en) Face recognition method and device, electronic equipment and storage medium
CN113610856A (en) Method and device for training image segmentation model and image segmentation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant