CN116468947A - Tool image recognition method, tool image recognition device, computer equipment and storage medium - Google Patents

Info

Publication number
CN116468947A
CN116468947A (application CN202310438264.4A)
Authority
CN
China
Prior art keywords: image, identified, dimension, layer, full
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310438264.4A
Other languages
Chinese (zh)
Inventor
张少特
张奇特
谭云培
袁兴泷
王兵正
谢万桥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Eda Precision Electromechanical Science & Technology Co ltd
Original Assignee
Hangzhou Eda Precision Electromechanical Science & Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Eda Precision Electromechanical Science & Technology Co ltd
Priority to CN202310438264.4A
Publication of CN116468947A
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing


Abstract

The embodiment of the invention discloses a tool image recognition method, a tool image recognition device, computer equipment and a storage medium. The method comprises the following steps: acquiring an image to be identified; and inputting the image to be identified into a type identification model for type identification to obtain an identification result. The type recognition model is obtained by training a convolutional neural classification network with a plurality of tool images carrying type labels as the sample set; it comprises convolution layers, pooling layers, a fully-connected layer and residual blocks, and a spatial attention module is inserted after each convolution layer and each residual block. By implementing the method provided by the embodiment of the invention, the problem that other spatial attention mechanisms do not learn the spatial information of each dimension tightly enough can be solved, and the recognition speed and accuracy for tools of many types and models are improved.

Description

Tool image recognition method, tool image recognition device, computer equipment and storage medium
Technical Field
The present invention relates to deep learning, and more particularly, to a tool image recognition method, apparatus, computer device, and storage medium.
Background
In the production process, tools are required to cut products, and because of process requirements, the production flow of one product may require tools of various specifications. On current production lines, however, tools are often selected manually and mounted on a machine tool to cut the product, and are manually returned to a designated tool position after cutting is finished. This manual tool selection carries a high probability of wrong selection and leads to disordered management.
A number of intelligent tool management cabinets have been described in the prior art to address the problem of intelligent tool management. However, existing intelligent tool management cabinets generally manage tools by means of the tool information entered by users at borrowing time and preset tool positions; when returning a tool, the user must place it into a designated space according to the cabinet's instructions, and mechanical sorting inside the cabinet body then completes tool type identification and storage. There are also known techniques that identify and manage tools through tool image recognition, in which an attention module is inserted into an existing convolutional neural classification network to lift the recognition performance of the algorithm; this has become an important line of research. Taking the Coordinate Attention spatial attention module as an example, the method generally comprises the following steps: first, pool the feature map output by the previous layer along the H dimension and the W dimension respectively, obtaining two feature maps of size C×H×1 and C×1×W; second, concatenate the two feature maps and convolve them with a shared 1×1 convolution that reduces the channel dimension to C/r, the aim being to encode the spatial information while reducing the dimension on the C channel; third, batch-normalize the output of the previous step and split it back into two branches; fourth, convolve each branch with its own 1×1 convolution, restoring the number of output channels to the channel number C of the first step; fifth, after activating the branch outputs, fuse the spatial information by weighting on the channel. It can be seen that 1×1 convolutions are used in both the second and the fourth step, so spatial information is mainly learned from the channel dimension, and the close association between the W and H dimensions is not fully exploited; moreover, because the second step shares the same 1×1 convolution kernel for extracting the W and H spatial information, the attention learned in each dimension is not tight enough. Since tools come in a very large number of types and models, recognizing them with this conventional attention mechanism yields neither high recognition speed nor high recognition accuracy.
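For concreteness, the following is a minimal PyTorch sketch of the Coordinate Attention flow summarized above. It is given only to illustrate the prior-art baseline being criticized; the class name, the reduction ratio r and the minimum hidden width are illustrative assumptions rather than details of this application.

import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        mid = max(channels // r, 8)
        # Step 2: shared 1x1 convolution reduces the channel dimension to C/r
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        # Step 4: per-branch 1x1 convolutions restore the channel count to C
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # Step 1: pool along W and along H to get C x H x 1 and C x 1 x W maps
        x_h = x.mean(dim=3, keepdim=True)                      # N x C x H x 1
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # N x C x W x 1
        # Step 2: concatenate and convolve with the shared 1x1 kernel
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        # Step 3: split the normalized output back into the two branches
        y_h, y_w = torch.split(y, [h, w], dim=2)
        # Steps 4-5: restore the channels, activate, and weight the input
        a_h = torch.sigmoid(self.conv_h(y_h))                      # N x C x H x 1
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # N x C x 1 x W
        return x * a_h * a_w

Note that every learned kernel here is 1×1, which is exactly the limitation identified above: spatial information is mixed only through the channel dimension.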
Therefore, a new method needs to be designed to solve the problem that other spatial attention mechanisms do not learn the spatial information of each dimension tightly enough, and to improve the recognition speed and accuracy for tools of many types and models.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a tool image recognition method, a tool image recognition device, computer equipment and a storage medium.
In order to achieve the above purpose, the present invention adopts the following technical scheme: a tool image recognition method comprises the following steps:
acquiring an image to be identified;
inputting the image to be identified into a type identification model for type identification to obtain an identification result;
the type recognition model is obtained by training a convolutional neural classification network through a plurality of cutter images with type labels as sample sets, wherein the type recognition model comprises a convolutional layer, a pooling layer, a full-connection layer and a residual block, and a spatial attention module is inserted behind each convolutional layer and each residual block.
The further technical scheme is as follows: the step of inputting the image to be identified into a type identification model for type identification to obtain an identification result comprises the following steps:
inputting the image to be identified into the type identification model, and acquiring a feature map of the image to be identified by using a convolution layer in the type identification model;
acquiring, by using a spatial attention module, the attention of the image to be identified in width and height, and encoding the accurate position information of the image to be identified;
splitting the feature map into the width direction and the height direction, and performing global average pooling in each direction to obtain a feature map in the width direction and a feature map in the height direction;
performing a flattening operation on the feature map in the width direction and the feature map in the height direction to obtain two one-dimensional vectors;
inputting the two one-dimensional vectors into the fully-connected layer to obtain two output results;
recovering the dimensions of the two output results, and copying and expanding each to obtain two feature layers of identical C×H×W shape, wherein W is the wide coordinate dimension, H is the high coordinate dimension, and C is the number of channels;
performing a dot product on the two C×H×W feature layers to obtain a preliminary three-dimensional spatial attention weight;
performing spatial information fusion on the preliminary three-dimensional spatial attention weight to obtain a final spatial attention coding weight;
and fusing the final spatial attention coding weight with the feature map of the image to be identified to obtain an identification result.
The further technical scheme is as follows: the step of inputting the two one-dimensional vectors into the fully-connected layer to obtain two output results comprises the following step:
inputting the two one-dimensional vectors into the bottleneck structure of the fully-connected layer to obtain two output results.
The further technical scheme is as follows: the two output results are S_h = σ(W_2 · ReLU(W_1 · Z_h)) and S_w = σ(W_3 · ReLU(W_4 · Z_w)), wherein W_1 is the weight of the first fully-connected layer in the H dimension direction; W_2 is the weight of the second fully-connected layer in the H dimension direction; W_4 is the weight of the first fully-connected layer in the W dimension direction; W_3 is the weight of the second fully-connected layer in the W dimension direction; Z_h and Z_w are respectively the two one-dimensional vectors; and ReLU and σ are deep-learning activation functions. The first fully-connected layer reduces the dimension by the coefficient r, which is a hyperparameter, and its output is activated with ReLU; the last fully-connected layer restores the dimension so that it can be recovered to C×H×W, and performs σ activation on the feature to be learned. W is the wide coordinate dimension, H is the high coordinate dimension, and C is the number of channels.
The further technical scheme is as follows: the step of performing spatial information fusion on the preliminary three-dimensional spatial attention weight to obtain the final spatial attention coding weight comprises:
encoding the preliminary three-dimensional spatial attention weight using a 3×3 convolution and activating it using Sigmoid to obtain the final spatial attention coding weight.
The further technical scheme is as follows: the fusing the final spatial attention coding weight with the feature map of the image to be identified to obtain an identification result comprises the following steps:
and carrying out dot multiplication on the final spatial attention coding weight and the feature map of the image to be identified to obtain an identification result.
The invention also provides a tool image recognition device, which comprises:
the image acquisition unit is used for acquiring an image to be identified;
the identification unit is used for inputting the image to be identified into a type identification model to carry out type identification so as to obtain an identification result;
the type recognition model is obtained by training a convolutional neural classification network through a plurality of cutter images with type labels as sample sets, wherein the type recognition model comprises a convolutional layer, a pooling layer, a full-connection layer and a residual block, and a spatial attention module is inserted behind each convolutional layer and each residual block.
The further technical scheme is as follows: the identification unit includes:
the feature layer acquisition subunit is used for inputting the image to be identified into a type identification model, and acquiring a feature map of the image to be identified by using a convolution layer in the type identification model;
the coding subunit is used for acquiring the attention of the image to be identified in width and height by using the spatial attention module, and coding the accurate position information of the image to be identified;
the pooling subunit is used for splitting the feature map into the width direction and the height direction and performing global average pooling in each direction, so as to obtain a feature map in the width direction and a feature map in the height direction;
the operation subunit is used for performing a flattening operation on the feature map in the width direction and the feature map in the height direction, so as to obtain two one-dimensional vectors;
the fully-connected subunit is used for inputting the two one-dimensional vectors into the fully-connected layer to obtain two output results;
the recovery subunit is used for recovering the dimensions of the two output results, and copying and expanding each to obtain two feature layers of identical C×H×W shape, wherein W is the wide coordinate dimension, H is the high coordinate dimension, and C is the number of channels;
the dot product subunit is used for performing a dot product on the two C×H×W feature layers to obtain a preliminary three-dimensional spatial attention weight;
the information fusion subunit is used for performing spatial information fusion on the preliminary three-dimensional spatial attention weight, so as to obtain a final spatial attention coding weight;
and the content fusion subunit is used for fusing the final spatial attention coding weight with the feature map of the image to be identified, so as to obtain an identification result.
The invention also provides a computer device which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the method when executing the computer program.
The present invention also provides a storage medium storing a computer program which, when executed by a processor, performs the above-described method.
Compared with the prior art, the invention has the following beneficial effects. The invention performs category recognition with a type recognition model into which a spatial attention module is inserted after each convolution layer and residual block. Attention over the width and height of the image is acquired and accurate position information is encoded; feature maps in the width and height directions are determined and converted into one-dimensional vectors; the vectors are processed by the fully-connected layers and their dimensions are recovered; a preliminary three-dimensional spatial attention weight and then a final spatial attention coding weight are determined; and the identification result is determined by dot-multiplying the final weight with the feature map. This solves the problem that other spatial attention mechanisms do not learn the spatial information of each dimension tightly enough, and improves the recognition speed and accuracy for tools of many types and models.
The invention is further described below with reference to the drawings and specific embodiments.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of an application scenario of a tool image recognition method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a method for identifying a tool image according to an embodiment of the present invention;
FIG. 3 is a schematic sub-flowchart of a method for identifying a tool image according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a type recognition model according to an embodiment of the present invention;
FIG. 5 is a schematic block diagram of components of a convolutional neural classification network provided by an embodiment of the present invention;
FIG. 6 is a schematic block diagram of a convolutional neural classification network architecture provided by an embodiment of the present invention;
FIG. 7 is a schematic block diagram of a tool image recognition device provided by an embodiment of the present invention;
fig. 8 is a schematic block diagram of an identification unit of the tool image identification apparatus provided by the embodiment of the invention;
fig. 9 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic view of an application scenario of a tool image recognition method according to an embodiment of the present invention, and fig. 2 is a schematic flowchart of the method. The tool image recognition method is applied to a server. The server exchanges data with a camera and performs tool type recognition using ResNet50 as the backbone network. ResNet50 has 50 layers, including convolution layers, pooling layers, a fully-connected layer and the like; its main characteristic is the residual block structure, which allows the network to be deeper and alleviates the vanishing-gradient problem. The spatial attention module described in this embodiment is inserted after each convolution layer and residual block on the ResNet50 backbone, so the recognition accuracy is significantly improved compared with the basic ResNet50 network. This solves the problem that other spatial attention mechanisms do not learn the spatial information of each dimension tightly enough.
Fig. 2 is a flowchart of a tool image recognition method according to an embodiment of the present invention. As shown in fig. 2, the method includes the following steps S110 to S120.
S110, acquiring an image to be identified.
In this embodiment, the image to be recognized refers to an image whose type needs to be recognized; here it is specifically a tool image to be recognized.
S120, inputting the image to be identified into a type identification model for type identification so as to obtain an identification result.
In the present embodiment, the recognition result refers to the kind of the image to be recognized.
The type recognition model is obtained by training a convolutional neural classification network with a plurality of tool images carrying type labels as the sample set; the type recognition model comprises convolution layers, pooling layers, a fully-connected layer and residual blocks, and a spatial attention module is inserted after each convolution layer and each residual block.
Referring to fig. 4, the type recognition model described above uses ResNet50 as the backbone network. ResNet50 has 50 layers, including convolution layers, pooling layers, a fully-connected layer and the like; its main characteristic is the residual block structure, which allows the network to be deeper and alleviates the vanishing-gradient problem. The spatial attention module is inserted after each convolution layer and residual block on the ResNet50 backbone, so the recognition accuracy is significantly improved compared with the basic ResNet50 network.
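As a concrete illustration, a model of this shape could be assembled from a torchvision ResNet50 as sketched below, wrapping the stem activation and every residual block with the SpatialAttention module sketched after step S129 further on. The 224×224 input resolution, the per-stage feature sizes derived from it, and the wrapping strategy are assumptions made for illustration, not the definitive implementation.

import torch.nn as nn
from torchvision.models import resnet50

def build_recognition_model(num_classes: int) -> nn.Module:
    # ResNet50 backbone; weights=None starts from random initialization.
    model = resnet50(weights=None)
    # After the stem (conv1 + bn1 + relu) the feature map is 64 x 112 x 112
    # for an assumed 224 x 224 input.
    model.relu = nn.Sequential(model.relu, SpatialAttention(64, 112, 112))
    # Spatial size of the feature maps produced by each residual stage.
    sizes = {"layer1": 56, "layer2": 28, "layer3": 14, "layer4": 7}
    for stage_name, size in sizes.items():
        stage = getattr(model, stage_name)
        for block_name, block in list(stage.named_children()):
            channels = block.bn3.num_features  # Bottleneck output channels
            setattr(stage, block_name,
                    nn.Sequential(block, SpatialAttention(channels, size, size)))
    # Classification head sized to the number of tool categories.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model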
In an embodiment, referring to fig. 3 to 5, the step S120 may include steps S121 to S129.
S121, inputting the image to be identified into a type identification model, and acquiring a feature map of the image to be identified by utilizing a convolution layer in the type identification model.
In this embodiment, the feature map of the image to be identified is first extracted by using the convolution layer, so that the subsequent pooling and spatial attention modules can process the feature map and the probability of each class can be determined, thereby determining the identification result.
S122, acquiring, by using a spatial attention module, the attention of the image to be identified in width and height, and encoding the accurate position information of the image to be identified.
S123, splitting the feature map into the width direction and the height direction, and performing global average pooling in each direction to obtain a feature map in the width direction and a feature map in the height direction.
In this embodiment, attention over the width and height of the image is acquired and accurate position information is encoded: the input feature map is split into the width direction and the height direction, and global average pooling is performed in each direction to obtain a feature map of size C×H×1 in the height direction and a feature map of size C×1×W in the width direction, wherein W is the wide coordinate dimension, H is the high coordinate dimension, and C is the number of channels.
S124, performing flattening operation on the characteristic diagram in the width direction and the characteristic diagram in the height direction to obtain two one-dimensional vectors.
In the present embodiment, the one-dimensional vector refers to a vector in which the feature map in the width direction and the feature map in the height direction are converted.
Specifically, the two feature maps in the width and height directions are reshaped (viewed) to obtain two one-dimensional vectors, Z_h of size 1×(C×H) and Z_w of size 1×(C×W), respectively.
S125, inputting the two one-dimensional vectors into the full connection layer to obtain two paths of output results.
In this embodiment, the two-way output result refers to the result output after the two one-dimensional vectors are input into the full connection layer for processing.
Specifically, two one-dimensional vectors are input to the bottleneck structure of the full-connection layer to obtain two paths of output results.
In this embodiment, the two output results are S_h = σ(W_2 · ReLU(W_1 · Z_h)) and S_w = σ(W_3 · ReLU(W_4 · Z_w)), wherein W_1 is the weight of the first fully-connected layer in the H dimension direction; W_2 is the weight of the second fully-connected layer in the H dimension direction; W_4 is the weight of the first fully-connected layer in the W dimension direction; W_3 is the weight of the second fully-connected layer in the W dimension direction; Z_h and Z_w are respectively the two one-dimensional vectors; and ReLU and σ are deep-learning activation functions. The first fully-connected layer reduces the dimension by the coefficient r, which is a hyperparameter, and its output is activated with ReLU; the last fully-connected layer restores the dimension so that it can be recovered to C×H×W, and performs σ activation on the feature to be learned. W is the wide coordinate dimension, H is the high coordinate dimension, and C is the number of channels.
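As a quick sanity check of these formulas, the fragment below builds the H-dimension branch with concrete sizes; C = 256, H = 56 and r = 16 are arbitrary illustrative values, not parameters disclosed by this application.

import torch
import torch.nn as nn

C, H, r = 256, 56, 16
z_h = torch.randn(1, C * H)                    # flattened vector from S124
w1 = nn.Linear(C * H, (C * H) // r)            # first FC layer: reduce by r
w2 = nn.Linear((C * H) // r, C * H)            # second FC layer: restore C*H
s_h = torch.sigmoid(w2(torch.relu(w1(z_h))))   # S_h = sigma(W2 ReLU(W1 Z_h))
print(s_h.shape)                               # torch.Size([1, 14336])

The W-dimension branch is identical, with W in place of H and its own pair of weights.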
S126, performing dimension recovery on the two paths of output results, and respectively copying and expanding to obtain two identical characteristic layers of C multiplied by H multiplied by W, wherein W is a wide coordinate dimension, H is a high coordinate dimension, and C is the channel number.
In this embodiment, a view operation is performed on the two output results to recover their dimensions, and each result is copied and expanded to obtain two feature layers of identical C×H×W shape.
S127, performing a dot product on the two C×H×W feature layers to obtain the preliminary three-dimensional spatial attention weight.
In this embodiment, the preliminary three-dimensional spatial attention weight refers to the result of the element-wise (dot) product of the two C×H×W feature layers.
S128, carrying out spatial information fusion on the preliminary three-dimensional spatial attention weight to obtain a final spatial attention coding weight.
In this embodiment, the final spatial attention coding weight refers to a result formed by fusing spatial information with the preliminary three-dimensional spatial attention weight.
Specifically, the preliminary three-dimensional spatial attention weights are encoded using 3×3 convolution and activated using Sigmoid to obtain the final spatial attention encoding weights.
S129, fusing the final spatial attention coding weight with the feature map of the image to be identified to obtain an identification result.
In this embodiment, the final spatial attention coding weight and the feature map of the image to be identified are subjected to dot multiplication to obtain an identification result.
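Putting steps S122 to S129 together, a minimal runnable sketch of the spatial attention module could look as follows. The class name, the default reduction ratio r = 16 and the batch-first tensor layout are assumptions for illustration; the module must be told the channel count and feature-map size at construction because the fully-connected layers operate on flattened 1×(C×H) and 1×(C×W) vectors.

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Spatial attention module of steps S122 to S129 (sketch)."""

    def __init__(self, channels: int, height: int, width: int, r: int = 16):
        super().__init__()
        c_h, c_w = channels * height, channels * width
        # S125: two-branch fully-connected bottleneck; the first layer reduces
        # the dimension by the hyperparameter r and is ReLU-activated, the
        # second restores it and is sigma (Sigmoid) activated.
        self.fc_h = nn.Sequential(
            nn.Linear(c_h, max(c_h // r, 1)), nn.ReLU(inplace=True),
            nn.Linear(max(c_h // r, 1), c_h), nn.Sigmoid())
        self.fc_w = nn.Sequential(
            nn.Linear(c_w, max(c_w // r, 1)), nn.ReLU(inplace=True),
            nn.Linear(max(c_w // r, 1), c_w), nn.Sigmoid())
        # S128: 3x3 convolution that fuses the spatial information
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # S123: global average pooling along the width and height directions,
        # S124: flattening into two one-dimensional vectors per sample
        z_h = x.mean(dim=3).flatten(1)          # N x (C*H)
        z_w = x.mean(dim=2).flatten(1)          # N x (C*W)
        # S125: the two output results S_h and S_w
        s_h = self.fc_h(z_h)
        s_w = self.fc_w(z_w)
        # S126: recover the dimensions and expand both to C x H x W
        s_h = s_h.view(n, c, h, 1).expand(n, c, h, w)
        s_w = s_w.view(n, c, 1, w).expand(n, c, h, w)
        # S127: element-wise (dot) product -> preliminary 3-D attention weight
        attn = s_h * s_w
        # S128: encode with a 3x3 convolution and activate with Sigmoid
        attn = torch.sigmoid(self.conv(attn))
        # S129: fuse with the input feature map by dot multiplication
        return x * attn

Under these assumptions the module preserves the input shape, so it can be dropped in after any convolution layer or residual block; for example, SpatialAttention(256, 56, 56) maps a 1 x 256 x 56 x 56 tensor to a tensor of the same shape.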
For training of the type identification model, data acquisition is performed first: 10000 tool pictures covering 100 tool categories and 2000 other non-tool pictures are collected from tool cabinets at different moments, under different lighting and from different camera angles. The data are then preprocessed: the input pictures are scaled to a specified size and normalized. The model is trained; during training, visual feature heat maps are used, and comparing the heat-map changes with and without the spatial attention module proposed in this patent clearly shows that, after the proposed spatial attention module is inserted, the model focuses on the main feature positions of the tool, so the final recognition accuracy is significantly improved. Finally, the trained model is deployed in a tool cabinet to identify and count returned tools.
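The following sketch illustrates the preprocessing and training loop just described. The 224×224 target size, the ImageNet normalization statistics, the optimizer settings and the ImageFolder directory layout (one folder per tool category) are illustrative assumptions rather than values disclosed by this application; build_recognition_model is the backbone sketch given earlier.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Scale the input pictures to a specified size and normalize them.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def train(data_dir: str, num_classes: int, epochs: int = 30) -> nn.Module:
    dataset = datasets.ImageFolder(data_dir, transform=preprocess)
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    model = build_recognition_model(num_classes)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model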
According to the tool image recognition method described above, a type recognition model is used to recognize the type of the image to be identified. The type recognition model inserts a spatial attention module after each convolution layer and residual block; attention over the width and height of the image is acquired and accurate position information is encoded; the feature maps in the width and height directions are determined and converted into one-dimensional vectors; after fully-connected processing the dimensions are recovered; the preliminary three-dimensional spatial attention weight and then the final spatial attention coding weight are determined; and the identification result is determined by dot-multiplying the final weight with the feature map. This solves the problem that other spatial attention mechanisms do not learn the spatial information of each dimension tightly enough, and improves the recognition speed and accuracy for tools of many types and models.
Fig. 7 is a schematic block diagram of a tool image recognition apparatus 300 according to an embodiment of the present invention. As shown in fig. 7, the present invention also provides a tool image recognition apparatus 300 corresponding to the above tool image recognition method. The tool image recognition apparatus 300 includes a unit for performing the above-described tool image recognition method, and may be configured in a server. Specifically, referring to fig. 7, the tool image recognition apparatus 300 includes an image acquisition unit 301 and a recognition unit 302.
An image acquisition unit 301 for acquiring an image to be identified; the identifying unit 302 is configured to input the image to be identified into a type identifying model for type identification, so as to obtain an identifying result; the type recognition model is obtained by training a convolutional neural classification network through a plurality of cutter images with type labels as sample sets, wherein the type recognition model comprises a convolutional layer, a pooling layer, a full-connection layer and a residual block, and a spatial attention module is inserted behind each convolutional layer and each residual block.
In an embodiment, as shown in fig. 8, the identifying unit 302 includes a feature layer acquiring subunit 3021, an encoding subunit 3022, a pooling subunit 3023, an operating subunit 3024, a fully-connected subunit 3025, a recovery subunit 3026, a dot product subunit 3027, an information fusion subunit 3028, and a content fusion subunit 3029.
A feature layer acquiring subunit 3021, configured to input the image to be identified into a type identification model, and obtain a feature map of the image to be identified by using a convolution layer in the type identification model; an encoding subunit 3022, configured to acquire, by using a spatial attention module, the attention of the image to be identified in width and height, and encode the accurate position information of the image to be identified; a pooling subunit 3023, configured to split the feature map into the width direction and the height direction and perform global average pooling in each direction, so as to obtain a feature map in the width direction and a feature map in the height direction; an operating subunit 3024, configured to perform a flattening operation on the feature map in the width direction and the feature map in the height direction, so as to obtain two one-dimensional vectors; a fully-connected subunit 3025, configured to input the two one-dimensional vectors into the fully-connected layer to obtain two output results; a recovery subunit 3026, configured to recover the dimensions of the two output results, and to copy and expand each so as to obtain two feature layers of identical C×H×W shape, where W is the wide coordinate dimension, H is the high coordinate dimension, and C is the number of channels; a dot product subunit 3027, configured to perform a dot product on the two C×H×W feature layers to obtain a preliminary three-dimensional spatial attention weight; an information fusion subunit 3028, configured to perform spatial information fusion on the preliminary three-dimensional spatial attention weight, so as to obtain a final spatial attention coding weight; and a content fusion subunit 3029, configured to fuse the final spatial attention coding weight with the feature map of the image to be identified, so as to obtain an identification result.
In one embodiment, the fully-connected subunit 3025 is configured to input two one-dimensional vectors into the bottleneck structure of the fully-connected layer, so as to obtain two output results.
In one embodiment, the information fusion subunit 3028 is configured to encode the preliminary three-dimensional spatial attention weight using a 3×3 convolution and to activate it using Sigmoid, so as to obtain the final spatial attention coding weight.
In an embodiment, the content fusion subunit 3029 is configured to dot-multiply the final spatial attention coding weight with the feature map of the image to be identified to obtain the identification result.
It should be noted that, as will be clearly understood by those skilled in the art, the specific implementation process of the tool image recognition device 300 and each unit may refer to the corresponding description in the foregoing method embodiments, and for convenience and brevity of description, the description is omitted here.
The tool image recognition device 300 described above may be implemented in the form of a computer program that is executable on a computer apparatus as shown in fig. 9.
Referring to fig. 9, fig. 9 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 may be a server, where the server may be a stand-alone server or may be a server cluster formed by a plurality of servers.
With reference to FIG. 9, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032 includes program instructions that, when executed, cause the processor 502 to perform a tool image recognition method.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides an environment for the execution of a computer program 5032 in the non-volatile storage medium 503, which computer program 5032, when executed by the processor 502, causes the processor 502 to perform a tool image recognition method.
The network interface 505 is used for network communication with other devices. It will be appreciated by those skilled in the art that the structure shown in fig. 9 is merely a block diagram of a portion of the structure associated with the present application and does not constitute a limitation of the computer device 500 to which the present application is applied, and that a particular computer device 500 may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
Wherein the processor 502 is configured to execute a computer program 5032 stored in a memory to implement the steps of:
acquiring an image to be identified; inputting the image to be identified into a type identification model for type identification to obtain an identification result; the type recognition model is obtained by training a convolutional neural classification network with a plurality of tool images carrying type labels as the sample set, wherein the type recognition model comprises convolution layers, pooling layers, a fully-connected layer and residual blocks, and a spatial attention module is inserted after each convolution layer and each residual block.
In an embodiment, when the step of inputting the image to be identified into the type identification model to perform type identification to obtain the identification result is implemented by the processor 502, the following steps are specifically implemented:
inputting the image to be identified into a type identification model, and acquiring a feature map of the image to be identified by using a convolution layer in the type identification model; acquiring, by using a spatial attention module, the attention of the image to be identified in width and height, and encoding the accurate position information of the image to be identified; splitting the feature map into the width direction and the height direction, and performing global average pooling in each direction to obtain a feature map in the width direction and a feature map in the height direction; performing a flattening operation on the feature map in the width direction and the feature map in the height direction to obtain two one-dimensional vectors; inputting the two one-dimensional vectors into the fully-connected layer to obtain two output results; recovering the dimensions of the two output results, and copying and expanding each to obtain two feature layers of identical C×H×W shape, wherein W is the wide coordinate dimension, H is the high coordinate dimension, and C is the number of channels; performing a dot product on the two C×H×W feature layers to obtain a preliminary three-dimensional spatial attention weight; performing spatial information fusion on the preliminary three-dimensional spatial attention weight to obtain a final spatial attention coding weight; and fusing the final spatial attention coding weight with the feature map of the image to be identified to obtain an identification result.
In one embodiment, when implementing the step of inputting the two one-dimensional vectors into the fully-connected layer to obtain the two output results, the processor 502 specifically implements the following step:
inputting the two one-dimensional vectors into the bottleneck structure of the fully-connected layer to obtain two output results.
The two output results are S_h = σ(W_2 · ReLU(W_1 · Z_h)) and S_w = σ(W_3 · ReLU(W_4 · Z_w)), wherein W_1 is the weight of the first fully-connected layer in the H dimension direction; W_2 is the weight of the second fully-connected layer in the H dimension direction; W_4 is the weight of the first fully-connected layer in the W dimension direction; W_3 is the weight of the second fully-connected layer in the W dimension direction; Z_h and Z_w are respectively the two one-dimensional vectors; and ReLU and σ are deep-learning activation functions. The first fully-connected layer reduces the dimension by the coefficient r, which is a hyperparameter, and its output is activated with ReLU; the last fully-connected layer restores the dimension so that it can be recovered to C×H×W, and performs σ activation on the feature to be learned. W is the wide coordinate dimension, H is the high coordinate dimension, and C is the number of channels.
In an embodiment, when the step of fusing the preliminary three-dimensional spatial attention weights to obtain the final spatial attention coding weights is implemented by the processor 502, the following steps are specifically implemented:
the preliminary three-dimensional spatial attention weights are encoded using a 3 x 3 convolution and activated using Sigmoid to obtain the final spatial attention encoding weights.
In an embodiment, when the step of fusing the final spatial attention coding weight with the feature map of the image to be identified to obtain the identification result is implemented by the processor 502, the following steps are specifically implemented:
and carrying out dot multiplication on the final spatial attention coding weight and the feature map of the image to be identified to obtain an identification result.
It should be appreciated that in embodiments of the present application, the processor 502 may be a central processing unit (Central Processing Unit, CPU), the processor 502 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSPs), application specific integrated circuits (Application Specific Integrated Circuit, ASICs), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. Wherein the general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Those skilled in the art will appreciate that all or part of the flow in a method embodying the above described embodiments may be accomplished by computer programs instructing the relevant hardware. The computer program comprises program instructions, and the computer program can be stored in a storage medium, which is a computer readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.
Accordingly, the present invention also provides a storage medium. The storage medium may be a computer readable storage medium. The storage medium stores a computer program which, when executed by a processor, causes the processor to perform the steps of:
acquiring an image to be identified; inputting the image to be identified into a type identification model for type identification to obtain an identification result; the type recognition model is obtained by training a convolutional neural classification network with a plurality of tool images carrying type labels as the sample set, wherein the type recognition model comprises convolution layers, pooling layers, a fully-connected layer and residual blocks, and a spatial attention module is inserted after each convolution layer and each residual block.
In one embodiment, when the processor executes the computer program to implement the step of inputting the image to be identified into a type identification model for type identification to obtain an identification result, the processor specifically implements the following steps:
inputting the image to be identified into a type identification model, and acquiring a feature map of the image to be identified by using a convolution layer in the type identification model; acquiring, by using a spatial attention module, the attention of the image to be identified in width and height, and encoding the accurate position information of the image to be identified; splitting the feature map into the width direction and the height direction, and performing global average pooling in each direction to obtain a feature map in the width direction and a feature map in the height direction; performing a flattening operation on the feature map in the width direction and the feature map in the height direction to obtain two one-dimensional vectors; inputting the two one-dimensional vectors into the fully-connected layer to obtain two output results; recovering the dimensions of the two output results, and copying and expanding each to obtain two feature layers of identical C×H×W shape, wherein W is the wide coordinate dimension, H is the high coordinate dimension, and C is the number of channels; performing a dot product on the two C×H×W feature layers to obtain a preliminary three-dimensional spatial attention weight; performing spatial information fusion on the preliminary three-dimensional spatial attention weight to obtain a final spatial attention coding weight; and fusing the final spatial attention coding weight with the feature map of the image to be identified to obtain an identification result.
In one embodiment, when the processor executes the computer program to implement the step of inputting the two one-dimensional vectors into the fully-connected layer to obtain two output results, the following step is specifically implemented:
inputting the two one-dimensional vectors into the bottleneck structure of the fully-connected layer to obtain two output results.
The two output results are S_h = σ(W_2 · ReLU(W_1 · Z_h)) and S_w = σ(W_3 · ReLU(W_4 · Z_w)), wherein W_1 is the weight of the first fully-connected layer in the H dimension direction; W_2 is the weight of the second fully-connected layer in the H dimension direction; W_4 is the weight of the first fully-connected layer in the W dimension direction; W_3 is the weight of the second fully-connected layer in the W dimension direction; Z_h and Z_w are respectively the two one-dimensional vectors; and ReLU and σ are deep-learning activation functions. The first fully-connected layer reduces the dimension by the coefficient r, which is a hyperparameter, and its output is activated with ReLU; the last fully-connected layer restores the dimension so that it can be recovered to C×H×W, and performs σ activation on the feature to be learned. W is the wide coordinate dimension, H is the high coordinate dimension, and C is the number of channels.
In one embodiment, when the processor executes the computer program to perform the step of fusing the preliminary three-dimensional spatial attention weights to obtain the final spatial attention coding weights, the processor specifically performs the following steps:
the preliminary three-dimensional spatial attention weights are encoded using a 3 x 3 convolution and activated using Sigmoid to obtain the final spatial attention encoding weights.
In an embodiment, when the processor executes the computer program to realize the step of fusing the final spatial attention coding weight with the feature map of the image to be identified to obtain the identification result, the following steps are specifically implemented:
and carrying out dot multiplication on the final spatial attention coding weight and the feature map of the image to be identified to obtain an identification result.
The storage medium may be a U-disk, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk, or other various computer-readable storage media that can store program codes.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the device embodiments described above are merely illustrative. For example, the division of each unit is only one logic function division, and there may be another division manner in actual implementation. For example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed.
The steps in the method of the embodiment of the invention can be sequentially adjusted, combined and deleted according to actual needs. The units in the device of the embodiment of the invention can be combined, divided and deleted according to actual needs. In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The integrated unit may be stored in a storage medium if implemented in the form of a software functional unit and sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention is essentially or a part contributing to the prior art, or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a terminal, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (10)

1. A method of identifying a tool image, comprising:
acquiring an image to be identified;
inputting the image to be identified into a type identification model for type identification to obtain an identification result;
the type recognition model is obtained by training a convolutional neural classification network with a plurality of tool images carrying type labels as the sample set, wherein the type recognition model comprises convolution layers, pooling layers, a fully-connected layer and residual blocks, and a spatial attention module is inserted after each convolution layer and each residual block.
2. The method for recognizing a tool image according to claim 1, wherein the inputting the image to be recognized into a type recognition model for type recognition to obtain a recognition result includes:
inputting the image to be identified into a type identification model, and acquiring a feature map of the image to be identified by using a convolution layer in the type identification model;
acquiring, by using a spatial attention module, the attention of the image to be identified in width and height, and encoding the accurate position information of the image to be identified;
splitting the feature map into the width direction and the height direction, and performing global average pooling in each direction to obtain a feature map in the width direction and a feature map in the height direction;
performing a flattening operation on the feature map in the width direction and the feature map in the height direction to obtain two one-dimensional vectors;
inputting the two one-dimensional vectors into the fully-connected layer to obtain two output results;
recovering the dimensions of the two output results, and copying and expanding each to obtain two feature layers of identical C×H×W shape, wherein W is the wide coordinate dimension, H is the high coordinate dimension, and C is the number of channels;
performing a dot product on the two C×H×W feature layers to obtain a preliminary three-dimensional spatial attention weight;
performing spatial information fusion on the preliminary three-dimensional spatial attention weight to obtain a final spatial attention coding weight;
and fusing the final spatial attention coding weight with the feature map of the image to be identified to obtain an identification result.
3. The method of claim 2, wherein inputting the two one-dimensional vectors into the fully-connected layer to obtain the two output results comprises:
inputting the two one-dimensional vectors into a bottleneck structure of the fully-connected layer to obtain two output results.
4. The tool image recognition method according to claim 3, wherein the two output results are S_h = σ(W_2 · ReLU(W_1 · Z_h)) and S_w = σ(W_3 · ReLU(W_4 · Z_w)), wherein W_1 is the weight of the first fully-connected layer in the H dimension direction; W_2 is the weight of the second fully-connected layer in the H dimension direction; W_4 is the weight of the first fully-connected layer in the W dimension direction; W_3 is the weight of the second fully-connected layer in the W dimension direction; Z_h and Z_w are respectively the two one-dimensional vectors; and ReLU and σ are deep-learning activation functions. The first fully-connected layer reduces the dimension by the coefficient r, which is a hyperparameter, and its output is activated with ReLU; the last fully-connected layer restores the dimension so that it can be recovered to C×H×W, and performs σ activation on the feature to be learned. W is the wide coordinate dimension, H is the high coordinate dimension, and C is the number of channels.
5. The method of claim 2, wherein the fusing the preliminary three-dimensional spatial attention weights with spatial information to obtain final spatial attention coding weights comprises:
the preliminary three-dimensional spatial attention weights are encoded using a 3 x 3 convolution and activated using Sigmoid to obtain the final spatial attention encoding weights.
6. The method according to claim 2, wherein fusing the final spatial attention coding weight with the feature map of the image to be identified to obtain the recognition result comprises:
performing dot multiplication of the final spatial attention coding weight with the feature map of the image to be identified to obtain the recognition result.
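A quick shape check for the sketch above (sizes are arbitrary): the module returns a re-weighted feature map of the same size as its input, which is what claim 6's dot multiplication requires.

```python
# Shape check for the SpatialAttention sketch above; sizes are arbitrary.
import torch

x = torch.randn(2, 64, 32, 32)                    # (B, C, H, W)
attn = SpatialAttention(channels=64, height=32, width=32, r=16)
y = attn(x)
assert y.shape == x.shape                         # re-weighted, same size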
7. A tool image recognition apparatus, comprising:
the image acquisition unit is used for acquiring an image to be identified;
the recognition unit is used for inputting the image to be identified into a type recognition model for type recognition to obtain a recognition result;
the type recognition model is obtained by training a convolutional neural classification network on a sample set of a plurality of tool images carrying type labels, wherein the type recognition model comprises a convolution layer, a pooling layer, a full-connection layer and a residual block, and a spatial attention module is inserted behind each convolution layer and each residual block (a placement sketch follows this claim).
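A minimal sketch of how claim 7's placement rule might look in code, reusing the SpatialAttention class above: a ResNet-style residual block followed by the attention module. The block layout and all names are my assumptions; the claim only fixes where the module is inserted.

```python
# Sketch of claim 7's placement rule: spatial attention behind a residual
# block. The residual block layout itself is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentiveResidualBlock(nn.Module):
    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        # Spatial attention module inserted behind the residual block.
        self.attn = SpatialAttention(channels, height, width)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = F.relu(out + x)      # residual connection
        return self.attn(out)     # attention behind the block (claim 7)
```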
8. The tool image recognition apparatus according to claim 7, wherein the recognition unit comprises:
the feature-layer acquisition subunit, configured to input the image to be identified into the type recognition model and acquire a feature map of the image to be identified by using a convolution layer in the type recognition model;
the coding subunit, configured to obtain the attention of the image to be identified in the width and height directions by using the spatial attention module, and to encode the precise position information of the image to be identified;
the pooling subunit, configured to split the feature map along the width and height directions and perform global average pooling in each direction to obtain a feature map in the width direction and a feature map in the height direction;
the operation subunit, configured to perform a flattening operation on the feature map in the width direction and the feature map in the height direction to obtain two one-dimensional vectors;
the full-connection subunit, configured to input the two one-dimensional vectors into the full-connection layer to obtain two output results;
the recovery subunit, configured to perform dimension recovery on the two output results and to copy and expand each to obtain two feature layers of identical size C×H×W, wherein W is the width coordinate dimension, H is the height coordinate dimension, and C is the number of channels;
the dot product subunit, configured to perform an element-wise dot product on the two C×H×W feature layers to obtain a preliminary three-dimensional spatial attention weight;
the information fusion subunit, configured to perform spatial information fusion on the preliminary three-dimensional spatial attention weight to obtain a final spatial attention coding weight;
and the content fusion subunit, configured to fuse the final spatial attention coding weight with the feature map of the image to be identified to obtain a recognition result.
9. A computer device, characterized in that it comprises a memory on which a computer program is stored and a processor which, when executing the computer program, implements the method according to any of claims 1-6.
10. A storage medium storing a computer program which, when executed by a processor, performs the method of any one of claims 1 to 6.
CN202310438264.4A 2023-04-18 2023-04-18 Cutter image recognition method, cutter image recognition device, computer equipment and storage medium Pending CN116468947A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310438264.4A CN116468947A (en) 2023-04-18 2023-04-18 Cutter image recognition method, cutter image recognition device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310438264.4A CN116468947A (en) 2023-04-18 2023-04-18 Cutter image recognition method, cutter image recognition device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116468947A true CN116468947A (en) 2023-07-21

Family

ID=87173165

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310438264.4A Pending CN116468947A (en) 2023-04-18 2023-04-18 Cutter image recognition method, cutter image recognition device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116468947A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116475815A (en) * 2023-05-17 2023-07-25 广州里工实业有限公司 Automatic tool changing method, system and device of numerical control machine tool and storage medium

Similar Documents

Publication Publication Date Title
CN112541503A (en) Real-time semantic segmentation method based on context attention mechanism and information fusion
CN108734210B (en) Object detection method based on cross-modal multi-scale feature fusion
CN110689599A (en) 3D visual saliency prediction method for generating countermeasure network based on non-local enhancement
CN111160350A (en) Portrait segmentation method, model training method, device, medium and electronic equipment
US11645328B2 (en) 3D-aware image search
CN112950477A (en) High-resolution saliency target detection method based on dual-path processing
CN109670516B (en) Image feature extraction method, device, equipment and readable storage medium
CN116580257A (en) Feature fusion model training and sample retrieval method and device and computer equipment
CN111652273A (en) Deep learning-based RGB-D image classification method
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114037640A (en) Image generation method and device
CN112819893A (en) Method and device for constructing three-dimensional semantic map
CN116468947A (en) Cutter image recognition method, cutter image recognition device, computer equipment and storage medium
CN114742985A (en) Hyperspectral feature extraction method and device and storage medium
CN113554084A (en) Vehicle re-identification model compression method and system based on pruning and light-weight convolution
CN113468946A (en) Semantically consistent enhanced training data for traffic light detection
CN112818955A (en) Image segmentation method and device, computer equipment and storage medium
CN114387512A (en) Remote sensing image building extraction method based on multi-scale feature fusion and enhancement
CN111209940A (en) Image duplicate removal method and device based on feature point matching
CN113870286A (en) Foreground segmentation method based on multi-level feature and mask fusion
CN113496472A (en) Image defogging model construction method, road image defogging device and vehicle
CN112733777A (en) Road extraction method, device, equipment and storage medium for remote sensing image
CN115546236B (en) Image segmentation method and device based on wavelet transformation
CN110390336B (en) Method for improving feature point matching precision
EP3588441B1 (en) Imagification of multivariate data sequences

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination