Disclosure of Invention
An object of the embodiments of the present application is to provide a method, an apparatus, an electronic device, and a storage medium for classifying and identifying a commodity, which can improve the accuracy of commodity classification and identification.
In a first aspect, an embodiment of the present application provides a method for classifying and identifying a commodity, including:
extracting first feature information of each area image of a commodity distribution image by adopting a first convolutional neural network, wherein the commodity distribution image comprises a plurality of mutually disjoint area images, and each area image corresponds to a commodity;
generating a corresponding attention area map according to the area image and the corresponding first characteristic information;
extracting second characteristic information of the attention area map by adopting a second convolutional neural network;
pooling the first characteristic information and the corresponding second characteristic information to obtain a bilinear vector;
and obtaining the probability distribution of the classification of the commodity corresponding to the region image according to the bilinear vector, and obtaining a classification result based on the probability distribution.
According to the embodiment of the application, features are extracted from the attention area map, which contains richer detail and texture information, and are combined with the feature information of the region image, so that fine feature differences between commodities are identified by means of the bilinear vector, and the accuracy of commodity classification and identification can be improved.
Optionally, in the method for classifying and identifying a product according to the embodiment of the present application, the generating a corresponding attention area map according to the area image and the corresponding first feature information includes:
inputting the first characteristic information into a preset attention area extraction model to acquire position information of an attention area;
cutting the corresponding region image according to the position information of the attention region to obtain an initial attention region image;
and performing up-sampling processing on the initial attention area map to acquire an attention area map with the same resolution as that of a corresponding area image.
Optionally, in the method for classifying and identifying a commodity according to the embodiment of the present application, the region image is rectangular, and the attention region is square;
the cutting the corresponding region image according to the position information of the attention region to obtain an initial attention region map includes:
generating a mask M, wherein the size and the shape of the mask M are the same as those of the area image, and coordinate points of the mask M correspond to coordinate points of the area image one to one respectively;
acquiring a constraint calculation formula of the coordinate point of the mask M and the position information of the attention area;
and cutting out the part of the area image, which is positioned in the attention area, according to the constraint calculation formula to obtain an initial attention area map.
Optionally, in the method for classifying and identifying a commodity according to the embodiment of the present application, the preset attention area extraction model includes at least two convolution layers connected in sequence.
Optionally, in the method for classifying and identifying a commodity according to the embodiment of the present application, the obtaining a probability distribution of the classification of the commodity corresponding to the region image according to the bilinear vector, and obtaining a classification result based on the probability distribution, includes:
sequentially performing a square root operation and an L2 normalization operation on the bilinear vector to obtain a target vector;
inputting the target vector into a softmax function to obtain the probability distribution of the classification of the commodity corresponding to the area image;
and obtaining the classification result of the commodity corresponding to the area image according to the probability distribution.
Optionally, in the method for classifying and identifying a commodity according to the embodiment of the present application, before the extracting, by using a first convolutional neural network, first feature information of each area image of a commodity distribution image, the method further includes:
acquiring a commodity distribution image, and generating a plurality of calibration frames in the commodity distribution image, wherein each calibration frame comprises a commodity, and the calibration frames are not intersected with each other;
and extracting the image of the area surrounded by each calibration frame to obtain a corresponding area image.
In a second aspect, an embodiment of the present application further provides a device for classifying and identifying a commodity, including:
the first extraction module is used for extracting first feature information of each area image of a commodity distribution image by adopting a first convolutional neural network, wherein the commodity distribution image comprises a plurality of mutually disjoint area images, and each area image corresponds to a commodity;
the first generation module is used for generating a corresponding attention area map according to the area image and corresponding first feature information;
the second extraction module is used for extracting second characteristic information of the attention area map by adopting a second convolutional neural network;
the pooling module is used for pooling the first characteristic information and the second characteristic information corresponding to the first characteristic information to obtain a bilinear vector;
and the identification module is used for acquiring the probability distribution of the classification of the commodity corresponding to the region image according to the bilinear vector and acquiring a classification result based on the probability distribution.
Optionally, in the commodity classification and identification device according to an embodiment of the present application, the first generating module includes:
a first obtaining unit, configured to input the first feature information into a preset attention area extraction model to obtain location information of an attention area;
the second acquisition unit is used for cutting the corresponding region image according to the position information of the attention region so as to acquire an initial attention region map;
and an amplifying unit, configured to perform up-sampling processing on the initial attention area map to acquire an attention area map having the same resolution as that of the corresponding area image.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory stores computer-readable instructions, and when the computer-readable instructions are executed by the processor, the steps in the method as provided in the first aspect are executed.
In a fourth aspect, embodiments of the present application provide a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps in the method as provided in the first aspect.
As can be seen from the above, in the embodiment of the application, the first feature information of each area image of a commodity distribution image is extracted by using the first convolutional neural network, where the commodity distribution image includes a plurality of mutually disjoint area images and each area image corresponds to a commodity; a corresponding attention area map is generated according to the area image and the corresponding first feature information; second feature information of the attention area map is extracted by using the second convolutional neural network; the first feature information and the second feature information are pooled to obtain a bilinear vector; and the probability distribution of the classification of the commodity corresponding to the region image is obtained according to the bilinear vector, and a classification result is obtained based on the probability distribution, thereby realizing the classification and identification of commodities. Because features are extracted from the attention area map, which contains richer detail and texture information, and are combined with the feature information of the region image, fine feature differences between commodities can be identified by means of the bilinear vector, and the accuracy of commodity classification and identification can be improved.
The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Referring to fig. 1, fig. 1 is a flowchart illustrating a product classification and identification method according to some embodiments of the present disclosure. The commodity classification and identification method is used for identifying the classification of commodities in the open type intelligent retail container so as to facilitate automatic generation of orders or automatic settlement in an automatic selling process. The commodity classification and identification method can be applied to a remote server in communication connection with the open type intelligent retail container and can also be applied to a main control module arranged in the open type intelligent retail container. The commodity classification and identification method comprises the following steps:
S101, extracting first feature information of each area image of a commodity distribution image by using a first convolutional neural network, wherein the commodity distribution image comprises a plurality of mutually disjoint area images, and each area image corresponds to a commodity.
And S102, generating a corresponding attention area map according to the area image and the corresponding first characteristic information.
S103, extracting second characteristic information of the attention area map by adopting a second convolutional neural network.
And S104, performing pooling processing on the first characteristic information and the corresponding second characteristic information to obtain a bilinear vector.
And S105, obtaining the probability distribution of the classification of the commodity corresponding to the region image according to the bilinear vector, and obtaining a classification result based on the probability distribution.
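As a rough end-to-end sketch of steps S101 to S105 (the two convolutional neural networks are replaced here by stub feature extractors, and the linear classifier weights W are a hypothetical final layer, since the patent does not fix either architecture):

```python
import numpy as np

# Hypothetical sketch of the S101-S105 pipeline; the two CNNs are
# stand-ins (simple pooled channel statistics), not real networks.

def extract_first_features(region):          # S101: first CNN, stubbed
    return region.mean(axis=(0, 1))          # one feature per channel

def extract_second_features(attention_map):  # S103: second CNN, stubbed
    return attention_map.std(axis=(0, 1))

def classify_region(region, attention_map, W):
    f_a = extract_first_features(region)     # first feature information
    f_b = extract_second_features(attention_map)  # second feature information
    v = np.outer(f_a, f_b).ravel()           # S104: bilinear vector
    logits = W @ v                           # hypothetical linear classifier
    e = np.exp(logits - logits.max())
    return e / e.sum()                       # S105: probability distribution
```

The stubs only illustrate the data flow: each region image yields two feature vectors whose outer product forms the bilinear vector fed to the classifier.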
In step S101, the commodity distribution image of the commodities in the open intelligent retail container is captured by a fisheye camera inside the container. Each region image is rectangular or square, corresponds to the image of one commodity, and all region images are the same size. To facilitate photographing and access, the commodities in the container are arranged at intervals in a rectangular array.
A preset first convolutional neural network A may be adopted to perform feature extraction on each region image X to obtain the corresponding feature f_A(X). The first convolutional neural network A is a convolutional neural network commonly used in the prior art for extracting picture feature information.
It is understood that, in some embodiments, before step S101, the commodity classification and identification method further includes the following steps: S1011, acquiring a commodity distribution image and generating a plurality of calibration frames in the commodity distribution image, wherein each calibration frame contains one commodity and the calibration frames do not intersect with each other; and S1012, extracting the image of the area enclosed by each calibration frame to obtain a corresponding area image. The calibration frames are rectangular, and any two adjacent calibration frames do not intersect. The image enclosed by each calibration frame is then extracted from the commodity distribution image, with the calibration frame as the boundary, thereby obtaining a plurality of area images.
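A minimal sketch of steps S1011 and S1012 (the (x0, y0, x1, y1) box representation and the helper name are illustrative choices, not from the patent):

```python
import numpy as np

def crop_region_images(distribution_image, calibration_frames):
    """S1011-S1012 sketch: each calibration frame is an (x0, y0, x1, y1)
    rectangle; frames are assumed non-intersecting, one commodity each."""
    return [distribution_image[y0:y1, x0:x1]
            for (x0, y0, x1, y1) in calibration_frames]
```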
In step S102, the attention area map is an image of the area of the region image in which the extracted first feature information is concentrated. Both the attention area map and the attention area are square. In some embodiments, step S102 includes the following sub-steps: S1021, inputting the first feature information into a preset attention area extraction model to obtain position information of an attention area; S1022, cropping the corresponding area image according to the position information of the attention area to acquire an initial attention area map; and S1023, performing up-sampling processing on the initial attention area map to acquire an attention area map with the same size and resolution as the corresponding area image.
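Steps S1022 and S1023 can be sketched as a crop followed by an upsample back to the region's resolution (nearest-neighbour interpolation is an arbitrary choice here; the patent does not fix the up-sampling method):

```python
import numpy as np

def crop_and_upsample(region, tx, ty, tl):
    """S1022-S1023 sketch: crop the square attention area centred at
    (tx, ty) with half side length tl, then upsample the crop back to
    the region's original resolution (nearest-neighbour)."""
    h, w = region.shape[:2]
    y0, y1 = max(ty - tl, 0), min(ty + tl, h)   # clamp to image bounds
    x0, x1 = max(tx - tl, 0), min(tx + tl, w)
    patch = region[y0:y1, x0:x1]                # initial attention area map
    ys = np.arange(h) * patch.shape[0] // h     # map output rows to patch rows
    xs = np.arange(w) * patch.shape[1] // w     # map output cols to patch cols
    return patch[np.ix_(ys, xs)]                # same resolution as region
```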
In step S1021, the preset attention area extraction model is composed of at least two sequentially connected convolutional layers. The position information of the attention area is {tx, ty, tl}, where (tx, ty) is the center coordinate of the attention area and tl is half the side length of the attention area; the attention area defaults to a square.
In step S1022, the cropped attention area image is rectangular. In a specific operation, a rectangular area containing all of the extracted first feature information is cropped from the corresponding area image according to the position information. After cropping, the amount of first feature information per unit area of the attention area map is therefore higher, which facilitates extracting and identifying feature information such as details and textures from the attention area map.
Specifically, in some embodiments, step S1022 includes: generating a mask M, wherein the size and shape of the mask M are the same as those of the area image, and the coordinate points of the mask M correspond one to one to the coordinate points of the area image; acquiring a constraint formula relating the coordinate points of the mask M to the position information of the attention area; and cropping out the portion of the area image located in the attention area according to the constraint formula to obtain an initial attention area map. The constraint formula is M(x, y) = [h(x - tx1) - h(x - tx2)] * [h(y - ty1) - h(y - ty2)], where h(v) = 1/(1 + e^(-kv)), (tx1, ty1) is the top-left corner coordinate of the attention area obtained from the position information {tx, ty, tl}, and (tx2, ty2) is the bottom-right corner coordinate obtained likewise. By setting the value of the empirical constant k appropriately, it can be ensured that M(x, y) is approximately 1 only when the coordinate point is inside the attention area, and approximately 0 when it is outside; M(x, y) is thus used to crop the initial attention area map from the corresponding area image.
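The constraint formula can be evaluated numerically as follows (k = 5 is an arbitrary value for the empirical constant; the patent only requires k large enough that M is close to 1 inside the attention area and close to 0 outside):

```python
import numpy as np

def attention_mask(height, width, tx, ty, tl, k=5.0):
    """M(x, y) = [h(x - tx1) - h(x - tx2)] * [h(y - ty1) - h(y - ty2)],
    where h(v) = 1 / (1 + e^(-k*v)), tx1 = tx - tl, tx2 = tx + tl,
    ty1 = ty - tl, ty2 = ty + tl."""
    h = lambda v: 1.0 / (1.0 + np.exp(-k * v))  # steep logistic function
    x = np.arange(width)
    y = np.arange(height)
    mx = h(x - (tx - tl)) - h(x - (tx + tl))    # ~1 for tx1 < x < tx2
    my = h(y - (ty - tl)) - h(y - (ty + tl))    # ~1 for ty1 < y < ty2
    return np.outer(my, mx)                     # indexed as M[y, x]

M = attention_mask(20, 20, tx=10, ty=10, tl=4)
```

Because the mask is a product of two smooth logistic differences, it is differentiable in tx, ty, and tl, which is what allows the attention area extraction model to be trained end to end.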
In step S103, a preset second convolutional neural network B may be used to perform feature extraction on each attention area map Y to obtain the corresponding feature f_B(Y). The second convolutional neural network B is a convolutional neural network commonly used in the prior art for extracting picture feature information. Because the attention area map is cropped from the area image and set to the same size and resolution as the area image, the detail and texture information of the commodity can be conveniently extracted from the attention area map, which improves the accuracy of subsequent classification and identification.
In step S104, the first feature information f_A(X) and the second feature information f_B(Y) are first merged, and a preset pooling layer is then adopted to perform a pooling operation on the merged feature information to obtain a bilinear feature vector V.
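One common way to realise such a pooling is the outer product of the two feature maps averaged over spatial positions (a standard bilinear-pooling formulation; the patent's exact merging and pooling layer may differ):

```python
import numpy as np

def bilinear_pool(f_a, f_b):
    """f_a: (C1, N) and f_b: (C2, N) feature maps flattened over N spatial
    positions; returns the (C1*C2,) bilinear feature vector V as the
    average of the per-position outer products."""
    n = f_a.shape[1]
    return (f_a @ f_b.T / n).ravel()
```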
In step S105, the bilinear vector V may be preprocessed and then input into a preset probability distribution prediction function to obtain the probability distribution of the classification of the commodity corresponding to the region image.
Specifically, in some embodiments, step S105 includes: S1051, sequentially performing a square root operation and an L2 normalization operation on the bilinear vector to obtain a target vector; S1052, inputting the target vector into a softmax function to obtain the probability distribution of the classification of the commodity corresponding to the area image; and S1053, obtaining the classification result of the commodity corresponding to the area image according to the probability distribution. The probability distribution describes the probability that the commodity corresponding to the area image belongs to each category. For example, for a certain region image, the probability distribution finally obtained is P1, P2, P3, P4 and P5, where the commodity categories corresponding to P1, P2, P3, P4 and P5 are A1, A2, A3, A4 and A5, respectively. Since P2 is the largest of these, the commodity in that region image is classified as category A2.
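Steps S1051 to S1053 might be sketched as follows (the signed square root handles any negative entries of V, and the linear classifier weights W stand in for the layer producing class scores, which the patent does not specify):

```python
import numpy as np

def classify_from_bilinear(v, W):
    v = np.sign(v) * np.sqrt(np.abs(v))    # S1051: element-wise square root
    v = v / (np.linalg.norm(v) + 1e-12)    # S1051: L2 normalisation -> target vector
    logits = W @ v                         # hypothetical class-score layer
    e = np.exp(logits - logits.max())
    probs = e / e.sum()                    # S1052: softmax probability distribution
    return probs, int(np.argmax(probs))    # S1053: largest probability wins
```

The square-root and L2-normalisation steps are the usual normalisation applied to bilinear vectors before classification, which dampens large activations and puts the vector on the unit sphere.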
As can be seen from the above, in the embodiment of the application, the first feature information of each area image of a commodity distribution image is extracted by using the first convolutional neural network, where the commodity distribution image includes a plurality of mutually disjoint area images and each area image corresponds to a commodity; a corresponding attention area map is generated according to the area image and the corresponding first feature information; second feature information of the attention area map is extracted by using the second convolutional neural network; the first feature information and the second feature information are pooled to obtain a bilinear vector; and the probability distribution of the classification of the commodity corresponding to the region image is obtained according to the bilinear vector, and a classification result is obtained based on the probability distribution, thereby realizing the classification and identification of commodities. Because features are extracted from the attention area map, which contains richer detail and texture information, and are combined with the feature information of the region image, fine feature differences between commodities can be identified by means of the bilinear vector, and the accuracy of commodity classification and identification can be improved.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a product classification and identification device in some embodiments of the present application. This commodity classification recognition device includes: a first extraction module 201, a first generation module 202, a second extraction module 203, a pooling module 204, and an identification module 205.
The first extraction module 201 is configured to extract first feature information of each area image of a commodity distribution image by using a first convolutional neural network, where the commodity distribution image includes a plurality of mutually disjoint area images, and each area image corresponds to a commodity. The commodity distribution image is an image of the distribution of the commodities in the container captured by a fisheye camera inside the container. Each region image is rectangular or square, corresponds to the image of one commodity, and all region images are the same size. To facilitate photographing and access, the commodities in the container are arranged at intervals in a rectangular array.
A preset first convolutional neural network A may be adopted to perform feature extraction on each region image to obtain the corresponding feature f_A(X). The first convolutional neural network A is a convolutional neural network commonly used in the prior art for extracting picture feature information.
It is to be appreciated that, in some embodiments, the first extraction module 201 is further configured to: acquire a commodity distribution image and generate a plurality of calibration frames in the commodity distribution image, wherein each calibration frame contains one commodity and the calibration frames do not intersect with each other; and extract the image of the area enclosed by each calibration frame to obtain a corresponding area image. The calibration frames are rectangular, and any two adjacent calibration frames do not intersect. The image enclosed by each calibration frame is then extracted from the commodity distribution image, with the calibration frame as the boundary, thereby obtaining a plurality of area images.
The first generating module 202 is configured to generate a corresponding attention area map according to the area image and the corresponding first feature information. The attention area map is an image of the area of the region image in which the extracted feature information is concentrated. Both the attention area map and the attention area are square. Specifically, the first generating module 202 includes: a first obtaining unit, configured to input the first feature information into a preset attention area extraction model to obtain position information of an attention area; a second obtaining unit, configured to crop the corresponding region image according to the position information of the attention area to acquire an initial attention area map; and an amplifying unit, configured to perform up-sampling processing on the initial attention area map to acquire an attention area map with the same resolution as the corresponding area image.
The preset attention area extraction model is composed of at least two sequentially connected convolutional layers. The position information of the attention area is {tx, ty, tl}, where (tx, ty) is the center coordinate of the attention area and tl is half the side length of the attention area; the attention area defaults to a square. The cropped attention area image is rectangular; in a specific operation, a rectangular area containing all of the extracted first feature information is cropped from the corresponding area image according to the position information.
Specifically, in some embodiments, the second obtaining unit is configured to: generate a mask M, wherein the size and shape of the mask M are the same as those of the area image, and each coordinate point of the mask M corresponds one to one to a coordinate point of the area image; acquire a constraint formula relating the coordinate points of the mask M to the position information of the attention area; and crop out the portion of the area image located in the attention area according to the constraint formula to obtain an initial attention area map. The constraint formula is M(x, y) = [h(x - tx1) - h(x - tx2)] * [h(y - ty1) - h(y - ty2)], where h(v) = 1/(1 + e^(-kv)), (tx1, ty1) is the top-left corner coordinate of the attention area obtained from the position information {tx, ty, tl}, and (tx2, ty2) is the bottom-right corner coordinate obtained likewise. By setting the value of k appropriately, it can be ensured that M(x, y) is approximately 1 only when (x, y) is inside the attention area; M(x, y) is thus used to crop the initial attention area map from the corresponding area image.
The second extraction module 203 is configured to extract second feature information of the attention area map by using a second convolutional neural network. The second extraction module 203 may use a preset second convolutional neural network B to perform feature extraction on each attention area map Y to obtain the corresponding feature f_B(Y). The second convolutional neural network B is a convolutional neural network commonly used in the prior art for extracting picture feature information.
The pooling module 204 is configured to pool the first feature information and the second feature information to obtain a bilinear vector. The pooling module 204 first merges the first feature information f_A(X) and the second feature information f_B(Y), and then adopts a preset pooling layer to perform a pooling operation on the merged feature information to obtain a bilinear feature vector V.
The identification module 205 is configured to obtain the probability distribution of the classification of the commodity corresponding to the region image according to the bilinear vector, and to obtain a classification result based on the probability distribution. The bilinear vector may be preprocessed and then input into a preset probability distribution prediction function to obtain the probability distribution of the classification of the commodity corresponding to the region image. Specifically, the identification module 205 is configured to: sequentially perform a square root operation and an L2 normalization operation on the bilinear vector to obtain a target vector; input the target vector into a softmax function to obtain the probability distribution of the classification of the commodity corresponding to the area image; and obtain the classification result of the commodity corresponding to the area image according to the probability distribution. The probability distribution describes the probability that the commodity corresponding to the area image belongs to each category. For example, for a certain region image, the probability distribution finally obtained is P1, P2, P3, P4 and P5, where the categories corresponding to P1, P2, P3, P4 and P5 are A1, A2, A3, A4 and A5, respectively. Since P2 is the largest of these, the commodity in that region image is classified as category A2.
As can be seen from the above, in the embodiment of the application, the first feature information of each area image of a commodity distribution image is extracted by using the first convolutional neural network, where the commodity distribution image includes a plurality of mutually disjoint area images and each area image corresponds to a commodity; a corresponding attention area map is generated according to the area image and the corresponding first feature information; second feature information of the attention area map is extracted by using the second convolutional neural network; the first feature information and the second feature information are pooled to obtain a bilinear vector; and the probability distribution of the classification of the commodity corresponding to the region image is obtained according to the bilinear vector, and a classification result is obtained based on the probability distribution, thereby realizing the classification and identification of commodities. Because features are extracted from the attention area map, which contains more clearly identifiable detail and texture information, and are combined with the features of the region image, fine feature differences between commodities can be identified by means of the bilinear vector, and the accuracy of commodity classification and identification can be improved.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an electronic device 3 according to an embodiment of the present application, where the present application provides an electronic device 3, including: the processor 301 and the memory 302, the processor 301 and the memory 302 being interconnected and communicating with each other via a communication bus 303 and/or other form of connection mechanism (not shown), the memory 302 storing a computer program executable by the processor 301, the processor 301 executing the computer program when the computing device is running to perform the method in any of the alternative implementations of the embodiments described above.
An embodiment of the present application provides a storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the method in any optional implementation of the above embodiments.
The storage medium may be implemented by any type of volatile or nonvolatile storage device or combination thereof, such as a Static Random Access Memory (SRAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), an Erasable Programmable Read-Only Memory (EPROM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), a magnetic Memory, a flash Memory, a magnetic disk, or an optical disk.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.