CN114255377A - Differential commodity detection and classification method for intelligent container - Google Patents


Info

Publication number: CN114255377A
Authority: CN (China)
Prior art keywords: commodity, different, difference, images, commodities
Legal status (assumption, not a legal conclusion): Pending
Application number: CN202111476957.XA
Other languages: Chinese (zh)
Inventor
冯栋
刘治宇
刘浩
陈洪伟
Current Assignee (listed assignees may be inaccurate): Qingdao Turing Technology Co ltd
Original Assignee: Qingdao Turing Technology Co ltd
Application filed by Qingdao Turing Technology Co ltd filed Critical Qingdao Turing Technology Co ltd
Priority to CN202111476957.XA priority Critical patent/CN114255377A/en
Publication of CN114255377A publication Critical patent/CN114255377A/en

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 — Classification techniques
    • G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/25 — Fusion techniques
    • G06F18/253 — Fusion techniques of extracted features

Abstract

The invention provides a differential commodity detection and classification method for an intelligent container. The method comprises: acquiring two commodity images captured at different moments by a camera arranged above a shelf of the intelligent container; detecting the commodities in the two images with a pre-trained commodity difference detection model to obtain a differential commodity detection result, the result comprising the coordinates of each differential commodity detection box and the image to which each differential commodity belongs; and recognizing the detected differential commodities according to the detection result to obtain their category information. The scheme directly detects the image positions of the differential commodities on the scene images captured before and after a consumer's purchase, and then recognizes the commodities at those positions to determine their categories, thereby avoiding the high labeling, updating, and deployment costs of existing fully supervised object detection models.

Description

Differential commodity detection and classification method for intelligent container
Technical Field
The invention relates to the technical fields of computer vision and deep learning, and in particular to a differential commodity detection and classification method for intelligent containers.
Background
E-commerce grew very rapidly in the internet era, but after a period of high-speed development it has entered a bottleneck period: consumers' demands for convenience and timeliness keep rising, and traditional e-commerce struggles to meet them. Under the concept of "new retail", traditional e-commerce is trying to integrate with offline sales channels, and intelligent containers are an important development direction of new retail.
Intelligent container solutions fall into two categories, vision-based and non-vision-based. Non-vision solutions occupy most of the market thanks to their simple principle, convenient deployment, and high accuracy, but with the progress of deep neural networks in computer vision, vision solutions based on deep neural networks have become the research focus for intelligent containers.
Disclosure of Invention
The invention provides a differential commodity detection and classification method for intelligent containers. The method directly detects the positions of the differential commodities on the scene images captured before and after a consumer's purchase, and then obtains the categories of the differential commodities with an object recognition model, thereby enabling functions such as automatic commodity settlement and inventory checking.
The invention provides a differential commodity detection and classification method for an intelligent container, comprising the following steps:
acquiring two commodity images captured at different moments by a camera arranged above a shelf of the intelligent container, wherein the two images are captured by the camera from a top-down view;
detecting the commodities in the two images with a pre-trained commodity difference detection model, the detection process comprising feature extraction, feature fusion, and target regression, to obtain a differential commodity detection result; the detection result comprises the coordinates of each differential commodity detection box and the image to which each differential commodity belongs;
and recognizing the detected differential commodities according to the detection result to obtain their category information.
In an optional embodiment, the commodity difference detection model comprises two weight-sharing feature extractors whose outputs are connected to a feature fusion operator, and the output of the feature fusion operator is connected to a regression network;
correspondingly, detecting the commodities in the two images with the pre-trained commodity difference detection model through feature extraction, feature fusion, and target regression comprises:
extracting features from the two commodity images with the two weight-sharing feature extractors to obtain a first image feature and a second image feature;
computing the difference between the first and second image features with the feature fusion operator to obtain a fused image feature;
and processing the fused image feature with the regression network to obtain the differential commodity detection result.
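As a rough illustration of this structure, the sketch below is not the patent's implementation: a single linear projection stands in for a deep CNN backbone, but it exhibits the three claimed parts, namely one set of weights shared by both images, an element-wise difference as the fusion operator, and a fused feature that is exactly zero when nothing on the shelf changed.

```python
import numpy as np

def extract_features(image, weights):
    """Toy stand-in for a weight-sharing feature extractor: a single
    linear projection followed by ReLU.  A real implementation would
    be a deep CNN such as ResNet-18."""
    flat = image.reshape(-1)
    return np.maximum(weights @ flat, 0.0)

def fuse_by_difference(feat_a, feat_b):
    """Feature fusion operator: element-wise difference of the two
    feature vectors, so the fused feature encodes only what changed."""
    return feat_a - feat_b

rng = np.random.default_rng(0)
shared_w = rng.normal(size=(16, 8 * 8))  # ONE weight matrix, used for both images

img_before = rng.normal(size=(8, 8))
img_after = img_before.copy()
img_after[2:4, 2:4] += 5.0  # simulate a region where a product was added/removed

f1 = extract_features(img_before, shared_w)
f2 = extract_features(img_after, shared_w)
fused = fuse_by_difference(f1, f2)

# Identical inputs through the shared weights give a zero difference feature.
assert np.allclose(fuse_by_difference(f1, f1), 0.0)
print(fused.shape)  # -> (16,)
```

Because the weights are shared, both images land in the same feature space, which is exactly why a plain subtraction is a meaningful fusion operator here.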
Further, the two weight-sharing feature extractors adopt ResNet-18 with the final fully connected layer removed.
Further, the regression network contains a spatial attention module and a channel attention module.
In an optional embodiment, recognizing the detected differential commodities according to the detection result to obtain their category information comprises:
determining the positions of the differential commodities from the coordinates of the detection boxes and the images to which the differential commodities belong;
and recognizing the commodities at those positions with a pre-trained commodity recognition model to determine the category information of the differential commodities.
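A minimal sketch of this two-stage decoupling, with hypothetical stand-ins for both models (`crop_box` for extracting the detected region, `classify_crop` for the pre-trained recognition model; neither name comes from the patent):

```python
def crop_box(image, box):
    """Crop a detection box (x1, y1, x2, y2) out of a 2-D image grid."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]

def classify_crop(crop):
    """Hypothetical stand-in for the pre-trained commodity recognition
    model: it simply looks up the dominant pixel value in the crop."""
    labels = {0: "empty", 1: "cola", 2: "chips"}
    values = [v for row in crop for v in row]
    return labels[max(set(values), key=values.count)]

def detect_and_classify(image_pair, detections):
    """Stage 2 of the pipeline: each detection carries the box
    coordinates plus which image ('A' or 'B') the difference is in."""
    results = []
    for det in detections:
        src = image_pair[det["image"]]
        results.append(classify_crop(crop_box(src, det["box"])))
    return results

# Image A (before purchase) contains a "cola" region that is gone in B.
img_a = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
img_b = [[0] * 4 for _ in range(4)]

dets = [{"box": (1, 1, 3, 3), "image": "A"}]
print(detect_and_classify({"A": img_a, "B": img_b}, dets))  # -> ['cola']
```

Note that the detector never outputs a class, only a box and an image id; swapping in a new recognition model changes nothing upstream.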
In an optional embodiment, before acquiring the two commodity images captured by the camera above the intelligent container shelf, the method further comprises:
collecting multiple groups of differential commodity images, each group comprising two commodity images captured top-down from the same viewing angle at different moments;
annotating the differential commodities in the image groups with bounding boxes to obtain annotation box information for each group, and generating label information for each group from the annotation box information;
applying data augmentation to the image groups to obtain multiple groups of differential commodity training images;
and training the constructed commodity difference detection model with the training images and label information to obtain the pre-trained commodity difference detection model.
Further, generating the label information for each group from the annotation box information comprises:
dividing the training images into at least one grid cell according to a preset grid size;
and generating, from the annotation box information of each group, label information for every grid cell of that group; the label information comprises whether an annotation box centre falls in the cell, which of the two images the box belongs to, the horizontal and vertical coordinates of the box centre, and the width and height of the box.
Further, the data augmentation applied to the image groups comprises at least one of:
randomly swapping the order of the two images in each training group;
randomly cropping and/or randomly padding the two images in each training group;
randomly mirror-flipping the two images in each training group;
and adjusting the contrast and/or brightness and/or saturation of the two images in each training group.
Further, training the constructed commodity difference detection model with the training images and label information comprises:
performing differential commodity detection on every grid cell of the training images, and adjusting the anchor boxes preset in each cell according to the detection result to obtain differential commodity prediction information for each cell;
computing a detection loss from the prediction information and the label information of each cell, and back-propagating the loss through every layer of the model to update the layer weight parameters;
and repeating the training steps until the model converges.
The invention provides a differential commodity detection and classification method for an intelligent container: acquiring two commodity images captured top-down at different moments by a camera arranged above the shelf; detecting the commodities in the two images with a pre-trained commodity difference detection model through feature extraction, feature fusion, and target regression to obtain a detection result comprising the coordinates of each detection box and the image to which each differential commodity belongs; and recognizing the detected differential commodities to obtain their category information. Compared with the prior art, this scheme directly detects the image positions of the differential commodities on the two scene images captured before and after the consumer's purchase, then recognizes the commodities at those positions with an object recognition model to obtain their categories, enabling automatic settlement and intelligent commodity management.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram of the scenario architecture on which the present disclosure is based;
FIG. 2 is a schematic flowchart of a differential commodity detection and classification method for an intelligent container according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a commodity difference detection model according to an embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of a training method for the commodity difference detection model according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
At present, intelligent containers mainly identify the commodities taken by consumers through a visual recognition algorithm and settle accounts automatically: a camera photographs the purchased commodities, a trained object recognition model identifies the categories of the purchased commodities, and the cost is then settled according to those categories.
However, because this approach relies on a fully supervised object detection model for commodity recognition and settlement, it suffers from high labeling cost, high updating cost, high deployment cost, and similar problems.
FIG. 1 is a schematic diagram of the scenario architecture on which the present disclosure is based. As shown in FIG. 1, the architecture may include an intelligent container 1, a differential commodity detection and classification apparatus 2, and a camera 3.
The differential commodity detection and classification apparatus 2 is hardware or software that interacts with the camera 3 over a network and can execute the differential commodity detection and classification method described in the embodiments below.
When the apparatus 2 is hardware, it may be an electronic device with computing capability; when it is software, it may be installed in such an electronic device. Such devices include, but are not limited to, servers, smart boxes, and desktop computers.
The camera 3 may be a hardware device integrated on the intelligent container 1 that can photograph a wide area at close range.
In an actual scenario, the apparatus 2 may be integrated in or installed on the intelligent container 1 and run there, or it may be integrated in or installed on a back-end server that processes commodity images and provides the detection and classification service for the container 1. The specific process is as follows: the apparatus 2 obtains the two commodity images captured by the camera 3 before and after the consumer's purchase, detects them with the method of the embodiments below to determine the positions of the differential commodities in the container 1, and identifies the commodity categories at those positions for automatic settlement.
The method for detecting and classifying the different commodities of the intelligent container provided by the application is further explained as follows:
FIG. 2 is a schematic flowchart of a differential commodity detection and classification method for an intelligent container according to an embodiment of the present disclosure. As shown in FIG. 2, the method includes:
s21, acquiring two commodity images shot by a camera arranged above the intelligent container shelf at different moments; wherein the two commodity images are obtained by the camera through overlooking shooting.
The camera can shoot a large-range commodity image in a short distance.
In this embodiment, since the consumer takes the commodity away from the shelf of the intelligent container when purchasing the commodity in the intelligent container, and the camera can be configured to the position capable of shooting all the commodities on the shelf in order to accurately identify the commodity taken away by the user, the overlooking image of all the commodities on the shelf can be shot by using the camera, and the commodity identification error caused by the shooting dead angle can be avoided.
S22, detecting the commodities in the two images with the pre-trained commodity difference detection model, the detection process comprising feature extraction, feature fusion, and target regression, to obtain the differential commodity detection result; the result comprises the coordinates of each detection box and the image to which each differential commodity belongs.
In this embodiment, unlike a common object detection model that takes a single commodity image as input, the commodity difference detection model takes a pair of commodity images and sequentially performs feature extraction, feature fusion, and target regression on them to obtain the coordinates of the differential commodity detection boxes and the image to which each differential commodity belongs.
Specifically, as shown in FIG. 3, the commodity difference detection model consists of two weight-sharing feature extractors, a feature fusion operator, and a regression network. The detection process comprises: extracting features from the two commodity images with the two weight-sharing extractors to obtain a first and a second image feature; computing their difference with the fusion operator to obtain a fused image feature; and processing the fused feature with the regression network to obtain the differential commodity detection result.
Further, the feature extractors of this embodiment extract deep features from the two input commodity images using ResNet-18 with the final fully connected layer removed; the ResNet-18 parameters are initialized from pre-training on the ImageNet dataset. The regression network contains a spatial attention module and a channel attention module.
This arrangement has several advantages. The two weight-sharing feature extractors process the pair of input images separately; because their parameters are shared, both images are mapped into the same feature space, which makes it possible to subsequently locate their differences in spatial position. The key of the algorithm is the feature fusion operator: it subtracts the features produced by the two extractors to obtain a fused image feature that carries the difference information, from which the differential commodity information is regressed. Finally, to better model the dependency between the spatial differences of the fused feature and the global features, attention modules in both the spatial and the channel dimension are added to the regression network; they model the semantic interdependencies in each dimension and help analyse the spatial difference features.
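The patent does not give the internals of the two attention modules. The sketch below is a generic self-attention formulation, under the assumption that "spatial attention" means position-to-position attention over the flattened feature map and "channel attention" means channel-to-channel attention:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(feat):
    """feat: (C, N) with N = H*W spatial positions.  Each position
    attends to every other position, modelling long-range
    dependencies between spatial differences."""
    attn = softmax(feat.T @ feat, axis=-1)   # (N, N) position affinities
    return feat @ attn.T                     # (C, N) re-weighted features

def channel_attention(feat):
    """Each channel attends to every other channel, modelling
    semantic interdependencies in the channel dimension."""
    attn = softmax(feat @ feat.T, axis=-1)   # (C, C) channel affinities
    return attn @ feat                       # (C, N)

rng = np.random.default_rng(1)
feat = rng.normal(size=(4, 9))   # 4 channels over a flattened 3x3 map
out = channel_attention(spatial_attention(feat))
print(out.shape)  # -> (4, 9)
```

Both modules preserve the feature shape, so they can be inserted anywhere inside the regression network without changing the surrounding layers.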
S23, recognizing the detected differential commodities according to the detection result to obtain their category information.
In this embodiment, given the coordinates of the detection boxes and the images to which the differential commodities belong, a commodity recognition model can identify the differential commodities in those images. The identified category is the category of the commodity purchased by the consumer, so cost settlement can be performed accordingly, realizing intelligent management of the commodities in the intelligent container.
Specifically, the positions of the differential commodities are determined from the detection box coordinates and the image information, and the commodities at those positions are recognized with the pre-trained commodity recognition model to determine their category information.
This decoupling has a clear advantage: the commodity difference detection model only locates the differential commodities and does not classify them; classification is handled by a dedicated commodity recognition model. When a new product is introduced, only the recognition model needs updating and the difference detection model need not be retrained, which reduces training difficulty and saves substantial model update cost.
This embodiment provides a commodity recognition method for an intelligent container: acquiring two commodity images captured top-down at different moments by a camera above the shelf; detecting the commodities in the two images with the pre-trained commodity difference detection model through feature extraction, feature fusion, and target regression to obtain the detection box coordinates and the image to which each differential commodity belongs; and recognizing the detected differential commodities to obtain their category information. By directly detecting the image positions of the differential commodities on the two scene images captured before and after the consumer's purchase, and then recognizing the commodities at those positions with an object recognition model, the scheme avoids the high labeling, updating, and deployment costs of fully supervised detection.
On the basis of the foregoing embodiment, FIG. 4 is a flowchart of a training method for the commodity difference detection model according to an embodiment of the present disclosure. Before the two commodity images are acquired in step S21, the method further includes a training stage for the commodity difference detection model. As shown in FIG. 4, the training stage includes:
s41, collecting a plurality of groups of difference commodity images; each group of difference commodity images comprises two commodity images which are overlooked and shot at the same visual angle and different moments.
In this embodiment, because there is no available public data set to directly train, the data set used for training the model needs to be collected and labeled according to the actual application scene, the image pair captured in the container can be collected, each pair of images is a simulation of the scene in the container at two moments before and after consumer consumption, the cameras in the container are used for looking down and capturing at different moments at the same viewing angle, the capturing time interval is not long, so as to ensure that the two images are basically in the same illumination and background, but the commodities in the images are different, the capturing condition is set to simulate the situation that the consumer consumes once in the intelligent container, the time for consuming once is usually not too long, the purchased commodities are not too many, and the commodity change caused by consumption is not very large.
S42, annotating the differential commodities in the image groups with bounding boxes to obtain annotation box information for each group, and generating label information for each group from the annotation box information.
In this embodiment, the annotation boxes of each group mark the positions of the differential commodities in that group, and the annotation information can be stored as an XML file. The annotation information records the bounding boxes of the differential commodities on the two images of each group; the position and size of each differential commodity can be recorded with the coordinates of the box's top-left and bottom-right corners. During training, the annotation information of each group is parsed according to a specific rule into the label information used for model training.
Specifically, after the differential commodities in each group are annotated, the annotation information is parsed as follows: the training images are divided into at least one grid cell according to a preset grid size; then label information is generated for every grid cell of each group from that group's annotation information. The label information comprises whether an annotation box centre falls in the cell, which of the two images the box belongs to, the horizontal and vertical coordinates of the box centre, and the width and height of the box.
For example, each group of differential commodity images comprises an image A and an image B, and each group is divided into S × S grid cells. Each cell corresponds to a label vector of the form [P(Obj), P(A|Obj), P(B|Obj), midx, midy, w, h], where P(Obj) indicates whether an annotation box centre falls in the cell, P(A|Obj) and P(B|Obj) indicate whether that box lies on image A or image B, midx and midy are the horizontal and vertical coordinates of the box centre, and w and h are the width and height of the box. If an annotation box centre lies in the current cell, P(Obj) is set to 1, otherwise 0; if the box lies on image A, P(A|Obj) is set to 1, otherwise 0, and P(B|Obj) is set likewise. If two different boxes lie on image A and image B respectively and their centres fall in the same cell, P(A|Obj) and P(B|Obj) are both set to 1, and (midx, midy, w, h) records the size and position of the box. Following this rule, each group of images yields a label tensor of size S × S × 7.
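The grid-labelling rule above (an S × S grid of 7-vectors) can be sketched as follows; `make_labels` is an illustrative helper, not from the patent, and coordinates are assumed normalised to [0, 1]:

```python
def make_labels(boxes, S=4):
    """Build the S x S grid of 7-element label vectors
    [P(Obj), P(A|Obj), P(B|Obj), midx, midy, w, h].
    Each box is (image_id, cx, cy, w, h) with image_id 'A' or 'B'
    and all coordinates normalised to [0, 1]."""
    cell = 1.0 / S
    labels = [[[0.0] * 7 for _ in range(S)] for _ in range(S)]
    for image_id, cx, cy, w, h in boxes:
        col = min(int(cx / cell), S - 1)
        row = min(int(cy / cell), S - 1)
        vec = labels[row][col]
        vec[0] = 1.0                 # P(Obj): a box centre falls in this cell
        if image_id == "A":
            vec[1] = 1.0             # P(A|Obj)
        else:
            vec[2] = 1.0             # P(B|Obj)
        vec[3], vec[4] = cx, cy      # box centre coordinates
        vec[5], vec[6] = w, h        # box width and height
    return labels

# One differential product on image A, another on image B whose
# centre falls in the same grid cell: both flags end up set to 1.
labels = make_labels([("A", 0.3, 0.3, 0.2, 0.1),
                      ("B", 0.35, 0.3, 0.1, 0.1)], S=4)
print(labels[1][1])  # -> [1.0, 1.0, 1.0, 0.35, 0.3, 0.1, 0.1]
```

When two boxes share a cell, this toy version keeps only the last box's geometry, which matches the single (midx, midy, w, h) slot in the label vector.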
And S43, performing data enhancement processing on the multiple groups of difference commodity images to obtain multiple groups of difference commodity training images.
In this embodiment, since training sample data is insufficient and the distribution of the difference objects on each group of difference commodity images is irregular, the type and the number of the training sample data are expanded by processing the initial training sample data in a data enhancement mode.
Specifically, the data enhancement processing includes at least one of: randomly swapping the positions of the two commodity images in each group of difference commodity training images; randomly cropping and/or randomly padding the two commodity images in each group; randomly mirror-flipping the two commodity images in each group; and adjusting the contrast and/or brightness and/or saturation of the two commodity images in each group.
For example, to reduce the negative influence of the uneven distribution of difference commodities over the two images, a random-swap data enhancement strategy is designed: the order of graph A and graph B, together with the corresponding label information, is swapped with 50% probability. This removes the negative influence that a fixed input order of graph A and graph B would have on learning, and balances the number of difference commodities distributed over graph A and graph B. To increase the diversity of the training samples, the two commodity images in each group are randomly cropped or randomly padded to obtain samples of richer sizes; the contrast, brightness and saturation of the images are then randomly adjusted; and finally graph A and graph B are mirror-flipped with 50% probability. When a group of difference commodity training images undergoes these enhancement operations, graph A and graph B must be processed in exactly the same way.
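A minimal sketch of the paired enhancement, assuming NumPy image arrays and the S × S × 7 label layout described earlier. Only the swap and mirror steps are shown; crop/pad and color jitter would be added the same way, applied identically to graph A and graph B.

```python
import random
import numpy as np

def augment_pair(img_a, img_b, label):
    """Apply the paired augmentations to one (A, B) training pair.

    img_a, img_b: H x W x 3 uint8 arrays; label: S x S x 7 tensor whose
    channels 1 and 2 are P(A|Obj) and P(B|Obj). Illustrative sketch only.
    """
    # 50% chance: swap A and B, and swap the P(A|Obj)/P(B|Obj) channels
    if random.random() < 0.5:
        img_a, img_b = img_b, img_a
        label = label.copy()
        label[..., [1, 2]] = label[..., [2, 1]]
    # 50% chance: horizontal mirror, applied identically to both images;
    # grid columns are reversed and each occupied cell's midx becomes 1 - midx
    if random.random() < 0.5:
        img_a = img_a[:, ::-1].copy()
        img_b = img_b[:, ::-1].copy()
        label = label[:, ::-1].copy()
        occupied = label[..., 0] > 0
        label[..., 3][occupied] = 1.0 - label[..., 3][occupied]
    return img_a, img_b, label
```

Because every geometric transform is mirrored in the label tensor, the S × S × 7 supervision stays consistent with the augmented images.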
And S44, training the constructed commodity difference detection model by using the plurality of groups of difference commodity training images and the label information to obtain the commodity difference detection model which is trained in advance.
In this embodiment, the multiple groups of difference commodity training images and the label information are input into the constructed commodity difference detection model, and the difference commodities in each group of training images are detected by the model to obtain difference commodity prediction information. A difference commodity detection loss value is calculated from the prediction information and the label information and back-propagated through the layers of the model, so that the weight parameters of each layer are updated according to the loss value; these training steps are repeated until the commodity difference detection model converges.
Specifically, difference commodity detection is performed on each grid area of the multiple groups of training images, and the anchor frames preset in each grid area are adjusted according to the detection result to obtain the difference commodity prediction information of that grid area. The detection loss value is then calculated from the prediction information and label information of each grid area and back-propagated to update the weight parameters of each layer, and the training steps are repeated until the model converges.
For example, predicting the offsets of the box center coordinates and of the length and width from a preset anchor box is much simpler than directly regressing the coordinates; this simplifies the regression problem and makes the network easier to train. The preset anchor boxes tile the feature map in a convolutional manner, so that the position of each anchor box relative to its corresponding grid is fixed. For each prediction box, the network predicts: the probability that a target object is in the prediction box, the probabilities that the object falls on each of the two images, and the position of the prediction box. If K anchor boxes are preset for each grid, then 1 objectness score, 2 position scores and 4 offsets from the anchor box are predicted per anchor box, so 7K output filters are applied at each grid location of the feature map. The model formulates difference detection as a regression problem: the image is first divided into an S × S grid, and each grid cell predicts K bounding boxes. The overall loss function is divided into a prior-frame loss and a regression loss; the regression loss comprises an object loss, a coordinate loss and a class loss. Note that the "class" of the commodity difference detection model indicates whether an object falls on graph A or graph B in space; it is spatial-level information and does not represent the category of a commodity.
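The anchor-offset prediction described above can be decoded into boxes as in the following sketch. The exact parameterization (sigmoid center offset within the cell, exponential scaling of the anchor size) is a YOLO-style assumption; the embodiment only states that offsets from the preset anchor boxes are predicted.

```python
import numpy as np

def decode_predictions(raw, anchors, S):
    """Decode the 7K-per-cell raw head output into boxes (illustrative).

    raw: S x S x (7K) array; per anchor the 7 channels are assumed to be
    [obj, pA, pB, tx, ty, tw, th]. anchors: list of (aw, ah) in grid units.
    Returns S x S x K x 4 boxes as (cx, cy, w, h) normalized to [0, 1].
    """
    K = len(anchors)
    raw = raw.reshape(S, S, K, 7)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    boxes = np.zeros((S, S, K, 4), dtype=np.float64)
    for row in range(S):
        for col in range(S):
            for j, (aw, ah) in enumerate(anchors):
                obj, pa, pb, tx, ty, tw, th = raw[row, col, j]
                cx = (col + sigmoid(tx)) / S   # center stays inside its cell
                cy = (row + sigmoid(ty)) / S
                w = aw * np.exp(tw) / S        # size scales the anchor prior
                h = ah * np.exp(th) / S
                boxes[row, col, j] = [cx, cy, w, h]
    return boxes
```

Tying each prediction to its cell and anchor is what keeps the regression well-conditioned: the network only learns small corrections, not absolute coordinates.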
The prior-frame loss is defined as:

L_prior = \mathbb{1}(t < T_0) \sum_{i=0}^{S^2-1} \sum_{j=0}^{K-1} \sum_{r \in \{x,y,w,h\}} (prior_{ij}^{r} - \hat{b}_{ij}^{r})^2

where t represents the total number of training samples seen so far, and the indicator \mathbb{1}(t < T_0) equals 1 only while t is smaller than the preset number T_0. The prior-frame loss is calculated only when this condition is met, because L_prior is designed to let the model learn the preset anchor frames faster during the early stage of training.
The regression loss is defined as follows:

L_obj = \sum_{i=0}^{S^2-1} \sum_{j=0}^{K-1} \mathbb{1}_{ij}^{obj} (1 - \hat{P}_{ij}(Obj))^2

L_noobj = \sum_{i=0}^{S^2-1} \sum_{j=0}^{K-1} \mathbb{1}_{ij}^{noobj} (0 - \hat{P}_{ij}(Obj))^2

L_coord = \sum_{i=0}^{S^2-1} \sum_{j=0}^{K-1} \mathbb{1}_{ij}^{obj} (2 - \hat{w}_{ij}\hat{h}_{ij}) \sum_{r \in \{x,y,w,h\}} (b_{ij}^{r} - \hat{b}_{ij}^{r})^2

L_class = \sum_{i=0}^{S^2-1} \sum_{j=0}^{K-1} \mathbb{1}_{ij}^{obj} [ (P_{ij}(A|Obj) - \hat{P}_{ij}(A|Obj))^2 + (P_{ij}(B|Obj) - \hat{P}_{ij}(B|Obj))^2 ]

When the intersection-over-union between the prediction box produced by the j-th anchor of the i-th grid and the labeling boxes falling into that grid is smaller than a preset threshold Thresh, the prediction box is considered to contain no target object, and L_noobj is calculated for it; when the j-th anchor box of the i-th grid matches a labeling box falling into the grid, L_obj, L_coord and L_class are calculated for the prediction box. To reflect that the same prediction deviation affects large-scale and small-scale labeling boxes differently, the coordinate error is weighted by the factor (2 - \hat{w}_{ij}\hat{h}_{ij}); this factor reduces the penalty on prediction deviations for large boxes and increases it for small boxes, where \hat{w}_{ij} and \hat{h}_{ij} are the length and width of the labeling frame normalized relative to the current grid, taking values between 0 and 1.
The total loss function is obtained by adding the above loss terms with different weights, as shown in formula six:

L_t = \lambda_{prior} L_{prior} + \lambda_{coord} L_{coord} + \lambda_{noobj} L_{noobj} + \lambda_{obj} L_{obj} + \lambda_{class} L_{class}    (formula six)

In practice, the weights are set to \lambda_{prior} = 0.01, \lambda_{noobj} = 0.5, \lambda_{obj} = 5, \lambda_{coord} = 2, \lambda_{class} = 1.
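Formula six then amounts to a simple weighted sum; the sketch below uses the embodiment's reported weights as defaults (the function name and dict keys are illustrative):

```python
def total_loss(parts, weights=None):
    """Weighted sum of the five loss terms of formula six.

    parts: dict with keys 'prior', 'coord', 'noobj', 'obj', 'class'
    mapping to the scalar value of each loss term.
    """
    if weights is None:
        # weights reported in the embodiment
        weights = {'prior': 0.01, 'coord': 2.0, 'noobj': 0.5,
                   'obj': 5.0, 'class': 1.0}
    return sum(weights[k] * parts[k] for k in parts)
```

The large weight on the object term and the small one on the prior term reflect that the prior loss only matters early in training, while confident objectness is the main training signal.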
It can be seen that the loss function of this embodiment computes the loss terms for every grid cell, but its meaning is not quite the same as the loss function of an ordinary object detection model. First, its input is the fused feature of a pair of images, so the information it operates on differs from that of an ordinary detector. Second, the semantics of "class" is entirely different: here the class indicates which image the current prediction box belongs to, which is spatial-level semantic information, whereas the class in an ordinary object detection model refers to the specific category of the object inside the prediction box.
Although the present invention has been described with reference to the above embodiments, it should be understood that various changes and modifications can be made therein by those skilled in the art without departing from the spirit and scope of the invention.

Claims (9)

1. The method for detecting and classifying the different commodities of the intelligent container is characterized by comprising the following steps of:
acquiring two commodity images shot at different moments by a camera arranged above a goods shelf of an intelligent container; wherein the two commodity images are captured by the camera from a top-down view;
detecting the commodities in the two commodity images by using a commodity difference detection model trained in advance, wherein the detection process comprises the following steps: extracting, fusing and target regressing the characteristics to obtain a detection result of the different commodity; the difference commodity detection result comprises the coordinates of a difference commodity detection frame and image information to which the difference commodity belongs;
and identifying the detected different commodities according to the detection result of the different commodities to obtain the category information of the different commodities.
2. The method for detecting and classifying the difference commodities of the intelligent container according to claim 1, wherein the commodity difference detection model is provided with two weight-sharing feature extractors, the output ends of the two weight-sharing feature extractors are connected with a feature fusion operator, and the output ends of the feature fusion operator are connected with a regression network;
correspondingly, the commodity difference detection model trained in advance is used for detecting the commodities in the two commodity images, and the detection processing comprises the following steps: and (3) obtaining a detection result of the different commodity by feature extraction, feature fusion and target regression, wherein the detection result comprises the following steps:
respectively extracting the features of the two commodity images through the feature extractor shared by the two weights to obtain a first image feature and a second image feature;
calculating the difference value of the first image characteristic and the second image characteristic through the characteristic fusion operator to obtain a fusion image characteristic;
and identifying the fusion image characteristics through the regression network to obtain a difference commodity detection result.
3. The method for detecting and classifying differential commodities of intelligent containers as claimed in claim 2, wherein the two weight-sharing feature extractors use ResNet-18 with the last fully-connected layer deleted.
4. The method for detecting and classifying differential commodities of intelligent containers according to claim 2, wherein a space attention module and a channel attention module exist in the regression network.
5. The method for detecting and classifying the different commodities in the intelligent container according to claim 1, wherein the identifying the detected different commodities according to the detection result of the different commodities to obtain the category information of the different commodities comprises:
determining position information of the differential commodities according to the coordinates of the differential commodity detection frame and the image information of the differential commodities;
and identifying the commodities at the positions of the different commodities by using the commodity identification model trained in advance, and determining the category information of the different commodities.
6. The method for detecting and classifying the differential commodities of the intelligent container according to claim 1, wherein before the obtaining of two commodity images taken by the camera arranged above the shelf of the intelligent container at different times, the method further comprises:
collecting a plurality of groups of difference commodity images; each group of difference commodity images comprising two commodity images captured top-down from the same viewing angle at different moments;
performing frame marking on the different commodities in the multiple groups of different commodity images to obtain marking frame information corresponding to each group of different commodity images, and generating label information of each group of different commodity images according to the marking frame information;
performing data enhancement processing on the multiple groups of difference commodity images to obtain multiple groups of difference commodity training images;
and training the constructed commodity difference detection model by using the plurality of groups of difference commodity training images and the label information to obtain the commodity difference detection model which is trained in advance.
7. The method for detecting and classifying the different commodities in the intelligent container according to the claim 6, wherein the generating of the label information of each group of different commodity images according to the labeling frame information comprises:
performing grid division on the multiple groups of different commodity training images according to the preset grid size to obtain at least one grid area;
generating label information corresponding to each grid area of each group of the differential commodity training images according to the labeling frame information corresponding to each group of the differential commodity training images; the label information comprises the position relationship between the grids and the central point of the labeling frame, the position relationship between the two difference commodity images and the central point of the labeling frame, the abscissa and the ordinate of the central point of the labeling frame, and the length and the width of the labeling frame.
8. The method for detecting and classifying the differential commodities in the intelligent container according to claim 7, wherein the data enhancement processing on the plurality of groups of differential commodity images comprises at least one of the following steps:
randomly swapping the positions of the two commodity images in each group of difference commodity training images;
randomly cropping and/or randomly padding the two commodity images in each group of difference commodity training images;
randomly mirror-flipping the two commodity images in each group of difference commodity training images;
and adjusting the contrast and/or brightness and/or saturation of the two commodity images in each group of difference commodity training images.
9. The method for detecting and classifying the different commodities in the intelligent container according to claim 8, wherein the training of the constructed commodity difference detection model by using the plurality of groups of different commodity training images and the label information comprises:
carrying out different commodity detection on each grid area in the multiple groups of different commodity training images, and adjusting an anchor frame preset in each grid area according to a detection result to obtain different commodity prediction information of each grid area;
calculating a difference commodity detection loss value according to the difference commodity prediction information and the label information of each grid area, and reversely transmitting the difference commodity detection loss value to each layer of the commodity difference detection model so as to update weight parameters of each layer according to the difference commodity detection loss value;
and repeating the training steps until the commodity difference detection model converges.
CN202111476957.XA 2021-12-02 2021-12-02 Differential commodity detection and classification method for intelligent container Pending CN114255377A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111476957.XA CN114255377A (en) 2021-12-02 2021-12-02 Differential commodity detection and classification method for intelligent container


Publications (1)

Publication Number Publication Date
CN114255377A true CN114255377A (en) 2022-03-29

Family

ID=80791722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111476957.XA Pending CN114255377A (en) 2021-12-02 2021-12-02 Differential commodity detection and classification method for intelligent container

Country Status (1)

Country Link
CN (1) CN114255377A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115423695A (en) * 2022-07-15 2022-12-02 清华大学 Streetscape image sampling method and device for city prediction task
CN117422937A (en) * 2023-12-18 2024-01-19 成都阿加犀智能科技有限公司 Intelligent shopping cart state identification method, device, equipment and storage medium
CN117422937B (en) * 2023-12-18 2024-03-15 成都阿加犀智能科技有限公司 Intelligent shopping cart state identification method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
JP7058669B2 (en) Vehicle appearance feature identification and vehicle search methods, devices, storage media, electronic devices
CN108960119B (en) Commodity recognition algorithm for multi-angle video fusion of unmanned sales counter
Zhang et al. Toward new retail: A benchmark dataset for smart unmanned vending machines
CN112131978B (en) Video classification method and device, electronic equipment and storage medium
CN110298297A (en) Flame identification method and device
WO2020134102A1 (en) Article recognition method and device, vending system, and storage medium
US11501110B2 (en) Descriptor learning method for the detection and location of objects in a video
CN111340126A (en) Article identification method and device, computer equipment and storage medium
CN111626201A (en) Commodity detection method and device and readable storage medium
CN114255377A (en) Differential commodity detection and classification method for intelligent container
CN109934081A (en) A kind of pedestrian's attribute recognition approach, device and storage medium based on deep neural network
CN111523421A (en) Multi-user behavior detection method and system based on deep learning and fusion of various interaction information
CN115272652A (en) Dense object image detection method based on multiple regression and adaptive focus loss
CN115797736B (en) Training method, device, equipment and medium for target detection model and target detection method, device, equipment and medium
CN109712324A (en) A kind of automatic vending machine image-recognizing method, good selling method and vending equipment
CN114937179B (en) Junk image classification method and device, electronic equipment and storage medium
CN111368634B (en) Human head detection method, system and storage medium based on neural network
Bappy et al. Real estate image classification
Yang et al. Increaco: incrementally learned automatic check-out with photorealistic exemplar augmentation
CN111428743B (en) Commodity identification method, commodity processing device and electronic equipment
CN111444802A (en) Face recognition method and device and intelligent terminal
CN111126264A (en) Image processing method, device, equipment and storage medium
Chen et al. Self-supervised multi-category counting networks for automatic check-out
CN110472639B (en) Target extraction method based on significance prior information
Achakir et al. An automated AI-based solution for out-of-stock detection in retail environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination