CN117593547A

CN117593547A - Commodity area detection method, commodity area detection device, terminal equipment and medium

Info

Publication number: CN117593547A
Application number: CN202311524234.1A
Authority: CN
Inventors: 周斌; 陈应文; 徐洪亮; 许洁斌
Original assignee: Guangzhou Xuanwu Wireless Technology Co Ltd
Current assignee: Guangzhou Xuanwu Wireless Technology Co Ltd
Priority date: 2023-11-15
Filing date: 2023-11-15
Publication date: 2024-02-23

Abstract

The invention discloses a commodity area detection method, a commodity area detection device, terminal equipment and a medium, wherein the commodity area detection method comprises the following steps: acquiring a local image to be detected and a panoramic image to be detected, which contain commodity objects, inputting the local image to be detected and the panoramic image to be detected into a preset image area detection model, so that the image area detection model matches the local image to be detected with the panoramic image to be detected, a detection frame corresponding to the local image to be detected in the panoramic image to be detected is generated, and the position of the commodity objects to be detected in a market is determined according to the detection frame; training the model aiming at the partial shooting image containing the commodity and the panoramic image corresponding to the partial shooting image to obtain an image area detection model, and accurately finding the specific position of the commodity in the market through the method and the device, thereby improving the accuracy of searching the commodity position.

Description

Commodity area detection method, commodity area detection device, terminal equipment and medium

Technical Field

The present invention relates to the field of computer vision and target detection technologies, and in particular, to a method, an apparatus, a terminal device, and a medium for detecting a commodity area.

Background

Currently, the detection and positioning can be realized only by performing target training on the category which is detected in advance on the basis of the category which is detected in advance and needs to be fixed in advance by a deep learning target detection algorithm; in the quick-elimination industry, a method for searching and positioning a local shot picture in a picture image captured by a market camera is lacking, the local shot picture is searched and positioned in the picture image captured by the market camera by using the existing method, and the specific area position of the local shot picture containing the commodity in the panoramic picture image captured by the market camera cannot be obtained, so that a user cannot accurately find the specific position of the commodity in the market.

Disclosure of Invention

The embodiment of the invention provides a commodity area detection method, a commodity area detection device, terminal equipment and a medium, which can improve the accuracy of commodity position searching.

An embodiment of the present invention provides a method, an apparatus, a terminal device, and a medium for detecting a commodity area, including:

acquiring a local image to be detected and a panoramic image to be detected, wherein the local image to be detected comprises commodity objects;

inputting the partial image to be detected and the panoramic image to be detected into a preset image area detection model, so that the image area detection model matches the partial image to be detected with the panoramic image to be detected, and generating a detection frame corresponding to the partial image to be detected in the panoramic image to be detected;

and determining the position of the commodity object to be detected in the mall according to the detection frame.

Further, the image area detection model includes: the feature extraction layer, the feature association layer and the rectangular frame prediction head;

inputting the partial image to be detected and the detection panoramic image into a preset image area detection model, so that the image area detection model matches the partial image to be detected with the panoramic image to be detected, and generating a detection frame corresponding to the partial image to be detected in the panoramic image to be detected, including:

inputting the local image to be detected and the panoramic image to be detected into the feature extraction layer so that the feature extraction layer generates a first local feature image and a first panoramic feature image, and transmitting the first local feature image and the first panoramic feature image to a feature correlation layer;

the feature correlation layer carries out convolution operation on the first local feature map and the first panoramic feature map, generates a first similarity feature map of the first local feature map and the first panoramic feature map, and transmits the first similarity feature map to the rectangular frame detection head;

and the rectangular frame detection head carries out convolution operation on the first similarity feature map to generate a detection frame corresponding to the partial image to be detected in the panoramic image to be detected.

Further, the training of the image region detection model includes:

acquiring a plurality of first partial images containing commodity objects and a first panoramic image; wherein, a first rectangular frame corresponding to the first partial image in the first panoramic image is marked in the first panoramic image;

acquiring a plurality of second panoramic images containing commodity objects, randomly selecting a rectangular area in the second panoramic images, taking the images in the rectangular area as second partial images, and taking a rectangular frame corresponding to the rectangular area as a second rectangular frame;

scaling the first panoramic image and the second panoramic image to a first preset size, scaling the first partial image and the second partial image to a second preset size, and correspondingly scaling the first rectangular frame and the second rectangular frame according to the scaling ratio of the first panoramic image and the second panoramic image;

taking the scaled first panoramic image and second panoramic image as search graphs, dividing each search graph into a plurality of grids, and generating a plurality of anchor frames with different aspect ratios by taking the center of each grid as the center in each grid to obtain the center point abscissa, the center point ordinate, the anchor frame width, the anchor frame height and the prediction score of each anchor frame;

according to the scaled first rectangular frame and the scaled second rectangular frame serving as target frames, calculating IOUs of the target frames and each anchor frame according to the target frames and the anchor frames:

wherein A represents a target frame; b represents an anchor frame; a and B represent the intersection area of the target frame and the anchor frame; a U B represents the union area of the target frame and the anchor frame;

taking an anchor frame with the IOU less than or equal to 0.3< as a positive sample, taking the anchor frame with the IOU less than or equal to 0 < 0.3 as a negative sample, and respectively calculating learning target values of the positive sample and the negative sample;

performing iterative training on the image area detection model to be trained according to the positive sample and the negative sample, and obtaining the image area detection model after training is completed; and when training is performed each time, calculating a loss function value of the image area detection model to be trained according to a learning target value corresponding to the sample and a predicted value of the image area detection model to be trained, and adjusting model parameters of the image area detection model to be trained according to the loss function value.

Further, the calculating learning target values of the positive sample and the negative sample respectively includes:

calculating a model detection frame abscissa predicted value according to the target frame center point abscissa and the anchor frame center point abscissa;

calculating a predicted value of the ordinate of the model detection frame according to the ordinate of the center point of the target frame and the ordinate of the center point of the anchor frame;

calculating a model detection frame width predicted value according to the target frame width and the anchor frame width;

calculating a model detection frame height predicted value according to the target frame height and the anchor frame height;

taking the model detection frame abscissa predicted value, the model detection frame ordinate predicted value, the model detection frame width predicted value, the model detection frame height predicted value and the prediction score of the positive sample as the learning target value of the positive sample; wherein the predictive score of the positive sample is 1;

taking the predictive score of the negative sample as a learning target value of the negative sample; wherein the negative sample has a predictive score of 0.

Further, the loss function of the image area detection model to be trained is specifically:

I _ij ^obj representing the ith row and j column grid as positive samples; i _ij ^noobj Representing the ith row and j column grid as a negative sample; lambda is a fixed value; beta is a fixed value;the method comprises the steps of detecting a frame abscissa predicted value for a model; />The vertical coordinate predicted value of the model detection frame is obtained; />Detecting a frame width predicted value for the model; />A high predicted value is detected for the model; />Predictive scores for the model test frames.

Further, the generating a detection frame corresponding to the local image to be detected in the panoramic image to be detected includes:

calculating the central point abscissa of the detection frame according to the model detection frame abscissa predicted value and the anchor frame central point abscissa corresponding to the highest predicted score;

calculating the ordinate of the central point of the detection frame according to the predicted value of the ordinate of the model detection frame and the ordinate of the central point of the anchor frame corresponding to the highest predicted score;

calculating the detection frame width according to the model detection frame width predicted value and the anchor frame width corresponding to the highest prediction score;

calculating the height of the detection frame according to the predicted value of the height of the model detection frame and the height of the anchor frame corresponding to the highest predicted score;

determining the position of the detection frame according to the abscissa of the central point of the detection frame, the ordinate of the central point of the detection frame, the width of the detection frame and the height of the detection frame;

and generating a detection frame according to the position of the detection frame.

Further, when the image area detection model to be trained is subjected to iterative training, 60% of samples are trained in a strong supervision training mode, and 40% of samples are trained in a self-supervision training mode.

As an improvement of the above-mentioned scheme, another embodiment of the present invention correspondingly provides a commodity area detecting apparatus, including:

the image acquisition module is used for acquiring a to-be-detected local image and a to-be-detected panoramic image which contain commodity objects;

the detection frame generation module is used for inputting the partial image to be detected and the panoramic image to be detected into a preset image area detection model so that the image area detection model matches the partial image to be detected with the panoramic image to be detected, and a detection frame corresponding to the partial image to be detected in the panoramic image to be detected is generated;

and the commodity position determining module is used for determining the position of the commodity object to be detected in the market according to the detection frame.

Another embodiment of the present invention provides a terminal device including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor executes the computer program to implement a commodity area detection method as described in the above embodiment.

Another embodiment of the present invention provides a computer readable storage medium, where the computer readable storage medium includes a stored computer program, where when the computer program runs, a device where the computer readable storage medium is controlled to execute a method for detecting a commodity area according to the foregoing embodiment.

The invention has the following beneficial effects:

the invention provides a commodity area detection method, a commodity area detection device, terminal equipment and a medium, wherein the method comprises the steps of obtaining a local image to be detected and a panoramic image to be detected, wherein the local image to be detected comprises commodity objects; inputting the partial image to be detected and the panoramic image to be detected into a preset image area detection model, so that the image area detection model matches the partial image to be detected with the panoramic image to be detected, and generating a detection frame corresponding to the partial image to be detected in the panoramic image to be detected; and determining the position of the commodity object to be detected in the mall according to the detection frame. Training the model aiming at the partial shooting image containing the commodity and the panoramic image corresponding to the partial shooting image to obtain an image area detection model, and accurately finding the specific position of the commodity in the market according to the detection frame obtained by the image area detection model, thereby improving the accuracy of searching the commodity position.

Drawings

FIG. 1 is a schematic flow chart of a method for detecting a commodity area according to an embodiment of the present invention;

fig. 2 is a schematic structural diagram of a commodity area detecting apparatus according to an embodiment of the present invention;

FIG. 3 is a diagram of an image area detection model of a commodity area detection method according to an embodiment of the present invention;

fig. 4 is a schematic diagram illustrating generation of a detection frame of a commodity area detection method according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Referring to fig. 1, a flow chart of a method for detecting a commodity area according to an embodiment of the present invention includes:

s1, acquiring a local image to be detected and a panoramic image to be detected, wherein the local image to be detected comprises a commodity object;

in a preferred embodiment of the present invention, a commodity image including a commodity object is taken as a partial image to be detected, a container or a refrigerator panoramic image including the commodity object is taken as a panoramic image to be detected, the partial image to be detected is scaled to 383 x 383 size, and the panoramic image to be detected is scaled to 511 x 511 size.

S2, inputting the partial image to be detected and the panoramic image to be detected into a preset image area detection model, so that the image area detection model matches the partial image to be detected with the panoramic image to be detected, and a detection frame corresponding to the partial image to be detected in the panoramic image to be detected is generated;

in a preferred embodiment of the present invention, the image region detection model includes: the feature extraction layer, the feature association layer and the rectangular frame prediction head;

the rectangular frame detection head carries out convolution operation on the first similarity feature map to generate a detection frame corresponding to the partial image to be detected in the panoramic image to be detected;

specifically, the image region detection model mainly includes three parts: a feature extraction layer, a feature association layer and a rectangular frame pre-measurement head, as shown in fig. 3; the feature extraction layer is a feature extraction layer based on a twin network, the twin network uses a residual network Resnet50 as a basic structure of the twin network, because the local image to be detected and the model structure used by the panoramic image to be detected are identical and the weights are shared, the layer of network structure mainly extracts feature expressions of the same projection space of the reduced local image to be detected and the panoramic image to be detected, which are input in the step S1 and contain commodity objects, wherein the input size of the local image to be detected is 383 x 3, the input size of the panoramic image to be detected is 511 x 3, the local image to be detected and the panoramic image to be detected obtain a corresponding local feature map and a panoramic feature map after passing through the Resnet50 structure, the sizes of the local feature map and the panoramic feature map are 47 x 1024 and 63 x 1024 respectively, and then the feature channels of the local feature map and the panoramic feature map are compressed to 256 through a convolution layer, and the first local feature map and the first feature map are obtained and have the sizes of 47 x 63 x 1024 and 63 x 256 respectively;

specifically, the feature correlation layer performs convolution operation on the first local feature map and the first panoramic feature map, and mainly performs convolution operation on the first panoramic feature map by taking the first local feature map as a convolution check, so as to obtain a first similarity feature map of the first local feature map and the first panoramic feature map, wherein the first similarity feature map has a similarity information relationship between the first local feature map and the first panoramic feature map, and can provide feature expression information for subsequent rectangular coordinate prediction; mainly, after performing two-layer convolution operation and linear interpolation operation on the first local feature map and the second local feature map, respectively obtaining feature maps with sizes of 180×180×256 and 244×244×256, and then performing convolution operation on the obtained local feature map with sizes of 180×180×256 as a panoramic feature map with convolution check sizes of 244×244×256 to obtain a first similarity feature map with sizes of 65×65×256;

specifically, the rectangular frame detection head carries out convolution operation on the first similarity feature map to generate a detection frame corresponding to the partial image to be detected in the panoramic image to be detected;

in a preferred embodiment of the present invention, the training of the image region detection model includes:

performing iterative training on the image area detection model to be trained according to the positive sample and the negative sample, and obtaining the image area detection model after training is completed; during each training, calculating a loss function value of the image area detection model to be trained according to a learning target value corresponding to the sample and a predicted value of the image area detection model to be trained, and adjusting model parameters of the image area detection model to be trained according to the loss function value;

specifically, a plurality of first partial images containing commodity objects and first panoramic images are collected, the first partial images and the first panoramic images are combined into a pair, wherein the positions of first rectangular frames corresponding to the first partial images in the first panoramic images are marked in the first panoramic images, namely the positions of commodities in the scenes of a refrigerator and a counter are marked, and the positions of the commodities are used as strong supervision training samples (1000 groups); collecting a plurality of second panoramic images (1000) containing commodity objects, wherein the collected second panoramic images do not need to be marked and are used as self-supervision training samples; randomly selecting a rectangular area A from the second panoramic image, matting out the image in the rectangular area A, performing random homography transformation to generate a diagram B, taking the diagram B as a second partial image, and taking a rectangular frame corresponding to the rectangular area A as a second rectangular frame; scaling the first panoramic image and the second panoramic image to a first preset size 511 x 511, scaling the first partial image and the second partial image to a second preset size 383 x 383, and correspondingly scaling the first rectangular frame and the second rectangular frame according to the scaling of the first panoramic image and the second panoramic image; taking the scaled first partial image and the scaled second partial image as template images, taking the scaled first panoramic image and the scaled second panoramic image as search images, dividing each search image into 65 x 65 grids, generating 9 anchor frames (Anchors) with different aspect ratios taking the center of each grid as the center in each grid, namely 65 x 9 = 38025 anchor frames (Anchors), and obtaining the central point abscissa, the central point ordinate, the anchor frame width, the anchor frame height and the prediction scores (x, y, w, h, score) of each anchor frame; the left diagram in fig. 4 is a cut grid, and the right diagram is an example of 9 anchor boxes for one of the grids, the 9 anchor boxes and the corresponding grid having the same center point. The panorama size of the example is 511×511, and is divided into 65×65 grids, and each grid is 7.9×7.9; the width and height of the 9 Anchors are respectively: 50,50;50,100,100,50;100,100;100,200;200,100;200,200;200,400;400,200; setting up Anchor frame learning targets by using the generated 38025 Anchor Anchor frames and scaled first and second rectangular frames (target frames), and calculating IOU of the target frames and each Anchor frame according to the target frames and the Anchor frames:

taking an anchor frame with the IOU less than or equal to 0.3< as a positive sample, taking the anchor frame with the IOU less than or equal to 0 < 0.3 as a negative sample, and respectively calculating learning target values of the positive sample and the negative sample; performing iterative training on the image area detection model to be trained according to the positive sample and the negative sample, and obtaining the image area detection model after training is completed; and when training is performed each time, calculating a loss function value of the image area detection model to be trained according to a learning target value corresponding to the sample and a predicted value of the image area detection model to be trained, and adjusting model parameters of the image area detection model to be trained according to the loss function value.

In a preferred embodiment of the present invention, the calculating learning target values of the positive and negative samples, respectively, includes:

according to the abscissa of the center point of the target frame and the abscissa of the center point of the anchor frame, calculating the predicted value of the abscissa of the model detection frame: x is x _ijk ＝X _r -X _ijk ；

Calculating a predicted value of the ordinate of the detection frame of the model according to the ordinate of the center point of the target frame and the ordinate of the center point of the anchor frame: y is _ijk ＝Y _r -Y _ijk ；

Detecting a frame width predicted value according to the target frame width and the anchor frame width calculation model:

calculating a model detection frame height predicted value according to the target frame height and the anchor frame height:

wherein X is _r The abscissa of the center point of the target frame; y is Y _r The ordinate is the center point of the target frame; w (W) _r The frame width is the target frame width; h _r The target frame is high; x is X _ijk The horizontal coordinate of the center point of the positive sample anchor frame; y is Y _ijk Is the ordinate of the center point of the positive sample anchor frame; w (W) _ijk The anchor frame is wide for positive samples; h _ijk The anchor frame is high for positive samples; p is p _ijk Predicting a score for a positive sample;

taking the predictive score of the negative sample as a learning target value of the negative sample; wherein the negative sample has a predictive score of 0;

in a preferred embodiment of the present invention, the loss function of the image area detection model to be trained is specifically:

I _ij ^obj representing the ith row and j column grid as positive samples; i _ij ^noobj Representing the ith row and j column grid as a negative sample; lambda is a fixed value; beta is a fixed value;the method comprises the steps of detecting a frame abscissa predicted value for a model; />The vertical coordinate predicted value of the model detection frame is obtained; />Detecting a frame width predicted value for the model; />A high predicted value is detected for the model; />Predicting the score for the model detection frame;

specifically, λ is here 2.0; beta is here 0.1; the loss function represents the difference between the predicted value and the learning target value of the image region detection model, and the model learning effect is better when the difference is smaller; and training and optimizing the loss functions on all training samples by using a gradient descent method, obtaining a trained image area detection model after the loss functions are converged (the loss functions are not descended any more), and training 300 times by using the method in the embodiment to reach model convergence.

In a preferred embodiment of the present invention, the generating a detection frame corresponding to the partial image to be detected in the panoramic image to be detected includes:

according to the model detection frame abscissa predicted value and the anchor frame central point abscissa corresponding to the highest prediction scoreCalculating the abscissa of the center point of the detection frame:

calculating the ordinate of the central point of the detection frame according to the predicted value of the ordinate of the model detection frame and the ordinate of the central point of the anchor frame corresponding to the highest predicted score:

according to the model detection frame width predicted value and the anchor frame width corresponding to the highest prediction score, calculating the detection frame width:

according to the model detection frame height predicted value and the anchor frame height corresponding to the highest prediction score, calculating the detection frame height:

generating a detection frame according to the position of the detection frame;

wherein,the method comprises the steps of detecting a frame abscissa predicted value for a model; />The vertical coordinate predicted value of the model detection frame is obtained;

detecting a frame width predicted value for the model; />A high predicted value is detected for the model; x is X _ijk The abscissa of the center point of the anchor frame corresponding to the highest predictive score is obtained; y is Y _ijk The ordinate of the center point of the anchor frame corresponding to the highest predictive score; w (W) _ijk The anchor frame width corresponding to the highest predictive score is obtained; h _ijk The anchor frame corresponding to the highest predictive score is high.

Schematically, when the image area detection model to be trained is subjected to iterative training, 60% of samples are trained in a strong supervision training mode, and 40% of samples are trained in a self-supervision training mode.

S3, determining the position of the commodity object to be detected in the market according to the detection frame;

specifically, the position of the commodity object to be detected in the market can be determined according to the detection frame.

Through implementing the embodiment, the image area detection model is subjected to strong supervision and self-supervision training, so that the model learns the related knowledge of the quick-elimination industry in a market scene, a user can be helped to find the specific position of the commodity in the market in a picture grabbed by the market according to the commodity image to be found, and the accuracy of finding the commodity position is improved.

Referring to fig. 2, a schematic structural diagram of a commodity area detecting apparatus according to an embodiment of the present invention includes:

the commodity position determining module is used for determining the position of the commodity object to be detected in the market according to the detection frame;

the method comprises the steps of obtaining a local image to be detected and a panoramic image to be detected, which contain commodity objects, through an image obtaining module, inputting the local image to be detected and the panoramic image to be detected into a preset image area detection model according to a detection frame generating module, so that the image area detection model matches the local image to be detected with the panoramic image to be detected, a detection frame corresponding to the local image to be detected in the panoramic image to be detected is generated, then determining the position of the commodity objects to be detected in a market according to the detection frame through a commodity position determining module, training the model according to the local shooting image containing the commodity and the panoramic image corresponding to the local shooting image to obtain an image area detection model, and accurately finding the specific position of the commodity in the market according to the detection frame obtained by the image area detection model, so that the accuracy of searching the commodity position is improved.

It should be noted that the above-described apparatus embodiments are merely illustrative, and the units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the embodiment of the device provided by the invention, the connection relation between the modules represents that the modules have communication connection, and can be specifically implemented as one or more communication buses or signal lines. Those of ordinary skill in the art will understand and implement the present invention without undue burden.

It will be clear to those skilled in the art that, for convenience and brevity, reference may be made to the corresponding process in the foregoing method embodiment for the specific working process of the apparatus described above, which is not described herein again.

Another embodiment of the present invention also provides a terminal device, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor executes the computer program to implement a commodity area detection method as described in the foregoing embodiments. The terminal equipment can be computing equipment such as a desktop computer, a notebook computer, a palm computer, a cloud server and the like. The terminal device may include, but is not limited to, a processor, a memory.

The processor may be a central processing unit (Central Processing Unit, CPU), other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, which is a control center of the terminal device, and which connects various parts of the entire terminal device using various interfaces and lines.

The memory may be used to store the computer program, and the processor may implement various functions of the terminal device by running or executing the computer program stored in the memory and invoking data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the cellular phone, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, memory, plug-in hard disk, smart Media Card (SMC), secure Digital (SD) Card, flash Card (Flash Card), at least one disk storage device, flash memory device, or other volatile solid state storage device.

The storage medium is a computer readable storage medium, and the computer program is stored in the computer readable storage medium, and when executed by a processor, the computer program can implement the steps of the above-mentioned method embodiments. Wherein the computer program comprises computer program code which may be in source code form, object code form, executable file or some intermediate form etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.

While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention, such changes and modifications are also intended to be within the scope of the invention.

Claims

1. A commodity area detection method, comprising:

2. The commodity area detection method according to claim 1, wherein the image area detection model comprises: the feature extraction layer, the feature association layer and the rectangular frame prediction head;

3. The method of claim 1, wherein the training of the image area detection model comprises:

4. A commodity area detection method according to claim 3, wherein said calculating learning target values for the positive and negative samples, respectively, comprises:

5. A method for detecting a commodity area according to claim 3, wherein the loss function of the image area detection model to be trained is specifically:

6. A commodity area detection method according to claim 3, wherein said generating a detection frame for said partial image to be detected corresponding to said panoramic image to be detected comprises:

7. The commodity area detection method according to claim 1, wherein 60% of the samples are trained in a strong supervised training manner and 40% of the samples are trained in a self-supervised training manner when the image area detection model to be trained is iteratively trained.

8. A commodity area detecting apparatus, comprising:

9. A terminal device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing a commodity area detection method according to any one of claims 1 to 7 when the computer program is executed.

10. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored computer program, wherein the computer program, when run, controls a device in which the computer readable storage medium is located to perform a commodity area detection method according to any one of claims 1 to 7.