CN117593547A - Commodity area detection method, commodity area detection device, terminal equipment and medium - Google Patents


Info

Publication number
CN117593547A
Authority
CN
China
Prior art keywords
image
frame
detected
model
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311524234.1A
Other languages
Chinese (zh)
Inventor
周斌
陈应文
徐洪亮
许洁斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Xuanwu Wireless Technology Co Ltd
Original Assignee
Guangzhou Xuanwu Wireless Technology Co Ltd
Application filed by Guangzhou Xuanwu Wireless Technology Co Ltd filed Critical Guangzhou Xuanwu Wireless Technology Co Ltd


Classifications

    • G06V 10/75 — Image or video pattern matching; organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; coarse-fine approaches, e.g. multi-scale approaches; using context analysis; selection of dictionaries
    • G06N 3/09 — Supervised learning (computing arrangements based on biological models; neural networks; learning methods)
    • G06V 10/26 — Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/7715 — Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; mappings, e.g. subspace methods
    • G06V 10/82 — Image or video recognition or understanding using neural networks


Abstract

The invention discloses a commodity area detection method, a commodity area detection device, terminal equipment and a medium. The method comprises: acquiring a partial image to be detected, containing a commodity object, and a panoramic image to be detected; inputting both into a preset image area detection model, so that the model matches the partial image to be detected against the panoramic image to be detected and generates a detection frame corresponding to the partial image within the panoramic image; and determining the position of the commodity object in the store according to the detection frame. The image area detection model is obtained by training on partial photographs containing commodities and their corresponding panoramic images. By this method and device, the specific position of a commodity in a store can be found accurately, improving the accuracy of commodity position search.

Description

Commodity area detection method, commodity area detection device, terminal equipment and medium
Technical Field
The present invention relates to the field of computer vision and target detection technologies, and in particular, to a method, an apparatus, a terminal device, and a medium for detecting a commodity area.
Background
At present, deep learning target detection algorithms can detect and localize only categories fixed in advance, after target training on those pre-determined categories. In the fast-moving consumer goods (FMCG) industry, there is no method for searching for and localizing a locally shot picture within the frame captured by a store camera: with existing methods, the specific region of the panoramic frame captured by the store camera that corresponds to a locally shot picture containing a commodity cannot be obtained, so a user cannot accurately find the specific position of the commodity in the store.
Disclosure of Invention
The embodiment of the invention provides a commodity area detection method, a commodity area detection device, terminal equipment and a medium, which can improve the accuracy of commodity position searching.
An embodiment of the present invention provides a commodity area detection method, including:
acquiring a local image to be detected and a panoramic image to be detected, wherein the local image to be detected comprises commodity objects;
inputting the partial image to be detected and the panoramic image to be detected into a preset image area detection model, so that the image area detection model matches the partial image to be detected with the panoramic image to be detected, and generating a detection frame corresponding to the partial image to be detected in the panoramic image to be detected;
and determining the position of the commodity object to be detected in the mall according to the detection frame.
Further, the image area detection model includes: the feature extraction layer, the feature association layer and the rectangular frame prediction head;
the inputting the partial image to be detected and the panoramic image to be detected into a preset image area detection model, so that the image area detection model matches the partial image to be detected with the panoramic image to be detected, and generating a detection frame corresponding to the partial image to be detected in the panoramic image to be detected, includes:
inputting the local image to be detected and the panoramic image to be detected into the feature extraction layer so that the feature extraction layer generates a first local feature image and a first panoramic feature image, and transmitting the first local feature image and the first panoramic feature image to a feature correlation layer;
the feature correlation layer carries out convolution operation on the first local feature map and the first panoramic feature map, generates a first similarity feature map of the first local feature map and the first panoramic feature map, and transmits the first similarity feature map to the rectangular frame detection head;
and the rectangular frame detection head carries out convolution operation on the first similarity feature map to generate a detection frame corresponding to the partial image to be detected in the panoramic image to be detected.
Further, the training of the image region detection model includes:
acquiring a plurality of first partial images containing commodity objects and a first panoramic image; wherein, a first rectangular frame corresponding to the first partial image in the first panoramic image is marked in the first panoramic image;
acquiring a plurality of second panoramic images containing commodity objects, randomly selecting a rectangular area in the second panoramic images, taking the images in the rectangular area as second partial images, and taking a rectangular frame corresponding to the rectangular area as a second rectangular frame;
scaling the first panoramic image and the second panoramic image to a first preset size, scaling the first partial image and the second partial image to a second preset size, and correspondingly scaling the first rectangular frame and the second rectangular frame according to the scaling ratio of the first panoramic image and the second panoramic image;
taking the scaled first panoramic image and second panoramic image as search graphs, dividing each search graph into a plurality of grids, and generating a plurality of anchor frames with different aspect ratios by taking the center of each grid as the center in each grid to obtain the center point abscissa, the center point ordinate, the anchor frame width, the anchor frame height and the prediction score of each anchor frame;
taking the scaled first rectangular frame and the scaled second rectangular frame as target frames, and calculating the IOU of each target frame with each anchor frame:

IOU = |A ∩ B| / |A ∪ B|

wherein A represents the target frame; B represents the anchor frame; A ∩ B represents the intersection area of the target frame and the anchor frame; A ∪ B represents the union area of the target frame and the anchor frame;
taking an anchor frame with 0.3 < IOU ≤ 1 as a positive sample and an anchor frame with 0 ≤ IOU ≤ 0.3 as a negative sample, and respectively calculating learning target values of the positive samples and the negative samples;
performing iterative training on the image area detection model to be trained according to the positive sample and the negative sample, and obtaining the image area detection model after training is completed; and when training is performed each time, calculating a loss function value of the image area detection model to be trained according to a learning target value corresponding to the sample and a predicted value of the image area detection model to be trained, and adjusting model parameters of the image area detection model to be trained according to the loss function value.
Further, the calculating learning target values of the positive sample and the negative sample respectively includes:
calculating a model detection frame abscissa predicted value according to the target frame center point abscissa and the anchor frame center point abscissa;
calculating a predicted value of the ordinate of the model detection frame according to the ordinate of the center point of the target frame and the ordinate of the center point of the anchor frame;
calculating a model detection frame width predicted value according to the target frame width and the anchor frame width;
calculating a model detection frame height predicted value according to the target frame height and the anchor frame height;
taking the model detection frame abscissa predicted value, the model detection frame ordinate predicted value, the model detection frame width predicted value, the model detection frame height predicted value and the prediction score of the positive sample as the learning target value of the positive sample; wherein the predictive score of the positive sample is 1;
taking the predictive score of the negative sample as a learning target value of the negative sample; wherein the negative sample has a predictive score of 0.
Further, the loss function of the image area detection model to be trained is specifically:
L = λ Σᵢⱼ I_ij^obj [(x̂_ij − x_ij)² + (ŷ_ij − y_ij)² + (ŵ_ij − w_ij)² + (ĥ_ij − h_ij)²] + Σᵢⱼ I_ij^obj (p̂_ij − p_ij)² + β Σᵢⱼ I_ij^noobj (p̂_ij − p_ij)²

wherein I_ij^obj indicates that the grid in row i and column j is a positive sample; I_ij^noobj indicates that the grid in row i and column j is a negative sample; λ is a fixed value; β is a fixed value; x̂ is the model detection frame abscissa predicted value; ŷ is the model detection frame ordinate predicted value; ŵ is the model detection frame width predicted value; ĥ is the model detection frame height predicted value; p̂ is the model detection frame prediction score.
Further, the generating a detection frame corresponding to the local image to be detected in the panoramic image to be detected includes:
calculating the central point abscissa of the detection frame according to the model detection frame abscissa predicted value and the anchor frame central point abscissa corresponding to the highest predicted score;
calculating the ordinate of the central point of the detection frame according to the predicted value of the ordinate of the model detection frame and the ordinate of the central point of the anchor frame corresponding to the highest predicted score;
calculating the detection frame width according to the model detection frame width predicted value and the anchor frame width corresponding to the highest prediction score;
calculating the height of the detection frame according to the predicted value of the height of the model detection frame and the height of the anchor frame corresponding to the highest predicted score;
determining the position of the detection frame according to the abscissa of the central point of the detection frame, the ordinate of the central point of the detection frame, the width of the detection frame and the height of the detection frame;
and generating a detection frame according to the position of the detection frame.
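The decoding steps above can be sketched as follows — a minimal illustration assuming the predicted center offsets are simply added back to the highest-scoring anchor (the text gives the subtraction form for x and y during training; the width/height transform is not spelled out, so plain addition is assumed here, and all names are illustrative):

```python
def decode_box(pred, anchor):
    """Recover the detection frame from the highest-scoring anchor: add the
    predicted center offsets back to the anchor center (inverse of the
    training encoding); width/height decoding is assumed additive here."""
    dx, dy, dw, dh, score = pred
    Xa, Ya, Wa, Ha = anchor
    return (Xa + dx, Ya + dy, Wa + dw, Ha + dh)

# Hypothetical prediction (x, y, w, h, score) paired with its anchor (cx, cy, w, h).
box = decode_box((4.0, -4.0, 10.0, -10.0, 0.97), (116.0, 84.0, 50.0, 50.0))
print(box)  # (120.0, 80.0, 60.0, 40.0)
```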
Further, when the image area detection model to be trained is subjected to iterative training, 60% of samples are trained in a strong supervision training mode, and 40% of samples are trained in a self-supervision training mode.
As an improvement of the above-mentioned scheme, another embodiment of the present invention correspondingly provides a commodity area detecting apparatus, including:
the image acquisition module is used for acquiring a to-be-detected local image and a to-be-detected panoramic image which contain commodity objects;
the detection frame generation module is used for inputting the partial image to be detected and the panoramic image to be detected into a preset image area detection model so that the image area detection model matches the partial image to be detected with the panoramic image to be detected, and a detection frame corresponding to the partial image to be detected in the panoramic image to be detected is generated;
and the commodity position determining module is used for determining the position of the commodity object to be detected in the market according to the detection frame.
Another embodiment of the present invention provides a terminal device including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor executes the computer program to implement a commodity area detection method as described in the above embodiment.
Another embodiment of the present invention provides a computer readable storage medium, the computer readable storage medium comprising a stored computer program, wherein, when the computer program runs, the device on which the computer readable storage medium is located is controlled to execute the commodity area detection method described in the foregoing embodiment.
The invention has the following beneficial effects:
the invention provides a commodity area detection method, a commodity area detection device, terminal equipment and a medium, wherein the method comprises the steps of obtaining a local image to be detected and a panoramic image to be detected, wherein the local image to be detected comprises commodity objects; inputting the partial image to be detected and the panoramic image to be detected into a preset image area detection model, so that the image area detection model matches the partial image to be detected with the panoramic image to be detected, and generating a detection frame corresponding to the partial image to be detected in the panoramic image to be detected; and determining the position of the commodity object to be detected in the mall according to the detection frame. Training the model aiming at the partial shooting image containing the commodity and the panoramic image corresponding to the partial shooting image to obtain an image area detection model, and accurately finding the specific position of the commodity in the market according to the detection frame obtained by the image area detection model, thereby improving the accuracy of searching the commodity position.
Drawings
FIG. 1 is a schematic flow chart of a method for detecting a commodity area according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a commodity area detecting apparatus according to an embodiment of the present invention;
FIG. 3 is a diagram of an image area detection model of a commodity area detection method according to an embodiment of the present invention;
fig. 4 is a schematic diagram illustrating generation of a detection frame of a commodity area detection method according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a flow chart of a method for detecting a commodity area according to an embodiment of the present invention includes:
s1, acquiring a local image to be detected and a panoramic image to be detected, wherein the local image to be detected comprises a commodity object;
in a preferred embodiment of the present invention, a commodity image containing the commodity object is taken as the partial image to be detected, and a panoramic image of a display cabinet or refrigerator containing the commodity object is taken as the panoramic image to be detected; the partial image to be detected is scaled to 383 × 383, and the panoramic image to be detected is scaled to 511 × 511.
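The two fixed input sizes can be produced with a resize along these lines — a self-contained numpy sketch using nearest-neighbour sampling (a real pipeline would use bilinear interpolation from an imaging library; all names are illustrative):

```python
import numpy as np

def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize; stands in for the bilinear resize a real
    preprocessing pipeline would use."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source column for each output column
    return img[rows][:, cols]

# Scale a partial (template) image to 383x383 and a panoramic (search) image to 511x511.
partial = np.zeros((720, 1280, 3), dtype=np.uint8)
panorama = np.zeros((1080, 1920, 3), dtype=np.uint8)
print(resize_nearest(partial, 383, 383).shape)   # (383, 383, 3)
print(resize_nearest(panorama, 511, 511).shape)  # (511, 511, 3)
```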
S2, inputting the partial image to be detected and the panoramic image to be detected into a preset image area detection model, so that the image area detection model matches the partial image to be detected with the panoramic image to be detected, and a detection frame corresponding to the partial image to be detected in the panoramic image to be detected is generated;
in a preferred embodiment of the present invention, the image region detection model includes: the feature extraction layer, the feature association layer and the rectangular frame prediction head;
the inputting the partial image to be detected and the panoramic image to be detected into a preset image area detection model, so that the image area detection model matches the partial image to be detected with the panoramic image to be detected, and generating a detection frame corresponding to the partial image to be detected in the panoramic image to be detected, includes:
inputting the local image to be detected and the panoramic image to be detected into the feature extraction layer so that the feature extraction layer generates a first local feature image and a first panoramic feature image, and transmitting the first local feature image and the first panoramic feature image to a feature correlation layer;
the feature correlation layer carries out convolution operation on the first local feature map and the first panoramic feature map, generates a first similarity feature map of the first local feature map and the first panoramic feature map, and transmits the first similarity feature map to the rectangular frame detection head;
the rectangular frame detection head carries out convolution operation on the first similarity feature map to generate a detection frame corresponding to the partial image to be detected in the panoramic image to be detected;
specifically, the image area detection model mainly comprises three parts: a feature extraction layer, a feature correlation layer and a rectangular frame prediction head, as shown in fig. 3. The feature extraction layer is based on a twin (Siamese) network that uses the residual network ResNet50 as its basic structure: the partial image to be detected and the panoramic image to be detected pass through identical model structures with shared weights, so this layer extracts feature expressions, in the same projection space, of the two scaled images containing the commodity object that were input in step S1. The input size of the partial image to be detected is 383 × 383 × 3, and the input size of the panoramic image to be detected is 511 × 511 × 3; after passing through the ResNet50 structure, the corresponding partial feature map and panoramic feature map have sizes 47 × 47 × 1024 and 63 × 63 × 1024 respectively. The feature channels of the partial feature map and the panoramic feature map are then compressed to 256 through a convolution layer, yielding the first partial feature map and the first panoramic feature map with sizes 47 × 47 × 256 and 63 × 63 × 256 respectively;
specifically, the feature correlation layer performs a convolution operation on the first partial feature map and the first panoramic feature map: the first partial feature map is used as a convolution kernel applied to the first panoramic feature map, producing a first similarity feature map of the two. The first similarity feature map carries the similarity relationship between the first partial feature map and the first panoramic feature map and provides feature expression information for the subsequent rectangular-coordinate prediction. Concretely, after a two-layer convolution operation and a linear interpolation operation, the first partial feature map and the first panoramic feature map become feature maps of sizes 180 × 180 × 256 and 244 × 244 × 256 respectively; the 180 × 180 × 256 partial feature map is then used as a convolution kernel over the 244 × 244 × 256 panoramic feature map, yielding a first similarity feature map of size 65 × 65 × 256;
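The correlation operation described above — sliding the partial feature map over the panoramic feature map as a convolution kernel — can be sketched in numpy with toy sizes standing in for the 180 × 180 × 256 template and 244 × 244 × 256 search map (the real layer keeps the 256 channels, as in a depthwise correlation, whereas this sketch sums over them; function names are illustrative):

```python
import numpy as np

def correlate(template: np.ndarray, search: np.ndarray) -> np.ndarray:
    """Slide the template feature map over the search feature map and take the
    inner product at each offset -- the 'partial map as convolution kernel'
    operation of the feature correlation layer."""
    th, tw, c = template.shape
    sh, sw, _ = search.shape
    oh, ow = sh - th + 1, sw - tw + 1   # e.g. 244 - 180 + 1 = 65
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(search[i:i+th, j:j+tw] * template)
    return out

rng = np.random.default_rng(0)
t = rng.standard_normal((4, 4, 8))     # toy template feature map
s = rng.standard_normal((10, 10, 8))   # toy search feature map
sim = correlate(t, s)
print(sim.shape)  # (7, 7), i.e. 10 - 4 + 1 per side
```

With the full sizes, the same arithmetic gives the 65 × 65 similarity map of the model: 244 − 180 + 1 = 65.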
specifically, the rectangular frame detection head carries out convolution operation on the first similarity feature map to generate a detection frame corresponding to the partial image to be detected in the panoramic image to be detected;
in a preferred embodiment of the present invention, the training of the image region detection model includes:
acquiring a plurality of first partial images containing commodity objects and a first panoramic image; wherein, a first rectangular frame corresponding to the first partial image in the first panoramic image is marked in the first panoramic image;
acquiring a plurality of second panoramic images containing commodity objects, randomly selecting a rectangular area in the second panoramic images, taking the images in the rectangular area as second partial images, and taking a rectangular frame corresponding to the rectangular area as a second rectangular frame;
scaling the first panoramic image and the second panoramic image to a first preset size, scaling the first partial image and the second partial image to a second preset size, and correspondingly scaling the first rectangular frame and the second rectangular frame according to the scaling ratio of the first panoramic image and the second panoramic image;
taking the scaled first panoramic image and second panoramic image as search graphs, dividing each search graph into a plurality of grids, and generating a plurality of anchor frames with different aspect ratios by taking the center of each grid as the center in each grid to obtain the center point abscissa, the center point ordinate, the anchor frame width, the anchor frame height and the prediction score of each anchor frame;
according to the scaled first rectangular frame and the scaled second rectangular frame serving as target frames, calculating IOUs of the target frames and each anchor frame according to the target frames and the anchor frames:
wherein A represents a target frame; b represents an anchor frame; a and B represent the intersection area of the target frame and the anchor frame; a U B represents the union area of the target frame and the anchor frame;
taking an anchor frame with 0.3 < IOU ≤ 1 as a positive sample and an anchor frame with 0 ≤ IOU ≤ 0.3 as a negative sample, and respectively calculating learning target values of the positive samples and the negative samples;
performing iterative training on the image area detection model to be trained according to the positive sample and the negative sample, and obtaining the image area detection model after training is completed; during each training, calculating a loss function value of the image area detection model to be trained according to a learning target value corresponding to the sample and a predicted value of the image area detection model to be trained, and adjusting model parameters of the image area detection model to be trained according to the loss function value;
specifically, a plurality of first partial images containing commodity objects and first panoramic images are collected and combined into pairs, where each first panoramic image is annotated with the position of the first rectangular frame corresponding to its first partial image — that is, the position of the commodity in the refrigerator or counter scene — and these serve as strongly supervised training samples (1000 groups). A plurality of second panoramic images containing commodity objects (1000) are collected; these need no annotation and serve as self-supervised training samples. A rectangular area A is randomly selected in each second panoramic image; the image inside rectangular area A is cropped out and subjected to a random homography transformation to generate an image B, which is taken as the second partial image, while the rectangular frame corresponding to rectangular area A is taken as the second rectangular frame. The first panoramic image and the second panoramic image are scaled to the first preset size 511 × 511, the first partial image and the second partial image are scaled to the second preset size 383 × 383, and the first rectangular frame and the second rectangular frame are scaled correspondingly according to the scaling of the first and second panoramic images. The scaled first and second partial images are taken as template images, and the scaled first and second panoramic images as search images; each search image is divided into 65 × 65 grids, and 9 anchor frames (Anchors) with different aspect ratios are generated in each grid, centered on the grid center — 65 × 65 × 9 = 38025 anchor frames in total — giving the center point abscissa, center point ordinate, anchor frame width, anchor frame height and prediction score (x, y, w, h, score) of each anchor frame. The left diagram in fig. 4 shows the divided grid, and the right diagram shows an example of the 9 anchor frames of one grid; the 9 anchor frames and the corresponding grid share the same center point. The panorama in the example is 511 × 511 and is divided into 65 × 65 grids, each grid being 7.9 × 7.9. The widths and heights of the 9 Anchors are respectively: 50,50; 50,100; 100,50; 100,100; 100,200; 200,100; 200,200; 200,400; 400,200. Anchor frame learning targets are set up using the generated 38025 anchor frames and the scaled first and second rectangular frames (target frames), and the IOU of each target frame with each anchor frame is calculated:
IOU = |A ∩ B| / |A ∪ B|

wherein A represents the target frame; B represents the anchor frame; A ∩ B represents the intersection area of the target frame and the anchor frame; A ∪ B represents the union area of the target frame and the anchor frame;
taking an anchor frame with 0.3 < IOU ≤ 1 as a positive sample and an anchor frame with 0 ≤ IOU ≤ 0.3 as a negative sample, and respectively calculating learning target values of the positive samples and the negative samples; performing iterative training on the image area detection model to be trained according to the positive samples and the negative samples, and obtaining the image area detection model after training is completed; in each training pass, a loss function value of the image area detection model to be trained is calculated according to the learning target value corresponding to the sample and the predicted value of the image area detection model to be trained, and the model parameters of the image area detection model to be trained are adjusted according to the loss function value.
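The anchor generation and sample assignment just described can be sketched as follows, using the nine width/height pairs, the 65 × 65 grid over a 511 × 511 search image, and the 0.3 IOU threshold from the text (function names are illustrative):

```python
import numpy as np

# The nine (width, height) pairs listed for the anchors.
ANCHOR_SIZES = [(50, 50), (50, 100), (100, 50), (100, 100), (100, 200),
                (200, 100), (200, 200), (200, 400), (400, 200)]

def make_anchors(img_size=511, grid=65):
    """Return (grid*grid*9, 4) anchors as (cx, cy, w, h), one set of nine
    centered on each grid cell (511 / 65 ~ 7.9 pixels per cell)."""
    cell = img_size / grid
    centres = (np.arange(grid) + 0.5) * cell
    boxes = [(cx, cy, w, h)
             for cy in centres for cx in centres for (w, h) in ANCHOR_SIZES]
    return np.array(boxes)

def iou(a, b):
    """Intersection-over-union of two boxes given as (cx, cy, w, h)."""
    ax1, ay1, ax2, ay2 = a[0]-a[2]/2, a[1]-a[3]/2, a[0]+a[2]/2, a[1]+a[3]/2
    bx1, by1, bx2, by2 = b[0]-b[2]/2, b[1]-b[3]/2, b[0]+b[2]/2, b[1]+b[3]/2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2]*a[3] + b[2]*b[3] - inter
    return inter / union if union > 0 else 0.0

def assign(target, anchor, thresh=0.3):
    """Positive sample when IOU exceeds the 0.3 threshold, else negative."""
    return "positive" if iou(target, anchor) > thresh else "negative"

anchors = make_anchors()
print(len(anchors))                                    # 38025 anchor frames
print(assign((100, 100, 50, 50), (100, 100, 50, 50)))  # positive
print(assign((100, 100, 50, 50), (300, 300, 50, 50)))  # negative
```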
In a preferred embodiment of the present invention, the calculating learning target values of the positive and negative samples, respectively, includes:
calculating the model detection frame abscissa predicted value according to the target frame center point abscissa and the anchor frame center point abscissa: x_ijk = X_r − X_ijk;

calculating the model detection frame ordinate predicted value according to the target frame center point ordinate and the anchor frame center point ordinate: y_ijk = Y_r − Y_ijk;
Detecting a frame width predicted value according to the target frame width and the anchor frame width calculation model:
calculating a model detection frame height predicted value according to the target frame height and the anchor frame height:
wherein X is r The abscissa of the center point of the target frame; y is Y r The ordinate is the center point of the target frame; w (W) r The frame width is the target frame width; h r The target frame is high; x is X ijk The horizontal coordinate of the center point of the positive sample anchor frame; y is Y ijk Is the ordinate of the center point of the positive sample anchor frame; w (W) ijk The anchor frame is wide for positive samples; h ijk The anchor frame is high for positive samples; p is p ijk Predicting a score for a positive sample;
taking the model detection frame abscissa predicted value, the model detection frame ordinate predicted value, the model detection frame width predicted value, the model detection frame height predicted value and the prediction score of the positive sample as the learning target value of the positive sample; wherein the predictive score of the positive sample is 1;
taking the predictive score of the negative sample as a learning target value of the negative sample; wherein the negative sample has a predictive score of 0;
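The learning-target computation for a positive sample can be sketched as below. The center targets are taken as coordinate differences between the target frame and the anchor frame; the width/height encoding is an assumption (a log ratio, a common choice), since those formulas are only partially legible in the source text.

```python
import math

def encode_targets(target, anchor):
    """Learning target values (x, y, w, h, score) for a positive sample.

    target, anchor: (cx, cy, w, h).  Centers are encoded as differences and
    width/height as log ratios -- an assumed encoding where the original
    formulas are garbled.
    """
    X_r, Y_r, W_r, H_r = target
    X_a, Y_a, W_a, H_a = anchor
    x = X_r - X_a                    # abscissa target: center offset
    y = Y_r - Y_a                    # ordinate target: center offset
    w = math.log(W_r / W_a)          # width target: log ratio
    h = math.log(H_r / H_a)          # height target: log ratio
    score = 1.0                      # positive samples learn a score of 1
    return (x, y, w, h, score)

# Negative samples only learn a score target of 0.
NEGATIVE_TARGET_SCORE = 0.0
```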
in a preferred embodiment of the present invention, the loss function of the image area detection model to be trained is specifically:
Loss = Σ_{i,j,k} I_ij^obj [ (x̂_ijk − x_ijk)² + (ŷ_ijk − y_ijk)² + λ(ŵ_ijk − w_ijk)² + λ(ĥ_ijk − h_ijk)² + (p̂_ijk − 1)² ] + β Σ_{i,j,k} I_ij^noobj (p̂_ijk)²

wherein I_ij^obj indicates that the grid in row i, column j is a positive sample; I_ij^noobj indicates that the grid in row i, column j is a negative sample; λ is a fixed value; β is a fixed value; x̂_ijk is the model detection frame abscissa predicted value; ŷ_ijk is the model detection frame ordinate predicted value; ŵ_ijk is the model detection frame width predicted value; ĥ_ijk is the model detection frame height predicted value; p̂_ijk is the model detection frame prediction score;
specifically, λ is 2.0 here and β is 0.1 here; the loss function represents the difference between the predicted values and the learning target values of the image area detection model, and the smaller the difference, the better the model learning effect; the loss function is optimized over all training samples by gradient descent, and the trained image area detection model is obtained after the loss function converges (i.e., no longer decreases); in this embodiment, the model converges after 300 training iterations.
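A sketch of the described loss: squared errors on the box targets of positive anchors, with the width/height terms weighted by λ = 2.0, plus a β-weighted (0.1) squared score error on negative anchors. The exact grouping of terms is an assumption, since the printed formula is garbled.

```python
import numpy as np

def detection_loss(pred, target, is_positive, lam=2.0, beta=0.1):
    """pred, target: (N, 5) arrays of (x, y, w, h, score); is_positive: (N,) bools."""
    pred = np.asarray(pred, float)
    target = np.asarray(target, float)
    pos = np.asarray(is_positive, bool)
    neg = ~pos
    d = pred - target
    # Positive anchors: coordinate, size (weighted by lam), and score-to-1 terms.
    pos_loss = (d[pos, 0] ** 2 + d[pos, 1] ** 2
                + lam * (d[pos, 2] ** 2 + d[pos, 3] ** 2)
                + d[pos, 4] ** 2).sum()
    # Negative anchors: only push the predicted score toward 0.
    neg_loss = beta * (pred[neg, 4] ** 2).sum()
    return pos_loss + neg_loss
```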
In a preferred embodiment of the present invention, the generating a detection frame corresponding to the partial image to be detected in the panoramic image to be detected includes:
according to the model detection frame abscissa predicted value and the abscissa of the center point of the anchor frame corresponding to the highest prediction score, calculating the abscissa of the center point of the detection frame: X = x̂_ijk + X_ijk;
calculating the ordinate of the center point of the detection frame according to the model detection frame ordinate predicted value and the ordinate of the center point of the anchor frame corresponding to the highest prediction score: Y = ŷ_ijk + Y_ijk;
according to the model detection frame width predicted value and the anchor frame width corresponding to the highest prediction score, calculating the detection frame width: W = W_ijk · e^(ŵ_ijk);
according to the model detection frame height predicted value and the anchor frame height corresponding to the highest prediction score, calculating the detection frame height: H = H_ijk · e^(ĥ_ijk);
determining the position of the detection frame according to the abscissa of the central point of the detection frame, the ordinate of the central point of the detection frame, the width of the detection frame and the height of the detection frame;
generating a detection frame according to the position of the detection frame;
wherein x̂_ijk is the model detection frame abscissa predicted value; ŷ_ijk is the model detection frame ordinate predicted value; ŵ_ijk is the model detection frame width predicted value; ĥ_ijk is the model detection frame height predicted value; X_ijk is the abscissa of the center point of the anchor frame corresponding to the highest prediction score; Y_ijk is the ordinate of the center point of the anchor frame corresponding to the highest prediction score; W_ijk is the width of the anchor frame corresponding to the highest prediction score; H_ijk is the height of the anchor frame corresponding to the highest prediction score.
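The decoding step above, which turns the predictions at the highest-scoring anchor back into a detection frame, can be sketched as below. It inverts the assumed target encoding (center offsets, log width/height ratios); the width/height inverse is an assumption where the original formulas are garbled.

```python
import math

def decode_box(pred, anchor):
    """pred: (x, y, w, h) predicted values; anchor: (cx, cy, w, h) of the
    anchor frame with the highest prediction score.  Returns (cx, cy, w, h)."""
    x_hat, y_hat, w_hat, h_hat = pred
    X_a, Y_a, W_a, H_a = anchor
    cx = x_hat + X_a                 # detection frame center abscissa
    cy = y_hat + Y_a                 # detection frame center ordinate
    w = W_a * math.exp(w_hat)        # detection frame width
    h = H_a * math.exp(h_hat)        # detection frame height
    return (cx, cy, w, h)
```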
Illustratively, when the image area detection model to be trained is iteratively trained, 60% of the samples are trained in a strongly supervised manner and 40% of the samples are trained in a self-supervised manner.
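The 60/40 split between strongly supervised samples (first partial images with annotated rectangles) and self-supervised samples (rectangles cropped from second panoramic images) might be sketched as follows; the random-selection mechanism is an assumption, not stated in the text.

```python
import random

def split_training_samples(samples, strong_ratio=0.6, seed=0):
    """Return (strongly_supervised, self_supervised) subsets of `samples`."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_strong = int(len(shuffled) * strong_ratio)
    return shuffled[:n_strong], shuffled[n_strong:]
```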
S3, determining the position of the commodity object to be detected in the mall according to the detection frame;
specifically, the position of the commodity object to be detected in the mall can be determined from the detection frame.
By implementing this embodiment, the image area detection model is trained with both strong supervision and self-supervision, so that the model learns knowledge of the fast-moving consumer goods (FMCG) industry in mall scenes; this helps a user find, in a picture captured in the mall, the specific position of a commodity according to the commodity image to be searched for, and improves the accuracy of locating the commodity position.
Referring to fig. 2, a schematic structural diagram of a commodity area detecting apparatus according to an embodiment of the present invention includes:
the image acquisition module is used for acquiring a to-be-detected local image and a to-be-detected panoramic image which contain commodity objects;
the detection frame generation module is used for inputting the partial image to be detected and the panoramic image to be detected into a preset image area detection model so that the image area detection model matches the partial image to be detected with the panoramic image to be detected, and a detection frame corresponding to the partial image to be detected in the panoramic image to be detected is generated;
the commodity position determining module is used for determining the position of the commodity object to be detected in the mall according to the detection frame;
in this embodiment, the image acquisition module acquires a partial image to be detected and a panoramic image to be detected which contain the commodity object; the detection frame generation module inputs the partial image to be detected and the panoramic image to be detected into a preset image area detection model, so that the image area detection model matches the partial image to be detected with the panoramic image to be detected and generates a detection frame corresponding to the partial image to be detected in the panoramic image to be detected; the commodity position determining module then determines the position of the commodity object to be detected in the mall according to the detection frame. The model is trained on partially shot images containing commodities and their corresponding panoramic images to obtain the image area detection model, and the specific position of the commodity in the mall can be accurately found according to the detection frame output by the image area detection model, so that the accuracy of locating the commodity position is improved.
It should be noted that the above-described apparatus embodiments are merely illustrative; the units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the present invention, the connection relations between the modules indicate that they have communication connections, which may be specifically implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement the present invention without undue burden.
It will be clear to those skilled in the art that, for convenience and brevity, reference may be made to the corresponding process in the foregoing method embodiment for the specific working process of the apparatus described above, which is not described herein again.
Another embodiment of the present invention also provides a terminal device, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor executes the computer program to implement a commodity area detection method as described in the foregoing embodiments. The terminal equipment can be computing equipment such as a desktop computer, a notebook computer, a palm computer, a cloud server and the like. The terminal device may include, but is not limited to, a processor, a memory.
The processor may be a central processing unit (Central Processing Unit, CPU), other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, which is a control center of the terminal device, and which connects various parts of the entire terminal device using various interfaces and lines.
The memory may be used to store the computer program, and the processor implements various functions of the terminal device by running or executing the computer program stored in the memory and invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created according to the use of the terminal device, and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid state storage device.
Another embodiment of the present invention provides a computer readable storage medium, where the computer readable storage medium includes a stored computer program, where when the computer program runs, a device where the computer readable storage medium is controlled to execute a method for detecting a commodity area according to the foregoing embodiment.
The storage medium is a computer readable storage medium, and the computer program is stored in the computer readable storage medium, and when executed by a processor, the computer program can implement the steps of the above-mentioned method embodiments. Wherein the computer program comprises computer program code which may be in source code form, object code form, executable file or some intermediate form etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention, such changes and modifications are also intended to be within the scope of the invention.

Claims (10)

1. A commodity area detection method, comprising:
acquiring a local image to be detected and a panoramic image to be detected, wherein the local image to be detected comprises commodity objects;
inputting the partial image to be detected and the panoramic image to be detected into a preset image area detection model, so that the image area detection model matches the partial image to be detected with the panoramic image to be detected, and generating a detection frame corresponding to the partial image to be detected in the panoramic image to be detected;
and determining the position of the commodity object to be detected in the mall according to the detection frame.
2. The commodity area detection method according to claim 1, wherein the image area detection model comprises: the feature extraction layer, the feature association layer and the rectangular frame prediction head;
inputting the partial image to be detected and the panoramic image to be detected into a preset image area detection model, so that the image area detection model matches the partial image to be detected with the panoramic image to be detected, and generating a detection frame corresponding to the partial image to be detected in the panoramic image to be detected, comprises:
inputting the local image to be detected and the panoramic image to be detected into the feature extraction layer so that the feature extraction layer generates a first local feature image and a first panoramic feature image, and transmitting the first local feature image and the first panoramic feature image to a feature correlation layer;
the feature correlation layer performs a convolution operation on the first local feature map and the first panoramic feature map, generates a first similarity feature map of the first local feature map and the first panoramic feature map, and transmits the first similarity feature map to the rectangular frame prediction head;
and the rectangular frame prediction head performs a convolution operation on the first similarity feature map to generate a detection frame corresponding to the partial image to be detected in the panoramic image to be detected.
3. The method of claim 1, wherein the training of the image area detection model comprises:
acquiring a plurality of first partial images containing commodity objects and a first panoramic image; wherein, a first rectangular frame corresponding to the first partial image in the first panoramic image is marked in the first panoramic image;
acquiring a plurality of second panoramic images containing commodity objects, randomly selecting a rectangular area in the second panoramic images, taking the images in the rectangular area as second partial images, and taking a rectangular frame corresponding to the rectangular area as a second rectangular frame;
scaling the first panoramic image and the second panoramic image to a first preset size, scaling the first partial image and the second partial image to a second preset size, and correspondingly scaling the first rectangular frame and the second rectangular frame according to the scaling ratio of the first panoramic image and the second panoramic image;
taking the scaled first panoramic image and second panoramic image as search graphs, dividing each search graph into a plurality of grids, and generating a plurality of anchor frames with different aspect ratios by taking the center of each grid as the center in each grid to obtain the center point abscissa, the center point ordinate, the anchor frame width, the anchor frame height and the prediction score of each anchor frame;
taking the scaled first rectangular frame and the scaled second rectangular frame as target frames, and calculating the IOU of each target frame and each anchor frame according to the target frames and the anchor frames:

IOU = (A ∩ B) / (A ∪ B)

wherein A represents a target frame; B represents an anchor frame; A ∩ B represents the intersection area of the target frame and the anchor frame; A ∪ B represents the union area of the target frame and the anchor frame;
taking anchor frames with an IOU greater than or equal to 0.3 as positive samples, taking anchor frames with an IOU greater than or equal to 0 and less than 0.3 as negative samples, and respectively calculating learning target values of the positive samples and the negative samples;
performing iterative training on the image area detection model to be trained according to the positive sample and the negative sample, and obtaining the image area detection model after training is completed; and when training is performed each time, calculating a loss function value of the image area detection model to be trained according to a learning target value corresponding to the sample and a predicted value of the image area detection model to be trained, and adjusting model parameters of the image area detection model to be trained according to the loss function value.
4. A commodity area detection method according to claim 3, wherein said calculating learning target values for the positive and negative samples, respectively, comprises:
calculating a model detection frame abscissa predicted value according to the target frame center point abscissa and the anchor frame center point abscissa;
calculating a predicted value of the ordinate of the model detection frame according to the ordinate of the center point of the target frame and the ordinate of the center point of the anchor frame;
calculating a model detection frame width predicted value according to the target frame width and the anchor frame width;
calculating a model detection frame height predicted value according to the target frame height and the anchor frame height;
taking the model detection frame abscissa predicted value, the model detection frame ordinate predicted value, the model detection frame width predicted value, the model detection frame height predicted value and the prediction score of the positive sample as the learning target value of the positive sample; wherein the predictive score of the positive sample is 1;
taking the predictive score of the negative sample as a learning target value of the negative sample; wherein the negative sample has a predictive score of 0.
5. A method for detecting a commodity area according to claim 3, wherein the loss function of the image area detection model to be trained is specifically:
Loss = Σ_{i,j,k} I_ij^obj [ (x̂_ijk − x_ijk)² + (ŷ_ijk − y_ijk)² + λ(ŵ_ijk − w_ijk)² + λ(ĥ_ijk − h_ijk)² + (p̂_ijk − 1)² ] + β Σ_{i,j,k} I_ij^noobj (p̂_ijk)²

wherein I_ij^obj indicates that the grid in row i, column j is a positive sample; I_ij^noobj indicates that the grid in row i, column j is a negative sample; λ is a fixed value; β is a fixed value; x̂_ijk is the model detection frame abscissa predicted value; ŷ_ijk is the model detection frame ordinate predicted value; ŵ_ijk is the model detection frame width predicted value; ĥ_ijk is the model detection frame height predicted value; p̂_ijk is the model detection frame prediction score.
6. A commodity area detection method according to claim 3, wherein said generating a detection frame for said partial image to be detected corresponding to said panoramic image to be detected comprises:
calculating the central point abscissa of the detection frame according to the model detection frame abscissa predicted value and the anchor frame central point abscissa corresponding to the highest predicted score;
calculating the ordinate of the central point of the detection frame according to the predicted value of the ordinate of the model detection frame and the ordinate of the central point of the anchor frame corresponding to the highest predicted score;
calculating the detection frame width according to the model detection frame width predicted value and the anchor frame width corresponding to the highest prediction score;
calculating the height of the detection frame according to the predicted value of the height of the model detection frame and the height of the anchor frame corresponding to the highest predicted score;
determining the position of the detection frame according to the abscissa of the central point of the detection frame, the ordinate of the central point of the detection frame, the width of the detection frame and the height of the detection frame;
and generating a detection frame according to the position of the detection frame.
7. The commodity area detection method according to claim 1, wherein 60% of the samples are trained in a strong supervised training manner and 40% of the samples are trained in a self-supervised training manner when the image area detection model to be trained is iteratively trained.
8. A commodity area detecting apparatus, comprising:
the image acquisition module is used for acquiring a to-be-detected local image and a to-be-detected panoramic image which contain commodity objects;
the detection frame generation module is used for inputting the partial image to be detected and the panoramic image to be detected into a preset image area detection model so that the image area detection model matches the partial image to be detected with the panoramic image to be detected, and a detection frame corresponding to the partial image to be detected in the panoramic image to be detected is generated;
and the commodity position determining module is used for determining the position of the commodity object to be detected in the market according to the detection frame.
9. A terminal device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing a commodity area detection method according to any one of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored computer program, wherein the computer program, when run, controls a device in which the computer readable storage medium is located to perform a commodity area detection method according to any one of claims 1 to 7.
CN202311524234.1A 2023-11-15 2023-11-15 Commodity area detection method, commodity area detection device, terminal equipment and medium Pending CN117593547A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311524234.1A CN117593547A (en) 2023-11-15 2023-11-15 Commodity area detection method, commodity area detection device, terminal equipment and medium

Publications (1)

Publication Number Publication Date
CN117593547A true CN117593547A (en) 2024-02-23

Family

ID=89915912



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination