Background
In the retail product industry, monitoring how Stock Keeping Units (SKUs) are displayed and placed in offline sales channels, and how offline promotional materials are displayed, has always been an important part of sales operations.
The traditional means of monitoring sales channels is to dispatch a service representative to check the goods manually and confirm whether the display of promotional materials is qualified. In recent years, with the development of image recognition technology, technical means centered on image detection have emerged to assist in taking inventory of articles or detecting materials.
Tallying by manual checking wastes time and energy, makes accuracy hard to guarantee and cheating hard to prevent, and introduces subjectivity into the judgement of material compliance. Although technical means centered on image detection can greatly improve working efficiency, there is still room for improvement: current image-based product detection usually focuses only on the detected articles and lacks analysis of the whole scene, so false detections or missed detections easily occur in the identification results.
Disclosure of Invention
In view of the above, an object of the present application is to provide an article identification method, a model training method, an apparatus and an electronic device, so as to solve the problem that conventional image detection methods are prone to false detection and missed detection.
The embodiment of the application is realized as follows:
in a first aspect, an embodiment of the present application provides an article identification method, including: acquiring an article image to be identified; predicting the category and the instance to which each pixel point in the article image belongs by using a pre-trained panoramic segmentation model, and giving the category id and the instance id of each pixel point; and determining the category and the number of the articles in the article image based on the category id and the instance id of each pixel point. In the embodiment of the application, the image is analyzed as a whole in a panoramic segmentation manner and each pixel point is given a category id and an instance id, which avoids missed detection and false detection (for example, a single pixel point is prevented from being divided into a plurality of different instances). Not only can the countable instances and the categories of the articles in the image be detected, but more complete information is also obtained: the classification of the background regions of the image can be given as well.
With reference to a possible implementation manner of the embodiment of the first aspect, determining the category and the number of the items in the item image based on the category id and the instance id to which each pixel point belongs includes: removing pixel points whose category id is the same as the category id to which the background belongs; and determining the category and the number of the articles in the article image based on the category id and the instance id of each remaining pixel point. In the embodiment of the application, when the category and the number of the articles in the article image are determined, the pixel points whose category id is the same as that of the background are removed first, and the category and the number of the articles are then determined from the remaining pixel points, so that the category and the number of the articles in the article image can be determined more quickly and accurately.
With reference to a possible implementation manner of the embodiment of the first aspect, determining the category and the number of the items in the item image based on the category id and the instance id to which each remaining pixel point belongs includes: classifying the remaining pixel points with the same category id into one category to obtain the categories of the articles in the article image; and classifying the pixel points with the same instance id among the pixel points with the same category id into one class to obtain the number of articles under that category id. In the embodiment of the application, the pixel points with the same category id are first grouped together, and the pixel points with the same instance id within each such group are then grouped together, so that the number of articles under each category id can be obtained; the final output is thus the number of articles per category rather than a single count for the whole picture.
With reference to one possible implementation manner of the embodiment of the first aspect, acquiring an image of an item to be identified includes: acquiring an item image of an item to be identified that is placed on a shelf. In the embodiment of the application, acquiring the article image of the article to be identified placed on the shelf realizes automatic detection of the articles on the shelf, thereby avoiding the problems of manual checking.
With reference to a possible implementation manner of the embodiment of the first aspect, before predicting the category and the instance to which each pixel point in the item image belongs by using a pre-trained panorama segmentation model, the method further includes: acquiring a plurality of scene images related to the application scene of the article image; respectively labeling the background in each scene image and the category and instance to which each article belongs to obtain training images, wherein during labeling, articles of the same category are labeled the same, and different instances within the same category receive different instance labels; and training a panoramic segmentation model with the training images to obtain the trained panoramic segmentation model. In the embodiment of the application, a plurality of scene images related to the application scene of the article image to be recognized are obtained, the articles and the background in these images are labeled, and the labeled data is used to train the panoramic segmentation model, so that the trained model is more accurate and the object recognition results are more reliable.
In a second aspect, an embodiment of the present application further provides a model training method, including: acquiring a plurality of scene images related to a retail scene; respectively labeling the background in each scene image and the category and instance to which each article belongs to obtain training images, wherein during labeling, articles of the same category are labeled the same, and different instances within the same category receive different instance labels; and training an initial panorama segmentation model with the training images to obtain a trained panorama segmentation model.
In a third aspect, an embodiment of the present application further provides an article identification device, including: the device comprises an acquisition module and a processing module; the acquisition module is used for acquiring an article image to be identified; the processing module is used for predicting the category and the example of each pixel point in the article image by using a pre-trained panoramic segmentation model and giving the category id and the example id of each pixel point; and the method is also used for determining the category and the number of the articles in the article image based on the category id and the instance id to which each pixel point belongs.
In a fourth aspect, an embodiment of the present application further provides a model training apparatus, including: an acquisition module, a labeling module and a training module. The acquisition module is used for acquiring a plurality of scene images related to retail scenes; the labeling module is used for labeling the background in each scene image and the category and instance to which each article belongs, respectively, to obtain training images, wherein during labeling, articles of the same category are labeled the same, and different instances within the same category receive different instance labels; and the training module is used for training the initial panorama segmentation model with the training images to obtain a trained panorama segmentation model.
In a fifth aspect, an embodiment of the present application further provides an electronic device, including: a memory and a processor, the processor coupled to the memory; the memory is used for storing programs; the processor is configured to invoke a program stored in the memory to perform the method according to the first aspect and/or any possible implementation manner of the first aspect, or to perform the method according to the second aspect.
In a sixth aspect, embodiments of the present application further provide a storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the method provided in the foregoing first aspect and/or any one of the possible implementation manners of the first aspect, or to perform the method provided in the foregoing second aspect.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, relational terms such as "first," "second," and the like may be used solely in the description herein to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Further, the term "and/or" in the present application merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone.
In view of the fact that existing image detection methods output only isolated product detection results and lack analysis of the whole scene, which easily leads to missed detection and false detection, the embodiment of the application provides an article identification method based on panoramic segmentation. Compared with existing image detection technology, the article identification method provided by the embodiment of the application can output the number of articles in different scenes (different article categories) within the same picture, rather than only the number of articles in the whole picture. Secondly, promotional materials usually appear in large-area forms such as posters and stickers, which makes them difficult to handle with detection techniques; panoramic segmentation is better suited to such materials because it can give the category of the region to which they belong. Finally, the method performs pixel-by-pixel category prediction and instance prediction on the input image, and this information can further be used for picture correction, picture stitching, and the like.
The following describes a training process of a panorama segmentation model according to the present application, and the following describes a model training method provided in an embodiment of the present application with reference to fig. 1.
Step S101: a plurality of scene images associated with a retail scene are acquired.
When training the model, the image data required for training needs to be acquired, for example, a plurality of scene images related to a retail scene. For the identification of articles in a retail scene, the required image data may be images of articles placed on shelves; that is, photographing articles placed on shelves yields the required image data.
Step S102: and respectively labeling the background in each scene image and the category and the example to which the article belongs to obtain a training image.
After the plurality of scene images related to the retail scene are obtained, the background in each scene image and the category and instance to which each article belongs are labeled respectively to obtain training images. During labeling, articles of the same category are labeled the same, articles of different categories are labeled differently, and different instances within the same category receive different instance labels.
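To make the labeling scheme concrete, the per-pixel annotation described above can be represented as two aligned id maps, one for categories and one for instances. The ids below are invented for illustration, and the packed-id convention is borrowed from the COCO panoptic format rather than prescribed by this application:

```python
import numpy as np

# Hypothetical label maps for a 4x4 training image. Assumed ids:
# 0 = shelf background, 1 = bottled drink, 2 = poster.
category_map = np.array([
    [0, 0, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
])
# Instance ids distinguish the two separate bottles of category 1;
# background-like regions (shelf, poster) share instance id 0.
instance_map = np.array([
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [2, 2, 2, 2],
])

# One common convention (as in the COCO panoptic format) packs both ids
# into a single integer per pixel, so the annotation fits in one image.
OFFSET = 1000
panoptic_label = category_map * OFFSET + instance_map

# Decoding recovers both maps exactly.
assert np.array_equal(panoptic_label // OFFSET, category_map)
assert np.array_equal(panoptic_label % OFFSET, instance_map)
```

This encoding directly enforces the labeling rule above: pixels of the same category share a category id, and different instances within that category differ only in their instance id.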
As an optional implementation manner, before the labeling, the image may be further preprocessed, for example, brightness adjustment, denoising (deblurring), rotation, and the like are performed, and then the background in each of the preprocessed scene images and the category and the example to which the object belongs are labeled to obtain a training image.
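A minimal sketch of such preprocessing, assuming single-channel (grayscale) images; the brightness factor and the 3x3 mean filter used as a stand-in for real denoising are illustrative choices, not prescribed by the application:

```python
import numpy as np

def preprocess(image: np.ndarray, brightness: float = 1.1, rotate_k: int = 0) -> np.ndarray:
    """Illustrative preprocessing for a 2-D grayscale image:
    brightness adjustment, light denoising, and 90-degree rotation."""
    img = image.astype(np.float32) * brightness      # brightness adjustment
    img = np.clip(img, 0, 255)
    # 3x3 mean filter as a simple stand-in for denoising.
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    img = sum(padded[di:di + h, dj:dj + w]
              for di in range(3) for dj in range(3)) / 9.0
    img = np.rot90(img, k=rotate_k)                  # optional rotation
    return img.astype(np.uint8)
```

In practice a library such as OpenCV or PIL would be used for these steps; the sketch only shows where preprocessing sits relative to labeling.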
Step S103: and training an initial panorama segmentation model by using the training image to obtain a trained panorama segmentation model.
After the training images are obtained, the initial panorama segmentation model is trained with them to obtain the trained panorama segmentation model. The initial model may be any existing panoptic segmentation model, such as the Panoptic Feature Pyramid Network (Panoptic FPN). The specific training process is well known to those skilled in the art and will not be described here.
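As one possible concrete configuration (a sketch, not the application's prescribed implementation), a Panoptic FPN can be fine-tuned with the Detectron2 library. The dataset name `retail_panoptic_train` is a hypothetical dataset assumed to be registered beforehand in COCO panoptic format, and the class counts and solver settings are illustrative:

```python
# Configuration sketch: fine-tuning a Panoptic FPN with Detectron2.
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-PanopticSegmentation/panoptic_fpn_R_50_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-PanopticSegmentation/panoptic_fpn_R_50_3x.yaml")  # COCO-pretrained init
cfg.DATASETS.TRAIN = ("retail_panoptic_train",)  # hypothetical registered dataset
cfg.DATASETS.TEST = ()
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 20    # countable "thing" classes (SKUs), illustrative
cfg.MODEL.SEM_SEG_HEAD.NUM_CLASSES = 5  # "stuff" classes, e.g. shelf/floor/wall/poster
cfg.SOLVER.MAX_ITER = 3000
cfg.SOLVER.BASE_LR = 0.002

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```

Any panoptic segmentation framework would do equally well; the point is that initializing from COCO-pretrained weights and retraining on labeled retail-scene images realizes the step described above.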
After the panoramic segmentation model is obtained through training, the pre-trained panoramic segmentation model can be applied to identify the article in the article image to be identified, and the article identification method provided by the embodiment of the present application will be described below with reference to fig. 2.
Step S201: an image of an item to be identified is acquired.
An item image to be identified is acquired, for example, an item image of an item to be identified that is placed on a shelf. In an alternative implementation, a user photographs the scene to be identified and uploads the picture, and a server acquires the image of the object to be identified uploaded by the user.
Step S202: and predicting the category and the instance of each pixel point in the article image by using a pre-trained panoramic segmentation model, and giving the category id and the instance id of each pixel point.
After the article image to be identified is obtained, the category and the instance to which each pixel point in the article image belongs are predicted using the pre-trained panoramic segmentation model, and each pixel point is given its category id and instance id. The pre-trained panoramic segmentation model may be one obtained by the model training method described above; that is, before predicting the category and the instance to which each pixel point in the article image belongs using the pre-trained panorama segmentation model, the method further includes: acquiring a plurality of scene images related to the application scene of the article image; respectively labeling the background in each scene image and the category and instance to which each article belongs to obtain training images, wherein during labeling, articles of the same category are labeled the same, and different instances within the same category receive different instance labels; and training a panoramic segmentation model with the training images to obtain the trained panoramic segmentation model.
Given an image to be identified, the panoramic segmentation model predicts, for each pixel point in the image, the id of the category to which it belongs and the id of the instance to which it belongs. This is equivalent to converting the problem of identifying objects in a scene into a panoramic segmentation task instead of a conventional detection task. In terms of model structure, the panoramic segmentation model can be considered to consist of an instance segmentation prediction branch, a semantic segmentation prediction branch, and a result fusion branch. Specifically, in the context of item identification, the instance segmentation branch predicts the segmentation result of each item; the semantic segmentation branch predicts the foreground (items), the background (non-items) and the specific background category; and the final output is obtained through the result fusion part. In terms of output form, each pixel point in the panoramic segmentation task corresponds to exactly one instance (single article) id, and the category of the pixel point is the category to which that instance belongs; in common detection tasks, by contrast, a single pixel point can belong to a plurality of different instances, so adopting panoramic segmentation avoids the situation where one object is detected multiple times. It should also be mentioned that the regions treated as background by common detection algorithms, such as shelf areas, floors, walls and posters, are classified as well in the panoramic segmentation task; this information is also part of the sales channel information and is usually of interest to customers. Furthermore, since the background is also segmented, an object that is segmented as foreground but not classified can be regarded as a missed detection.
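The fusion step described above can be illustrated with toy arrays (all ids below are invented for illustration): the semantic branch decides which pixels are countable "things", the instance branch supplies instance ids, and a foreground pixel left without an instance id is flagged as a potential missed detection:

```python
import numpy as np

# Semantic branch output: per-pixel category id.
# Assumed ids: 0 = shelf ("stuff"), 1 = bottled drink ("thing").
semantic = np.array([
    [1, 0, 1, 1],
    [1, 1, 1, 1],
])
# Instance branch output: per-pixel instance id, 0 = no instance assigned.
instances = np.array([
    [0, 0, 2, 2],
    [1, 1, 2, 2],
])

THING_IDS = np.array([1])          # which category ids count as "things"
is_thing = np.isin(semantic, THING_IDS)

# Fusion: each pixel ends up with exactly one (category id, instance id)
# pair; stuff pixels all share instance id 0, so no pixel can belong to
# two different instances (unlike overlapping detection boxes).
fused_instance = np.where(is_thing, instances, 0)

# A foreground ("thing") pixel with no instance id indicates a potential
# missed detection, which is only observable because the background is
# segmented too.
missed = is_thing & (instances == 0)
```

Here pixel (0, 0) is semantically foreground but received no instance id, so it is flagged in `missed`; real fusion logic (e.g. in Panoptic FPN) additionally resolves conflicts by confidence, which this sketch omits.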
According to actual needs, the obtained pixel-by-pixel prediction results can be further processed: the categories of the articles that appear and the number of articles in each category are easily obtained from the category labels and the distinct instance ids, and scene information beyond the articles, judgements of potential missed detections, and the like can also be output.
Step S203: and determining the category and the number of the articles in the article image based on the category id and the instance id of each pixel point.
After the category id and the instance id of each pixel point are obtained, the category and the number of the articles in the article image can be determined from them. In one embodiment, the process may be to remove pixel points whose category id is the same as the category id to which the background belongs (this id may be the id designated for the background during labeling), and then determine the category and the number of the articles in the article image based on the category id and the instance id of each remaining pixel point. In this embodiment, the background portion of the article image needs to be excluded when determining the number and category of the articles, so the pixel points sharing the background's category id are removed first, and the category and number of the articles are then determined from the remaining pixel points.
The process of determining the category and the number of the articles in the article image based on the category id and the instance id to which each remaining pixel point belongs may be: classifying the remaining pixel points with the same category id into one category to obtain the categories of the articles in the article image; and classifying the pixel points with the same instance id among the pixel points with the same category id into one class to obtain the number of articles under that category id.
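The two grouping steps above can be sketched directly on the id maps. All ids are illustrative, and category id 0 is assumed to be the one designated for the background during labeling:

```python
import numpy as np

# Per-pixel outputs of the panoptic segmentation model (illustrative ids).
category_map = np.array([
    [0, 0, 1],
    [1, 2, 2],
    [1, 1, 2],
])
instance_map = np.array([
    [0, 0, 1],
    [2, 1, 1],
    [2, 2, 1],
])
BACKGROUND_ID = 0  # assumed category id of the background

# Step 1: remove pixel points whose category id equals the background's.
keep = category_map != BACKGROUND_ID
cats, insts = category_map[keep], instance_map[keep]

# Step 2: group the remaining pixels by category id, then count the
# distinct instance ids within each group to get the article count
# per category rather than one count for the whole picture.
counts = {int(c): len(np.unique(insts[cats == c])) for c in np.unique(cats)}
# counts -> {1: 2, 2: 1}: two articles of category 1, one of category 2
```

The output is a per-category count, which matches the point made earlier that the method reports article numbers per scene/category instead of a single total.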
Optionally, after the category id and the instance id of each pixel point are obtained and the category and the number of the articles in the article image are predicted, these results can be returned to the user terminal so that the user terminal can display and record them. In addition, the user terminal can integrate further functions to meet users' needs, such as statistics on, screening of, and complaints about the results.
In the embodiment of the application, the panoramic segmentation model gives the category id and the instance id of each pixel in the picture to be detected, so that the whole scene of the picture is analyzed: not only are isolated product detection results output, but the number of articles in different scenes (i.e., different article categories) can also be output, and the false detection caused by the same article being detected multiple times is avoided.
The embodiment of the present application further provides a model training apparatus 100, as shown in fig. 3. The model training apparatus 100 includes: an acquisition module 110, a labeling module 120, and a training module 130.
An acquisition module 110 is configured to acquire a plurality of scene images related to a retail scene.
And the labeling module 120 is configured to label the background in each scene image and the category and instance to which each article belongs, respectively, to obtain training images, where during labeling, articles of the same category are labeled the same and different instances within the same category receive different instance labels.
And a training module 130, configured to train the initial panorama segmentation model with the training image to obtain a trained panorama segmentation model.
The model training apparatus 100 provided in the embodiment of the present application has the same implementation principle and the same technical effect as those of the foregoing method embodiments, and for the sake of brief description, reference may be made to the corresponding contents in the foregoing method embodiments for the parts of the embodiment that are not mentioned in the description of the present application.
The embodiment of the present application further provides an article identification apparatus 200, as shown in fig. 4. The article recognition apparatus 200 includes: an acquisition module 210 and a processing module 220.
The acquiring module 210 is configured to acquire an image of an object to be identified.
The processing module 220 is configured to predict a category and an instance to which each pixel point in the article image belongs by using a pre-trained panorama segmentation model, and give a category id and an instance id to which each pixel point belongs; and the method is also used for determining the category and the number of the articles in the article image based on the category id and the instance id to which each pixel point belongs.
Optionally, the obtaining module 210 is configured to obtain an item image of the item to be identified, which is placed on the shelf.
Optionally, the processing module 220 is configured to remove pixel points whose category id is the same as the category id to which the background belongs; and determine the category and the number of the articles in the article image based on the category id and the instance id of each remaining pixel point.
Optionally, the processing module 220 is configured to classify the pixels with the same category id in the remaining pixels into one category, so as to obtain the category of the article in the article image; and classifying the pixel points with the same instance id in the pixel points with the same type id into one class to obtain the number of the articles under the type id.
Optionally, the article recognition apparatus 200 further comprises a model training module configured to: acquire a plurality of scene images related to the application scene of the article image; label the background in each scene image and the category and instance to which each article belongs, respectively, to obtain training images, wherein during labeling, articles of the same category are labeled the same, and different instances within the same category receive different instance labels; and train a panoramic segmentation model with the training images to obtain the trained panoramic segmentation model.
The article identification device 200 provided in the embodiment of the present application has the same implementation principle and the same technical effect as those of the foregoing method embodiments, and for the sake of brief description, reference may be made to the corresponding contents in the foregoing method embodiments for the parts of the embodiment that are not mentioned in the description.
As shown in fig. 5, fig. 5 is a schematic block diagram of an electronic device 300 according to an embodiment of the present application. The electronic device 300 includes: a transceiver 310, a memory 320, a communication bus 330, and a processor 340.
The elements of the transceiver 310, the memory 320 and the processor 340 are electrically connected to each other, directly or indirectly, to realize data transmission or interaction. For example, these components may be electrically coupled to each other via one or more communication buses 330 or signal lines. The transceiver 310 is used for transceiving data. The memory 320 is used for storing a computer program, for example a software functional module such as the model training apparatus 100 shown in fig. 3 or the article identification apparatus 200 shown in fig. 4. The model training apparatus 100 or the article recognition apparatus 200 includes at least one software function module, which may be stored in the memory 320 in the form of software or firmware or solidified in an Operating System (OS) of the electronic device 300. The processor 340 is used for executing executable modules stored in the memory 320, such as the software functional modules or computer programs included in the model training apparatus 100. For example, the processor 340 is configured to obtain a plurality of scene images related to a retail scene; label the background in each scene image and the category and instance to which each article belongs, respectively, to obtain training images, wherein during labeling, articles of the same category are labeled the same, and different instances within the same category receive different instance labels; and train an initial panorama segmentation model with the training images to obtain a trained panorama segmentation model.
The processor 340, when being configured to execute an executable module stored in the memory 320, such as a software functional module or a computer program included in the article identification apparatus 200, is configured to obtain an image of an article to be identified; predicting the category and the instance of each pixel point in the article image by using a pre-trained panoramic segmentation model, and giving the category id and the instance id of each pixel point; and determining the category and the number of the articles in the article image based on the category id and the instance id of each pixel point.
The Memory 320 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
Processor 340 may be an integrated circuit chip having signal processing capabilities. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed by it. A general-purpose processor may be a microprocessor, or the processor 340 may be any conventional processor or the like.
The electronic device 300 includes, but is not limited to, a computer, a server, and the like. The electronic device may also be connected to a user terminal through a network, wherein at least one Application (APP) is installed in the user terminal and corresponds to the electronic device 300, so that the electronic device 300 provides services for a user. For example, receiving an article image to be identified sent by a user through a user terminal, predicting the category and the instance to which each pixel point in the article image belongs by using a pre-trained panoramic segmentation model, and giving the category id and the instance id to which each pixel point belongs; and finally, returning the category id and the instance id of each pixel point and/or the category and the number of the articles in the article image to the user terminal.
The present embodiment also provides a non-volatile computer-readable storage medium (hereinafter, referred to as a storage medium), where the storage medium stores a computer program, and the computer program is executed by a computer, such as the electronic device 300, to execute the article identification method and/or the model training method described above.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the portions thereof that substantially contribute to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a notebook computer, a server, or an electronic device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.