CN111507253B - Display article auditing method and device based on artificial intelligence - Google Patents


Info

Publication number
CN111507253B
Authority
CN
China
Prior art keywords
display
scene
display scene
type
candidate region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010300775.6A
Other languages
Chinese (zh)
Other versions
CN111507253A (en)
Inventor
郭卉
黄飞跃
袁豪磊
郭晓威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010300775.6A priority Critical patent/CN111507253B/en
Publication of CN111507253A publication Critical patent/CN111507253A/en
Application granted granted Critical
Publication of CN111507253B publication Critical patent/CN111507253B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an artificial intelligence-based display article auditing method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises the following steps: determining a plurality of display scene candidate regions included in a picture to be identified, and extracting image features of each display scene candidate region; identifying the type and position of a display scene from each display scene candidate region based on the image features of that candidate region; performing article identification processing corresponding to the type of the display scene, based on the image features of the region corresponding to the position of the display scene, so as to determine the types and positions of the articles displayed in the display scene; and executing an auditing operation based on the types and positions of the articles in the display scene to obtain an auditing result for the display scene. According to the invention, the articles in a picture can be accurately identified, so that the identified articles can be automatically audited.

Description

Display article auditing method and device based on artificial intelligence
Technical Field
The present invention relates to artificial intelligence technologies, and in particular to an artificial intelligence-based display article auditing method, apparatus, electronic device, and computer-readable storage medium.
Background
Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence thus studies the design principles and implementation methods of various intelligent machines, enabling machines to perceive, reason, and make decisions.
Artificial intelligence is a comprehensive discipline involving a wide range of fields, covering both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like.
Computer vision, an important part of artificial intelligence software technology, has developed rapidly in recent years. Image recognition is an important branch of computer vision: various kinds of pictures can be recognized through image recognition technology, such as face recognition, animal and plant species recognition, article recognition, and the like. However, such recognition presumes either that the scene of the picture is known or that the scene has no influence on the target object to be recognized. When the scene does influence the target object, it is difficult to accurately identify both the scene of the picture and the target objects within it at the same time.
Disclosure of Invention
The embodiments of the present invention provide an artificial intelligence-based display article auditing method, apparatus, electronic device, and computer-readable storage medium, which can accurately identify articles in pictures and automatically audit the identified articles.
The technical scheme of the embodiment of the invention is realized as follows:
the embodiment of the invention provides an artificial intelligence-based display article auditing method, which comprises the following steps:
determining a plurality of display scene candidate areas included in the picture to be identified, and extracting image features of each display scene candidate area;
identifying a type of display scene and a position of the display scene from each of the display scene candidate regions based on the image features of each display scene candidate region;
performing an article identification process corresponding to a type of the display scene based on image features of an area corresponding to the position of the display scene to determine the type of the article displayed in the display scene and the position of the article;
and executing auditing operation based on the type and the position of the object in the display scene to obtain an auditing result of the display scene.
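The claimed steps form a two-stage detection pipeline followed by an audit. A minimal Python sketch of that flow is given below; every function name, data field, and value here is an illustrative assumption (real detection models would replace the stubs), not part of the claimed method:

```python
# Sketch of the claimed pipeline: scene proposal -> scene classification ->
# scene-conditioned item detection -> audit. All names/values are hypothetical.

def propose_scene_regions(picture):
    # Step 1a: enumerate candidate regions that may contain a display scene.
    return [{"box": (0, 0, 100, 80)}, {"box": (10, 5, 90, 70)}]

def classify_scene(region):
    # Step 1b: determine the scene type and refined position of a candidate.
    return {"type": "shelf", "box": region["box"]}

def detect_items(picture, scene):
    # Step 2: item recognition restricted to the scene's area, conditioned
    # on the scene type ("shelf" here).
    return [{"type": "cola", "box": (12, 8, 20, 18)}]

def audit(scene, items):
    # Step 3: auditing operation based on item types and positions.
    return {"scene": scene["type"], "item_count": len(items)}

def audit_picture(picture):
    regions = propose_scene_regions(picture)
    scene = classify_scene(regions[0])
    items = detect_items(picture, scene)
    return audit(scene, items)

print(audit_picture(None))  # {'scene': 'shelf', 'item_count': 1}
```

The key structural point is that `detect_items` receives the scene produced by the first stage, which is what lets item recognition be "corresponding to the type of the display scene".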
The embodiment of the invention provides an artificial intelligence-based display article auditing device, which comprises the following components:
The display scene detection module is used for determining a plurality of display scene candidate areas included in the picture to be identified and extracting image characteristics of each display scene candidate area;
a display scene locating module, configured to identify a type of display scene and a position of the display scene from each display scene candidate region based on the image features of each display scene candidate region;
the article identification module is used for carrying out article identification processing corresponding to the type of the display scene based on the image characteristics of the area corresponding to the position of the display scene so as to determine the type of the articles displayed in the display scene and the position of the articles;
and the auditing module is used for executing auditing operation based on the type and the position of the object in the display scene to obtain an auditing result of the display scene.
In the above scheme, the display scene detection module is configured to:
dividing the picture to be identified into a plurality of subareas;
determining a plurality of sets of sub-regions of the plurality of sub-regions that satisfy a similar condition for at least one dimension, wherein the type of dimension includes size, texture, edge, and color;
combining each group of subareas in the plurality of groups of subareas respectively to correspondingly obtain a plurality of display scene candidate areas;
And carrying out convolution operation on each display scene candidate region to obtain image features comprising at least one of texture, edge and color.
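The divide-and-merge proposal step above resembles selective-search region proposal. A minimal sketch follows, merging sub-regions by a single similarity dimension (mean colour) out of the four the patent lists; the data layout, threshold, and values are illustrative assumptions:

```python
# Minimal sketch of the divide-and-merge candidate-region step, using colour
# similarity only (the patent also lists size, texture and edge similarity).

def union(a, b):
    # Bounding box covering both input boxes (x1, y1, x2, y2).
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def candidate_regions(subregions, colour_threshold=30.0):
    # subregions: list of {"box": (x1, y1, x2, y2), "colour": mean grey level}.
    candidates = []
    for i, a in enumerate(subregions):
        for b in subregions[i + 1:]:
            if abs(a["colour"] - b["colour"]) <= colour_threshold:
                candidates.append(union(a["box"], b["box"]))
    return candidates

subs = [
    {"box": (0, 0, 10, 10), "colour": 100.0},
    {"box": (10, 0, 20, 10), "colour": 110.0},  # similar to the first
    {"box": (0, 10, 10, 20), "colour": 200.0},  # dissimilar to both
]
print(candidate_regions(subs))  # [(0, 0, 20, 10)]
```

In a real system the convolution mentioned above would then compute texture/edge/colour features over each merged candidate region; here the merge criterion itself stands in for that feature extraction.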
The display scene positioning module is used for:
classifying the display scene candidate areas based on the image characteristics of each display scene candidate area to obtain probabilities of a plurality of candidate scene types corresponding to the display scene candidate areas, and determining the candidate scene type corresponding to the maximum probability as the display scene type included in the display scene candidate areas;
and carrying out bounding box regression processing on the display scene candidate region based on the image characteristics of the display scene candidate region to obtain the position of a bounding box of the display scene included in the display scene candidate region as the position of the display scene.
The article identification module is used for: determining a plurality of subareas from the areas corresponding to the positions of the display scenes to serve as article candidate areas;
classifying the item candidate region based on the image characteristics of the item candidate region to obtain the type of the item included in the item candidate region, wherein the type of the item corresponds to the type of the display scene;
And carrying out bounding box regression processing on the item candidate region based on the image characteristics of the item candidate region to obtain the position of a bounding box included in the item candidate region as the position of the item included in the item candidate region.
The auditing module is used for:
determining a longitudinal distance between any two items based on the ordinate in the position of each item in the display scene;
identifying any two items whose longitudinal distance does not exceed a longitudinal distance threshold as being in the same display layer, and identifying any two items whose longitudinal distance is greater than the longitudinal distance threshold as being in a different display layer;
generating a display layer table based on the identified display layer and the items in the display layer;
based on the type and the position of the objects in the display layer, the distribution condition of the objects of different types in the display layer table is determined to be used as a display layer auditing result of the display scene.
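The layer-grouping rule above (ordinate distance compared against a threshold) can be sketched as follows. The patent describes pairwise comparison of any two items; this sketch uses a simpler sorted single-pass chaining that realizes the same threshold rule, and all field names and values are illustrative:

```python
# Sketch of grouping detected items into display layers: items whose centre
# ordinates differ by at most the threshold are placed in the same layer.

def group_into_layers(items, y_threshold=20):
    # items: list of {"type": ..., "y": centre ordinate}; field names assumed.
    layers = []
    for item in sorted(items, key=lambda it: it["y"]):
        if layers and item["y"] - layers[-1][-1]["y"] <= y_threshold:
            layers[-1].append(item)   # within threshold: same display layer
        else:
            layers.append([item])     # beyond threshold: start a new layer
    return layers

items = [
    {"type": "cola", "y": 100},
    {"type": "soda", "y": 108},
    {"type": "juice", "y": 180},
]
layers = group_into_layers(items)
print([[it["type"] for it in layer] for layer in layers])
# [['cola', 'soda'], ['juice']]
```

The resulting list of layers corresponds to the "display layer table"; tallying item types per layer yields the distribution used as the display layer auditing result.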
The auditing module is further used for:
the following is performed for each of the display layers:
determining a lateral distance between any two items in the display layer based on an abscissa in the position of any two items in the display layer;
when the transverse distance does not exceed the transverse distance threshold, identifying any two articles in the display layer as being on the same row surface, and when the transverse distance exceeds the transverse distance threshold, identifying them as being on different row surfaces;
generating a row surface table based on the identified row surfaces and the items in each row surface;
and determining the distribution condition of the different types of articles in the row surface table based on the types and the positions of the articles in the row surface, and taking the distribution condition as a row surface auditing result of the display scene.
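The row surface (facing) grouping mirrors the layer grouping one level down, using the abscissa instead of the ordinate. A minimal sketch under the same assumptions (single-pass chaining, illustrative field names and threshold):

```python
# Sketch of splitting one display layer into row surfaces ("facings") by
# horizontal distance between item centres.

def group_into_faces(layer, x_threshold=15):
    # layer: items of one display layer, each {"type": ..., "x": abscissa}.
    faces = []
    for item in sorted(layer, key=lambda it: it["x"]):
        if faces and item["x"] - faces[-1][-1]["x"] <= x_threshold:
            faces[-1].append(item)    # within threshold: same row surface
        else:
            faces.append([item])      # beyond threshold: new row surface
    return faces

layer = [{"type": "cola", "x": 10},
         {"type": "cola", "x": 18},
         {"type": "soda", "x": 60}]
print([[it["type"] for it in f] for f in group_into_faces(layer)])
# [['cola', 'cola'], ['soda']]
```

Running this per display layer produces the row surface table, whose per-type distribution serves as the row surface auditing result.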
The auditing module is further used for:
the following is performed for each type of item in the display scenario:
determining a corresponding imaging area in the picture to be identified based on the position of the object, and determining a plurality of material candidate areas corresponding to the type of the display scene from the imaging area;
and identifying the type of the material associated with the type of the article and the position of the material from each material candidate area as a material auditing result of the display scene.
The auditing module is further used for:
determining the total number and position distribution of different types of articles based on the type and position of each article in the display scene;
Comparing the total number and the position distribution situation with the display total number and the distribution situation appointed by different types of articles in the auditing rule to obtain the differences of the types, the number and the distribution situation of the articles with quantity missing in the display scene, and taking the differences as the missing auditing result of the display scene.
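The missing-quantity audit above reduces to comparing per-type counts of detected articles against the display totals specified by an auditing rule. A sketch, with the rule format and field names assumed for illustration:

```python
from collections import Counter

# Sketch of the out-of-stock (missing-quantity) audit: count detected items
# per type and report the shortfall against the audit rule's required totals.

def missing_audit(detected_items, rule):
    # rule: required display totals per item type, e.g. {"cola": 6}.
    counts = Counter(it["type"] for it in detected_items)
    return {t: required - counts.get(t, 0)
            for t, required in rule.items()
            if counts.get(t, 0) < required}

detected = [{"type": "cola"}] * 4 + [{"type": "juice"}] * 4
print(missing_audit(detected, {"cola": 6, "juice": 4}))  # {'cola': 2}
```

A fuller implementation would also compare the position distribution (layers and row surfaces) against the rule, not only the totals.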
The display scene detection module is further configured to:
when the picture to be identified comprises a plurality of display scenes and the auditing operation target scene is part of the display scenes, determining a bounding box corresponding to the position of the target display scene;
and filtering out the articles which are not in the bounding box from the articles identified by the pictures to be identified, and taking the filtered remaining articles as the articles for executing the auditing operation.
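The bounding-box filtering step can be sketched as a containment test: an item is kept for auditing only if its box lies inside the target display scene's box. Field names and coordinates are illustrative:

```python
# Sketch of restricting the audit to items inside the target display scene.

def inside(box, outer):
    # True if box (x1, y1, x2, y2) lies entirely within outer.
    x1, y1, x2, y2 = box
    ox1, oy1, ox2, oy2 = outer
    return x1 >= ox1 and y1 >= oy1 and x2 <= ox2 and y2 <= oy2

def filter_to_scene(items, scene_box):
    return [it for it in items if inside(it["box"], scene_box)]

items = [{"type": "cola", "box": (10, 10, 20, 30)},
         {"type": "cola", "box": (200, 10, 210, 30)}]  # outside the shelf
print(len(filter_to_scene(items, (0, 0, 100, 100))))  # 1
```

Partially overlapping items could instead be kept or dropped by an intersection-over-union threshold; strict containment is the simplest reading of "not in the bounding box".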
The embodiment of the invention also provides electronic equipment, which comprises:
a memory for storing executable instructions;
and the processor is used for realizing the display article auditing method based on artificial intelligence when executing the executable instructions stored in the memory.
The embodiment of the invention provides a computer readable storage medium which stores executable instructions for causing a processor to execute, thereby realizing the display article auditing method based on artificial intelligence.
The embodiment of the invention has the following beneficial effects:
By splitting recognition into display scene recognition followed by article recognition within the display scene, the method can accommodate diversified scenes while performing efficient and accurate recognition, thereby ensuring the efficiency and accuracy of article auditing.
Drawings
FIG. 1A is a schematic diagram of an architecture of an automated display product auditing system according to an embodiment of the present invention;
FIG. 1B is another schematic diagram of an automated display article auditing system according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a server according to an embodiment of the present invention;
FIG. 3 is a flow chart of an artificial intelligence based display item auditing method according to an embodiment of the present invention;
FIG. 4 is a flow chart of an artificial intelligence based display item auditing method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the module composition of an automated auditing apparatus for display article according to an embodiment of the present invention;
FIG. 6 is a schematic view of a merchandise audit provided by an embodiment of the present invention;
fig. 7 is a schematic diagram of a commodity auditing procedure according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the accompanying drawings, so as to make the objects, technical solutions, and advantages of the present invention more apparent. The described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by those skilled in the art without inventive effort fall within the protection scope of the present invention.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Before describing the embodiments of the present invention in further detail, the terms involved in the embodiments of the present invention are described; these terms are applicable to the following explanations.
1) Display scene: the environment in which the items are placed, such as shelves in a supermarket, refrigerators, counters, etc., may include several different types of items in the display scene.
2) Material: specific information of the articles corresponding to the materials can be determined according to the information displayed in the materials, and the expression forms of the specific information can be price tags, plug-in cards, posters, brand marks and the like of the articles.
Existing article detection systems can identify articles at different positions within a given, known scene, but this identification presumes that the scene itself does not need to be recognized. When the scene in which the articles appear changes, the recognition requirements change accordingly; if the same recognition method is still used, there will be a large discrepancy between the recognition result and the actual situation. Moreover, because such an approach can only identify articles of specific types and does not take into account real-world external constraints on the articles, such as a display shelf or counter, it is prone to false detections of the target or to mistakenly recalling background content, for example identifying articles outside the shelf as target articles. Such article detection systems are also unable to identify material in the display environment that is closely related to the target articles, such as price tags, cards, and the like.
In view of the above problems, an embodiment of the present invention provides an automated display article auditing system. For a captured article display picture or video, based on underlying capabilities such as display scene positioning, article detection, and material detection, the system implements display scene recognition and article recognition on the picture, as well as auditing based on the article recognition results. The auditing functions include business capabilities such as article counting, display auditing, and out-of-stock checking.
Referring to FIG. 1A, FIG. 1A is a schematic architecture diagram of a display article automated auditing system 100 according to an embodiment of the present invention. The display article automated auditing system 100 includes: a server 200, a network 300, and a terminal 400. The server 200 is connected to the terminal 400 via the network 300, which may be a wide area network, a local area network, or a combination of the two. The system 100 identifies the scenes in pictures or videos and the articles in those scenes, audits the arrangement, quantity, and the like of the articles based on the types and positions of the identified articles, and finally transmits the auditing result to the terminal 400 through the network 300. The display article automated auditing system 100 can audit articles in multiple types of scenarios, such as supermarket, bookstore, warehouse, and shop shelf scenarios.
The artificial intelligence-based display article auditing method provided by the embodiments of the present invention can be implemented through the following process. First, a plurality of network services are deployed in advance on the server 200. The terminal 400 then invokes a camera to capture a photo or video and sends an audit request and a data packet, containing the captured photo or video, to the server 200 through the network 300. After receiving the audit request and the data packet, the server 200 looks up, among the pre-deployed network services, the network service (audit service) adapted to the audit request, identifies and audits the photo or video through that service to obtain an auditing result, and finally returns the auditing result to the terminal 400.
The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, big data, and artificial intelligence platforms. The terminal may be, but is not limited to, a smart phone, tablet computer, notebook computer, desktop computer, smart speaker, smart watch, smart camera, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited herein.
In the embodiment of the invention, the auditing service provided by the server can be packaged into a cloud service for users to audit the articles displayed in pictures or videos. A user logs in to the cloud service through a browser, cloud client, or the like, and submits an audit request together with the pictures (videos). In response to the audit request, the cloud service identifies the display scenes in the pictures (videos), further identifies the articles in the display scenes based on the identified display scenes, and finally audits the identified articles to obtain an auditing result, which is sent to the user. According to the auditing result, the user can determine whether the articles in the display scene are displayed according to preset rules.
Specifically, cloud technology (Cloud technology) refers to a hosting technology that unifies hardware, software, network, and other resources in a wide area network or local area network to realize the computation, storage, processing, and sharing of data.
Cloud technology is a general term for the network, information, integration, management platform, application, and other technologies applied under the cloud computing business model; these resources can form a resource pool and be used flexibly on demand. Cloud computing technology will become an important support. The background services of technical network systems, such as video websites, picture websites, and portal websites, require a large amount of computing and storage resources. With the rapid development and application of the internet industry, each article may have its own identification mark in the future, which needs to be transmitted to a background system for logical processing; data at different levels will be processed separately, and all kinds of industry data require strong system backing support, which can only be realized through cloud computing.
In other embodiments, the display item auditing method based on artificial intelligence provided by the embodiments of the present invention may also be implemented in combination with blockchain technology.
Referring to FIG. 1B, FIG. 1B is another schematic architecture diagram of the display article automated auditing system 100 according to an embodiment of the present invention. The display article automated auditing system 100 includes: the server 200, the network 300, the terminal 400, and a blockchain network 500 (node 510-1, node 510-2, and node 510-3 are exemplarily shown as being included in the blockchain network 500). The blockchain network 500 is configured to receive the auditing results sent by the terminals 400 and perform a comprehensive analysis on the auditing results sent by each terminal, so as to determine whether the target display articles are displayed in compliance in the same type of display scene in different places.
The display article auditing method based on artificial intelligence provided by the embodiment of the invention can be realized by the following modes:
First, a plurality of network services are deployed in advance on the server 200. To determine whether a target display article is displayed in compliance in the same type of display scene in multiple places, the terminals 400 in those places invoke their cameras to capture photos or videos including the target display article, and send audit requests and data packets, containing the captured photos or videos, to the server 200 through the network 300. After receiving the audit requests and data packets from the plurality of terminals, the server 200 looks up, among the pre-deployed network services, the network service (audit service) adapted to the audit requests; identifies the display scenes in the photos or videos through that service; further identifies the articles in the display scenes based on the identified display scenes; and finally audits the identified articles to obtain a plurality of auditing results, which are sent to the corresponding terminals 400. The terminals 400 then send the auditing results to the blockchain network 500 through the network 300, and the blockchain network 500, after receiving the auditing results sent by the terminals 400, determines whether the target display article is displayed in compliance in the same type of display scene in the multiple places.
Continuing to describe the server 200 shown in fig. 1A-1B, referring to fig. 2, fig. 2 is a schematic structural diagram of the server 200 provided in an embodiment of the present invention, and the server 200 shown in fig. 2 includes: at least one processor 410, a memory 440, at least one network interface 420. The various components in server 200 are coupled together by bus system 430. It is understood that bus system 430 is used to enable connected communications between these components. The bus system 430 includes a power bus, a control bus, and a status signal bus in addition to a data bus. But for clarity of illustration the various buses are labeled in fig. 2 as bus system 430.
The processor 410 may be an integrated circuit chip having signal processing capabilities, such as a general-purpose processor (for example, a microprocessor or any conventional processor), a digital signal processor (DSP, Digital Signal Processor), another programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like.
Memory 440 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like. Memory 440 optionally includes one or more storage devices physically remote from processor 410.
Memory 440 includes volatile memory or nonvolatile memory, and may also include both volatile and nonvolatile memory. The nonvolatile memory may be read-only memory (ROM, Read Only Memory), and the volatile memory may be random access memory (RAM, Random Access Memory). The memory 440 described in the embodiments of the present invention is intended to comprise any suitable type of memory.
In some embodiments, memory 440 is capable of storing data to support various operations, examples of which include programs, modules and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 441, including system programs such as a framework layer, a core library layer, and a driver layer, for handling various basic system services and performing hardware-related tasks;
a network communication module 442 for reaching other computing devices via one or more (wired or wireless) network interfaces 420; exemplary network interfaces 420 include: Bluetooth, wireless fidelity (WiFi), universal serial bus (USB, Universal Serial Bus), and the like;
In some embodiments, the apparatus provided by embodiments of the present invention may be implemented in software. Fig. 2 shows an artificial intelligence based display item auditing apparatus 453 stored in the memory 440, which may take the form of a program, a plug-in, or the like, and comprises the following software modules: a display scene detection module 4531, a display scene positioning module 4532, an item identification module 4533, and an auditing module 4534. These modules are logical and thus may be arbitrarily combined or further split depending on the functions implemented. The functions of the respective modules will be described hereinafter.
In other embodiments, the display article auditing apparatus provided by the embodiments of the present invention may be implemented in hardware. By way of example, the display article auditing apparatus provided by the embodiments of the present invention may be a processor in the form of a hardware decoding processor that is programmed to perform the methods provided by the embodiments of the present invention; for example, such a processor may employ one or more application specific integrated circuits (ASICs, Application Specific Integrated Circuit), DSPs, programmable logic devices (PLDs, Programmable Logic Device), complex programmable logic devices (CPLDs, Complex Programmable Logic Device), field programmable gate arrays (FPGAs, Field-Programmable Gate Array), or other electronic components.
Exemplary applications and implementations of the display item auditing system provided by embodiments of the present invention will be described below in connection with the artificial intelligence based display item auditing method provided by embodiments of the present invention.
Referring to fig. 3, fig. 3 is a schematic flow chart of an artificial intelligence based display item auditing method according to an embodiment of the present invention, and will be described with reference to the steps shown in fig. 3.
In step 101, the server determines a plurality of display scene candidate regions included in the picture to be identified, and extracts the image features of each display scene candidate region.
When there are multiple pictures to be identified, the server sequentially determines a plurality of display scene candidate regions in each picture and extracts their image features. The picture to be identified may be a photo, a video frame obtained by decoding a video, or another form. Each picture to be identified includes at least one display scene candidate region; for example, a picture of a refrigerator contains only the single display scene of the refrigerator, whereas a picture of a supermarket may include several side-by-side shelves, which form a plurality of display scene candidate regions.
Referring to fig. 4, fig. 4 is a schematic flow diagram of an artificial intelligence based display article auditing method according to an embodiment of the present invention. The picture to be identified is sent to the server by a terminal: after receiving an image acquisition request, the terminal invokes a camera to capture the target scene, obtaining a picture or a video, and sends it together with an auditing request to the server in the form of a data packet; after receiving the auditing request and the data packet, the server processes the data packet to obtain the picture to be identified.
In some embodiments, the server determines a plurality of display scene candidate regions included in the picture to be identified, and extracts an image feature of each display scene candidate region, which may be implemented in the following manner:
the server divides the picture to be identified into a plurality of subareas; the server determines, among the plurality of subareas, multiple groups of subareas that satisfy a similarity condition in at least one dimension, wherein the types of dimensions include size, texture, edge, and color; the server merges the subareas of each group to obtain a plurality of display scene candidate regions; the server convolves each display scene candidate region with a convolution layer in the display scene detection model to obtain image features including at least one of texture, edge, and color.
The parameters of the similarity conditions for size, texture, edge, and color are closely related to a specific type of display scene, so multiple groups of subareas can be screened through the similarity condition of at least one of these dimensions to obtain display scene candidate regions. For example, if a supermarket scene needs to be identified, the parameters of the similarity conditions relate to the supermarket scene. If the picture to be identified shows a supermarket scene, supermarket scene candidate regions can be obtained by identifying the picture; if it does not, no group of its subareas can satisfy the similarity conditions, no supermarket scene candidate region is obtained, and the picture is thereby filtered out.
A plurality of display scene candidate regions may be determined by an input layer in a trained display scene detection model: because regions in the picture exhibit similarity and connectivity, regions containing specific items and scenes may be selected and combined based on these properties to yield proposal boxes, i.e., display scene candidate regions.
Firstly, the picture to be identified is divided to obtain a plurality of subareas. Then, several similar subareas in the picture are taken as a group, so that multiple groups of similar subareas can be obtained; here, similarity means that the similarity of two or more areas in at least one of the dimensions of size, texture, edge, color, and the like is greater than a similarity threshold. Thirdly, the circumscribed rectangle of each group of subareas is computed, i.e., the subareas of each group are merged, to obtain a plurality of display scene candidate regions (proposal boxes). Finally, each display scene candidate region is scaled to a specified size, and its image features are extracted through a convolution layer in the trained display scene detection model. The display scene detection model is used for identifying display scenes in pictures to determine their types and positions; the model is a deep learning model comprising an input layer, a convolution layer, a full connection layer, and an output layer.
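The grouping and circumscribed-rectangle steps above can be sketched in a deliberately simplified form. Here each subarea is an axis-aligned box (x1, y1, x2, y2) paired with a single scalar feature standing in for the size/texture/edge/color dimensions, and the names `propose_regions`, `circumscribed_rect`, and the similarity function are hypothetical, introduced only for illustration:

```python
from itertools import combinations

def circumscribed_rect(boxes):
    """Merge a group of (x1, y1, x2, y2) boxes into their circumscribed rectangle."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def propose_regions(subregions, similarity, threshold):
    """subregions: list of (box, feature); similarity: feature pair -> [0, 1].
    Groups subareas whose pairwise similarity exceeds the threshold, then
    merges each group into one display scene candidate region."""
    n = len(subregions)
    parent = list(range(n))  # union-find over subareas

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if similarity(subregions[i][1], subregions[j][1]) > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(subregions[i][0])
    return [circumscribed_rect(g) for g in groups.values()]
```

For instance, with a grayscale-mean feature and similarity 1 - |a - b| / 255, two adjacent shelf patches with means 100 and 110 merge into one candidate region at a 0.9 threshold, while a distant dissimilar patch stays a separate region.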
The training process of the display scene detection model is as follows: the server is in communication connection with a cloud big data storage center and can request data from it in real time; it trains the display scene detection model on sample data related to display scenes, and adjusts the model parameters (learning rate, number of iterations, batch size, and the like) according to the training results to obtain a trained display scene detection model. The sample data is divided into training samples and test samples: the display scene detection model is trained on the training samples, and the test samples are used to check whether the model is accurate and to adaptively adjust the model parameters. The training samples comprise positive sample data and negative sample data, with the positive sample data accounting for 75% of the whole sample data; the positive sample data is data related to each display scene, and the negative sample data is data unrelated to any display scene.
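A minimal sketch of dividing the sample data into training and test samples while tracking the 75% positive share mentioned above; the function names and the (data, label) tuple representation are assumptions made for illustration:

```python
import random

def split_samples(samples, test_ratio=0.2, seed=0):
    """samples: list of (data, label), label 1 = related to a display scene
    (positive), 0 = unrelated (negative). Returns (train, test) lists."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

def positive_fraction(samples):
    """Fraction of positive samples, e.g. 0.75 per the description above."""
    return sum(label for _, label in samples) / len(samples)
```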
In some embodiments, a region proposal network (Region Proposal Networks, RPN) may also be utilized to generate the plurality of display scene candidate regions.
Therefore, by adopting the sub-region merging strategy, suspected object boxes of various sizes can be determined, so that no sub-region that may belong to a display scene candidate region is omitted; merging similar regions across multiple dimensions also improves the precision of region merging.
In step 102, the server identifies the type of display scene and the position of the display scene from each display scene candidate region based on the image features of that candidate region.
The preferred processing of step 102 may refer to step S404 in fig. 4. Based on the image features of each display scene candidate region, the display scene detection model classifies the candidate regions through a support vector machine (Support Vector Machine, SVM) or softmax logistic regression to obtain the probability of each of a plurality of preset candidate scene types. For example, if the four preset candidate scene types are shelf, refrigerator, counter, and bookshelf, the probability of each display scene candidate region being a shelf, refrigerator, counter, or bookshelf scene is determined respectively, and the candidate scene type with the maximum probability (which must be greater than a scene probability threshold) is determined as the type of display scene included in the candidate region. If the maximum probability is smaller than the scene probability threshold, the picture to be identified does not include any of the preset candidate scene types, and there is no need to continue identifying the picture.
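The maximum-probability decision with a scene probability threshold can be sketched as follows; `classify_scene` is a hypothetical helper name, and the probabilities are assumed to be the SVM or softmax outputs described above:

```python
def classify_scene(probs, scene_threshold):
    """probs: dict mapping each preset candidate scene type (e.g. shelf,
    refrigerator, counter, bookshelf) to its classifier probability.
    Returns the type with the maximum probability when that probability
    reaches the scene probability threshold, and None otherwise (the
    region contains none of the preset candidate scenes, so the picture
    need not be identified further)."""
    best_type = max(probs, key=probs.get)
    if probs[best_type] < scene_threshold:
        return None
    return best_type
```

With the four example types, a region scoring 0.7 for shelf is labeled a shelf scene at a 0.5 threshold, while the same region is rejected at a 0.8 threshold.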
After the type of the display scene is determined, bounding box regression processing is performed on the display scene candidate region based on its image features, and the position of the bounding box of the display scene included in the candidate region is obtained as the position of the display scene. In the embodiment of the present invention, the position of the display scene may be represented by the diagonal coordinates of the bounding box (the outer frame of the region corresponding to the position of the display scene), such as its upper left corner and lower right corner coordinates, or by the center coordinates of the bounding box together with its width and height. After the type and position of the display scene are identified, they are marked in the picture to be identified.
The principle and process of the bounding box regression processing are as follows. After the sub-regions are merged, a predicted display scene candidate region is obtained together with its position parameters; because the localization of the predicted candidate region may be inaccurate, failing to enclose the entire item/scene, its position needs to be adjusted. In some embodiments, a frame regression method may be adopted: the predicted display scene candidate regions and the actual display scene candidate regions in the sample data are learned, and the target parameters that minimize a loss function are obtained through gradient descent, least squares, or the like, where the loss function is the sum of the differences between each objective function and the corresponding position parameter of the actual candidate region, and the objective function is a function of the position parameters of the predicted candidate region. Then, feature vectors of the predicted display scene candidate region are extracted to obtain its position parameters, and a translation operation and a scaling operation are performed on the predicted candidate region based on the target parameters and these position parameters to obtain the actual display scene candidate region, i.e., the position of the display scene.
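The translation and scaling step can be illustrated with the common frame-regression parameterization (center coordinates plus width and height, with learned offsets t); this is a sketch under that assumption, not necessarily the exact parameterization used by the model, and the helper names are illustrative:

```python
import math

def refine_box(px, py, pw, ph, tx, ty, tw, th):
    """Apply learned regression offsets (tx, ty, tw, th) to a predicted
    candidate box given by its center (px, py) and size (pw, ph):
    first translate the center, then scale the width and height."""
    gx = px + pw * tx        # translation of the center
    gy = py + ph * ty
    gw = pw * math.exp(tw)   # scaling of width/height
    gh = ph * math.exp(th)
    return gx, gy, gw, gh

def center_to_corners(cx, cy, w, h):
    """Convert center/size form to the diagonal-coordinate form
    (upper left corner, lower right corner) also mentioned above."""
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2
```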
Therefore, identifying the type of the display scene based on the image features of the display scene candidate region can exclude non-display scenes and display scenes that do not conform to the preset types, improving the accuracy of display scene type identification; fine-tuning the predicted display scene candidate region through the frame regression method can improve the accuracy of the position of the display scene.
In step 103, the server performs an item identification process corresponding to the type of the display scene based on the image feature of the region corresponding to the position of the display scene, to determine the type of the item displayed in the display scene and the position of the item.
In one possible example, step 103 may be implemented by the following sub-steps:
the server determines a plurality of subareas from the region corresponding to the position of the display scene to serve as item candidate regions; the server classifies each item candidate region based on its image features to obtain the type of item it includes, where the type of item corresponds to the type of display scene; and the server performs bounding box regression processing on each item candidate region based on its image features to obtain the position of the bounding box included in the candidate region as the position of the item it includes. The server determines the plurality of item candidate regions through an input layer in the trained item detection model; specifically, the item candidate regions may be determined through a selective search method, a sliding window, or a rule block method, which is not limited in the embodiment of the present invention. Determining the item candidate regions by selective search proceeds as follows. In step 102, after the position of the display scene is identified, it is marked, i.e., the bounding box of the region corresponding to the position of the display scene is marked; the server divides the region inside the bounding box into a plurality of subareas, merges the subareas continually according to the similarity between them, and computes the circumscribed rectangles of the merged subareas, thereby obtaining a plurality of circumscribed rectangles, i.e., a plurality of item candidate regions.
The training method of the item detection model is similar to that of the display scene detection model: the training samples for the item detection model likewise include positive sample data, which consists of pictures of the types of items corresponding to the type of display scene, and negative sample data, which consists of pictures excluding those items and pictures of items not corresponding to the type of display scene. For example, when the display scene is a shelf, the positive sample data consists of pictures of the various types of goods involved in the shelf manifest, and the negative sample data consists of unrelated pictures; when the display scene is a refrigerator, the positive sample data consists of pictures of the various types of goods involved in the refrigerated/frozen manifest, and the negative sample data consists of unrelated pictures.
After determining the item candidate regions, the server extracts the image features of each item candidate region through a convolution layer in the trained item detection model, and then classifies the region through a support vector machine or softmax logistic regression based on those features, obtaining the probabilities of the plurality of item types corresponding to the type of display scene to which the region belongs. The item type with the maximum probability (which must be greater than an item probability threshold) is determined as the type of item in the candidate region. For example, if the types of items corresponding to a supermarket display scene are milk, chocolate, pencil, and eraser, the probability of each item candidate region being milk, chocolate, pencil, or eraser is determined, and the item type with the highest probability above the item probability threshold is taken as the item type of that candidate region. If the maximum probability is smaller than the item probability threshold, the item candidate region does not include any of the items corresponding to the type of display scene, and there is no need to continue identifying it. The method for determining the position of an item is similar to the method for determining the position of a display scene; reference is made to the description in step 102, which will not be repeated here.
In step 104, the server performs an auditing operation based on the types and positions of the items in the display scene, obtaining an auditing result for the display scene.
The preferred processing procedure of the auditing operation may refer to steps S406-S408 in fig. 4; after the auditing result of the display scene is obtained, it is packaged into an auditing data packet and sent to the terminal. The server may perform the following audits on the items: a spatial arrangement audit, a material audit, and a missing-item audit. The spatial arrangement audit may include a display layer audit and a row face audit, i.e., it can audit the display layers and the row faces of the items simultaneously, checking the arrangement of the overall positions of the items in the display scene or the spatial distribution of different types of items. The material audit checks the position and type of each material, to determine whether the position of the material is correct and whether the type of item in the item imaging region corresponding to the material is correct. The missing-item audit checks the number and positions of certain types of items in the display scene simultaneously, to determine missing items and items whose display position or type does not meet requirements.
In one possible example, when the picture to be identified includes a plurality of display scenes and the target display scene of the auditing operation is only some of them, the bounding box corresponding to the position of the target display scene is determined;
items that are not within the bounding box are filtered out of the items identified in the picture to be identified, and the remaining items are used as the items on which the auditing operation is performed.
It should be noted that, in the embodiment of the present invention, either one or more specific target display scenes in the picture to be identified or all display scenes may be identified. For example, if a picture to be identified contains 3 rows of shelves, all of the items on the 3 rows of shelves may be identified and audited; alternatively, only the items on one row of shelves may be identified and audited, in which case that row of shelves is the target display scene and only the items in that row are within the bounding box. After the position of the target display scene is determined, either all items in the whole picture may be identified and then those within the bounding box corresponding to the position of the target display scene selected from them, or only the items within that bounding box may be identified; these are the items on which the auditing operation is performed.
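Filtering out items whose bounding boxes do not lie within the target display scene's bounding box can be sketched as follows; boxes are assumed to be in diagonal-coordinate form (x1, y1, x2, y2), and the helper names are illustrative:

```python
def inside(item_box, scene_box):
    """True when the item's bounding box lies entirely within the target
    display scene's bounding box; both boxes are (x1, y1, x2, y2)."""
    return (item_box[0] >= scene_box[0] and item_box[1] >= scene_box[1]
            and item_box[2] <= scene_box[2] and item_box[3] <= scene_box[3])

def filter_items(items, scene_box):
    """items: list of (item_type, box). Keep only items inside the target
    display scene; the rest are treated as background items and dropped."""
    return [(t, b) for t, b in items if inside(b, scene_box)]
```

For example, with a target shelf at (0, 0, 100, 50), an item at (10, 10, 20, 20) is kept for auditing while an item spilling past the shelf edge at (90, 40, 120, 60) is filtered out as background.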
Therefore, filtering out the items that are not within the bounding box optimizes the item detection result, ensuring that the detected items belong to the same scene and avoiding confusion between background items and target items.
In one possible example, the server performs the auditing operation based on the type and location of the item in the display scenario, and obtains the auditing result of the display scenario, which may be implemented in the following manner: determining a longitudinal distance between any two items based on an ordinate in a position of each item in the display scene; identifying any two items whose longitudinal distance does not exceed a longitudinal distance threshold as being in the same display layer, and identifying any two items whose longitudinal distance is greater than the longitudinal distance threshold as being in a different display layer; generating a display layer table based on the identified display layer and the items in the display layer; the distribution of different types of items in the display layer table is determined based on the types of items in the display layer as a display layer audit result of the display scenario.
The center coordinates of each item, and the width and height of the bounding box where the item is located, can be determined from the position of the item; the center coordinates comprise an abscissa and an ordinate. The transverse direction is the direction parallel to the display layer, and the longitudinal direction is the direction perpendicular to the display layer. The longitudinal distance between two items is the absolute difference of their ordinates, and the transverse distance is the absolute difference of their abscissas.
It should be noted that if any two items whose transverse distance does not exceed a transverse distance threshold were identified as being in the same display layer, two items in the same column but in different layers could be erroneously identified as being in the same layer; likewise, if any two items whose straight-line distance does not exceed a straight-line distance threshold were identified as being in the same display layer, then when the layer height is small, two items in the same column in two adjacent layers could be erroneously identified as being in the same layer. Because the sizes of items in the same layer generally differ little, their longitudinal distance is small; using the longitudinal distance threshold as the criterion for whether two items are in the same layer therefore makes the display layer table generated according to this criterion more reliable and more accurate.
In some embodiments, according to the distribution range of the ordinates of the positions of all the items in the display scene, the range may be divided into a plurality of longitudinal intervals, and the longitudinal distance threshold determined from these intervals; each longitudinal interval represents one layer, and when the ordinates of any two items fall in the same longitudinal interval, i.e., when the longitudinal distance between the two items does not exceed the longitudinal distance threshold, the two items are considered to be in the same layer. For example, if the ordinates of the items are (1, 1, 2, 5, 5, 7, 10, 11, 12), then three longitudinal intervals can be divided according to these nine ordinates: (1, 1, 2), (5, 5, 7), (10, 11, 12), and the longitudinal distance threshold can be determined to be 3; when the longitudinal distance between two items is 3 or more, they are considered to belong to different layers. For example, if the ordinates of two items are known to be 5 and 6, respectively, then because their longitudinal distance of 1 is less than the longitudinal distance threshold of 3, and because 5 and 6 fall in the same longitudinal interval, the two items belong to the same layer.
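The interval-based layer grouping can be sketched as a sorted scan in which a new layer starts whenever the gap to the previous ordinate reaches the longitudinal distance threshold; this greedy scan is one possible realization, assumed for illustration. With the nine example ordinates above and a threshold of 3, it reproduces the three layers (1, 1, 2), (5, 5, 7), (10, 11, 12):

```python
def group_into_layers(ordinates, layer_threshold):
    """Sort the item ordinates and start a new display layer whenever the
    gap to the previous ordinate reaches the longitudinal distance
    threshold; ordinates in the same longitudinal interval share a layer."""
    layers = []
    for y in sorted(ordinates):
        if layers and y - layers[-1][-1] < layer_threshold:
            layers[-1].append(y)   # same longitudinal interval -> same layer
        else:
            layers.append([y])     # gap >= threshold -> new layer
    return layers
```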
After the display layer to which each item belongs is determined, a display layer table may be generated from the items in each known display layer; one row of the display layer table is as follows: layer number 1: article 1, article 2, article 3. The type dimension of the items has not yet been added to the display layer table, so the distribution of the different types of items in the display layer table is determined according to the types of the items, for example: layer number 1: article 1-cola, article 2-mineral water, article 3-chocolate.
Therefore, by adding information in the item-type dimension to the display layer table and recording the distribution of the items, not only can the arrangement of the items be known, but the specific distribution of the different types of items within that arrangement can also be perceived intuitively.
In some embodiments, the display layer table may also be generated by: determining the order of the ith article in the vertical space based on the position and the type of the article until determining the order of all the articles in the vertical space, and generating a display layer table based on the order of all the articles in the vertical space, wherein i is a positive integer; wherein, the sorting of the ith article in the vertical space is determined based on the position and the type of the article, and can be realized in the following way: determining at least one item with a difference value from the abscissa of the ith item in the picture to be identified smaller than a first threshold value from the items with determined vertical spatial ordering; determining an article, of which the difference value of the vertical coordinate of the at least one article and the ith article in the picture to be identified is smaller than a second threshold value; the ordering of the items in the vertical space is taken as the ordering of the ith item in the vertical space.
In this method of generating the display layer table, requiring the difference of the abscissas of two items to be smaller than a first threshold ensures that their transverse distance is small enough, while a second threshold ensures that the two items are on the same layer in the vertical space. Under the screening condition that the difference of the ordinates is smaller than the second threshold, the items that are on the same layer as, and horizontally adjacent or close to, the ith item are obtained, so that the ordering (layer number) of the ith item in the vertical space can be determined according to their ordering (layer number) in the vertical space.
Wherein the ordering of the ith item in the vertical space is determined based on the position and the type of the item, which may also be realized in the following way: from among the items whose vertical spatial ordering has been determined, a target item is found whose ordinate difference from the ith item in the picture to be identified is the smallest and is smaller than a second threshold; the ordering of the ith item in the vertical space is then determined according to the ordering of that target item in the vertical space.
Here, the ordinates of different items may differ; even items in the same layer may differ in type and size, and hence in ordinate. By finding the target item whose ordinate difference from the ith item is the smallest and smaller than the second threshold, it can be determined that the target item is in the same layer as the ith item, and thus the ordering (layer number) of the ith item in the vertical space can be determined according to the ordering (layer number) of the target item in the vertical space.
In one possible example, after the display layer table is obtained, the following processing is performed for the items in each display layer: the transverse distance between any two items is determined from the abscissas in their positions; any two items whose transverse distance does not exceed a transverse distance threshold are identified as being in the same row face, and any two items whose transverse distance is greater than the transverse distance threshold are identified as being in different row faces; a row face table is generated based on the identified row faces and the items in each row face; and the distribution of the different types of items in the row face table is determined based on the types of the items in each row face, as the row face audit result of the display scene.
Here, items whose abscissa and ordinate differences are both within a certain range are considered to be in the same row face, and the items in a row face are generally of the same type; for example, packets of instant noodles placed from inside to outside in one compartment of a shelf are considered to belong to the same row face. Upon identifying the row faces and the items included in each, a row face table for each display layer may be generated; the contents of the row face table in one display layer are as follows: row face 1: article 1, article 2, article 3. The type dimension of the items has not yet been added to the row face table, so the distribution of the different types of items in the row face table is determined according to the types of the items, for example: row face 1: item 1-cola, item 2-cola, item 3-cola.
Therefore, by adding information in the item-type dimension to the row face table and recording the distribution of the items, the arrangement of the items can be known, and the specific distribution of the different types of items within that arrangement can be perceived intuitively.
In some embodiments, the row face table may also be generated from the vertical spatial ordering table of the foregoing embodiments, with the following specific steps: the at least one item corresponding to each sequence number in the vertical spatial ordering table is sorted by abscissa, yielding an item sequence table for each sequence number; the ordering of the jth item in the item sequence table is then re-determined until the ordering of every item in the table has been re-determined, and the row face table is generated from the re-determined orderings, where j is a positive integer greater than or equal to 2. In some embodiments, re-determining the ordering of the jth item in the item sequence table may be realized as follows: if the item sequence table includes n items, the abscissa difference between the jth item and the (j-1)th item in the table is determined, where n is a positive integer greater than or equal to 2; if the abscissa difference is smaller than a third threshold, the orderings of the jth through nth items are each reduced by one; if the abscissa difference is not smaller than the third threshold, the orderings of the n items remain unchanged.
Here, the serial numbers in the vertical spatial ordering table represent layer numbers, and an item sequence table is obtained for each layer from the abscissas of its items. For example, the item sequence table of the first layer contains 5 items ranked 1, 2, 3, 4, 5 from left to right. To re-determine the rank of item 2, the abscissa difference a between item 1 and item 2 is computed; if a is smaller than the third threshold, the ranks of items 2, 3, 4 and 5 are each decremented by 1, so the item sequence table becomes: (item 1, item 2), item 3, item 4, item 5, with the 5 items ranked 1, 1, 2, 3, 4 from left to right, that is, item 1 and item 2 are on the same row face. The ranks of the subsequent items are then re-determined in the same way until every item in the table has been processed, giving the item sequence table of each layer; on the basis of the determined item types, the row face table is generated from the item sequence table of each layer.
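The rank re-determination procedure can be sketched in a few lines of Python (a minimal illustration only; the function name `assign_row_faces` and the list-based layout are our own, and the patent does not prescribe any implementation):

```python
def assign_row_faces(xs, t3):
    """Assign a row-face rank to each item of one display layer.

    xs: the abscissas (x coordinates) of the items in the layer.
    t3: the third threshold; consecutive items whose abscissa
        difference is below t3 share a row face.
    Returns a list of row-face ranks (starting at 1), one per item.
    """
    # sort item indices left to right, as in the item sequence table
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0] * len(xs)
    rank = 0
    prev_x = None
    for i in order:
        # decrementing the ranks of items j..n whenever the gap is below t3
        # is equivalent to opening a new rank only when the gap reaches t3
        if prev_x is None or xs[i] - prev_x >= t3:
            rank += 1
        ranks[i] = rank
        prev_x = xs[i]
    return ranks
```

With five items where only the gap between the first two is below the threshold, the ranks come out 1, 1, 2, 3, 4, mirroring the five-item example in the text.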
In one possible example, the server performs the auditing operation based on the type and position of the items in the display scene to obtain the auditing result of the display scene, which may be implemented as follows. The following processing is performed for each type of item in the display scene: a corresponding imaging area in the picture to be identified is determined based on the position of the item, and a plurality of material candidate areas corresponding to the type of the display scene are determined from the imaging area; the type of the material associated with the item type, and the position of the material, are then identified from each material candidate area as the material auditing result of the display scene.
The materials differ across scenes: in a bookshelf scene, the materials may be bookmarks, banners, labels and the like; in a supermarket scene, the materials may be price tags, insert-card posters, brand signs, trademarks and the like. The positions of the materials correspond one-to-one to the positions of the corresponding items, and a material carries information about its item; for example, a material may read: * Mineral water, 550ml, 1.5. Because materials correspond one-to-one to item positions, a plurality of corresponding material candidate areas can be determined from the imaging areas of the items; one material candidate area is then selected from among them, the type and position of the material are identified, and a text recognition module is further invoked to recognize the content of the material, so as to determine whether the item type corresponding to the material is correct.
In some embodiments, identifying the type of the material associated with the item type, and the position of the material, from each material candidate area may be implemented as follows: the material candidate area is classified based on its image features to obtain the type of the material it includes; bounding-box regression is performed on the material candidate area based on its image features to obtain the position of the bounding box it includes, taken as the position of the material it includes. For details of determining the material type and position, refer to the description of determining the type and position of the display scene in the foregoing embodiments.
Thus, the embodiment of the invention not only detects materials in addition to the target items, but can also further audit the items based on the specific information of the detected materials, improving auditing accuracy.
In one possible example, performing the auditing operation based on the type and position of the items in the display scene to obtain the auditing result may be implemented as follows: the total number and position distribution of each type of item is determined based on the type and position of every item in the display scene; the total number and position distribution are then compared with the display totals and distributions specified for the different item types in the auditing rules, yielding the item types with quantity shortfalls, the size of each shortfall, and the differences in distribution within the display scene, which together serve as the missing-item auditing result of the display scene.
For example, a supermarket cooperates with mineral water manufacturer A, who requires the supermarket to place its brand of mineral water on layers 4-6 of a specified shelf in a quantity of no fewer than 30 bottles. Based on the detected positions and quantity of that brand of mineral water in the display scene, it can be determined that the water is indeed located on layers 4-6 of the shelf, but that only 28 bottles are present; there is therefore a shortfall, and 2 bottles of mineral water need to be replenished on layers 4-6 of the shelf. The server generates a missing-item auditing result accordingly.
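The missing-item comparison can be illustrated with a small Python sketch (the rule and report dictionary formats here are our own assumption; the patent requires only that detected totals and distributions be compared against the auditing rules):

```python
def missing_audit(detected, rules):
    """Compare detected items against display rules and report shortfalls.

    detected: {item_type: {"count": int, "layers": set of layer numbers}}
    rules:    {item_type: {"min_count": int, "layers": set of required layers}}
    Returns a list of shortfall records, one per item type with a deficit
    or a distribution difference.
    """
    report = []
    for item_type, rule in rules.items():
        found = detected.get(item_type, {"count": 0, "layers": set()})
        shortfall = rule["min_count"] - found["count"]
        # layers on which the item was placed but should not appear
        wrong_layers = found["layers"] - rule["layers"]
        if shortfall > 0 or wrong_layers:
            report.append({
                "type": item_type,
                "missing": max(shortfall, 0),
                "misplaced_layers": sorted(wrong_layers),
            })
    return report
```

For the mineral water example above (30 bottles required on layers 4-6, 28 detected there), the report records a shortfall of 2 bottles and no misplaced layers.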
Thus, the embodiment of the invention realizes automatic auditing of items without manual checking, with high accuracy and high auditing efficiency.
In one possible example, if the picture to be identified is an infrared picture, then after the auditing operation has been performed based on the type and position of the items in the display scene to obtain the auditing result, the mean of the color values of a candidate area in the picture to be identified may also be obtained, and the temperature environment of the candidate area determined from that mean.
The candidate area includes a plurality of pixels; the color values of these pixels are obtained and averaged, so that the temperature environment can be determined from the mean color value.
Thus, the temperature environment of an item can be identified by recognizing the infrared picture: if a freezing environment is identified, the scene may be a refrigerator; if a higher-temperature environment is identified, the scene may be an oven. The embodiment of the invention can therefore apply different item auditing and calculation logic in different display scenes, such as normal-temperature, low-temperature and high-temperature ones.
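As a rough illustration of this step, the mean color value of a candidate area can be mapped to a temperature environment as follows (a Python sketch; the intensity thresholds are illustrative assumptions, since the patent specifies no numeric values):

```python
def temperature_environment(pixels, cold_max=85, hot_min=170):
    """Classify the temperature environment of a candidate area of an
    infrared picture from the mean of its color values.

    pixels: sequence of scalar color values (e.g. 0-255 intensities).
    cold_max / hot_min: illustrative cutoffs, not taken from the patent.
    """
    mean = sum(pixels) / len(pixels)
    if mean <= cold_max:
        return "low temperature"   # e.g. a refrigerator or freezer scene
    if mean >= hot_min:
        return "high temperature"  # e.g. an oven scene
    return "normal temperature"
```

The returned label would then select the auditing and calculation logic appropriate to that display scene.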
In the following, an exemplary application of the embodiment of the present invention to an actual supermarket commodity auditing scenario is described.
On the product side, this embodiment mainly produces visualized output: given an input commodity display scene picture, the automatic display-item auditing apparatus outputs the automatic auditing result after completing its calculations. Referring to fig. 5, fig. 5 is a schematic diagram of the modules of the automatic display-item auditing apparatus provided by the embodiment of the present invention; the apparatus includes a scene positioning and identification module 501, a commodity detection and identification module 502, a supermarket element detection and identification module 503, a commodity filtering module 504, and a business auditing module 505.
Referring to fig. 6, fig. 6 is a schematic diagram of commodity auditing provided by the embodiment of the present invention. After a commodity display scene picture is input to the automatic display-item auditing apparatus, the automatic auditing result is output once auditing completes; in fig. 6, the left side is a schematic diagram of a shelf 601, and the right side is the spatial arrangement table 602 of the commodities identified on the shelf.
Referring to fig. 7, fig. 7 is a schematic diagram of the commodity auditing flow provided by the embodiment of the present invention. For each input picture (701), the position and category of the target scene are first obtained through display scene positioning and identification (702); secondly, commodity detection and identification (703) and supermarket element detection and identification (704) are carried out based on the position and category of the target scene, where supermarket elements include brand signs, insert cards and the like; thirdly, the detected commodities are filtered according to the target scene (705); the corresponding business auditing logic is then calculated (706) based on the filtered commodities and supermarket elements; finally, the result is output (707).
The following describes a specific auditing process.
The picture to be identified is input to the automatic display-item auditing apparatus; in the scene positioning and identification stage, the scene positioning and identification module determines the target scene as well as its type and position. A display scene detection model is trained in advance by a deep learning method, and the input picture is predicted by this model to obtain the positioning rectangular frame and specific scene type of the target scene in the picture (such as a shelf, a stack head, or a refrigerator); pictures of non-target scenes are filtered out. If the automatic display-item auditing apparatus receives a picture of some other natural environment, the target scene cannot be recognized by the display scene detection module, and unrecognizable information is returned in the output result.
In the commodity detection and identification stage, the commodity detection and identification module determines the target commodities as well as their types and positions. An item detection model is trained in advance by a deep learning method; the input picture is predicted by this model to obtain the positioning rectangular frames and specific categories of all commodities in the picture, and the commodity categories are labeled.
In the supermarket element detection and identification stage, the supermarket element detection and identification module determines the supermarket elements as well as their types and positions. An element detection model is trained in advance by a deep learning method; the input picture is predicted by this model to obtain the positioning rectangular frames of the supermarket elements in the picture, the types of the supermarket elements, and the item-related information the elements carry.
In the commodity filtering stage, the commodity filtering module filters the results output by the commodity detection and identification stage. According to the positioning rectangular frame of the target scene, commodities outside that frame are filtered out, leaving the target commodities inside it; commodity filtering thus refines the commodity detection results so that the detected target commodities all belong to a unified scene, avoiding confusion between background commodities and target commodities. The specific filtering process is as follows: a target commodity list is established and initialized; the scene positioning frame S and the positioning rectangular frame G of each commodity are obtained, where the top-left point of a commodity's frame G has coordinates (x1, y1) and its bottom-right point has coordinates (x2, y2); if a commodity's frame G lies within the scene positioning frame S, the commodity is added to the target commodity list, otherwise the next commodity is examined. Because supermarket elements correspond one-to-one to the positions of their items (for example, commodity labels correspond one-to-one to the positions of the corresponding items), the types and positions of the supermarket elements determined in the detection and identification stage can be used to confirm whether an item exists at the corresponding position, so the items in the target scene can be determined more quickly and the filtering accuracy is improved.
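The containment test at the heart of this filtering step can be sketched as follows (Python; the function name and the (x1, y1, x2, y2) tuple layout are our own, following the corner-point convention described above):

```python
def filter_commodities(scene_box, commodity_boxes):
    """Keep only commodities whose positioning rectangle G lies inside
    the scene positioning frame S.

    Boxes are (x1, y1, x2, y2) with (x1, y1) the top-left point and
    (x2, y2) the bottom-right point, in image coordinates.
    """
    sx1, sy1, sx2, sy2 = scene_box
    target_list = []  # the target commodity list, initially empty
    for box in commodity_boxes:
        x1, y1, x2, y2 = box
        # keep a commodity only if its rectangle is fully contained in S
        if x1 >= sx1 and y1 >= sy1 and x2 <= sx2 and y2 <= sy2:
            target_list.append(box)
    return target_list
```

A background commodity whose box straddles or lies outside the scene frame is simply never appended, which is the filtering behavior described above.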
In the business auditing stage, the business auditing module performs business logic calculation according to the display scene type (such as a shelf). Business logic is used to determine the manner in which a product is displayed, such as determining the number of layers, arrangement, count, etc. of a particular product.
The method for determining the commodity layer number comprises the following steps:
establishing a display layer table, initializing the display layer table, wherein each element in the display layer table is { layer number x: commodity id list }.
A longitudinal distance threshold t1 is set, and when the absolute difference (longitudinal distance) of the ordinate among the center coordinates of the two target commodities is smaller than t1, the two target commodities are considered to be located in the same layer.
For each target commodity: if the number of layers of the target commodity is determined, continuing to determine the number of layers of the next target commodity; otherwise, calculating the distance (absolute difference of ordinate) between the target commodity and the target commodity with the determined layer number in the display layer list, so as to determine the commodity k closest to the target commodity;
If the distance between commodity k and the target commodity is smaller than t1 and the layer number of commodity k is i, the target commodity is added to the ith layer in the display layer table; as shown in fig. 6, the layer number of commodity 605 is 2, and the absolute difference between the ordinates of target commodity 604 and commodity 605 is smaller than t1, so target commodity 604 is added to layer 2 of the display layer table. If the distance between commodity k and the target commodity is greater than t1, a new layer number j is created in the display layer table and the target commodity is added to layer j.
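The layer-number determination above can be sketched as a short Python routine (a minimal illustration; the function name and returned table layout are our own, not prescribed by the patent):

```python
def assign_layers(ys, t1):
    """Assign a display-layer number to each commodity from the ordinate
    of its center, using the nearest already-assigned commodity.

    ys: center ordinates of the target commodities.
    t1: longitudinal distance threshold.
    Returns the display layer table {layer number: commodity index list}.
    """
    layers = {}    # display layer table: {layer number x: commodity id list}
    assigned = {}  # commodity index -> determined layer number
    for i, y in enumerate(ys):
        best = None
        # find the closest commodity whose layer number is already determined
        for j, layer in assigned.items():
            d = abs(y - ys[j])
            if best is None or d < best[0]:
                best = (d, layer)
        if best is not None and best[0] < t1:
            layer = best[1]           # join commodity k's layer i
        else:
            layer = len(layers) + 1   # create a new layer number j
        assigned[i] = layer
        layers.setdefault(layer, []).append(i)
    return layers
```

Commodities whose ordinates cluster within t1 of one another end up in the same layer entry, matching the display layer table construction described above.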
The method for determining the commodity row-face arrangement is as follows:
a row-face total table is established and initialized; each element in the row-face total table is { layer number x: the row face table corresponding to layer number x }, such as { layer 1: mineral water 2-1, water cups 4-2, chocolate 3-3 }, where "-x" denotes the row face of the commodity on that layer; this entry states that there are 2 bottles of mineral water on row face 1, 4 water cups on row face 2, and 3 chocolates on row face 3;
a lateral distance threshold t2 is set; when the absolute difference of the abscissas (the lateral distance) between the center coordinates of two target commodities in the same layer is smaller than t2, the two target commodities are considered to belong to the same row face.
For each layer of target commodities, determining a row face of each target commodity in the layer:
acquiring the center coordinates of each target commodity in the layer, and setting n commodities in the layer;
each target commodity is ordered by the abscissa of its center coordinate, giving the row-face serial number list of the layer: 0-commodity 1, 1-commodity 2, 2-commodity 3, … (i-1)-commodity i …, (n-1)-commodity n;
for commodity i: if the absolute difference between the abscissas of commodity i and commodity i+1 is smaller than t2, commodity i and commodity i+1 are determined to belong to the same row face (e.g. among the commodities of the second layer in fig. 6, the absolute difference between the abscissas of commodity 603 and commodity 604 is smaller than t2, so commodity 603 and commodity 604 belong to the same row face); the serial numbers of commodities i+1 through n are then all decremented by 1, and the row-face serial number list is updated to: 0-commodity 1, 1-commodity 2, 2-commodity 3, … (i-1)-commodity i, (i-1)-commodity i+1 …, (n-2)-commodity n. If the absolute difference between the abscissas of commodity i and commodity i+1 is greater than t2, the row-face serial number list of the layer is kept unchanged (e.g. among the commodities of the second layer in fig. 6, the absolute difference between the abscissas of commodity 605 and commodity 604 is greater than t2, so commodity 605 and commodity 604 do not belong to the same row face). In this way the serial number of each target commodity in the layer is re-determined to obtain the updated row-face serial number list, and the row-face table is obtained from the row-face serial number lists of all layers.
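Combining the layer information with the lateral threshold rule yields the row-face total table with per-face counts, producing entries such as "mineral water 2-1" (2 bottles on face 1). A Python sketch (the data layout is our assumption, and each face is assumed to hold a single commodity type, as in the example above):

```python
def row_face_table(layer_items, t2):
    """Build the row-face total table for a display scene.

    layer_items: {layer number: [(commodity type, center abscissa), ...]}
    t2: lateral distance threshold; consecutive items closer than t2
        belong to the same row face.
    Returns {layer number: [(commodity type, count, face number), ...]}.
    """
    table = {}
    for layer, items in layer_items.items():
        items = sorted(items, key=lambda it: it[1])  # order left to right
        faces = []
        prev_x = None
        for item_type, x in items:
            if prev_x is None or x - prev_x >= t2:
                # gap reached t2: open a new row face, typed after its
                # first item (faces are assumed single-typed here)
                faces.append([item_type, 0, len(faces) + 1])
            faces[-1][1] += 1
            prev_x = x
        table[layer] = [tuple(f) for f in faces]
    return table
```

For a layer holding 2 bottles of mineral water, 4 water cups and 3 chocolates in three lateral clusters, this reproduces the { layer 1: mineral water 2-1, water cups 4-2, chocolate 3-3 } example.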
In summary, the embodiment of the invention provides a solution that automatically identifies display scenes, identifies commodities, and determines the number of display layers and the row-face arrangement, reducing the labor consumed by traditional manual counting. For input pictures, beyond the basic commodity detection and identification functions, a solution for calculating layer numbers and row faces is provided, solving the problem that in the traditional commodity identification flow these quantities are still computed with manual intervention. Through the scene positioning function, whether the input picture includes a target display scene is automatically identified, the target display scene is located, the target commodities in the picture are identified, and background commodities can be filtered out, improving the accuracy of commodity counting in the target display scene. Through the supermarket element detection and identification functions, various kinds of target detection can be supported.
Continuing below with an exemplary architecture of the artificial-intelligence-based display item auditing apparatus 453 provided by embodiments of the present invention, implemented as software modules; in some embodiments, as shown in FIG. 2, the software modules stored in the artificial-intelligence-based display item auditing apparatus 453 of the memory 440 may include:
a display scene detection module 4531, configured to determine a plurality of display scene candidate regions included in a picture to be identified, and extract image features of each display scene candidate region;
a display scene locating module 4532, for identifying the type of the display scene and the position of the display scene from each display scene candidate region based on the image features of each display scene candidate region;
an item identification module 4533 for performing item identification processing corresponding to the type of the display scene based on the image characteristics of the region corresponding to the position of the display scene, so as to determine the type of the item displayed in the display scene and the position of the item;
and the auditing module 4534 is used for executing auditing operation based on the type and the position of the object in the display scene to obtain the auditing result of the display scene.
In the above-mentioned scheme, the display scene detection module 4531 is configured to:
dividing a picture to be identified into a plurality of subareas;
determining a plurality of sets of sub-regions of the plurality of sub-regions that satisfy a similar condition for at least one dimension, wherein the type of dimension includes size, texture, edge, and color;
combining each group of subareas in the plurality of groups of subareas respectively to correspondingly obtain a plurality of display scene candidate areas;
and carrying out convolution operation on each display scene candidate region to obtain image features comprising at least one of texture, edge and color.
A display scene locating module 4532 for:
Classifying the display scene candidate regions based on the image characteristics of each display scene candidate region to obtain probabilities of the display scene candidate regions corresponding to a plurality of candidate scene types, and determining the candidate scene type corresponding to the maximum probability as the display scene type included in the display scene candidate regions;
and carrying out bounding box regression processing on the display scene candidate region based on the image characteristics of the display scene candidate region to obtain the positions of bounding boxes of the display scenes included in the display scene candidate region as the positions of the display scenes.
An item identification module 4533 for:
determining a plurality of subareas from the areas corresponding to the positions of the display scene to serve as article candidate areas;
classifying the item candidate region based on the image characteristics of the item candidate region to obtain the types of the items included in the item candidate region, wherein the types of the items correspond to the types of the display scenes;
and carrying out bounding box regression processing on the item candidate region based on the image characteristics of the item candidate region to obtain the position of a bounding box included in the item candidate region as the position of an item included in the item candidate region.
An auditing module 4534 for:
Determining a longitudinal distance between any two items based on an ordinate in a position of each item in the display scene;
identifying any two items whose longitudinal distance does not exceed a longitudinal distance threshold as being in the same display layer, and identifying any two items whose longitudinal distance is greater than the longitudinal distance threshold as being in a different display layer;
generating a display layer table based on the identified display layer and the items in the display layer;
the distribution of different types of items in the display layer table is determined based on the types and positions of the items in the display layer as a display layer audit result of the display scene.
The auditing module 4534 is further configured to:
the following processing is performed for each display layer:
determining a lateral distance between any two items in the display layer based on the abscissa in the position of any two items in the display layer;
when the transverse distance does not exceed the transverse distance threshold, identifying any two articles in the display layer as being on the same row surface, and when the transverse distance exceeds the transverse distance threshold, identifying any two articles in the display layer as being on different row surfaces;
generating a surface-by-surface table based on the identified surface and the items in the surface;
Based on the types and positions of the articles in the row surface, the distribution condition of the articles of different types in the row surface table is determined to serve as a row surface auditing result of the display scene.
The auditing module 4534 is further configured to:
the following is performed for each type of item in the display scenario:
determining a corresponding imaging area in the picture to be identified based on the position of the object, and determining a plurality of material candidate areas corresponding to the type of the display scene from the imaging area;
the type of the material associated with the type of the item and the position of the material are identified from each material candidate area as a material audit result of the display scene.
The auditing module 4534 is further configured to:
determining the total number and position distribution of the different types of articles based on the type and position of each article in the display scene;
comparing the total number and the position distribution situation with the display total number and the distribution situation appointed by different types of articles in the auditing rule to obtain the type of the articles with quantity missing, the quantity missing and the difference of the distribution situation in the display scene, and taking the differences as the missing auditing result of the display scene.
The display scene detection module 4531 is further configured to:
when the picture to be identified includes a plurality of display scenes and the target of the auditing operation is only some of those display scenes, determining the bounding box corresponding to the position of each target display scene;
filtering out, from the articles identified in the picture to be identified, those not within the bounding box, and taking the remaining articles as the articles on which the auditing operation is performed.
Embodiments of the present invention provide a storage medium having stored therein executable instructions which, when executed by a processor, cause the processor to perform a method provided by embodiments of the present invention, such as an artificial intelligence based display item auditing method as shown in fig. 3.
In some embodiments, the storage medium may be FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, an optical disc, or CD-ROM; it may also be any of various devices including one of, or any combination of, the above memories.
In some embodiments, the executable instructions may be in the form of programs, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and they may be deployed in any form, including as stand-alone programs or as modules, components, subroutines, or other units suitable for use in a computing environment.
As an example, the executable instructions may, but need not, correspond to files in a file system, and may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a hypertext markup language (HTML, HyperText Markup Language) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
As an example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices located at one site or, alternatively, distributed across multiple sites and interconnected by a communication network.
In summary, according to the embodiment of the application, the types and positions of items are determined by identifying the display scene in a picture and the items within it, and automatic auditing of item display is realized on that basis, reducing the burden of conventional manual auditing. Fully automatic item auditing capability is achieved: with a single input picture, the item content of the picture is obtained through background calculation and the display auditing result is output. Scene materials such as price tags, insert-card posters, brand signs and trademarks can be detected, supporting the calculation requirements of different business layers; and the accuracy of business calculations such as final item counting and spatial arrangement is enhanced by display scene detection and auxiliary material detection.
The above is merely an example of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and scope of the present invention are included in the protection scope of the present invention.

Claims (11)

1. An artificial intelligence based display article auditing method, the method comprising:
determining a plurality of display scene candidate areas included in the picture to be identified, and extracting image features of each display scene candidate area;
identifying a type of display scene and a location of the display scene from each of the display scene candidate regions based on image features of each display scene candidate region;
performing an article identification process corresponding to a type of the display scene based on image features of an area corresponding to the position of the display scene to determine the type of the article displayed in the display scene and the position of the article;
the following is performed for each type of item in the display scenario:
determining a corresponding imaging area in the picture to be identified based on the position of the object, and determining a plurality of material candidate areas corresponding to the type of the display scene from the imaging area;
Classifying the material candidate region based on the image characteristics of the material candidate region to obtain the types of materials included in the material candidate region;
performing bounding box regression processing on the material candidate region based on the image characteristics of the material candidate region to obtain the position of a bounding box included in the material candidate region as the position of the material included in the material candidate region;
and taking the type of the material and the position of the material as a material auditing result of the display scene, wherein the material auditing result is used for determining whether the position of the material is correct or not and determining whether the type of the object in the imaging area corresponding to the type of the material is correct or not.
2. The method of claim 1, wherein the determining a plurality of display scene candidate regions comprised by the picture to be identified comprises:
dividing the picture to be identified into a plurality of subareas;
determining a plurality of sets of sub-regions of the plurality of sub-regions that satisfy a similar condition for at least one dimension, wherein the type of dimension includes size, texture, edge, and color;
combining each group of subareas in the plurality of groups of subareas respectively to correspondingly obtain a plurality of display scene candidate areas;
The extracting the image feature of each display scene candidate region comprises the following steps:
and carrying out convolution operation on each display scene candidate region to obtain image features comprising at least one of texture, edge and color.
3. The method of claim 1, wherein identifying the type of the display scene and the location of the display scene from each display scene candidate region based on the image features of each display scene candidate region comprises:
classifying the display scene candidate areas based on the image characteristics of each display scene candidate area to obtain probabilities of a plurality of candidate scene types corresponding to the display scene candidate areas, and determining the candidate scene type corresponding to the maximum probability as the display scene type included in the display scene candidate areas;
and carrying out bounding box regression processing on the display scene candidate region based on the image characteristics of the display scene candidate region to obtain the position of a bounding box of the display scene included in the display scene candidate region as the position of the display scene.
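Claim 3's two steps — taking the candidate scene type with maximum probability, and regressing a bounding box for the scene — can be sketched as follows. The scene-type list and the R-CNN-style `(dx, dy, dw, dh)` delta parameterisation are assumptions for illustration; the patent does not fix either:

```python
import math

SCENE_TYPES = ["freezer", "shelf", "end_cap", "hot_rack"]  # hypothetical type names

def softmax(logits):
    """Convert raw classifier scores into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def classify_scene(logits):
    """Return (type, probability) for the candidate scene type of max probability."""
    probs = softmax(logits)
    i = max(range(len(probs)), key=probs.__getitem__)
    return SCENE_TYPES[i], probs[i]

def regress_bbox(anchor, deltas):
    """Apply predicted (dx, dy, dw, dh) deltas to an anchor box (cx, cy, w, h),
    the usual R-CNN-style bounding-box regression parameterisation."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = deltas
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))
```

With zero deltas the regression leaves the anchor unchanged, which makes the parameterisation easy to sanity-check.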
4. The method according to claim 1, wherein the performing an item identification process corresponding to the type of the display scene based on the image feature of the region corresponding to the position of the display scene to determine the type of the item displayed in the display scene and the position of the item includes:
determining a plurality of subareas from the area corresponding to the position of the display scene to serve as item candidate regions;
classifying the item candidate region based on the image characteristics of the item candidate region to obtain the type of the item included in the item candidate region, wherein the type of the item corresponds to the type of the display scene;
and carrying out bounding box regression processing on the item candidate region based on the image characteristics of the item candidate region to obtain the position of a bounding box included in the item candidate region as the position of the item included in the item candidate region.
5. The method according to claim 1, wherein the method further comprises:
determining a longitudinal distance between any two items based on the ordinate in the position of each item in the display scene;
identifying any two items whose longitudinal distance does not exceed a longitudinal distance threshold as being in the same display layer, and identifying any two items whose longitudinal distance is greater than the longitudinal distance threshold as being in a different display layer;
generating a display layer table based on the identified display layer and the items in the display layer;
and determining, based on the type and the position of the items in the display layer, the distribution of the different types of items in the display layer table, as a display layer auditing result of the display scene.
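The layer-grouping rule of claim 5 (items whose vertical distance does not exceed a threshold share a display layer) can be sketched with toy `(type, x, y)` item tuples. Chaining each item to the previous one in y-order is one plausible reading of "any two items", not the patent's mandated strategy, and the threshold value is illustrative:

```python
def group_into_layers(items, y_threshold=20):
    """Group (type, x, y) items into display layers: an item joins the current
    layer when its vertical distance to the previous item is within threshold."""
    layers = []
    for item in sorted(items, key=lambda it: it[2]):
        if layers and abs(item[2] - layers[-1][-1][2]) <= y_threshold:
            layers[-1].append(item)
        else:
            layers.append([item])
    return layers

def layer_table(layers):
    """Per-layer distribution of item types (the 'display layer table')."""
    return [{t: sum(1 for it in layer if it[0] == t) for t in {it[0] for it in layer}}
            for layer in layers]
```

Two items at y = 10 and y = 12 fall in one layer under a threshold of 20; an item at y = 60 starts a new layer.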
6. The method of claim 5, wherein the method further comprises:
the following is performed for each of the display layers:
determining a lateral distance between any two items in the display layer based on an abscissa in the position of any two items in the display layer;
when the transverse distance does not exceed the transverse distance threshold, identifying any two articles in the display layer as being on the same row surface, and when the transverse distance exceeds the transverse distance threshold, identifying any two articles in the display layer as being on different row surfaces;
generating a row surface table based on the identified row surfaces and the items in each row surface;
and determining the distribution condition of the different types of articles in the row surface table based on the types and the positions of the articles in the row surface, and taking the distribution condition as a row surface auditing result of the display scene.
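Claim 6 applies the same thresholding within a layer, but horizontally, to split it into row surfaces. A sketch mirroring the layer grouping above, again using toy `(type, x, y)` tuples and a chaining strategy in x-order (both assumptions for illustration):

```python
def group_into_rows(layer, x_threshold=15):
    """Split one display layer into row surfaces: an item joins the current row
    when its horizontal distance to the previous item is within threshold."""
    rows = []
    for item in sorted(layer, key=lambda it: it[1]):
        if rows and abs(item[1] - rows[-1][-1][1]) <= x_threshold:
            rows[-1].append(item)
        else:
            rows.append([item])
    return rows
```

Two colas at x = 0 and x = 10 share a row surface under a threshold of 15; a soda at x = 60 forms its own.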
7. The method according to claim 1, wherein the method further comprises:
determining the total number and position distribution of different types of articles based on the type and position of each article in the display scene;
comparing the total number and the position distribution with the total display number and distribution specified for the different types of articles in the auditing rule, to obtain the types, numbers, and distribution of the articles that are missing in the display scene, as the missing auditing result of the display scene.
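The count comparison in claim 7 reduces to tallying detections per type and diffing against the audit rule. A minimal sketch over toy `(type, x, y)` detections, comparing counts only (the patent also compares position distribution, which is omitted here; the rule-table shape is an assumption):

```python
from collections import Counter

def missing_audit(detected, rule):
    """Compare detected per-type counts against the counts the audit rule
    requires; return the types that fall short and by how many units."""
    counts = Counter(t for t, _, _ in detected)
    return {t: need - counts.get(t, 0)
            for t, need in rule.items() if counts.get(t, 0) < need}
```

If the rule demands three colas and one soda but only two colas are detected, the missing auditing result reports one cola and one soda short.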
8. The method according to any one of claims 1 to 7, further comprising:
when the picture to be identified comprises a plurality of display scenes and the target of the auditing operation is only some of the display scenes, determining a bounding box corresponding to the position of the target display scene;
and filtering out the articles which are not in the bounding box from the articles identified by the pictures to be identified, and taking the filtered remaining articles as the articles for executing the auditing operation.
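Claim 8's scene-scoped filtering keeps only the detections that fall inside the target display scene's bounding box. A sketch using an item's centre point as the membership test, over toy `(type, x, y)` tuples (the centre-point criterion is one plausible choice; the patent only requires filtering against the bounding box):

```python
def filter_items_in_scene(items, scene_bbox):
    """Keep the (type, x, y) items whose centre lies inside the target
    display scene's bounding box (x0, y0, x1, y1); the rest are filtered
    out before the auditing operation runs."""
    x0, y0, x1, y1 = scene_bbox
    return [it for it in items if x0 <= it[1] <= x1 and y0 <= it[2] <= y1]
```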
9. An artificial intelligence based display article auditing apparatus, comprising:
a display scene detection module, configured to determine a plurality of display scene candidate regions included in the picture to be identified and extract image features of each display scene candidate region;
a display scene locating module, configured to identify a type of a display scene and a position of the display scene from each display scene candidate region based on an image feature of the each display scene candidate region;
an article identification module, configured to perform article identification processing corresponding to the type of the display scene based on the image features of the area corresponding to the position of the display scene, so as to determine the type of the articles displayed in the display scene and the position of the articles;
an audit module for performing the following operations for each type of item in the display scenario: determining a corresponding imaging area in the picture to be identified based on the position of the object, and determining a plurality of material candidate areas corresponding to the type of the display scene from the imaging area; classifying the material candidate region based on the image characteristics of the material candidate region to obtain the types of materials included in the material candidate region; performing bounding box regression processing on the material candidate region based on the image characteristics of the material candidate region to obtain the position of a bounding box included in the material candidate region as the position of the material included in the material candidate region; and taking the type of the material and the position of the material as a material auditing result of the display scene, wherein the material auditing result is used for determining whether the position of the material is correct or not and determining whether the type of the object in the imaging area corresponding to the type of the material is correct or not.
10. An electronic device, the electronic device comprising:
a memory for storing computer executable instructions or computer programs;
a processor for implementing the artificial intelligence based display article auditing method of any one of claims 1 to 8 when executing computer executable instructions or computer programs stored in the memory.
11. A computer readable storage medium storing computer executable instructions or a computer program, wherein the computer executable instructions or computer program when executed by a processor implement the artificial intelligence based display item auditing method of any of claims 1 to 8.
CN202010300775.6A 2020-04-16 2020-04-16 Display article auditing method and device based on artificial intelligence Active CN111507253B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010300775.6A CN111507253B (en) 2020-04-16 2020-04-16 Display article auditing method and device based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010300775.6A CN111507253B (en) 2020-04-16 2020-04-16 Display article auditing method and device based on artificial intelligence

Publications (2)

Publication Number Publication Date
CN111507253A CN111507253A (en) 2020-08-07
CN111507253B true CN111507253B (en) 2023-06-30

Family

ID=71869370

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010300775.6A Active CN111507253B (en) 2020-04-16 2020-04-16 Display article auditing method and device based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN111507253B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329514A (en) * 2020-09-07 2021-02-05 江苏感创电子科技股份有限公司 Book checking method and system based on fast R-CNN algorithm
CN112395939A (en) * 2020-09-07 2021-02-23 江苏感创电子科技股份有限公司 Book checking method and system
CN112200631B (en) * 2020-10-12 2022-06-24 支付宝(杭州)信息技术有限公司 Industry classification model training method and device
CN112990095B (en) * 2021-04-13 2021-09-14 广州市玄武无线科技股份有限公司 Commodity display analysis method, commodity display analysis device, commodity display analysis equipment and storage medium
CN113627508B (en) * 2021-08-03 2022-09-02 北京百度网讯科技有限公司 Display scene recognition method, device, equipment and storage medium
CN116185541B (en) * 2023-01-06 2024-02-20 广州市玄武无线科技股份有限公司 Business execution system, method, terminal equipment and medium of business super intelligent equipment

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005339132A (en) * 2004-05-26 2005-12-08 Nippon Telegr & Teleph Corp <Ntt> Merchandise taste recognizing system and device, and program
JP2007226713A (en) * 2006-02-27 2007-09-06 Ishida Co Ltd Electronic bin tag system
CN102930264A (en) * 2012-09-29 2013-02-13 李炳华 System and method for acquiring and analyzing commodity display information based on image identification technology
CN206258916U (en) * 2016-12-09 2017-06-16 杭州惟有网络科技有限公司 A kind of open counter and self-help shopping system based on the soft label of mobile electronic payment
CN108364005A (en) * 2018-03-07 2018-08-03 上海扩博智能技术有限公司 Automatic identifying method, system, equipment and the storage medium of price tag
CN108416403A (en) * 2018-03-08 2018-08-17 上海扩博智能技术有限公司 The automatic correlation method of commodity and label, system, equipment and storage medium
CN108776826A (en) * 2018-04-24 2018-11-09 石狮市森科智能科技有限公司 A kind of RFID intelligent digital clothes commodity shelf systems based on neural network algorithm
CN109242060A (en) * 2018-08-30 2019-01-18 上海扩博智能技术有限公司 New restocking product fast searching method, system, equipment and storage medium
CN109360331A (en) * 2017-12-29 2019-02-19 广州Tcl智能家居科技有限公司 A kind of automatic vending method and automatic vending machine based on article identification
CN110351678A (en) * 2018-04-03 2019-10-18 浙江汉朔电子科技有限公司 Commodity attribute method and device, equipment and storage medium
CN110533028A (en) * 2019-07-31 2019-12-03 北京三快在线科技有限公司 Detection method, device, electronic equipment and the storage medium of commodity display state
CN110705666A (en) * 2019-10-22 2020-01-17 顺忠宝智能科技(深圳)有限公司 Artificial intelligence cloud computing display rack goods and label monitoring and goods storage method
CN110889419A (en) * 2018-09-07 2020-03-17 杭州海康威视数字技术股份有限公司 Shelf analysis method, device and system and electronic equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140195373A1 (en) * 2013-01-10 2014-07-10 International Business Machines Corporation Systems and methods for managing inventory in a shopping store
CN108734162B (en) * 2018-04-12 2021-02-09 上海扩博智能技术有限公司 Method, system, equipment and storage medium for identifying target in commodity image
CN109543527A (en) * 2018-10-19 2019-03-29 北京陌上花科技有限公司 For the commodity detection method of unmanned shelf, device and retail terminal
CN110321797A (en) * 2019-05-31 2019-10-11 苏宁云计算有限公司 Commodity recognition method and device
CN110705424B (en) * 2019-09-25 2020-10-02 广州市玄武无线科技股份有限公司 Method and device for positioning commodity display position and storage medium
CN110956115B (en) * 2019-11-26 2023-09-29 证通股份有限公司 Scene recognition method and device

Also Published As

Publication number Publication date
CN111507253A (en) 2020-08-07

Similar Documents

Publication Publication Date Title
CN111507253B (en) Display article auditing method and device based on artificial intelligence
US11216868B2 (en) Computer vision system and method for automatic checkout
US11823429B2 (en) Method, system and device for difference automatic calibration in cross modal target detection
CN111340126B (en) Article identification method, apparatus, computer device, and storage medium
CN110705666A (en) Artificial intelligence cloud computing display rack goods and label monitoring and goods storage method
CN110796204B (en) Video tag determining method, device and server
CN108345912A (en) Commodity rapid settlement system based on RGBD information and deep learning
CN109886169B (en) Article identification method, device, equipment and storage medium applied to unmanned container
US11907339B1 (en) Re-identification of agents using image analysis and machine learning
Ding et al. Prior knowledge-based deep learning method for indoor object recognition and application
US11748787B2 (en) Analysis method and system for the item on the supermarket shelf
CN108629318B (en) Shelf display identification method, device and system based on image identification technology
Ragesh et al. Deep learning based automated billing cart
CN115115825B (en) Method, device, computer equipment and storage medium for detecting object in image
CN113935774A (en) Image processing method, image processing device, electronic equipment and computer storage medium
CN109035558B (en) Commodity recognition algorithm online learning system for unmanned sales counter
CN112232334A (en) Intelligent commodity selling identification and detection method
KR102597692B1 (en) Method, apparatus, and computer program for measuring volume of objects by using image
Wang et al. Recognition and distance estimation of an irregular object in package sorting line based on monocular vision
Achakir et al. An automated AI-based solution for out-of-stock detection in retail environments
CN114821234A (en) Network training and target detection method, device, equipment and storage medium
CN112329841A (en) Image processing method, image processing device, electronic equipment and computer readable medium
US20230274226A1 (en) Retail shelf image processing and inventory tracking system
CN110956422A (en) Commodity warehousing management method, device and system and readable storage medium
CN117576489B (en) Robust real-time target sensing method, device, equipment and medium for intelligent robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant