CN111507253A - Method and device for auditing displayed articles based on artificial intelligence - Google Patents


Info

Publication number
CN111507253A
Authority
CN
China
Prior art keywords
scene
display
display scene
item
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010300775.6A
Other languages
Chinese (zh)
Other versions
CN111507253B (en)
Inventor
郭卉
黄飞跃
袁豪磊
郭晓威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010300775.6A
Publication of CN111507253A
Application granted
Publication of CN111507253B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an artificial-intelligence-based method and device for auditing displayed articles, an electronic device, and a computer-readable storage medium. The method comprises the following steps: determining a plurality of display scene candidate areas included in a picture to be identified, and extracting the image features of each display scene candidate area; identifying the type and position of the display scene from each display scene candidate area based on its image features; performing item identification corresponding to the type of the display scene, based on the image features of the area corresponding to the position of the display scene, to determine the type and position of the items displayed in the display scene; and performing an auditing operation based on the types and positions of the items in the display scene to obtain an auditing result for the display scene. With the method and device, items in a picture can be accurately identified, so that the identified items can be audited automatically.

Description

Method and device for auditing displayed articles based on artificial intelligence
Technical Field
The invention relates to artificial intelligence technology, and in particular to an artificial-intelligence-based method and device for auditing displayed articles, an electronic device, and a computer-readable storage medium.
Background
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, spanning both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics.
Computer vision is an important part of artificial intelligence software technology and has developed rapidly in recent years. Image recognition is an important branch of computer vision: various kinds of pictures can be recognized with it, such as faces, animal and plant species, and articles. However, such recognition assumes either that the scene of the picture is known or that the scene has no influence on the recognition target. When the scene does influence the target object, it is difficult to accurately recognize both the scene to which the picture belongs and the target objects within that scene at the same time.
Disclosure of Invention
The embodiments of the invention provide an artificial-intelligence-based method and device for auditing displayed articles, an electronic device, and a computer-readable storage medium, which can accurately identify articles in pictures and automatically audit the identified articles.
The technical scheme of the embodiment of the invention is realized as follows:
the embodiment of the invention provides a displayed article auditing method based on artificial intelligence, which comprises the following steps:
determining a plurality of display scene candidate areas included in the picture to be identified, and extracting the image features of each display scene candidate area;
identifying the type of the display scene and the position of the display scene from each display scene candidate area based on its image features;
performing item identification processing corresponding to the type of the display scene, based on the image features of the area corresponding to the position of the display scene, to determine the type and position of the items displayed in the display scene;
and performing auditing operation based on the type and the position of the goods in the display scene to obtain an auditing result of the display scene.
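The four claimed steps can be sketched as a single pipeline. The following Python sketch is illustrative only: the function names, data shapes, and injected callables are assumptions standing in for the trained models and audit rules described in the claims, not an implementation from the patent.

```python
# Hypothetical pipeline over the four claimed steps; every stage is injected
# as a callable so the flow itself is all this sketch asserts.

def audit_picture(picture, propose, extract, locate_scene, identify_items, audit):
    """Run the claimed four-step flow over one picture to be identified."""
    results = []
    for region in propose(picture):                     # step 1: candidate areas
        features = extract(region)                      # step 1: image features
        scene_type, scene_pos = locate_scene(features)  # step 2: scene type/position
        items = identify_items(scene_type, scene_pos)   # step 3: items in the scene
        results.append(audit(scene_type, items))        # step 4: audit result
    return results
```

In use, `propose` would be a region-proposal stage, `locate_scene` the scene classifier plus bounding-box regressor, and `identify_items` a scene-type-specific item detector.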
The embodiment of the invention provides a displayed article auditing device based on artificial intelligence, which comprises:
the display scene detection module is used for determining a plurality of display scene candidate areas included in the picture to be identified and extracting the image characteristics of each display scene candidate area;
the display scene positioning module is used for identifying the type of the display scene and the position of the display scene from each display scene candidate area based on the image characteristics of each display scene candidate area;
an item identification module, configured to perform item identification processing corresponding to the type of the display scene based on an image feature of an area corresponding to the position of the display scene, so as to determine the type of an item displayed in the display scene and the position of the item;
and the auditing module is used for executing auditing operation based on the type and the position of the articles in the display scene to obtain an auditing result of the display scene.
In the foregoing solution, the display scene detection module is configured to:
dividing the picture to be identified into a plurality of sub-regions;
determining a plurality of groups of sub-regions of the plurality of sub-regions that satisfy a similarity condition for at least one dimension, wherein the type of the dimension includes size, texture, edge, and color;
combining each group of sub-regions in the multiple groups of sub-regions respectively to correspondingly obtain multiple display scene candidate regions;
and performing convolution operation on each display scene candidate area to obtain image characteristics including at least one of texture, edges and color.
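The sub-region merging described above resembles similarity-based region proposal: sub-regions that satisfy a similarity condition on at least one dimension are combined into candidate regions. Below is a minimal sketch using only the colour dimension; the greedy merge strategy, histogram representation, and threshold are illustrative assumptions, not the patent's method.

```python
# Sketch: merge sub-regions whose colour histograms are similar into
# display-scene candidate regions. Boxes are (x1, y1, x2, y2).

def box_union(a, b):
    """Smallest box enclosing boxes a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def colour_similarity(h1, h2):
    """Histogram intersection of two normalised colour histograms."""
    return sum(min(p, q) for p, q in zip(h1, h2))

def propose_candidates(subregions, threshold=0.7):
    """subregions: list of (box, colour_histogram). Greedily merge similar pairs
    until no pair exceeds the similarity threshold; return candidate boxes."""
    regions = list(subregions)
    merged = True
    while merged:
        merged = False
        for i in range(len(regions)):
            for j in range(i + 1, len(regions)):
                (box_i, h_i), (box_j, h_j) = regions[i], regions[j]
                if colour_similarity(h_i, h_j) >= threshold:
                    # merge the pair: union box, averaged histogram
                    regions[j] = (box_union(box_i, box_j),
                                  [(p + q) / 2 for p, q in zip(h_i, h_j)])
                    del regions[i]
                    merged = True
                    break
            if merged:
                break
    return [box for box, _ in regions]
```

A full system would also compare size, texture, and edge similarity, per the dimensions listed above.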
The display scene positioning module is configured to:
classify each display scene candidate area based on its image features, to obtain the probabilities that the candidate area corresponds to each of a plurality of candidate scene types, and determine the candidate scene type with the maximum probability as the type of the display scene included in the candidate area;
perform bounding box regression on the display scene candidate area based on its image features, to obtain the position of a bounding box of the display scene included in the candidate area as the position of the display scene.
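The two operations above — probabilistic classification with an arg-max over candidate scene types, and bounding-box regression — can be sketched as follows. The scene names, weights, and box parameterisation (the common centre/width/height delta form) are illustrative assumptions, not taken from the patent.

```python
# Sketch of the classification head (softmax + arg-max) and a standard
# bounding-box regression step applied to one candidate region.
import math

SCENE_TYPES = ["shelf", "refrigerator", "counter"]  # hypothetical candidate types

def softmax(scores):
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_scene(scores):
    """scores: raw per-type scores for one candidate region.
    Returns (scene type with maximum probability, that probability)."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return SCENE_TYPES[best], probs[best]

def refine_box(box, deltas):
    """Apply regression deltas (dx, dy, dw, dh) to box = (cx, cy, w, h):
    shift the centre proportionally to size, scale width/height exponentially."""
    cx, cy, w, h = box
    dx, dy, dw, dh = deltas
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))
```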
The item identification module is configured to: determine a plurality of sub-areas from the area corresponding to the position of the display scene, to serve as item candidate areas;
classifying the item candidate area based on the image characteristics of the item candidate area to obtain the type of the item included in the item candidate area, wherein the type of the item corresponds to the type of the display scene;
performing bounding box regression processing on the item candidate region based on the image features of the item candidate region to obtain the position of a bounding box included in the item candidate region, wherein the position is used as the position of the item included in the item candidate region.
The auditing module is used for:
determining a longitudinal distance between any two items based on a vertical coordinate in a location of each item in the display scene;
identifying any two items whose longitudinal distance does not exceed a longitudinal distance threshold as being in the same display level, and identifying any two items whose longitudinal distance is greater than the longitudinal distance threshold as being in different display levels;
generating a display layer table based on the identified display layers and items in the display layers;
and determining the distribution of different types of articles in the display layer table based on the types and positions of the articles in the display layer as a display layer auditing result of the display scene.
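The display-layer audit above groups items by vertical coordinate: two items within a vertical-distance threshold share a display layer. A minimal sketch, assuming each detected item is a `(item_type, x, y)` tuple with `y` the vertical coordinate; the tuple format and threshold are illustrative.

```python
# Sketch: build a display layer table by vertical proximity, then summarise
# the distribution of item types per layer (the claimed audit result).

def build_display_layers(items, y_threshold=30):
    """Group items into display layers, top to bottom.
    items: list of (item_type, x, y). Returns a list of layers."""
    layers = []
    for item in sorted(items, key=lambda it: it[2]):  # sort by vertical coord
        # same layer if within the threshold of the layer's last item
        if layers and abs(item[2] - layers[-1][-1][2]) <= y_threshold:
            layers[-1].append(item)
        else:
            layers.append([item])
    return layers

def layer_distribution(layers):
    """Per-layer counts of each item type."""
    return [{t: sum(1 for it in layer if it[0] == t)
             for t in {it[0] for it in layer}}
            for layer in layers]
```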
The auditing module is further configured to:
performing the following for each of the display layers:
determining a lateral distance between any two items in the display layer based on the abscissa in the position of any two items in the display layer;
identifying any two items in the display layer as being in the same row when the lateral distance does not exceed a lateral distance threshold, and identifying any two items in the display layer as being in different rows when the lateral distance exceeds the lateral distance threshold;
generating a row table based on the identified rows and the items in the rows;
and determining the distribution of different types of articles in the row table, based on the types and positions of the articles, as a row audit result of the display scene.
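Within each display layer, the row audit mirrors the layer audit but along the horizontal axis: items within a lateral-distance threshold share a row. A sketch under the same illustrative `(item_type, x, y)` format assumed earlier; names and threshold are not from the patent.

```python
# Sketch: group one display layer's items into rows by lateral proximity.

def build_rows(layer_items, x_threshold=25):
    """layer_items: list of (item_type, x, y) from a single display layer.
    Returns a row table: list of rows, left to right."""
    rows = []
    for item in sorted(layer_items, key=lambda it: it[1]):  # sort by x
        # same row if within the threshold of the row's last item
        if rows and abs(item[1] - rows[-1][-1][1]) <= x_threshold:
            rows[-1].append(item)
        else:
            rows.append([item])
    return rows
```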
The auditing module is further configured to:
performing the following for each type of item in the display scenario:
determining a corresponding imaging area in the picture to be identified based on the position of the article, and determining a plurality of material candidate areas corresponding to the type of the display scene from the imaging area;
and identifying the type of the material associated with the type of the item and the position of the material from each material candidate area as a material auditing result of the display scene.
The auditing module is further configured to:
determining the total number and position distribution of items of each type, based on the type and position of each item in the display scene;
and comparing these totals and distributions with the display totals and distributions specified for the different types of items in the audit rule, to obtain the types of missing items, the missing quantities, and the distribution differences in the display scene, as the missing-item audit result of the display scene.
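The comparison step above — detected totals per item type versus the totals an audit rule requires — can be sketched directly. The rule format (a mapping from item type to required count) is an assumption for illustration.

```python
# Sketch: report each item type whose detected count falls short of the
# count required by the audit rule, with the missing quantity.
from collections import Counter

def missing_item_audit(detected_items, rule):
    """detected_items: list of detected item types in the display scene.
    rule: dict mapping item type -> required display count.
    Returns dict of item type -> missing quantity (shortfalls only)."""
    counts = Counter(detected_items)
    return {t: required - counts[t]
            for t, required in rule.items()
            if counts[t] < required}
```

A fuller version would also compare position distributions, per the text above.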
The display scene detection module is further configured to:
when the picture to be identified includes a plurality of display scenes and the target of the auditing operation is only some of those display scenes, determining the bounding box corresponding to the position of each target display scene;
and filtering out, from the items identified in the picture to be identified, those not inside the bounding box, taking the remaining items as the items on which the auditing operation is performed.
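The filtering step above drops items that fall outside the target scene's bounding box before the audit runs. A sketch using item centre points; the `(item_type, centre_x, centre_y)` format and the centre-in-box test are illustrative conventions, not specified by the patent.

```python
# Sketch: keep only items whose centre lies inside the target scene's box.

def inside(box, point):
    """box = (x1, y1, x2, y2); point = (x, y)."""
    x1, y1, x2, y2 = box
    x, y = point
    return x1 <= x <= x2 and y1 <= y <= y2

def filter_items_to_scene(items, scene_box):
    """items: list of (item_type, centre_x, centre_y).
    Returns the items whose centre is inside scene_box."""
    return [it for it in items if inside(scene_box, (it[1], it[2]))]
```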
An embodiment of the present invention further provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the displayed article auditing method based on artificial intelligence provided by the embodiment of the invention when executing the executable instructions stored in the memory.
The embodiment of the invention provides a computer-readable storage medium, which stores executable instructions and is used for causing a processor to execute the method for auditing displayed goods based on artificial intelligence provided by the embodiment of the invention.
The embodiment of the invention has the following beneficial effects:
By dividing item identification into display scene identification followed by item identification within the display scene, diverse scenes can be handled while identification remains efficient and accurate, thereby ensuring the efficiency and accuracy of item auditing.
Drawings
Fig. 1A is a schematic diagram of an architecture of an automated audit system for displayed goods according to an embodiment of the present invention;
fig. 1B is another schematic structural diagram of an automated audit system for displayed goods according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a server according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart diagram of a method for auditing displayed items based on artificial intelligence, according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart diagram of a method for auditing displayed items based on artificial intelligence, according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a module of an automatic audit device for displayed goods according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a commodity audit provided by an embodiment of the present invention;
fig. 7 is a schematic diagram of a commodity audit process according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Before further detailed description of the embodiments of the present invention, terms and expressions mentioned in the embodiments of the present invention are explained, and the terms and expressions mentioned in the embodiments of the present invention are applied to the following explanations.
1) Display scene: the environment in which items are placed, such as a supermarket shelf, refrigerator, or counter; a display scene may include several different types of items.
2) Material: an auxiliary item in a display scene associated with a specific type of article; the materials differ between display scenes. The specific information of the article corresponding to a material can be determined from the information shown on the material. Typical forms include price tags, insert cards, posters, and brand trademarks.
Current article detection systems can identify articles at different positions in an input image of a specific scene, but the identification presupposes that scene and cannot identify the scene itself. When the scene of the articles changes, the identification requirement changes correspondingly; if the original identification method is still used, the identification result differs greatly from the actual situation. Moreover, because such a method can only identify articles of a specific type and does not consider practical constraints outside the articles, such as display shelves or counters, target misjudgment or background mis-recall easily occurs — for example, identifying articles outside the shelf as target articles. Such article detection systems also fail to identify materials, such as tags and cards, in the display environment that are closely related to the target articles.
To solve the above problems, an embodiment of the present invention provides an automated audit system for displayed articles. For a photographed article display picture or video, the system implements display scene recognition, article recognition, and auditing based on the article recognition results, built on underlying capabilities such as display scene positioning, article detection, and material detection. The audit functions include business capabilities such as article counting, display auditing, and missing-item checking.
Referring to fig. 1A, fig. 1A is a schematic diagram of an architecture of an automated audit system 100 for displayed goods according to an embodiment of the present invention, where the automated audit system 100 for displayed goods includes: the terminal comprises a server 200, a network 300 and a terminal 400, wherein the server 200 is connected with the terminal 400 through the network 300, and the network 300 can be a wide area network or a local area network, or a combination of the two. The automatic audit system 100 for displayed goods is used for identifying scenes in pictures or videos and goods in the scenes, auditing the arrangement, quantity and other aspects of the goods based on the types and positions of the identified goods, and finally sending the audit result to the terminal 400 through the network 300. The automatic audit system 100 for displayed goods can audit goods in various scenes, such as a supermarket scene, a bookstore scene, a warehouse scene, a shop shelf scene, and the like.
The method for auditing displayed goods based on artificial intelligence provided by the embodiment of the invention can be realized through the following process: first, a plurality of network services are deployed in advance on the server 200. The terminal 400 calls a camera to obtain a photo or video and sends an audit request and a data packet, containing the photo or video, to the server 200 through the network 300. After receiving the audit request and data packet, the server 200 searches the pre-deployed network services for the network service (audit service) adapted to the audit request, identifies and audits the photo or video through that service to obtain an audit result, and finally returns the audit result to the terminal 400.
The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, big data, and artificial intelligence platforms. The terminal may be a smart phone, tablet computer, notebook computer, desktop computer, smart speaker, smart watch, smart camera, and the like, but is not limited thereto. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
In the embodiment of the invention, the audit service provided by the server can be packaged as a cloud service so that users can audit the items displayed in pictures or videos. A user logs in to the cloud service, via a browser, cloud client, or the like, to submit an audit request together with the picture (or video). The cloud service responds to the audit request by identifying the display scene in the picture (video), further identifying the items in the display scene based on the identified scene, and finally auditing the identified items to obtain an audit result, which is sent to the user as an audit report. From the audit result, the user can determine whether the items in the display scene are displayed according to the preset rules.
Specifically, cloud technology refers to a hosting technology that unifies hardware, software, network, and other resources in a wide area network or local area network to realize the computation, storage, processing, and sharing of data.
Cloud technology is a general term for the network, information, integration, management platform, and application technologies applied in the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support. Background services of technical network systems, such as video websites, picture websites, and web portals, require large amounts of computing and storage resources. With the rapid development of the internet industry, each article may come to have its own identification mark that needs to be transmitted to a background system for logical processing; data at different levels are processed separately, and all kinds of industry data need strong system background support, which can only be realized through cloud computing.
In other embodiments, the method for auditing displayed items based on artificial intelligence provided by the embodiments of the present invention may also be implemented in combination with blockchain technology.
Referring to fig. 1B, fig. 1B is a schematic diagram of another architecture of an automated audit system 100 for displayed goods according to an embodiment of the present invention. The automated audit system 100 for displayed goods includes: server 200, network 300, terminal 400, and blockchain network 500 (illustratively shown as including node 510-1, node 510-2, and node 510-3). The blockchain network 500 is configured to receive the audit results sent by the terminals 400 and to comprehensively analyse the audit results sent by each terminal, so as to determine whether the target displayed items are displayed in compliance in the same type of display scene across different locations.
The method for auditing displayed goods based on artificial intelligence provided by the embodiment of the invention can be realized in the following manner:
First, a plurality of network services are deployed in advance on the server 200. To determine whether a target displayed item is compliant in the same type of display scene across multiple locations, the terminals 400 at those locations each call a camera to obtain a photo or video including the target displayed item, and send an audit request and a data packet, containing the photo or video, to the server 200 through the network 300. After receiving the audit requests and data packets from the terminals, the server 200 searches the pre-deployed network services for the network service (audit service) adapted to the audit request, identifies the display scene in each picture or video through that service, further identifies the items in the display scene based on the identified scene, and finally audits the identified items to obtain a plurality of audit results, which are sent to the corresponding terminals 400. The terminals 400 then send the audit results to the blockchain network 500 through the network 300; after receiving the audit results sent by the terminals 400, the blockchain network 500 integrates them to determine the display compliance of the target displayed item in the same type of display scene across the locations.
Continuing to describe the server 200 shown in fig. 1A-1B, referring to fig. 2, fig. 2 is a schematic structural diagram of the server 200 according to an embodiment of the present invention, where the server 200 shown in fig. 2 includes: at least one processor 410, memory 440, at least one network interface 420. The various components in server 200 are coupled together by a bus system 430. It is understood that the bus system 430 is used to enable connected communication between these components. The bus system 430 includes a power bus, a control bus, and a status signal bus in addition to the data bus. For clarity of illustration, however, the various buses are labeled as bus system 430 in fig. 2.
The Processor 410 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The memory 440 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 440 optionally includes one or more storage devices physically located remote from processor 410.
Memory 440 includes volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 440 described in connection with embodiments of the present invention is intended to comprise any suitable type of memory.
In some embodiments, memory 440 is capable of storing data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 441 including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
a network communication module 442 for communicating with other computing devices via one or more (wired or wireless) network interfaces 420, exemplary network interfaces 420 including: Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), etc.;
in some embodiments, the apparatus provided by the embodiments of the present invention may be implemented in software, and fig. 2 shows an artificial intelligence based displayed item auditing apparatus 453 stored in the memory 440, which may be software in the form of programs and plug-ins, etc., and includes the following software modules: a display scene detection module 4531, a display scene location module 4532, an item identification module 4533, and an audit module 4534, which are logical and thus may be arbitrarily combined or further separated depending on the functions implemented. The functions of the respective modules will be explained below.
In other embodiments, the displayed item auditing device provided by the embodiments of the present invention may be implemented in hardware. By way of example, it may be a processor in the form of a hardware decoding processor that is programmed to perform the methods provided by the embodiments of the present invention; for example, the processor in the form of a hardware decoding processor may be implemented as one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
The artificial intelligence-based displayed item auditing method provided by the embodiments of the present invention will be described below in connection with the exemplary application and implementation of the displayed item auditing system provided by the embodiments of the present invention.
Referring to fig. 3, fig. 3 is a schematic flow chart of an artificial intelligence-based displayed item auditing method according to an embodiment of the present invention, which will be described with reference to the steps shown in fig. 3.
In step 101, the server determines a plurality of exhibition scene candidate regions included in the picture to be recognized, and extracts an image feature of each exhibition scene candidate region.
When there are multiple pictures to be recognized, the server sequentially determines a plurality of display scene candidate areas in each picture and extracts their image features. A picture to be recognized may be a photo, a video frame obtained by decoding a video, or another form of image. Each picture to be recognized includes at least one display scene candidate area: for example, a picture of a single refrigerator provides only one refrigerator display scene, whereas a supermarket picture containing several side-by-side shelves yields multiple display scene candidate areas.
Referring to fig. 4, fig. 4 is a schematic flow chart of a displayed item auditing method based on artificial intelligence according to an embodiment of the present invention. The picture to be identified is sent to the server by the terminal: after receiving an image acquisition request, the terminal calls a camera to photograph the target scene, obtains a picture or a video, and sends it to the server in the form of a data packet together with the auditing request; after receiving the auditing request and the data packet, the server processes the data packet to obtain the picture to be identified.
In some embodiments, the server determines a plurality of exhibition scene candidate regions included in the picture to be identified, and extracts an image feature of each exhibition scene candidate region, which may be implemented as follows:
the method comprises the steps that a server divides a picture to be identified into a plurality of sub-areas; the server determines a plurality of groups of sub-regions which meet a similarity condition of at least one dimension in the plurality of sub-regions, wherein the type of the dimension comprises size, texture, edge and color; the server combines each group of sub-areas to obtain a plurality of display scene candidate areas; and the server performs convolution operation on each exhibition scene candidate area through a convolution layer in the exhibition scene detection model to obtain image characteristics including at least one of texture, edges and color.
The parameters of the similarity conditions, such as size, texture, edge, and color, are closely tied to a particular type of display scene, so the display scene candidate regions can be obtained by screening the groups of sub-regions against the similarity condition of at least one of these dimensions. For example, if a supermarket scene needs to be identified, the parameters of the similarity conditions are related to supermarket scenes. If the picture to be identified shows a supermarket scene, supermarket scene candidate areas can be obtained by recognition; if it does not, the groups of sub-regions cannot satisfy the similarity conditions, no supermarket scene candidate region is obtained, and the picture is filtered out.
A plurality of display scene candidate areas can be determined through an input layer in a trained display scene detection model. Because the areas of a picture exhibit a certain degree of similarity and connectivity, areas containing specific articles and scenes can be selected and combined, based on these properties, to obtain proposal boxes, i.e., the display scene candidate areas.
First, the picture to be identified is segmented into a plurality of sub-regions. Then, similar sub-regions in the picture are gathered into groups, where similarity means that, for two or more regions, the similarity of at least one of the dimensions of size, texture, edge, color, etc. exceeds a similarity threshold. Next, the circumscribed rectangle of each group of sub-regions is computed, i.e., each group is merged into one of the display scene candidate areas (proposal boxes). Finally, each display scene candidate area is scaled to a specified size and its image features are extracted through a convolutional layer in the trained display scene detection model. The display scene detection model is used to identify the display scene in a picture to determine its type and position; it is a deep learning model comprising an input layer, convolutional layers, fully connected layers, and an output layer.
The training process of the display scene detection model is as follows. The server is communicatively connected to a cloud big data storage center, from which it can request data in real time. It trains the display scene detection model on sample data related to display scenes and adjusts the model parameters (learning rate, number of iterations, batch size, and the like) of the model according to the training result, obtaining the trained display scene detection model. The sample data is divided into training samples and testing samples: the model is trained on the training samples, and the testing samples are used to adjust the model parameters and check whether the display scene detection model is accurate and well-adapted. The training samples include positive sample data and negative sample data, with the positive samples accounting for 75% of the whole sample data; the positive samples are data related to each display scene, and the negative samples are data unrelated to display scenes.
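The division of sample data into training and testing samples described above can be sketched as follows; this is a minimal illustration, not the patented implementation, and the 80/20 train/test split ratio and the placeholder sample names are assumptions.

```python
import random

def split_samples(positives, negatives, train_ratio=0.8, seed=0):
    """Split labeled scene samples into training and testing sets.

    `positives` are pictures related to display scenes, `negatives`
    are unrelated pictures; labels 1/0 are attached before splitting.
    """
    data = [(x, 1) for x in positives] + [(x, 0) for x in negatives]
    rng = random.Random(seed)
    rng.shuffle(data)
    cut = int(len(data) * train_ratio)
    return data[:cut], data[cut:]

# 75 positive samples out of 100, matching the ratio in the text
train, test = split_samples([f"pos_{i}" for i in range(75)],
                            [f"neg_{i}" for i in range(25)])
```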
In some embodiments, a Region Proposal Network (RPN) may also be utilized to generate the plurality of display scene candidate regions.
Therefore, by adopting a sub-region merging strategy, suspected object frames of various sizes can be determined, so that no sub-region that may belong to a display scene candidate region is omitted; merging across the multiple dimensions of similarity also improves the precision of region merging.
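The sub-region merging strategy can be illustrated with a greatly simplified sketch. Here each sub-region carries a single scalar "feature", and the similarity predicate and its threshold are assumptions for illustration; a real selective-search style implementation compares size, texture, edge, and color descriptors.

```python
def union_box(a, b):
    """Circumscribed rectangle of two (x1, y1, x2, y2) boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))

def merge_similar(subregions, similar):
    """Greedy single-pass merge. `subregions` is a list of
    (box, feature) pairs; `similar(f1, f2)` decides whether two
    sub-regions satisfy the similarity condition of at least one
    dimension. Returns the candidate-region (proposal) boxes."""
    groups = []  # each group: [merged box, representative feature]
    for box, feat in subregions:
        for g in groups:
            if similar(g[1], feat):
                g[0] = union_box(g[0], box)
                break
        else:
            groups.append([box, feat])
    return [g[0] for g in groups]

# toy example: the "feature" is a mean colour value; two sub-regions
# are similar if their colours differ by less than 20 units
regions = [((0, 0, 10, 10), 100), ((10, 0, 20, 10), 110),
           ((50, 50, 60, 60), 200)]
candidates = merge_similar(regions, lambda a, b: abs(a - b) < 20)
```

The first two sub-regions merge into one proposal box; the third, dissimilar one stays a separate candidate region.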
In step 102, the server identifies the type of the display scene and the position of the display scene from each display scene candidate area based on the image features of each display scene candidate area.
The preferred processing method of step 102 can be seen in step S404 in fig. 4. Based on the image features of each display scene candidate region, the display scene detection model classifies the region through a Support Vector Machine (SVM) or softmax logistic regression to obtain the probabilities that the region corresponds to each of several preset types of candidate scenes. For example, if the preset candidate scene types include shelf, refrigerator, counter, and bookshelf, then the probability that each display scene candidate area is a shelf scene, a refrigerator scene, a counter scene, or a bookshelf scene is determined, and the candidate scene type with the maximum probability (which must exceed the scene probability threshold) is taken as the type of the display scene included in that candidate region. If the maximum probability is smaller than the scene probability threshold, the picture to be recognized does not include any candidate scene of the preset types, and the picture need not be recognized further.
After the type of the display scene is determined, performing bounding box regression processing on the display scene candidate region based on the image features of the display scene candidate region to obtain the position of a bounding box of the display scene included in the display scene candidate region as the position of the display scene. In the embodiment of the present invention, the representation form of the position of the display scene may be a diagonal coordinate of the enclosure frame (the outer frame of the area corresponding to the position of the display scene), such as an upper left corner coordinate and a lower right corner coordinate of the enclosure frame, or may be a center coordinate of the enclosure frame and a width and a height of the enclosure frame. After the type and position of the display scene are identified, the type and position of the display scene in the display scene candidate area are marked in the picture to be identified.
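The two position representations mentioned above, diagonal corner coordinates and center coordinates with width and height, can be converted into each other. The following helper functions are an illustrative sketch, not part of the patented method:

```python
def corners_to_center(x1, y1, x2, y2):
    """Diagonal corners (top-left, bottom-right) -> centre + size."""
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)

def center_to_corners(cx, cy, w, h):
    """Centre + size -> diagonal corners (top-left, bottom-right)."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```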
The principle and process of bounding box regression is as follows. After the sub-regions are merged, a predicted display scene candidate region and its position parameters are obtained. Because the predicted region may not be accurately located, so that the whole item/scene is not accurately enclosed, its position needs to be adjusted. In some embodiments, a frame regression method may be adopted: the predicted and actual display scene candidate regions in the sample data are learned, and the target parameters that minimize the loss function are obtained by gradient descent, the least squares method, or the like, where the loss function is the sum of the differences between each objective function and the corresponding position parameters of the actual region, and the objective function is a function of the position parameters of the predicted region. Then the feature vector of the predicted display scene candidate area is extracted to obtain its position parameters, and a translation operation and a scaling operation are performed on the predicted region based on the target parameters and the predicted position parameters, yielding the actual display scene candidate area, i.e., the position of the display scene.
Therefore, the type of the display scene is identified based on the image characteristics of the display scene candidate area, the conditions that the display scene does not accord with the preset type and the non-display scene can be eliminated, and the accuracy of type identification of the display scene is improved; the predicted display scene candidate area is finely adjusted through a frame regression method, so that the accuracy of the position of the display scene can be improved.
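One common form of the translation and scaling operations used in frame regression (the R-CNN-style parameterization) can be sketched as follows; whether the patented method uses exactly this parameterization is not specified, so it is an assumption for illustration.

```python
import math

def apply_box_deltas(box, deltas):
    """Refine a predicted candidate region (cx, cy, w, h) with learned
    regression targets (tx, ty, tw, th): translate the centre by a
    fraction of the box size, then scale width/height exponentially."""
    cx, cy, w, h = box
    tx, ty, tw, th = deltas
    return (cx + tx * w, cy + ty * h, w * math.exp(tw), h * math.exp(th))
```

With zero deltas the box is unchanged; a positive `tx` shifts the center right by `tx * w` pixels.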
In step 103, the server performs item recognition processing corresponding to the type of the display scene based on the image feature of the area corresponding to the position of the display scene to determine the type of the item displayed in the display scene and the position of the item.
In one possible example, step 103 may be implemented by the following sub-steps:
the server determines a plurality of sub-areas from the area corresponding to the position of the display scene to serve as item candidate areas. The server classifies each item candidate area based on its image features to obtain the type of the item it includes, where the type of the item corresponds to the type of the display scene; and the server performs bounding box regression on the item candidate region based on its image features to obtain the position of the bounding box included in the region, which serves as the position of the item included in the item candidate region.

The server determines the candidate regions serving as item candidate areas through the input layer of a trained item detection model. Specifically, the item candidate areas may be determined by a selective search method, a sliding window, or a rule-block method, which is not limited in the embodiments of the present invention. The selective search method is described below. After the position of the display scene is identified in step 102, that position is marked, that is, the bounding box of the corresponding area is marked; the server divides the area inside the bounding box into a plurality of sub-regions, continuously merges the sub-regions according to the similarity between them, and takes the circumscribed rectangle of each merged group, thereby obtaining a plurality of circumscribed rectangles, i.e., a plurality of item candidate areas.
The training method of the article detection model is similar to the training method of the display scene detection model, and the training samples used for training the article detection model also include positive sample data and negative sample data, the positive sample data being pictures of a plurality of types of articles corresponding to the type of the display scene, and the negative sample data being pictures not including the articles and pictures of the articles not corresponding to the type of the display scene. For example, when the display scene is a shelf, the positive sample data is a picture of various types of goods involved in the manifest, and the negative sample data is a picture unrelated thereto; when the display scene is a refrigerator, the positive sample data is a picture of various types of goods involved in the cold-stored/frozen goods list, and the negative sample data is a picture irrelevant thereto.
After determining the item candidate areas, the server extracts the image features of each item candidate area through the convolutional layer of the trained item detection model, classifies the area by a support vector machine or softmax logistic regression based on those features to obtain the probabilities that the area corresponds to each of the item types associated with the type of the display scene, and determines the item type with the maximum probability (which must exceed the item probability threshold) as the type of the item in that candidate area. For example, if the item types corresponding to a supermarket display scene are milk, chocolate, pencil, and eraser, then the probability that each item candidate area contains milk, chocolate, a pencil, or an eraser is determined, and the item type with the highest probability above the item probability threshold is taken as the item type of that candidate area. If the maximum probability is below the item probability threshold, the item candidate region contains no item corresponding to the type of the display scene, and the picture containing the region need not be recognized further. The method for determining the position of an item is similar to the method for determining the position of a display scene, described in step 102, and is not repeated here.
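The maximum-probability decision with an item probability threshold can be sketched as follows; the threshold value 0.5 and the score dictionary are assumptions for illustration.

```python
def classify_item(probs, threshold=0.5):
    """probs: mapping from item type to probability for one candidate
    region. Returns the max-probability type if it clears the item
    probability threshold, else None (the region is then skipped)."""
    best = max(probs, key=probs.get)
    return best if probs[best] >= threshold else None

# shelf scene: one candidate region scored against four item types
scores = {"milk": 0.7, "chocolate": 0.2, "pencil": 0.05, "eraser": 0.05}
```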
In step 104, the server performs an audit operation based on the type and location of the item in the display scene, resulting in an audit result of the display scene.
A preferred processing procedure of the audit operation is shown in steps S406 to S408 in fig. 4; after the audit result of the display scene is obtained, the result is packaged into an audit data packet, which is sent to the terminal. The server may perform the following audits on the items: a spatial arrangement audit, a material audit, and an omission audit. The spatial arrangement audit may include a display layer audit and a row audit, i.e., it can audit both the display layers and the rows of the items, and it can also audit the arrangement of the overall positions of the items in the display scene or the spatial distribution of different types of items. The material audit checks the position and type of the materials, so as to determine whether a material's position is correct, or whether the type of item in the item imaging area corresponding to the material is correct. The omission audit checks both the number and the positions of items of a specific type in the display scene, so as to find missing items and items whose display position or type does not meet the requirements.
In one possible example, when the picture to be identified includes a plurality of display scenes and the target of the audit operation is only a subset of those display scenes, the bounding box corresponding to the position of each target display scene is determined;
and filtering out the articles which are not in the surrounding frame from the articles identified by the picture to be identified, and taking the remaining articles after filtering as the articles for performing the auditing operation.
It should be noted that, in the embodiments of the present invention, either one or more specific target display scenes in the picture may be recognized, or all display scenes may be recognized. For example, if a picture to be recognized contains 3 rows of shelves, all 3 rows may be recognized and their items audited, or only the items on one of the 3 rows may be recognized and audited; in the latter case only that row of shelves is the target display scene, so only the items on that row fall within the bounding box. After the position of the target display scene is determined, either all items in the whole picture are identified and those inside the bounding box of the target display scene are then selected, or only the items inside that bounding box are identified; these are the items on which the audit operation is performed.
Therefore, by filtering the articles which are not in the surrounding frame, the article detection result can be optimized, the detected articles are in the same scene, and the confusion between the background articles and the target articles is avoided.
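The filtering of items outside the target scene's bounding box can be sketched as follows; the dictionary layout of an identified item is an assumption for illustration.

```python
def inside(scene_box, item_box):
    """True if item_box (x1, y1, x2, y2) lies inside scene_box."""
    return (item_box[0] >= scene_box[0] and item_box[1] >= scene_box[1]
            and item_box[2] <= scene_box[2] and item_box[3] <= scene_box[3])

def filter_items(scene_box, items):
    """Keep only identified items whose boxes fall inside the target
    display scene's bounding box; background items are dropped."""
    return [it for it in items if inside(scene_box, it["box"])]

shelf = (0, 0, 100, 100)   # bounding box of the target shelf
detected = [{"type": "cola", "box": (10, 10, 20, 30)},
            {"type": "milk", "box": (150, 10, 160, 30)}]  # milk: background
kept = filter_items(shelf, detected)
```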
In one possible example, the server performs an audit operation based on the type and location of the item in the display scenario, and obtains an audit result of the display scenario, which may be implemented as follows: determining a longitudinal distance between any two items based on a vertical coordinate in a position of each item in the display scene; identifying any two items whose longitudinal distance does not exceed the longitudinal distance threshold as being in the same display level, and identifying any two items whose longitudinal distance is greater than the longitudinal distance threshold as being in different display levels; generating a display layer table based on the identified display layer and the items in the display layer; the distribution of different types of items in the display layer table is determined based on the type of items in the display layer as a result of a display layer audit of the display scene.
The center coordinate of each article and the width and height of the bounding box where the article is located can be determined according to the position of each article, and the center coordinate comprises an abscissa and an ordinate. The transverse direction is the direction parallel to the display layer and the longitudinal direction is the direction perpendicular to the display layer. The longitudinal distance is the absolute difference of the ordinates of the two articles and the transverse distance is the absolute difference of the abscissas of the two articles.
It should be noted that if any two items whose lateral distance does not exceed a lateral distance threshold were identified as being on the same display layer, two items in the same column but on different layers could be misidentified as being on the same layer; and if any two items whose straight-line distance does not exceed a straight-line distance threshold were identified as being on the same display layer, two items in the same column on adjacent layers could be misidentified as being on the same layer when the layer height is small. Because items on the same layer generally differ little in size and have a small longitudinal distance, using the longitudinal distance threshold as the criterion for whether two items are on the same layer is highly reliable, and the display layer table generated from it is accordingly accurate.
In some embodiments, the distribution range of the ordinates of the positions of all items in the display scene may be divided into a plurality of longitudinal intervals, each interval representing a layer, and the longitudinal distance threshold is determined from these intervals; when the ordinates of any two items fall in the same longitudinal interval, i.e., their longitudinal distance does not exceed the longitudinal distance threshold, the two items are considered to be on the same layer. For example, if the ordinates of the items are 1, 1, 2, 5, 5, 7, 10, 11, and 12, three longitudinal intervals can be obtained from these nine ordinates: (1, 1, 2), (5, 5, 7), and (10, 11, 12), and the longitudinal distance threshold may be determined to be 3. When the longitudinal distance between two items is equal to or greater than 3, they are considered to belong to different layers; for example, if the ordinates of two items are 5 and 6, then, since their longitudinal distance of 1 is less than the threshold 3, and since 5 and 6 belong to the same longitudinal interval, they belong to the same layer.
After determining the display layer to which each item belongs, a display layer table may be generated from the items in the known display layers, one row in the table being, for example: layer 1, article 1, article 2, article 3. The table does not yet carry the dimension of the type of the articles, so the distribution of different types of articles in the display layer table is determined from the type of each article, e.g.: layer 1, article 1-cola, article 2-mineral water, article 3-chocolate.
Therefore, the information of the dimension of the type of the article is added into the display list, and the distribution condition of the article is recorded, so that the arrangement of the article can be known, and the specific distribution of different types of articles in the arrangement can be intuitively perceived.
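The grouping of items into display layers by longitudinal distance can be sketched as follows, using the nine ordinates from the interval example above. Following that example, a longitudinal distance equal to or greater than the threshold starts a new layer; comparing consecutive items in ordinate order is an assumption for illustration.

```python
def group_into_layers(items, longitudinal_threshold):
    """Group items into display layers. `items` are (name, cy) pairs;
    after sorting by ordinate, a gap of at least the longitudinal
    distance threshold between consecutive items starts a new layer.
    Returns the display-layer table as a list of layers, top first."""
    ordered = sorted(items, key=lambda it: it[1])
    layers, current, last_y = [], [], None
    for name, cy in ordered:
        if last_y is not None and cy - last_y >= longitudinal_threshold:
            layers.append(current)
            current = []
        current.append(name)
        last_y = cy
    if current:
        layers.append(current)
    return layers

# the nine ordinates 1, 1, 2, 5, 5, 7, 10, 11, 12 with threshold 3
items = [("a", 1), ("b", 1), ("c", 2), ("d", 5), ("e", 5),
         ("f", 7), ("g", 10), ("h", 11), ("i", 12)]
layers = group_into_layers(items, 3)
```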
In some embodiments, the display layer table may also be generated by: determining the ordering of the ith item in the vertical space based on the position and the type of the item until the ordering of all items in the vertical space is determined, and generating a display layer table based on the ordering of all items in the vertical space, wherein i is a positive integer; wherein, the order of the ith item in the vertical space is determined based on the position and the type of the item, and the method can be realized by adopting the following modes: determining at least one item, the difference value of which with the abscissa of the ith item in the picture to be identified is smaller than a first threshold value, from the items with the determined vertical spatial ordering; determining an article of the at least one article, wherein the difference value of the vertical coordinate of the article in the picture to be identified and the vertical coordinate of the ith article is smaller than a second threshold value; the order of the items in the vertical space is taken as the order of the ith item in the vertical space.
In this method of generating the display layer table, requiring that the difference of the abscissas of two articles be smaller than the first threshold ensures that their horizontal distance is small enough, while the second threshold ensures that the two articles are on the same layer of the vertical space. Screening with the condition that the difference of the ordinates is smaller than the second threshold yields the articles that are on the same layer as, and laterally adjacent or close to, the ith article, so that the ordering (layer number) of the ith article in the vertical space can be determined from the ordering (layer number) of those articles.
Wherein, the order of the ith item in the vertical space is determined based on the position and the type of the item, which can also be realized by adopting the following mode: determining a target article which has the smallest difference value with the ordinate of the ith article in the picture to be identified and has the ordinate difference value smaller than a second threshold value from the articles with the determined vertical spatial ordering; and determining the sequence of the ith target item in the vertical space according to the sequence of the target item in the vertical space.
Wherein, because the vertical coordinates of different articles may be different, even if the articles in the same layer are different in type and size, and the vertical coordinates are different, by finding the target article with the smallest difference value with the vertical coordinate of the ith article and the difference value of the vertical coordinates smaller than the second threshold value, it can be determined that the target article is in the same layer as the ith article, and therefore, the rank (number of layers) of the ith target article in the vertical space can be determined according to the rank (number of layers) of the target article in the vertical space.
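The assignment of the ith item to a layer from already-ranked items, using the first (abscissa) and second (ordinate) thresholds, can be sketched as follows; the data layout is an assumption for illustration.

```python
def layer_of_item(placed, item, first_threshold, second_threshold):
    """Assign the ith item a vertical-space rank (layer number) from
    already-ranked items: among placed items whose abscissa differs
    from the item's by less than first_threshold, find one whose
    ordinate also differs by less than second_threshold and reuse its
    layer number; return None if no such neighbour exists (a new
    layer is then needed).
    `placed` maps name -> (x, y, layer); `item` is (name, x, y)."""
    _, x, y = item
    for px, py, layer in placed.values():
        if abs(px - x) < first_threshold and abs(py - y) < second_threshold:
            return layer
    return None

# two items already ranked: cola on layer 1, milk on layer 2
placed = {"cola": (10, 5, 1), "milk": (50, 20, 2)}
```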
In one possible example, after the display layer table is obtained, the following is performed for the articles in each display layer: the lateral distance between any two articles is determined from the abscissas of their positions; any two articles whose lateral distance does not exceed the lateral distance threshold are identified as being on the same row face, and any two articles whose lateral distance exceeds the threshold as being on different row faces; a row table is generated based on the identified row faces and the articles on them; and the distribution of different types of articles in the row table is determined based on the types of the articles, as the row audit result of the display scene.
Articles whose abscissa and ordinate differences fall within a certain range are considered to be on the same row face, and such articles are generally of the same type; for example, packets of instant noodles placed from front to back in one shelf compartment belong to the same row face. Upon identifying the various row faces and the articles they include, a row table for each display layer may be generated, the contents of the row table in one display layer being, for example: row 1, article 1, article 2, article 3. The row table does not yet carry the dimension of the type of the articles, so the distribution of different types of articles in the row table is determined from the type of each article, e.g.: row 1, article 1-cola, article 2-cola, article 3-cola.
In this way, information on the dimension of the type of the articles is added to the row table and the distribution of the articles is recorded, so that the rows of articles can be known and the specific distribution of different types of articles within them can be intuitively perceived.
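The grouping of one display layer's articles into row faces by lateral distance can be sketched as follows; as with the layer grouping, comparing consecutive articles in abscissa order is an assumption for illustration.

```python
def group_into_rows(layer_items, lateral_threshold):
    """Within one display layer, group articles into row faces.
    `layer_items` are (name, cx) pairs; after sorting by abscissa, a
    gap exceeding the lateral distance threshold between consecutive
    articles starts a new row face. Returns rows, left to right."""
    ordered = sorted(layer_items, key=lambda it: it[1])
    rows, current, last_x = [], [], None
    for name, cx in ordered:
        if last_x is not None and cx - last_x > lateral_threshold:
            rows.append(current)
            current = []
        current.append(name)
        last_x = cx
    if current:
        rows.append(current)
    return rows

# two colas close together share a row face; the water stands apart
rows = group_into_rows([("cola1", 1), ("cola2", 2), ("water", 10)], 3)
```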
In some embodiments, the ranking table may also be generated according to the vertical spatial ranking table in the foregoing embodiments, and the specific steps are as follows: sorting at least one article corresponding to each serial number in the vertical spatial sorting table according to the size of the abscissa to obtain an article sequence table corresponding to each serial number; and re-determining the sequence of the jth item in the item sequence list until the sequence of each item in the item sequence list is re-determined, and generating a list according to the sequence of each item in the item sequence list, wherein j is a positive integer greater than or equal to 2. In some embodiments, the determining the rank of the jth item in the item sequence list again may be implemented as follows: if the article sequence table comprises n articles, determining a horizontal coordinate difference value between the jth article and the jth-1 article in the article sequence table, wherein n is a positive integer greater than or equal to 2; if the difference value of the horizontal coordinates is smaller than a third threshold value, the sequence from the jth article to the nth article is reduced by one; and if the horizontal coordinate difference value is not smaller than the third threshold value, maintaining the sequencing of the n articles unchanged.
Here, each serial number in the vertical spatial sorting table denotes a layer number, and for each layer an item sequence list is obtained according to the abscissas of the items. For example, suppose the item sequence list of the first layer contains 5 items ranked from left to right as 1, 2, 3, 4 and 5. To re-determine the rank of item 2, the abscissa difference a between item 1 and item 2 is computed; if a is smaller than the third threshold, the ranks of items 2, 3, 4 and 5 are each decremented by 1, so the item sequence list becomes: (item 1, item 2), item 3, item 4, item 5, i.e. the 5 items are ranked from left to right as 1, 1, 2, 3 and 4, meaning item 1 and item 2 are in the same row. The ranks of the subsequent items are then re-determined in the same way until the ranks of all items in the list have been re-determined, yielding the item sequence list of each layer; on the basis of the determined type of each item, the ranking table is generated from the per-layer item sequence lists.
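The per-layer rank re-determination described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation; the function name, the tuple representation of items, and the example threshold are assumptions made for the sketch.

```python
def renumber_ranks(layer_items, t2):
    """Re-determine rank numbers for the items of one shelf layer.

    layer_items: list of (item_id, x_center) pairs for the layer.
    t2: lateral (abscissa) threshold; adjacent items closer than t2
        are merged into the same row plane.
    Implements the decrement scheme from the text: initial ranks are
    0..n-1 in abscissa order, then for each adjacent pair closer than
    t2 every later rank is decremented by one.
    """
    items = sorted(layer_items, key=lambda it: it[1])  # left to right
    ranks = list(range(len(items)))
    for j in range(1, len(items)):
        if items[j][1] - items[j - 1][1] < t2:
            for k in range(j, len(items)):
                ranks[k] -= 1  # items j..n-1 all move down one rank
    return {item_id: rank for (item_id, _), rank in zip(items, ranks)}
```

With items at abscissas 0, 1 and 10 and t2 = 5, the first two items share rank 0 and the third item gets rank 1, matching the worked example in the text.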
In one possible example, the server performs an audit operation based on the type and location of the item in the display scenario, and obtains an audit result of the display scenario, which may be implemented as follows: the following operations are performed for each type of item in the display scenario: determining a corresponding imaging area in the picture to be identified based on the position of the article, and determining a plurality of material candidate areas corresponding to the type of the display scene from the imaging area; the type of material associated with the type of item and the location of the material are identified from each material candidate area as a material review result for the display scene.
The materials differ between scenes: in a bookshelf scene the materials may be bookmarks, banners, labels and the like, while in a supermarket scene the materials may be price tags, card-inserted posters, brand trademarks and the like. The positions of the materials correspond one-to-one to the positions of the corresponding items, and a material carries information about its item; for example, a material may read: mineral water, 550 mL, ¥1.5.
In some embodiments, identifying the type of the material associated with the type of the item and the position of the material from each material candidate region may be implemented as follows: classify the material candidate region based on its image features to obtain the type of the material it includes; perform bounding box regression on the material candidate region based on its image features to obtain the position of the bounding box it includes, which serves as the position of the material. For details of determining the material type and position, refer to the description of determining the type and position of the display scene in the foregoing embodiments.
Therefore, the method and the device have the function of detecting materials except for the target article, and can further check the article based on the specific information of the detected materials so as to improve the checking accuracy.
In one possible example, the auditing operation is executed based on the type and position of the item in the display scene, and the auditing result of the display scene can be obtained by adopting the following modes: determining the total number of different types of items and the distribution situation of the positions of the items based on the type and the position of each item in the display scene; and comparing the total number and the position distribution condition with the display total number and the distribution condition specified for different types of articles in the auditing rule to obtain the differences of the types, the missing quantity and the distribution condition of the articles with missing quantity in the display scene to serve as the missing auditing result of the display scene.
For example, a supermarket cooperates with mineral water manufacturer A, which requires the supermarket to place its brand of mineral water on the 4th to 6th layers of a designated shelf, with no fewer than 30 bottles. Based on the detected positions and quantity of that brand of mineral water in the display scene, it can be determined that the mineral water is indeed located on the 4th to 6th layers of the shelf but numbers only 28 bottles; there is thus a shortage, and 2 bottles of mineral water need to be added to the 4th to 6th layers of the shelf. The server generates the missing audit result accordingly.
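The missing-audit comparison in this example reduces to counting items per type and subtracting from the rule. The following is a hedged sketch; the dictionary shapes and function name are invented for illustration and do not appear in the patent.

```python
def missing_audit(detected_counts, rule_counts):
    """Compare detected item counts against the audit rule.

    detected_counts: {item_type: detected quantity}
    rule_counts:     {item_type: required display quantity}
    Returns {item_type: shortfall} for every under-stocked type;
    e.g. 28 detected bottles against a rule of 30 yields a shortfall of 2.
    """
    return {
        item_type: required - detected_counts.get(item_type, 0)
        for item_type, required in rule_counts.items()
        if detected_counts.get(item_type, 0) < required
    }
```

A full audit result would also include the position distribution comparison described in the text; this sketch covers only the quantity dimension.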
Therefore, the embodiment of the invention realizes the automatic auditing of the articles, does not need manual checking and checking, and has high accuracy and high auditing efficiency.
In one possible example, if the picture to be identified is an infrared picture, then after the auditing operation has been performed based on the type and position of the items in the display scene to obtain the audit result, the average of the color values of a candidate region in the picture to be identified can be obtained, and the temperature environment of the candidate region determined from that average.
The candidate area comprises a plurality of pixel points, the color values of the pixel points are obtained, the average value of the pixel points is calculated, and the temperature environment can be determined according to the average value of the color values.
Therefore, the temperature environment of an item can be identified from an infrared picture: if a freezing environment is identified, the scene may be a freezer; if a high-temperature environment is identified, the scene may be an oven.
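The infrared temperature inference can be sketched as an average over pixel color values followed by thresholding. The thresholds below are illustrative assumptions (the patent specifies no values), as are the function names; in practice the thresholds would be calibrated to the infrared camera.

```python
def mean_color_value(pixels):
    """Average color (grayscale) value over the pixels of a candidate region."""
    return sum(pixels) / len(pixels)

def classify_environment(mean_value, freeze_max=60.0, hot_min=200.0):
    """Map a region's mean color value to a temperature environment.

    freeze_max and hot_min are hypothetical thresholds chosen only
    for this sketch.
    """
    if mean_value < freeze_max:
        return "freezing"          # scene may be a freezer
    if mean_value > hot_min:
        return "high temperature"  # scene may be an oven
    return "ambient"
```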
In the following, an exemplary application of the embodiment of the present invention in an application scenario of a real supermarket commodity audit will be described.
On the product side, this solution is embodied mainly as a visual output: the input commodity display scene graph is processed entirely by the automatic audit device for displayed commodities, which then outputs the automatic audit result. Referring to fig. 5, fig. 5 is a schematic diagram of the module composition of an automatic audit device for displayed goods according to an embodiment of the present invention, where the automatic audit device for displayed goods includes a scene positioning and identifying module 501, a goods detecting and identifying module 502, a supermarket element detecting and identifying module 503, a goods filtering module 504, and a business audit module 505.
Referring to fig. 6, fig. 6 is a schematic view of commodity audit provided by an embodiment of the present invention, after a commodity display scene diagram is input to an automatic audit device for displayed goods, an automatic audit result is output after the audit, a schematic view of a shelf 601 is shown on the left side in fig. 6, and a spatial arrangement table 602 of the identified commodities on the shelf is shown on the right side.
Referring to fig. 7, fig. 7 is a schematic diagram of a commodity auditing process according to an embodiment of the present invention, where for each input picture (701), the position and category of a target scene are obtained through display scene positioning and recognition (702); secondly, commodity detection and identification (703) and supermarket element detection and identification (704) are carried out respectively based on the position and the category of the target scene, and the supermarket elements comprise trademarks, plug-in cards and the like of brand merchants; thirdly, filtering the detected commodities according to the target scene (705); then, calculating corresponding business auditing logics (706) based on the commodities and supermarket elements obtained after filtering; finally, the result is output (707).
The following describes a specific audit process.
The pictures to be identified are input into the automatic audit device for displayed goods. In the scene positioning and identification stage, the scene positioning and identification module determines the target scene together with its type and position. A display scene detection model is trained in advance using deep learning; it predicts, for an input picture, the positioning rectangular frame and the specific scene category of the target scene in the picture (such as a shelf, pile head or freezer), and filters out pictures containing no target scene. If the automatic audit device for displayed goods receives some other natural-environment picture and the display scene detection module cannot identify a target scene, unidentifiable information is returned in the result output.
And in the commodity detection and identification stage, the target commodity is determined through the commodity detection and identification module, and the type and the position of the target commodity are determined. And pre-training by adopting a deep learning method to obtain an article detection model, predicting the input picture by using the model to obtain positioning rectangular frames and specific commodity categories of all commodities in the picture, and labeling the categories of the commodities.
And determining supermarket elements by a supermarket element detection and identification module in a supermarket element detection and identification stage, and determining the types and positions of the supermarket elements. And pre-training by adopting a deep learning method to obtain an element detection model. The input picture is predicted by the model, and a positioning rectangular frame of supermarket elements in the picture, types of the supermarket elements and information related to articles in the supermarket elements are obtained.
In the commodity filtering stage, the commodity filtering module filters the output of the commodity detection and identification stage. According to the positioning rectangular frame of the target scene, commodities outside the frame can be filtered out, leaving the target commodities inside it; this optimizes the commodity detection results, ensures the detected target commodities belong to a single scene, and prevents background commodities from being confused with target commodities. The specific filtering process is as follows: establish and initialize a target commodity list, and obtain the scene positioning frame S and the positioning rectangular frame G of each commodity, where the top-left and bottom-right coordinates of G are (x1, y1) and (x2, y2); if the positioning rectangular frame G of a commodity lies within the scene positioning frame S, add the commodity to the target commodity list, otherwise proceed to the next commodity. Because supermarket elements correspond one-to-one to the positions of their items (for example, commodity labels correspond one-to-one to the positions of their commodities), whether a commodity exists at a given position can be determined from the types and positions of the supermarket elements found in the supermarket element detection and identification stage, so that items in the target scene are determined more quickly and the accuracy of commodity filtering is improved.
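The containment test of the filtering step can be sketched as follows. The box tuples (x1, y1, x2, y2) with top-left and bottom-right corners follow the text; the function name is an assumption for the example.

```python
def filter_items(scene_box, item_boxes):
    """Keep only commodities whose positioning frame G lies inside
    the scene positioning frame S.

    Boxes are (x1, y1, x2, y2): (x1, y1) top-left, (x2, y2)
    bottom-right, with image coordinates growing right and down.
    """
    sx1, sy1, sx2, sy2 = scene_box
    return [
        (x1, y1, x2, y2)
        for (x1, y1, x2, y2) in item_boxes
        if x1 >= sx1 and y1 >= sy1 and x2 <= sx2 and y2 <= sy2
    ]
```

A production variant might instead keep boxes whose intersection-over-area with the scene frame exceeds a ratio, to tolerate slight localization error; strict containment matches the description here.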
In the service auditing stage, the service auditing module performs service logic calculation according to the display scene type (such as a shelf). The business logic is used for determining the display mode of the commodities, such as determining the layer number, the ranking, the counting and the like of a specific commodity.
The method for determining the number of the layers of the commodity is as follows:
a display layer number table is established and initialized, and each element in the display layer number table is { layer number x: item id list }.
A longitudinal distance threshold t1 is set, and when the absolute difference value of the vertical coordinates in the center coordinates of the two target commodities (longitudinal distance) is smaller than t1, the two are considered to be located at the same layer.
For each target commodity: if its layer number has already been determined, proceed to the next target commodity; otherwise, compute the distance (absolute difference of ordinates) between the target commodity and each commodity whose layer number is already recorded in the display layer number table, thereby finding the commodity k nearest to the target commodity;
if the distance between commodity k and the target commodity is less than t1 and the layer number of commodity k is i, the target commodity is added to the ith layer in the display layer number table; for example, in fig. 6, the layer number of item 605 is 2 and the absolute difference between the ordinates of target item 604 and item 605 is less than t1, so target item 604 is added to the 2nd layer in the display layer number table. Otherwise, a new layer j is created in the display layer number table and the target commodity is added to the jth layer.
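The layer-assignment loop above can be sketched as nearest-assigned-neighbor clustering on ordinates. A minimal illustration under stated assumptions; the data shapes and names are invented for the sketch.

```python
def assign_layers(items, t1):
    """Assign a display-layer number to each commodity.

    items: iterable of (item_id, y_center); t1 is the longitudinal
    distance threshold. For each commodity, find the nearest
    already-assigned commodity by |dy|; join its layer if the
    distance is below t1, otherwise open a new layer.
    """
    layers = {}      # layer number -> list of item ids
    assigned = []    # (y_center, layer number) of processed items
    next_layer = 1
    for item_id, y in items:
        if assigned:
            nearest_y, nearest_layer = min(
                assigned, key=lambda a: abs(a[0] - y)
            )
            if abs(nearest_y - y) < t1:
                layers[nearest_layer].append(item_id)
                assigned.append((y, nearest_layer))
                continue
        layers[next_layer] = [item_id]  # open a new layer
        assigned.append((y, next_layer))
        next_layer += 1
    return layers
```

With ordinates 10, 12 and 50 and t1 = 5, the first two commodities land on layer 1 and the third opens layer 2.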
The method for determining the commodity ranking comprises the following steps:
establishing and initializing a ranking summary table, where each element in the ranking summary table is { layer number x: the commodity ranking table corresponding to layer number x }, such as { layer number 1: mineral water 2-1, water cup 4-2, chocolate 3-3 }, where "-x" denotes the row number of that product on the layer: 2 bottles of mineral water on row 1, 4 water cups on row 2, and 3 chocolates on row 3;
and setting a transverse distance threshold t2, and when the absolute difference value (transverse distance) of the abscissa in the central coordinates of the two target commodities in the same layer is smaller than t2, determining that the two target commodities belong to the same row plane.
For each layer of target commodities, determining the ranking of each target commodity in the layer:
acquiring the central coordinates of each target commodity in the layer, and setting n commodities in the layer;
sorting the target commodities in order of the abscissa of the center coordinate, obtaining a list of ranking numbers for the layer: 0-commodity 1, 1-commodity 2, 2-commodity 3, …, i-1-commodity i, …, n-1-commodity n;
for item i: if the absolute difference between the abscissa of article i and article i +1 is smaller than t2, it is determined that article i and article i +1 belong to the same ranking (for example, in the article on the second layer in fig. 6, the absolute difference between the abscissa of article 603 and article 604 is smaller than t2, it is determined that article 603 and article 604 belong to the same ranking), the rank numbers of articles i +1 to article n are all decremented by 1, and the ranking number list is updated as: 0-Commodity 1, 1-Commodity 2, 2-Commodity 3, … i-Commodity i, i-Commodity i +1 …, n-2-Commodity n. If the absolute difference between the abscissa of article i and the abscissa of article i +1 is greater than t2, the list of ranking numbers for that layer is maintained (e.g., in the article on the second layer in fig. 6, the absolute difference between the abscissas of article 605 and article 604 is greater than t2, it is determined that article 605 and article 604 do not belong to the same rank). Thus, the serial number of each target commodity in the layer is determined again, an updated ranking serial number list is obtained, and a ranking summary list is obtained according to the ranking serial number list of each layer.
In summary, the embodiment of the invention provides a solution for automatic display scene recognition, commodity recognition, and determination of display layer number and row, reducing the labor consumed by traditional manual counting. For an input picture, besides basic commodity detection and identification, a solution for layer-number and row calculation is provided, addressing the problem that these calculations still require manual intervention in traditional commodity recognition. Through the scene positioning function, whether the input picture includes a target display scene is identified automatically, the target display scene is located, the target commodities in the picture are identified, and background commodities can be filtered out, improving the accuracy of counting commodities in the target display scene. The supermarket element detection and identification function allows multiple kinds of target detection to be supported.
Continuing with the exemplary architecture of the artificial intelligence based display item audit device 453 as implemented as software modules provided by embodiments of the present invention, in some embodiments, as shown in fig. 2, the software modules stored in the artificial intelligence based display item audit device 453 of the memory 440 may include:
a display scene detection module 4531, configured to determine a plurality of display scene candidate regions included in the picture to be identified, and extract an image feature of each display scene candidate region;
a display scene positioning module 4532 configured to identify a type of a display scene and a position of the display scene from each display scene candidate region based on an image feature of each display scene candidate region;
an item identification module 4533 configured to perform item identification processing corresponding to the type of the display scene based on an image feature of an area corresponding to the position of the display scene to determine the type of an item displayed in the display scene and the position of the item;
and the auditing module 4534 is used for executing auditing operation based on the type and the position of the goods in the display scene to obtain an auditing result of the display scene.
In the foregoing solution, the display scene detection module 4531 is configured to:
dividing a picture to be identified into a plurality of sub-regions;
determining a plurality of groups of sub-regions of the plurality of sub-regions that satisfy a similarity condition of at least one dimension, wherein the types of the dimensions include size, texture, edge, and color;
combining each group of subareas in the multiple groups of subareas respectively to correspondingly obtain multiple display scene candidate areas;
performing convolution operation on each display scene candidate area to obtain image characteristics including at least one of texture, edge and color.
A display scene positioning module 4532 configured to:
classifying the exhibition scene candidate regions based on the image features of each exhibition scene candidate region to obtain the probabilities of the exhibition scene candidate regions corresponding to a plurality of candidate scene types, and determining the candidate scene type corresponding to the maximum probability as the type of the exhibition scene included in the exhibition scene candidate regions;
and performing enclosure frame regression processing on the display scene candidate region based on the image characteristics of the display scene candidate region to obtain the position of an enclosure frame of the display scene included in the display scene candidate region as the position of the display scene.
An item identification module 4533 configured to:
determining a plurality of sub-areas from the area corresponding to the position of the display scene to be used as item candidate areas;
classifying the item candidate area based on the image characteristics of the item candidate area to obtain the type of the item included in the item candidate area, wherein the type of the item corresponds to the type of the display scene;
and performing bounding box regression processing on the item candidate region based on the image features of the item candidate region to obtain the positions of bounding boxes included in the item candidate region, wherein the positions are used as the positions of the items included in the item candidate region.
An audit module 4534 configured to:
determining a longitudinal distance between any two items based on a vertical coordinate in a position of each item in the display scene;
identifying any two items whose longitudinal distance does not exceed the longitudinal distance threshold as being in the same display level, and identifying any two items whose longitudinal distance is greater than the longitudinal distance threshold as being in different display levels;
generating a display layer table based on the identified display layer and the items in the display layer;
the distribution of different types of items in the display layer table is determined based on the type and location of items in the display layer as a result of a display layer audit of the display scene.
The audit module 4534 is further configured to:
the following processing is performed for each display layer:
determining a lateral distance between any two items in the display layer based on the abscissa in the position of any two items in the display layer;
identifying any two items in the display layer as being in the same row level when the lateral distance does not exceed the lateral distance threshold, and identifying any two items in the display layer as being in different row levels when the lateral distance exceeds the lateral distance threshold;
generating a listing table based on the identified listing and the items in the listing;
and determining the distribution condition of different types of articles in the ranking table based on the types and the positions of the articles in the ranking table to serve as a ranking auditing result of the display scene.
The audit module 4534 is further configured to:
the following operations are performed for each type of item in the display scenario:
determining a corresponding imaging area in the picture to be identified based on the position of the article, and determining a plurality of material candidate areas corresponding to the type of the display scene from the imaging area;
the type of material associated with the type of item and the location of the material are identified from each material candidate area as a material review result for the display scene.
The audit module 4534 is further configured to:
determining the total number of different types of items and the distribution situation of the positions of the items based on the type and the position of each item in the display scene;
and comparing the total number and the position distribution condition with the display total number and the distribution condition specified for different types of articles in the auditing rule to obtain the differences of the types, the missing quantity and the distribution condition of the articles with missing quantity in the display scene to serve as the missing auditing result of the display scene.
The display scene detection module 4531 is further configured to:
when the picture to be identified comprises a plurality of display scenes and the auditing operation target scene is a part of the display scenes, determining a bounding box corresponding to the position of the target display scene;
and filtering out the articles which are not in the surrounding frame from the articles identified by the picture to be identified, and taking the remaining articles after filtering as the articles for performing the auditing operation.
Embodiments of the present invention provide a storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform a method provided by embodiments of the present invention, for example, an artificial intelligence based displayed item audit method as illustrated in fig. 3.
In some embodiments, the storage medium may be FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily correspond, to files in a file system; they may be stored in a portion of a file that holds other programs or data, such as in one or more scripts stored in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
In summary, the embodiment of the application identifies the display scene in the picture and the articles in the display scene, determines the types and positions of the articles, and realizes automatic examination and verification of article display based on the types and positions of the articles, thereby reducing the pressure of manual examination and verification in the past; the full-automatic article auditing capability is realized, only one picture is input, the article content in the picture can be obtained through background calculation, and the display auditing result is output; scene materials such as price tags, card-inserted posters, brand trademarks and the like can be detected, and the calculation requirements of different business layers can be supported; the accuracy of business calculation such as final article counting, spatial arrangement and the like is enhanced by adopting display scene detection and material detection assistance.
The above description is only an example of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present invention are included in the protection scope of the present invention.

Claims (10)

1. An artificial intelligence based displayed item auditing method, characterized in that the method comprises:
determining a plurality of exhibition scene candidate areas included in the picture to be identified, and extracting the image characteristics of each exhibition scene candidate area;
identifying the type of the display scene and the position of the display scene from each display scene candidate area based on the image characteristics of the display scene candidate area;
performing item identification processing corresponding to the type of the display scene based on the image features of the area corresponding to the position of the display scene to determine the type of the items displayed in the display scene and the position of the items;
and performing auditing operation based on the type and the position of the goods in the display scene to obtain an auditing result of the display scene.
2. The method of claim 1, wherein the determining a plurality of exhibition scene candidate areas included in the picture to be recognized comprises:
dividing the picture to be identified into a plurality of sub-regions;
determining a plurality of groups of sub-regions of the plurality of sub-regions that satisfy a similarity condition for at least one dimension, wherein the type of the dimension includes size, texture, edge, and color;
combining each group of the subareas in the multiple groups of subareas respectively to correspondingly obtain multiple display scene candidate areas;
the extracting of the image features of each exhibition scene candidate area comprises the following steps:
and performing convolution operation on each display scene candidate area to obtain image characteristics including at least one of texture, edges and color.
3. The method of claim 1, wherein identifying the type of the display scene and the position of the display scene from each display scene candidate region based on the image features of the display scene candidate region comprises:
classifying the exhibition scene candidate regions based on the image features of each exhibition scene candidate region to obtain the probabilities of the exhibition scene candidate regions corresponding to a plurality of candidate scene types, and determining the candidate scene type corresponding to the maximum probability as the type of the exhibition scene included in the exhibition scene candidate regions;
performing bounding box regression processing on the exhibition scene candidate region based on the image features of the exhibition scene candidate region to obtain the position of a bounding box of the exhibition scene included in the exhibition scene candidate region as the position of the exhibition scene.
4. The method of claim 1, wherein performing item identification processing corresponding to the type of the display scene based on image features of an area corresponding to the location of the display scene to determine the type of items displayed in the display scene and the location of the items comprises:
determining a plurality of sub-areas from the area corresponding to the position of the display scene to be used as item candidate areas;
classifying the item candidate area based on the image characteristics of the item candidate area to obtain the type of the item included in the item candidate area, wherein the type of the item corresponds to the type of the display scene;
performing bounding box regression processing on the item candidate region based on the image features of the item candidate region to obtain the position of a bounding box included in the item candidate region, wherein the position is used as the position of the item included in the item candidate region.
5. The method of claim 1, wherein the performing an audit operation based on the type and location of the item in the display scene, resulting in an audit result of the display scene, comprises:
determining a longitudinal distance between any two items based on the vertical coordinate in the position of each item in the display scene;
identifying any two items whose longitudinal distance does not exceed a longitudinal distance threshold as being in the same display layer, and any two items whose longitudinal distance exceeds the longitudinal distance threshold as being in different display layers;
generating a display layer table based on the identified display layers and the items in each display layer;
and determining the distribution of the different types of items in the display layer table based on the types and positions of the items in each display layer, as the display layer audit result of the display scene.
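The layer-grouping rule of claim 5 can be sketched as a sort-and-sweep over vertical coordinates (a sketch, not the patented implementation; the item tuples and the threshold value are illustrative, and the sweep chains adjacent items rather than testing every pair):

```python
def group_into_layers(items, y_threshold):
    """Group detected items into display layers: items whose vertical
    distance does not exceed y_threshold land in the same layer.

    items: list of (item_type, x, y) tuples, (x, y) being each item's
    detected position.
    """
    layers = []
    for item in sorted(items, key=lambda it: it[2]):   # sweep top to bottom
        if layers and item[2] - layers[-1][-1][2] <= y_threshold:
            layers[-1].append(item)                    # same display layer
        else:
            layers.append([item])                      # start a new layer
    return layers

def layer_table(items, y_threshold):
    """Build the 'display layer table': layer index -> items in layer."""
    return dict(enumerate(group_into_layers(items, y_threshold)))
```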
6. The method of claim 5, further comprising:
performing the following for each of the display layers:
determining a lateral distance between any two items in the display layer based on the abscissa in the position of any two items in the display layer;
identifying any two items in the display layer as being in the same row when the lateral distance does not exceed a lateral distance threshold, and as being in different rows when the lateral distance exceeds the lateral distance threshold;
generating a row table based on the identified rows and the items in each row;
and determining the distribution of the different types of items in the row table based on the types and positions of the items, as the row audit result of the display scene.
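Claim 6 applies the same grouping within each layer, along the horizontal axis. A minimal sketch under the same assumptions (illustrative item tuples and threshold; adjacent-item chaining rather than an exhaustive pairwise test):

```python
def group_into_rows(layer_items, x_threshold):
    """Split one display layer's items into rows: items whose lateral
    distance does not exceed x_threshold share a row.

    layer_items: list of (item_type, x, y) tuples from a single layer.
    """
    rows = []
    for item in sorted(layer_items, key=lambda it: it[1]):  # sweep left to right
        if rows and item[1] - rows[-1][-1][1] <= x_threshold:
            rows[-1].append(item)        # same row as the previous item
        else:
            rows.append([item])          # lateral gap too large: new row
    return rows
```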
7. The method of claim 1, wherein the performing an audit operation based on the type and location of the item in the display scene, resulting in an audit result of the display scene, comprises:
performing the following for each type of item in the display scenario:
determining a corresponding imaging area in the picture to be identified based on the position of the article, and determining a plurality of material candidate areas corresponding to the type of the display scene from the imaging area;
and identifying, from each material candidate region, the type of material associated with the type of the item and the position of the material, as the material audit result of the display scene.
8. The method of claim 1, wherein the performing an audit operation based on the type and location of the item in the display scene, resulting in an audit result of the display scene, comprises:
determining the total number and the position distribution of each type of item based on the type and position of each item in the display scene;
and comparing the total numbers and position distributions with the display totals and distributions specified for the different types of items in the audit rule, to obtain the types of missing items, the missing quantities, and the distribution differences in the display scene, as the missing-item audit result of the display scene.
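The count comparison in claim 8 amounts to tallying detections per item type and diffing against the totals in the audit rule. A sketch (the rule format and type names are assumptions, not the patent's data model):

```python
from collections import Counter

def missing_audit(detected_types, audit_rule):
    """Report which item types fall short of the audit rule.

    detected_types: list of item-type strings detected in the scene.
    audit_rule: dict mapping item type -> required display count.
    Returns {item type: missing quantity} for every type that is short.
    """
    counts = Counter(detected_types)
    return {t: required - counts[t]
            for t, required in audit_rule.items()
            if counts[t] < required}
```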
9. The method of any of claims 1 to 8, wherein prior to performing an audit operation based on the type and location of items in the display scene, the method further comprises:
when the picture to be identified includes a plurality of display scenes and the audit operation targets only some of those display scenes, determining the bounding box corresponding to the position of each target display scene;
and filtering out, from the items identified in the picture to be identified, the items that are not within the bounding box, and taking the items remaining after filtering as the items on which the audit operation is performed.
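The filtering step of claim 9 is a containment test of each item's bounding box against the target scene's bounding box. A sketch with assumed (x1, y1, x2, y2) box coordinates:

```python
def filter_items_in_scene(items, scene_box):
    """Keep only the items whose bounding boxes lie entirely inside the
    target display scene's bounding box.

    items: list of (item_type, (x1, y1, x2, y2)) detections.
    scene_box: (x1, y1, x2, y2) of the target display scene.
    """
    sx1, sy1, sx2, sy2 = scene_box

    def inside(box):
        x1, y1, x2, y2 = box
        return x1 >= sx1 and y1 >= sy1 and x2 <= sx2 and y2 <= sy2

    return [item for item in items if inside(item[1])]
```

A production system might instead keep items whose intersection-over-area with the scene box exceeds a threshold, to tolerate boxes that slightly overhang the scene.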
10. An audit device for displayed goods based on artificial intelligence, comprising:
a display scene detection module, configured to determine a plurality of display scene candidate regions included in a picture to be identified, and to extract the image features of each display scene candidate region;
a display scene positioning module, configured to identify the type of a display scene and the position of the display scene from each display scene candidate region based on the image features of each display scene candidate region;
an item identification module, configured to perform item identification processing corresponding to the type of the display scene based on an image feature of an area corresponding to the position of the display scene, so as to determine the type of an item displayed in the display scene and the position of the item;
and an audit module, configured to perform an audit operation based on the types and positions of the items in the display scene to obtain an audit result of the display scene.
CN202010300775.6A 2020-04-16 2020-04-16 Display article auditing method and device based on artificial intelligence Active CN111507253B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010300775.6A CN111507253B (en) 2020-04-16 2020-04-16 Display article auditing method and device based on artificial intelligence


Publications (2)

Publication Number Publication Date
CN111507253A true CN111507253A (en) 2020-08-07
CN111507253B CN111507253B (en) 2023-06-30

Family

ID=71869370

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010300775.6A Active CN111507253B (en) 2020-04-16 2020-04-16 Display article auditing method and device based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN111507253B (en)


Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005339132A (en) * 2004-05-26 2005-12-08 Nippon Telegr & Teleph Corp <Ntt> Merchandise taste recognizing system and device, and program
JP2007226713A (en) * 2006-02-27 2007-09-06 Ishida Co Ltd Electronic bin tag system
CN102930264A (en) * 2012-09-29 2013-02-13 李炳华 System and method for acquiring and analyzing commodity display information based on image identification technology
US20140195374A1 (en) * 2013-01-10 2014-07-10 International Business Machines Corporation Managing inventory in a shopping store
CN206258916U (en) * 2016-12-09 2017-06-16 杭州惟有网络科技有限公司 A kind of open counter and self-help shopping system based on the soft label of mobile electronic payment
CN108364005A (en) * 2018-03-07 2018-08-03 上海扩博智能技术有限公司 Automatic identifying method, system, equipment and the storage medium of price tag
CN108416403A (en) * 2018-03-08 2018-08-17 上海扩博智能技术有限公司 The automatic correlation method of commodity and label, system, equipment and storage medium
CN108734162A (en) * 2018-04-12 2018-11-02 上海扩博智能技术有限公司 Target identification method, system, equipment and storage medium in commodity image
CN108776826A (en) * 2018-04-24 2018-11-09 石狮市森科智能科技有限公司 A kind of RFID intelligent digital clothes commodity shelf systems based on neural network algorithm
CN109242060A (en) * 2018-08-30 2019-01-18 上海扩博智能技术有限公司 New restocking product fast searching method, system, equipment and storage medium
CN109360331A (en) * 2017-12-29 2019-02-19 广州Tcl智能家居科技有限公司 A kind of automatic vending method and automatic vending machine based on article identification
CN109543527A (en) * 2018-10-19 2019-03-29 北京陌上花科技有限公司 For the commodity detection method of unmanned shelf, device and retail terminal
CN110321797A (en) * 2019-05-31 2019-10-11 苏宁云计算有限公司 Commodity recognition method and device
CN110351678A (en) * 2018-04-03 2019-10-18 浙江汉朔电子科技有限公司 Commodity attribute method and device, equipment and storage medium
CN110533028A (en) * 2019-07-31 2019-12-03 北京三快在线科技有限公司 Detection method, device, electronic equipment and the storage medium of commodity display state
CN110705424A (en) * 2019-09-25 2020-01-17 广州市玄武无线科技股份有限公司 Method and device for positioning commodity display position and storage medium
CN110705666A (en) * 2019-10-22 2020-01-17 顺忠宝智能科技(深圳)有限公司 Artificial intelligence cloud computing display rack goods and label monitoring and goods storage method
CN110889419A (en) * 2018-09-07 2020-03-17 杭州海康威视数字技术股份有限公司 Shelf analysis method, device and system and electronic equipment
CN110956115A (en) * 2019-11-26 2020-04-03 证通股份有限公司 Scene recognition method and device


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329514A (en) * 2020-09-07 2021-02-05 江苏感创电子科技股份有限公司 Book checking method and system based on fast R-CNN algorithm
CN112395939A (en) * 2020-09-07 2021-02-23 江苏感创电子科技股份有限公司 Book checking method and system
CN112200631A (en) * 2020-10-12 2021-01-08 支付宝(杭州)信息技术有限公司 Industry classification model training method and device
TWI838631B (en) * 2020-11-27 2024-04-11 日商樂天集團股份有限公司 Information processing system, information processing method and program product
CN112990095A (en) * 2021-04-13 2021-06-18 广州市玄武无线科技股份有限公司 Commodity display analysis method, commodity display analysis device, commodity display analysis equipment and storage medium
CN113569830A (en) * 2021-06-22 2021-10-29 邬国锐 Method, device, equipment and storage medium for determining row and column positions of displayed articles
CN113627508A (en) * 2021-08-03 2021-11-09 北京百度网讯科技有限公司 Display scene recognition method, device, equipment and storage medium
JP2022110132A (en) * 2021-08-03 2022-07-28 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Display scene recognition method, model training method, device, electronic equipment, storage medium, and computer program
JP7393472B2 (en) 2021-08-03 2023-12-06 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Display scene recognition method, device, electronic device, storage medium and computer program
CN116185541A (en) * 2023-01-06 2023-05-30 广州市玄武无线科技股份有限公司 Business execution system, method, terminal equipment and medium of business super intelligent equipment

Also Published As

Publication number Publication date
CN111507253B (en) 2023-06-30

Similar Documents

Publication Publication Date Title
CN111507253B (en) Display article auditing method and device based on artificial intelligence
Wei et al. Deep learning for retail product recognition: Challenges and techniques
CN111340126B (en) Article identification method, apparatus, computer device, and storage medium
CN110334708A (en) Difference automatic calibrating method, system, device in cross-module state target detection
CN110705666A (en) Artificial intelligence cloud computing display rack goods and label monitoring and goods storage method
CN108345912A (en) Commodity rapid settlement system based on RGBD information and deep learning
CN112100425B (en) Label labeling method and device based on artificial intelligence, electronic equipment and medium
US11354549B2 (en) Method and system for region proposal based object recognition for estimating planogram compliance
CN111523421A (en) Multi-user behavior detection method and system based on deep learning and fusion of various interaction information
US20230368884A1 (en) System and method for augmented reality detection of loose pharmacy items
Konstantinidis et al. Automating dairy production lines with the yoghurt cups recognition and detection process in the Industry 4.0 era
Ragesh et al. Deep learning based automated billing cart
CN115115825B (en) Method, device, computer equipment and storage medium for detecting object in image
CN114255377A (en) Differential commodity detection and classification method for intelligent container
Bhuiyan et al. Hajj pilgrimage video analytics using CNN
US20230098319A1 (en) Method and system for tracking objects in area
Rutinowski et al. Deep learning based re-identification of wooden euro-pallets
KR102597692B1 (en) Method, apparatus, and computer program for measuring volume of objects by using image
CN111695971B (en) Article recommendation method, apparatus and device, and computer storage medium
CN110580299B (en) Method, system, equipment and storage medium for generating matching diagram of recommended language of object
US20230274226A1 (en) Retail shelf image processing and inventory tracking system
CN116229349A (en) Monitoring offset recognition method, device, equipment and storage medium
CN115909335A (en) Commodity labeling method and device
Trinh et al. Implementation of YOLOv5 for Real-Time Maturity Detection and Identification of Pineapples.
Labuguen et al. Monkey features location identification using convolutional neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant