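WO2021220342A1 - Object recognition device, object recognition method, learning device, learning method, and recording medium - Google Patents
Object recognition device, object recognition method, learning device, learning method, and recording medium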
- Publication number
- WO2021220342A1 (application PCT/JP2020/017967)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- graph
- node
- product
- objects
- teacher
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/84—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using probabilistic graphical models from image or video features, e.g. Markov models or Bayesian networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/765—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/60—Analysis of geometric attributes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/778—Active pattern-learning, e.g. online learning of image or video features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional [3D] objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Definitions
- The present invention relates to the recognition of objects included in an image.
- Patent Document 1 describes a method that applies image recognition processing to a photographed image of a product shelf to classify the category of each object region and calculate its reliability.
- One object of the present invention is to provide an object recognition device capable of accurately recognizing a plurality of objects included in an image.
- The object recognition device comprises: an image acquisition means that acquires an image; an object recognition means that recognizes objects included in the image and generates a recognition result; a graph generation means that, based on the recognition result, defines each recognized object as one of a node and an edge and generates a graph in which the relationships between the objects are defined as the other of the node and the edge; and a graph analysis means that analyzes the graph and generates an analysis result showing the relationships between the objects.
- The object recognition method acquires an image; recognizes objects included in the image and generates a recognition result; based on the recognition result, generates a graph in which each recognized object is defined as one of a node and an edge and the relationships between the objects are defined as the other of the node and the edge; and analyzes the graph to generate an analysis result showing the relationships between the objects.
- The recording medium records a program that causes a computer to execute a process of: acquiring an image; recognizing objects included in the image and generating a recognition result; based on the recognition result, generating a graph in which each recognized object is defined as one of a node and an edge and the relationships between the objects are defined as the other of the node and the edge; and analyzing the graph to generate an analysis result showing the relationships between the objects.
- The learning device comprises a teacher data generation means that acquires a recognition result of objects included in an image, generates, as teacher input data, a recognition result that partly contains errors, and generates a teacher label indicating the portions of the teacher input data to be corrected.
- The learning device also comprises a graph analysis means that analyzes the graph using a graph analysis model and generates an analysis result showing the relationships between the objects.
- The learning method acquires a recognition result of objects included in an image, generates, as teacher input data, a recognition result that partly contains errors, and generates a teacher label indicating the portions of the teacher input data to be corrected.
- A graph is generated in which each object included in the teacher input data is defined as one of a node and an edge and the relationships between the objects are defined as the other of the node and the edge.
- The graph is analyzed using the graph analysis model to generate an analysis result showing the relationships between the objects.
- The graph analysis model is trained using the analysis result and the teacher label.
- The recording medium records a program that causes a computer to execute a process of acquiring a recognition result of objects included in an image, generating, as teacher input data, a recognition result that partly contains errors, and generating a teacher label indicating the portions of the teacher input data to be corrected;
- generating a graph in which each object included in the teacher input data is defined as one of a node and an edge and the relationships between the objects are defined as the other of the node and the edge;
- analyzing the graph using the graph analysis model to generate an analysis result showing the relationships between the objects; and
- training the graph analysis model using the analysis result and the teacher label.
- An outline of the object recognition device according to the first embodiment is shown.
- The hardware configuration of the object recognition device according to the first embodiment is shown.
- The functional configuration of the object recognition device according to the first embodiment is shown.
- An example of a photographed image of a product shelf is shown.
- An example of the object recognition result for the image of the product shelf and the corresponding graph is shown.
- Another example of the object recognition result for the image of the product shelf and the corresponding graph is shown.
- An example of the edge assignment rules used when generating a graph is shown.
- An example of the analysis result output by the graph analysis unit is shown.
- A flowchart of the object recognition process performed by the object recognition device is shown.
- The functional configuration of the object recognition device according to an application example of the first embodiment is shown.
- An example of how the correction unit corrects the recognition result is shown.
- The functional configuration of the learning device according to the first embodiment is shown.
- A flowchart of the learning process performed by the learning device is shown.
- An example of the graph according to Modified Example 4 is shown.
- The functional configuration of the object recognition device and the learning device according to the second embodiment is shown.
- The recognition accuracy of each product is improved by taking the above-mentioned display rules into account when evaluating the recognition result for the image of the product shelf.
- The display rules include rules common to the entire industry, rules for each store type (for example, supermarkets or convenience stores), local rules specific to individual stores, and the like, and the approach can accommodate any of them.
- FIG. 1 shows an outline of the object recognition device according to the first embodiment.
- The object recognition device 100 recognizes the individual products displayed on a product shelf from a photographed image of the product shelf in a store. Specifically, the object recognition device 100 recognizes each product in the image of the product shelf and outputs an analysis result obtained by analyzing the recognition result.
- FIG. 2 is a block diagram showing a hardware configuration of the object recognition device 100.
- the object recognition device 100 includes an interface (IF) 11, a processor 12, a memory 13, a recording medium 14, and a database (DB) 15.
- The IF 11 inputs and outputs data to and from external devices. Specifically, the image of the product shelf is input via the IF 11, and the analysis result generated by the object recognition device 100 is output to an external device through the IF 11 as needed.
- The processor 12 is a computer such as a CPU (Central Processing Unit) and controls the entire object recognition device 100 by executing a program prepared in advance. Specifically, the processor 12 executes the object recognition process and the learning process described later.
- The memory 13 is composed of a ROM (Read Only Memory), a RAM (Random Access Memory), and the like.
- The memory 13 is also used as a working memory while the processor 12 executes various processes.
- The recording medium 14 is a non-volatile, non-transitory recording medium such as a disk-shaped recording medium or a semiconductor memory, and is configured to be removable from the object recognition device 100.
- The recording medium 14 records various programs executed by the processor 12. When the object recognition device 100 executes various processes, the programs recorded in the recording medium 14 are loaded into the memory 13 and executed by the processor 12.
- The database 15 stores the captured images input through the IF 11, the recognition results of the object recognition device 100, the analysis results, and the like.
- The database 15 also stores the object recognition model and the graph analysis model, which will be described later, and the teacher data used to train them.
- The object recognition device 100 may include an input unit such as a keyboard and a mouse through which the user gives instructions and inputs, and a display unit such as a liquid crystal display.
- FIG. 3 is a block diagram showing the functional configuration of the object recognition device 100.
- Functionally, the object recognition device 100 includes an image acquisition unit 21, an object recognition unit 22, a graph generation unit 23, and a graph analysis unit 24.
- The image acquisition unit 21 acquires an image of the product shelf. Specifically, the image acquisition unit 21 may acquire the image directly from the camera used to photograph the product shelf, or from a database or the like in which photographed images are stored in advance.
- FIG. 4 shows an example of an image of a product shelf.
- The product shelf 40 has three tiers, an upper tier 41a, a middle tier 41b, and a lower tier 41c, and a plurality of products are displayed on each tier. The products are arranged according to display rules such as those illustrated above, for example identical products lined up next to each other and large products lined up at the bottom of the shelf.
- The object recognition unit 22 performs object recognition processing on the image acquired by the image acquisition unit 21 to recognize individual objects. For example, the object recognition unit 22 recognizes objects using a trained object recognition model based on a neural network or the like.
- FIG. 5A shows an example of the recognition result for each object by the object recognition unit 22.
- The object recognition unit 22 first detects a rectangular region corresponding to each object in the image. Next, the object recognition unit 22 extracts the position, size, and feature amount of each rectangular region and, based on these, recognizes the type of each product (for example, a product name or a product category such as liquor, juice, or milk; also called the "product class").
- The object recognition unit 22 determines that products whose feature amounts match, or whose similarity is equal to or higher than a predetermined threshold, belong to the same product class.
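The same-class decision can be illustrated as a greedy grouping of feature vectors. This is a minimal sketch, assuming cosine similarity as the similarity measure and a hypothetical threshold of 0.9; the text specifies neither the measure, the threshold value, nor how a class representative is chosen, so `group_by_similarity` and its first-member-as-representative choice are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_by_similarity(features, threshold=0.9):
    """Greedily assign each product to the first class whose representative is
    similar enough; otherwise open a new class with this product as representative."""
    reps, classes = [], []
    for f in features:
        for cls, rep in enumerate(reps):
            if cosine(f, rep) >= threshold:
                classes.append(cls)
                break
        else:
            reps.append(f)
            classes.append(len(reps) - 1)
    return classes


# Four hypothetical feature vectors: the first two and the last two are near-duplicates.
print(group_by_similarity([[1, 0], [0.99, 0.1], [0, 1], [0.05, 1]]))
```

Because the assignment is greedy, the result can depend on the order in which products are visited; a proper clustering method could be substituted without changing the idea.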
- In this example, the products on the product shelf 40 are classified into five product classes, products A to E.
- The object recognition unit 22 outputs the position, size, type, feature amount, and the like of each product obtained in this way to the graph generation unit 23 as the recognition result.
- The graph generation unit 23 generates a graph showing the relationships between the products based on the recognition result for each product input from the object recognition unit 22. Specifically, the graph generation unit 23 assigns the position, size, type, feature amount, and the like of the products to nodes and edges and generates a graph showing the relationships between the products. In one example, the graph generation unit 23 treats each product recognized by the object recognition unit 22 as a node and assigns the position, size, type, feature amount, and the like of that product to the node. The graph generation unit 23 then defines edges between the nodes based on the position, size, type, feature amount, and the like of each product to generate the graph. FIG. 5B shows an example of the graph generated from the image of the product shelf shown in FIG. 5A.
- In this graph, each product is a node Nd, and each node Nd indicates a product class (products A to E).
- An edge Ed is assigned between nodes Nd corresponding to the same type of product. The graph thus represents the arrangement of the multiple types of products displayed on the product shelf.
- FIG. 6A shows an example in which some products in the image of the product shelf shown in FIG. 5A have been swapped. Specifically, in FIG. 6A, product A and product B near the center of the upper tier 41a of the product shelf 40 shown in FIG. 5A have been exchanged, and product C and product D near the center of the middle tier 41b have been exchanged.
- FIG. 6B shows an example of the graph corresponding to the image of FIG. 6A. In FIG. 6B, as indicated by the broken-line ellipse 91, the positions of product A and product B on the upper tier are interchanged, and the edges of those products are eliminated.
- FIG. 7 shows examples of the edge assignment rules used by the graph generation unit 23 when generating a graph.
- The graph generation unit 23 assigns edges based on any one, or a combination, of the following edge assignment rules.
- In every rule, each product is a node.
- In one rule, the presence or absence of an edge is determined based on the physical distance between products.
- For example, the graph generation unit 23 assigns an edge between adjacent products, or between products located within a certain distance of each other.
- In another rule, edges are assigned between products adjacent to each other above, below, left, and right.
- In another rule, edges are assigned between products that are in a vertical positional relationship.
- In another rule, edges are assigned between products that are in a horizontal (left-right) positional relationship.
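The edge assignment rules can be expressed as pairwise predicates over recognized products. This is a minimal sketch under stated assumptions: each product is reduced to hypothetical grid coordinates (tier, slot) plus its class label, the 1.5-slot cutoff for the distance rule is arbitrary, the "vertical" and "horizontal" rules are read as neighboring positions only, the `same_class` predicate mirrors the same-type edges of FIG. 5B, and a "combination" of rules is interpreted as requiring all selected rules to hold. The rule names, `build_edges`, and `adjacency_matrix` are illustrative, not taken from the source.

```python
import math
from itertools import combinations

# Each product is (tier, slot, label); grid coordinates are a simplifying assumption.
RULES = {
    # edge between products within a distance cutoff (1.5 slots, arbitrary)
    "near": lambda p, q: math.dist(p[:2], q[:2]) <= 1.5,
    # edge between products adjacent above, below, left, or right
    "adjacent": lambda p, q: abs(p[0] - q[0]) + abs(p[1] - q[1]) == 1,
    # edge between vertically neighboring products (same slot, neighboring tiers)
    "vertical": lambda p, q: p[1] == q[1] and abs(p[0] - q[0]) == 1,
    # edge between horizontally neighboring products (same tier, neighboring slots)
    "horizontal": lambda p, q: p[0] == q[0] and abs(p[1] - q[1]) == 1,
    # edge between products of the same class, as in FIG. 5B
    "same_class": lambda p, q: p[2] == q[2],
}


def build_edges(products, rules):
    """Return node pairs (i, j), i < j, that satisfy every selected rule."""
    return [(i, j) for i, j in combinations(range(len(products)), 2)
            if all(RULES[r](products[i], products[j]) for r in rules)]


def adjacency_matrix(n, edges):
    """Symmetric 0/1 adjacency matrix for an undirected graph with n nodes."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1
    return a


shelf = [(0, 0, "A"), (0, 1, "A"), (0, 2, "B"), (1, 0, "C")]
print(build_edges(shelf, ["adjacent", "same_class"]))
```

With `["adjacent", "same_class"]`, swapping two neighboring products of different classes removes their same-class edges, which corresponds to the effect marked by ellipse 91 in FIG. 6B.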
- The graph analysis unit 24 analyzes the graph generated by the graph generation unit 23 and outputs the analysis result.
- The graph analysis unit 24 analyzes the input graph using a graph analysis model such as a graph CNN (Convolutional Neural Network).
- Each node of the graph input to the graph analysis unit 24 is given the position, size, type, feature amount, and the like of the corresponding product (hereinafter collectively referred to as the "feature amount (feature vector) X").
- In the graph CNN, basically as in an ordinary neural network, the process of multiplying the feature vector (feature matrix) X of each node by a weight W and feeding the result to an activation function is repeated.
- In addition, the graph CNN incorporates the connection destinations of each node.
- For this purpose, an adjacency matrix A indicating the connection relationships between the nodes is defined.
- The adjacency matrix is an N×N matrix, where N is the number of nodes, in which the entries at the intersections of the indices of connected nodes are "1" and all other entries are "0". For simplicity, a three-node graph with a 3×3 adjacency matrix is considered.
- The feature matrix X, which holds the feature amounts of each node, is defined likewise, with one row per node.
- Because each layer multiplies by the adjacency matrix A, the information of the surrounding nodes connected by edges is added each time a layer is repeated. In this way, an analysis result showing the relationships between the nodes in the input graph is obtained.
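The layer-wise propagation described above can be sketched in plain Python. This is a minimal sketch, not the model of the source: it assumes the common formulation ReLU((A + I) X W), adds self-loops so that each node keeps its own features, omits the degree normalization that many graph CNNs apply, and uses a hypothetical 3-node graph, feature matrix, and identity weights purely for illustration.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def relu(m):
    """Element-wise ReLU activation."""
    return [[max(0.0, v) for v in row] for row in m]


def add_self_loops(a):
    """A + I, so each node also keeps its own features (an assumption here)."""
    return [[a[i][j] + (1 if i == j else 0) for j in range(len(a))] for i in range(len(a))]


def gcn_layer(a, x, w):
    """One graph-convolution layer: ReLU((A + I) X W), without degree normalization."""
    return relu(matmul(matmul(add_self_loops(a), x), w))


# Hypothetical 3-node chain 0 - 1 - 2 (cf. the 3x3 case in the text).
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1, 0], [0, 1], [1, 1]]  # two illustrative features per node
W = [[1, 0], [0, 1]]          # identity weights so the aggregation is visible

h1 = gcn_layer(A, X, W)   # [[1, 1], [2, 2], [1, 2]]
h2 = gcn_layer(A, h1, W)  # [[3, 3], [4, 5], [3, 4]]
```

After one layer, node 0 only mixes in node 1; after two layers its row also reflects node 2, which is two edges away. This is the sense in which each repeated layer adds information from surrounding nodes. In a trained model, W would be learned rather than fixed to the identity.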
- FIG. 8 shows examples of the analysis result output by the graph analysis unit 24. As shown in FIG. 8A, assume that the individual recognition results for three products are the classes "product A", "product B", and "product B".
- FIG. 8B shows a first output example of the analysis result.
- In this example, the analysis result indicates whether each individual product needs to be corrected: the leftmost node is "correction required (Yes)", and the other two nodes are "correction not required (No)".
- FIG. 8C shows a second output example of the analysis result.
- In this example, the analysis result is a product label, such as a product name, for each product: the leftmost node is "tea X", and the other two nodes are "coffee Y".
- In this case, the object recognition unit 22 is required to recognize the product label (product name, etc.) of each product based on its appearance features in the image and supply the label to the graph analysis unit 24.
- The object recognition unit 22 generates a product label for each product based on its appearance features and outputs it to the graph analysis unit 24.
- The graph analysis unit 24 outputs an analysis result such as that shown in FIG. 8C using the input product label of each product.
- FIG. 8D shows a third output example of the analysis result.
- In this example, the analysis result indicates whether an edge exists between individual products. For example, under an edge assignment rule that assigns an edge between products of the same class, an edge is assigned between the center node and the right node, as shown in FIG. 8D.
- The graph analysis unit 24 can output the analysis result in several formats as described above, but each of them directly or indirectly indicates whether a product needs to be corrected. Specifically, the first output example directly indicates whether each product needs to be corrected, while the second and third output examples can also be understood as indirectly indicating that "the leftmost product A needs to be corrected to product B".
- The graph CNN described above is only one example of a model that can be used in the graph analysis unit 24; various GNNs (Graph Neural Networks) and graph analysis models other than the graph CNN can also be applied.
- FIG. 9 is a flowchart of the object recognition process performed by the object recognition device 100. This process is realized by the processor 12 shown in FIG. 2 executing a program prepared in advance and operating as each component shown in FIG. 3.
- First, the image acquisition unit 21 acquires an image of the product shelf and outputs it to the object recognition unit 22 (step S11).
- Next, the object recognition unit 22 detects the region of each product in the input image and outputs the position, size, type, feature amount, and the like of each product to the graph generation unit 23 as the recognition result (step S12).
- Next, the graph generation unit 23 generates a graph showing the relationships between the products based on the input recognition result and outputs the graph to the graph analysis unit 24 (step S13).
- Finally, the graph analysis unit 24 analyzes the input graph and outputs an analysis result showing the relationships between the individual products (step S14). The process then ends.
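Steps S11 to S14 can be traced end to end on a single tier with a toy pipeline. This sketch rests on loud assumptions: recognition (S11 and S12) is replaced by a ready-made list of class labels ordered left to right, edges join horizontally adjacent products, and `analyze` is a hypothetical hand-written rule standing in for the trained graph analysis model (it flags a node whose class differs from all of its neighbors). None of these function names come from the source.

```python
from dataclasses import dataclass


@dataclass
class Product:
    slot: int   # left-to-right position on one tier (simplifying assumption)
    label: str  # recognized product class


def recognize(labels):
    # Stand-in for S11/S12: a real system would detect and classify products in an image.
    return [Product(slot, label) for slot, label in enumerate(labels)]


def build_graph(products):
    # S13: each product is a node; edges join horizontally adjacent products.
    return {p.slot: [q.slot for q in products if abs(p.slot - q.slot) == 1]
            for p in products}


def analyze(products, graph):
    # S14 stand-in (hypothetical rule, NOT a trained graph model):
    # flag a node as "correction required" if its class differs from every neighbor's class.
    label = {p.slot: p.label for p in products}
    return {n: bool(nbrs) and all(label[m] != label[n] for m in nbrs)
            for n, nbrs in graph.items()}


def run(labels):
    products = recognize(labels)     # S11/S12 (stubbed)
    graph = build_graph(products)    # S13
    return analyze(products, graph)  # S14


print(run(["A", "B", "B"]))  # {0: True, 1: False, 2: False}
```

For the input of FIG. 8A (A, B, B), this toy rule happens to flag the leftmost product, mirroring the first output example of FIG. 8B; a trained graph model would learn such patterns from teacher data rather than hard-code them.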
- FIG. 10 shows the functional configuration of an object recognition device 100x according to an application example of the first embodiment.
- The object recognition device 100x includes a correction unit 25 in addition to the configuration of the object recognition device 100.
- The object recognition device 100x corrects the recognition result output by the object recognition unit 22 based on the analysis result of the graph analysis unit 24.
- The object recognition unit 22 outputs the recognition result generated from the image input from the image acquisition unit 21 to the correction unit 25. The graph analysis unit 24 outputs the analysis result showing the relationships between the individual products, as illustrated in FIGS. 8(B) to 8(D), to the correction unit 25.
- The correction unit 25 corrects the recognition result generated by the object recognition unit 22 based on the analysis result output by the graph analysis unit 24 and outputs the corrected result. For example, when the graph analysis unit 24 outputs the necessity of correction for each product as the analysis result, as shown in FIG. 8(B), the correction unit 25 corrects a product determined to need correction (Yes) to another product. When the graph analysis unit 24 outputs the product label of each product as the analysis result, as shown in FIG. 8(C), the correction unit 25 determines, based on the product labels, whether any product needs to be corrected, and corrects the product determined to need correction.
- When the graph analysis unit 24 outputs the presence or absence of an edge between products as the analysis result, as shown in FIG. 8(D), the correction unit 25 determines, based on the presence or absence of edges between the products, whether any product needs to be corrected, and corrects the product determined to need correction.
- FIG. 11 shows examples of how the correction unit 25 corrects the recognition result.
- FIGS. 11(A) and 11(B) show the first correction method.
- The first correction method is applied when the graph analysis unit 24 outputs, as the analysis result, the necessity of correction as a score for each node.
- Each node corresponds to one product.
- For convenience of explanation, the nodes are numbered 1 to 4, and the recognition result (product class) of each node is shown in the rectangular box corresponding to that node.
- The value in parentheses in each box is a score indicating whether the node needs to be corrected; the higher the value, the greater the need for correction.
- The correction unit 25 compares the score of each node with a predetermined first threshold and determines a node whose score exceeds the first threshold as the correction target node. The correction unit 25 then corrects the correction target node to the most numerous product among the other products adjacent to it.
- Here, "adjacent" means belonging to a predetermined range centered on the product; in the example of FIG. 11, the adjacent products are assumed to be the four products of nodes 1 to 4. The first threshold is assumed to be "0.5".
- When the graph analysis unit 24 outputs the analysis result shown in FIG. 11(A), the correction unit 25 compares the scores of nodes 1 to 4 with the first threshold and determines node 2, whose score is "0.8", as the correction target node. The correction unit 25 then corrects product B of node 2 to product A, the most numerous of the adjacent products. Likewise, when the graph analysis unit 24 outputs the analysis result shown in FIG. 11(B), the correction unit 25 determines node 2 as the correction target node and corrects product B of node 2 to product A, the most numerous of the adjacent products.
- In some cases, the correction unit 25 may leave the correction target node uncorrected. Specifically, as shown in FIG. 11(C), when node 2 is determined to be the correction target node but, within the adjacent product group, two nodes including the correction target node are product B and the other two nodes are product A, the correction unit 25 may refrain from correcting node 2.
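The first correction method, including the FIG. 11(C) tie case, can be sketched as follows. This is a minimal sketch, assuming that the adjacent group is passed in explicitly and includes the correction target node itself (which is what makes the two-versus-two case a tie), that the node scores other than node 2's "0.8" and the labels of nodes 1, 3, and 4 are hypothetical, and that a tie among several non-target labels also leaves the node unchanged; the source does not specify that last case.

```python
from collections import Counter


def first_method(labels, scores, group, threshold=0.5):
    """labels/scores: dict node -> product class / correction-need score.
    group: the nodes treated as mutually adjacent (including the target)."""
    corrected = dict(labels)
    for node in group:
        if scores[node] <= threshold:
            continue  # not a correction target
        counts = Counter(labels[n] for n in group)
        best = max(counts.values())
        leaders = [lab for lab, c in counts.items() if c == best]
        if labels[node] in leaders or len(leaders) != 1:
            continue  # tie (FIG. 11(C)) or already the majority: leave as is
        corrected[node] = leaders[0]  # correct to the most numerous product
    return corrected


# Hypothetical FIG. 11(A)-like data: node 2 (product B) scores 0.8 > 0.5.
labels = {1: "A", 2: "B", 3: "A", 4: "A"}
scores = {1: 0.1, 2: 0.8, 3: 0.2, 4: 0.3}
print(first_method(labels, scores, [1, 2, 3, 4]))  # node 2 becomes "A"
```

Returning the correction targets without applying them, as described next, would only require collecting the nodes that pass the threshold test instead of rewriting their labels.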
- In the above description, the correction unit 25 determines the correction target node and corrects it; alternatively, it may only present the correction target node without correcting it.
- For example, the correction unit 25 may output node 2 as the correction target node and end the process. In this case, a human or the like may look at the output of the correction unit 25 and decide which product the correction target node should be corrected to.
- FIG. 12A shows the second correction method.
- In this method, the graph analysis unit 24 outputs, as the analysis result, a score indicating how likely nodes are to be the same product (hereinafter also referred to as the "matching degree score").
- The matching degree score takes a value from "0" to "1".
- When the matching degree score between nodes with different product labels exceeds a predetermined second threshold, the correction unit 25 corrects the nodes of that combination (pair) so that they have the same product.
- The correction target node is determined by the same method as in the first correction method or by a different method.
- Here, the correction target node is node 2.
- Node 2 has a different product class from nodes 1, 3, and 4.
- The matching degree score between node 1 and node 2 is "0.8",
- that between node 2 and node 3 is "0.7",
- and that between node 2 and node 4 is "0.7".
- Let the second threshold be "0.7".
- The matching degree scores for the pair of product A and product B are "0.8" (node 1 and node 2) and "0.7" (node 2 and node 3).
- Their maximum value is "0.8".
- The matching degree score for the pair of product B and product C is "0.7" (node 2 and node 4), so its maximum value is "0.7". Therefore, the largest matching degree score between different product labels is "0.8", for product A and product B, which exceeds the second threshold "0.7". The correction unit 25 therefore corrects node 2 to product A, the same as node 1.
- The average value may be used instead of the maximum value.
- In that case, the matching degree scores for the pair of product A and product B are "0.8" (node 1 and node 2) and "0.7" (node 2 and node 3),
- and their average value is "0.75", which exceeds the second threshold "0.7". Therefore, the correction unit 25 corrects node 2, which is product B, to product A.
- The matching degree score for the pair of product B and product C is "0.7" (node 2 and node 4), and its average value of "0.7" does not exceed the second threshold "0.7". Therefore, the correction unit 25 does not make a correction between product B and product C.
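The second correction method can be sketched with the FIG. 12(A) numbers. This is a minimal sketch, assuming that only scores involving the correction target node are considered, that pair scores are stored in a dictionary keyed by `frozenset` node pairs (a hypothetical layout), and that nodes 1 to 4 carry products A, B, A, and C as implied by the text; `agg` switches between the maximum and the average, and the function returns the label the target should take.

```python
from statistics import mean


def second_method(labels, match, target, threshold=0.7, agg=max):
    """labels: dict node -> product class.
    match: dict frozenset({i, j}) -> matching degree score in [0, 1].
    Returns the (possibly corrected) label of the correction target node."""
    by_label = {}
    for pair, score in match.items():
        if target not in pair:
            continue  # only pairs involving the correction target are used here
        (other,) = pair - {target}
        if labels[other] != labels[target]:
            by_label.setdefault(labels[other], []).append(score)
    if not by_label:
        return labels[target]
    # aggregate scores per label pair (max or mean) and pick the strongest pair
    best_label, best_score = max(((lab, agg(s)) for lab, s in by_label.items()),
                                 key=lambda t: t[1])
    return best_label if best_score > threshold else labels[target]


labels = {1: "A", 2: "B", 3: "A", 4: "C"}
match = {frozenset({1, 2}): 0.8, frozenset({2, 3}): 0.7, frozenset({2, 4}): 0.7}
print(second_method(labels, match, 2))            # A-B max 0.8 > 0.7 -> "A"
print(second_method(labels, match, 2, agg=mean))  # A-B mean 0.75 > 0.7 -> "A"
```

The comparison is strict, so a B-C aggregate of exactly "0.7" never triggers a correction, matching the worked example.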
- FIG. 12B shows the third correction method.
- In this method, the graph analysis unit 24 outputs, as the analysis result, a matching degree score indicating how likely nodes are to be the same product; the matching degree score takes a value from "0" to "1.0".
- The correction target node is determined by the same method as in the first correction method or by a different method.
- The correction unit 25 corrects the correction target node to the most numerous product among the products whose matching degree score with it is equal to or higher than a predetermined third threshold.
- The product labels and matching degree scores of the nodes are the same as in FIG. 12(A). Let the third threshold be "0.7". The nodes whose matching degree score with the correction target node 2 is "0.7" or higher are nodes 1, 3, and 4; since nodes 1 and 3 are product A, the most numerous product among them is product A.
- Therefore, the correction unit 25 corrects node 2 from product B to product A.
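A sketch of the third correction method, using the same hypothetical data layout and the FIG. 12(A) values (products A, B, A, and C for nodes 1 to 4, as implied by the text). It assumes that every node at or above the third threshold casts one vote for its own label and that, if no node qualifies, the target is left unchanged; the source states neither the fallback nor a tie-breaking rule.

```python
from collections import Counter


def third_method(labels, match, target, threshold=0.7):
    """Return the label the correction target should take under the third method."""
    votes = Counter()
    for pair, score in match.items():
        if target in pair and score >= threshold:
            (other,) = pair - {target}
            votes[labels[other]] += 1  # one vote per sufficiently similar node
    if not votes:
        return labels[target]  # no sufficiently similar node: leave unchanged
    # most_common orders equal counts by first insertion, an arbitrary tie-break here
    return votes.most_common(1)[0][0]


labels = {1: "A", 2: "B", 3: "A", 4: "C"}
match = {frozenset({1, 2}): 0.8, frozenset({2, 3}): 0.7, frozenset({2, 4}): 0.7}
print(third_method(labels, match, 2))  # votes A: 2, C: 1 -> "A"
```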
- In this way, the object recognition device 100x can obtain a corrected recognition result based on the analysis result.
- FIG. 13 is a block diagram showing the functional configuration of the learning device 200.
- The learning device 200 includes a graph generation unit 23, a graph analysis unit 24, a teacher data generation unit 31, and a learning unit 32.
- The graph generation unit 23 and the graph analysis unit 24 are the same as those of the object recognition device 100 shown in FIG. 3.
- The teacher data generation unit 31 generates the teacher data used to train the graph analysis model from recognition results obtained by applying the object recognition process to images of the product shelf.
- The arrangement of a plurality of products on a product shelf is called a "shelf allocation", and the recognition result for an image of the product shelf is also called a "shelf allocation recognition result".
- Teacher data is a set consisting of teacher input data and a teacher label indicating the correct answer for that input.
- the graph analysis model used by the graph analysis unit 24 is trained so that, when an erroneous shelf allocation recognition result for an image of a product shelf is input, it outputs an analysis result for correcting the error. The teacher input data is therefore a shelf allocation recognition result that contains errors: the teacher data generation unit 31 generates, as teacher input data, a shelf allocation recognition result in which the recognition results of some products are incorrect, and outputs it to the graph generation unit 23. Methods for generating the teacher input data are described first.
- in the first method, the teacher data generation unit 31 acquires the recognition result of each product generated by the object recognition process from an actual image of the product shelf, and uses an acquired recognition result that contains errors as the teacher input data.
- in the second method, the teacher data generation unit 31 acquires the recognition result of each product generated by the object recognition process from an actual image of the product shelf, as in the first method.
- This recognition result includes a score indicating the reliability of each of a plurality of product candidates for each product.
- the teacher data generation unit 31 extracts, from the acquired recognition results, the products whose reliability is less than a predetermined fourth threshold value, selects N of them at random, and generates the teacher input data by exchanging their first- and second-ranked candidates. That is, this method assumes that the recognition result generated by the object recognition unit 22 is correct, and intentionally changes the recognition result of some products into an error.
- specifically, when the difference between the score of the first-ranked product candidate and that of the second-ranked product candidate is less than the fourth threshold value, the teacher data generation unit 31 uses the second-ranked candidate, instead of the first-ranked one, as the recognition result for that product.
- the teacher data generation unit 31 applies this process to N products selected at random from among the products whose score difference between the first and second candidates is less than the fourth threshold value. As a result, teacher input data is obtained in which, within the recognition result generated from one image, the recognition results of N randomly selected products are incorrect.
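The second method can be sketched as follows, assuming each product's recognition result is a list of `(label, score)` candidates sorted by descending score and using the score-gap form of the fourth threshold value described above. The function name `swap_top_candidates` and the `seed` parameter are hypothetical; a seeded `random.Random` is used only so that runs are reproducible.

```python
import random


def swap_top_candidates(results, gap_threshold, n, seed=None):
    """results: one candidate list per product, each sorted by descending
    score, e.g. [("A", 0.55), ("B", 0.45)].
    Returns (labels used as teacher input data, indices deliberately corrupted)."""
    rng = random.Random(seed)
    labels = [candidates[0][0] for candidates in results]
    # Products whose top two candidates are closer than the fourth threshold.
    ambiguous = [
        i
        for i, candidates in enumerate(results)
        if len(candidates) >= 2 and candidates[0][1] - candidates[1][1] < gap_threshold
    ]
    chosen = rng.sample(ambiguous, min(n, len(ambiguous)))
    for i in chosen:
        labels[i] = results[i][1][0]  # use the second-ranked candidate instead
    return labels, sorted(chosen)


results = [
    [("A", 0.9), ("B", 0.05)],   # confident: never corrupted
    [("C", 0.5), ("D", 0.45)],   # ambiguous
    [("E", 0.6), ("F", 0.55)],   # ambiguous
]
print(swap_top_candidates(results, 0.1, n=5))  # (['A', 'D', 'F'], [1, 2])
```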
- in the third method, the teacher data generation unit 31 acquires the recognition results of a plurality of products to which correct answers have been assigned, and replaces N products selected at random from among them with other products. As a result, teacher input data in which the recognition results of N randomly selected products are incorrect is obtained.
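A sketch of the third method, assuming the correct answers are a list of product labels and that replacement products are drawn uniformly from a hypothetical `catalog` of known products (which must contain at least two products). The function name `replace_random_products` is hypothetical.

```python
import random


def replace_random_products(correct_labels, catalog, n, seed=None):
    """Replace n randomly chosen correct labels with a different product from
    `catalog`. Returns (teacher input labels, indices that were replaced)."""
    rng = random.Random(seed)
    labels = list(correct_labels)
    chosen = rng.sample(range(len(labels)), min(n, len(labels)))
    for i in chosen:
        alternatives = [p for p in catalog if p != labels[i]]
        labels[i] = rng.choice(alternatives)  # guaranteed to differ from the answer
    return labels, sorted(chosen)


correct = ["A", "A", "B", "C"]
noisy, replaced = replace_random_products(correct, ["A", "B", "C", "D"], n=2, seed=0)
```

Because every replaced label differs from the correct one, the returned indices are exactly the positions that a teacher label would mark as needing correction.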
- in the fourth method, the teacher data generation unit 31 acquires the recognition results of a plurality of products to which correct answers have been assigned, as in the third method.
- the teacher data generation unit 31 then swaps erroneous recognition candidates, that is, products likely to be mistaken for one another, in these recognition results.
- for example, the teacher data generation unit 31 swaps product M and product N included in the recognition results of the plurality of products to which correct answers have been assigned.
- as a result, teacher input data in which the recognition results of the erroneous recognition candidates are incorrect is obtained.
- for each node whose product is replaced, the teacher data generation unit 31 replaces the feature vector together with the product class.
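The swap in the fourth method, including the feature-vector replacement, can be sketched as below. This assumes every occurrence of the two products is swapped and that a hypothetical `reference_features` table supplies a representative feature vector per product; the text does not say which occurrences are swapped or where the new feature vector comes from. The function name `swap_confusable` is hypothetical.

```python
def swap_confusable(nodes, product_m, product_n, reference_features):
    """nodes: list of dicts with 'label' and 'feature' keys.
    Relabels product_m as product_n and vice versa, replacing each swapped
    node's feature vector with the reference vector of its new class.
    Input dicts are not modified."""
    swap = {product_m: product_n, product_n: product_m}
    out = []
    for node in nodes:
        new_label = swap.get(node["label"], node["label"])
        if new_label != node["label"]:
            node = {**node, "label": new_label,
                    "feature": list(reference_features[new_label])}
        out.append(node)
    return out


nodes = [
    {"label": "M", "feature": [1.0, 0.0]},
    {"label": "X", "feature": [0.0, 0.0]},
    {"label": "N", "feature": [0.0, 1.0]},
]
refs = {"M": [1.0, 0.0], "N": [0.0, 1.0]}
swapped = swap_confusable(nodes, "M", "N", refs)
print([node["label"] for node in swapped])  # ['N', 'X', 'M']
```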
- in the methods above, the teacher input data is generated from the recognition result obtained by applying the object recognition process to the image.
- in the following method, by contrast, the image data itself is changed.
- the teacher data generation unit 31 cuts out each product from the input image to generate individual product images, and rearranges the individual product images within the image according to a certain rule.
- as the rearrangement rules, display rules such as those mentioned above can be used, for example "identical products are placed next to each other", "products of the same size are placed in the same row", and "large products are placed on the bottom shelf".
- the teacher data generation unit 31 applies object recognition to the image data obtained by rearranging the individual product images to generate a recognition result, and applies the first or second method described above to the obtained recognition result to generate teacher input data.
- the teacher data generation unit 31 generates a teacher label for the generated teacher input data.
- the teacher label indicates the corrected part in the teacher input data, that is, the position of the product that needs to be corrected.
- the teacher label is prepared in any of the forms of FIGS. 8 (B) to 8 (D) according to the form of the analysis result output by the graph analysis unit 24.
- the teacher data generation unit 31 calculates the difference between the recognition result containing errors, generated as teacher input data by any of the above methods, and the corresponding correct recognition result, and generates a teacher label indicating the corrected portions.
- the teacher data thus obtained, that is, each set of teacher input data and teacher label, is stored in the DB 15 or the like.
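Two possible teacher-label forms can be sketched as follows, assuming recognition results are dictionaries from node id to product label. `node_labels` marks the nodes where the teacher input differs from the correct answer (the corrected portions), and `pair_labels` gives a same-product target per node pair, which would suit an analysis result expressed as matching degree scores. Which of the forms in FIGS. 8(B) to 8(D) these correspond to is an assumption, and both function names are hypothetical.

```python
from itertools import combinations


def node_labels(teacher_input, correct):
    """Per-node teacher label: 1 where the teacher input disagrees with the
    correct answer (the node needs correction), 0 elsewhere."""
    if teacher_input.keys() != correct.keys():
        raise ValueError("both results must cover the same nodes")
    return {node: int(teacher_input[node] != correct[node]) for node in correct}


def pair_labels(correct):
    """Pairwise teacher label: 1 when two nodes are the same product in the
    correct answer, 0 otherwise."""
    return {
        frozenset({i, j}): int(correct[i] == correct[j])
        for i, j in combinations(sorted(correct), 2)
    }


teacher_input = {1: "A", 2: "A", 3: "C"}
correct = {1: "A", 2: "B", 3: "C"}
print(node_labels(teacher_input, correct))  # {1: 0, 2: 1, 3: 0}
```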
- the teacher input data generated by the teacher data generation unit 31 is input to the graph generation unit 23.
- the graph generation unit 23 generates a graph based on the teacher input data and outputs it to the graph analysis unit 24.
- the graph analysis unit 24 analyzes the graph of the generated teacher input data and outputs the analysis result to the learning unit 32.
- the learning unit 32 compares the analysis result output by the graph analysis unit 24 with the teacher label prepared in advance, and trains the graph analysis model based on the difference (loss). For example, when the graph analysis model is a GCNN using a neural network as described above, the learning unit 32 optimizes the parameters of the neural network constituting the graph analysis model based on the loss between the analysis result and the teacher label.
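The loss-driven update can be illustrated with a deliberately tiny stand-in for the GCNN: one parameter-free mean-aggregation step over each node and its neighbours, followed by a logistic output per node, trained with binary cross-entropy against per-node "needs correction" teacher labels using full-batch gradient descent. This is a minimal sketch in pure Python, not the patent's model; the layer design, learning rate, and epoch count are all assumptions.

```python
import math


def aggregate(features, edges):
    """Mean of each node's own feature vector and its neighbours' vectors."""
    neighbours = [{i} for i in range(len(features))]
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    dim = len(features[0])
    return [[sum(features[j][d] for j in nb) / len(nb) for d in range(dim)]
            for nb in neighbours]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(p, y, eps=1e-12):
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def train(features, edges, labels, lr=0.5, epochs=200):
    h = aggregate(features, edges)
    dim = len(h[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for hi, y in zip(h, labels):
            p = sigmoid(sum(wk * xk for wk, xk in zip(w, hi)) + b)
            err = p - y  # gradient of BCE with respect to the logit
            gw = [g + err * xk for g, xk in zip(gw, hi)]
            gb += err
        w = [wk - lr * g / len(h) for wk, g in zip(w, gw)]
        b -= lr * gb / len(h)
    probs = [sigmoid(sum(wk * xk for wk, xk in zip(w, hi)) + b) for hi in h]
    loss = sum(bce(p, y) for p, y in zip(probs, labels)) / len(labels)
    return w, b, probs, loss


# Toy graph: nodes 0 and 1 need correction (label 1), nodes 2 and 3 do not.
w, b, probs, loss = train([[1.0], [1.0], [0.0], [0.0]], [(0, 1), (2, 3)], [1, 1, 0, 0])
```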
- FIG. 14 is a flowchart of the learning process. This process is realized by the processor 12 shown in FIG. 2 executing a program prepared in advance and operating as a component shown in FIG. In the following learning process, it is premised that the correct answer information of the recognition result of each product is prepared for the image of the actual product shelf.
- the teacher data generation unit 31 acquires an image of the product shelf (step S21) and acquires correct answer information of the recognition result of each product prepared for the image (step S22).
- the teacher data generation unit 31 performs image recognition processing on the acquired image to acquire the image recognition result, and generates teacher input data by using any of the above-mentioned first to fourth methods. (Step S23). Further, the teacher data generation unit 31 uses the generated teacher input data and the correct answer information acquired in step S22 to generate a teacher label indicating the corrected portion (step S24).
- the teacher input data is input to the graph generation unit 23, and the graph generation unit 23 generates a graph based on the teacher input data (step S25) and outputs it to the graph analysis unit 24.
- the graph analysis unit 24 analyzes the input graph and outputs the analysis result to the learning unit 32 (step S26).
- the learning unit 32 compares the analysis result output by the graph analysis unit 24 with the teacher label generated in step S24, and updates the graph analysis model based on the difference (loss) (step S27). The above processing is executed for each of the prepared images, and the learning process then ends.
- the feature quantities of each product input from the object recognition unit 22 to the graph generation unit 23 include features related to the appearance of the product; however, feature quantities that do not include appearance-related features may be used instead. That is, the graph generation unit 23 may generate a graph using feature vectors that do not include features related to the appearance of the product, and the graph analysis unit 24 may analyze that graph.
- the object recognition device 100x shown in FIG. 12 outputs the recognition result after correction by the correction unit 25, and this output may be used to retrain the object recognition model used by the object recognition unit 22. That is, the object recognition unit 22 recognizes objects in the input image data using a trained object recognition model, and the recognition result after correction by the correction unit 25 is one in which the parts misrecognized by that model have been corrected; retraining the object recognition model with this result can therefore improve its recognition accuracy.
- the graph generation unit 23 generates a graph with each product as a node; in addition, nodes for classifying the products by attribute or the like (hereinafter referred to as "classification nodes") may be provided.
- for example, products may be classified by size into "large products", "medium products", "small products", and so on, with a "large product node", a "medium product node", and a "small product node" provided as classification nodes.
- similarly, nodes for classifying product types and categories may be provided.
- by providing classification nodes corresponding to each level of the product shelf (upper, middle, lower, etc.) and connecting each product to one of them, it is possible to express which shelf level each product is on.
- by using such classification nodes, the recognition accuracy for products can be improved by exploiting various relationships among the products arranged on the product shelves.
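Graph construction with classification nodes can be sketched as an adjacency dictionary in which product nodes are integers and classification nodes are strings such as `"shelf:lower"` or `"size:large"`. The attribute names, the string naming scheme, and the function name `build_graph` are hypothetical; the product-to-product edges are simply taken as input here.

```python
from collections import defaultdict


def build_graph(products, neighbour_pairs):
    """products: dict node_id -> {"shelf": ..., "size": ...}
    neighbour_pairs: iterable of (i, j) product-product edges.
    Returns an adjacency dict that also contains classification nodes."""
    adj = defaultdict(set)
    for i, j in neighbour_pairs:
        adj[i].add(j)
        adj[j].add(i)
    for pid, attrs in products.items():
        for attr in ("shelf", "size"):
            cls_node = f"{attr}:{attrs[attr]}"  # one node per attribute value
            adj[pid].add(cls_node)
            adj[cls_node].add(pid)
    return dict(adj)


products = {
    1: {"shelf": "lower", "size": "large"},
    2: {"shelf": "lower", "size": "small"},
    3: {"shelf": "upper", "size": "small"},
}
adj = build_graph(products, [(1, 2)])
print(sorted(adj["shelf:lower"]))  # [1, 2]
```

Products 1 and 2 are now linked both directly and through the shared `"shelf:lower"` node, so a graph model can use shelf level even between products that are not adjacent on the shelf.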
- in the above embodiment, object recognition is performed using a graph in which each node indicates a product and each edge indicates a relationship between products; instead, the line graph of this graph may be used.
- a "line graph" is a graph obtained by converting each edge of a given graph into a node and each node into an edge: two nodes of the line graph are joined when the corresponding edges of the original graph share an endpoint.
- FIG. 15(A) shows an example of a graph generated in the above embodiment, and FIG. 15(B) shows the line graph obtained from the graph of FIG. 15(A).
- in the graph of FIG. 15(A), each node shows a product and each edge shows a relationship between products.
- in the line graph of FIG. 15(B), each node shows a relationship between products, and each edge shows a product by connecting the nodes that include that common product. Even with such a line graph, features such as the type and positional relationship of each product can be expressed, so object recognition similar to that of the above embodiment can be performed.
- likewise, the classification nodes of Modification 3 above can be expressed as classification edges.
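The line-graph conversion can be sketched as follows, assuming a simple undirected graph given as a list of product pairs. Each original edge becomes a node (a `frozenset` of its two products), and each pair of edges that share a product becomes a line-graph edge labelled with that shared product, mirroring the description of FIG. 15(B). The function name `line_graph` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def line_graph(edges):
    """Return {frozenset({edge_a, edge_b}): shared_product} for a simple graph.
    Each original edge (a product relationship) becomes a line-graph node."""
    incident = defaultdict(list)
    for u, v in edges:
        e = frozenset((u, v))
        incident[u].append(e)
        incident[v].append(e)
    lg = {}
    for product, incident_edges in incident.items():
        # Every pair of relationships that involve the same product is joined.
        for a, b in combinations(incident_edges, 2):
            lg[frozenset((a, b))] = product
    return lg


path = line_graph([(1, 2), (2, 3), (3, 4)])
print(sorted(path.values()))  # [2, 3]
```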
- FIG. 16A is a block diagram showing a functional configuration of the object recognition device according to the second embodiment.
- the object recognition device 70 includes an image acquisition means 71, an object recognition means 72, a graph generation means 73, and a graph analysis means 74.
- the image acquisition means 71 acquires an image such as a photographed image of the product shelf.
- the object recognition means 72 recognizes an object included in the image and generates a recognition result.
- the graph generation means 73 generates a graph in which each recognized object is defined as either a node or an edge, and the relationships between the objects are defined as the other.
- the graph analysis means 74 analyzes the graph and generates an analysis result showing the relationship between the objects.
- FIG. 16B is a block diagram showing a functional configuration of the learning device according to the second embodiment.
- the learning device 80 includes a teacher data generating means 81, a graph generating means 82, a graph analysis means 83, and a learning means 84.
- the teacher data generation means 81 acquires the recognition result of the objects included in an image, generates a recognition result containing erroneous parts as teacher input data, and generates a teacher label indicating the corrected parts in the teacher input data.
- the graph generation means 82 generates a graph in which each object included in the teacher input data is defined as either a node or an edge, and the relationships between the objects are defined as the other.
- the graph analysis means 83 analyzes the graph using the graph analysis model and generates an analysis result showing the relationship between the objects.
- the learning means 84 learns the graph analysis model using the analysis result and the teacher label.
- Appendix 2 The object recognition device according to Appendix 1, wherein the recognition result includes at least one of the size, position, type, and feature amount of the object.
- Appendix 3 The object recognition device according to Appendix 1 or 2, wherein the graph generation means defines the node or edge based on at least one of the positional relationship, distance relationship, type similarity relationship, and feature similarity relationship of the individual objects.
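One way to define edges from the relationships listed in Appendix 3 is sketched below: objects whose centres lie within a distance threshold are connected, and each edge records the distance, whether the two objects have the same type, and the cosine similarity of their feature vectors. The object layout (`x`, `y`, `label`, `feature`), the `max_dist` parameter, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_edges(objects, max_dist):
    """objects: list of dicts with 'x', 'y', 'label', 'feature' keys."""
    edges = []
    for i in range(len(objects)):
        for j in range(i + 1, len(objects)):
            a, b = objects[i], objects[j]
            dist = math.hypot(a["x"] - b["x"], a["y"] - b["y"])
            if dist > max_dist:
                continue  # positional/distance relationship gates the edge
            edges.append({
                "pair": (i, j),
                "distance": dist,
                "same_type": a["label"] == b["label"],       # type similarity
                "cosine": cosine(a["feature"], b["feature"]),  # feature similarity
            })
    return edges


objects = [
    {"x": 0, "y": 0, "label": "A", "feature": [1.0, 0.0]},
    {"x": 1, "y": 0, "label": "A", "feature": [1.0, 0.0]},
    {"x": 5, "y": 0, "label": "B", "feature": [0.0, 1.0]},
]
edges = build_edges(objects, max_dist=1.5)
print([e["pair"] for e in edges])  # [(0, 1)]
```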
- Appendix 6 The object recognition device according to Appendix 5, further comprising a correction means for outputting correction candidates for an object based on the information indicating the necessity of correction.
- the analysis result includes a score indicating whether each individual object needs to be corrected.
- the object recognition device wherein, for an object whose score is larger than a first threshold value, the correction means outputs as the correction candidate the object that appears most frequently among the other objects existing within a predetermined range of that object.
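The correction rule above can be sketched as follows, assuming each object has hypothetical `x`, `y`, and `label` fields, that the "predetermined range" is a Euclidean radius around the object, and that the per-object scores are given as a list aligned with the objects. Ties go to the first label encountered. The function name `correction_candidates` is hypothetical.

```python
import math
from collections import Counter


def correction_candidates(objects, scores, first_threshold, radius):
    """Return {index: candidate label} for objects whose score exceeds
    first_threshold, using the most frequent label among other objects
    within `radius` of that object."""
    candidates = {}
    for i, obj in enumerate(objects):
        if scores[i] <= first_threshold:
            continue  # judged not to need correction
        nearby = [
            other["label"]
            for j, other in enumerate(objects)
            if j != i and math.hypot(other["x"] - obj["x"], other["y"] - obj["y"]) <= radius
        ]
        if nearby:
            candidates[i] = Counter(nearby).most_common(1)[0][0]
    return candidates


shelf = [
    {"x": 0, "y": 0, "label": "A"},
    {"x": 1, "y": 0, "label": "B"},
    {"x": 2, "y": 0, "label": "A"},
    {"x": 3, "y": 0, "label": "C"},
]
print(correction_candidates(shelf, [0.1, 0.9, 0.2, 0.1], 0.5, 1.5))  # {1: 'A'}
```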
- the analysis result includes a score indicating the same object-likeness between objects.
- the object recognition device according to Appendix 6, wherein the correction means outputs an object having the maximum value as a correction candidate when the maximum value or the average value of the score is larger than the second threshold value.
- the object recognition means recognizes an object by using a trained object recognition model.
- the object recognition device according to any one of Appendix 6 to 8, further comprising a re-learning means for re-learning the object recognition model using the correction candidate.
- the object recognition device according to any one of Appendices 1 to 4, wherein the object recognition means recognizes the label of each object, and the graph analysis means generates an analysis result including the labels of the individual objects.
- the object recognition device according to any one of Appendices 1 to 11, wherein the image acquisition means acquires a photographed image of a product shelf on which products are displayed, and the object recognition means recognizes a product in the photographed image as the object.
- a recording medium recording a program that causes a computer to execute a process of analyzing the graph and generating an analysis result showing a relationship between the objects.
- a teacher data generation means that acquires a recognition result of an object included in an image, generates a recognition result including an error part as a teacher input data, and generates a teacher label indicating a correction part in the teacher input data.
- a graph generating means for generating a graph in which individual objects included in the teacher input data are defined on one of a node and an edge and the relationship between the objects is defined on the other of the node and the edge.
- a graph analysis means that analyzes the graph using a graph analysis model and generates an analysis result showing the relationship between the objects.
- a learning means that learns the graph analysis model using the analysis result and the teacher label; a learning device comprising the above means.
- Appendix 16 The learning device according to Appendix 15, wherein the teacher data generation means acquires correct answer information of the recognition result of the object and generates the teacher label by using the difference between the teacher input data and the correct answer information.
- the recognition result of an object included in an image is acquired, a recognition result containing an erroneous part is generated as teacher input data, and a teacher label indicating the corrected part in the teacher input data is generated.
- a graph is generated in which each object included in the teacher input data is defined as one of a node and an edge, and the relationships between the objects are defined as the other.
- the graph is analyzed using the graph analysis model, and an analysis result showing the relationship between the objects is generated.
- a recording medium on which a program for causing a computer to execute a process of learning the graph analysis model using the analysis result and the teacher label is recorded.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Geometry (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2022518440A JP7491370B2 (ja) | 2020-04-27 | 2020-04-27 | Object recognition device, object recognition method, learning device, learning method, and program |
| US17/920,468 US12524995B2 (en) | 2020-04-27 | 2020-04-27 | Object recognition device, object recognition method, learning device, learning method, and recording medium |
| PCT/JP2020/017967 WO2021220342A1 (ja) | 2020-04-27 | 2020-04-27 | Object recognition device, object recognition method, learning device, learning method, and recording medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2020/017967 WO2021220342A1 (ja) | 2020-04-27 | 2020-04-27 | Object recognition device, object recognition method, learning device, learning method, and recording medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021220342A1 (ja) | 2021-11-04 |
Family
ID=78373420
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2020/017967 (Ceased) WO2021220342A1 (ja) | Object recognition device, object recognition method, learning device, learning method, and recording medium | 2020-04-27 | 2020-04-27 |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US12524995B2 |
| JP (1) | JP7491370B2 |
| WO (1) | WO2021220342A1 |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPWO2024062632A1 * | 2022-09-22 | 2024-03-28 | | |
| WO2025135147A1 (ja) * | 2023-12-22 | 2025-06-26 | Telexistence株式会社 | Management device, product moving device, and product moving system |
| WO2025150133A1 (ja) * | 2024-01-11 | 2025-07-17 | 三菱電機株式会社 | Information processing device, grouping method, and grouping program |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP3770840A1 (en) * | 2020-02-07 | 2021-01-27 | ChannelSight Limited | Method and system for determining product similarity in digital domains |
| DE102021207849A1 (de) * | 2021-07-22 | 2023-01-26 | Robert Bosch Gesellschaft mit beschränkter Haftung | Method for retraining a video surveillance device, computer program, storage medium, and video surveillance device |
| US20230289685A1 (en) * | 2022-03-14 | 2023-09-14 | Content Square SAS | Out of stock product missed opportunity |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20170178060A1 (en) * | 2015-12-18 | 2017-06-22 | Ricoh Co., Ltd. | Planogram Matching |
| WO2019107157A1 (ja) * | 2017-11-29 | 2019-06-06 | 株式会社Nttドコモ | Shelf allocation information generation device and shelf allocation information generation program |
Family Cites Families (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2012176317A1 (ja) * | 2011-06-23 | 2012-12-27 | サイバーアイ・エンタテインメント株式会社 | Interest graph collection system using relevance search incorporating an image recognition system |
| US20150262116A1 (en) * | 2014-03-16 | 2015-09-17 | International Business Machines Corporation | Machine vision technology for shelf inventory management |
| JP2016099835A (ja) * | 2014-11-21 | 2016-05-30 | キヤノン株式会社 | Image processing device, image processing method, and program |
| KR101771044B1 (ko) * | 2015-11-27 | 2017-08-24 | 연세대학교 산학협력단 | Object recognition method and apparatus based on a space-object relationship graph |
| US11117744B1 (en) * | 2015-12-08 | 2021-09-14 | Amazon Technologies, Inc. | Determination of untidy item return to an inventory location |
| US10922541B2 (en) * | 2016-04-06 | 2021-02-16 | Nec Corporation | Object type identifying apparatus, object type identifying method, and recording medium |
| US9558265B1 (en) * | 2016-05-12 | 2017-01-31 | Quid, Inc. | Facilitating targeted analysis via graph generation based on an influencing parameter |
| US10846657B2 (en) * | 2017-04-07 | 2020-11-24 | Simbe Robotics, Inc. | Method for tracking stock level within a store |
| WO2019088223A1 (ja) | 2017-11-02 | 2019-05-09 | 株式会社Nttドコモ | Detection device and detection program |
| CA3090092A1 (en) * | 2018-01-31 | 2019-08-08 | Walmart Apollo, Llc | Systems and methods for verifying machine-readable label associated with merchandise |
| US11030486B2 (en) * | 2018-04-20 | 2021-06-08 | XNOR.ai, Inc. | Image classification through label progression |
2020
- 2020-04-27 US US17/920,468 patent/US12524995B2/en active Active
- 2020-04-27 WO PCT/JP2020/017967 patent/WO2021220342A1/ja not_active Ceased
- 2020-04-27 JP JP2022518440A patent/JP7491370B2/ja active Active
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
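| JPWO2024062632A1 * | 2022-09-22 | 2024-03-28 | | |
| WO2024062632A1 (ja) * | 2022-09-22 | 2024-03-28 | 日本電気株式会社 | Control device, constraint condition selection device, data generation device, control method, constraint condition selection method, data generation method, and storage medium |
| JP7845487B2 (ja) | 2022-09-22 | 2026-04-14 | 日本電気株式会社 | Control device, control method, and program |
| WO2025135147A1 (ja) * | 2023-12-22 | 2025-06-26 | Telexistence株式会社 | Management device, product moving device, and product moving system |
| WO2025150133A1 (ja) * | 2024-01-11 | 2025-07-17 | 三菱電機株式会社 | Information processing device, grouping method, and grouping program |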
Also Published As
| Publication number | Publication date |
|---|---|
| US12524995B2 (en) | 2026-01-13 |
| JP7491370B2 (ja) | 2024-05-28 |
| JPWO2021220342A1 | 2021-11-04 |
| US20230306717A1 (en) | 2023-09-28 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP7491370B2 (ja) | | Object recognition device, object recognition method, learning device, learning method, and program |
| JP4623387B2 (ja) | | Learning device and method, recognition device and method, and program |
| US8676740B2 (en) | | Attribute estimation system, age estimation system, gender estimation system, age and gender estimation system and attribute estimation method |
| JP4618098B2 (ja) | | Image processing system |
| JP6270182B2 (ja) | | Attribute factor analysis method, device, and program |
| US20210042588A1 (en) | | Method and system for region proposal based object recognition for estimating planogram compliance |
| JP2012160178A (ja) | | Object recognition device, method for performing object recognition, and method for implementing a dynamic appearance model |
| JP2021135898A (ja) | | Action recognition method, action recognition program, and action recognition device |
| CN113705092B (zh) | | Machine-learning-based disease prediction method and device |
| TW202201275A (zh) | | Hand work motion scoring device, method, and computer-readable storage medium |
| JP2018106618A (ja) | | Image data classification device, object detection device, and programs therefor |
| CN116934747B (zh) | | Fundus image segmentation model training method and device, and glaucoma auxiliary diagnosis system |
| CN117351504A (zh) | | Method, apparatus, device, and medium for extracting tables from electronic medical records |
| JP6623851B2 (ja) | | Learning method, information processing device, and learning program |
| US12062229B2 (en) | | Identification process of a dental implant visible on an input image by means of at least one convolutional neural network |
| US11875901B2 (en) | | Registration apparatus, registration method, and recording medium |
| US20220067480A1 (en) | | Recognizer training device, recognition device, data processing system, data processing method, and storage medium |
| WO2022003982A1 (ja) | | Detection device, learning device, detection method, and storage medium |
| JP2021033378A (ja) | | Information processing device and information processing method |
| CN118201554A (zh) | | Cognitive function evaluation system and training method |
| Ramyashree et al. | | Application of AI technology and image processing to smart agriculture: Detection and classification of plant diseases |
| JP7347539B2 (ja) | | Foreground extraction device, foreground extraction method, and program |
| Kaden et al. | | Mitigating the bias in data for fairness using an advanced generalized learning vector quantization approach–FA (IR) 2MA-GLVQ. |
| Kumar et al. | | Enhancing kyphosis disease prediction: evaluating machine learning algorithms effectiveness |
| KR20220067387A (ko) | | Method and system for analyzing the layout of an image |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20934073; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2022518440; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 20934073; Country of ref document: EP; Kind code of ref document: A1 |
| | WWG | Wipo information: grant in national office | Ref document number: 17920468; Country of ref document: US |