CN113627512A - Picture identification method and device - Google Patents
- Publication number
- CN113627512A (application CN202110896038.1A)
- Authority
- CN
- China
- Prior art keywords
- frame
- abnormal
- picture
- detection
- determining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
An embodiment of the invention provides a picture identification method and device. The method comprises: performing object recognition on a first picture and a second picture to obtain a plurality of first detection frames and a plurality of second detection frames, respectively; for any first detection frame, determining a first detection frame and a second detection frame that have a position matching relationship as a first abnormal frame and a second abnormal frame, respectively; determining a position overlapping region of the first abnormal frame and the second abnormal frame; determining a first similarity between a first sub-image corresponding to the position overlapping region in the first picture and a second sub-image corresponding to the position overlapping region in the second picture; and, if the first similarity meets a first set condition, updating the second recognition result of the second abnormal frame to the first recognition result of the first abnormal frame. The recognition results of the first picture and the second picture are thereby kept consistent for the same commodity, so that miscalculation of the user's payment amount is avoided and the user is given a better experience.
Description
Technical Field
Embodiments of the invention relate to the technical field of computer vision, and in particular to a picture identification method, a picture identification device, a computing device, and a computer-readable storage medium.
Background
In recent years, with the rapid development of deep learning, many intelligent vending machines that differ from traditional vending machines have appeared on the market; they are an application of deep learning to computer vision. Fig. 1A shows an intelligent vending machine divided into four to five shelf layers, which may be of equal or different heights, for holding retail commodities. A camera is arranged at the top of each layer to photograph the commodities on that layer; so that it can capture commodities at all positions of the layer as completely as possible, the camera is mounted at the top of the layer and centered, and a captured picture is shown in Fig. 1B. A target detection algorithm performs target detection on the pictures taken before and after each opening and closing of the cabinet door, and the change in the number of each category of commodity is compared to determine the categories and quantities of the commodities taken by the user, so that the amount to be deducted is settled automatically. For example, when it is detected that the user has closed the cabinet door, the camera takes a picture of the commodities, and the target detection algorithm determines that 5 bottles of mineral water and 2 bottles of fruit juice remain on the layer. Target detection on the picture taken before the user opened the cabinet door shows that the user bought 1 bottle of mineral water and 1 bottle of fruit juice, so the corresponding price can be displayed on the screen of the intelligent vending machine for the user to pay.
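The before-and-after settlement described above reduces to a multiset difference of per-category counts. A minimal sketch, with illustrative category names and prices (the patent does not specify an implementation):

```python
from collections import Counter

def settle(labels_before, labels_after, prices):
    """Charge for commodities present before the door opened but absent after.

    labels_before / labels_after: recognition results, one label per
    detection frame; prices: unit price per category (illustrative).
    """
    taken = Counter(labels_before) - Counter(labels_after)
    return sum(prices[item] * count for item, count in taken.items())
```

With 6 bottles of mineral water and 3 of fruit juice before the door opens and 5 and 2 after, the user is charged for one of each.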
However, if the category detected for a commodity changes between the two target detections, the deduction amount may be calculated incorrectly. For example, the user takes nothing while opening and closing the cabinet door, but when the intelligent vending machine performs target detection on the pictures taken before and after, the same commodity is recognized as mineral water before the door is opened and as fruit juice after it is closed; the comparison then concludes that one bottle of mineral water is missing, and the user is shown a deduction for that bottle. Such target detection errors reduce the accuracy of automatic settlement, greatly degrade the user's shopping experience, and also affect commodity inventory management.
In view of this, the embodiments of the present invention provide a picture identification method to improve the accuracy of identifying the pictures taken before and after a user opens and closes the cabinet door.
Disclosure of Invention
The embodiment of the invention provides a picture identification method, which is used for improving the accuracy of identifying pictures before and after a user opens and closes a cabinet door.
In a first aspect, an embodiment of the present invention provides an image identification method, including:
performing object recognition on a first picture acquired at a first moment to obtain first detection frames at different positions and corresponding first recognition results;
performing object recognition on a second picture acquired at a second moment to obtain second detection frames at different positions and corresponding second recognition results;
for any first detection frame, if it is determined from the position information of the first detection frames and the second detection frames that a position matching relationship exists with some second detection frame, determining the first detection frame and the second detection frame having the position matching relationship as a first abnormal frame and a second abnormal frame, respectively;
determining a position overlapping region of the first abnormal frame and the second abnormal frame; determining a first similarity between a first sub-image corresponding to the position overlapping region in the first picture and a second sub-image corresponding to the position overlapping region in the second picture; and, if the first similarity meets a first set condition, updating the second recognition result of the second abnormal frame to the first recognition result of the first abnormal frame.
Before and after the user opens and closes the cabinet door, a commodity may not have moved, or may have moved only slightly, and yet the target detection algorithm identifies it as different categories in the first picture and the second picture; in that case the recognition result of the commodity in the second picture needs to be changed to its recognition result in the first picture. Specifically, the first detection frame and the second detection frame that have a position matching relationship can be determined from their position information, which preliminarily identifies, for a given first detection frame, the second detection frame most likely to contain the same commodity; these are the first abnormal frame and the second abnormal frame. A first sub-image and a second sub-image are then determined from the position overlapping region of the two abnormal frames: the first sub-image is the image of that region in the first picture, and the second sub-image is the image of that region in the second picture. Because the commodity has hardly moved, the first similarity between the two sub-images will be high if they show the same commodity. Whether the commodity framed by the first abnormal frame in the first picture and the commodity framed by the second abnormal frame in the second picture are the same commodity is therefore decided by whether the first similarity meets the first set condition.
If the first similarity meets the first set condition, the two frames contain the same commodity, and the recognition result of the second abnormal frame is changed to that of the first abnormal frame. The recognition results of the first picture and the second picture are thereby kept consistent for the same commodity, miscalculation of the user's payment amount is avoided, and the user is given a better experience.
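The overlap-then-compare step can be sketched as follows. The patent leaves the similarity metric and the first set condition abstract, so the pixel-level cosine similarity and the 0.9 threshold below are illustrative assumptions:

```python
import numpy as np

def overlap_region(box_a, box_b):
    """Overlapping rectangle of two (x1, y1, x2, y2) boxes, or None."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    return (x1, y1, x2, y2) if x2 > x1 and y2 > y1 else None

def crop(picture, region):
    x1, y1, x2, y2 = region
    return picture[y1:y2, x1:x2]

def cosine_similarity(sub_a, sub_b):
    va, vb = sub_a.ravel().astype(float), sub_b.ravel().astype(float)
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    return float(va @ vb / denom) if denom else 0.0

def reconcile_labels(first_pic, second_pic, first_box, first_label,
                     second_box, second_label, threshold=0.9):
    """Return the recognition result to use for the second abnormal frame."""
    region = overlap_region(first_box, second_box)
    if region is None:
        return second_label
    sim = cosine_similarity(crop(first_pic, region), crop(second_pic, region))
    # First set condition (assumed): similarity at or above a fixed threshold.
    return first_label if sim >= threshold else second_label
```

If the two sub-images are near-identical, the first frame's label wins; otherwise the second picture's own result stands.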
Optionally, the method further comprises:
for any first detection frame, if no second detection frame having a position matching relationship with it can be determined from the position information of the first detection frames and the second detection frames, determining the first detection frame without a position matching relationship as a third abnormal frame, and determining a virtual frame in the second picture according to the position information of the third abnormal frame;
for any third abnormal frame, determining a second similarity between a third sub-image corresponding to the third abnormal frame in the first picture and a fourth sub-image corresponding to the virtual frame in the second picture; and, if the second similarity meets a second set condition, determining the first recognition result of the third abnormal frame as the recognition result of the virtual frame.
During object recognition, the same commodity may not have moved, or may have moved only slightly, yet be detected in the first picture and missed in the second picture, so that the undetected commodity is wrongly charged as if the user had bought it. In this case no second detection frame with a position matching relationship to the first detection frame can be determined from the position information, so the unmatched first detection frame is determined as a third abnormal frame, and a virtual frame is determined in the second picture according to the position information of the third abnormal frame. If the commodity has not moved, or has moved only slightly, the second similarity between the third sub-image corresponding to the third abnormal frame in the first picture and the fourth sub-image corresponding to the virtual frame in the second picture will be high. If the second similarity meets the second set condition, the commodity framed by the third abnormal frame in the first picture and the commodity covered by the virtual frame in the second picture are the same commodity that the second detection simply missed, so the first recognition result of the third abnormal frame is taken as the recognition result of the virtual frame. The recognition results of the two pictures are thereby kept consistent for the same commodity, miscalculation of the user's payment amount is avoided, and the user is given a better experience.
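A minimal sketch of this recovery step, assuming the virtual frame simply reuses the third abnormal frame's coordinates in the second picture and that the second set condition is a similarity threshold (both assumptions; the patent leaves them abstract):

```python
import numpy as np

def missed_detection_label(first_pic, second_pic, third_box, first_label,
                           threshold=0.9):
    """Recover a detection the second picture missed, or return None.

    The virtual frame is assumed to share the third abnormal frame's
    coordinates; the metric and threshold are illustrative.
    """
    x1, y1, x2, y2 = third_box
    sub3 = first_pic[y1:y2, x1:x2].ravel().astype(float)   # third sub-image
    sub4 = second_pic[y1:y2, x1:x2].ravel().astype(float)  # fourth sub-image
    denom = np.linalg.norm(sub3) * np.linalg.norm(sub4)
    sim = float(sub3 @ sub4 / denom) if denom else 0.0
    # Second set condition (assumed): similarity at or above a fixed threshold.
    return first_label if sim >= threshold else None
```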
Optionally, the method further comprises:
for any first detection frame, if no second detection frame having a position matching relationship with it can be determined from the position information of the first detection frames and the second detection frames, determining the first detection frame without a position matching relationship as a fourth abnormal frame, and determining a second detection frame without a position matching relationship as a fifth abnormal frame;
and updating the second recognition result of the fifth abnormal frame to the first recognition result of the fourth abnormal frame according to the attribute category to which the first recognition result corresponding to the fourth abnormal frame and the second recognition result corresponding to the fifth abnormal frame belong, and/or the position offset between the fourth abnormal frame and the fifth abnormal frame.
Because a commodity may undergo a certain displacement, the target detection algorithm may recognize the same commodity as different categories in the first picture and the second picture, again leading to a wrong deduction. In this case no second detection frame with a position matching relationship to the first detection frame can be determined from the position information, so the unmatched first detection frame is determined as a fourth abnormal frame and the unmatched second detection frame as a fifth abnormal frame. The correspondence between fifth abnormal frames and fourth abnormal frames is then determined comprehensively from the attribute categories of their recognition results and/or the position offsets between them, and the second recognition result of each fifth abnormal frame is updated accordingly. Even though the commodity has been displaced, the recognition results of the first picture and the second picture remain consistent for the same commodity, miscalculation of the user's payment amount is avoided, and the user is given a better experience.
Optionally, determining, for any first detection frame, that a position matching relationship exists with some second detection frame according to the position information of the first detection frames and the second detection frames includes:
determining a first position of the plurality of first detection frames in the first picture; determining second positions of the plurality of second detection frames in the second picture;
and, for any first detection frame, determining a position overlapping area between it and any second detection frame; if the position overlapping area meets a third set condition, determining that the first detection frame and the second detection frame have a position matching relationship.
Because a user rarely moves a commodity far on the shelf while opening and closing the cabinet door to take goods, if the overlapping area between a first detection frame in the first picture and a second detection frame in the second picture meets the third set condition, the displacement of the commodity between the two pictures is considered small and the two frames very likely contain the same commodity. Whether a first detection frame and a second detection frame have a position matching relationship can therefore be determined from their position overlapping area, and the subsequent similarity comparison of the first and second sub-images then confirms the match. This simplifies the calculation while determining more accurately whether the commodity framed by the first abnormal frame in the first picture and the commodity framed by the second abnormal frame in the second picture are the same, improving picture identification accuracy.
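One common realization of this overlap test treats the third set condition as an intersection-over-union (IoU) threshold; the patent leaves the condition abstract, so the threshold of 0.5 below is an illustrative assumption:

```python
def box_area(box):
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) detection frames."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = box_area(box_a) + box_area(box_b) - inter
    return inter / union if union else 0.0

def has_position_match(first_box, second_box, iou_threshold=0.5):
    # Third set condition (assumed): IoU at or above a fixed threshold.
    return iou(first_box, second_box) >= iou_threshold
```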
Optionally, updating the second recognition result of the fifth abnormal frame to the first recognition result of the fourth abnormal frame according to the attribute category to which the first recognition result corresponding to the fourth abnormal frame and the second recognition result corresponding to the fifth abnormal frame belong, and/or the position offset between the fourth abnormal frame and the fifth abnormal frame, includes:
if there is more than one fourth abnormal frame and more than one fifth abnormal frame with the same attribute category, determining a plurality of matching schemes for the fourth abnormal frames and the fifth abnormal frames;
determining an optimal matching scheme according to the position offsets between the fourth abnormal frames and the fifth abnormal frames in each matching scheme;
and updating the second recognition result of each fifth abnormal frame to the first recognition result of its corresponding fourth abnormal frame based on the optimal matching scheme.
When deciding how to update the recognition results by the attribute categories of the recognition results corresponding to the fourth and fifth abnormal frames and/or their position offsets, a plurality of matching schemes are determined within the same attribute category, the optimal matching scheme is selected according to the position offsets of the fourth and fifth abnormal frames in each scheme, and the recognition results are updated according to that optimal scheme.
Optionally, if there is more than one fourth abnormal frame and more than one fifth abnormal frame with the same attribute category, determining a plurality of matching schemes for the fourth abnormal frames and the fifth abnormal frames includes:
for any fourth abnormal frame, selecting one fifth abnormal frame from the fifth abnormal frames to pair with it, until no unmatched fourth abnormal frame or unmatched fifth abnormal frame remains, thereby obtaining one matching scheme; the number of matching schemes follows from combinatorics.
Determining the matching schemes in this way pairs up, within each scheme, every fourth and fifth abnormal frame that lacks a position matching relationship, ensuring that no fourth or fifth abnormal frame is omitted.
Optionally, determining an optimal matching scheme according to the position offset of each fourth abnormal frame and each fifth abnormal frame in each matching scheme includes:
for any matching scheme, determining the position offset between each fourth abnormal frame and its corresponding fifth abnormal frame;
and determining the matching scheme with the minimum sum of the position offsets as the optimal matching scheme.
By calculating, for each matching scheme, the sum of the position offsets between each fourth abnormal frame and its corresponding fifth abnormal frame, the scheme with the minimum sum can be selected as the optimal matching scheme. The optimal scheme thus takes every pairwise offset into account, making the finally determined matching more accurate.
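The enumeration-and-minimization described above can be sketched as a brute-force search over pairings, with the position offset taken here as the distance between frame centers (an assumption; the patent does not fix the offset measure). For more than a handful of frames, an assignment solver such as the Hungarian algorithm would replace the factorial enumeration:

```python
from itertools import permutations
from math import hypot

def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def offset(box_a, box_b):
    """Position offset between two frames, as center-to-center distance."""
    (ax, ay), (bx, by) = center(box_a), center(box_b)
    return hypot(ax - bx, ay - by)

def optimal_matching(fourth_frames, fifth_frames):
    """Enumerate all pairings of equally many frames and return the one
    with minimal total offset, as (fourth_idx, fifth_idx) pairs."""
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(fifth_frames))):
        cost = sum(offset(fourth_frames[i], fifth_frames[j])
                   for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = list(enumerate(perm)), cost
    return best
```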
In a second aspect, an embodiment of the present invention further provides an image recognition apparatus, including:
an identification unit, configured to perform object recognition on a first picture acquired at a first moment to obtain first detection frames at different positions and corresponding first recognition results, and to perform object recognition on a second picture acquired at a second moment to obtain second detection frames at different positions and corresponding second recognition results;
a determining unit, configured to, for any first detection frame, if it is determined from the position information of the first detection frames and the second detection frames that a position matching relationship exists with some second detection frame, determine the first detection frame and the second detection frame having the position matching relationship as a first abnormal frame and a second abnormal frame, respectively;
and a processing unit, configured to determine a position overlapping region of the first abnormal frame and the second abnormal frame; determine a first similarity between a first sub-image corresponding to the position overlapping region in the first picture and a second sub-image corresponding to the position overlapping region in the second picture; and, if the first similarity meets a first set condition, update the second recognition result of the second abnormal frame to the first recognition result of the first abnormal frame.
Optionally, the determining unit is further configured to:
for any first detection frame, if no second detection frame having a position matching relationship with it can be determined from the position information of the first detection frames and the second detection frames, determine the first detection frame without a position matching relationship as a third abnormal frame, and determine a virtual frame in the second picture according to the position information of the third abnormal frame;
the processing unit is further to:
for any third abnormal frame, determine a second similarity between a third sub-image corresponding to the third abnormal frame in the first picture and a fourth sub-image corresponding to the virtual frame in the second picture; and, if the second similarity meets a second set condition, determine the first recognition result of the third abnormal frame as the recognition result of the virtual frame.
Optionally, the determining unit is further configured to:
for any first detection frame, if no second detection frame having a position matching relationship with it can be determined from the position information of the first detection frames and the second detection frames, determine the first detection frame without a position matching relationship as a fourth abnormal frame, and determine a second detection frame without a position matching relationship as a fifth abnormal frame;
the processing unit is further to:
and update the second recognition result of the fifth abnormal frame to the first recognition result of the fourth abnormal frame according to the attribute category to which the first recognition result corresponding to the fourth abnormal frame and the second recognition result corresponding to the fifth abnormal frame belong, and/or the position offset between the fourth abnormal frame and the fifth abnormal frame.
Optionally, the determining unit is specifically configured to:
determining a first position of the plurality of first detection frames in the first picture; determining second positions of the plurality of second detection frames in the second picture;
and determining a position overlapping area of the first detection frame and any second detection frame aiming at any first detection frame, and if the position overlapping area meets a third set condition, determining that the first detection frame and the second detection frame have a position matching relationship.
Optionally, the processing unit is specifically configured to:
if there is more than one fourth abnormal frame and more than one fifth abnormal frame with the same attribute category, determine a plurality of matching schemes for the fourth abnormal frames and the fifth abnormal frames;
determining an optimal matching scheme according to the position offset of each fourth abnormal frame and each fifth abnormal frame in each matching scheme;
and updating the second recognition result of the fifth abnormal frame into the first recognition result of the corresponding fourth abnormal frame based on the optimal matching scheme.
Optionally, the processing unit is specifically configured to:
for any fourth abnormal frame, select one fifth abnormal frame from the fifth abnormal frames to pair with it, until no unmatched fourth abnormal frame or unmatched fifth abnormal frame remains, thereby obtaining one matching scheme; the number of matching schemes follows from combinatorics.
Optionally, the processing unit is specifically configured to:
for any matching scheme, determine the position offset between each fourth abnormal frame and its corresponding fifth abnormal frame;
and determining the matching scheme with the minimum sum of the position offsets as the optimal matching scheme.
In a third aspect, an embodiment of the present invention further provides a computing device, including:
a memory for storing a computer program;
and a processor, configured to call the computer program stored in the memory and execute, according to the obtained program, the picture identification method described in any of the above manners.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer-executable program which, when executed, causes a computer to perform the picture identification method described in any of the above manners.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1A is a schematic diagram of an intelligent vending machine according to an embodiment of the present invention;
fig. 1B is a schematic diagram of a commodity in an intelligent vending machine captured by a camera according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a picture identification method according to an embodiment of the present invention;
fig. 3A is a possible process of object recognition for a first picture according to an embodiment of the present invention;
fig. 3B is a schematic diagram of each second detection frame obtained after object recognition is performed on a second picture according to an embodiment of the present invention;
FIG. 3C is a schematic diagram of a possible determination of the overlapping area of the positions according to an embodiment of the present invention;
fig. 4 is a schematic flowchart of a picture identification method according to an embodiment of the present invention;
fig. 5 is a schematic diagram illustrating a virtual frame determined in a second picture according to position information of a third abnormal frame according to an embodiment of the present invention;
fig. 6 is a schematic flowchart of a picture identification method according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a picture recognition apparatus according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a computing device according to an embodiment of the present invention.
Detailed Description
To make the objects, embodiments and advantages of the present application clearer, exemplary embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. It is to be understood that the described exemplary embodiments are only a part of the embodiments of the present application, not all of them.
All other embodiments obtained by a person skilled in the art from the exemplary embodiments described herein without inventive effort fall within the scope of the appended claims. In addition, while the disclosure herein is presented in terms of one or more exemplary examples, it should be appreciated that each aspect of the disclosure may also constitute a complete embodiment on its own.
It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.
The terms "first," "second," "third," and the like in the description, claims and drawings of this application are used to distinguish similar objects or entities and are not necessarily intended to limit a particular order or sequence, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances, so that the embodiments described herein can, for example, be implemented in sequences other than those illustrated or described herein.
Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device.
A user takes away the commodities they want to purchase while opening and closing the cabinet door, but the intelligent vending machine may misjudge those commodities because the pictures taken before and after the door is opened and closed are recognized incorrectly, and the user is then shown a wrong deduction amount. By analyzing the causes of such recognition errors, the embodiments of the present invention provide a method that corrects them in a targeted manner, improving the accuracy of picture recognition and giving users a good shopping experience.
The method provided by the embodiment of the invention may be executed by a server, or by various intelligent devices such as an intelligent vending machine or an intelligent refrigerator. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms.
Fig. 2 exemplarily shows a picture identification method provided by an embodiment of the present invention, which includes:
In step 201, a first picture is obtained and object recognition is performed on it, yielding first detection frames at different positions and corresponding first recognition results. Fig. 3A illustrates a possible process of object recognition for the first picture. As shown in fig. 3A, the first picture, taken by a camera built into the intelligent vending machine, is the left image; circles represent commodities of various kinds, and dotted lines are drawn between the commodities to help the reader follow their movement in what follows. Performing object recognition on the first picture yields each first detection frame at a different position and each corresponding first recognition result, as shown in the right image. At this time, commodities of types A, B, C, D, E, F, G, H and I are displayed on this shelf layer of the intelligent vending machine, one of each.
Object identification may be performed by any of various methods, such as a one-stage detector or a two-stage detector; the embodiments of the present invention do not limit the choice.
In step 202, a second picture is obtained for object recognition, and the method for object recognition is the same as that in step 201, and is not described herein again.
The first picture and the second picture are pictures of the commodities in the intelligent vending machine taken at different moments; the shooting moments of the two pictures are not limited. The execution order of step 201 and step 202 may be reversed, or the two steps may be carried out simultaneously; the embodiment of the invention does not limit this.
For example, when the intelligent vending machine detects that the cabinet door has been opened and then closed, the camera in the machine is started to shoot the first picture, and object recognition is performed on the first picture to obtain each first detection frame and each first recognition result. Then the second picture, shot when the door was last detected to close after opening, is obtained, and object recognition is performed on it to obtain each second detection frame and each second recognition result. Alternatively, after object recognition is performed on each captured picture, the resulting detection frames and recognition results are stored; then, when object recognition is performed on the first picture to obtain each first detection frame and first recognition result, the stored second detection frames and second recognition results are called up directly. The above are merely examples, and embodiments of the present invention are not limited thereto.
Fig. 3B shows the second detection frames obtained by performing object recognition on the second picture. For convenience of introducing the scheme, it is assumed that the user takes no commodity while opening and closing the cabinet door, i.e., the types and numbers of commodities in the second picture are the same as in the first picture, yet the two recognition passes produce erroneous results.
Referring to fig. 3A and fig. 3B, errors may occur in the following 3 cases: 1. the commodity does not move, or moves only slightly, but the position of its detection frame changes, so that another commodity is identified; for example, commodity D in the first picture is identified as commodity J in the second picture. 2. The commodity does not move, or moves only slightly, but is not recognized at all in the second picture; for example, commodity E in the first picture is not recognized in the second picture. 3. The commodity is displaced slightly, causing object recognition to identify another commodity; for example, commodities F, G, H and I in the first picture are identified in the second picture as commodities K, L, M and N, respectively.
In the following, the embodiments of the present invention specifically propose solutions for the above three possible situations.
The methods proposed in step 203 and step 204 provide an effective solution to the problem arising in case 1.
In step 203, the first detection frame and the second detection frame having a position matching relationship are determined, according to the position information of each first detection frame and each second detection frame, to be the first abnormal frame and the second abnormal frame, respectively. Because a user rarely moves a commodity on the shelf by a large distance while opening and closing the cabinet door to take a commodity away, a first detection frame and a second detection frame that are close in position can be determined to have a position matching relationship: the probability that they enclose the same commodity is high. This reduces the amount of subsequent calculation and improves the accuracy of determining whether two frames enclose the same commodity.
Specifically, the embodiments of the present invention provide the following two methods for determining the first detection frame and the second detection frame having a position matching relationship.
Mode one
Determine the first positions of the plurality of first detection frames in the first picture and the second positions of the plurality of second detection frames in the second picture. Then, for any first detection frame, determine its position overlapping area with any second detection frame; if the position overlapping area meets a third set condition, the first detection frame and that second detection frame have a position matching relationship.
With reference to the right diagram of fig. 3A and fig. 3B, the coordinates of each first detection frame in the first picture and of each second detection frame in the second picture are determined in the same coordinate system. The detection frames are numbered from left to right and from top to bottom. The coordinates of the upper left and lower right corners of the 1st detection frame in the first picture are compared with those of the 1st detection frame in the second picture; if the third set condition is that the position overlapping area accounts for more than 90% of the union of the two frames, the 1st detection frame in the second picture is determined to be a second detection frame whose position overlapping area meets the third set condition. Proceeding in the same way, the 1st detection frame in the first picture and the 1st detection frame in the second picture have a position matching relationship, as do the 2nd, 4th and 5th detection frames in the two pictures, while the 3rd, 6th, 7th, 8th and 9th detection frames in the first picture have no position matching relationship with any detection frame in the second picture.
The above is merely an example. If the third set condition is relaxed so that the position overlapping area need only account for 60% or more of the union, more pairs of first and second detection frames will have a position matching relationship. Conversely, the stricter the third set condition, the fewer matching pairs there are and the smaller the amount of calculation in this step; however, some commodities that did not move but whose detection frames shifted considerably may then be missed. Any such misses can be compensated in the later steps. A person skilled in the art can set the third set condition as needed.
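The overlap-to-union test of mode one can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each detection frame is an (x1, y1, x2, y2) tuple with x1 < x2 and y1 < y2, and the function names are invented here.

```python
def overlap_ratio(box_a, box_b):
    """Position overlapping area divided by the area of the union."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping rectangle; zero if disjoint.
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union else 0.0

def match_frames(first_frames, second_frames, threshold=0.9):
    """Pair each first frame with the best second frame whose
    overlap ratio exceeds the third set condition (the threshold)."""
    matches = {}
    for i, fa in enumerate(first_frames):
        best = max(range(len(second_frames)),
                   key=lambda j: overlap_ratio(fa, second_frames[j]),
                   default=None)
        if best is not None and overlap_ratio(fa, second_frames[best]) > threshold:
            matches[i] = best
    return matches
```

With the 90% condition of the example, two 200x200 frames shifted by 10 pixels still match, since their overlap is about 90.5% of their union; a 60% condition would admit far larger shifts.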
Mode two
Without acquiring the coordinates of each first or second detection frame, the first picture and the second picture are superimposed, and every pair of a first detection frame and a second detection frame whose position overlapping region meets the set condition is extracted as frames having a position matching relationship. With this method, the coordinates of each second detection frame need not be compared with those of each first detection frame one by one; the first and second detection frames having a position matching relationship can be determined directly.
For convenience of description, the first detection frame and the second detection frame having a position matching relationship are referred to as the first abnormal frame and the second abnormal frame, respectively.
In step 204, the position overlapping area of the first abnormal frame and the second abnormal frame is determined. Taking the 1st detection frame of the first picture and the 1st detection frame of the second picture as an example: in step 203 it is determined that the coordinates of the upper left and lower right corners of the first abnormal frame are (100, 1100) and (300, 900), and those of the second abnormal frame are (110, 1100) and (310, 900); the coordinates of the upper left and lower right corners of the position overlapping area are therefore (110, 1100) and (300, 900).
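The corner arithmetic of this example can be checked in a few lines. This sketch keeps the patent's corner convention (upper-left and lower-right corners, with y increasing upward); the function name is invented for illustration.

```python
def overlap_region(box_a, box_b):
    """Overlap of two boxes given as ((x_ul, y_ul), (x_lr, y_lr)),
    where the upper-left corner has the larger y value."""
    (ax1, ay1), (ax2, ay2) = box_a
    (bx1, by1), (bx2, by2) = box_b
    # x grows rightward, y grows upward: intersect each axis separately.
    return ((max(ax1, bx1), min(ay1, by1)),
            (min(ax2, bx2), max(ay2, by2)))

first_abnormal = ((100, 1100), (300, 900))
second_abnormal = ((110, 1100), (310, 900))
# Yields corners (110, 1100) and (300, 900), as in the text.
print(overlap_region(first_abnormal, second_abnormal))
```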
Then the first sub-picture, i.e. the image of the overlapping area within the first picture, and the second sub-picture, i.e. the image of the overlapping area within the second picture, are determined. In the example above, the first sub-picture is shown in the left diagram of fig. 3C and the second sub-picture in the right diagram of fig. 3C; in each, the overlapping area is indicated by the diagonally shaded portion.
Next, a first similarity between the first sub-picture and the second sub-picture is determined. The similarity can be determined by a number of methods; for example, the similarity between two pictures can be measured with MSE (Mean Squared Error), SSIM (Structural Similarity Index) or PSNR (Peak Signal-to-Noise Ratio). If the commodity has not moved, or has moved only slightly, the similarity between the first sub-picture and the second sub-picture is high.
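One of the measures named above, MSE, can be sketched in a few lines; SSIM or PSNR would be drop-in alternatives. The conversion of MSE into a 0-to-1 similarity score is an assumption made here for illustration, not something the patent specifies.

```python
import numpy as np

def mse(img_a, img_b):
    """Mean squared error between two equally sized sub-pictures."""
    a = np.asarray(img_a, dtype=np.float64)
    b = np.asarray(img_b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))

def first_similarity(img_a, img_b, max_value=255.0):
    """1.0 for identical images, falling toward 0.0 as MSE grows."""
    return 1.0 - mse(img_a, img_b) / (max_value ** 2)
```

A first set condition such as `first_similarity(sub_a, sub_b) > 0.95` would then flag the two sub-pictures as enclosing the same commodity.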
If the first similarity meets a first set condition, it can be determined that the first abnormal frame and the second abnormal frame enclose the same commodity. If the first recognition result then differs from the second recognition result, the second recognition result of the second abnormal frame is updated to the first recognition result of the first abnormal frame. In the example above, the recognition results of the first and second abnormal frames are both commodity A, so they are already the same. By analogy, the same operations may be performed on the other pairs having a position matching relationship: the 2nd detection frames, the 4th detection frames, and the 5th detection frames of the two pictures. For the 5th detection frame in the first picture and the 5th detection frame in the second picture, the first similarity meets the first set condition but the recognition results differ, so commodity J in the second picture is updated to commodity D.
Optionally, the order may be reversed: after the position matching relationship is determined, it is first checked whether the recognition results of the two matched detection frames differ; only then is it determined, from their position overlapping area, whether the first similarity meets the first set condition, in which case the recognition result is updated directly. The above are merely examples, and embodiments of the present invention are not limited thereto.
Since there may be a case where the product is not moved or is moved less before and after the user opens and closes the cabinet door, but the target detection algorithm identifies the product as a different category in the first picture and the second picture, it is necessary to change the identification result of the product in the second picture to the identification result of the product in the first picture. Specifically, the first detection frame and the second detection frame having a position matching relationship can be determined by the position information of the first detection frame and the second detection frame, and then a second detection frame most likely to be the same commodity can be preliminarily determined for the first detection frame according to the position information, that is, the first abnormal frame and the second abnormal frame can be obtained. And then determining a first sub-image and a second sub-image according to the position overlapping area of the first abnormal frame and the second abnormal frame, wherein the first sub-image is an image of the position overlapping area in a first picture, and the second sub-image is an image of the position overlapping area in a second picture. Since the position of the commodity does not move much, the first similarity of the first sub-graph and the second sub-graph is high if the commodity is the same commodity. Therefore, whether the commodity circled in the first picture by the first abnormal frame and the commodity circled in the second picture by the second abnormal frame are the same commodity is determined according to whether the first similarity of the first sub-picture and the second sub-picture meets the first set condition. 
If the first similarity meets the first set condition, the two frames enclose the same commodity, and the recognition result of the second abnormal frame is changed to that of the first abnormal frame. The recognition results of the first picture and the second picture are thereby kept consistent for the same commodity, miscalculation of the user's payment amount can be avoided, and the user is given a better experience.
The problem that arises with respect to case 2 can be solved in the following manner. As shown in fig. 4, includes:
Referring to the right diagram of fig. 3A and fig. 3B, the first detection frames and second detection frames having a position matching relationship were determined in step 203. Any first detection frame for which no second detection frame has a position matching relationship is determined to be a third abnormal frame.
For example, since it is determined that the 3rd, 6th, 7th, 8th and 9th detection frames in the first picture have no position matching relationship with any detection frame in the second picture, these detection frames may be determined to be third abnormal frames, and a virtual frame may be determined in the second picture based on the position information of each third abnormal frame. Fig. 5 shows a schematic diagram of determining virtual frames in the second picture according to the position information of the third abnormal frames. If the coordinates of the upper left and lower right corners of the 3rd detection frame in the first picture are (900, 1100) and (1100, 900), a virtual frame is determined at the corresponding position in the second picture; in the same way, 5 virtual frames in total are determined in the second picture.
The similarity between the third sub-picture corresponding to the third abnormal frame in the first picture and the fourth sub-picture corresponding to the associated virtual frame in the second picture is then compared. For example, for the 3rd detection frame in the first picture, if the second similarity between the third abnormal frame and the corresponding virtual frame meets a second set condition, it may be determined that the fourth sub-picture corresponding to the virtual frame contains the same commodity as the 3rd detection frame in the first picture, and the recognition result of the virtual frame in the second picture may be determined to be commodity E. If the second similarity does not satisfy the second set condition, either commodity E no longer exists at the position of the virtual frame in the second picture, or commodity E has been displaced so far that the similarity between the third and fourth sub-pictures fails the condition.
By analogy, second similarities of the 6 th, 7 th, 8 th and 9 th detection frames in the first picture and the corresponding virtual frames may be determined, and if the second similarities satisfy the set condition, the identification result of the virtual frame in the second picture is determined as the identification result in the corresponding first picture, for example, the identification result of the virtual frame closest to the 6 th detection frame in the second picture is determined as F, and so on.
In object recognition, the same commodity may not move, or may move only slightly, yet be recognized in the first picture and missed in the second picture, so that the unrecognized commodity is wrongly charged as a commodity the user bought. In this case no second detection frame having a position matching relationship with the first detection frame can be found from the position information, so the unmatched first detection frame is determined to be a third abnormal frame and a virtual frame is determined in the second picture based on its position information. If the commodity has not moved, or has moved only slightly, the second similarity between the third sub-picture corresponding to the first detection frame in the first picture and the fourth sub-picture corresponding to the virtual frame in the second picture is high. If the second similarity meets the second set condition, the commodity enclosed by the third abnormal frame in the first picture and the commodity at the virtual frame in the second picture are the same commodity that the second picture failed to recognize, so the first recognition result of the third abnormal frame is taken as the recognition result of the virtual frame in the second picture. The recognition results of the two pictures are thereby kept consistent for the same commodity, miscalculation of the user's payment amount can be avoided, and the user is given a better experience.
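The case-2 repair can be sketched as follows: for each unmatched first detection frame, place a virtual frame at the same coordinates in the second picture, compare the two crops, and copy the first recognition result over when they are similar enough. `crop` and `second_similarity` are hypothetical helpers passed in by the caller, not functions defined by the patent.

```python
def fill_missed_detections(first_pic, second_pic, third_abnormal,
                           crop, second_similarity, threshold=0.9):
    """third_abnormal: (box, label) pairs from the first picture whose
    boxes matched nothing in the second picture. Returns the labels
    recovered for the second picture, keyed by virtual-frame box."""
    recovered = {}
    for box, label in third_abnormal:
        third_sub = crop(first_pic, box)    # sub-picture in first picture
        fourth_sub = crop(second_pic, box)  # virtual frame, same position
        if second_similarity(third_sub, fourth_sub) >= threshold:
            recovered[box] = label  # commodity was there but went undetected
        # otherwise it is gone or displaced; case 3 handles displacement
    return recovered
```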
The problem that arises with respect to case 3 can be solved in the following manner. As shown in fig. 6, includes:
For example, if it is determined that the 3 rd, 6 th, 7 th, 8 th, and 9 th detection frames in the first picture and any detection frame in the second picture do not have a position matching relationship, the 3 rd, 6 th, 7 th, 8 th, and 9 th detection frames may be determined as the fourth abnormal frame, and the 3 rd, 6 th, 7 th, 8 th, and 9 th detection frames in the second picture that do not have a position matching relationship may be determined as the fifth abnormal frame.
Optionally, this may be performed after the steps shown in fig. 4: once the recognition result of a virtual frame in the second picture has been determined as the recognition result of the corresponding third abnormal frame, that third abnormal frame is excluded before the method shown in fig. 6 is performed, so that the fourth abnormal frames are the 6th, 7th, 8th and 9th detection frames in the first picture and the fifth abnormal frames are the 6th, 7th, 8th and 9th detection frames in the second picture.
The above are merely examples, and embodiments of the present invention are not limited thereto.
In the following, the picture identification method provided by the embodiment of the invention is introduced taking the 6th, 7th, 8th and 9th detection frames in the first picture as the fourth abnormal frames and the 6th, 7th, 8th and 9th detection frames in the second picture as the fifth abnormal frames.
The following provides 3 methods of determining whether to update the recognition result of the fifth abnormal box to the recognition result of the fourth abnormal box.
Mode one
In this mode, the second recognition result of the fifth abnormal frame is updated to the first recognition result of the fourth abnormal frame according to both the attribute categories to which the first recognition result corresponding to the fourth abnormal frame and the second recognition result corresponding to the fifth abnormal frame belong, and the position offsets of the fourth and fifth abnormal frames.
Step one: group the abnormal frames according to the attribute categories of the first recognition results corresponding to the fourth abnormal frames and the second recognition results corresponding to the fifth abnormal frames, placing abnormal frames of the same attribute category in the same group. The attribute categories are predefined. They may be grouped by probability of confusion: if commodity A is often misidentified as commodity K or L, then commodities A, K and L are placed in the same attribute category. They may also be grouped by color and style: for example, if commodity A is a bottled herbal tea, commodity B is a cola, and the two are similar in color, commodities A and B are placed in the same attribute category.
For example, suppose it is preset that commodity F and commodity K belong to the same attribute category, commodity G and commodity B belong to the same attribute category, and commodities H and I belong to the same attribute category as commodities L, M and N. Grouping the 6th, 7th, 8th and 9th detection frames of the first picture and of the second picture (the right diagrams of fig. 3A and fig. 3B) by these preset attribute categories then gives the following result: the fourth abnormal frame corresponding to commodity F and the fifth abnormal frame corresponding to commodity K are in the same group; no commodity of the same category as commodity G is found among the 6th, 7th, 8th and 9th detection frames of the second picture, so the fourth abnormal frame corresponding to commodity G has no corresponding group; and the fourth abnormal frames corresponding to commodities H and I are in the same group as the fifth abnormal frames corresponding to commodities L, M and N.
Step two: determine the matching schemes within each group. If a group contains only fourth abnormal frames or only fifth abnormal frames, there is no matching scheme. If a group contains exactly one fourth abnormal frame and one fifth abnormal frame, there is a single scheme: the two frames are matched, and the second recognition result of the fifth abnormal frame is directly updated to the first recognition result of the fourth abnormal frame; in the grouping above, since the fourth abnormal frame corresponding to commodity F and the fifth abnormal frame corresponding to commodity K form such a group, commodity K in the second picture is updated to commodity F. If a group contains more than one fourth abnormal frame and at least one fifth abnormal frame, or more than one fifth abnormal frame and at least one fourth abnormal frame, several matching schemes are determined; in the grouping above, the fourth abnormal frames corresponding to commodities H and I and the fifth abnormal frames corresponding to commodities L, M and N form such a group, and the possible matching schemes can be enumerated with a combinatorial algorithm, as shown in Table 1.
Table 1
Step three: determine the optimal matching scheme according to the position offsets of each fourth abnormal frame and each fifth abnormal frame in each matching scheme.
Alternatively, the coordinates of the fourth abnormal frames where commodities H and I are located and of the fifth abnormal frames where commodities L, M and N are located are obtained, and among the latter the frame with the smallest coordinate offset from the fourth abnormal frame of commodity H is determined. If, for example, the coordinate offset between commodities M and H is smallest, and likewise that between commodities N and I, then the optimal matching scheme matches the fourth abnormal frame of commodity H with the fifth abnormal frame of commodity M, and the fourth abnormal frame of commodity I with the fifth abnormal frame of commodity N.
Alternatively, the total position offset of each matching scheme may be calculated to determine the optimal matching scheme. For example, for the 6 matching schemes shown in Table 1, the total position offset of each scheme is calculated. For matching scheme 1, let the coordinates of the upper left and lower right corners of the fourth abnormal frame where commodity H is located be (Xhl1, Yhl1) and (Xhr1, Yhr1); those of the fifth abnormal frame where commodity L is located be (Xll1, Yll1) and (Xlr1, Ylr1); those of the fourth abnormal frame where commodity I is located be (Xil1, Yil1) and (Xir1, Yir1); and those of the fifth abnormal frame where commodity M is located be (Xml1, Yml1) and (Xmr1, Ymr1).
The position offset between the fourth abnormal frame of commodity H and the fifth abnormal frame of commodity L is
p1 = (Xll1 - Xhl1)^2 + (Yll1 - Yhl1)^2 + (Xlr1 - Xhr1)^2 + (Ylr1 - Yhr1)^2;
the position offset between the fourth abnormal frame of commodity I and the fifth abnormal frame of commodity M is
p2 = (Xml1 - Xil1)^2 + (Yml1 - Yil1)^2 + (Xmr1 - Xir1)^2 + (Ymr1 - Yir1)^2.
The total position offset of matching scheme 1 is then S1 = p1 + p2. In the same way, the total position offset of each matching scheme can be calculated, as shown in Table 2.
Table 2
The matching scheme with the smallest total position offset, i.e., the smallest sum of offsets, is determined to be the optimal matching scheme.
By calculating, for each matching scheme, the sum of the position offsets between each fourth abnormal frame and its corresponding fifth abnormal frame, the scheme with the smallest sum can be determined as the optimal matching scheme. The optimal scheme thereby takes every pairwise position offset into account, making the finally determined matching more accurate.
Step four: based on the optimal matching scheme, update the second recognition result of each fifth abnormal frame to the first recognition result of the corresponding fourth abnormal frame.
For example, if in step three the calculation of total position offsets determines matching scheme 4 to be optimal, the recognition result of the fifth abnormal frame where commodity M is located in the second picture is updated to commodity H, and that of the fifth abnormal frame where commodity N is located is updated to commodity I.
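Steps two through four for a single group can be sketched by enumerating the matching schemes and scoring each by the total position offset S defined above. The enumeration over permutations reproduces Table 1's six schemes for two fourth abnormal frames and three fifth abnormal frames; the box format and function names are assumptions of this sketch.

```python
from itertools import permutations

def pair_offset(box_a, box_b):
    """p value from the text: sum of squared corner-coordinate
    differences between a fourth and a fifth abnormal frame.
    Boxes are flat (x_ul, y_ul, x_lr, y_lr) tuples."""
    return sum((a - b) ** 2 for a, b in zip(box_a, box_b))

def best_matching(fourth_boxes, fifth_boxes):
    """Enumerate every way to assign a distinct fifth box to each
    fourth box and return (assignment, total_offset) with the
    smallest S = sum of p values; assignment[i] is the index of the
    fifth box matched to fourth box i."""
    best_assignment, best_total = None, float('inf')
    for perm in permutations(range(len(fifth_boxes)), len(fourth_boxes)):
        total = sum(pair_offset(fourth_boxes[i], fifth_boxes[j])
                    for i, j in enumerate(perm))
        if total < best_total:
            best_assignment, best_total = perm, total
    return best_assignment, best_total
```

With two fourth boxes (H, I) and three fifth boxes (L, M, N), the loop visits 3 x 2 = 6 permutations, matching the six schemes of Table 1; any fifth box left out of the winning assignment simply stays unmatched.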
Mode two
In this mode, the second recognition result of the fifth abnormal frame is updated to the first recognition result of the fourth abnormal frame according only to the attribute categories to which the first recognition result corresponding to the fourth abnormal frame and the second recognition result corresponding to the fifth abnormal frame belong.
In this mode, each preset attribute category contains only two commodity categories; for example, commodity A and commodity B form one attribute category, commodity I and commodity N another, and so on. When the fourth and fifth abnormal frames are grouped by attribute category, each group therefore contains at most the abnormal frames corresponding to two commodities. The two abnormal frames can then be matched directly, and the second recognition result of the fifth abnormal frame in the group updated to the first recognition result of the fourth abnormal frame. For details of implementation, refer to mode one; they are not repeated here.
Mode three
And updating the second recognition result of the fifth abnormal frame to the first recognition result of the fourth abnormal frame according to the position offset of the fourth abnormal frame and the fifth abnormal frame.
In this mode, the fourth and fifth abnormal frames need not be grouped in advance by attribute category; the fifth abnormal frame corresponding to each fourth abnormal frame can be determined from the position offsets alone.
A commodity may undergo a certain displacement, so that the object detection algorithm recognizes it as different categories in the first picture and the second picture, again causing a wrong deduction. In this case no second detection frame having a position matching relationship with the first detection frame can be found from the position information, so the unmatched first detection frame is determined to be a fourth abnormal frame and the unmatched second detection frame a fifth abnormal frame. The correspondence between the fifth abnormal frames and the fourth abnormal frames is then determined from the attribute categories to which their recognition results belong and/or the position offsets between them, and the second recognition results of the fifth abnormal frames are updated accordingly. Although the commodity has moved, the recognition results of the first picture and the second picture are thereby kept consistent for the same commodity, miscalculation of the user's payment amount can be avoided, and the user is given a better experience.
With the method provided by the embodiment of the present invention, even if the commodity category is recognized incorrectly before the user makes a purchase, overcharging the user can be avoided as long as the recognized category remains the same after the purchase. The probability of recognition errors is thus greatly reduced, the goods loss caused by such errors is reduced, the user experience is improved, and operators are helped to estimate replenishment demand more accurately.
Based on the same technical concept, fig. 7 exemplarily shows the structure of a picture recognition apparatus provided by an embodiment of the present invention; the apparatus can perform the flow of picture recognition.
As shown in fig. 7, the apparatus specifically includes:
an identifying unit 701, configured to perform object identification on a first picture acquired at a first time to obtain first detection frames at different positions and corresponding first identification results; carrying out object recognition on a second picture acquired at a second moment to obtain each second detection frame at different positions and each corresponding second recognition result;
a determining unit 702, configured to, for any first detection frame, if it is determined according to the position information of each first detection frame and each second detection frame that a position matching relationship exists with any second detection frame, determine the first detection frame and the second detection frame having the position matching relationship as a first abnormal frame and a second abnormal frame, respectively;
a processing unit 703, configured to determine a position overlapping area of the first abnormal frame and the second abnormal frame; determine a first similarity between a first sub-image corresponding to the position overlapping area in the first picture and a second sub-image corresponding to the position overlapping area in the second picture; and, if the first similarity meets a first set condition, update the second recognition result of the second abnormal frame to the first recognition result of the first abnormal frame.
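The flow performed by the three units can be sketched as follows. The cosine similarity over grey values and the 0.9 threshold are illustrative assumptions, since the embodiment fixes neither the similarity metric nor the first set condition:

```python
def overlap_region(box_a, box_b):
    """Intersection rectangle of two (x1, y1, x2, y2) frames, or None if disjoint."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2, y2)

def crop(picture, region):
    """Flatten the pixels of `picture` (a 2D list of grey values) inside `region`."""
    x1, y1, x2, y2 = region
    return [picture[y][x] for y in range(y1, y2) for x in range(x1, x2)]

def cosine_similarity(a, b):
    """Cosine similarity of two pixel vectors; a stand-in for the patent's
    unspecified first-similarity metric."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = sum(p * p for p in a) ** 0.5
    norm_b = sum(q * q for q in b) ** 0.5
    return dot / (norm_a * norm_b + 1e-9)

def update_second_result(pic1, pic2, first_box, second_box,
                         first_result, second_result, threshold=0.9):
    """If the crops under the position overlapping area look alike, keep the
    pre-purchase category (the first recognition result)."""
    region = overlap_region(first_box, second_box)
    if region is None:
        return second_result
    if cosine_similarity(crop(pic1, region), crop(pic2, region)) >= threshold:
        return first_result  # same commodity: carry over the first result
    return second_result
```

In practice the crops would come from camera frames; plain nested lists keep the sketch self-contained.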
Based on the same technical concept, an embodiment of the present application provides a computer device. As shown in fig. 8, the computer device includes at least one processor 801 and a memory 802 connected to the at least one processor; the embodiment of the present application does not limit the specific connection medium between the processor 801 and the memory 802, and in fig. 8 they are connected through a bus as an example. The bus may be divided into an address bus, a data bus, a control bus, and the like.
In the embodiment of the present application, the memory 802 stores instructions executable by the at least one processor 801, and by executing the instructions stored in the memory 802, the at least one processor 801 can perform the steps of the picture recognition method described above.
The processor 801 is the control center of the computer device; it may connect the various parts of the computer device through various interfaces and lines, and performs picture recognition by running or executing the instructions stored in the memory 802 and calling the data stored in the memory 802. Optionally, the processor 801 may include one or more processing units, and may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 801. In some embodiments, the processor 801 and the memory 802 may be implemented on the same chip; in other embodiments, they may be implemented on separate chips.
The processor 801 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, or any combination thereof, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, any conventional processor, or the like. The steps of the method disclosed in connection with the embodiments of the present application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
Based on the same technical concept, an embodiment of the present invention further provides a computer-readable storage medium, where a computer-executable program is stored in the computer-readable storage medium, and the computer-executable program is configured to enable a computer to execute the method for picture identification listed in any of the above manners.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
Claims (10)
1. A picture recognition method is characterized by comprising the following steps:
carrying out object identification on a first picture acquired at a first moment to obtain each first detection frame at different positions and each corresponding first identification result;
carrying out object recognition on a second picture acquired at a second moment to obtain each second detection frame at different positions and each corresponding second recognition result;
for any first detection frame, if it is determined according to the position information of each first detection frame and each second detection frame that a position matching relationship exists with any second detection frame, determining the first detection frame and the second detection frame having the position matching relationship as a first abnormal frame and a second abnormal frame, respectively;
determining a position overlapping area of the first abnormal frame and the second abnormal frame; determining a first similarity between a first sub-image corresponding to the position overlapping area in the first picture and a second sub-image corresponding to the position overlapping area in the second picture; and if the first similarity meets a first set condition, updating the second recognition result of the second abnormal frame to the first recognition result of the first abnormal frame.
2. The method of claim 1, further comprising:
for any first detection frame, if any second detection frame having a position matching relationship cannot be determined according to the position information of each first detection frame and each second detection frame, determining the first detection frame without a position matching relationship as a third abnormal frame; and determining a virtual frame in the second picture according to the position information of the third abnormal frame;
for any third abnormal frame, determining a second similarity between a third sub-image corresponding to the third abnormal frame in the first picture and a fourth sub-image corresponding to the virtual frame in the second picture; and if the second similarity meets a second set condition, determining the first recognition result of the third abnormal frame as the recognition result of the virtual frame.
3. The method of claim 1, further comprising:
for any first detection frame, if any second detection frame with a position matching relationship cannot be determined according to the position information of each first detection frame and each second detection frame, determining the first detection frame without the position matching relationship as a fourth abnormal frame, and determining the second detection frame without the position matching relationship as a fifth abnormal frame;
and updating the second identification result of the fifth abnormal frame to be the first identification result of the fourth abnormal frame according to the attribute category to which the first identification result corresponding to the fourth abnormal frame and the second identification result corresponding to the fifth abnormal frame belong and/or the position offset of the fourth abnormal frame and the fifth abnormal frame.
4. The method of claim 1, wherein determining, for any first detection frame, that a position matching relationship exists with any second detection frame according to the position information of each first detection frame and each second detection frame comprises:
determining a first position of the plurality of first detection frames in the first picture; determining second positions of the plurality of second detection frames in the second picture;
and determining a position overlapping area of the first detection frame and any second detection frame aiming at any first detection frame, and if the position overlapping area meets a third set condition, determining that the first detection frame and the second detection frame have a position matching relationship.
5. The method according to claim 3, wherein updating the second recognition result of the fifth abnormal frame to the first recognition result of the fourth abnormal frame according to the attribute category to which the first recognition result corresponding to the fourth abnormal frame and the second recognition result corresponding to the fifth abnormal frame belong and/or the position offset of the fourth abnormal frame and the fifth abnormal frame comprises:
if the number of fourth abnormal frames and the number of fifth abnormal frames with the same attribute category are each greater than one, determining a plurality of matching schemes for the fourth abnormal frames and the fifth abnormal frames;
determining an optimal matching scheme according to the position offset of each fourth abnormal frame and each fifth abnormal frame in each matching scheme;
and updating the second recognition result of each fifth abnormal frame to the first recognition result of the corresponding fourth abnormal frame based on the optimal matching scheme.
6. The method of claim 5, wherein if the number of fourth abnormal frames and the number of fifth abnormal frames with the same attribute category are each greater than one, determining a plurality of matching schemes for the fourth abnormal frames and the fifth abnormal frames comprises:
for any fourth abnormal frame, selecting one fifth abnormal frame from the fifth abnormal frames for combination and matching, until no unmatched fourth abnormal frame or unmatched fifth abnormal frame remains, thereby obtaining one matching scheme; wherein the number of matching schemes is derived from a combination algorithm.
7. The method of claim 6, wherein determining the optimal matching scheme according to the position offset of the fourth abnormal frame and the fifth abnormal frame in each matching scheme comprises:
determining the position offset of any fourth abnormal frame and the corresponding fifth abnormal frame aiming at any matching scheme;
and determining the matching scheme with the minimum sum of the position offsets as the optimal matching scheme.
8. A picture recognition apparatus, comprising:
the identification unit is used for carrying out object identification on the first picture acquired at the first moment to obtain each first detection frame at different positions and each corresponding first identification result; carrying out object recognition on a second picture acquired at a second moment to obtain each second detection frame at different positions and each corresponding second recognition result;
a determining unit, configured to, for any first detection frame, if it is determined according to the position information of each first detection frame and each second detection frame that a position matching relationship exists with any second detection frame, determine the first detection frame and the second detection frame having the position matching relationship as a first abnormal frame and a second abnormal frame, respectively;
a processing unit, configured to determine a position overlapping area of the first abnormal frame and the second abnormal frame; determine a first similarity between a first sub-image corresponding to the position overlapping area in the first picture and a second sub-image corresponding to the position overlapping area in the second picture; and, if the first similarity meets a first set condition, update the second recognition result of the second abnormal frame to the first recognition result of the first abnormal frame.
9. A computing device, comprising:
a memory for storing a computer program;
a processor for calling a computer program stored in said memory, for executing the method of any one of claims 1 to 7 in accordance with the obtained program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer-executable program for causing a computer to execute the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110896038.1A CN113627512B (en) | 2021-08-05 | 2021-08-05 | Picture identification method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113627512A true CN113627512A (en) | 2021-11-09 |
CN113627512B CN113627512B (en) | 2024-08-20 |
Family
ID=78382867
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110896038.1A Active CN113627512B (en) | 2021-08-05 | 2021-08-05 | Picture identification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113627512B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190236789A1 (en) * | 2017-04-11 | 2019-08-01 | Rakuten, Inc. | Image processing device, image processing method, and program |
CN110866478A (en) * | 2019-11-06 | 2020-03-06 | 支付宝(杭州)信息技术有限公司 | Method, device and equipment for identifying object in image |
CN111079699A (en) * | 2019-12-30 | 2020-04-28 | 北京每日优鲜电子商务有限公司 | Commodity identification method and device |
CN111528652A (en) * | 2020-07-09 | 2020-08-14 | 北京每日优鲜电子商务有限公司 | Method and device for identifying commodities in intelligent container |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11461753B2 (en) | Automatic vending method and apparatus, and computer-readable storage medium | |
US11216868B2 (en) | Computer vision system and method for automatic checkout | |
US11853347B2 (en) | Product auditing in point-of-sale images | |
CN108335408B (en) | Article identification method, device and system for vending machine and storage medium | |
US11983250B2 (en) | Item-customer matching method and device based on vision and gravity sensing | |
US20160328618A1 (en) | Method and apparatus for image processing to avoid counting shelf edge promotional labels when couting product labels | |
CN108460908A (en) | Automatic vending method and system and automatic vending device and automatic vending machine | |
WO2018137136A1 (en) | Vending machine and operation method thereof | |
CN115552493A (en) | Product identification system and method | |
CN110197561A (en) | A kind of commodity recognition method, apparatus and system | |
CN111539468B (en) | Goods checking method of automatic vending equipment | |
CN110443946B (en) | Vending machine, and method and device for identifying types of articles | |
CN113627512B (en) | Picture identification method and device | |
CN112991379A (en) | Unmanned vending method and system based on dynamic vision | |
Achakir et al. | An automated AI-based solution for out-of-stock detection in retail environments | |
CN118072210A (en) | Dynamic visual recognition method for purchasing behavior of intelligent sales counter | |
CN114863141A (en) | Intelligent identification method and device for vending similar goods by unmanned person and intelligent vending machine | |
CN118097522B (en) | Goods shelf layer identification method, device and vending machine system for commodity change of vending machine | |
US20230401594A1 (en) | Device and method for assigning product collateral value in metaverse warehouse | |
CN118762437A (en) | Unmanned vending system based on dynamic vision and gravity dual mode | |
CN118072208A (en) | Commodity identification method for intelligent sales counter purchase settlement | |
CN118072209A (en) | Commodity identification method based on binocular vision detection | |
KR20220114719A (en) | System based on user behavior analysis and situational awareness using artificial intelligence | |
CN115565117A (en) | Data processing method, device, equipment and storage medium | |
CN117253077A (en) | Accounting method based on unmanned sales counter article identification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||