CN115273063A - Method and device for determining object information, electronic equipment and storage medium - Google Patents

Method and device for determining object information, electronic equipment and storage medium

Info

Publication number
CN115273063A
CN115273063A
Authority
CN
China
Prior art keywords
image
object detection
detection frame
target
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210776545.6A
Other languages
Chinese (zh)
Inventor
倪子涵
安容巧
孙逸鹏
姚锟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210776545.6A
Publication of CN115273063A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4038 Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Abstract

The disclosure provides a method and apparatus for determining object information, an electronic device, and a storage medium, and relates to the technical field of artificial intelligence, in particular to the field of image processing. The implementation scheme is as follows: determining at least one image pair from at least two target images, wherein each image pair includes a first image and a second image, and the first image and the second image have an overlapping portion representing the same physical location; for each image pair, performing object detection on the first image and the second image respectively to obtain first object detection frames corresponding to the first image and second object detection frames corresponding to the second image; de-duplicating, among the first object detection frames and the second object detection frames, the detection frames indicating the same object to obtain third object detection frames corresponding to each image pair; and determining object information related to the target images according to the third object detection frames.

Description

Method and device for determining object information, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technology, in particular to the field of image processing, and specifically to a method, an apparatus, an electronic device, a storage medium, and a computer program product for determining object information.
Background
In the fast-moving consumer goods industry, the commodities on a shelf can be counted to obtain information such as commodity type and commodity quantity, and information such as the sales volume and stock-in volume of each kind of commodity can then be determined based on this information.
Disclosure of Invention
The present disclosure provides a method, an apparatus, an electronic device, a storage medium, and a computer program product for determining object information.
According to an aspect of the present disclosure, there is provided a method of determining object information, including: determining at least one image pair from at least two target images, wherein each image pair includes a first image and a second image, and the first image and the second image have an overlapping portion representing the same physical location; for each image pair, performing object detection on the first image and the second image respectively to obtain first object detection frames corresponding to the first image and second object detection frames corresponding to the second image; de-duplicating, among the first object detection frames and the second object detection frames, the detection frames indicating the same object to obtain third object detection frames corresponding to each image pair; and determining object information related to the target images according to the third object detection frames.
According to another aspect of the present disclosure, there is provided an apparatus for determining object information, including: an image pair determination module configured to determine at least one image pair from at least two target images, wherein each image pair includes a first image and a second image, and the first image and the second image have an overlapping portion representing the same physical location; a de-duplication module configured to perform, for each image pair, object detection on the first image and the second image respectively to obtain first object detection frames corresponding to the first image and second object detection frames corresponding to the second image, and to de-duplicate, among the first object detection frames and the second object detection frames, the detection frames indicating the same object to obtain third object detection frames corresponding to each image pair; and an information determination module configured to determine object information related to the target images according to the third object detection frames.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods provided by the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method provided by the present disclosure.
According to another aspect of the disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method provided by the disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a schematic view of an application scenario of a method and apparatus for determining object information according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart diagram of a method of determining object information in accordance with an embodiment of the present disclosure;
FIG. 3A is a schematic diagram of a candidate movement trajectory according to an embodiment of the present disclosure;
FIG. 3B is a schematic diagram of another candidate movement trajectory according to an embodiment of the present disclosure;
FIG. 3C is a schematic diagram of another candidate movement trajectory according to an embodiment of the present disclosure;
FIG. 3D is a schematic diagram of another candidate movement trajectory according to an embodiment of the present disclosure;
FIG. 4 is a schematic flow chart diagram of acquiring a plurality of images according to an embodiment of the present disclosure;
FIG. 5A is a schematic diagram of stitching two images arranged one above the other according to an embodiment of the disclosure;
FIG. 5B is a schematic diagram of stitching two images arranged left and right according to an embodiment of the present disclosure;
FIG. 5C is a schematic diagram of stitching two columns of left and right images according to an embodiment of the present disclosure;
FIG. 6A is a schematic illustration of a deduplication operation in accordance with an embodiment of the present disclosure;
FIG. 6B is a schematic illustration of a deduplication operation in accordance with an embodiment of the present disclosure;
fig. 7 is a schematic configuration block diagram of an apparatus for determining object information according to an embodiment of the present disclosure; and
fig. 8 is a block diagram of an electronic device for implementing a method of determining object information according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
With the development of image recognition capabilities, more and more enterprises identify and count commodities in a single picture by means of object recognition technology. However, when a shelf is long and the spacing between shelves is narrow, it is difficult to capture the entire shelf in a single image.
In some technical solutions, multiple images may be acquired of a target area (for example, a shelf on which goods are placed) to obtain multiple target images. The multiple target images may reflect information about the goods placed in multiple sub-areas of the shelf, and the acquisition viewing angles of the multiple target images may differ. The mapping relationship between points of the same plane in different images can be described by a homography matrix, and the multiple target images can then be stitched into a complete image based on the homography matrix. The commodity information on the shelf can be counted based on the stitched complete image.
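As a rough illustration of this background approach, the following sketch (assuming OpenCV and NumPy are available; the matrix values and point coordinates are invented placeholders, not taken from the disclosure) shows how a homography matrix maps a point on the shelf plane from one image into another:

```python
import cv2
import numpy as np

# Illustrative 3x3 homography between image A and image B (placeholder values).
H = np.array([[1.02, 0.01, 35.0],
              [0.00, 1.01, -4.0],
              [0.00, 0.00, 1.0]], dtype=np.float32)

# A point on the shelf plane, in pixel coordinates of image A.
pt_a = np.float32([[[120.0, 240.0]]])     # shape (1, 1, 2), as expected by OpenCV

# The same physical point expressed in image B's coordinates.
pt_b = cv2.perspectiveTransform(pt_a, H)
print(pt_b.ravel())                       # mapped pixel coordinates in image B
```

In such solutions, a homography estimated from matched feature points is applied in the same way to warp one image onto another before stitching.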
It should be noted that when multiple target images are stitched into a complete image, the stitching effect may be poor. Since the commodity information is counted based on the stitched complete image, the stitching quality of the multiple target images directly affects the accuracy of the counting result.
The embodiments of the present disclosure aim to provide a method for determining object information that does not require stitching the target images into a complete image. Instead, the first object detection frames and second object detection frames included in the target images are obtained through object detection, the detection frames indicating the same object are de-duplicated, and the object information is then counted according to the de-duplicated third object detection frames, thereby improving the accuracy of the statistical result.
The technical solutions provided by the present disclosure will be described in detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a schematic view of an application scenario of a method and an apparatus for determining object information according to an embodiment of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. Network 104 is the medium used to provide communication links between terminal devices 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
A user may use terminal devices 101, 102, 103 to interact with a server 105 over a network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and otherwise process data such as the received user request, and feed back a processing result (for example, an image obtained by stitching adjacent images, object information determined according to the target image, and the like) to the terminal device.
It should be noted that the method for determining object information provided by the embodiments of the present disclosure may be generally executed by the server 105. Accordingly, the apparatus for determining object information provided by the embodiments of the present disclosure may be generally disposed in the server 105. The method for determining object information provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the apparatus for determining object information provided by the embodiments of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 is a schematic flow chart diagram of a method of determining object information according to an embodiment of the present disclosure.
As shown in fig. 2, the method 200 of determining object information may include operations S210 to S230.
In operation S210, at least one image pair is determined according to at least two target images, wherein the image pair includes: a first image and a second image, and the first image and the second image have an overlap characterizing a same location.
In some embodiments, a video capture device such as a video recorder may be used to capture video data of a target area, and video frames corresponding to at least two sub-areas in the target area are then extracted from the video data as target images.
For example, the target area may include a shelf on which objects are placed, which may be merchandise, work pieces, courier packages, and the like. The sub-regions may be arranged in a predetermined direction, and the predetermined direction may include at least one of a horizontal direction and a vertical direction. The areas and shapes of the sub-regions may be different.
In other embodiments, an image acquisition device such as a camera may be used to acquire images of at least two sub-regions in the target region to obtain the target image.
For example, at least two sub-regions in the target region may be respectively subjected to image acquisition based on the target movement trajectory, so as to obtain at least two target images, where the target movement trajectory is described in detail below and is not described herein again. In other examples, the image may also be acquired without following the target movement trajectory, for example, the target area is divided into at least two sub-areas in advance, and adjacent sub-areas have overlapping areas, and then the image of the sub-areas is acquired randomly.
For example, when the sub-regions are arranged in the horizontal direction, two target images obtained by image acquisition for two adjacent sub-regions located in the same row may be determined as an image pair.
For another example, when the sub-regions are arranged in the vertical direction, two target images obtained by image acquisition for two sub-regions that are located in the same column and adjacent to each other may be determined as an image pair.
For another example, in the image capturing process, if two adjacent captured images are required to have an overlapping ratio, two adjacent images based on the capturing order may be determined as an image pair.
The overlapping portion may represent a partial image of the same physical area in the first image and the second image, and the partial image may represent the same area in a shelf, the same product on a shelf, or the like, for example.
It should be noted that different image pairs may include the same target image, for example, one image pair includes image pic1 and image pic2, and another image pair includes image pic2 and image pic3.
In operation S220, object detection is performed on the first image and the second image for each image pair, respectively, to obtain first object detection frames corresponding to the first image and second object detection frames corresponding to the second image; and the detection frames indicating the same object among the first object detection frames and the second object detection frames are de-duplicated to obtain third object detection frames corresponding to each image pair.
For example, an object detection model may be used to perform object detection on the objects in the first image and the second image, respectively, to obtain the first object detection frames and the second object detection frames. The object detection model may be a YOLO (You Only Look Once) model. There may be at least one first object detection frame and at least one second object detection frame.
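As a minimal sketch of this detection step (assuming the Ultralytics YOLO package; the weight file name, image paths, and confidence threshold are placeholders and not specified by the disclosure):

```python
from ultralytics import YOLO  # assumed dependency; any detector returning boxes would do

model = YOLO("yolov8n.pt")  # placeholder weights; a model trained on shelf goods is assumed

def detect_boxes(image_path, conf=0.25):
    """Run object detection on one target image and return a list of
    (x1, y1, x2, y2, class_id) tuples, one per detected object."""
    result = model(image_path, conf=conf)[0]
    return [(x1, y1, x2, y2, int(cls))
            for (x1, y1, x2, y2), cls in zip(result.boxes.xyxy.tolist(),
                                             result.boxes.cls.tolist())]

# First and second object detection frames for one image pair.
first_boxes = detect_boxes("pic1.jpg")
second_boxes = detect_boxes("pic2.jpg")
```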
For example, the first image includes first object detection frames box11, box12, and box13, and the second image includes second object detection frames box21 and box22, where the first object detection frame box11 and the second object detection frame box21 represent the same object. One of the first object detection frame box11 and the second object detection frame box21 may be deleted; for example, if box11 is deleted, the de-duplicated third object detection frames include box12, box13, box21, and box22.
In operation S230, object information related to the target image is determined according to the third object detection frame.
For example, after the deduplication operation, statistics may be performed on the corresponding third object detection frame for each image pair, thereby obtaining object information. The object information may include information of the kind of the object, the number of each kind of the object, and the like.
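A minimal sketch of this counting step, assuming each de-duplicated detection frame carries a class identifier as in the detection sketch above:

```python
from collections import Counter

def summarize(third_boxes):
    """Count the de-duplicated detection frames per object class.
    Each box is assumed to be an (x1, y1, x2, y2, class_id) tuple."""
    per_class = Counter(cls for *_, cls in third_boxes)
    return {"num_kinds": len(per_class), "per_kind": dict(per_class)}
```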
In practical applications, when the target area is a shelf and the object is an article placed on the shelf, information such as the stock amount of the article, the sales amount of the article, and the like can be determined from the object information, thereby providing a reference for the merchant to stock.
According to the technical solution provided by the embodiments of the present disclosure, the first object detection frames and second object detection frames included in the target images are obtained through object detection, the detection frames indicating the same object are de-duplicated, and the object information is then counted according to the de-duplicated third object detection frames. Since the target images do not need to be stitched into a complete image from which the object information is determined, the problem that a poor stitching effect affects the statistical result of the object information is avoided, and the accuracy of the object information is improved.
Fig. 3A to 3D are schematic diagrams of candidate movement trajectories according to an embodiment of the disclosure.
The method for determining object information is used to process the target images so as to obtain the object information related to the target images. Accordingly, the target images are acquired before being processed.
In some embodiments, in order to facilitate the user to capture the target image, the front-end guiding module may be used to guide the image capturing position of the image capturing device, and the front-end guiding module may be a software module.
First, candidate movement trajectories of the image capturing apparatus may be set in advance. For a single row of shelves, the candidate movement trajectory may include one of a "left" trajectory and a "right" trajectory. For multiple rows of shelves, the candidate movement trajectories may include a "downward right" trajectory as shown in FIG. 3A, an "upward right" trajectory as shown in FIG. 3B, a "downward left" trajectory as shown in FIG. 3C, and an "upward left" trajectory as shown in FIG. 3D. In fig. 3A to 3D, broken-line arrows indicate candidate movement trajectories, and images 301, 302, 303, and 304 indicate four target images sequentially acquired along the candidate movement trajectories.
It can be seen that, by adopting these candidate movement trajectories for multi-row shelves, the movement trajectory of the image acquisition device is shorter, so that images in two adjacent columns are more likely to have a sufficient overlap ratio.
Secondly, the target movement trajectory can be determined from a plurality of candidate movement trajectories through information when the image is acquired. The following describes a process of determining a target movement trajectory, taking as an example that the target area includes a plurality of sub-areas arranged in a predetermined direction, and the predetermined direction may include at least one of a horizontal direction and a vertical direction.
The moving direction of the image acquisition device from a first position to a second position can be detected, the first position is a physical position where the image acquisition device is located when the ith image is acquired, the second position is a physical position where the image acquisition device is located when the (i + 1) th image is acquired, and i is an integer greater than or equal to 1.
After the image acquisition device acquires the first image, the image acquisition device can move in any direction of up, down, left and right, and acquires the second image.
If it is detected that, after the first image is captured, the image capture device moves in a horizontal direction (e.g., leftward or rightward) and captures the second image, this indicates that a single row of shelves is being captured. In this case, the moving direction of the first horizontal movement may be taken as the target moving direction.
If it is detected that the image capture device moves in a vertical direction (e.g., upward or downward) after capturing the first image and then captures the second image, this indicates that multiple rows of shelves are being captured. If the vertical direction is upward, the first direction is recorded as "upward"; if the vertical direction is downward, the first direction is recorded as "downward".
If it is detected that, after the i-th image is acquired, the image acquisition device moves in a horizontal direction for the first time and acquires the (i + 1)-th image, where i is an integer greater than or equal to 2, this indicates that acquisition of the first column of sub-regions has been completed. If the horizontal direction is leftward, the second direction is recorded as "leftward"; if the horizontal direction is rightward, the second direction is recorded as "rightward".
Then, the target movement trajectory of the image acquisition device can be determined from the first direction and the second direction. For example, if the first direction is "upward" and the second direction is "leftward", the target movement trajectory is the "upward left" trajectory shown in fig. 3D.
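A minimal sketch of this mapping (the direction strings and trajectory names are illustrative assumptions following FIGS. 3A to 3D):

```python
def determine_target_trajectory(first_direction, second_direction=None):
    """Map the recorded first (vertical) and second (horizontal) moving directions
    to a candidate movement trajectory; direction strings are placeholders."""
    if first_direction in ("leftward", "rightward"):
        # The first movement was horizontal: a single row of shelves is captured.
        return first_direction
    trajectories = {
        ("downward", "rightward"): "downward right",   # FIG. 3A
        ("upward", "rightward"): "upward right",       # FIG. 3B
        ("downward", "leftward"): "downward left",     # FIG. 3C
        ("upward", "leftward"): "upward left",         # FIG. 3D
    }
    return trajectories[(first_direction, second_direction)]

print(determine_target_trajectory("upward", "leftward"))   # "upward left"
```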
In addition, the number of images that have been acquired before the image acquisition device first appeared to move horizontally and acquire an image may also be recorded and taken as the target number. It may be defined that the number of images corresponding to each column of sub-regions is the same as each other.
Next, the front end guiding module may guide an image capturing position of the image capturing device through the target moving trajectory. In addition, when the image is captured, if the moving direction of the image capturing device is different from the target moving direction indicated in the target moving trajectory, it is also possible to prohibit the image from being captured.
The embodiments of the present disclosure guide the user to acquire images through the front-end guiding module, which is applicable to image acquisition scenarios for both multi-row and single-row shelves. By guiding the movement trajectory of the image acquisition device, adjacent images whose overlap ratio meets the requirement can be obtained more easily, thereby improving the accuracy of de-duplication on the target images.
Fig. 4 is a schematic flow diagram of acquiring multiple images according to an embodiment of the present disclosure.
As shown in fig. 4, in this embodiment, in order to ensure that two images adjacent in acquisition order have a sufficient overlap ratio, the method 400 of acquiring target images may include operations S401 to S406.
In operation S401, it is determined whether a previous image exists in the set corresponding to the target images. If not, this indicates that the first image has not yet been acquired, and operation S402 may be performed. If so, operation S403 may be performed.
In operation S402, in response to receiving an image capture instruction, an image is captured at the current viewing position of the image capture device, and the captured image is added to the set corresponding to the target images. It can be seen that the image acquired by this operation is the first image in the set.
In operation S403, an overlap ratio between the current estimated image and the previous image for the target position is determined.
For example, the previous image may be the most recently acquired image in the set based on the acquisition order. A sub-region of the target region may be framed in the viewfinder of the image capture device to obtain the current estimated image.
In some embodiments, features may be extracted from the current estimated image and the previous image, respectively, using SIFT (Scale-Invariant Feature Transform) or other means. Features in the current estimated image are then matched with features in the previous image using an ANN (Artificial Neural Network) or other means, homography matrices for the current estimated image and the previous image are then determined from the matched feature pairs, and an overlap ratio of the current estimated image and the previous image is determined based on the homography matrices.
In some embodiments, in the process of extracting features from the current estimation image and the previous image respectively, the features may be extracted from the local image of the current estimation image and the local image in the previous image respectively according to the relative position relationship between the image capture region corresponding to the current estimation image and the image capture region corresponding to the previous image, so as to improve the feature extraction efficiency. For example, when the image capture region corresponding to the current estimated image is located on the left side of the image capture region corresponding to the previous image, the feature may be extracted from the partial image close to the right boundary in the current estimated image, the feature may be extracted from the partial image close to the left boundary in the previous image, and the area ratio of the partial image to the complete image may be 0.3, 0.5, or the like.
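A minimal sketch of this overlap ratio estimation, assuming OpenCV SIFT features with a brute-force matcher and a RANSAC homography (the local-region ratio, matcher choice, and the assumed left/right relative position are illustrative, not mandated by the disclosure):

```python
import cv2
import numpy as np

def estimate_overlap_ratio(prev_img, cur_img, region_ratio=0.5):
    """Roughly estimate how much of cur_img overlaps prev_img.
    Features are extracted only from the parts expected to overlap: here the
    right part of cur_img and the left part of prev_img (an assumed layout)."""
    sift = cv2.SIFT_create()
    h_c, w_c = cur_img.shape[:2]
    x0 = int(w_c * (1 - region_ratio))                 # left edge of cur_img's local region
    kp_c, des_c = sift.detectAndCompute(cur_img[:, x0:], None)
    kp_p, des_p = sift.detectAndCompute(prev_img[:, : int(prev_img.shape[1] * region_ratio)], None)
    if des_c is None or des_p is None:
        return 0.0

    matches = cv2.BFMatcher().knnMatch(des_c, des_p, k=2)
    good = [m[0] for m in matches if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]
    if len(good) < 4:
        return 0.0                                     # too few matches for a homography

    src = np.float32([(kp_c[m.queryIdx].pt[0] + x0, kp_c[m.queryIdx].pt[1])
                      for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_p[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return 0.0

    # Map cur_img's corners into prev_img and measure (roughly) how much lands inside it.
    corners = np.float32([[0, 0], [w_c, 0], [w_c, h_c], [0, h_c]]).reshape(-1, 1, 2)
    mapped = cv2.perspectiveTransform(corners, H).reshape(-1, 2)
    h_p, w_p = prev_img.shape[:2]
    clipped = np.clip(mapped, [0, 0], [w_p, h_p]).reshape(-1, 1, 2).astype(np.float32)
    return cv2.contourArea(clipped) / float(w_c * h_c)
```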
Further, in order to avoid jitter or abrupt changes of the overlapping portion between the current estimated image and the previous image while the image capture device is moving, the overlapping portion may be tracked stably using an LK (Lucas-Kanade) optical flow tracking algorithm.
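A minimal sketch of such tracking, using OpenCV's pyramidal Lucas-Kanade implementation (the window size, pyramid level, and corner-point format are illustrative assumptions):

```python
import cv2
import numpy as np

def track_overlap_corners(prev_frame, cur_frame, overlap_corners):
    """Track the corner points of the overlapping region from the previous
    viewfinder frame to the current one, so that the displayed overlap
    moves smoothly instead of jumping."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    cur_gray = cv2.cvtColor(cur_frame, cv2.COLOR_BGR2GRAY)
    pts = np.float32(overlap_corners).reshape(-1, 1, 2)
    new_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, cur_gray, pts, None, winSize=(21, 21), maxLevel=3)
    # Keep only the corners that were tracked successfully.
    return [tuple(p.ravel()) for p, ok in zip(new_pts, status.ravel()) if ok]
```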
Furthermore, in the current estimated image, the overlapping portion may be displayed by a Mask (Mask), and the size and the display position of the Mask may vary with the viewing position of the current estimated image.
In operation S404, it is determined whether the overlap rate is less than the overlap rate threshold. If so, operation S405 may be performed. If not, operation S406 may be performed.
The current overlap rate may be determined every predetermined time period, which may be 1 millisecond.
In operation S405, prompt information indicating a target moving direction is generated.
For example, if the overlap ratio is less than the overlap ratio threshold, acquisition of the current estimated image for the target location may be prohibited. A prompt may also be generated to remind the user to adjust the viewing position of the image capture device along the target moving direction indicated by the target movement trajectory, so as to increase the overlap ratio.
In operation S406, image capturing is performed on the target location according to the received image capturing instruction, and the captured image is determined as the target image and added to the set.
According to the technical scheme provided by the embodiment of the disclosure, two adjacent images based on the acquisition sequence have an overlapping part, so that the problem that a local area in the target area is missed in the process of acquiring the images can be alleviated. In addition, the overlapping rate between two adjacent acquired images can be larger than or equal to the overlapping rate threshold, and the larger overlapping rate can improve the accuracy rate of the de-duplication in the process of de-duplicating the object detection frame based on the characteristics of the overlapping part.
According to another embodiment of the present disclosure, the method of determining object information may further include the following operations: stitching two target images obtained by image acquisition of two sub-regions adjacent to each other in a predetermined direction to obtain a stitched image, and then outputting the stitched image; and, in response to receiving a cancel instruction, deleting, from the target images, the target image corresponding to the cancel instruction.
As shown in fig. 5A, the predetermined direction may be a vertical direction. It can be seen that two sub-regions adjacent to each other along the predetermined direction are arranged one above the other, and the two images corresponding to the two sub-regions may be the first image 501 and the second image 502 in fig. 5A.
In the stitching process, when the two sub-regions are arranged up and down, the first image corresponds to the upper sub-region, and the second image corresponds to the lower sub-region, the feature points may be extracted from the partial image of the lower half portion in the first image and the partial image of the upper half portion in the second image.
For example, as shown in fig. 5A, feature points may be extracted from the first image 501 and the second image 502, respectively. Then, the feature points are matched to obtain a feature point pair, and two points connected by the same line segment in fig. 5A represent the matched feature point pair. The feature point pairs may then be used to determine homography matrices for the first image 501 and the second image 502, and the first image 501 and the second image 502 may be stitched using the homography matrices.
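A minimal sketch of stitching two vertically adjacent images in this way, assuming OpenCV; extracting features only from the lower half of the first image and the upper half of the second image follows the description above, while the canvas size and overlay are simplifications:

```python
import cv2
import numpy as np

def stitch_vertical(first_img, second_img):
    """Stitch the upper image (first_img) and the lower image (second_img)."""
    sift = cv2.SIFT_create()
    h1, w1 = first_img.shape[:2]
    h2, w2 = second_img.shape[:2]

    # Feature points from the lower half of the first image and upper half of the second.
    kp1, des1 = sift.detectAndCompute(first_img[h1 // 2:, :], None)
    kp2, des2 = sift.detectAndCompute(second_img[: h2 // 2, :], None)

    matches = cv2.BFMatcher().knnMatch(des2, des1, k=2)
    good = [m[0] for m in matches if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]

    # Restore full-image coordinates (the first image's ROI starts at row h1 // 2).
    src = np.float32([kp2[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([(kp1[m.trainIdx].pt[0], kp1[m.trainIdx].pt[1] + h1 // 2)
                      for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # Warp the second image into the first image's coordinate frame and overlay.
    canvas = cv2.warpPerspective(second_img, H, (max(w1, w2), h1 + h2))
    canvas[:h1, :w1] = first_img
    return canvas
```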
As shown in fig. 5B, the predetermined direction may be a horizontal direction, and it can be seen that two sub-regions adjacent in the predetermined direction are arranged left and right, and the two images corresponding to the two sub-regions may be the first image 503 and the second image 504 in fig. 5B.
In the stitching process, when the two sub-regions are arranged left and right, the first image corresponds to the left sub-region, and the second image corresponds to the right sub-region, the feature points may be extracted from the right-half image of the first image and the left-half image of the second image.
For example, as shown in fig. 5B, feature points may be extracted from the dashed-box region of the first image 503 and the dashed-box region of the second image 504, respectively. Two points connected by the same line segment in fig. 5B represent a matched feature point pair. A homography matrix is then determined from the matched feature point pairs, and the first image 503 and the second image 504 are stitched side by side based on the homography matrix.
As shown in fig. 5C, the predetermined direction may include a vertical direction and a horizontal direction, in which case the plurality of sub-regions are arrayed. At least two images obtained by image acquisition of at least two sub-regions located in the same column in the plurality of images can be spliced to be used as a spliced image. In addition, two adjacent columns of images can be spliced again to serve as another spliced image.
For example, the first sub-region is located on the left side of the second sub-region, the third sub-region is located on the left side of the fourth sub-region and both have an overlap, the first sub-region is located above the third sub-region and both have an overlap, and the second sub-region is located above the fourth sub-region and both have an overlap. And respectively acquiring images of the four sub-areas to obtain images 505, 506, 507 and 508.
The image 505 and the image 507 may be stitched by referring to the manner of stitching the first image 501 and the second image 502, so as to obtain a first-column stitched image, and the image 506 and the image 508 may likewise be stitched to obtain a second-column stitched image. The first-column stitched image and the second-column stitched image may then be stitched by referring to the manner of stitching the first image 503 and the second image 504, so as to obtain a target stitched image.
For example, at least one of the first column of stitched images, the second column of stitched images, and the target stitched image may be presented. Through displaying the spliced images, a user can intuitively know whether the acquired images can be aligned or not. If the alignment effect is poor, a user can cancel the acquired image by operating a triggering cancellation instruction, and then can acquire the image again.
According to the embodiment of the disclosure, the spliced images are displayed for the user, so that the user can delete the images with poor splicing effect and acquire the images again, and the accuracy of duplicate removal is improved.
In addition, the embodiment of the disclosure can only splice the images in the same column, the same row and two adjacent columns without splicing the images in multiple columns into one image. For example, a first column of images and a second column of images may be stitched together, and a second column of images and a third column of images may be stitched together without stitching together the first column of images, the second column of images, and the third column of images. Therefore, the splicing process can be simplified, resources required by splicing are saved, and the real-time performance of displaying the spliced images is improved.
According to another embodiment of the present disclosure, the operation of removing the duplicate of the object detection frame indicating the same object in the first object detection frame and the second object detection frame may include the following operations: and determining matched feature point pairs according to the first object detection frame and the second object detection frame, wherein each feature point pair comprises a first feature point in the first object detection frame and a second feature point in the second object detection frame. Homography matrices for the first image and the second image are then determined based on the pairs of feature points. The position of the overlap in the first image is determined from the homography matrix. Then, the first object detection frames in the overlapping portion among the first object detection frames are deleted.
For example, in order to avoid the influence of the surrounding environment on the determination of the overlap portion, the present embodiment filters the feature points according to the object detection frame, and performs feature point matching based on only the feature points inside the object detection frame. And then calculating a homography matrix according to the plurality of matched characteristic point pairs, wherein the method for calculating the homography matrix is not limited in the embodiment.
Next, the second image may be mapped onto the first image based on the homography matrix, resulting in an overlap of the second image on the first image. If the first object detection frame is completely in the overlapping portion, the first object detection frame can be deleted. If a part of the first object detection frame is located in the overlapping portion and the other part is located outside the overlapping portion, the first object detection frame may be left or deleted.
For example, as shown in fig. 6A, in the first image, the gray area indicates the overlapping portion, and it can be seen that the first object detection frames 601, 602, and 605 are located outside the overlapping portion, and therefore, deletion is not necessary. Since the first object detection frames 604 and 606 are located inside the overlapping portion, the first object detection frames 604 and 606 can be deleted. A part of the first object detection frame 603 is located inside the overlapping portion, and the first object detection frame 603 may be deleted or left.
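A minimal sketch of this de-duplication rule (boxes fully inside the overlapping portion are deleted; boxes that only partly overlap could be kept or dropped). It assumes a homography H that maps second-image coordinates onto the first image, and boxes given as (x1, y1, x2, y2) tuples:

```python
import cv2
import numpy as np

def dedup_by_overlap(first_boxes, H, first_shape, second_shape):
    """Delete first-image detection frames that lie entirely inside the region
    where the second image overlaps the first image."""
    h1, w1 = first_shape[:2]
    h2, w2 = second_shape[:2]

    # Project the second image's outline onto the first image and rasterize it as a mask.
    corners = np.float32([[0, 0], [w2, 0], [w2, h2], [0, h2]]).reshape(-1, 1, 2)
    overlap_poly = cv2.perspectiveTransform(corners, H).reshape(-1, 2).astype(np.int32)
    mask = np.zeros((h1, w1), dtype=np.uint8)
    cv2.fillPoly(mask, [overlap_poly], 255)

    kept = []
    for x1, y1, x2, y2 in first_boxes:
        roi = mask[int(y1):int(y2), int(x1):int(x2)]
        fully_inside = roi.size > 0 and (roi == 255).all()
        if not fully_inside:
            kept.append((x1, y1, x2, y2))
    return kept
```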
According to the embodiment of the disclosure, the position of the overlapping portion of the first image and the second image is determined according to the matched feature point pair in the object detection frame, and then the duplication elimination is performed according to the position of the overlapping portion and the position of the object detection frame, so that the duplication elimination accuracy can be improved.
According to another embodiment of the present disclosure, the operation of removing the duplicate of the object detection frame indicating the same object in the first object detection frame and the second object detection frame may include the following operations: and determining a matching relation between the first object detection frame and the second object detection frame, and then performing de-duplication on the first object detection frame according to the matching relation, the first relative position information and the second relative position information. The first relative position information indicates a relative position between an image capturing region corresponding to the first image and an image capturing region corresponding to the second image; the second relative position information indicates a relative position of the first object detection frame in the first image.
For example, the operation of determining the matching relationship may be performed in response to detecting that the operation of determining the homography matrix from the first object detection frame and the second object detection frame has failed. For example, it may be determined that the homography calculation has failed when the number of matched feature point pairs is less than a first number threshold, which may be 4, 5, 10, etc.
For example, for the first object detection frame Di in the first image and the second object detection frame Dj in the second image, the similarity between the partial image corresponding to the first object detection frame Di and the partial image corresponding to the second object detection frame Dj may be calculated, and in the case where the similarity is equal to or greater than the similarity threshold value, it is determined that the first object detection frame Di and the second object detection frame Dj match. For example, the similarity threshold may be set to 0.6.
For another example, the feature points in the first object detection frame Di may be matched with the feature points in the second object detection frame Dj, and if the number of matched feature point pairs satisfies a predetermined condition, it may be determined that the first object detection frame Di and the second object detection frame Dj are matched. The predetermined condition may include at least one of: the number of matched pairs of feature points is equal to or greater than the second number threshold, and the ratio of the number of matched pairs of feature points to the number of feature points in the first object detection frame Di is equal to or greater than the ratio threshold. The second quantity threshold may be 2, 3, 8, etc., and the ratio threshold may be 0.3, etc.
For example, if the first relative position information indicates that, in the predetermined direction, the image acquisition region corresponding to the first image is located before the image acquisition region corresponding to the second image, the first object detection frame Di may be deleted, and the first object detection frames located after the first object detection frame Di in the predetermined direction may also be deleted.
For example, as shown in fig. 6B, the first image includes first object detection boxes 607, 608, 609, 610 arranged in sequence from left to right, and the second image includes second object detection boxes 609', 611, 612 arranged in sequence from left to right, wherein the first object detection box 609 matches the second object detection box 609'. Since the image pickup area corresponding to the first image is located on the left side of the image pickup area corresponding to the second image, the first object detection frames 609 and 610 may be deleted.
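A minimal sketch of this rule as illustrated by fig. 6B, assuming the first image's capture region lies to the left of the second's and that boxes are already ordered from left to right; the similarity function is a placeholder for the patch-similarity or feature-matching criteria described above:

```python
def dedup_by_matching(first_boxes, second_boxes, similarity, threshold=0.6):
    """Delete a matched first-image box and every first-image box behind it
    in the predetermined (left-to-right) direction."""
    for i, box_i in enumerate(first_boxes):
        if any(similarity(box_i, box_j) >= threshold for box_j in second_boxes):
            # box_i corresponds to a second-image box: keep only the boxes before it.
            return first_boxes[:i]
    return first_boxes  # no match found: keep all first-image boxes
```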
According to the matching characteristic point pairs in the object detection frames, the object detection frames matched in the first image and the second image are determined, and then duplication elimination is carried out based on the first relative position information, the second relative position information and the matching relation. Therefore, even if a certain first object detection frame on the first image fails to match a second object detection frame on the second image, whether to delete the first object detection frame can be determined according to the first relative position information and the second relative position information, thereby improving the deduplication accuracy.
According to another embodiment of the present disclosure, the operation of removing the duplicate of the object detection frame indicating the same object in the first object detection frame and the second object detection frame may include the following operations: determining first object sequence information according to the relative position of the first object detection frame in the first image, determining second object sequence information according to the relative position of the second object detection frame in the second image, and then determining repeated sequence segment information in the first object sequence information and the second object sequence information. And then deleting the object detection frame indicated by the sequence segment information according to the first relative position information and the repeated sequence segment information. The first relative position information indicates a relative position between an image pickup area corresponding to the first image and an image pickup area corresponding to the second image.
For example, the operation of determining the first object sequence information and the second object sequence information may be performed after performing the determination of the matching relationship between the first object detection frame and the second object detection frame, and in a case where the execution result indicates that the first object detection frame and the second object detection frame cannot be matched.
An SKU (Stock Keeping Unit) may be used to represent an object corresponding to the object detection box. For example, if the plurality of objects are a plurality of products and the plurality of products differ in information such as color and specification, SKUs of the plurality of products differ. And sequentially arranging the SKUs of the plurality of objects according to the placing sequence of the objects to obtain object sequence information. In addition, the number of layers of the object on the shelf may be determined by the shelf line, and the repetitive sequence piece information may be determined based on the first object sequence information and the second object sequence information of the object on the same shelf.
For example, the first object sequence information is ABCDE, that is, the objects in the first image include objects a, B, C, D, and E arranged in order from left to right. The second object sequence information is DEBCA, that is, the objects in the second image include objects D, E, B, C, A arranged in sequence from left to right. It can be seen that the repeated sequence fragment information is BC and DE. Further, the first relative position information indicates that the image capturing region corresponding to the first image is located on the left side of the image capturing region corresponding to the second image, and therefore, it is possible to select the repeated sequence section information near the rear end in the first object sequence information and delete the first object detection frame indicated by the sequence section information, that is, delete the two first object detection frames corresponding to the DE.
According to the embodiment of the disclosure, duplicate removal is performed according to the first relative position information and the repeated sequence fragment information, so that the duplicate removal accuracy can be improved.
In some embodiments, it may also be determined whether the repeated sequence fragment information includes at least two SKUs. If not, the duplicate removal is refused. And if so, executing the operation of deleting the object detection frame indicated by the sequence fragment information.
For example, the object sequence information of the goods placed on the shelf is ABCCDE, where the first object sequence information acquired from the first image is ABCC and the second object sequence information acquired from the second image is CCDE. It can be seen that although there is repeated sequence fragment information CC, since this repeated sequence fragment includes only one SKU (i.e., C above), the object detection frames indicated by the sequence fragment information CC can be retained.
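A minimal sketch of this sequence-based rule, combining the repeated-fragment deletion with the at-least-two-SKUs check; it assumes the first image's capture region lies to the left of the second image's, so only the repeated fragment at the rear end (suffix) of the first sequence is considered:

```python
def count_boxes_to_delete(first_seq, second_seq, min_distinct_skus=2):
    """Return how many trailing first-image detection frames should be deleted,
    given the two SKU sequences (e.g. strings with one character per SKU)."""
    for length in range(len(first_seq), 0, -1):
        fragment = first_seq[-length:]
        if fragment in second_seq:
            if len(set(fragment)) < min_distinct_skus:
                return 0   # repeated fragment covers fewer than two SKUs: keep the boxes
            return length
    return 0

print(count_boxes_to_delete("ABCDE", "DEBCA"))   # 2 -> delete the two frames for "DE"
print(count_boxes_to_delete("ABCC", "CCDE"))     # 0 -> "CC" has a single SKU, nothing deleted
```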
Fig. 7 is a block diagram of a schematic structure of an apparatus for determining object information according to an embodiment of the present disclosure.
As shown in fig. 7, the apparatus 700 for determining object information may include an image pair determination module 710, a deduplication module 720, and an information determination module 730.
The image pair determination module 710 is configured to determine at least one image pair from at least two target images, wherein the image pair comprises: the first image and the second image, and the first image and the second image have an overlap characterizing a same location.
The deduplication module 720 is configured to perform target detection on the first image and the second image respectively for each image pair, so as to obtain a first object detection frame corresponding to the first image and a second object detection frame corresponding to the second image; and performing duplication removal on the object detection frame indicating the same object in the first object detection frame and the second object detection frame to obtain a third object detection frame corresponding to each image pair.
The information determining module 730 is configured to determine object information related to the target image according to the third object detection frame.
According to another embodiment of the present disclosure, a deduplication module comprises: the system comprises a characteristic point pair determining submodule, a matrix determining submodule, a position determining submodule and a first duplication eliminating submodule. The characteristic point pair determining submodule is used for determining matched characteristic point pairs according to the first object detection frame and the second object detection frame, wherein the characteristic point pairs comprise first characteristic points in the first object detection frame and second characteristic points in the second object detection frame. The matrix determination submodule is used for determining the homography matrix of the first image and the second image according to the characteristic point pairs. The position determining submodule is used for determining the position of the overlapping part in the first image according to the homography matrix. The first de-duplication sub-module is used for deleting the first object detection frames at the overlapping part from the first object detection frames.
According to another embodiment of the present disclosure, a deduplication module comprises: a relationship determination submodule and a second deduplication submodule. The relation determination submodule is used for determining the matching relation between the first object detection frame and the second object detection frame. And the second duplication removal submodule is used for carrying out duplication removal on the first object detection frame according to the matching relation, the first relative position information and the second relative position information. The first relative position information indicates a relative position between an image pickup area corresponding to the first image and an image pickup area corresponding to the second image. The second relative position information indicates a relative position of the first object detection frame in the first image.
According to another embodiment of the disclosure, a deduplication module comprises: a first sequence determination submodule, a second sequence determination submodule and a third duplication removal submodule. The first sequence determining sub-module is used for determining first object sequence information according to the relative position of the first object detection frame in the first image. The second sequence determining sub-module is used for determining second object sequence information according to the relative position of the second object detection frame in the second image. And the third de-duplication sub-module is used for deleting the first object detection frame indicated by the sequence segment information in the first object detection frame according to the first relative position information and the repeated sequence segment information in the first object sequence information and the second object sequence information. The first relative position information indicates a relative position between an image capturing region corresponding to the first image and an image capturing region corresponding to the second image.
According to another embodiment of the present disclosure, the above apparatus further comprises: the device comprises an overlap rate determining submodule, a prompting submodule and an acquisition submodule. The overlap ratio determination sub-module is for determining an overlap ratio between the current estimated image and a previous image for the target location prior to determining the at least one image pair. And the prompting submodule is used for generating prompting information for indicating the moving direction of the target under the condition that the overlapping rate is determined to be smaller than the overlapping rate threshold value. The acquisition submodule is used for acquiring an image of the target position according to the received image acquisition instruction and determining the acquired image as a target image under the condition that the overlapping rate is determined to be greater than or equal to the overlapping rate threshold value.
According to another embodiment of the present disclosure, the target image is obtained by image-capturing at least two sub-regions in the target region, and the sub-regions are arranged along a predetermined direction. The above-mentioned device still includes: the device comprises a splicing module, an output module and a deleting module. The splicing module is used for splicing two target images obtained by image acquisition of two sub-areas adjacent to each other along a preset direction in the target images to obtain spliced images. The output module is used for outputting the spliced image. And the deleting module is used for deleting the target image corresponding to the canceling instruction in the target image according to the canceling instruction in response to the receiving of the canceling instruction.
According to another embodiment of the present disclosure, the target image is obtained by image-capturing at least two sub-regions in the target region, and the sub-regions are arranged in an array. The image pair determination module includes at least one of a first determination sub-module and a second determination sub-module. The first determining submodule is used for determining two target images which are obtained by image acquisition on two adjacent sub-areas positioned in the same column in the array in the target images as an image pair. The second determining submodule is used for determining two target images which are obtained by image acquisition of two adjacent sub-areas positioned in the same row in the array in the target images as an image pair.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the personal information of the users involved all comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
In the technical solutions of the present disclosure, the user's authorization or consent is obtained before the user's personal information is obtained or collected.
According to an embodiment of the present disclosure, there is also provided an electronic device, comprising at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of determining object information described above.
According to an embodiment of the present disclosure, there is also provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the above-described method of determining object information.
According to an embodiment of the present disclosure, there is also provided a computer program product, including a computer program, which when executed by a processor, implements the above method of determining object information.
FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 8, the device 800 includes a computing unit 801 which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 performs the respective methods and processes described above, such as the method of determining object information. For example, in some embodiments, the method of determining object information may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the method of determining object information described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the method of determining object information in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, which is not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. A method of determining object information, comprising:
determining at least one image pair from at least two target images, wherein the image pair comprises: a first image and a second image, and the first image and the second image have an overlap characterizing a same location;
for each of the image pairs,
performing object detection on the first image and the second image respectively to obtain a first object detection frame corresponding to the first image and a second object detection frame corresponding to the second image; and
de-duplicating, among the first object detection frame and the second object detection frame, the object detection frame indicating the same object, to obtain a third object detection frame corresponding to each image pair; and
and determining object information related to the target image according to the third object detection frame.
2. The method of claim 1, wherein the de-duplicating, among the first object detection frame and the second object detection frame, the object detection frame indicating the same object comprises:
determining matched feature point pairs according to the first object detection frame and the second object detection frame, wherein the feature point pairs comprise first feature points in the first object detection frame and second feature points in the second object detection frame;
determining a homography matrix between the first image and the second image according to the feature point pairs;
determining the position of the overlapping part in the first image according to the homography matrix; and
and deleting the first object detection frame in the overlapping part from the first object detection frames.
3. The method of claim 1, wherein the de-duplicating, among the first object detection frame and the second object detection frame, the object detection frame indicating the same object comprises:
determining a matching relationship between the first object detection frame and the second object detection frame; and
removing the duplicate of the first object detection frame according to the matching relation, the first relative position information and the second relative position information;
wherein the first relative position information indicates a relative position between an image capture area corresponding to the first image and an image capture area corresponding to the second image; the second relative position information indicates a relative position of the first object detection frame in the first image.
4. The method of claim 1, wherein the de-duplicating, among the first object detection frame and the second object detection frame, the object detection frame indicating the same object comprises:
determining first object sequence information according to the relative position of the first object detection frame in the first image;
determining second object sequence information according to the relative position of the second object detection frame in the second image; and
deleting the first object detection frame indicated by the sequence segment information in the first object detection frame according to the first relative position information and the repeated sequence segment information in the first object sequence information and the second object sequence information;
wherein the first relative position information indicates a relative position between an image capture area corresponding to the first image and an image capture area corresponding to the second image.
5. The method of any of claims 1 to 4, further comprising: prior to the determination of at least one image pair,
determining an overlap ratio between a current estimated image and a previous image for the target location;
generating prompt information for indicating a target moving direction under the condition that the overlapping rate is determined to be smaller than an overlapping rate threshold value; and
and under the condition that the overlapping rate is determined to be larger than or equal to the overlapping rate threshold value, carrying out image acquisition on the target position according to the received image acquisition instruction, and determining the acquired image as a target image.
6. The method according to claim 5, wherein the target image is obtained by image acquisition of at least two sub-regions in a target region, the sub-regions being arranged along a predetermined direction; the method further comprises the following steps:
splicing two target images obtained by image acquisition of two sub-areas adjacent to each other along the preset direction in the target images to obtain spliced images;
outputting the spliced image; and
and in response to receiving a revocation instruction, deleting a target image corresponding to the revocation instruction in the target images according to the revocation instruction.
7. The method according to claim 1, wherein the target image is obtained by image acquisition of at least two sub-regions in a target region, the sub-regions being arranged in an array; said determining at least one image pair from at least two target images comprises at least one of:
determining two target images obtained by image acquisition on two adjacent sub-areas positioned in the same column in the array in the target images as an image pair; and
and determining two target images obtained by image acquisition of two adjacent sub-areas positioned in the same row in the array in the target images as an image pair.
8. An apparatus for determining object information, comprising:
an image pair determination module for determining at least one image pair from at least two target images, wherein the image pair comprises: a first image and a second image, and the first image and the second image have an overlap characterizing a same location;
a de-duplication module configured to, for each of the image pairs,
perform object detection on the first image and the second image respectively to obtain a first object detection frame corresponding to the first image and a second object detection frame corresponding to the second image, and
de-duplicate, among the first object detection frame and the second object detection frame, the object detection frame indicating the same object, to obtain a third object detection frame corresponding to each image pair; and
and the information determining module is used for determining the object information related to the target image according to the third object detection frame.
9. The apparatus of claim 8, wherein the de-duplication module comprises:
a feature point pair determination submodule, configured to determine matched feature point pairs according to the first object detection frame and the second object detection frame, wherein the feature point pair includes a first feature point in the first object detection frame and a second feature point in the second object detection frame;
a matrix determination submodule, configured to determine a homography matrix between the first image and the second image according to the feature point pairs;
a position determination submodule, configured to determine the position of the overlapping portion in the first image according to the homography matrix; and
a first de-duplication submodule, configured to delete, from the first object detection frames, the first object detection frames located in the overlapping portion.
10. The apparatus of claim 8, wherein the de-duplication module comprises:
a relation determination submodule for determining a matching relation between the first object detection frame and the second object detection frame; and
the second duplication elimination submodule is used for eliminating duplication of the first object detection frame according to the matching relation, the first relative position information and the second relative position information;
wherein the first relative position information indicates a relative position between an image capture area corresponding to the first image and an image capture area corresponding to the second image; the second relative position information indicates a relative position of the first object detection frame in the first image.
11. The apparatus of claim 8, wherein the de-duplication module comprises:
a first sequence determining sub-module, configured to determine first object sequence information according to a relative position of the first object detection frame in the first image;
the second sequence determining submodule is used for determining second object sequence information according to the relative position of the second object detection frame in the second image; and
a third de-duplication sub-module, configured to delete the first object detection frame indicated by the sequence fragment information in the first object detection frame according to the first relative position information and repeated sequence fragment information in the first object sequence information and the second object sequence information;
wherein the first relative position information indicates a relative position between an image capture area corresponding to the first image and an image capture area corresponding to the second image.
12. The apparatus of any of claims 8 to 11, further comprising:
an overlap rate determination sub-module for determining an overlap rate between a current estimated image and a previous image for the target location prior to determining the at least one image pair;
the prompting submodule is used for generating prompting information used for indicating the moving direction of the target under the condition that the overlapping rate is determined to be smaller than the overlapping rate threshold value; and
and the acquisition submodule is used for acquiring the image of the target position according to the received image acquisition instruction and determining the acquired image as a target image under the condition that the overlapping rate is determined to be greater than or equal to the overlapping rate threshold value.
13. The apparatus of claim 12, wherein the target image is acquired by image capturing at least two sub-regions in a target region, the sub-regions being arranged along a predetermined direction; the device further comprises:
the splicing module is used for splicing two target images obtained by image acquisition of two sub-areas adjacent to each other along the preset direction in the target images to obtain spliced images;
the output module is used for outputting the spliced image; and
and the deleting module is used for deleting the target image corresponding to the canceling instruction in the target image according to the canceling instruction in response to the receiving of the canceling instruction.
14. The apparatus of claim 8, wherein the target image is acquired by image capturing of at least two sub-regions of a target region, the sub-regions being arranged in an array; and the image pair determination module, which determines the at least one image pair from the at least two target images, comprises at least one of:
the first determining submodule is used for determining two target images which are obtained by image acquisition on two adjacent sub-areas positioned in the same column in the array in the target images as an image pair; and
and the second determining submodule is used for determining two target images obtained by image acquisition on two adjacent sub-areas positioned in the same row in the array in the target images as an image pair.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1 to 7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1 to 7.
CN202210776545.6A 2022-06-30 2022-06-30 Method and device for determining object information, electronic equipment and storage medium Pending CN115273063A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210776545.6A CN115273063A (en) 2022-06-30 2022-06-30 Method and device for determining object information, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210776545.6A CN115273063A (en) 2022-06-30 2022-06-30 Method and device for determining object information, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115273063A true CN115273063A (en) 2022-11-01

Family

ID=83764554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210776545.6A Pending CN115273063A (en) 2022-06-30 2022-06-30 Method and device for determining object information, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115273063A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116612168A (en) * 2023-04-20 2023-08-18 北京百度网讯科技有限公司 Image processing method, device, electronic equipment, image processing system and medium
CN116824112A (en) * 2023-06-13 2023-09-29 广州市玄武无线科技股份有限公司 Commodity image de-duplication identification method and device

Similar Documents

Publication Publication Date Title
CN115273063A (en) Method and device for determining object information, electronic equipment and storage medium
JP2022553174A (en) Video retrieval method, device, terminal, and storage medium
JP2018173940A (en) Automatic object tracking in video feed using machine learning
CN108228792B (en) Picture retrieval method, electronic device and storage medium
US9275398B1 (en) Obtaining metrics for client-side display of content
US20210200996A1 (en) Action recognition methods and apparatuses, electronic devices, and storage media
CN108961316B (en) Image processing method and device and server
CN110619308A (en) Aisle sundry detection method, device, system and equipment
CN114115681A (en) Page generation method and device, electronic equipment and medium
CN109873980B (en) Video monitoring method and device and terminal equipment
CN111160410A (en) Object detection method and device
CN113657518B (en) Training method, target image detection method, device, electronic device, and medium
US20150206056A1 (en) Inference of anomalous behavior of members of cohorts and associate actors related to the anomalous behavior based on divergent movement from the cohort context centroid
CN107038053B (en) Statistical method and device for loading webpage pictures and mobile terminal
CN113643260A (en) Method, apparatus, device, medium and product for detecting image quality
CN113625924A (en) Element selection method, equipment, storage medium and device based on mobile operation
CN108446693B (en) Marking method, system, equipment and storage medium of target to be identified
CN111753614A (en) Commodity shelf monitoring method and device
CN112559838A (en) Unmanned cabin operation condition analysis method and device
CN111428536A (en) Training method and device for detection network for detecting article category and position
CN110796062B (en) Method and device for precisely matching and displaying object frame and storage device
CN112084444B (en) Page loading time detection method, device and computer readable storage medium
US20180130498A1 (en) Video indexing method and device using the same
CN115134677A (en) Video cover selection method and device, electronic equipment and computer storage medium
CN114740975A (en) Target content acquisition method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination