CN114549597A - Method, device, storage medium and program product for matching target object in image


Info

Publication number: CN114549597A
Application number: CN202111595885.0A
Authority: CN (China)
Language: Chinese (zh)
Inventor: 刘佳 (Liu Jia)
Applicant/Assignee: Beijing Kuangshi Technology Co., Ltd.; Beijing Megvii Technology Co., Ltd.
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T 7/337 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 17/20 Finite element generation, e.g. wire-frame surface description, tesselation

Abstract

Embodiments of the invention provide a method for matching a target object in an image, an electronic device, a storage medium and a computer program product. The method comprises the following steps: detecting target objects in different images respectively; dividing the same region in the different images into grid regions respectively, based on the same division rule; counting, for each image, the grid regions occupied by the detected target object; and determining whether the target objects in the different images are the same object according to the grid regions occupied by the target objects in their respective images. The scheme makes low demands on the computing power of the hardware and reduces the memory it occupies. Moreover, because the features of the target objects do not need to be compared, no deep learning algorithm is involved and none of the errors it may introduce arise, which safeguards the accuracy of the matching result.

Description

Method, device, storage medium and program product for matching target object in image
Technical Field
The present invention relates to the field of target object matching, and more particularly, to a method for matching a target object in an image, an electronic device, a storage medium, and a computer program product.
Background
Target object matching may also be referred to as target object re-identification. Its objective is to perform similarity matching on objects of interest in different images, thereby determining whether multiple objects of interest in different images are the same object.
At present, the main technical solution for target matching extracts the features of each object of interest in the different images and then compares the extracted features with each other. When the comparison result is greater than a preset threshold, the objects of interest in the different images may be considered the same object.
However, the above technical solution still has some problems. On the one hand, feature extraction for the targets is based on a deep learning algorithm, so the requirement on the computing power of the hardware is high. On the other hand, owing to the limited precision of the trained model, the deep learning algorithm may not accurately identify whether the targets of interest are the same target based on the extracted features; that is, the accuracy of the matching result cannot be guaranteed.
Disclosure of Invention
The present invention has been made in view of the above problems. According to an aspect of the present invention, there is provided a method for matching a target object in an image, comprising: detecting target objects in different images respectively, wherein the different images are images acquired from different angles of the same target scene area; performing a division operation on the same region in the different images respectively, wherein the same region in the different images is the region in which the target scene area is imaged in each of the different images, the same region comprises the detected target objects, and the division operation divides the same region in the different images into a plurality of grid regions based on the same division rule; counting, for each of the different images, the grid regions occupied by the detected target object; and determining whether the target objects in the different images are the same object according to the grid regions occupied by the target objects in the different images in their respective images.
Illustratively, separately counting the grid areas occupied by the detected target objects for different images includes: respectively executing marking operation aiming at different images, wherein the marking operation is used for marking all grid areas in the different images so as to enable the grid areas at corresponding positions in the different images to have the same index numbers; and for the detected target object in different images, counting the index number of the grid area occupied by the target object. Determining whether the target objects in the different images are the same object according to the grid areas occupied by the target objects in the different images in the respective images comprises: and determining whether the target objects in the different images are the same object or not according to the index numbers of the grid areas occupied by the target objects in the different images in the respective images.
Illustratively, determining whether the target objects in the different images are the same object according to the index numbers of the grid areas occupied by the target objects in the different images in the respective images includes: determining the number of the same index numbers in the index numbers of the grid areas occupied by the target objects in different images in the respective images; and determining whether the target objects in different images are the same object or not according to the number of the same index numbers.
Illustratively, determining whether the target objects in different images are the same object according to the number of identical index numbers includes: determining the image in which the total number of grid regions occupied by the target object is the largest among the different images as the main image; determining that the target objects in the different images are the same object when the ratio of the number of identical index numbers to the total number of grid regions occupied by the target object in the main image is greater than a preset proportion threshold; and otherwise determining that the target objects in the different images are not the same object.
Illustratively, determining whether the target objects in different images are the same object according to the number of identical index numbers includes: determining that the target objects in the different images are the same object when the number of identical index numbers is greater than a preset number threshold; and otherwise determining that the target objects in the different images are not the same object.
Illustratively, each grid region is rectangular, and the detected target object is represented by a bounding box. For the detected target object in different images, counting the index number of the grid area occupied by the target object, including: for each vertex in the four vertexes of the bounding box of the detected target object, traversing all grid areas in the image where the target object is located, and determining whether the vertex of the bounding box is located in any grid area according to the vertex coordinates of the rectangle of the grid area; for the condition that the vertex of the bounding box is positioned in any grid area, recording the grid area in which the vertex is positioned; and determining the index number of the mesh area occupied by the target object according to the mesh area where the four vertexes of the bounding box are located.
Illustratively, the method further comprises: and respectively detecting the attributes of the target objects in the different images by using an attribute detection algorithm. Determining whether the target object in the different images is the same object comprises: and determining whether the target objects in the different images are the same object or not according to the grid areas occupied by the target objects in the different images in the respective images and the attributes of the target objects in the different images.
Illustratively, the dividing operation performed on the same region in the different images respectively includes: the same regions in different images are equally divided into grid regions, respectively. The number of the transverse equally-divided grids of the same area in different images is the same, and the number of the longitudinal equally-divided grids of the same area in different images is also the same.
Illustratively, equally dividing the same region in the different images into grid regions respectively comprises: determining the area S_obj occupied by the detected target object in one of the different images; and subdividing the same region in the different images into grid regions respectively according to the occupied area S_obj, wherein the area of each grid region is k times the occupied area, k being any value less than 1/3.
According to another aspect of the present invention, there is also provided an electronic device including a processor and a memory. Wherein the memory has stored therein computer program instructions for executing the method of matching a target object in an image as described above when executed by the processor.
According to still another aspect of the present invention, there is also provided a storage medium. On the storage medium program instructions are stored which, when run, are adapted to perform the method of matching a target object in an image as described above.
According to yet another aspect of the invention, there is also provided a computer program product comprising a computer program. The computer program is used when running to perform the method of matching a target object in an image as described above.
The scheme fully utilizes the characteristic that the imaging positions of the same object in different images are approximately the same, and the target object matching is carried out based on the subareas (namely grid areas) of the same area in different images occupied by the target object. According to the technical scheme, the target object can be matched without extracting and comparing the characteristics of the target object. The scheme has low requirement on the computing capacity of hardware and can reduce the occupation of hardware memory. And because the characteristics of the target object do not need to be compared, the deep learning algorithm and errors possibly brought by the deep learning algorithm are not involved, and the accuracy of the target object matching result is ensured.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in more detail embodiments of the present invention with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 shows a schematic block diagram of an example electronic device for implementing a method and apparatus for matching a target object in an image according to an embodiment of the present invention;
FIG. 2 shows a schematic flow diagram of a method of matching a target object in an image according to one embodiment of the invention;
FIG. 3 shows a schematic diagram of two different images according to one embodiment of the invention;
FIG. 4 shows a schematic flow diagram of a method of equally dividing the same region in different images into mesh regions according to one embodiment of the invention;
FIG. 5 shows a schematic flow diagram of a method of counting the grid areas occupied by detected target objects according to one embodiment of the invention;
FIG. 6 shows a schematic flow diagram of a method of counting index numbers of a grid area occupied by a target object according to one embodiment of the present invention;
FIG. 7 shows a schematic flow diagram of a method of determining whether target objects in different images are the same object according to one embodiment of the invention;
FIG. 8 shows a schematic view of different images according to another embodiment of the invention;
FIG. 9 shows a schematic block diagram of a matching apparatus for a target object in an image according to an embodiment of the present invention; and
FIG. 10 shows a schematic block diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In recent years, technical research based on artificial intelligence, such as computer vision, deep learning, machine learning, image processing and image recognition, has developed rapidly. Artificial Intelligence (AI) is an emerging science and technology that studies and develops theories, methods, techniques and application systems for simulating and extending human intelligence. Artificial intelligence is a comprehensive discipline involving many technical fields, such as chips, big data, cloud computing, the Internet of Things, distributed storage, deep learning, machine learning and neural networks. Computer vision, an important branch of artificial intelligence, enables machines to recognize the world; computer vision technology generally includes face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, target detection, pedestrian recognition, image processing, image recognition, image semantic understanding, image retrieval, character recognition, video processing, video content recognition, behavior recognition, three-dimensional reconstruction, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), computational photography, robot navigation and positioning, and the like. With the research and development of artificial intelligence technology, the technology has been applied in many fields, such as security, city management, traffic management, building management, park management, face-based access, face-based attendance, logistics management, warehouse management, robots, intelligent marketing, computational photography, mobile-phone imaging, cloud services, smart homes, wearable devices, unmanned driving, automatic driving, intelligent medical treatment, face payment, face unlocking, fingerprint unlocking, identity verification, smart screens, smart televisions, cameras, the mobile Internet, networks, beauty, makeup, medical cosmetology, intelligent temperature measurement, and the like.
In order to make the objects, technical solutions and advantages of the present invention more apparent, exemplary embodiments according to the present invention will be described in detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of embodiments of the invention and not all embodiments of the invention, with the understanding that the invention is not limited to the example embodiments described herein. All other embodiments, which can be derived by a person skilled in the art from the embodiments described in the present application without inventive step, shall fall within the scope of protection of the present application.
First, an example electronic device 100 for implementing a matching method and apparatus of a target object in an image according to an embodiment of the present invention is described with reference to fig. 1.
As shown in fig. 1, electronic device 100 includes one or more processors 102, one or more memory devices 104. Optionally, the electronic device 100 may further include an input device 106, an output device 108, and an image acquisition device 110. These components are interconnected by a bus system 112 and/or other form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 1 are exemplary only, and not limiting, and the electronic device may also have some of the components shown in fig. 1, or may have other components and structures as desired.
The processor 102 may be implemented in hardware using at least one of a microprocessor, a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), and a Programmable Logic Array (PLA). Processor 102 may also be one or a combination of Central Processing Units (CPUs), Graphics Processors (GPUs), Application Specific Integrated Circuits (ASICs), or other forms of processing units having data processing capabilities and/or instruction execution capabilities, and may control other components in electronic device 100 to perform desired functions.
Storage 104 may include one or more computer program products. The computer program product may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory can include, for example, Random Access Memory (RAM), cache memory (or the like). The non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on a computer-readable storage medium and executed by processor 102 to implement client-side functionality (implemented by the processor) and/or other desired functionality in embodiments of the invention described below. Various applications and various data, such as various data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions or images, and may include one or more of a keyboard, a mouse, a microphone, a touch screen, an image capture device, and the like.
The output device 108 may output various information (e.g., images and/or sounds) to an external (e.g., user), and may include one or more of a display, a speaker, and the like. Alternatively, the input device 106 and the output device 108 may be integrated together, implemented using the same interactive device (e.g., a touch screen).
Image capture device 110 may capture images (including still images and video frames) and store the captured images in storage device 104 for use by other components. The image capture device 110 may be a separate camera, a camera in a mobile terminal, or an image sensor in a snap-in camera. It should be understood that the image capture device 110 is merely an example, and the electronic device 100 may not include the image capture device 110. In this case, an image may be captured by other image capturing devices and the captured image may be transmitted to the electronic apparatus 100.
Exemplary electronic devices for implementing the method and apparatus for matching a target object in an image according to embodiments of the present invention may be implemented on devices such as a personal computer or a remote server.
A matching method of a target object in an image according to an embodiment of the present invention will be described below with reference to fig. 2. Fig. 2 shows a schematic flow diagram of a method 200 of matching a target object in an image according to an embodiment of the invention. As shown in fig. 2, the method 200 includes the following steps.
Step S210, detecting target objects in different images respectively, where the different images are images captured from different angles for the same target scene area.
Illustratively, the target scene area may be a target area for which the user wants to image-capture. The target objects appearing in the target scene area may appear in the different images simultaneously. The different images may come from different image capturing devices 110 such as a camera, which may be captured original images, or may be images obtained after preprocessing the original images. The preprocessing operation may include all operations for clearer target object detection. For example, the preprocessing operation may include a denoising operation such as filtering.
In one particular embodiment, the different images may come from videos shot by different webcams from different perspectives of the same target scene area. The webcams can be installed at different positions of the same place, with the lens of each webcam aimed at the same target scene area from a different angle. The different images thus obtained are captured of the same target scene area and differ only in viewing angle. On this basis, the accuracy of the subsequent target object matching can be ensured. Optionally, multiple webcams can capture video of the same target scene area simultaneously.
For the different images, whether still images or video frames, detection can be performed respectively to obtain the target objects therein. A target object may be a pedestrian, a vehicle, or the like. The target object detection algorithm is not specifically limited; any existing or future algorithm capable of detecting target objects falls within the scope of the present application, such as various one-stage or two-stage target object detection algorithms, including but not limited to R-CNN, Fast R-CNN, SSD, etc., as well as image semantic segmentation algorithms, image instance segmentation algorithms, and the like. Thereby, the position of the target object of interest in each of the different images is determined, for example the position in the different images of a pedestrian for whom target object matching is desired.
In the following, two images are taken as an example of the different images. These two images are referred to as a first image and a second image, respectively. FIG. 3 shows a schematic diagram of two different images according to one embodiment of the invention, in which the left image is the first image and the right image is the second image. When the two images are detected separately, the target object in each image is determined, as shown by the black rectangular boxes in fig. 3. The black rectangular frame in the first image is the bounding box of the first target object, and the black rectangular frame in the second image is the bounding box of the second target object. For convenience of description, the target object detected in the first image is referred to as the first target object, and the target object detected in the second image as the second target object.
In step S220, the same regions in different images are divided. Wherein the same region in the different images is a region in which the target scene region is imaged in the different images, respectively, and the same region includes the detected target object, the dividing operation being for dividing the same region in the different images into a plurality of mesh regions based on the same dividing rule.
Alternatively, the same region in different images may be determined by acquiring a marker of the same region in different images. In one embodiment, the user may frame out the marked region in the image by entering instructions or parameters, etc. using the input device 106. For example, a region is framed in the first image as a first marked region. A second marked area identical to the first marked area is framed in the second image. The term "identical" may refer to the first marker region and the second marker region both being imaged for the same target scene region. The first marker region is the region of the target scene region imaged in the first image and the second marker region is the region of the target scene region imaged in the second image. Alternatively, the label information for the same area in different images may also be pre-stored on the memory. The marking information may include, among other things, the position coordinates of the vertices of the area to be marked on the image on which they are located. For example, different images are video frames of video taken by different webcams from different perspectives for a unit doorway area. The same area is a partial area of the entire doorway area. Wherein the marked areas, such as the first marked area and the second marked area, each comprise a detected target object, such as a pedestrian, a vehicle, a building, an animal, etc., in the respective image.
It will be appreciated that the location and size of the marked regions in different images may be different in the respective images, as different images may be obtained for the same region imaged at different viewing angles.
Alternatively, the same region in different images can also be determined by identifying the imaged content in different images. The identification means includes, but is not limited to, identifying the imaging content of the image based on a neural network or based on wavelet moments, etc. It is understood that the same region in different images refers to a region having the same imaged content.
For convenience of description, the following description will be given taking an example in which the same region is determined based on a mark input by a user.
Referring again to FIG. 3, user labels for the different images shown in FIG. 3 are received. In fig. 3, the light area marked with numerals is the area marked by the user, and the dark gray area without numerals is the area other than the marked area, and the content thereof is not displayed in order to simplify and highlight the marked area. As shown in fig. 3, the marked areas in the two images differ slightly in position and size in the images.
Step S220 is for dividing the same area in different images into smaller areas. The partitioning rule may be used to partition a region into a certain number of mesh regions in a certain proportion.
Illustratively, the dividing operation may include horizontal division and vertical division. After the first marked region and the second marked region are obtained as described above, the same horizontal and vertical divisions may be performed on both based on the same division rule to obtain a plurality of divided grid regions respectively. Referring again to the embodiment shown in fig. 3, if the first marked region is equally divided into 8 parts horizontally and 5 parts vertically, the second marked region is also equally divided into 8 parts horizontally and 5 parts vertically based on the same division rule. As described above, since the first marked region and the second marked region correspond to the same target scene region, the plurality of grid regions obtained after performing the same division operation also correspond to substantially the same target scene regions.
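For illustration only, the following minimal Python sketch (not part of the original disclosure; the Rect type and function names are hypothetical) shows one way such an equal division into grid regions could be implemented:

    from dataclasses import dataclass

    @dataclass
    class Rect:
        x: float  # left edge
        y: float  # top edge
        w: float  # width
        h: float  # height

    def divide_region(region: Rect, cols: int = 8, rows: int = 5) -> list[Rect]:
        """Equally divide a marked region into cols x rows grid cells.
        Cells are listed top to bottom within each column, columns left to
        right, matching the numbering order visible in fig. 3."""
        cell_w = region.w / cols
        cell_h = region.h / rows
        return [Rect(region.x + c * cell_w, region.y + r * cell_h, cell_w, cell_h)
                for c in range(cols) for r in range(rows)]

Applying the same division rule (the same cols and rows) to the marked regions of both images yields grid regions that correspond to substantially the same portions of the target scene region.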
Step S230, for different images, respectively counting the grid areas occupied by the detected target objects.
In step S210, the position of the detected target object in the image is determined. In step S220, each of the small mesh regions into which the different images are divided is obtained. In this step S230, the mesh region occupied by the detected target object is determined according to the position of the detected target object in the different image in the image and the mesh division of the image.
Still taking fig. 3 as an example, for convenience of description, the grid regions into which the first image and the second image in fig. 3 are divided are marked with natural numbers starting from 1, thereby obtaining the index numbers of the grid regions. As shown in the left diagram of fig. 3, the grid regions occupied by the first target object are the 7th, 8th, 12th and 13th grid regions. As shown in the right diagram of fig. 3, the grid regions occupied by the second target object are also the 7th, 8th, 12th and 13th grid regions.
Step S240, determining whether the target objects in different images are the same object according to the grid areas occupied by the target objects in different images in the respective images.
Illustratively, the grid regions occupied by the detected target objects in their respective images are compared. When the comparison result indicates that the similarity between the two is greater than or equal to a preset position similarity threshold, for example 80%, it may be determined that the first target object and the second target object, which come from different images, are the same object; otherwise, they are not the same object.
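As a sketch only, step S240 could be implemented as follows; the intersection-over-union style similarity is one possible choice, since the patent does not fix a particular similarity measure:

    def is_same_object(grids_a: set[int], grids_b: set[int],
                       similarity_threshold: float = 0.8) -> bool:
        """Compare the sets of grid-region index numbers occupied by two
        detections and decide whether they are the same object."""
        if not grids_a or not grids_b:
            return False
        similarity = len(grids_a & grids_b) / len(grids_a | grids_b)
        return similarity >= similarity_threshold

For the fig. 3 example, is_same_object({7, 8, 12, 13}, {7, 8, 12, 13}) returns True, since the similarity is 1.0.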
The scheme fully utilizes the characteristic that the imaging positions of the same object in different images are approximately the same, and the target matching is carried out based on the subareas (namely grid areas) of the same area in different images occupied by the target object. According to the technical scheme, the target object can be matched without extracting and comparing the characteristics of the target object. The scheme has low requirement on the computing capacity of hardware and can reduce the occupation of hardware memory. And because the characteristics of the target object do not need to be compared, the deep learning algorithm and errors possibly brought by the deep learning algorithm are not involved, and the accuracy of the target object matching result is ensured.
It is to be understood that the above description of the method 200 for matching a target object in an image according to an embodiment of the present application is only for illustration and not for limitation. Although the above step S210 is described prior to the step S220, it may be performed after the step S220 or simultaneously with the step S220.
Exemplarily, step S220 may include: the same region in different images is equally divided into grid regions, respectively. The number of the transverse equally-divided grids of the same area in different images is the same, and the number of the longitudinal equally-divided grids of the same area in different images is also the same. Referring again to fig. 3, fig. 3 shows a schematic diagram of the same region in different images being equally divided laterally and vertically, respectively, to obtain grid regions, according to one embodiment of the invention. As mentioned above, the left image in fig. 3 is the first image, and the right image is the second image, wherein the same region in different images is the first mark region and the second mark region respectively. Based on the length and width of the first and second mark regions, respectively, 8 equal divisions are made in the lateral direction thereof and 5 equal divisions are made in the longitudinal direction thereof, respectively, and thus divided mesh regions as shown in fig. 3 are obtained.
The same areas in different images are respectively equally divided, so that the influence on the statistical result of the grid area occupied by the target object due to the fact that the area of a certain grid area is too large or too small can be effectively avoided, the accuracy of the target object matching result is further guaranteed, and unnecessary interference is reduced. In addition, the halving operation is simple in calculation and easy to realize, and the requirement on the calculation capacity of the system is reduced.
Alternatively, the same region in different images may also be divided unequally. For example, the region near the target object may be divided with a smaller granularity, and the region far from the target object may be divided with a larger granularity, considering the position of the target object. Therefore, the calculation amount can be reduced on the basis of ensuring the accuracy of the target object matching.
Fig. 4 shows a schematic flow diagram of a method of equally dividing the same region in different images into mesh regions according to one embodiment of the invention. As shown in fig. 4, step S220 may include step S221 and step S222.
Step S221, determining the area S_obj occupied by the detected target object in one of the different images.
For example, the occupied area S_obj of the target object in the image may be determined according to the area of the bounding box of the target object. It can be understood that this occupied area S_obj may be the area occupied by the first target object in the first image, or the area occupied by the second target object in the second image.
Step S222, subdividing the same region in the different images into grid regions respectively according to the occupied area S_obj determined in step S221, wherein the area of each grid region is k times the occupied area S_obj, k being any value less than 1/3. In other words, in the image, the detected target object occupies approximately 1/k grid regions.
It will be appreciated that the finer the same region in the different images is divided, the more grid regions the target object occupies. Meanwhile, increasing the number of grid regions into which the same region in the different images is divided makes the target object matching more accurate, but also brings a larger amount of calculation. Conversely, the smaller the number of grid regions, the smaller the amount of calculation, but the larger the matching error may be. Therefore, the area of each grid region is defined here with the above factors considered together. In this embodiment, the area of each grid region may be k times the area S_obj occupied by the detected target object, where k is any value less than 1/3; for example, k may be equal to 1/4. On this basis, the same region in the different images can be subdivided respectively to obtain a plurality of grid regions. It will be appreciated that the subdivision operation may be determined on the basis of one image, and the same subdivision then performed on the other image. For example, the granularity of the grid regions in the first image is determined based on the area of the first target object detected in the first image; the first image and the second image are then each subdivided based on the same division rule to obtain a plurality of grid regions of that granularity.
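A minimal sketch of this subdivision rule, reusing the hypothetical Rect type from the earlier sketch; the near-square-cell assumption is ours, as the patent does not fix a cell aspect ratio:

    import math

    def grid_shape(region: Rect, obj_area: float, k: float = 0.25) -> tuple[int, int]:
        """Choose (cols, rows) so that each grid cell covers roughly
        k * obj_area, with k < 1/3 per this embodiment."""
        target_cell_area = k * obj_area
        n_cells = region.w * region.h / target_cell_area
        cols = max(1, round(math.sqrt(n_cells)))   # assume near-square cells
        rows = max(1, math.ceil(n_cells / cols))
        return cols, rows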
Therefore, the same regions in different images can be reasonably divided, the accuracy of target object matching is guaranteed, the calculated amount is small, and the requirement on hardware is low.
Alternatively, the aliquoting operation may be set based on experience rather than the occupied area of the target object.
Fig. 5 shows a schematic flow diagram of a method of counting the grid areas occupied by detected target objects according to one embodiment of the invention. As shown in fig. 5, step S230 may include step S231, step S232, and step S233.
Step S231, respectively executing a marking operation for the different images, where the marking operation marks all the grid regions in the different images, so that the grid regions at the corresponding positions in the different images have the same index numbers.
Taking the first marked region as an example, the marking operation is performed for all the grid regions therein. Referring again to fig. 3, the marking operation may number all grid regions with natural numbers starting from 1, the number being the index number of each grid region. It will be appreciated that each number represents a respective grid region. Illustratively, as shown in the left diagram of fig. 3, the first marked region is divided into 8 columns, which from left to right are the first column, the second column, and so on up to the eighth column. All grid regions are numbered from top to bottom starting with the first column, and each subsequent column is numbered in turn until the eighth column is numbered. The second marked region is numbered with the same marking operation, so that the grid regions at corresponding positions in the first marked region and the second marked region have the same index numbers. For example, the index number of the grid region at the second column, second row of the first marked region is 7; by numbering the second marked region with the same marking operation, the index number of the grid region at the second column, second row of the second marked region is also 7. Alternatively, the marking operation may be performed in other manners, such as changing the starting point, the marking order or the marking manner. The purpose of the marking operation is that grid regions at corresponding positions in different images have the same index numbers, while grid regions at different positions in the same image have different index numbers; any manner that achieves this is acceptable.
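A sketch of this column-major numbering (the helper name is hypothetical):

    def index_number(col: int, row: int, rows: int = 5) -> int:
        """1-based index of the cell at 0-based (col, row), numbered top to
        bottom within each column and columns left to right. For fig. 3
        (5 rows per column), the cell at the second column, second row
        gets index 1 * 5 + 1 + 1 = 7."""
        return col * rows + row + 1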
Step S232, for the detected target object in different images, counting the index number of the grid area occupied by the target object.
In this step S232, it is determined which grid regions are occupied by the detected target object. In one example, it is determined which grid regions have overlapping pixels with the detected target object; if there are overlapping pixels between the two, the target object may be considered to occupy that grid region. In an alternative example, it is determined for which grid regions all pixels belong to the detected target object; the target object is considered to occupy a grid region only if all pixels of the grid region belong to it, i.e. the detected target object completely covers the grid region. For the detected target object, the index numbers of the grid regions it occupies are counted. As previously described, in the embodiment shown in fig. 3, the grid regions occupied by the first target object have index numbers 7, 8, 12 and 13. The index numbers of the grid regions occupied by the second target object are also 7, 8, 12 and 13.
Preferably, after step S232, an index number sequence may also be generated according to the counted index numbers. The index numbers counted in step S232 may constitute an index number sequence.
Illustratively, the counted index numbers may be arranged from small to large to generate a sequence of index numbers. Referring again to fig. 3, for the left diagram, the index number sequence generated by arranging the counted index numbers from small to large is (7, 8, 12, 13). Alternatively, the counted index numbers may be arranged from large to small; still taking fig. 3 as an example, the resulting index number sequence is (13, 12, 8, 7). In the same way, the index number sequence (7, 8, 12, 13) or (13, 12, 8, 7) of the right diagram can be obtained.
Therefore, sorting the index numbers according to a fixed rule yields an ordered index number sequence. This saves time in the subsequent matching process, simplifies the algorithm, and makes it convenient to align and match the index number sequences later.
After step S232, step S240 may be performed. Exemplarily, step S240 may include: and determining whether the target objects in the different images are the same object or not according to the index numbers of the grid areas occupied by the target objects in the different images in the respective images.
As described above, the index numbers corresponding to the first target object are 7, 8, 12 and 13, and the index numbers corresponding to the second target object are also 7, 8, 12 and 13. For example, the index numbers corresponding to the first target object and those corresponding to the second target object are aligned and compared to determine whether the proportion of identical index numbers reaches a certain ratio. The ratio can be set reasonably according to the requirements of the application scene, for example any value between 50% and 90%. If the proportion of identical index numbers exceeds this ratio, it can be determined that the first target object and the second target object are the same object; otherwise, they are not the same object. It is understood that the foregoing is only a specific example of the present application; in practice, the comparison rule for the index numbers may be set in any reasonable manner to determine whether the target objects in different images are the same object. For example, in the example of generating index number sequences described above, the target objects in different images may be determined to be the same object when the similarity of the index number sequences occupied by the target objects in the different images exceeds a certain similarity threshold, e.g. 65%; otherwise, they are not the same object.
In the above technical solution, the target object is determined accurately by assigning index numbers to the grid regions, which guarantees the accuracy of the determined target object; at the same time, the algorithm realizing this scheme is simple and does not increase the burden on hardware.
Fig. 6 shows a schematic flow chart of a method of counting the index numbers of the mesh areas occupied by the target object in step S232 according to an embodiment of the present invention. This embodiment is described with reference to fig. 3. As shown in fig. 3, each of the grid regions obtained by the division is rectangular. The detected target object is represented by a bounding box, i.e. the black rectangular box shown in the figure.
In this embodiment, step S232 may include step S232a, step S232b, and step S232c as shown in fig. 6.
Step S232a, for each vertex of the four vertices of the bounding box of the detected target object, traverse all the mesh areas in the image where the target object is located, and determine whether the vertex of the bounding box is located in any mesh area according to the vertex coordinates of the rectangle of the mesh area.
Still taking the left diagram in fig. 3 as an example, the 40 grid regions are traversed and the vertex coordinates of the rectangle of each grid region are recorded. For each rectangle, the coordinates of only two vertices need be recorded, e.g. the top-left vertex and the bottom-right vertex of each grid region. The coordinates of the four vertices of the bounding box are also recorded. Illustratively, assume that a rectangular coordinate system is established with the top-left vertex of the first marked region as the origin, the horizontal rightward direction as the X-axis direction, and the vertical downward direction as the Y-axis direction. For one of the four vertices of the bounding box, if both its abscissa and its ordinate are smaller than the abscissa and ordinate of the bottom-right vertex of the rectangle of a certain grid region and at the same time larger than those of the top-left vertex of that rectangle, the vertex of the bounding box lies within that grid region. It will be appreciated that a vertex of the bounding box may also fall on the boundary of a grid region. When the abscissa of the vertex equals the abscissa of one vertex of the rectangle of a certain grid region while its ordinate is less than or equal to the ordinate of the bottom-right vertex of the rectangle and greater than or equal to that of the top-left vertex, or when the ordinate of the vertex equals the ordinate of one vertex of the rectangle while its abscissa is less than or equal to the abscissa of the bottom-right vertex and greater than or equal to that of the top-left vertex, the vertex of the bounding box lies on the boundary of the rectangle of that grid region; it can then also be considered to be within the grid region. The same method can be used to determine whether the remaining three vertices of the bounding box are within a certain grid region.
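As a sketch, and using the boundary convention just described (a vertex on the boundary counts as inside), the per-vertex test could read as follows, reusing the hypothetical Rect type:

    def vertex_in_cell(px: float, py: float, cell: Rect) -> bool:
        """True if the point (px, py) lies inside the rectangular grid cell
        or on its boundary, given the cell's top-left corner and size."""
        return (cell.x <= px <= cell.x + cell.w and
                cell.y <= py <= cell.y + cell.h)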
In step S232b, in the case where the vertex of the bounding box is located in any mesh region, the mesh region where the vertex is located is recorded.
It may be determined from step S232a whether the vertices of the bounding box are within any of the grid areas. When any vertex of the bounding box is located in a certain mesh area, the mesh area corresponding to the vertex is recorded. Therefore, the mesh areas corresponding to the four vertexes of the bounding box can be respectively recorded so as to determine the mesh area occupied by the target object in the bounding box. As shown in the left diagram of fig. 3, the mesh regions in which the four vertices of the bounding box are located are the mesh regions with index numbers 7, 8, 12, and 13, respectively.
Step S232c, determining the index number of the mesh area occupied by the target object according to the mesh area where the four vertices of the bounding box are located.
As shown in the left diagram in fig. 3, the index numbers of the grid regions occupied by the target object in the bounding box can be determined to be 7, 8, 12 and 13 according to the grid regions where the four vertices of the bounding box are located. It will be appreciated that the bounding box may be much larger than a single grid region, for example when k is 1/6, 1/8, etc.; in this case, the grid regions in which adjacent vertices of the bounding box are located may not be contiguous. Step S232c may then specifically include: first, determining the boundary grid regions between the grid regions where adjacent vertices of the bounding box are located; then, based on the determined boundary grid regions, further determining the central grid regions located between them. Thus, the grid regions occupied by the target object include the grid regions in which the four vertices of the bounding box are located, the boundary grid regions and the central grid regions.
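Because the grid is regular, the vertex cells, boundary cells and central cells together form a rectangular block of cells, which a sketch can compute directly from the bounding box. This reuses the hypothetical Rect and index_number helpers from the earlier sketches; the clamping to the marked region is our assumption:

    def occupied_indexes(bbox: Rect, region: Rect,
                         cols: int = 8, rows: int = 5) -> set[int]:
        """Index numbers of all grid cells touched by the bounding box,
        including boundary and central cells between the vertex cells."""
        cell_w, cell_h = region.w / cols, region.h / rows
        c0 = max(0, int((bbox.x - region.x) // cell_w))
        c1 = min(cols - 1, int((bbox.x + bbox.w - region.x) // cell_w))
        r0 = max(0, int((bbox.y - region.y) // cell_h))
        r1 = min(rows - 1, int((bbox.y + bbox.h - region.y) // cell_h))
        return {index_number(c, r, rows)
                for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)}

For the fig. 3 example, a bounding box spanning the second and third columns and the second and third rows yields {7, 8, 12, 13}.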
In the above technical solution, the grid area is rectangular, the detected target object is also represented by a rectangular bounding box, and the grid area occupied by the target object is determined based on the rectangular bounding box. Therefore, the grid area occupied by the target object can be accurately determined, and the follow-up accurate target object matching is guaranteed; and the calculation amount is small, and the realization speed is high.
Fig. 7 shows a schematic flow chart of a method of determining whether the target objects in different images are the same object according to step S240 of an embodiment of the present invention. As shown in fig. 7, step S240 may include step S241 and step S242.
Step S241, determining the number of the same index number in the index numbers of the grid area occupied by the target object in the different images in the respective images.
Fig. 8 shows a schematic view of different images according to another embodiment of the invention. The left diagram of fig. 8 shows a third image, a third marked region therein and a detected third target object. The right diagram of fig. 8 shows a fourth image, a fourth marked region therein and a detected fourth target object. According to the foregoing steps, the index numbers generated for the detected third target object in the third image are 7, 8, 12 and 13, and the index numbers generated for the detected fourth target object in the fourth image are 7, 8, 9, 12, 13 and 14. The number of identical index numbers between the two is 4.
Step S242, determining whether the target objects in different images are the same object according to the number of the same index number.
For example, after determining that the number of the same index numbers is 4 through the step S241, a difference between the number of the index numbers corresponding to the fourth target object and the number of the index numbers corresponding to the third target object may be calculated to obtain a difference of 2. When the number of the same index numbers is larger than the difference value, the target objects in different images can be determined to be the same object. Otherwise, the two objects are not the same object. In this embodiment, the number of identical index numbers 4 > the difference value 2, and therefore, correspondingly, the third target object and the fourth target object are the same object.
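A sketch of this decision rule (the function name is hypothetical):

    def same_by_count_difference(idx_a: set[int], idx_b: set[int]) -> bool:
        """Match when the number of shared index numbers exceeds the
        difference between the two index-number counts. For fig. 8:
        4 shared > |6 - 4| = 2, so the objects match."""
        return len(idx_a & idx_b) > abs(len(idx_a) - len(idx_b))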
Therefore, whether the target objects in different images are the same object can be determined according to the number of the same index numbers. The range is narrowed for the comparison of the index numbers, and the accuracy of the determined target object is further improved.
In one embodiment, step S242 may include step S242a and step S242 b.
In step S242a, the image in which the total number of mesh areas occupied by the target object is the largest in the different images is determined as the main image.
The third image and the fourth image are still used as an example for explanation. The total number of the grid regions occupied by the third target object in the third image is 4, which is the number of the index numbers corresponding to the third target object, and similarly, the total number of the grid regions occupied by the fourth target object in the fourth image is 6. 6 > 4, and therefore, in this embodiment, the fourth image is taken as the main image.
Step S242b, determining that the target objects in different images are the same object when the ratio of the number of the same index numbers to the total number of the grid areas occupied by the target objects in the main image is greater than a preset ratio threshold; otherwise, the target objects in different images are determined not to be the same object.
In this embodiment, the number of identical index numbers divided by the total number of grid areas occupied by the target object in the main image may result in a ratio. When the ratio is greater than a preset threshold, for example, 65%, the target objects in different images may be determined to be the same object. Otherwise, it is not the same object. In this embodiment, the calculated ratio is 4/6 > 65%, and therefore it can be determined that the target objects in the third image and the fourth image in fig. 8 are the same object.
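A sketch of steps S242a and S242b (names and the set representation are ours):

    def same_by_main_image_ratio(idx_a: set[int], idx_b: set[int],
                                 ratio_threshold: float = 0.65) -> bool:
        """Take the detection occupying the most grid cells as the main
        image; match when the shared count divided by the main image's
        total exceeds the threshold. For fig. 8: 4 / 6 is about 0.67,
        which is greater than 0.65, so the objects match."""
        main_total = max(len(idx_a), len(idx_b))
        if main_total == 0:
            return False
        return len(idx_a & idx_b) / main_total > ratio_threshold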
Therefore, the method can realize object matching aiming at the image from the camera with larger visual angle difference, and improves the practicability of the method while ensuring the accuracy of the object matching.
In another embodiment, step S242 may include determining that the target objects in the different images are the same object if the number of the same index numbers is greater than a preset number threshold, and otherwise determining that the target objects in the different images are not the same object.
Taking the image shown in fig. 8 as an example again, the number of identical index numbers among the index numbers corresponding to the third target object and the fourth target object is, as stated above, 4. This number is then compared with a preset number threshold. When the number of identical index numbers, 4, is greater than the preset number threshold, the third target object and the fourth target object are determined to be the same object; otherwise, they are not the same object. The preset number threshold may be set according to the total number of grid regions occupied by the third target object and/or the fourth target object.
Therefore, whether the target objects in different images are the same object can be determined by only comparing the number of the same index numbers with the size of the preset number threshold. The algorithm of the scheme is simple and easy to realize, and the calculated amount is small.
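A sketch of this alternative rule (the threshold value is application-dependent):

    def same_by_count_threshold(idx_a: set[int], idx_b: set[int],
                                count_threshold: int) -> bool:
        """Match when the number of shared index numbers exceeds a preset
        count threshold."""
        return len(idx_a & idx_b) > count_threshold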
Illustratively, the matching method 200 for the target object in the image may further include:
step S250, using an attribute detection algorithm to detect the attributes of the target objects in different images respectively.
Taking the target object as an example of a pedestrian, the attribute may be gender, age, hair style, hair color, hair accessories, apparel, clothing color, clothing style, vehicle (such as bicycle, skateboard, motorcycle, etc.), behavioral action, height, or the like. The attribute of the pedestrian can be detected and obtained by using an attribute detection algorithm.
For example, determining whether the target objects in different images are the same object may be based on the attributes of the target objects in the different images in addition to the grid regions they occupy in their respective images. After the attributes of the target objects are obtained in the foregoing step S250, the attributes of the first target object may be compared with the attributes of the second target object; when the comparison result exceeds a preset attribute similarity threshold, for example 75%, the first target object and the second target object may be considered the same object, and otherwise they are not the same object. The attribute detection algorithm is not specifically limited in the present application; any existing or future algorithm that can detect the attributes of a target object falls within the scope of the present application.
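A sketch of the combined check, reusing the is_same_object sketch from step S240; the attribute dictionary and the equality-based similarity are assumptions for illustration, as the patent does not fix the attribute comparison method:

    def attributes_match(attrs_a: dict, attrs_b: dict,
                         attr_threshold: float = 0.75) -> bool:
        """Fraction of shared attribute keys with equal values must exceed
        the preset attribute similarity threshold."""
        keys = attrs_a.keys() & attrs_b.keys()
        if not keys:
            return False
        same = sum(1 for key in keys if attrs_a[key] == attrs_b[key])
        return same / len(keys) > attr_threshold

    def match_with_attributes(grids_a: set[int], grids_b: set[int],
                              attrs_a: dict, attrs_b: dict) -> bool:
        """Combine the grid-region check with the attribute check."""
        return is_same_object(grids_a, grids_b) and attributes_match(attrs_a, attrs_b)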
Thereby, matching conditions are added for the matching of the target object, which can further ensure the accuracy of the matching result. Meanwhile, the attributes allow the target object to be located quickly while occupying few hardware computing resources, which can speed up target object matching and improve the user experience.
According to a second aspect of the present invention, there is provided a matching apparatus for a target object in an image. Fig. 9 shows a schematic block diagram of a matching apparatus 900 for a target object in an image according to an embodiment of the present invention. As shown in fig. 9, the matching apparatus 900 includes a target object detecting module 910, a dividing module 920, a counting module 930, and a matching module 940.
The target object detection module 910 is configured to detect target objects in different images respectively. Different images are images acquired from different angles for the same target scene area. The target object detection module 910 may be implemented by the processor 102 in the electronic device 100 shown in fig. 1 executing program instructions stored in the storage 104.
The dividing module 920 is configured to perform a dividing operation on the same region in the different images. The same region in the different images is the region in which the target scene region is imaged in each of the images, and it includes the detected target objects. The dividing operation divides the same region in the different images into a plurality of grid areas based on the same dividing rule. The dividing module 920 may be implemented by the processor 102 in the electronic device 100 shown in fig. 1 executing program instructions stored in the storage 104.
The counting module 930 is configured to count, for each of the different images, the grid areas occupied by the detected target objects. The counting module 930 may be implemented by the processor 102 in the electronic device 100 shown in fig. 1 executing program instructions stored in the storage 104.
The matching module 940 is configured to determine whether the target objects in the different images are the same object according to the grid areas occupied by the target objects in the different images in the respective images. The matching module 940 may be implemented by the processor 102 in the electronic device 100 shown in fig. 1 executing program instructions stored in the storage 104.
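The four modules can be pictured as a thin pipeline. The wiring below is an assumption about how they might be composed in code; the callables stand in for detector, divider, counter, and matcher implementations that the patent leaves open.

    class MatchingApparatus:
        """Sketch of apparatus 900: detection (910), dividing (920),
        counting (930), and matching (940) modules, wired in order."""

        def __init__(self, detect, divide, count, match):
            self.detect = detect  # module 910: returns bounding boxes per image
            self.divide = divide  # module 920: same grid rule applied to each image
            self.count = count    # module 930: grid index numbers per detection
            self.match = match    # module 940: same-object decision

        def run(self, image_a, image_b):
            boxes_a, boxes_b = self.detect(image_a), self.detect(image_b)
            grid_a, grid_b = self.divide(image_a), self.divide(image_b)
            indices_a = self.count(boxes_a, grid_a)
            indices_b = self.count(boxes_b, grid_b)
            return self.match(indices_a, indices_b)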
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
According to a third aspect of the invention, an electronic device is provided. Fig. 10 shows a schematic block diagram of an electronic device 1000 according to an embodiment of the invention. As shown in fig. 10, the electronic device 1000 includes a processor 1010 and a memory 1020.
The memory 1020 stores computer program instructions for implementing corresponding steps in a method of matching a target object in an image according to an embodiment of the present invention.
The processor 1010 is configured to execute the computer program instructions stored in the memory 1020 to perform the corresponding steps of the method for matching a target object in an image according to an embodiment of the present invention, and to implement the target object detecting module 910, the dividing module 920, the counting module 930, and the matching module 940 in the matching apparatus for a target object in an image according to an embodiment of the present invention.
According to a fourth aspect of the present invention, there is also provided a storage medium. Program instructions are stored on the storage medium; when executed by a computer or by the processor 1010, they perform the corresponding steps of the method for matching a target object in an image according to an embodiment of the present invention and implement the corresponding modules in the matching apparatus for a target object in an image according to an embodiment of the present invention. The storage medium may include, for example, a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, or any combination of the above storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media.
According to a fifth aspect of the present invention, there is also provided a computer program product comprising a computer program. When run, the computer program performs the method of matching a target object in an image according to an embodiment of the present invention.
Specific implementations and advantageous effects of the apparatus for matching a target object in an image, an electronic device, a storage medium, and a computer program product described above can be understood by those skilled in the art by reading the above description of the method for matching a target object in an image with reference to the accompanying drawings, and therefore, for brevity, no further description is provided herein.
Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the foregoing illustrative embodiments are merely exemplary and are not intended to limit the scope of the invention thereto. Various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present invention. All such changes and modifications are intended to be included within the scope of the present invention as set forth in the appended claims.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another device, or some features may be omitted, or not executed.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
It will be understood by those skilled in the art that all of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where such features are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some of the modules in the matching apparatus for target objects in images according to embodiments of the present invention. The present invention may also be embodied as apparatus programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
The above description is only of specific embodiments of the present invention; the protection scope of the present invention is not limited thereto. Any changes or substitutions that a person skilled in the art could readily conceive of within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. The protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (12)

1. A method of matching a target object in an image, comprising:
respectively detecting target objects in different images, wherein the different images are images acquired from different angles of the same target scene area;
respectively performing a dividing operation on the same regions in the different images, wherein the same regions in the different images are the regions in which the target scene region is imaged in the different images and include the detected target objects, and the dividing operation is used for dividing the same regions in the different images into a plurality of grid areas based on the same dividing rule;
respectively counting, for the different images, the grid areas occupied by the detected target objects; and
determining whether the target objects in the different images are the same object according to the grid areas occupied by the target objects in the different images in the respective images.
2. The method of claim 1, wherein,
the separately counting, for the different images, grid areas occupied by the detected target objects comprises:
respectively executing a marking operation for the different images, wherein the marking operation is used for marking all grid areas in the different images so that the grid areas at corresponding positions in the different images have the same index numbers; and
counting, for the detected target objects in the different images, the index numbers of the grid areas occupied by the target objects;
the determining whether the target objects in the different images are the same object according to the grid areas occupied by the target objects in the different images in the respective images comprises:
determining whether the target objects in the different images are the same object according to the index numbers of the grid areas occupied by the target objects in the different images in the respective images.
3. The method of claim 2, wherein the determining whether the target objects in the different images are the same object according to the index numbers of the grid areas occupied by the target objects in the different images in the respective images comprises:
determining the number of the same index numbers in the index numbers of the grid areas occupied by the target objects in the different images in the respective images; and
determining whether the target objects in the different images are the same object according to the number of the same index numbers.
4. The method of claim 3, wherein the determining whether the target objects in the different images are the same object according to the number of the same index numbers comprises:
determining an image with the largest total number of grid areas occupied by the target object in the different images as a main image;
determining that the target objects in the different images are the same object in a case where the ratio of the number of the same index numbers to the total number of grid areas occupied by the target object in the main image is greater than a preset ratio threshold; otherwise, determining that the target objects in the different images are not the same object.
5. The method of claim 3, wherein the determining whether the target objects in the different images are the same object according to the number of the same index numbers comprises:
determining that the target objects in the different images are the same object in a case where the number of the same index numbers is greater than a preset number threshold; otherwise, determining that the target objects in the different images are not the same object.
6. The method of claim 2, wherein each grid area is rectangular, the detected target object is represented by a bounding box,
the counting, for the detected target objects in the different images, the index numbers of the grid areas occupied by the target objects comprises:
for each of the four vertices of the bounding box of the detected target object, traversing all grid areas in the image in which the target object is located, and determining whether the vertex of the bounding box is located in any grid area according to the vertex coordinates of the rectangle of that grid area;
in a case where the vertex of the bounding box is located in a grid area, recording the grid area in which the vertex is located; and
determining the index numbers of the grid areas occupied by the target object according to the grid areas in which the four vertices of the bounding box are located.
7. The method of any one of claims 1 to 6,
wherein the method further comprises:
respectively detecting the attributes of the target objects in the different images by using an attribute detection algorithm;
the determining whether the target objects in the different images are the same object comprises:
determining whether the target objects in the different images are the same object according to the grid areas occupied by the target objects in the different images in the respective images and the attributes of the target objects in the different images.
8. The method of any one of claims 1 to 6, wherein the dividing the same region in the different images respectively comprises:
equally dividing the same region in the different images into grid areas respectively, wherein the number of horizontal equally-divided grids of the same region is the same across the different images, and the number of vertical equally-divided grids of the same region is also the same across the different images.
9. The method of claim 8, wherein said equally dividing the same region in the different images into grid regions, respectively, comprises:
determining an area S_obj occupied by the detected target object in one of the different images; and
dividing, according to the occupied area S_obj, the same region in each of the different images into the grid areas respectively, wherein the area of each grid area is k times the occupied area, and k is an arbitrary value smaller than 1/3.
10. An electronic device comprising a processor and a memory, wherein the memory has stored therein computer program instructions for executing the method of matching a target object in an image according to any one of claims 1 to 9 when executed by the processor.
11. A storage medium on which program instructions are stored which, when executed, are for performing a method of matching a target object in an image as claimed in any one of claims 1 to 9.
12. A computer program product comprising a computer program for performing, when running, a method of matching a target object in an image according to any one of claims 1 to 9.
CN202111595885.0A 2021-12-24 2021-12-24 Method, device, storage medium and program product for matching target object in image Pending CN114549597A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111595885.0A CN114549597A (en) 2021-12-24 2021-12-24 Method, device, storage medium and program product for matching target object in image

Publications (1)

Publication Number Publication Date
CN114549597A true CN114549597A (en) 2022-05-27

Family

ID=81669401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111595885.0A Pending CN114549597A (en) 2021-12-24 2021-12-24 Method, device, storage medium and program product for matching target object in image

Country Status (1)

Country Link
CN (1) CN114549597A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination