CN113762220B - Object recognition method, electronic device, and computer-readable storage medium - Google Patents

Object recognition method, electronic device, and computer-readable storage medium

Info

Publication number
CN113762220B
Authority
CN
China
Prior art keywords
image
detection result
sub
target
size
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111292368.6A
Other languages
Chinese (zh)
Other versions
CN113762220A (en)
Inventor
郭宇鹏
王晓
毛少将
雷庆庆
袁帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CRSC Communication and Information Group Co Ltd CRSCIC
Original Assignee
CRSC Communication and Information Group Co Ltd CRSCIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CRSC Communication and Information Group Co Ltd CRSCIC filed Critical CRSC Communication and Information Group Co Ltd CRSCIC
Priority to CN202111292368.6A
Publication of CN113762220A
Application granted
Publication of CN113762220B
Active legal status
Anticipated expiration legal status

Abstract

The present disclosure provides a target identification method, an electronic device, and a computer-readable storage medium. The target identification method includes: judging whether an original image meets a preset first image segmentation condition; under the condition that the original image meets the first image segmentation condition, segmenting the original image to obtain M1 1 st sub-images; for each ith sub-image, inputting the ith sub-image into a target recognition model to obtain an (i + 2) th detection result; judging whether the ith sub-image meets a preset second image segmentation condition according to the (i + 2) th detection result; under the condition that the ith sub-image is determined to meet the second image segmentation condition according to the (i + 2) th detection result, segmenting the ith sub-image to obtain Mi+1 (i + 1) th sub-images; and under the condition that the ith sub-image is determined not to meet the second image segmentation condition according to the (i + 2) th detection result, combining the detection results corresponding to all the sub-images obtained by segmentation to obtain a final detection result.

Description

Object recognition method, electronic device, and computer-readable storage medium
Technical Field
The embodiments of the present disclosure relate to the technical field of image processing, and in particular to a target identification method, an electronic device, and a computer-readable storage medium.
Background
Image target recognition has long been one of the main application scenarios of artificial intelligence, covering face recognition, pedestrian detection, mask-wearing detection, and the like, as well as intrusion detection, perimeter protection, and crowd gathering detection built on these basic recognition functions. Although current mainstream deep learning methods achieve high recognition accuracy in most experimental scenes, practical application scenes are often more complex, so the deployed effect of target recognition algorithms is frequently not ideal. For example, cameras in public places such as high-speed rail stations are generally installed at high positions, targets occupy only a small part of the image, and the environment is complex, so that even human eyes have some difficulty resolving them; this poses a serious challenge to the accuracy of target recognition algorithms.
Object recognition algorithms in the industry generally input image data of a specific size into a model for detection to obtain a final detection result. The obvious drawback of this approach is that the detection model is limited by the maximum image size it can accept: when the input image is larger than the size acceptable to the model, the input image is reduced; when the input image is smaller than that size, it is enlarged. Enlargement causes no loss of feature information in the input image, but reduction causes a relatively large loss, especially when the input image has a high resolution and a wide field of view, resulting in a relatively poor final detection effect.
Disclosure of Invention
The embodiments of the present disclosure provide a target identification method, an electronic device, and a computer-readable storage medium.
In a first aspect, an embodiment of the present disclosure provides a target identification method, including: judging whether an original image meets a preset first image segmentation condition; under the condition that the original image meets the first image segmentation condition, segmenting the original image to obtain M1 1 st sub-images, wherein M1 is an integer greater than or equal to 1; for each ith sub-image, inputting the ith sub-image into a target recognition model to obtain an (i + 2) th detection result; judging whether the ith sub-image meets a preset second image segmentation condition according to the (i + 2) th detection result; under the condition that the ith sub-image is determined to meet the second image segmentation condition according to the (i + 2) th detection result, segmenting the ith sub-image to obtain Mi+1 (i + 1) th sub-images, wherein Mi+1 is an integer greater than or equal to 1 and i takes the values 1, 2, … in sequence; continuing to perform, for each (i + 1) th sub-image, the step of inputting the (i + 1) th sub-image into the target recognition model to obtain an (i + 3) th detection result; and under the condition that the ith sub-image is determined not to meet the second image segmentation condition according to the (i + 2) th detection result, combining the detection results corresponding to all the sub-images obtained by segmentation to obtain a final detection result.
In some exemplary embodiments, after merging the detection results corresponding to all the sub-images obtained by segmentation to obtain the final detection result, the method further includes: deleting each target whose target frame size in the final detection result is smaller than or equal to a first preset threshold.
In some exemplary embodiments, the judging whether the original image meets a preset first image segmentation condition includes: under the condition that the size of the original image is larger than the standard input size of a target recognition model, inputting the original image into the target recognition model to obtain a 1 st detection result; reducing the size of the original image to 1/N1 of the original size to obtain a reduced original image, and inputting the reduced original image into the target recognition model to obtain a 2 nd detection result, wherein N1 is an integer greater than or equal to 2; and judging whether the original image meets the first image segmentation condition according to the 1 st detection result, the 2 nd detection result, and the number of small targets in the 1 st detection result.
In some exemplary embodiments, the judging whether the original image meets the first image segmentation condition according to the 1 st detection result, the 2 nd detection result, and the number of small targets in the 1 st detection result includes at least one of: determining that the original image meets the first image segmentation condition when the difference between the 1 st detection result and the 2 nd detection result is greater than or equal to a second preset threshold and the number of small targets in the 1 st detection result is greater than or equal to a third preset threshold; determining that the original image does not meet the first image segmentation condition when the difference between the 1 st detection result and the 2 nd detection result is less than or equal to a fourth preset threshold, or when the number of small targets in the 1 st detection result is less than the third preset threshold.
In some exemplary embodiments, the small target in the 1 st detection result is a target in the 1 st detection result whose difference between the size and the maximum target frame size is greater than a fifth preset threshold.
In some exemplary embodiments, the determining whether the ith sub-image satisfies a preset second image segmentation condition according to the (i + 2) th detection result includes at least one of: determining that the ith sub-image meets the second image segmentation condition when the number of small targets in the (i + 2) th detection result is greater than or equal to the third preset threshold; determining that the ith sub-image does not satisfy the second image segmentation condition if the number of small targets in the (i + 2) th detection result is less than the third preset threshold.
In some exemplary embodiments, the segmenting the original image to obtain M1 1 st sub-images includes: according to the standard input size of the target recognition model, performing sliding segmentation on the original image so that the size of each 1 st sub-image obtained by segmentation is smaller than or equal to the standard input size of the target recognition model; the step length of the sliding is the difference between the standard input size of the target recognition model and the maximum target frame size in the 1 st detection result, and the 1 st detection result is the detection result obtained by inputting the original image into the target recognition model.
In some exemplary embodiments, the segmenting the ith sub-image to obtain Mi+1 (i + 1) th sub-images includes: magnifying the ith sub-image by N2 times, wherein N2 is an integer greater than or equal to 2; and performing sliding segmentation on the enlarged ith sub-image according to the standard input size of the target recognition model, so that the size of each (i + 1) th sub-image obtained by segmentation is smaller than or equal to the standard input size of the target recognition model; wherein the step length of the sliding is the difference between the standard input size of the target recognition model and the maximum target frame size in the (i + 2) th detection result.
In a second aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor; a memory having at least one program stored thereon, which when executed by the at least one processor, causes the at least one processor to implement any of the above-described object recognition methods.
In a third aspect, the disclosed embodiments provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements any one of the above-mentioned object recognition methods.
According to the target identification method provided by the embodiment of the disclosure, the accuracy of target identification is improved by performing adaptive segmentation on the original image, so that the final detection effect is improved.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. Detailed exemplary embodiments are described by reference to the accompanying drawings, in which:
fig. 1 is a flowchart of a target identification method according to an embodiment of the present disclosure;
fig. 2 is a block diagram of a target recognition apparatus according to another embodiment of the present disclosure.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present disclosure, the object recognition method, the electronic device, and the computer-readable storage medium provided in the present disclosure are described in detail below with reference to the accompanying drawings.
Example embodiments will be described more fully hereinafter with reference to the accompanying drawings, but the described embodiments may be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
As used herein, the term "and/or" includes any and all combinations of at least one of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of at least one other feature, integer, step, operation, element, component, and/or group thereof.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Fig. 1 is a flowchart of a target identification method according to an embodiment of the present disclosure.
In a first aspect, referring to fig. 1, an embodiment of the present disclosure provides a target identification method, including:
Step 100, judging whether the original image meets a preset first image segmentation condition.
In some exemplary embodiments, the judging whether the original image meets a preset first image segmentation condition includes: under the condition that the size of the original image is larger than the standard input size of a target recognition model, inputting the original image into the target recognition model to obtain a 1 st detection result; reducing the size of the original image to 1/N1 of the original size to obtain a reduced original image, and inputting the reduced original image into the target recognition model to obtain a 2 nd detection result, wherein N1 is an integer greater than or equal to 2; and judging whether the original image meets the first image segmentation condition according to the 1 st detection result, the 2 nd detection result, and the number of small targets in the 1 st detection result.
In some exemplary embodiments, the value of N1 may be set empirically. For example, N1 may be set to 2.
In some exemplary embodiments, the size of the original image being larger than the standard input size of the target recognition model comprises at least one of: the width of the original image is larger than the standard input width of the target recognition model; the height of the original image is greater than the standard input height of the target recognition model.
In some exemplary embodiments, the standard input size of the target recognition model refers to the maximum size of the input image supported by the target recognition model, and the size of the image generally includes the width and height of the image.
The object recognition model in the embodiments of the present disclosure may be any object recognition model known to those skilled in the art, or may be a new object recognition model developed in the future, and the object recognition model is used for recognizing an object in an image.
In some exemplary embodiments, the 1 st detection result may include at least one of: the number of the detected targets in the original image, the confidence of each detected target, and the position information of the target frame in which each detected target is located in the original image.
In some exemplary embodiments, the position information of the target frame in the original image includes: coordinates of any point on the target frame in the original image, width of the target frame and height of the target frame.
In some exemplary embodiments, the 2 nd detection result may include at least one of: the number of the targets detected from the reduced original image, the confidence of each detected target, and the position information of the target frame where each detected target is located in the reduced original image.
In some exemplary embodiments, the position information of the target frame in the reduced original image includes: coordinates of any point on the target frame in the reduced original image, width of the target frame and height of the target frame.
In some exemplary embodiments, the judging whether the original image meets the first image segmentation condition according to the 1 st detection result, the 2 nd detection result, and the number of small targets in the 1 st detection result includes at least one of:
determining that the original image meets the first image segmentation condition when the difference between the 1 st detection result and the 2 nd detection result is greater than or equal to a second preset threshold and the number of small targets in the 1 st detection result is greater than or equal to a third preset threshold;
determining that the original image does not satisfy the first image segmentation condition when a difference between the 1 st detection result and the 2 nd detection result is less than or equal to a fourth preset threshold, or when the number of small objects in the 1 st detection result is less than the third preset threshold.
In some exemplary embodiments, the difference between the 1 st detection result and the 2 nd detection result is calculated in at least one of the following ways: the absolute value of the difference between the number of targets in the 1 st detection result and the number of targets in the 2 nd detection result; or, for each detected target, the absolute value of the difference between the confidence of the target in the 1 st detection result and the confidence of the target in the 2 nd detection result.
In other exemplary embodiments, the difference between the 1 st detection result and the 2 nd detection result is calculated using at least one of: the ratio of the absolute value of the difference between the number of targets in the 1 st detection result and the number of targets in the 2 nd detection result to the number of targets in the 1 st detection result; or, for each detected target, the ratio of the absolute value of the difference between the confidence of the target in the 1 st detection result and the confidence of the target in the 2 nd detection result to the confidence of the target in the 1 st detection result.
In other exemplary embodiments, the difference between the 1 st detection result and the 2 nd detection result is calculated using at least one of: the ratio of the absolute value of the difference between the number of targets in the 1 st detection result and the number of targets in the 2 nd detection result to the number of targets in the 2 nd detection result; or, for each detected target, the ratio of the absolute value of the difference between the confidence of the target in the 1 st detection result and the confidence of the target in the 2 nd detection result to the confidence of the target in the 2 nd detection result.
In some exemplary embodiments, the second preset threshold and the fourth preset threshold may be set according to actual needs, for example, the second preset threshold is set to be 30%, and the fourth preset threshold is set to be 20%.
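The patent text itself contains no code; purely as an illustrative aid, the following Python sketch shows the count-based and confidence-based difference metrics described above. The data layout (a detection result modeled as a list of per-target confidences, and pre-matched confidence pairs) is an assumption of this sketch, not part of the disclosure.

```python
# Illustrative sketch only; names and data layout are assumptions.
# A detection result is modelled as a list of per-target confidences;
# `matched_pairs` is assumed to pair the same physical target across the
# 1st and 2nd detection results (the box-matching step is not shown).

def count_difference_ratio(result_1, result_2):
    """|N1 - N2| / N1: the count-based difference, relative to the 1st
    detection result (one of the alternative forms described above)."""
    if not result_1:
        return 0.0
    return abs(len(result_1) - len(result_2)) / len(result_1)

def confidence_difference_ratios(matched_pairs):
    """|c1 - c2| / c1 for each matched target: the per-target
    confidence-based difference, relative to the 1st detection result."""
    return [abs(c1 - c2) / c1 for (c1, c2) in matched_pairs if c1 > 0]
```

With the example thresholds above, a count_difference_ratio of at least 0.30 would count toward the first image segmentation condition, while a value of at most 0.20 would rule it out.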
In some exemplary embodiments, the small target in the 1 st detection result is a target in the 1 st detection result whose difference between the size and the maximum target frame size is greater than a fifth preset threshold.
In some exemplary embodiments, the fifth preset threshold may be set according to actual needs, for example, the fifth preset threshold is set to 40%.
In some exemplary embodiments, the difference between the size of the small target in the 1 st detection result and the maximum target frame size is the difference between the maximum target frame size in the 1 st detection result and the size of the target frame in which the small target in the 1 st detection result is located.
In other exemplary embodiments, the difference between the size of the small target in the 1 st detection result and the size of the maximum target frame is a ratio of a difference between the size of the maximum target frame in the 1 st detection result and the size of the target frame in which the small target in the 1 st detection result is located to the size of the maximum target frame in the 1 st detection result.
In other exemplary embodiments, the difference between the size of the small target in the 1 st detection result and the size of the maximum target frame is a ratio of a difference between the size of the maximum target frame in the 1 st detection result and the size of the target frame in which the small target in the 1 st detection result is located to the size of the target frame in which the small target in the 1 st detection result is located.
In other exemplary embodiments, the difference between the size of the small target and the size of the maximum target frame in the 1 st detection result may be calculated in other manners, and the specific calculation manner is not used to limit the protection scope of the embodiment of the present disclosure.
In some exemplary embodiments, the maximum target frame size refers to the size of the target frame having the largest area or the size of the target frame having the longest circumference, and the size of the target frame generally includes the width and height of the target frame.
In some exemplary embodiments, the third preset threshold may be set according to actual needs, for example, the third preset threshold is set to 10.
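Continuing the sketch, the small-target count and the resulting decision might look as follows; the (x, y, w, h) box format and the area-ratio form of the size difference are assumptions, and the default thresholds simply echo the example values above (40%, 30%, 10).

```python
# Illustrative sketch only; the (x, y, w, h) box format is an assumption.

def count_small_targets(boxes, fifth_threshold=0.40):
    """A small target is one whose box size falls short of the maximum
    target frame size by a ratio above the fifth preset threshold
    (the area-ratio variant described above)."""
    if not boxes:
        return 0
    max_area = max(w * h for (_, _, w, h) in boxes)
    return sum(1 for (_, _, w, h) in boxes
               if (max_area - w * h) / max_area > fifth_threshold)

def first_segmentation_condition(result_1, result_2, boxes_1,
                                 second_threshold=0.30,
                                 third_threshold=10):
    """True when the difference between the two detection results and the
    small-target count in the 1st result both reach their thresholds."""
    diff = count_difference_ratio(result_1, result_2)  # from the sketch above
    return (diff >= second_threshold
            and count_small_targets(boxes_1) >= third_threshold)
```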
Step 101, under the condition that the original image meets the first image segmentation condition, segmenting the original image to obtain M1 1 st sub-images; wherein M1 is an integer greater than or equal to 1.
In some exemplary embodiments, the segmenting the original image to obtain M1 1 st sub-images includes: according to the standard input size of the target recognition model, performing sliding segmentation on the original image so that the size of each 1 st sub-image obtained by segmentation is smaller than or equal to the standard input size of the target recognition model; the step length of the sliding is the difference between the standard input size of the target recognition model and the maximum target frame size in the 1 st detection result, and the 1 st detection result is the detection result obtained by inputting the original image into the target recognition model.
In some exemplary embodiments, the size of the divided 1 st sub-image is smaller than or equal to the standard input size of the target recognition model, that is, the width of the divided 1 st sub-image is smaller than or equal to the standard input width of the target recognition model, and the height of the divided 1 st sub-image is smaller than or equal to the standard input height of the target recognition model.
In some exemplary embodiments, the standard input width of the target recognition model refers to a maximum width of the input image supported by the target recognition model, and the standard input height of the target recognition model refers to a maximum height of the input image supported by the target recognition model.
In some exemplary embodiments, the width-direction sliding step is a difference between a standard input width of the target recognition model and a width of a maximum target frame in the 1 st detection result, and the height-direction sliding step is a difference between a standard input height of the target recognition model and a height of the maximum target frame in the 1 st detection result.
In some exemplary embodiments, the purpose of the sliding segmentation of the original image is to prevent missing detection of the target.
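As an illustration of this sliding segmentation, the sketch below computes the window origins. The function name and the handling of the right and bottom edges are assumptions of the sketch; the step length (standard input size minus maximum target frame size) follows the text.

```python
# Illustrative sketch only. Overlapping windows whose step equals the
# standard input size minus the maximum target frame size guarantee that a
# target cut by one window boundary lies wholly inside a neighbouring window.

def sliding_windows(image_w, image_h, std_w, std_h, max_box_w, max_box_h):
    """Return the top-left corners of the sub-image windows."""
    step_x = max(std_w - max_box_w, 1)
    step_y = max(std_h - max_box_h, 1)
    xs = list(range(0, max(image_w - std_w, 0) + 1, step_x))
    ys = list(range(0, max(image_h - std_h, 0) + 1, step_y))
    # Add a final window so the right and bottom edges are always covered.
    if xs[-1] + std_w < image_w:
        xs.append(image_w - std_w)
    if ys[-1] + std_h < image_h:
        ys.append(image_h - std_h)
    return [(x, y) for y in ys for x in xs]
```

For a 1920x1080 image, a 640x640 standard input size, and a 100x120 maximum target frame, the step is 540 horizontally and 520 vertically, and the last column and row of windows are shifted so they end exactly at the image border.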
Step 102, for each ith sub-image, inputting the ith sub-image into a target recognition model to obtain an (i + 2) th detection result; judging whether the ith sub-image meets a preset second image segmentation condition according to the (i + 2) th detection result; under the condition that the ith sub-image is determined to meet the second image segmentation condition according to the (i + 2) th detection result, segmenting the ith sub-image to obtain Mi+1 (i + 1) th sub-images; wherein Mi+1 is an integer greater than or equal to 1, and i takes the values 1, 2, … in sequence; and continuing to perform, for each (i + 1) th sub-image, the step of inputting the (i + 1) th sub-image into the target recognition model to obtain an (i + 3) th detection result.
In some exemplary embodiments, inputting the ith sub-image into the target recognition model to obtain the (i + 2) th detection result may be implemented in a parallel or serial manner. For example, when the parallel manner is adopted, two target recognition models can be set up for parallel processing, with different target recognition models processing different ith sub-images; the implementation code of each target recognition model is exactly the same. A minimal sketch of such parallel dispatch is given below.
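This sketch is illustrative only: `model.detect` is a placeholder for the target recognition model's inference call, not a real API, and a thread pool stands in for whatever parallel mechanism (processes, multiple GPU streams, batched inference) an implementation would actually use.

```python
# Illustrative sketch only; `model.detect` is a placeholder, not a real API.
from concurrent.futures import ThreadPoolExecutor

def detect_sub_images(model, sub_images, workers=2):
    """Run the same detection code on each ith sub-image; results come back
    in input order, so parallel and serial execution give the same output."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(model.detect, sub_images))
```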
In some exemplary embodiments, the (i + 2) th detection result may include at least one of: the number of the targets detected from the ith sub-image, the confidence level of each detected target, and the position information of the target frame where each detected target is located in the ith sub-image.
In some exemplary embodiments, the position information of the target frame in the ith sub-image includes: coordinates of any point on the target frame in the ith sub-image, width of the target frame and height of the target frame.
In some exemplary embodiments, the determining whether the ith sub-image satisfies a preset second image segmentation condition according to the (i + 2) th detection result includes at least one of:
determining that the ith sub-image meets the second image segmentation condition when the number of small targets in the (i + 2) th detection result is greater than or equal to the third preset threshold;
determining that the ith sub-image does not satisfy the second image segmentation condition if the number of small targets in the (i + 2) th detection result is less than the third preset threshold.
In some exemplary embodiments, the small target in the (i + 2) th detection result is a target in the (i + 2) th detection result whose difference between the size and the maximum target frame size is greater than a fifth preset threshold.
In some exemplary embodiments, the difference between the size of the small target in the (i + 2) th detection result and the maximum target frame size is the difference between the maximum target frame size in the (i + 2) th detection result and the size of the target frame in which the small target in the (i + 2) th detection result is located.
In other exemplary embodiments, the difference between the size of the small target in the (i + 2) th detection result and the maximum target frame size is a ratio of a difference between the maximum target frame size in the (i + 2) th detection result and the size of the target frame in which the small target in the (i + 2) th detection result is located to the maximum target frame size in the (i + 2) th detection result.
In other exemplary embodiments, the difference between the size of the small target in the (i + 2) th detection result and the maximum target frame size is a ratio of a difference between the maximum target frame size in the (i + 2) th detection result and the size of the target frame in which the small target in the (i + 2) th detection result is located to the size of the target frame in which the small target in the (i + 2) th detection result is located.
In other exemplary embodiments, the difference between the size of the small target and the size of the maximum target frame in the (i + 2) th detection result may be calculated in other manners, and the specific calculation manner is not used to limit the protection scope of the embodiment of the present disclosure.
In some exemplary embodiments, the segmenting the ith sub-image to obtain Mi+1 (i + 1) th sub-images includes: magnifying the ith sub-image by N2 times, wherein N2 is an integer greater than or equal to 2; and performing sliding segmentation on the enlarged ith sub-image according to the standard input size of the target recognition model, so that the size of each (i + 1) th sub-image obtained by segmentation is smaller than or equal to the standard input size of the target recognition model; wherein the step length of the sliding is the difference between the standard input size of the target recognition model and the maximum target frame size in the (i + 2) th detection result.
In some exemplary embodiments, the specific value of N2 may be set empirically, e.g., N2 may be set to 2.
In some exemplary embodiments, the size of the (i + 1) th segmented sub-image being smaller than or equal to the standard input size of the target recognition model means that the width of the (i + 1) th segmented sub-image being smaller than or equal to the standard input width of the target recognition model and the height of the (i + 1) th segmented sub-image being smaller than or equal to the standard input height of the target recognition model.
In some exemplary embodiments, the width-direction sliding step is a difference between a standard input width of the target recognition model and a width of a maximum target frame in the (i + 2) th detection result, and the height-direction sliding step is a difference between a standard input height of the target recognition model and a height of a maximum target frame in the (i + 2) th detection result.
In some exemplary embodiments, the purpose of the sliding segmentation of the enlarged ith sub-image is to prevent missing detection of the target.
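Combining the enlargement and the sliding segmentation, one recursive step might be sketched as follows; OpenCV's resize and NumPy-style image arrays are assumed dependencies, and `sliding_windows` refers to the earlier sketch.

```python
# Illustrative sketch only; assumes OpenCV (cv2) and NumPy image arrays,
# and reuses sliding_windows() from the earlier sketch.
import cv2

def split_sub_image(sub_image, std_w, std_h, max_box_w, max_box_h, n2=2):
    """Enlarge the ith sub-image by N2, then slide-cut it into
    (i + 1)th sub-images no larger than the standard input size."""
    h, w = sub_image.shape[:2]
    enlarged = cv2.resize(sub_image, (w * n2, h * n2),
                          interpolation=cv2.INTER_LINEAR)
    corners = sliding_windows(w * n2, h * n2, std_w, std_h,
                              max_box_w, max_box_h)
    return [enlarged[y:y + std_h, x:x + std_w] for (x, y) in corners]
```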
Step 103, under the condition that the ith sub-image is determined not to meet the second image segmentation condition according to the (i + 2) th detection result, combining the detection results corresponding to all the sub-images obtained by segmentation to obtain a final detection result.
In some exemplary embodiments, all the sub-images obtained by segmentation refer to all the ith sub-images obtained by segmentation, where i takes the values 1, 2, 3, … in sequence. For example, when i = 1, all the sub-images obtained by segmentation include: the M1 1 st sub-images obtained by segmentation; when i = 2, all the sub-images obtained by segmentation include: the M1 1 st sub-images and the M2 2 nd sub-images obtained by segmentation; when i = 3, all the sub-images obtained by segmentation include: the M1 1 st sub-images, the M2 2 nd sub-images, and the M3 3 rd sub-images obtained by segmentation; and so on.
In some exemplary embodiments, merging the detection results corresponding to all the sub-images obtained by segmentation to obtain a final detection result includes: calculating the position information of each target in the original image according to the position information of the target frame where the target is located in the detection results corresponding to all the sub-images obtained by segmentation, the sliding step length adopted in each segmentation, and the N2 value adopted in each segmentation; and combining the position information of all the targets in the original image by adopting Intersection over Union (IoU) and Non-Maximum Suppression (NMS) to obtain the final detection result.
In some exemplary embodiments, IoU is a concept used in target detection: the overlap ratio between a generated candidate frame and the original labeled frame, i.e., the ratio of their intersection to their union. In the ideal case the two frames overlap completely and the ratio is 1.
In some exemplary embodiments, NMS, as its name implies, suppresses elements that are not local maxima; in target detection it keeps the target detection boxes with high confidence and suppresses false detection boxes with low confidence.
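As an illustration of the merging step, the sketch below gives a plain IoU computation and a greedy NMS over boxes already mapped back to original-image coordinates; the (x, y, w, h) box format and the 0.5 IoU threshold are assumptions of the sketch.

```python
# Illustrative sketch only; boxes are (x, y, w, h) in original-image
# coordinates (window offsets and accumulated N2 scaling already undone).

def iou(a, b):
    """Intersection over Union of two boxes; 1.0 means complete overlap."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def non_maximum_suppression(detections, iou_threshold=0.5):
    """Keep high-confidence boxes; suppress lower-confidence boxes that
    overlap a kept box by more than the IoU threshold.
    detections: list of (box, confidence) tuples."""
    kept = []
    for box, conf in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) < iou_threshold for kept_box, _ in kept):
            kept.append((box, conf))
    return kept
```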
In some exemplary embodiments, after merging the detection results corresponding to all the sub-images obtained by segmentation to obtain the final detection result, the method further includes: deleting each target whose target frame size in the final detection result is smaller than or equal to a first preset threshold.
According to the target identification method provided by the embodiment of the disclosure, the accuracy of target identification is improved by performing adaptive segmentation on the original image, so that the final detection effect is improved.
In a second aspect, another embodiment of the present disclosure provides an electronic device, including: at least one processor; a memory having at least one program stored thereon, the at least one program, when executed by the at least one processor, causing the at least one processor to implement the object recognition method of any one of the above.
The processor is a device with data processing capability, including but not limited to a central processing unit (CPU); the memory is a device with data storage capability, including but not limited to random-access memory (RAM, more specifically SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH).
In some embodiments, the processor and the memory, as well as the other components of the computing device, are connected to one another by a bus.
In a third aspect, another embodiment of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement any one of the above-mentioned object recognition methods.
Fig. 2 is a block diagram of a target recognition apparatus according to another embodiment of the present disclosure.
In a fourth aspect, another embodiment of the present disclosure provides a target recognition apparatus, including: a judging module 201, configured to judge whether an original image meets a preset first image segmentation condition; a segmentation module 202, configured to segment the original image to obtain M1 1 st sub-images when the original image meets the first image segmentation condition, wherein M1 is an integer greater than or equal to 1; the judging module 201 is further configured to: for each ith sub-image, input the ith sub-image into a target recognition model to obtain an (i + 2) th detection result, and judge whether the ith sub-image meets a preset second image segmentation condition according to the (i + 2) th detection result; the segmentation module 202 is further configured to: segment the ith sub-image to obtain Mi+1 (i + 1) th sub-images when it is determined according to the (i + 2) th detection result that the ith sub-image meets the second image segmentation condition, wherein Mi+1 is an integer greater than or equal to 1 and i takes the values 1, 2, … in sequence, and continue to perform, for each (i + 1) th sub-image, the step of inputting the (i + 1) th sub-image into the target recognition model to obtain an (i + 3) th detection result; and a merging module 203, configured to merge the detection results corresponding to all the sub-images obtained by segmentation to obtain a final detection result when it is determined according to the (i + 2) th detection result that the ith sub-image does not meet the second image segmentation condition.
In some exemplary embodiments, the merge module 203 is further configured to: and deleting the target of which the size of the target frame in the final detection result is smaller than or equal to a first preset threshold value.
In some exemplary embodiments, the judging module 201 is specifically configured to judge whether the original image meets a preset first image segmentation condition by: under the condition that the size of the original image is larger than the standard input size of a target recognition model, inputting the original image into the target recognition model to obtain a 1 st detection result; reducing the size of the original image to 1/N1 of the original size to obtain a reduced original image, and inputting the reduced original image into the target recognition model to obtain a 2 nd detection result, wherein N1 is an integer greater than or equal to 2; and judging whether the original image meets the first image segmentation condition according to the 1 st detection result, the 2 nd detection result, and the number of small targets in the 1 st detection result.
In some exemplary embodiments, the judging module 201 is specifically configured to judge whether the original image meets the first image segmentation condition according to the 1 st detection result, the 2 nd detection result, and the number of small targets in the 1 st detection result by at least one of the following manners: determining that the original image meets the first image segmentation condition when the difference between the 1 st detection result and the 2 nd detection result is greater than or equal to a second preset threshold and the number of small targets in the 1 st detection result is greater than or equal to a third preset threshold; determining that the original image does not meet the first image segmentation condition when the difference between the 1 st detection result and the 2 nd detection result is less than or equal to a fourth preset threshold, or when the number of small targets in the 1 st detection result is less than the third preset threshold.
In some exemplary embodiments, the small target in the 1 st detection result is a target in the 1 st detection result whose difference between the size and the maximum target frame size is greater than a fifth preset threshold.
In some exemplary embodiments, the determining module 201 is specifically configured to determine whether the ith sub-image meets a preset second image segmentation condition according to the (i + 2) th detection result by at least one of the following manners: determining that the ith sub-image meets the second image segmentation condition when the number of small targets in the (i + 2) th detection result is greater than or equal to the third preset threshold; determining that the ith sub-image does not satisfy the second image segmentation condition if the number of small targets in the (i + 2) th detection result is less than the third preset threshold.
In some exemplary embodiments, the segmentation module 202 is specifically configured to segment the original image to obtain the M1 1 st sub-images in the following manner: according to the standard input size of the target recognition model, performing sliding segmentation on the original image so that the size of each 1 st sub-image obtained by segmentation is smaller than or equal to the standard input size of the target recognition model; the step length of the sliding is the difference between the standard input size of the target recognition model and the maximum target frame size in the 1 st detection result, and the 1 st detection result is the detection result obtained by inputting the original image into the target recognition model.
In some exemplary embodiments, the segmentation module 202 is specifically configured to segment the ith sub-image to obtain the Mi+1 (i + 1) th sub-images in the following manner: magnifying the ith sub-image by N2 times, wherein N2 is an integer greater than or equal to 2; and performing sliding segmentation on the enlarged ith sub-image according to the standard input size of the target recognition model, so that the size of each (i + 1) th sub-image obtained by segmentation is smaller than or equal to the standard input size of the target recognition model; wherein the step length of the sliding is the difference between the standard input size of the target recognition model and the maximum target frame size in the (i + 2) th detection result.
The specific implementation process of the target identification device is the same as that of the target identification method in the foregoing embodiment, and is not described here again.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, features, characteristics and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics and/or elements described in connection with other embodiments, unless expressly stated otherwise, as would be apparent to one skilled in the art. Accordingly, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the disclosure as set forth in the appended claims.

Claims (8)

1. An object recognition method, comprising:
judging whether the original image meets a preset first image segmentation condition or not;
under the condition that the original image meets the first image segmentation condition, segmenting the original image to obtain M1 1 st sub-images; wherein M1 is an integer greater than or equal to 1;
for each ith sub-image, inputting the ith sub-image into a target recognition model to obtain an (i + 2) th detection result; judging whether the ith sub-image meets a preset second image segmentation condition according to the (i + 2) th detection result; under the condition that the ith sub-image is determined to meet the second image segmentation condition according to the (i + 2) th detection result, segmenting the ith sub-image to obtain Mi+1 (i + 1) th sub-images; wherein Mi+1 is an integer greater than or equal to 1; i takes the values 1, 2, … in sequence; and continuing to perform, for each (i + 1) th sub-image, the step of inputting the (i + 1) th sub-image into the target recognition model to obtain an (i + 3) th detection result;
under the condition that the ith sub-image is determined not to meet the second image segmentation condition according to the (i + 2) th detection result, combining the detection results corresponding to all the sub-images obtained by segmentation to obtain a final detection result;
wherein, the judging whether the original image meets a preset first image segmentation condition comprises:
under the condition that the size of the original image is larger than the standard input size of a target recognition model, inputting the original image into the target recognition model to obtain a 1 st detection result;
reducing the size of the original image to 1/N1 of the original size to obtain a reduced original image, and inputting the reduced original image into the target recognition model to obtain a 2 nd detection result; wherein N1 is an integer greater than or equal to 2;
judging whether the original image meets the first image segmentation condition according to the 1 st detection result, the 2 nd detection result, and the number of small targets in the 1 st detection result;
wherein the judging whether the ith sub-image meets a preset second image segmentation condition according to the (i + 2) th detection result comprises at least one of the following steps:
determining that the ith sub-image meets the second image segmentation condition when the number of small targets in the (i + 2) th detection result is greater than or equal to a third preset threshold;
determining that the ith sub-image does not satisfy the second image segmentation condition if the number of small targets in the (i + 2) th detection result is less than the third preset threshold.
2. The object recognition method according to claim 1, wherein after the detection results corresponding to all the divided sub-images are combined to obtain the final detection result, the method further comprises:
and deleting the target of which the size of the target frame in the final detection result is smaller than or equal to a first preset threshold value.
3. The target recognition method of claim 1, wherein the judging whether the original image meets the first image segmentation condition according to the 1 st detection result, the 2 nd detection result, and the number of small targets in the 1 st detection result comprises at least one of:
determining that the original image meets the first image segmentation condition when the difference between the 1 st detection result and the 2 nd detection result is greater than or equal to a second preset threshold and the number of small targets in the 1 st detection result is greater than or equal to a third preset threshold;
determining that the original image does not satisfy the first image segmentation condition when a difference between the 1 st detection result and the 2 nd detection result is less than or equal to a fourth preset threshold, or when the number of small objects in the 1 st detection result is less than the third preset threshold.
4. The target identification method according to claim 1, wherein the small target in the 1 st detection result is a target in the 1 st detection result whose difference between the size and the maximum target frame size is greater than a fifth preset threshold.
5. The object recognition method according to any one of claims 1-2, wherein the segmenting the original image to obtain M1 1 st sub-images comprises:
according to the standard input size of the target recognition model, performing sliding segmentation on the original image to enable the size of the 1 st sub-image obtained by segmentation to be smaller than or equal to the standard input size of the target recognition model; the step length of the sliding is the difference between the standard input size of the target recognition model and the maximum target frame size in the 1 st detection result, and the 1 st detection result is the detection result obtained by inputting the original image into the target recognition model.
6. The object recognition method of any one of claims 1-2, wherein the segmenting the ith sub-image to obtain Mi+1 (i + 1) th sub-images comprises:
magnifying the ith sub-image by N2 times; wherein N2 is an integer greater than or equal to 2;
performing sliding segmentation on the enlarged ith sub-image according to the standard input size of the target recognition model, so that the size of the (i + 1) th sub-image obtained by segmentation is smaller than or equal to the standard input size of the target recognition model; wherein the step size of the sliding is the difference between the standard input size of the target recognition model and the maximum target frame size in the (i + 2) th detection result.
7. An electronic device, comprising:
at least one processor;
memory having stored thereon at least one program which, when executed by the at least one processor, causes the at least one processor to carry out the object recognition method of any one of claims 1-6.
8. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the object recognition method of any one of claims 1 to 6.
CN202111292368.6A 2021-11-03 2021-11-03 Object recognition method, electronic device, and computer-readable storage medium Active CN113762220B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111292368.6A CN113762220B (en) 2021-11-03 2021-11-03 Object recognition method, electronic device, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111292368.6A CN113762220B (en) 2021-11-03 2021-11-03 Object recognition method, electronic device, and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN113762220A CN113762220A (en) 2021-12-07
CN113762220B true CN113762220B (en) 2022-03-15

Family

ID=78784569

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111292368.6A Active CN113762220B (en) 2021-11-03 2021-11-03 Object recognition method, electronic device, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN113762220B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114663418A (en) * 2022-04-06 2022-06-24 京东安联财产保险有限公司 Image processing method and device, storage medium and electronic equipment
CN115937169A (en) * 2022-12-23 2023-04-07 广东创新科技职业学院 Shrimp fry counting method and system based on high resolution and target detection

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017071160A1 (en) * 2015-10-28 2017-05-04 深圳大学 Sea-land segmentation method and system for large-size remote-sensing image
WO2018076138A1 (en) * 2016-10-24 2018-05-03 深圳大学 Target detection method and apparatus based on large-scale high-resolution hyper-spectral image

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112802027A (en) * 2019-11-13 2021-05-14 成都天府新区光启未来技术研究院 Target object analysis method, storage medium and electronic device
CN112991349B (en) * 2019-12-17 2023-12-26 阿里巴巴集团控股有限公司 Image processing method, device, equipment and storage medium
CN111191730B (en) * 2020-01-02 2023-05-12 中国航空工业集团公司西安航空计算技术研究所 Method and system for detecting oversized image target oriented to embedded deep learning
CN111401463A (en) * 2020-03-25 2020-07-10 维沃移动通信有限公司 Method for outputting detection result, electronic device, and medium
CN111223115B (en) * 2020-04-22 2020-07-14 杭州涂鸦信息技术有限公司 Image segmentation method, device, equipment and medium
CN111598091A (en) * 2020-05-20 2020-08-28 北京字节跳动网络技术有限公司 Image recognition method and device, electronic equipment and computer readable storage medium
CN112348835B (en) * 2020-11-30 2024-04-16 广联达科技股份有限公司 Material quantity detection method and device, electronic equipment and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017071160A1 (en) * 2015-10-28 2017-05-04 深圳大学 Sea-land segmentation method and system for large-size remote-sensing image
WO2018076138A1 (en) * 2016-10-24 2018-05-03 深圳大学 Target detection method and apparatus based on large-scale high-resolution hyper-spectral image

Also Published As

Publication number Publication date
CN113762220A (en) 2021-12-07

Similar Documents

Publication Publication Date Title
CN109325954B (en) Image segmentation method and device and electronic equipment
US11423634B2 (en) Object detection model training method, apparatus, and device
JP5775225B2 (en) Text detection using multi-layer connected components with histograms
CN113762220B (en) Object recognition method, electronic device, and computer-readable storage medium
US8995714B2 (en) Information creation device for estimating object position and information creation method and program for estimating object position
US20190156499A1 (en) Detection of humans in images using depth information
EP3203417A2 (en) Method for detecting texts included in an image and apparatus using the same
US9152856B2 (en) Pedestrian detection system and method
KR102119897B1 (en) Skin condition detection method and electronic device
US9300828B1 (en) Image segmentation
CN111259878A (en) Method and equipment for detecting text
US20180047271A1 (en) Fire detection method, fire detection apparatus and electronic equipment
CN113221768A (en) Recognition model training method, recognition method, device, equipment and storage medium
US20220058422A1 (en) Character recognition method and terminal device
CN113420682A (en) Target detection method and device in vehicle-road cooperation and road side equipment
CN110647818A (en) Identification method and device for shielding target object
EP3726421A2 (en) Recognition method and apparatus for false detection of an abandoned object and image processing device
CN109712134B (en) Iris image quality evaluation method and device and electronic equipment
CN113378857A (en) Target detection method and device, electronic equipment and storage medium
CN110969640A (en) Video image segmentation method, terminal device and computer-readable storage medium
CN112001336A (en) Pedestrian boundary crossing alarm method, device, equipment and system
CN113762027B (en) Abnormal behavior identification method, device, equipment and storage medium
CN111598013A (en) Nut-pin state identification method and related device
CN113312949A (en) Video data processing method, video data processing device and electronic equipment
CN112784638B (en) Training sample acquisition method and device, pedestrian detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant