CN112734747A - Target detection method and device, electronic equipment and storage medium

Target detection method and device, electronic equipment and storage medium

Info

Publication number
CN112734747A
CN112734747A
Authority
CN
China
Prior art keywords
target
object detection
target object
image
template image
Prior art date
Legal status
Granted
Application number
CN202110081738.5A
Other languages
Chinese (zh)
Other versions
CN112734747B (en)
Inventor
周红花 (Zhou Honghua)
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110081738.5A
Priority claimed from CN202110081738.5A
Publication of CN112734747A
Application granted
Publication of CN112734747B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a target detection method and apparatus, an electronic device, and a storage medium. The method includes: determining at least one target object to be detected in an image to be detected, and acquiring a target template image of each target object; determining the image complexity of each target template image based on the pixel value differences between the pixel points in that target template image; determining a detection threshold corresponding to each target object according to the image complexity; identifying the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object; for each target object, determining primary object detection areas of the target object from the candidate object detection areas according to the similarity between the candidate object detection areas and the target template image and the detection threshold; and determining a target object detection area of the target object from the primary object detection areas. The method and the device can adapt to detection scenes with various types of targets and improve the accuracy of target detection.

Description

Target detection method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a target detection method and apparatus, an electronic device, and a storage medium.
Background
With the development of computer technology, artificial intelligence has become more and more widely applied, and target detection by means of machine learning, relying on artificial intelligence, has become a mainstream research direction. The task of target detection is to find objects of interest in an image and determine their category and location, e.g. detecting faces, vehicles or buildings in the image.
In the related art, target detection is generally performed with a deep neural network model. However, such a model requires a large amount of labeled data: the coordinates of target boxes must be annotated on image samples to train the model, so the method consumes considerable human and material resources.
Alternatively, a template matching method can be adopted: by comparing images window by window, the similarity between the target template image and the crop at each sliding-window position of the image to be detected is calculated, and the windows whose similarity is higher than a specified threshold are taken as detection results.
Disclosure of Invention
The embodiment of the application provides a target detection method, a target detection device, electronic equipment and a storage medium, which can adapt to detection scenes with various types of targets and improve the accuracy of target detection.
The embodiment of the application provides a target detection method, which comprises the following steps:
determining at least one target object to be detected in an image to be detected, and acquiring a target template image corresponding to each target object;
determining the image complexity of each target template image based on the pixel value difference between pixel points in each target template image;
determining a detection threshold corresponding to each target object according to the image complexity of each target template image;
according to the target template image, identifying the image to be detected to obtain a plurality of candidate object detection areas of each target object;
for each target object, determining at least one primary object detection area of the target object from the candidate object detection areas according to the similarity between the candidate object detection areas and the target template image and a detection threshold corresponding to the target object;
and determining at least one target object detection area of the target object from the primary object detection areas of the target object to obtain at least one target object detection area of each target object.
Correspondingly, an embodiment of the present application provides an object detection apparatus, including:
a determining unit, configured to determine at least one target object to be detected in an image to be detected and to acquire a target template image corresponding to each target object;
the complexity determining unit is used for determining the image complexity of each target template image based on the pixel value difference between the pixel points in each target template image;
the threshold value determining unit is used for determining a detection threshold value corresponding to each target object according to the image complexity of each target template image;
the identification unit is used for identifying the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object;
a primary selection determining unit, configured to determine, for each target object, at least one primary object detection area of the target object from the candidate object detection areas according to a similarity between the candidate object detection areas and the target template image and a detection threshold corresponding to the target object;
and a target determining unit, configured to determine at least one target object detection area of the target object from the primary object detection areas of the target object, to obtain at least one target object detection area of each target object.
Optionally, in some embodiments of the present application, the identifying unit may include a scaling subunit and an identifying subunit, as follows:
the scaling subunit is configured to scale, in different scales, the target template image corresponding to each target object to obtain target template images in multiple scales corresponding to each target object;
and the identification subunit is used for identifying the image to be detected based on the target template images under multiple scales to obtain multiple candidate object detection areas of each target object.
Optionally, in some embodiments of the present application, the target determination unit may include a dividing subunit, a calculating subunit, and a fifth determination subunit, as follows:
the dividing subunit is configured to, for each target object, perform grid division on the primary object detection area and the target template image respectively, to obtain a plurality of sub-object detection grid areas of the primary object detection area and a plurality of sub-template grid areas of the target template image;
a calculating subunit, configured to calculate a sub-region similarity between a target sub-object detection grid area of the primary object detection area and a target sub-template grid area of the target template image, where the position of the target sub-object detection grid area corresponds to the position of the target sub-template grid area;
and the fifth determining subunit is configured to determine, based on the sub-region similarity, at least one target object detection area of the target object from the primary object detection areas of the target object, to obtain at least one target object detection area of each target object.
Optionally, in some embodiments of the present application, the calculating subunit may be specifically configured to calculate a first pixel mean value of a target sub-object detection grid area of the primary object detection area under each color channel; calculate a second pixel mean value of a target sub-template grid area of the target template image under each color channel; and calculate the sub-region similarity between the target sub-object detection grid area of the primary object detection area and the target sub-template grid area of the target template image based on the first pixel mean value and the second pixel mean value.
Optionally, in some embodiments of the present application, the complexity determining unit may include a first determining subunit and a second determining subunit, as follows:
the first determining subunit is configured to determine at least two types of difference parameters of each target template image based on a pixel value difference between pixels in each target template image;
and the second determining subunit is used for determining the image complexity of each target template image based on the difference parameters.
Optionally, in some embodiments of the present application, the difference parameter includes a lateral difference parameter and a longitudinal difference parameter; the second determining subunit may be specifically configured to fuse the lateral difference parameter and the longitudinal difference parameter to obtain an image complexity of each target template image.
Optionally, in some embodiments of the application, the threshold determining unit may be specifically configured to determine the detection threshold corresponding to each target object based on the image complexity of each target template image and a preset mapping relationship set, where the preset mapping relationship set includes a mapping relationship between the preset image complexity and a preset detection threshold.
Optionally, in some embodiments of the present application, the preset mapping relationship set includes a first sub-mapping relationship set and a second sub-mapping relationship set; the first sub-mapping relation set comprises an inverse mapping relation between a preset image complexity and a preset detection threshold value; the second sub-mapping relation set comprises a fixed mapping relation between a preset image complexity and a preset detection threshold value;
the threshold determining unit may include a third determining subunit and a fourth determining subunit, as follows:
the third determining subunit is configured to, when the image complexity of the target template image is smaller than a preset complexity, determine a detection threshold of the target object corresponding to the target template image based on the image complexity of the target template image and the first sub-mapping relationship set;
and the fourth determining subunit is configured to, when the image complexity of the target template image is not less than the preset complexity, determine a detection threshold of the target object corresponding to the target template image based on the image complexity of the target template image and the second sub-mapping relationship set.
Optionally, in some embodiments of the present application, the target determining unit may include a first selecting subunit, a second selecting subunit, and a sixth determining subunit, as follows:
the first selecting subunit is configured to, for each target object, select, as a candidate target object detection area of the target object, the primary object detection area with the highest similarity to the target template image among the primary object detection areas;
a second selecting subunit, configured to select a further candidate target object detection area corresponding to the target object from the primary object detection areas based on the distance between the candidate target object detection area and each primary object detection area, where the distance represents the degree of overlap between the candidate target object detection area and the primary object detection area;
a sixth determining subunit, configured to determine at least one target object detection area of the target object according to the candidate target object detection area.
Optionally, in some embodiments of the present application, the second selecting subunit may be specifically configured to take, as reference object detection areas, the primary object detection areas whose distance from the candidate target object detection area is greater than a preset distance threshold, and to select, from the reference object detection areas, the one with the highest similarity to the target template image as a candidate target object detection area corresponding to the target object.
The electronic device provided by the embodiment of the application comprises a processor and a memory, wherein the memory stores a plurality of instructions, and the processor loads the instructions to execute the steps in the target detection method provided by the embodiment of the application.
In addition, the embodiment of the present application also provides a storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps in the object detection method provided by the embodiment of the present application.
The embodiments of the application provide a target detection method and apparatus, an electronic device, and a storage medium, which can determine at least one target object to be detected in an image to be detected and acquire a target template image corresponding to each target object; determine the image complexity of each target template image based on the pixel value differences between pixel points in each target template image; determine a detection threshold corresponding to each target object according to the image complexity of each target template image; identify the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object; for each target object, determine at least one primary object detection area of the target object from the candidate object detection areas according to the similarity between the candidate object detection areas and the target template image and the detection threshold corresponding to the target object; and determine at least one target object detection area of the target object from the primary object detection areas of the target object, to obtain at least one target object detection area of each target object. The embodiments of the application require no large amount of training, saving manpower and material resources; and because the detection threshold of each target object is determined based on image complexity rather than being fixed, the method can adapt to detection scenes with various types of targets and improve the accuracy of target detection.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1a is a schematic view of a scene of a target detection method provided in an embodiment of the present application;
FIG. 1b is a flowchart of a target detection method provided in an embodiment of the present application;
fig. 1c is an explanatory diagram of a target detection method provided in the embodiment of the present application;
FIG. 1d is another illustration of a target detection method provided by an embodiment of the present application;
fig. 1e is another illustrative diagram of a target detection method provided in an embodiment of the present application;
FIG. 2a is another flowchart of a target detection method provided in an embodiment of the present application;
FIG. 2b is another flowchart of a target detection method provided by an embodiment of the present application;
FIG. 2c is another flowchart of a target detection method provided by an embodiment of the present application;
FIG. 3a is a schematic structural diagram of an object detection apparatus provided in an embodiment of the present application;
FIG. 3b is a schematic structural diagram of an object detection apparatus provided in the embodiment of the present application;
FIG. 3c is a schematic structural diagram of an object detection apparatus according to an embodiment of the present disclosure;
FIG. 3d is a schematic diagram of another structure of the object detection apparatus according to the embodiment of the present application;
fig. 3e is another schematic structural diagram of the object detection apparatus provided in the embodiment of the present application;
FIG. 3f is a schematic diagram of another structure of the object detection apparatus according to the embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides a target detection method, a target detection device, electronic equipment and a storage medium. The object detection device may be specifically integrated in an electronic device, and the electronic device may be a terminal or a server.
It is understood that the target detection method of the present embodiment may be executed on the terminal, may also be executed on the server, and may also be executed by both the terminal and the server. The above examples should not be construed as limiting the present application.
As shown in fig. 1a, the target detection method may be performed jointly by a terminal and a server. The target detection system provided by the embodiment of the application includes a terminal 10, a server 11, and the like; the terminal 10 and the server 11 are connected via a network, for example a wired or wireless network connection, and the object detection device may be integrated in the server.
The server 11 may be configured to: determine at least one target object to be detected in an image to be detected, and acquire a target template image corresponding to each target object; determine the image complexity of each target template image based on the pixel value differences between pixel points in each target template image; determine a detection threshold corresponding to each target object according to the image complexity of each target template image; identify the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object; for each target object, determine a primary object detection area of the target object from the candidate object detection areas according to the similarity between the candidate object detection areas and the target template image and the detection threshold corresponding to the target object; and determine a target object detection area of the target object from the primary object detection areas of the target object, to obtain a target object detection area of each target object. The server 11 may be a single server, or a server cluster or cloud server composed of a plurality of servers.
The terminal 10 may obtain an image to be detected, determine at least one target object to be detected in the image to be detected, and send the image to be detected and related information such as the target objects to be detected to the server 11, so that the server 11 identifies the image to be detected based on the target template images of the target objects to obtain the target object detection areas of the target objects in the image to be detected. The server 11 may also transmit the detection result obtained by the recognition, that is, the target object detection areas of the target objects, to the terminal 10, and the terminal 10 may receive them. The terminal 10 may include a mobile phone, a smart television, a tablet computer, a notebook computer, a personal computer (PC), or the like. A client, which may be an application client, a browser client, or the like, may also be provided on the terminal 10.
The above-mentioned step of detecting the object by the server 11 may be executed by the terminal 10.
The embodiment of the application provides a target detection method, which relates to a computer vision technology in the field of artificial intelligence. The method and the device can adapt to detection scenes with various types of targets, and improve the accuracy of target detection.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making. Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, at both the hardware level and the software level. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision (CV) is a science that studies how to make machines "see". More specifically, it uses cameras and computers, in place of human eyes, to identify, track, and measure targets, and further performs graphics processing so that the result becomes an image more suitable for human eyes to observe or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, image semantic understanding, object detection and positioning, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
The following are detailed below. It should be noted that the following description of the embodiments is not intended to limit the preferred order of the embodiments.
The present embodiment will be described from the perspective of an object detection apparatus, which may be specifically integrated in an electronic device, and the electronic device may be a server or a terminal, etc.
The target detection method of the embodiment of the application can be applied to various target detection scenes, and the type of the target object to be detected is not limited; for example, the target object may be a small icon, a character object, a large icon, a complex object, and so on. For instance, for game target detection tasks involving many types of targets across various game screens, the method provided by this embodiment determines the detection threshold of each target object based on image complexity, so the threshold is not fixed; it can adapt to detection scenes with various types of targets, thereby improving the accuracy of target detection.
As shown in fig. 1b, the specific flow of the target detection method may be as follows:
101. determining at least one target object to be detected in the image to be detected, and acquiring a target template image corresponding to each target object.
The image to be detected contains at least one target object to be detected; it is the image in which the specific positions of the target objects need to be identified. The image type of the image to be detected is not limited.
The target object can be various types of targets to be recognized, such as various types of small icons, simple targets, characters, buttons, large icons, complex targets, and the like. The target template image can be used to identify a target object in the image to be detected, which can be regarded as a standard image containing the target object. The standard image is an image corresponding to a predetermined target object.
102. And determining the image complexity of each target template image based on the pixel value difference between the pixel points in each target template image.
The image complexity characterizes how complex the texture, colors, and so on of the target template image are. The pixel value may specifically be an RGB (red, green, blue) value or a gray-scale value, which is not limited in this embodiment.
Optionally, in this embodiment, the step "determining the image complexity of each target template image based on the pixel value difference between the pixel points in each target template image" may include:
determining at least two types of difference parameters of each target template image based on the pixel value difference between pixel points in each target template image;
determining an image complexity for each target template image based on the difference parameters.
The difference parameters may include a horizontal difference parameter, a vertical difference parameter, an oblique difference parameter, and the like of the pixel points, which is not limited in this embodiment. Specifically, the difference parameters may be weighted and fused to obtain the image complexity of the target template image. Therefore, based on the difference parameters, the image complexity can be acquired more quickly and accurately.
The horizontal difference parameter can represent the differences between pixel points of each row of the target template image, from left to right or from right to left; the longitudinal difference parameter can represent the differences between pixel points of each column of the target template image, from top to bottom or from bottom to top; the oblique difference parameter can represent the differences between pixel points of the target template image along directions that are neither horizontal nor vertical.
Optionally, in this embodiment, the difference parameters include a transverse difference parameter and a longitudinal difference parameter; the step of determining the image complexity of each target template image based on the difference parameter may include:
and fusing the transverse difference parameters and the longitudinal difference parameters to obtain the image complexity of each target template image.
The transverse difference parameter may specifically be a transverse gradient difference value matrix, and the longitudinal difference parameter may specifically be a longitudinal gradient difference value matrix. Specifically, the transverse gradient difference value matrix may be obtained by performing difference operation on pixel values of pixel points in two adjacent columns of the image matrix of the target template image, and the longitudinal gradient difference value matrix may be obtained by performing difference operation on pixel values of pixel points in two adjacent rows of the image matrix of the target template image.
The fusion method of the transverse difference parameter and the longitudinal difference parameter may be various, and this embodiment does not limit this. For example, a sum of square error operation or mean square error operation may be performed on the horizontal difference parameter and the vertical difference parameter, and the operation result may be used as the image complexity of the target template image.
Specifically, suppose the target template image has N rows and M columns, so its image matrix has N rows and M columns; denote the longitudinal gradient difference matrix as A1 and the transverse gradient difference matrix as A2. A1, A2 and the image complexity are calculated as follows:
1) As shown in fig. 1c, the longitudinal gradient difference matrix may be obtained by calculating the difference between every two adjacent rows of pixel points of the image matrix and then taking the absolute value of the result. Specifically, the filter 1 may be used to perform a convolution operation on the image matrix of the target template image, whose effect is equivalent to a difference operation between adjacent rows of the image matrix, giving a matrix of N-1 rows and M columns; the absolute value (abs being the absolute-value function) is then taken for each element of the matrix, giving a non-negative matrix of N-1 rows and M columns, which is the longitudinal gradient difference matrix A1.
2) As shown in fig. 1c, the transverse gradient difference matrix may be obtained by calculating the difference between every two adjacent columns of pixel points of the image matrix and then taking the absolute value of the result. Specifically, the filter 2 may be used to perform a convolution operation on the image matrix of the target template image, whose effect is equivalent to a difference operation between adjacent columns of the image matrix, giving a matrix of N rows and M-1 columns; the absolute value is then taken for each element of the matrix, giving a non-negative matrix of N rows and M-1 columns, which is the transverse gradient difference matrix A2.
The filter 1 may be:

$$\text{filter}_1 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$$

and the filter 2 may be:

$$\text{filter}_2 = \begin{bmatrix} -1 & 1 \end{bmatrix}$$
3) After obtaining the longitudinal gradient difference matrix A1 and the transverse gradient difference matrix A2, the image complexity may be calculated based on A1 and A2. In a specific embodiment, the means of the entries of A1 and A2 may be calculated to obtain AM1 and AM2, i.e., all the values of each matrix are summed and then divided by the product of the numbers of rows and columns of that matrix. The calculation of AM1 and AM2 is shown in equations (1) and (2):

$$AM1 = \frac{1}{(N-1)\,M}\sum_{i=1}^{N-1}\sum_{j=1}^{M} A1_{i,j} \tag{1}$$

$$AM2 = \frac{1}{N\,(M-1)}\sum_{l=1}^{N}\sum_{k=1}^{M-1} A2_{l,k} \tag{2}$$

where $A1_{i,j}$ denotes the value in the i-th row and j-th column of the matrix A1, and $A2_{l,k}$ denotes the value in the l-th row and k-th column of the matrix A2. The image complexity c may then be obtained by taking the root of the sum of squares of AM1 and AM2, as shown in equation (3):

$$c = \sqrt{AM1^{2} + AM2^{2}} \tag{3}$$
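As an illustration, the computation of equations (1) to (3) can be sketched in a few lines of numpy (the function name is illustrative, not part of this application):

```python
import numpy as np

def image_complexity(gray: np.ndarray) -> float:
    """Image complexity c of an N x M pixel matrix, per equations (1)-(3)."""
    m = gray.astype(np.float64)
    a1 = np.abs(m[1:, :] - m[:-1, :])   # longitudinal differences, (N-1) x M
    a2 = np.abs(m[:, 1:] - m[:, :-1])   # transverse differences, N x (M-1)
    am1 = a1.mean()                     # equation (1)
    am2 = a2.mean()                     # equation (2)
    return float(np.sqrt(am1 ** 2 + am2 ** 2))  # equation (3)
```

A flat, single-color template yields c = 0, while richly textured templates yield large values of c.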
optionally, the pixel value of this embodiment may also be a luminance value, and the image complexity of the target template image is determined based on the luminance difference of the whole target template image. However, for a target object with a too dark picture, the complexity of the calculated image is too low, and the local complexity of the picture cannot be identified.
103. And determining a detection threshold corresponding to each target object according to the image complexity of each target template image.
Optionally, in this embodiment, the step "determining a detection threshold corresponding to each target object according to the image complexity of each target template image" may include:
determining a detection threshold corresponding to each target object based on the image complexity of each target template image and a preset mapping relation set, wherein the preset mapping relation set comprises the mapping relation between the preset image complexity and the preset detection threshold.
The preset mapping relationship set includes the mapping relationship between a preset image complexity and a preset detection threshold. It can be understood that the preset mapping relationship set may also be regarded as a function with the preset image complexity as the independent variable and the preset detection threshold as the dependent variable. In one embodiment, with the image complexity denoted c and the detection threshold denoted a, the preset mapping relationship set can be written as a = f(c).
The mapping relationship may include an inverse mapping relationship, which may be linear or non-linear; this embodiment does not limit it. An inverse mapping relationship means that when the preset image complexity changes, the preset detection threshold changes in the opposite direction: if the preset image complexity increases, the preset detection threshold decreases. Setting the mapping to be inverse adapts to more diverse detection scenes. In some detection scenes the image complexity of the objects to be detected can differ greatly, with some textures simple and some complex. In this embodiment, the preset detection threshold is larger for a target object with simple texture and smaller for one with complex texture. A target object with simple texture contains few features, so non-target objects are easily detected by mistake, and a higher detection threshold improves the accuracy of target detection; a target object with complex texture contains more features and is easily distinguished from non-target objects, so the requirement on the preset detection threshold is relatively low.
Optionally, in some embodiments, the mapping relationship between the preset image complexity and the preset detection threshold in the preset mapping relationship set may be set according to an actual situation, which is not limited in this embodiment, for example, the preset detection threshold may be obtained by performing an inverse operation or a polynomial operation on the preset image complexity.
Optionally, in this embodiment, the preset mapping relationship set includes a first sub-mapping relationship set and a second sub-mapping relationship set; the first sub-mapping relation set comprises an inverse mapping relation between a preset image complexity and a preset detection threshold value; the second sub-mapping relation set comprises a fixed mapping relation between a preset image complexity and a preset detection threshold value;
the step of determining a detection threshold corresponding to each target object based on the image complexity of each target template image and a preset mapping relationship set may include:
when the image complexity of a target template image is smaller than a preset complexity, determining a detection threshold value of a target object corresponding to the target template image based on the image complexity of the target template image and a first sub-mapping relation set;
and when the image complexity of the target template image is not less than the preset complexity, determining a detection threshold value of a target object corresponding to the target template image based on the image complexity of the target template image and the second sub-mapping relation set.
The fixed mapping relation in the second sub-mapping relation set specifically means that when the preset image complexity changes in the second sub-mapping relation set, the preset detection threshold is a fixed value and does not change with the change of the preset image complexity.
The preset complexity may be set according to an actual situation, and this embodiment does not limit this. In a specific scene, when the image complexity is not less than the preset complexity, the increase of the image complexity hardly affects the detection threshold, and a too low detection threshold may cause a non-target object to be detected, so that when the image complexity reaches a certain degree (e.g., not less than the preset complexity), the detection threshold may be set to a fixed value and is not reduced with the increase of the image complexity.
In a specific embodiment, the preset complexity may be set to 15; the preset image complexity interval of the first sub-mapping relation set is then 0 to 15, where a linear threshold method may be used and the detection threshold is inversely related to the image complexity; the preset image complexity interval of the second sub-mapping relation set is 15 to infinity, where the detection threshold may be set to a fixed value, such as 0.68. Specifically, the relationship between the preset image complexity c and the preset detection threshold a in the preset mapping relationship set may be as shown in equation (4):

$$a = \begin{cases} 0.68 + k\,(15 - c), & 0 \le c < 15 \\ 0.68, & c \ge 15 \end{cases} \tag{4}$$

where k > 0 is the slope of the linear segment, so the threshold decreases as the complexity increases and equals the fixed value 0.68 at c = 15.
Referring to fig. 1d, which is a graph of the relationship between the image complexity and the detection threshold, the horizontal axis represents the image complexity value and the vertical axis represents the detection threshold. As can be seen from the figure, the detection threshold and the image complexity are in a linear relationship when the image complexity is less than 15, and the detection threshold is a constant value when the image complexity is 15 or greater.
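A minimal sketch of this piecewise mapping follows; the breakpoint 15 and the fixed value 0.68 are from the embodiment above, while the slope k = 0.02 is an assumed illustrative value that makes the two segments join continuously:

```python
def detection_threshold(c: float, k: float = 0.02,
                        c_break: float = 15.0, a_fixed: float = 0.68) -> float:
    """Detection threshold a = f(c) of equation (4): linear and inversely
    related to the image complexity c below the preset complexity, and a
    fixed value at or above it. k is an assumed slope; only the breakpoint
    and the fixed value are given in the text."""
    if c < c_break:
        return a_fixed + k * (c_break - c)  # decreases as c grows
    return a_fixed
```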
104. And identifying the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object.
The target template image can slide over the image to be detected to obtain a plurality of candidate object detection areas for the target object corresponding to the template. Sliding the target template image over the image to be detected means traversing the image to be detected and marking a plurality of candidate object detection areas of the same size as the target template image. In some embodiments, the target template image may be scaled to obtain target template images at multiple scales; then, for the same target object, the candidate object detection areas obtained with templates at different scales also differ in scale.
Optionally, in this embodiment, the step of "identifying the image to be detected according to the target template image to obtain a plurality of candidate object detection regions of each target object" may include:
scaling the target template image corresponding to each target object under different scales to obtain target template images under multiple scales corresponding to each target object;
and identifying the image to be detected based on the target template images under multiple scales to obtain multiple candidate object detection areas of each target object.
The scaling scale may be set according to actual conditions, which is not limited in this embodiment. For example, scaling may be performed in the following scale:
0.8,0.85,0.9,0.95,1,1.05,1.1,1.15,1.2。
Both the length and the width of the target template image are scaled by each of the above ratios, and the scaled templates then slide over the image to be detected to identify it. Scaling the target template image with this multi-layer pyramid scale method copes with the case where the target object in the image to be detected and the target template image have different scales, improving the accuracy of target detection.
The multi-layer pyramid scale method scales an image by a series of ratios to obtain a sequence of images at different scales; linear interpolation or similar methods are generally adopted in the scaling process.
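With OpenCV, for example, the multi-scale template set can be built as follows (a sketch using the scale list above; resizing uses linear interpolation):

```python
import cv2

SCALES = (0.8, 0.85, 0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2)

def template_pyramid(template, scales=SCALES):
    """Scale both the width and the height of the target template image by
    each ratio, giving the target template images at multiple scales."""
    h, w = template.shape[:2]
    return [cv2.resize(template,
                       (max(1, round(w * s)), max(1, round(h * s))),
                       interpolation=cv2.INTER_LINEAR)
            for s in scales]
```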
105. And for each target object, determining at least one primary object detection area of the target object from the candidate object detection areas according to the similarity between the candidate object detection areas and the target template image and a detection threshold corresponding to the target object.
In this embodiment, the step of determining at least one preliminary object detection region of the target object from the candidate object detection region according to the similarity between the candidate object detection region and the target template image and the detection threshold corresponding to the target object may include:
calculating the similarity between the candidate object detection area and the target template image;
and taking the candidate object detection area with the similarity larger than the detection threshold value as an initial selection object detection area of the target object.
In some embodiments, the similarity between the candidate object detection area and the target template image may be calculated by image contour detection. Image contour detection first preprocesses the images (i.e. the candidate object detection area and the target template image), for example smoothing them with a two-dimensional Gaussian template to remove image noise. Edge detection is then performed on the smoothed images to obtain edge response images, which usually involve gradient features such as brightness and color that can distinguish an object from the background. Finally, the contours in the edge response images are precisely located to obtain the contour information of the candidate object detection area and of the target template image, and the similarity is calculated from these two sets of contour information.
Alternatively, the similarity between the candidate object detection area and the target template image may be calculated with a normalized correlation coefficient, as shown in equation (5):

$$R(x,y) = \frac{\sum_{x',y'} \left( T'(x',y') \cdot I'(x+x',\,y+y') \right)}{\sqrt{\sum_{x',y'} T'(x',y')^{2} \cdot \sum_{x',y'} I'(x+x',\,y+y')^{2}}} \tag{5}$$

where T'(x', y') is the value of each pixel point of the target template image relative to the template's mean pixel value, with x' and y' the coordinates of the pixel point within the template; I'(x + x', y + y') is the value of each pixel point of the candidate object detection area relative to that area's mean pixel value, with x + x' and y + y' the coordinates of the pixel point in the image to be detected; (x, y) can be regarded as the position of the reference point of the candidate object detection area (specifically, a vertex of the area) in the image to be detected; and R(x, y) is the correlation coefficient between the candidate object detection area and the target template image.
In other embodiments, feature extraction may be performed on an image of the candidate object detection region to obtain feature information of the candidate object detection region; extracting the characteristics of the target template image to obtain the characteristic information of the target template image; and calculating the feature similarity between the feature information of the candidate object detection area and the feature information of the target template image, and taking the feature similarity as the similarity between the candidate object detection area and the target template image.
The feature extraction process may include convolution and pooling operations; specifically, features may be extracted from the candidate object detection area and the target template image through a neural network. The type of the neural network is not limited; for example, it may be Inception, EfficientNet, VGGNet (Visual Geometry Group Network), ResNet (Residual Network), DenseNet (Densely Connected Convolutional Network), and so on, but it should be understood that the neural network of this embodiment is not limited to the types listed above.
The feature information of the candidate object detection area is specifically a feature vector of the candidate object detection area, and the feature information of the target template image is specifically a feature vector of the target template image; the vector distance between the two can be calculated, and the magnitude of the vector distance represents the magnitude of the feature similarity. The larger the vector distance is, the smaller the feature similarity is, and the smaller the similarity between the candidate object detection area and the target template image is; the smaller the vector distance, the greater the feature similarity, and the greater the similarity between the candidate object detection region and the target template image.
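As a sketch of this feature-based variant, the score can be derived from the vector distance, for example as a cosine similarity (the embodiment does not fix the metric; cosine is one common choice):

```python
import numpy as np

def feature_similarity(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Similarity between two feature vectors: the larger the vector
    distance, the smaller the similarity, so the normalized dot product
    (cosine similarity, one possible choice) is used as the score here."""
    denom = np.linalg.norm(feat_a) * np.linalg.norm(feat_b)
    return float(feat_a @ feat_b / denom) if denom else 0.0
```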
Optionally, for a target template image containing text, a method of OCR (Optical Character Recognition) text detection Recognition may also be used to determine the similarity between the candidate object detection area and the target template image, so as to perform matching of the content.
It can be understood that, for each target object, if the candidate object detection region is identified by a multi-scale target template image, the candidate object detection region also includes multiple scales; in performing the similarity calculation between the candidate object detection region and the target template image, the dimensions of the candidate object detection region and the target template image should be the same, that is, the candidate object detection region should perform the similarity calculation with the target template image having the same dimensions.
106. And determining at least one target object detection area of the target object from the primary object detection areas of the target object to obtain at least one target object detection area of each target object.
The primary object detection areas obtained by filtering with the detection threshold may include areas that are only locally similar to the target template image, so they need to be filtered further. In some embodiments, grid color filtering may be used. The grid color filtering method divides the target template image and the primary object detection area into several sub-grid areas, then, under each color channel, calculates the sub-region similarity between each sub-grid area of the target template image (i.e. sub-template grid area) and the sub-grid area at the corresponding position of the primary object detection area (i.e. sub-object detection grid area), and determines the target object detection area of the target object from the primary object detection areas based on the sub-region similarities. Screening the primary object detection areas by comparing position-matched sub-grid areas compares the target template image and the primary object detection area comprehensively as a whole, excluding primary object detection areas that are similar to the target template image only locally. Local similarity here means that only part of the area is similar, not the whole.
Optionally, in this embodiment, the step of "determining at least one target object detection area of the target object from the primary object detection areas of the target object to obtain at least one target object detection area of each target object" may include:
for each target object, performing grid division on the primary object detection area and the target template image respectively, to obtain a plurality of sub-object detection grid areas of the primary object detection area and a plurality of sub-template grid areas of the target template image;
calculating the sub-region similarity between a target sub-object detection grid area of the primary object detection area and a target sub-template grid area of the target template image, where the position of the target sub-object detection grid area corresponds to the position of the target sub-template grid area;
and determining at least one target object detection area of the target object from the primary object detection areas of the target object based on the sub-region similarity, to obtain at least one target object detection area of each target object.
The grid division manner may be set according to the actual situation, which is not limited in this embodiment. It should be noted, however, that the grid division of the target template image and of the primary object detection area should be consistent: the numbers of sub-object detection grid areas and sub-template grid areas obtained by division are the same, and their positions correspond to each other. In the above embodiments, "a plurality of" means two or more.
In a specific embodiment, the target template image and the primary object detection area may each be trisected by rows and by columns into 3 × 3 cells; the four cells at the upper left corner are merged into one sub-grid area, and likewise the four cells at the lower left, upper right, and lower right corners, as shown in fig. 1e. This gives four sub-template grid areas at the upper left, upper right, lower left, and lower right of the target template image, and four sub-object detection grid areas at the upper left, upper right, lower left, and lower right of the primary object detection area.
The position of the target sub-object detection grid region corresponds to the position of the target sub-template grid region, and specifically, the sub-region similarity calculation may be performed between the sub-object detection grid region at the upper left corner and the sub-template grid region at the upper left corner, and the sub-region similarity calculation may be performed between the sub-object detection grid region at the lower right corner and the sub-template grid region at the lower right corner. The target sub-object detection grid area refers to a certain sub-object detection grid area in the plurality of sub-object detection grid areas, and the target sub-template grid area refers to a certain sub-template grid area in the plurality of sub-template grid areas.
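The trisect-and-merge division of fig. 1e can be sketched as follows (each corner sub-grid area covers two thirds of the rows and columns; the function name is illustrative):

```python
import numpy as np

def corner_grid_areas(img: np.ndarray) -> dict:
    """Trisect the image by rows and columns into a 3 x 3 grid of cells and
    merge the 2 x 2 block of cells at each corner, giving the four
    overlapping sub-grid areas described above."""
    h, w = img.shape[:2]
    h2, w2 = 2 * h // 3, 2 * w // 3   # two thirds of the rows / columns
    return {
        "upper_left":  img[:h2, :w2],
        "upper_right": img[:h2, w - w2:],
        "lower_left":  img[h - h2:, :w2],
        "lower_right": img[h - h2:, w - w2:],
    }
```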
Optionally, in some embodiments, the sub-region similarity between the target sub-object detection grid region and the target sub-template grid region is calculated, specifically, the feature information of the target sub-object detection grid region may be extracted, the feature information of the target sub-template grid region is extracted, and the similarity between the feature information of the target sub-object detection grid region and the feature information of the target sub-template grid region is calculated, where the similarity may be used as the sub-region similarity between the target sub-object detection grid region and the target sub-template grid region. Wherein the feature information may be extracted through a neural network.
Optionally, in some embodiments, the step of "calculating a sub-region similarity between the target sub-object detection grid region of the primary object detection area and the target sub-template grid region of the target template image" may include:
calculating a first pixel mean value of the target sub-object detection grid region of the primary object detection area under each color channel;
calculating a second pixel mean value of the target sub-template grid region of the target template image under each color channel;
and calculating the sub-region similarity between the target sub-object detection grid region of the primary object detection area and the target sub-template grid region of the target template image based on the first pixel mean value and the second pixel mean value.
Calculating the first pixel mean value of the target sub-object detection grid region of the primary object detection area under each color channel may specifically be calculating the mean of the pixel values of all pixel points in the target sub-object detection grid region under the red channel, the blue channel and the green channel respectively; these means are the first pixel mean values. Optionally, in some embodiments, the first pixel mean value may also be the pixel value mean of all pixel points in the target sub-object detection grid region over all color channels, i.e. without distinguishing the color channels.
Similarly, the step of "calculating the second pixel mean value of the target sub-template grid region of the target template image under each color channel" may specifically be calculating the mean of the pixel values of all pixel points in the target sub-template grid region under the red channel, the blue channel, and the green channel respectively; each such mean is a second pixel mean value. Optionally, in some embodiments, the second pixel mean value may also be the mean of the pixel values of all pixel points in the target sub-template grid region over all color channels together, i.e., without distinguishing the color channels.
The step of calculating the sub-region similarity between the target sub-object detection grid region of the initially selected object detection region and the target sub-template grid region of the target template image based on the first pixel mean value and the second pixel mean value may specifically include:
and aiming at each color channel, calculating the sub-region similarity between the target sub-object detection grid region of the initially selected object detection region and the target sub-template grid region of the target template image under the color channel based on the first pixel mean value and the second pixel mean value under the color channel.
The color channels may include a red channel, a blue channel, and a green channel. In this embodiment, the pixel means of each sub-object detection grid region under the three color channels (i.e., the first pixel mean values) and the pixel means of each sub-template grid region under the three color channels (i.e., the second pixel mean values) can be calculated, and the sub-region similarity can then be calculated from the first pixel mean values of each sub-object detection grid region and the second pixel mean values of the position-matched sub-template grid region.
The sub-region similarity may include the sub-region similarity under the blue channel, under the red channel, under the green channel, under all color channels, and so on. The sub-region similarity under the blue channel is calculated from the first pixel mean value of the target sub-object detection grid region under the blue channel and the second pixel mean value of the target sub-template grid region under the blue channel, and the others are obtained analogously.
Specifically, the pixel mean under each color channel may be calculated as follows:

pixel mean under the red channel:

light_r = ( Σ_{p=1..P} Σ_{q=1..Q} R_{p,q} ) / (P × Q)

pixel mean under the blue channel:

light_b = ( Σ_{p=1..P} Σ_{q=1..Q} B_{p,q} ) / (P × Q)

pixel mean under the green channel:

light_g = ( Σ_{p=1..P} Σ_{q=1..Q} G_{p,q} ) / (P × Q)

pixel mean over all color channels:

average_mean = (light_r + light_b + light_g) / 3

where P and Q are the numbers of rows and columns of the sub-grid region, p is an integer with 0 < p ≤ P, q is an integer with 0 < q ≤ Q, R_{p,q} denotes the pixel value of pixel point (p, q) under the red channel, B_{p,q} denotes the pixel value of pixel point (p, q) under the blue channel, and G_{p,q} denotes the pixel value of pixel point (p, q) under the green channel.
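To make the per-channel averaging concrete, the following Python sketch computes light_r, light_g, light_b, and average_mean for one sub-grid region; the numpy representation and the (R, G, B) channel ordering are assumptions for illustration.

```python
import numpy as np

def channel_means(region: np.ndarray) -> dict:
    """Pixel means of one P x Q sub-grid region under each color channel.

    region: P x Q x 3 array of pixel values; the channel order (R, G, B)
    is an assumed convention for this sketch.
    """
    light_r = float(region[:, :, 0].mean())  # mean of R_{p,q} over all P*Q pixels
    light_g = float(region[:, :, 1].mean())  # mean of G_{p,q}
    light_b = float(region[:, :, 2].mean())  # mean of B_{p,q}
    average_mean = (light_r + light_b + light_g) / 3  # mean over all color channels
    return {"r": light_r, "g": light_g, "b": light_b, "avg": average_mean}
```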
The sub-region similarity calculation may be expressed as equation (6):

similarity = f(light_t, light_c)    (6)

where light_t denotes the second pixel mean of the sub-template grid region in the target template image under all / a certain color channel, light_c denotes the first pixel mean of the sub-object detection grid region in the initially selected object detection region under all / a certain color channel, and f is the comparison function given by the equation image of the original publication (not reproduced here). light_t and light_c must be taken under the same color channel: if light_t is the second pixel mean of the sub-template grid region under the blue channel, then light_c must be the first pixel mean of the sub-object detection grid region under the blue channel. Keeping the color channels of light_t and light_c the same ensures the accuracy of the sub-region similarity calculation.
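For illustration, a minimal Python sketch of the sub-region similarity of equation (6), assuming pixel values in [0, 255] and the linear comparison form 1 − |light_t − light_c| / 255; this form is an assumption for illustration, not the formula of the original equation image.

```python
def subregion_similarity(light_t: float, light_c: float) -> float:
    """Sub-region similarity between the second pixel mean of a sub-template
    grid region (light_t) and the first pixel mean of the position-matched
    sub-object detection grid region (light_c), both under the SAME channel.

    ASSUMED form: 1 - |light_t - light_c| / 255 (pixel values in [0, 255]);
    identical means give 1.0, and the similarity falls linearly with the gap.
    """
    return 1.0 - abs(light_t - light_c) / 255.0
```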
In the step "determining at least one target object detection region of the target object from the initially selected object detection region of the target object based on the sub-region similarity to obtain at least one target object detection region of each target object", there are various methods for selecting a target object detection region based on the sub-region similarity, which is not limited in this embodiment. Specifically, the preset condition selected as the target object detection area may be that the sub-area similarity of all sub-object detection grid areas of the initially selected object detection area and the sub-template grid area corresponding to the position in the target template image in each color channel is higher than a preset threshold, and the preset threshold may be set according to an actual situation, which is not limited in this embodiment. It is understood that the preset condition selected as the target object detection area may also be set according to the actual situation.
For example, the preset condition for being selected as a target object detection region may be: r_similarity ≥ 0.83, g_similarity ≥ 0.83, b_similarity ≥ 0.83, and average_similarity ≥ 0.8. When any sub-object detection grid region of an initially selected object detection region fails the preset condition, that initially selected object detection region cannot serve as a target object detection region.
Here, r_similarity is the sub-region similarity between the target sub-object detection grid region and the target sub-template grid region under the red channel, g_similarity is the sub-region similarity under the green channel, b_similarity is the sub-region similarity under the blue channel, and average_similarity is the sub-region similarity under all color channels.
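A sketch of the preset-condition check using the example thresholds above; the dictionary layout of the four per-channel similarities is an assumption for illustration.

```python
def passes_preset_condition(sims: dict) -> bool:
    """sims holds the sub-region similarities of ONE sub-object detection grid
    region against its position-matched sub-template grid region."""
    return (sims["r"] >= 0.83 and sims["g"] >= 0.83
            and sims["b"] >= 0.83 and sims["avg"] >= 0.8)

def keeps_region(per_subgrid_sims: list) -> bool:
    # An initially selected object detection region is kept as a target object
    # detection region only if EVERY sub-object detection grid region passes.
    return all(passes_preset_condition(s) for s in per_subgrid_sims)
```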
In a specific scenario, suppose the numbers of rows and columns of the target template image are K and L, respectively; the numbers of rows and columns of the initially selected object detection region are then also K and L. Following the grid division method of the above embodiment, the target template image and the initially selected object detection region are each divided into four sub-grid regions at the upper left, upper right, lower left, and lower right (sub-template grid regions for the target template image, sub-object detection grid regions for the initially selected object detection region), denoted S1, S2, S3, and S4, where S1 denotes the upper-left sub-grid region, S2 the upper-right, S3 the lower-left, and S4 the lower-right, as shown in fig. 1e. The four sub-grid regions can be expressed by the following equations (7), (8), (9), and (10):
S1=(0≤row<K*2/3,0≤col<L*2/3) (7)
S2=(0≤row<K*2/3,L*1/3≤col≤L) (8)
S3=(K*1/3<row≤K,0≤col<L*2/3) (9)
S4=(K*1/3<row≤K,L*1/3≤col≤L) (10)
where row is the row range of the sub-grid region, col is the column range of the sub-grid region, and 0 ≤ row < K × 2/3 indicates that the rows of the sub-grid region are taken from row 0 up to (but not including) row K × 2/3 of the target template image or the initially selected object detection region.
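A minimal numpy sketch of the division of equations (7)-(10); rounding the 2/3 boundaries with integer division is an assumption for illustration.

```python
import numpy as np

def four_grid_split(img: np.ndarray):
    """Split a K-row, L-column image (or region) into the four overlapping
    2/3-size sub-grid regions S1..S4 of equations (7)-(10)."""
    K, L = img.shape[0], img.shape[1]
    r, c = 2 * K // 3, 2 * L // 3    # 2/3 extents (integer division assumed)
    S1 = img[:r, :c]                 # upper left:  rows [0, 2K/3), cols [0, 2L/3)
    S2 = img[:r, L - c:]             # upper right: rows [0, 2K/3), cols [L/3, L]
    S3 = img[K - r:, :c]             # lower left:  rows [K/3, K],  cols [0, 2L/3)
    S4 = img[K - r:, L - c:]         # lower right: rows [K/3, K],  cols [L/3, L]
    return S1, S2, S3, S4
```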
In this embodiment, the initially selected object detection regions may be filtered by the 2/3 four-grid filtering method of the above embodiment to remove initially selected object detection regions with local color differences. This division is coarse and therefore fault-tolerant, so a strong filtering rule may be used when judging the result (in the above embodiment, no sub-object detection grid region is allowed to fail the preset condition). Alternatively, a finer grid division may be used, such as dividing the target template image and the initially selected object detection region into 8 × 8 small spaces (finer than the 3 × 3 division of the 2/3 four-grid filtering method above); the division is then less fault-tolerant, and a weak filtering rule may be used when judging the result (such as allowing a certain percentage of the sub-object detection grid regions to fail the preset condition).
Optionally, in this embodiment, the step "determining at least one target object detection area of the target object from the primary object detection area of the target object" may include:
selecting a primary selection object detection area with the highest similarity with the target template image in the primary selection object detection areas as a candidate target object detection area of the target object aiming at each target object;
selecting a candidate target object detection area corresponding to the target object from the primary object detection area based on the distance between the candidate target object detection area and the primary object detection area, wherein the distance represents the overlapping degree of the candidate target object detection area and the primary object detection area;
and determining at least one target object detection area of the target object according to the candidate target object detection area.
The distance may be the Intersection over Union (IoU) between the candidate target object detection region and the initially selected object detection region, or the distance between a position reference point of the candidate target object detection region and a position reference point of the initially selected object detection region. It should be noted that the position reference points of the two regions should be selected in the same manner; for example, the position reference point may be the upper-left vertex of each region.
Specifically, all the candidate target object detection regions may be used as the target object detection regions of the target object, or the candidate target object detection regions may be further screened to obtain the target object detection regions of the target object, which is not limited in this embodiment, where the further screening is not limited in manner.
Optionally, in this embodiment, the step "selecting, from the preliminary selection object detection area, a candidate target object detection area corresponding to the target object based on the distance between the candidate target object detection area and the preliminary selection object detection area" may include:
taking the initially selected object detection area with the distance from the candidate target object detection area larger than a preset distance threshold value as a reference object detection area;
and selecting a reference object detection area with the highest similarity with the target template image from the reference object detection areas as a candidate target object detection area corresponding to the target object.
Here, the similarity is the similarity calculated in step 105 above. The preset distance threshold may be set according to the actual situation, which is not limited in this embodiment.
The step of "selecting a reference object detection region with the highest similarity to the target template image from the reference object detection regions as a candidate target object detection region corresponding to the target object" may include:
selecting, from the reference object detection areas, the reference object detection area with the highest similarity to the target template image as a further candidate target object detection area corresponding to the target object, and taking each reference object detection area as a new initially selected object detection area;
and returning to execute the step of taking the primarily selected object detection area with the distance to the candidate target object detection area larger than the preset distance threshold as the reference object detection area until the number of the candidate target object detection areas meets the preset number.
In this embodiment, the candidate object detection region with a very low similarity may be filtered by a dynamic threshold to obtain a primary object detection region, and the primary object detection region may be further filtered by a greedy Non-maximum Suppression (NMS) method to improve the accuracy of target detection. Non-maxima suppression is a process of finding local maxima.
The greedy non-maximum suppression method can remove detection regions with a high overlap ratio, or detection regions whose surroundings are too close together. Specifically, the greedy method may build a two-dimensional matrix of the same size as the image matrix of the image to be detected, where each element of the matrix may represent the candidate object detection region at the corresponding position; the position of a candidate object detection region in the matrix is determined by the position reference point of that region in the image to be detected. For example, the position reference point may specifically be the upper-left vertex (a pixel point a) of the candidate object detection region, and the element at the position of pixel point a in the matrix then represents that candidate object detection region.
In an embodiment, for the target template image of each scale of each target object, if the position reference point is the upper-left vertex of the candidate object detection region, then each point (element) of the two-dimensional matrix represents the similarity between a rectangular frame (i.e., a candidate object detection region) starting at that point, whose length and width equal those of the target template image, and the corresponding target template image; this two-dimensional matrix is a similarity map. That is, a candidate object detection region may be represented by a 1 × 3 array, where one value stores the length of the candidate object detection region, one value its width, and one value its similarity to the corresponding target template image.
Specifically, in the case that the primary selection object detection area is obtained by multi-scale target template image recognition, the screening process may be as follows:
1) determining the number num of target object detection areas to be acquired and a preset distance threshold;
2) generating a three-channel similarity map for the target template image of each scale, the three channels being, respectively, the similarity between the candidate object detection region and the target template image, and the length and the width of the candidate object detection region. Each position of a similarity map represents the similarity, rectangle length, and rectangle width of the rectangular frame anchored at that position. The similarity maps of the different scales are then compared: for the elements at the same position across the per-scale similarity maps, all rectangular frames whose similarity is not the maximum are removed and only the element (rectangular frame) with the maximum similarity is kept, yielding a unique three-channel target similarity map. In some embodiments, the values corresponding to candidate object detection regions that are not initially selected object detection regions may be set to zero in the similarity map; screening steps 1) to 4) then amount to further screening the initially selected object detection regions by non-maximum suppression;

3) finding the maximum value in the target similarity map, taking the detection frame with the highest matching degree as an election frame (i.e., a candidate target object detection region), adding the row and column coordinates of the election frame to an election frame list, and deleting the rectangular frames whose distance to the election frame is smaller than the preset distance threshold, i.e., setting their values in the target similarity map to zero, or deleting the rectangular frames that intersect the election frame;

4) if the number of election frames is smaller than num and the target similarity map still contains non-zero values, returning to step 3); otherwise, ending the procedure, with the election frame list holding the position information of the selected candidate target object detection regions.
According to this greedy non-maximum suppression method, the rectangular frame with the globally highest matching degree is obtained in each iteration of the loop, the rectangular frames intersecting it are deleted, and finally zero or several locally optimal solutions are obtained.
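The selection loop of steps 1)-4) can be sketched as follows; the single-channel view of the target similarity map and the use of a square (Chebyshev) suppression window as the "distance smaller than the preset distance threshold" test are assumptions for illustration.

```python
import numpy as np

def greedy_nms(sim_map: np.ndarray, num: int, dist_thresh: int):
    """sim_map: H x W similarity channel of the target similarity map, one
    value per candidate position (non-candidate positions already zeroed).
    Returns up to `num` (row, col) election-frame positions."""
    sim = sim_map.astype(float).copy()
    elected = []
    while len(elected) < num and sim.max() > 0:
        row, col = np.unravel_index(int(np.argmax(sim)), sim.shape)
        elected.append((int(row), int(col)))   # add to the election frame list
        # Delete rectangular frames too close to the election frame by zeroing
        # their values in the map (square window as an assumed distance test).
        r0, c0 = max(0, row - dist_thresh), max(0, col - dist_thresh)
        sim[r0:row + dist_thresh + 1, c0:col + dist_thresh + 1] = 0.0
    return elected
```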
In one embodiment, for example, when detecting the position of a target object in a game screen, the 2/3 grid method, the 1/2 grid method, and the whole-image comparison method (i.e., without grid division) can be compared. Specifically, a comparison experiment may be run on the f1 score at an IoU (Intersection over Union) threshold of 0.75; the experimental results are shown in the following table:
[Table: f1 scores of experiment groups A-H, each group using a different combination of feature matching, template matching, multi-scale matching, complexity filtering, grid filtering, and overlap removal; the table image of the original publication is not reproduced here.]
where a check mark (√) in the table indicates that the corresponding method is used. f1 serves as the comprehensive evaluation index: the higher the f1 value, the better the target detection effect. f1 is calculated as shown in the following equations (11), (12), and (13):
P = TP / (TP + FP)    (11)

R = TP / (TP + FN)    (12)

f1 = 2 × P × R / (P + R)    (13)
where P denotes precision, R denotes recall, TP denotes true positives, FP denotes false positives, and FN denotes false negatives. Specifically, true/false indicates whether a labeled box really is a target object to be detected, and positive/negative indicates whether the target detection algorithm reports it as a detection; for example, a true positive means the target detection algorithm detects a box that really is a target object to be detected.
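A direct transcription of equations (11)-(13) in Python; nonzero denominators are assumed.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """f1 from true positives, false positives, and false negatives,
    per equations (11)-(13); assumes tp + fp > 0 and tp + fn > 0."""
    precision = tp / (tp + fp)   # equation (11)
    recall = tp / (tp + fn)      # equation (12)
    return 2 * precision * recall / (precision + recall)  # equation (13)
```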
The "multi-scale matching" of the table header is the multi-layer pyramid scale method. "Complexity filtering" refers to the method of setting the detection threshold based on image complexity. "Grid filtering" refers to the grid color filtering method of the embodiment of step 106 above. "Overlap removal" refers to removing initially selected object detection regions with a high degree of overlap.
Specifically, the "feature matching" of the header refers to a feature matching method, the feature matching method is to extract the multi-level pyramid feature points of the target template image and the image to be detected respectively, then match the feature points of the target template image and the image to be detected, and take the region with the matching degree higher than the specified threshold as the detection result.
Specifically, the "template matching" of the header refers to a template matching method, the template matching method is to calculate the similarity of the target template image and the screenshot of the sliding window at each position of the picture to be detected by using a sliding window picture-to-picture comparison method, and the area with the similarity higher than a specified threshold value is used as a detection result.
As can be seen from the table, template matching is superior to feature matching in target detection effect (groups A and B serve as the comparison); 2/3 grid filtering outperforms 1/2 grid filtering (groups D and E serve as the comparison); linear filtering is superior to binarized filtering (groups E and F serve as the comparison); and the parameter-tuned linear threshold outperforms plain linear filtering (groups G and H serve as the comparison). Binarized filtering here means a binary threshold method, i.e., a fixed detection threshold that does not change with the complexity of the image.
The target detection method of the present application achieves a good detection effect without requiring any training data. In experiments on comprehensive matching of multiple kinds of targets, the effect of the present scheme is far better than that of plain template matching or plain feature matching, and its resource and training costs are far lower than those of detection methods based on deep learning.
The target detection method provided by the application can dynamically adjust the detection threshold based on the image complexity of the target object, covering target objects of different image-complexity types, adapting to detection scenes with various types of targets, and thereby improving the accuracy of target detection to obtain target object detection regions that include the target objects. In addition, multiple candidate object detection regions of the target object to be detected can be obtained based on target template images at different scales, and at least one target object detection region can be determined from the candidate object detection regions. In some specific embodiments, the image to be detected includes target objects at multiple scales, and detection accuracy can be improved with multi-scale target template images: for example, if the target object to be detected is a "goldfish" and the image to be detected contains goldfish of different sizes, the multi-scale method can detect target object detection regions at different scales, each containing a "goldfish" of the corresponding scale.
As can be seen from the above, the electronic device of this embodiment can determine at least one target object to be detected in the image to be detected, and acquire a target template image corresponding to each target object; determining the image complexity of each target template image based on the pixel value difference between pixel points in each target template image; determining a detection threshold corresponding to each target object according to the image complexity of each target template image; according to the target template image, identifying the image to be detected to obtain a plurality of candidate object detection areas of each target object; for each target object, determining an initial selection object detection area of the target object from the candidate object detection area according to the similarity between the candidate object detection area and the target template image and a detection threshold corresponding to the target object; and determining a target object detection area of the target object from the primary object detection area of the target object to obtain a target object detection area of each target object. According to the embodiment of the application, a large amount of training is not needed, manpower and material resources are saved, the detection threshold value of the target object is determined based on the image complexity, the detection threshold value is not fixed and unchangeable, the method and the device can adapt to the detection scene with various types of targets, and the accuracy of target detection can be improved.
The method described in the previous embodiment will be described in further detail below with the target detection device specifically integrated in the server.
An embodiment of the present application provides a target detection method, as shown in fig. 2a, a specific process of the target detection method may be as follows:
201. The server determines at least one target object to be detected in the image to be detected and acquires a target template image corresponding to each target object.
The target object can be various types of targets to be recognized, such as various types of small icons, simple targets, characters, buttons, large icons, complex targets, and the like. The target template image can be used to identify a target object in the image to be detected, which can be regarded as a standard image containing the target object.
202. And the server determines the image complexity of each target template image based on the pixel value difference between the pixel points in each target template image.
Wherein the image complexity characterizes the complexity of texture colors, etc. of the target template image. The pixel value may be specifically an RGB (red, green, blue) value, or a gray scale value, which is not limited in this embodiment.
Optionally, in this embodiment, the step "determining the image complexity of each target template image based on the pixel value difference between the pixel points in each target template image" may include:
determining at least two types of difference parameters of each target template image based on the pixel value difference between pixel points in each target template image;
determining an image complexity for each target template image based on the difference parameters.
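As an illustration of these two sub-steps, the sketch below takes the lateral difference parameter as the mean absolute difference of horizontally adjacent pixels and the longitudinal parameter as its vertical counterpart, fused by a simple average; the specific parameters and fusion rule are assumptions for illustration.

```python
import numpy as np

def image_complexity(gray: np.ndarray) -> float:
    """Image complexity of a target template image from pixel-value differences.

    ASSUMPTIONS: the lateral / longitudinal difference parameters are the mean
    absolute differences of horizontally / vertically adjacent pixels, and the
    fusion is a plain average of the two parameters.
    """
    g = gray.astype(float)
    lateral = np.abs(np.diff(g, axis=1)).mean()       # horizontal neighbor differences
    longitudinal = np.abs(np.diff(g, axis=0)).mean()  # vertical neighbor differences
    return (lateral + longitudinal) / 2
```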
203. The server determines a detection threshold corresponding to each target object based on the image complexity of each target template image and a preset mapping relation set, wherein the preset mapping relation set comprises a mapping relation between the preset image complexity and a preset detection threshold, and the mapping relation comprises an inverse mapping relation.
The inverse mapping relationship may be linear or non-linear, and this embodiment does not limit this. The inverse mapping relationship specifically means that the preset image complexity changes, and the preset detection threshold changes in the opposite direction, and if the preset image complexity increases, the preset detection threshold decreases.
Optionally, in this embodiment, the preset mapping relationship set includes a first sub-mapping relationship set and a second sub-mapping relationship set; the first sub-mapping relation set comprises an inverse mapping relation between a preset image complexity and a preset detection threshold value; the second sub-mapping relation set comprises a fixed mapping relation between a preset image complexity and a preset detection threshold value;
the step of determining a detection threshold corresponding to each target object based on the image complexity of each target template image and a preset mapping relationship set may include:
when the image complexity of a target template image is smaller than a preset complexity, determining a detection threshold value of a target object corresponding to the target template image based on the image complexity of the target template image and a first sub-mapping relation set;
and when the image complexity of the target template image is not less than the preset complexity, determining a detection threshold value of a target object corresponding to the target template image based on the image complexity of the target template image and the second sub-mapping relation set.
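A sketch of the piecewise mapping described above; the linear form of the inverse relation and every numeric constant are illustrative assumptions.

```python
def detection_threshold(complexity: float,
                        preset_complexity: float = 50.0,
                        base: float = 0.9,
                        slope: float = 0.002,
                        fixed: float = 0.8) -> float:
    """Detection threshold from image complexity.

    Below the preset complexity, the first sub-mapping applies: an inverse
    relation (assumed linear here), so the threshold falls as complexity
    rises. At or above the preset complexity, the second sub-mapping applies:
    a fixed threshold. All constants are illustrative assumptions.
    """
    if complexity < preset_complexity:
        return base - slope * complexity   # inverse mapping relation
    return fixed                           # fixed mapping relation
```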
204. And the server identifies the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object.
The target template image can slide over the image to be detected to obtain multiple candidate object detection regions of the target object corresponding to the target template image. Sliding the target template image over the image to be detected means traversing the image to be detected and marking out multiple candidate object detection regions of the same size as the target template image.
205. And the server determines an initial selection object detection area of the target object from the candidate object detection area according to the similarity between the candidate object detection area and the target template image and a detection threshold corresponding to the target object aiming at each target object.
In this embodiment, the step of "determining an initial object detection area of the target object from the candidate object detection area according to the similarity between the candidate object detection area and the target template image and the detection threshold corresponding to the target object" may include:
calculating the similarity between the candidate object detection area and the target template image;
and taking the candidate object detection area with the similarity larger than the detection threshold value as an initial selection object detection area of the target object.
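Steps 204 and 205 together can be sketched as a sliding-window scan followed by thresholding; the zero-mean normalized correlation used as the score is an assumption for illustration, since the embodiment only requires "the similarity between the candidate object detection area and the target template image".

```python
import numpy as np

def preliminary_regions(image: np.ndarray, template: np.ndarray,
                        threshold: float):
    """Slide the target template image over the image to be detected, score
    every same-size candidate region, and keep those whose similarity
    exceeds the detection threshold of the target object."""
    h, w = template.shape[0], template.shape[1]
    H, W = image.shape[0], image.shape[1]
    t = template.astype(float).ravel()
    t = (t - t.mean()) / (t.std() + 1e-8)      # zero-mean, unit-variance template
    kept = []
    for row in range(H - h + 1):
        for col in range(W - w + 1):
            c = image[row:row + h, col:col + w].astype(float).ravel()
            c = (c - c.mean()) / (c.std() + 1e-8)
            sim = float(np.dot(t, c)) / t.size  # normalized cross-correlation in [-1, 1]
            if sim > threshold:                 # initially selected object detection area
                kept.append((row, col, h, w, sim))
    return kept
```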
206. And the server selects the primary selection object detection area with the highest similarity with the target template image in the primary selection object detection areas as the candidate target object detection area of the target object aiming at each target object.
207. The server selects a candidate target object detection area corresponding to the target object from the primary object detection area based on the distance between the candidate target object detection area and the primary object detection area, wherein the distance represents the overlapping degree of the candidate target object detection area and the primary object detection area; and determining at least one target object detection area of the target object according to the candidate target object detection area.
The distance may be the Intersection over Union (IoU) between the candidate target object detection region and the initially selected object detection region, or the distance between a position reference point of the candidate target object detection region and a position reference point of the initially selected object detection region. It should be noted that the position reference points of the two regions should be selected in the same manner; for example, the position reference point may be the upper-left vertex of each region.
Optionally, in this embodiment, the step "selecting, from the preliminary selection object detection area, a candidate target object detection area corresponding to the target object based on the distance between the candidate target object detection area and the preliminary selection object detection area" may include:
taking the initially selected object detection area with the distance from the candidate target object detection area larger than a preset distance threshold value as a reference object detection area;
and selecting a reference object detection area with the highest similarity with the target template image from the reference object detection areas as a candidate target object detection area corresponding to the target object.
In a specific embodiment, the position of the target object in the game screen needs to be detected, and the target detection method provided by this embodiment may be used to determine the detection threshold of the target object first and then perform target detection. As shown in fig. 2b, the image complexity of the target template image corresponding to the target object may be determined first, and then the detection threshold may be determined based on the image complexity, where the detection thresholds corresponding to different image complexities are different. And then, identifying the image to be detected based on the target template image under the multi-scale to obtain a multi-scale candidate object detection area, screening based on a detection threshold to obtain an initial object detection area, and further screening by a greedy non-maximum suppression method and a grid color filtering method to obtain a target object detection area, which is a final detection result as shown in fig. 2 c.
The method and the device use a shallow pixel-texture matching technique between the target template image and the image to be detected, which avoids the data-shortage problem of deep-learning training; using the complexity computed from the target template image as a coefficient of the matching threshold addresses the problem of multiple types of target objects; and the spatial multi-scale matching method handles target objects of different scales in scenes across multiple device models.
As can be seen from the above, in this embodiment, at least one target object to be detected in an image to be detected may be determined by a server, and a target template image corresponding to each target object is obtained; determining the image complexity of each target template image based on the pixel value difference between pixel points in each target template image; determining a detection threshold corresponding to each target object based on the image complexity of each target template image and a preset mapping relation set, wherein the preset mapping relation set comprises a mapping relation between the preset image complexity and a preset detection threshold, and the mapping relation comprises an inverse mapping relation; the server identifies the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object; for each target object, determining an initial selection object detection area of the target object from the candidate object detection area according to the similarity between the candidate object detection area and the target template image and a detection threshold corresponding to the target object; selecting a primary selection object detection area with the highest similarity with the target template image in the primary selection object detection areas as a candidate target object detection area of the target object aiming at each target object; selecting a candidate target object detection area corresponding to the target object from the primary object detection area based on the distance between the candidate target object detection area and the primary object detection area, wherein the distance represents the overlapping degree of the candidate target object detection area and the primary object detection area; and determining at least one target object detection area of the target object according to the candidate target object detection area. According to the embodiment of the application, a large amount of training is not needed, manpower and material resources are saved, the detection threshold value of the target object is determined based on the image complexity, the detection threshold value is not fixed and unchangeable, the method and the device can adapt to the detection scene with various types of targets, and the accuracy of target detection can be improved.
In order to better implement the above method, an embodiment of the present application further provides an object detection apparatus, as shown in fig. 3a, the object detection apparatus may include a determination unit 301, a complexity determination unit 302, a threshold determination unit 303, an identification unit 304, a preliminary selection determination unit 305, and an object determination unit 306, as follows:
(1) a determination unit 301;
the determining unit 301 is configured to determine at least one target object to be detected in the image to be detected, and acquire a target template image corresponding to each target object.
(2) A complexity determination unit 302;
the complexity determining unit 302 is configured to determine the image complexity of each target template image based on the pixel value difference between the pixels in each target template image.
Optionally, in some embodiments of the present application, the complexity determining unit 302 may include a first determining subunit 3021 and a second determining subunit 3022, see fig. 3b, as follows:
the first determining subunit 3021 is configured to determine at least two types of difference parameters of each target template image based on a difference in pixel values between pixels in each target template image;
a second determining subunit 3022, configured to determine an image complexity of each target template image based on the difference parameter.
Optionally, in some embodiments of the present application, the difference parameter includes a lateral difference parameter and a longitudinal difference parameter; the second determining subunit 3022 may be specifically configured to fuse the horizontal difference parameter and the vertical difference parameter to obtain the image complexity of each target template image.
(3) A threshold value determination unit 303;
a threshold determining unit 303, configured to determine a detection threshold corresponding to each target object according to the image complexity of each target template image.
Optionally, in some embodiments of the application, the threshold determining unit 303 may be specifically configured to determine the detection threshold corresponding to each target object based on the image complexity of each target template image and a preset mapping relationship set, where the preset mapping relationship set includes a mapping relationship between the preset image complexity and a preset detection threshold.
Optionally, in some embodiments of the present application, the preset mapping relationship set includes a first sub-mapping relationship set and a second sub-mapping relationship set; the first sub-mapping relation set comprises an inverse mapping relation between a preset image complexity and a preset detection threshold value; the second sub-mapping relation set comprises a fixed mapping relation between a preset image complexity and a preset detection threshold value;
the threshold determining unit 303 may comprise a third determining sub-unit 3031 and a fourth determining sub-unit 3032, see fig. 3c, as follows:
the third determining subunit 3031 is configured to determine, when the image complexity of the target template image is smaller than a preset complexity, a detection threshold of the target object corresponding to the target template image based on the image complexity of the target template image and the first sub-mapping relationship set;
a fourth determining subunit 3032, configured to determine, when the image complexity of the target template image is not less than the preset complexity, a detection threshold of the target object corresponding to the target template image based on the image complexity of the target template image and the second sub-mapping relationship set.
(4) An identification unit 304;
and the identifying unit 304 is configured to identify the image to be detected according to the target template image, so as to obtain a plurality of candidate object detection areas of each target object.
Optionally, in some embodiments of the present application, the identifying unit 304 may include a scaling subunit 3041 and an identifying subunit 3042, see fig. 3d, as follows:
the scaling subunit 3041 is configured to scale the target template image corresponding to each target object at different scales to obtain target template images at multiple scales corresponding to each target object;
the identifying subunit 3042 is configured to identify the image to be detected based on the target template images under multiple scales, so as to obtain multiple candidate object detection areas of each target object.
(5) A primary election determination unit 305;
a preliminary selection determining unit 305, configured to determine, for each target object, at least one preliminary selection object detection area of the target object from the candidate object detection areas according to a similarity between the candidate object detection areas and the target template image and a detection threshold corresponding to the target object.
(6) A target determination unit 306;
a target determining unit 306, configured to determine at least one target object detection area of the target object from the initially selected object detection areas of the target object, to obtain at least one target object detection area of each target object.
Optionally, in some embodiments of the present application, the target determination unit 306 may comprise a dividing subunit 3061, a calculating subunit 3062, and a fifth determination subunit 3063, see fig. 3e, as follows:
the dividing unit 3061 is configured to perform grid division on the primary selection object detection area and the target template image respectively for each target object, so as to obtain a plurality of sub-object detection grid areas of the primary selection object detection area and a plurality of sub-template grid areas of the target template image;
a calculation subunit 3062, configured to calculate a sub-region similarity between a target sub-object detection grid region of the initially selected object detection region and a target sub-template grid region of the target template image, where a position of the target sub-object detection grid region corresponds to a position of the target sub-template grid region;
a fifth determining subunit 3063, configured to determine, based on the sub-region similarity, at least one target object detection region of the target object from the initially selected object detection regions of the target object, to obtain at least one target object detection region of each target object.
Optionally, in some embodiments of the present application, the calculating subunit 3062 may be specifically configured to calculate a first pixel mean value of the target sub-object detection grid area of the primary selection object detection area in each color channel; calculating a second pixel mean value of a target sub-template grid area of the target template image under each color channel; and calculating the sub-region similarity between the target sub-object detection grid region of the initially selected object detection region and the target sub-template grid region of the target template image based on the first pixel mean value and the second pixel mean value.
Optionally, in some embodiments of the present application, the target determination unit 306 may comprise a first selection sub-unit 3064, a second selection sub-unit 3065 and a sixth determination sub-unit 3066, see fig. 3f, as follows:
the first selecting subunit 3064 is configured to, for each target object, select, as a candidate target object detection area of the target object, a primary object detection area with the highest similarity to the target template image in the primary object detection areas;
a second selecting subunit 3065, configured to select, based on a distance between the candidate target object detection area and the primary selected object detection area, a candidate target object detection area corresponding to the target object from the primary selected object detection area, where the distance represents an overlapping degree of the candidate target object detection area and the primary selected object detection area;
a sixth determining subunit 3066, configured to determine at least one target object detection area of the target object based on the candidate target object detection areas.
Optionally, in some embodiments of the present application, the second selecting subunit 3065 may be specifically configured to use a primarily selected object detection area whose distance from the candidate target object detection area is greater than a preset distance threshold as a reference object detection area; and selecting a reference object detection area with the highest similarity with the target template image from the reference object detection areas as a candidate target object detection area corresponding to the target object.
As can be seen from the above, in this embodiment, the determining unit 301 determines at least one target object to be detected in the image to be detected, and obtains a target template image corresponding to each target object; determining the image complexity of each target template image based on the pixel value difference between the pixel points in each target template image through a complexity determining unit 302; determining a detection threshold corresponding to each target object by a threshold determining unit 303 according to the image complexity of each target template image; the identification unit 304 identifies the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object; determining, by the primary selection determining unit 305, for each target object, at least one primary selection object detection area of the target object from the candidate object detection areas according to a similarity between the candidate object detection areas and the target template image and a detection threshold corresponding to the target object; at least one target object detection area of the target object is determined from the initially selected object detection areas of the target object by the target determining unit 306, and at least one target object detection area of each target object is obtained. According to the embodiment of the application, a large amount of training is not needed, manpower and material resources are saved, the detection threshold value of the target object is determined based on the image complexity, the detection threshold value is not fixed and unchangeable, the method and the device can adapt to the detection scene with various types of targets, and the accuracy of target detection can be improved.
An electronic device according to an embodiment of the present application is further provided, as shown in fig. 4, which shows a schematic structural diagram of the electronic device according to an embodiment of the present application, specifically:
the electronic device may include components such as a processor 401 of one or more processing cores, memory 402 of one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the electronic device configuration shown in fig. 4 does not constitute a limitation of the electronic device and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. Wherein:
the processor 401 is a control center of the electronic device, connects various parts of the whole electronic device by various interfaces and lines, performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby performing overall monitoring of the electronic device. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by operating the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to use of the electronic device, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 access to the memory 402.
The electronic device further comprises a power supply 403 for supplying power to the various components, and preferably, the power supply 403 is logically connected to the processor 401 through a power management system, so that functions of managing charging, discharging, and power consumption are realized through the power management system. The power supply 403 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The electronic device may further include an input unit 404, and the input unit 404 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, if the electronic device is a terminal, it may further include a display unit and the like, which are not described herein again. Specifically, in this embodiment, the processor 401 in the electronic device loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application program stored in the memory 402, thereby implementing various functions as follows:
determining at least one target object to be detected in an image to be detected, and acquiring a target template image corresponding to each target object; determining the image complexity of each target template image based on the pixel value difference between pixel points in each target template image; determining a detection threshold corresponding to each target object according to the image complexity of each target template image; according to the target template image, identifying the image to be detected to obtain a plurality of candidate object detection areas of each target object; for each target object, determining at least one primary object detection area of the target object from the candidate object detection areas according to the similarity between the candidate object detection areas and the target template image and a detection threshold corresponding to the target object; and determining at least one target object detection area of the target object from the primary object detection areas of the target object to obtain at least one target object detection area of each target object.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
As can be seen from the above, the present embodiment can determine at least one target object to be detected in an image to be detected, and obtain a target template image corresponding to each target object; determining the image complexity of each target template image based on the pixel value difference between pixel points in each target template image; determining a detection threshold corresponding to each target object according to the image complexity of each target template image; according to the target template image, identifying the image to be detected to obtain a plurality of candidate object detection areas of each target object; for each target object, determining an initial selection object detection area of the target object from the candidate object detection area according to the similarity between the candidate object detection area and the target template image and a detection threshold corresponding to the target object; and determining a target object detection area of the target object from the primary object detection area of the target object to obtain a target object detection area of each target object. According to the embodiment of the application, a large amount of training is not needed, manpower and material resources are saved, the detection threshold value of the target object is determined based on the image complexity, the detection threshold value is not fixed and unchangeable, the method and the device can adapt to the detection scene with various types of targets, and the accuracy of target detection can be improved.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, the present application provides a storage medium, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps in any one of the object detection methods provided by the embodiments of the present application. For example, the instructions may perform the steps of:
determining at least one target object to be detected in an image to be detected, and acquiring a target template image corresponding to each target object; determining the image complexity of each target template image based on the pixel value difference between pixel points in each target template image; determining a detection threshold corresponding to each target object according to the image complexity of each target template image; according to the target template image, identifying the image to be detected to obtain a plurality of candidate object detection areas of each target object; for each target object, determining at least one primary object detection area of the target object from the candidate object detection areas according to the similarity between the candidate object detection areas and the target template image and a detection threshold corresponding to the target object; and determining at least one target object detection area of the target object from the primary object detection areas of the target object to obtain at least one target object detection area of each target object.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
Wherein the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the storage medium can execute the steps in any target detection method provided in the embodiments of the present application, beneficial effects that can be achieved by any target detection method provided in the embodiments of the present application can be achieved, for details, see the foregoing embodiments, and are not described herein again.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations of the object detection aspect described above.
The foregoing describes in detail a target detection method, an apparatus, an electronic device, and a storage medium provided in the embodiments of the present application, and specific examples are applied in the present application to explain the principles and implementations of the present application, and the descriptions of the foregoing embodiments are only used to help understand the method and the core ideas of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (13)

1. A method of object detection, comprising:
determining at least one target object to be detected in an image to be detected, and acquiring a target template image corresponding to each target object;
determining the image complexity of each target template image based on pixel value differences between pixels in each target template image;
determining a detection threshold corresponding to each target object according to the image complexity of each target template image;
identifying the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object;
for each target object, determining at least one preliminary object detection area of the target object from the candidate object detection areas according to the similarity between the candidate object detection areas and the target template image and the detection threshold corresponding to the target object; and
determining at least one target object detection area of the target object from the preliminary object detection areas of the target object, to obtain at least one target object detection area of each target object.
2. The method according to claim 1, wherein the identifying the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object comprises:
scaling the target template image corresponding to each target object at different scales to obtain target template images at multiple scales corresponding to each target object; and
identifying the image to be detected based on the target template images at the multiple scales to obtain a plurality of candidate object detection areas of each target object.
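By way of non-limiting illustration of claim 2, the template can be resized to several scales and matched at each scale. In the following Python sketch, the scale set, the top_k cutoff, and OpenCV's normalized cross-correlation score are assumptions, not part of the claim.

```python
import cv2
import numpy as np

def match_candidates(image, template, scales=(0.75, 1.0, 1.25), top_k=20):
    """Multi-scale template matching: returns a list of ((x, y, w, h), score)
    candidate areas for one target object."""
    candidates = []
    for s in scales:
        tpl = cv2.resize(template, None, fx=s, fy=s,
                         interpolation=cv2.INTER_LINEAR)
        th, tw = tpl.shape[:2]
        if th > image.shape[0] or tw > image.shape[1]:
            continue  # a template scaled beyond the image cannot match
        score_map = cv2.matchTemplate(image, tpl, cv2.TM_CCOEFF_NORMED)
        # Keep the strongest responses at this scale as candidate areas.
        top = np.argsort(score_map, axis=None)[::-1][:top_k]
        for idx in top:
            y, x = np.unravel_index(idx, score_map.shape)
            candidates.append(((int(x), int(y), tw, th),
                               float(score_map[y, x])))
    return candidates
```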
3. The method of claim 1, wherein the determining at least one target object detection area of the target object from the preliminary object detection areas of the target object, to obtain at least one target object detection area of each target object, comprises:
for each target object, performing grid division on the preliminary object detection area and the target template image respectively, to obtain a plurality of sub-object detection grid areas of the preliminary object detection area and a plurality of sub-template grid areas of the target template image;
calculating a sub-region similarity between a target sub-object detection grid area of the preliminary object detection area and a target sub-template grid area of the target template image, wherein the position of the target sub-object detection grid area corresponds to the position of the target sub-template grid area; and
determining at least one target object detection area of the target object from the preliminary object detection areas of the target object based on the sub-region similarity, to obtain at least one target object detection area of each target object.
4. The method of claim 3, wherein the calculating the sub-region similarity between the target sub-object detection grid area of the preliminary object detection area and the target sub-template grid area of the target template image comprises:
calculating a first pixel mean value of the target sub-object detection grid area of the preliminary object detection area under each color channel;
calculating a second pixel mean value of the target sub-template grid area of the target template image under each color channel; and
calculating the sub-region similarity between the target sub-object detection grid area and the target sub-template grid area based on the first pixel mean value and the second pixel mean value.
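By way of non-limiting illustration of claims 3 and 4, both areas can be divided into a fixed grid and compared cell by cell through per-channel pixel means. In this Python sketch, the 4x4 grid, the resize step that aligns corresponding cells, and the exponential mapping from mean difference to similarity are assumptions.

```python
import cv2
import numpy as np

def grid_similarity(region, template, grid=4):
    """Grid-wise similarity between a detected color region and a color
    template, compared through per-cell, per-channel pixel means."""
    # Resize the detected region to the template size so that cells at the
    # same grid position correspond.
    region = cv2.resize(region, (template.shape[1], template.shape[0]))
    h, w = template.shape[:2]
    ys = np.linspace(0, h, grid + 1, dtype=int)
    xs = np.linspace(0, w, grid + 1, dtype=int)
    diffs = []
    for i in range(grid):
        for j in range(grid):
            cell_r = region[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            cell_t = template[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            # First and second pixel mean values per color channel (claim 4).
            m1 = cell_r.reshape(-1, cell_r.shape[-1]).mean(axis=0)
            m2 = cell_t.reshape(-1, cell_t.shape[-1]).mean(axis=0)
            diffs.append(np.abs(m1 - m2).mean())
    # Map the average per-cell difference to a similarity in (0, 1].
    return float(np.exp(-np.mean(diffs) / 32.0))
```

Because the comparison works on cell-level means at corresponding positions, it tolerates small pixel-level noise while still penalizing areas whose layout differs from the template.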
5. The method of claim 1, wherein the determining the image complexity of each target template image based on pixel value differences between pixels in each target template image comprises:
determining at least two types of difference parameters of each target template image based on pixel value differences between pixels in the target template image; and
determining the image complexity of each target template image based on the difference parameters.
6. The method of claim 5, wherein the difference parameters comprise a lateral difference parameter and a longitudinal difference parameter, and the determining the image complexity of each target template image based on the difference parameters comprises:
fusing the lateral difference parameter and the longitudinal difference parameter to obtain the image complexity of each target template image.
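By way of non-limiting illustration of claims 5 and 6, the lateral and longitudinal difference parameters can be taken as mean absolute differences between horizontally and vertically adjacent pixels, fused by averaging; both choices are assumptions of this Python sketch.

```python
import numpy as np

def image_complexity(template):
    """Fuse a lateral (horizontal-neighbor) and a longitudinal
    (vertical-neighbor) difference parameter into one complexity value."""
    gray = template.mean(axis=-1) if template.ndim == 3 else template.astype(float)
    lateral = np.abs(np.diff(gray, axis=1)).mean()       # along each row
    longitudinal = np.abs(np.diff(gray, axis=0)).mean()  # along each column
    return float((lateral + longitudinal) / 2.0)         # fuse the two parameters
```

A flat, single-color template then yields a complexity near zero, while a richly textured template yields a much larger value.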
7. The method of claim 1, wherein the determining a detection threshold corresponding to each target object according to the image complexity of each target template image comprises:
determining the detection threshold corresponding to each target object based on the image complexity of the target template image and a preset mapping relationship set, wherein the preset mapping relationship set comprises mapping relationships between preset image complexities and preset detection thresholds.
8. The method of claim 7, wherein the preset mapping relationship set comprises a first sub-mapping relationship set and a second sub-mapping relationship set, the first sub-mapping relationship set comprising an inverse mapping relationship between preset image complexity and a preset detection threshold, and the second sub-mapping relationship set comprising a fixed mapping relationship between preset image complexity and a preset detection threshold; and
the determining the detection threshold corresponding to each target object based on the image complexity of each target template image and the preset mapping relationship set comprises:
when the image complexity of a target template image is smaller than a preset complexity, determining the detection threshold of the target object corresponding to the target template image based on the image complexity of the target template image and the first sub-mapping relationship set; and
when the image complexity of the target template image is not smaller than the preset complexity, determining the detection threshold of the target object corresponding to the target template image based on the image complexity of the target template image and the second sub-mapping relationship set.
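By way of non-limiting illustration of claims 7 and 8, the two sub-mapping relationship sets can be realized as a piecewise function; the preset complexity of 8.0 and the 0.60 to 0.95 threshold range in this Python sketch are assumptions.

```python
def detection_threshold(complexity, preset=8.0, floor=0.60, ceiling=0.95):
    """Map image complexity to a per-object detection threshold:
    inverse mapping below the preset complexity, fixed mapping above."""
    if complexity < preset:
        # Inverse mapping: the plainer the template, the stricter the
        # threshold, since low-texture patterns match many areas by accident.
        return ceiling - (ceiling - floor) * (complexity / preset)
    return floor  # fixed mapping for sufficiently complex templates
```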
9. The method of claim 1, wherein the determining at least one target object detection area of the target object from the preliminary object detection areas of the target object comprises:
for each target object, selecting, from the preliminary object detection areas, the preliminary object detection area with the highest similarity to the target template image as a candidate target object detection area of the target object;
selecting a further candidate target object detection area corresponding to the target object from the preliminary object detection areas based on the distance between the candidate target object detection area and each preliminary object detection area, wherein the distance represents the degree of overlap between the candidate target object detection area and the preliminary object detection area; and
determining at least one target object detection area of the target object according to the candidate target object detection areas.
10. The method of claim 9, wherein the selecting a further candidate target object detection area corresponding to the target object from the preliminary object detection areas based on the distance between the candidate target object detection area and each preliminary object detection area comprises:
taking each preliminary object detection area whose distance from the candidate target object detection area is greater than a preset distance threshold as a reference object detection area; and
selecting, from the reference object detection areas, the reference object detection area with the highest similarity to the target template image as a candidate target object detection area corresponding to the target object.
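By way of non-limiting illustration of claims 9 and 10, the selection can be written as a greedy loop in the style of non-maximum suppression; measuring the distance as one minus intersection-over-union is an assumption of this Python sketch.

```python
def select_final(preliminary, min_distance=0.5):
    """Greedy selection: the most similar preliminary area becomes a
    candidate; only areas far enough from it (low overlap) survive as
    reference areas, and the most similar reference area is chosen next.

    preliminary: list of ((x, y, w, h), similarity) pairs.
    """
    def iou(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
        iy = max(0, min(ay + ah, by + bh) - max(ay, by))
        inter = ix * iy
        union = aw * ah + bw * bh - inter
        return inter / union if union else 0.0

    remaining = sorted(preliminary, key=lambda c: c[1], reverse=True)
    finals = []
    while remaining:
        best = remaining.pop(0)  # highest-similarity preliminary area
        finals.append(best)
        # Reference areas: far enough (low overlap) from the chosen candidate.
        remaining = [c for c in remaining
                     if 1.0 - iou(best[0], c[0]) > min_distance]
    return finals
```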
11. An object detection device, comprising:
a determining unit, configured to determine at least one target object to be detected in an image to be detected, and to acquire a target template image corresponding to each target object;
a complexity determining unit, configured to determine the image complexity of each target template image based on pixel value differences between pixels in the target template image;
a threshold determining unit, configured to determine a detection threshold corresponding to each target object according to the image complexity of each target template image;
an identification unit, configured to identify the image to be detected according to the target template image to obtain a plurality of candidate object detection areas of each target object;
a preliminary selection determining unit, configured to determine, for each target object, at least one preliminary object detection area of the target object from the candidate object detection areas according to the similarity between the candidate object detection areas and the target template image and the detection threshold corresponding to the target object; and
a target determining unit, configured to determine at least one target object detection area of the target object from the preliminary object detection areas of the target object, to obtain at least one target object detection area of each target object.
12. An electronic device comprising a memory and a processor; the memory stores an application program, and the processor is configured to execute the application program in the memory to perform the operations of the object detection method according to any one of claims 1 to 10.
13. A storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the object detection method of any one of claims 1 to 10.
CN202110081738.5A 2021-01-21 Target detection method and device, electronic equipment and storage medium Active CN112734747B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110081738.5A CN112734747B (en) 2021-01-21 Target detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110081738.5A CN112734747B (en) 2021-01-21 Target detection method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112734747A true CN112734747A (en) 2021-04-30
CN112734747B CN112734747B (en) 2024-06-25

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130336575A1 (en) * 2012-06-13 2013-12-19 Applied Materials Israel Ltd. System, method and computer program product for detection of defects within inspection images
CN110148147A (en) * 2018-11-07 2019-08-20 腾讯大地通途(北京)科技有限公司 Image detecting method, device, storage medium and electronic device
CN110796157A (en) * 2019-08-29 2020-02-14 腾讯科技(深圳)有限公司 Image difference identification method and device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MIHAILS PUDZS et al.: "Rotation-invariant object detection using complex matched filters and second order vector fields", 2014 22nd European Signal Processing Conference (EUSIPCO), 13 November 2014, pages 1312-1316 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326856A (en) * 2021-08-03 2021-08-31 电子科技大学 Self-adaptive two-stage feature point matching method based on matching difficulty
CN115228092A (en) * 2022-09-22 2022-10-25 腾讯科技(深圳)有限公司 Game battle force evaluation method, device and computer readable storage medium
CN115228092B (en) * 2022-09-22 2022-12-23 腾讯科技(深圳)有限公司 Game battle force evaluation method, device and computer readable storage medium

Similar Documents

Publication Publication Date Title
CN110532984B (en) Key point detection method, gesture recognition method, device and system
CN110163198B (en) Table identification reconstruction method and device and storage medium
EP3916627A1 (en) Living body detection method based on facial recognition, and electronic device and storage medium
CN107301402B (en) Method, device, medium and equipment for determining key frame of real scene
CN110555481A (en) Portrait style identification method and device and computer readable storage medium
CN108596102B (en) RGB-D-based indoor scene object segmentation classifier construction method
CN111754396B (en) Face image processing method, device, computer equipment and storage medium
CN106951870B (en) Intelligent detection and early warning method for active visual attention of significant events of surveillance video
CN111652974B (en) Method, device, equipment and storage medium for constructing three-dimensional face model
CN110807362A (en) Image detection method and device and computer readable storage medium
CN108681711A (en) A kind of natural landmark extracting method towards mobile robot
US20230326173A1 (en) Image processing method and apparatus, and computer-readable storage medium
CN109492576A (en) Image-recognizing method, device and electronic equipment
CN112287802A (en) Face image detection method, system, storage medium and equipment
CN112329851A (en) Icon detection method and device and computer readable storage medium
CN113065379B (en) Image detection method and device integrating image quality and electronic equipment
CN109740527B (en) Image processing method in video frame
Nousias et al. A saliency aware CNN-based 3D model simplification and compression framework for remote inspection of heritage sites
CN112883827B (en) Method and device for identifying specified target in image, electronic equipment and storage medium
CN117557784B (en) Target detection method, target detection device, electronic equipment and storage medium
CN111382638A (en) Image detection method, device, equipment and storage medium
CN115689882A (en) Image processing method and device and computer readable storage medium
CN113591433A (en) Text typesetting method and device, storage medium and computer equipment
CN112734747B (en) Target detection method and device, electronic equipment and storage medium
CN114549809A (en) Gesture recognition method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40042646
Country of ref document: HK

GR01 Patent grant