CN112668577A - Method, terminal and device for detecting target object in large-scale image - Google Patents

Method, terminal and device for detecting target object in large-scale image

Info

Publication number
CN112668577A
Authority
CN
China
Prior art keywords
image
target
key
detection
subgraph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011628816.0A
Other languages
Chinese (zh)
Inventor
Tao Dapeng (陶大鹏)
Wang Linfei (王林飞)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan United Visual Technology Co ltd
Original Assignee
Yunnan United Visual Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan United Visual Technology Co ltd filed Critical Yunnan United Visual Technology Co ltd
Priority to CN202011628816.0A priority Critical patent/CN112668577A/en
Publication of CN112668577A publication Critical patent/CN112668577A/en
Pending legal-status Critical Current

Abstract

The application is applicable to the technical field of image processing and provides a method, a terminal and a device for detecting a target object in a large-scale image. The method comprises the following steps: segmenting the target area image to obtain key subgraphs containing targets of interest; performing primary image recognition on the key subgraphs and acquiring the relative position of each target of interest within its key subgraph; aggregating, according to the relative positions, the first targets of interest located at subgraph edges to obtain an aggregated image; performing secondary image detection on the aggregated image to obtain a first detection object; and mapping the first detection object into the target area image to acquire the target object in the target area image. The scheme achieves accurate and efficient target detection on large-scale images.

Description

Method, terminal and device for detecting target object in large-scale image
Technical Field
The application belongs to the technical field of image processing, and particularly relates to a method, a terminal and a device for detecting a target object in a large-scale image.
Background
Deep learning has achieved breakthroughs in the field of target detection, but detecting tiny objects in large-scale images remains a challenging problem.
For example, in most remote sensing application scenarios the image to be detected is oversized: a remote sensing image that has not been manually processed often comprises tens of millions or even hundreds of millions of pixels, while a target occupies only a few pixels of the picture, so a detector suited to small remote sensing images cannot work directly on a large one.
In the prior art there are methods that detect the target object by cutting the image, but in the common case the targets in a large-scale image are multi-scale: a small target may span only dozens of pixels while a large target may even exceed the size of a normal detection picture, so no single cutting ratio can satisfy the detection of both large and small targets.
Therefore, the large size of large-scale images and the multi-scale nature of the targets to be detected in them make it difficult for existing detection methods to achieve accurate and efficient target detection on large-scale images.
Disclosure of Invention
The embodiment of the application provides a method, a terminal and a device for detecting a target object in a large-scale image, and aims to solve the problem that accurate and efficient target detection is difficult to realize on the large-scale image by the existing detection method due to the large-size characteristic of the large-scale image and the multi-scale characteristic of a target to be detected in the large-scale image in the prior art.
A first aspect of an embodiment of the present application provides a method for detecting a target object in a large-scale image, including:
segmenting the target area image to obtain a key subgraph containing the interested target;
performing primary image recognition on the key subgraph, and acquiring the relative position of the target of interest in the key subgraph;
according to the relative position, aggregating the first interested target positioned at the edge of the key subgraph to obtain an aggregated image;
performing secondary image detection on the aggregated image to obtain a first detection object;
and mapping the first detection object to the target area image to acquire a target object in the target area image.
A second aspect of the embodiments of the present application provides a device for detecting a target object in a large-scale image, including:
the image segmentation module is used for segmenting the target area image to obtain a key subgraph containing the interested target;
the first image recognition module is used for carrying out primary image recognition on the key subgraph and acquiring the relative position of the target of interest in the key subgraph;
the image aggregation module is used for aggregating the first interested target positioned at the edge of the key subgraph according to the relative position to obtain an aggregated image;
the second image identification module is used for carrying out secondary image detection on the aggregated image to obtain a first detection object;
and the acquisition module is used for mapping the first detection object to the target area image and acquiring a target object in the target area image.
A third aspect of embodiments of the present application provides a terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method according to the first aspect when executing the computer program.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, in which a computer program is stored, which, when executed by a processor, performs the steps of the method according to the first aspect.
A fifth aspect of the present application provides a computer program product, which, when run on a terminal, causes the terminal to perform the steps of the method of the first aspect described above.
As can be seen from the above, in the embodiment of the present application, a key subgraph containing a target of interest is obtained by segmenting the target area image; primary image recognition is performed on the key subgraph to acquire the relative position of the target of interest within it; the first targets of interest located at subgraph edges are aggregated according to these relative positions to obtain an aggregated image; secondary image detection is performed on the aggregated image to obtain a first detection object; and the first detection object is mapped into the target area image to obtain the target object. The process identifies the targets of interest through primary image recognition, aggregates the identified targets, and performs secondary detection on the aggregated image, which ensures both the detection accuracy and the integrity of the detected target object, achieving accurate and efficient target detection on large-scale images.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
Fig. 1 is a first flowchart of a method for detecting a target object in a large-scale image according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a key subgraph screening process provided in the embodiment of the present application;
FIG. 3 is a schematic diagram of a process for detecting a target object in a large-scale image according to an embodiment of the present disclosure;
fig. 4 is a second flowchart of a method for detecting a target object in a large-scale image according to an embodiment of the present application;
fig. 5 is a third flowchart of a method for detecting a target object in a large-scale image according to an embodiment of the present application;
fig. 6 is a structural diagram of a device for detecting a target object in a large-scale image according to an embodiment of the present application;
fig. 7 is a block diagram of a terminal according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
In particular implementations, the terminals described in the embodiments of the present application include, but are not limited to, portable devices such as mobile phones, laptop computers, or tablet computers having touch-sensitive surfaces (e.g., touch screen displays and/or touch pads). It should also be understood that in some embodiments the device is not a portable communication device but a desktop computer having a touch-sensitive surface (e.g., a touch screen display and/or touchpad).
In the discussion that follows, a terminal that includes a display and a touch-sensitive surface is described. However, it should be understood that the terminal may include one or more other physical user interface devices such as a physical keyboard, mouse, and/or joystick.
The terminal supports various applications, such as one or more of the following: a drawing application, a presentation application, a word processing application, a website creation application, a disc burning application, a spreadsheet application, a gaming application, a telephone application, a video conferencing application, an email application, an instant messaging application, an exercise support application, a photo management application, a digital camera application, a web browsing application, a digital music player application, and/or a digital video player application.
Various applications that may be executed on the terminal may use at least one common physical user interface device, such as a touch-sensitive surface. One or more functions of the touch-sensitive surface and corresponding information displayed on the terminal can be adjusted and/or changed between applications and/or within respective applications. In this way, a common physical architecture (e.g., touch-sensitive surface) of the terminal can support various applications with user interfaces that are intuitive and transparent to the user.
It should be understood that the sequence numbers of the steps in this embodiment do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
Referring to fig. 1, fig. 1 is a first flowchart of a method for detecting a target object in a large-scale image according to an embodiment of the present application. As shown in fig. 1, a method for detecting a target object in a large-scale image includes the following steps:
step 101, segmenting the target area image to obtain a key subgraph containing the interested target.
The target area image is an image obtained by shooting a target area, and the image is specifically a remote sensing image.
When target detection is performed on the target area image, the image needs to be cut into several small subgraphs, which may share the same size. The subgraphs may include overlapping image portions, using a fixed overlap ratio to further ensure the integrity of the targets to be detected; alternatively, they may include no overlap, in which case the integrity of the targets is ensured by the subsequent clustering and secondary detection of the regions of interest. The subgraphs comprise key subgraphs, which contain a target of interest, and non-key subgraphs, which do not. The result of the image segmentation therefore needs to be screened to obtain the key subgraphs, so that subsequent target detection can be based on them.
The image segmentation process in the above step may be implemented by selecting a sliding window of size L × L, setting an overlap ratio λ ∈ [0, 1), and sliding the window with that overlap ratio to obtain the subgraph set X = {x1, x2, …, xN}, in which xi ∈ R^(L×L). The number of subgraphs obtained is then:

N = ⌈(H − λL) / ((1 − λ)L)⌉ × ⌈(W − λL) / ((1 − λ)L)⌉

where H and W are the length and width of the target area image, respectively.
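As an illustration only (not part of the patent text), a minimal Python sketch of this sliding-window segmentation might look as follows; the function name, the NumPy image representation, and the clamping of the last window to the image border are assumptions:

```python
import numpy as np

def split_into_subgraphs(image, L, lam=0.0):
    """Cut `image` into L x L subgraphs with overlap ratio lam in [0, 1).

    Returns the subgraphs together with the (x, y) offset of each one in the
    original image, so that detections can later be mapped back.
    """
    assert 0.0 <= lam < 1.0
    stride = max(1, int((1 - lam) * L))
    H, W = image.shape[:2]
    # Top-left corners; the last window is clamped to stay inside the image.
    ys = sorted(set(list(range(0, max(H - L, 0) + 1, stride)) + [max(H - L, 0)]))
    xs = sorted(set(list(range(0, max(W - L, 0) + 1, stride)) + [max(W - L, 0)]))
    subgraphs, offsets = [], []
    for y in ys:
        for x in xs:
            subgraphs.append(image[y:y + L, x:x + L])
            offsets.append((x, y))
    return subgraphs, offsets
```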
The object of interest is specifically image content with a high possibility of being determined as a target detection object.
Specifically, as an optional implementation manner, the segmenting the target region image to obtain a key sub-image containing the target of interest includes:
segmenting the target area image to obtain a set number of sub-images; adopting a two-classification network model to respectively judge whether the sub-graph contains the interested target or not and obtain the confidence of the judgment result; and selecting the subgraph with the confidence level within a set range from the subgraphs as a key subgraph containing the target of interest.
Referring to fig. 2, a target area image is obtained, specifically a remote sensing image of a seaport on which ships are distributed. When detecting the ships in the image, the image is cut into subgraphs to obtain a subgraph set X. The subgraphs in X are input into the key subgraph retention module; specifically, image processing is performed by the binary classification network model, whose feature maps undergo pooling, full connection and Softmax regression before an image tag is output. Subgraphs whose confidence lies within the set range are marked as containing a target of interest, finally yielding the key subgraph set Xs, i.e. the screened subgraphs that contain ships.
Specifically, in the key subgraph retention module, a binary classification network model is designed to judge whether a subgraph contains ships, thereby obtaining the key subgraphs; a suitable classification function is selected to produce the subgraph classification confidence, with the expression:

f(x), f: R^(L×L) → R;

In this expression the input is a real matrix of size L × L (the domain) and the output is a real number (the range): the confidence f(x), which is computed by the convolutional neural network of the binary classification model.
A subgraph containing a ship is defined as a key subgraph xs, and the remaining subgraphs are non-key subgraphs. The set formed by the key subgraphs is:

Xs = {xs ∈ X | f(xs) > ξ};

The key subgraph retention module judges, through the binary classification network, the probability that each subgraph is a key subgraph (containing a target of interest) or a non-key subgraph (not containing one); when the network outputs a confidence f(xs) > ξ for a picture, that subgraph is considered a key subgraph.
Here ξ is a threshold used to judge whether a target of interest exists in the subgraph; a smaller ξ ensures a higher recall rate. The number of key subgraphs is Ns = |Xs|.
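A minimal sketch of this retention step, assuming a callable `f` standing in for the binary classification network and an illustrative default threshold ξ:

```python
def retain_key_subgraphs(subgraphs, offsets, f, xi=0.5):
    """Keep only the subgraphs whose classification confidence f(x) exceeds xi.

    `f` maps an L x L subgraph to a confidence that it contains a target of
    interest; a smaller xi favours recall, as the text notes.
    """
    kept = [(x, off) for x, off in zip(subgraphs, offsets) if f(x) > xi]
    X_s = [x for x, _ in kept]              # key subgraph set Xs
    key_offsets = [off for _, off in kept]  # positions in the target area image
    return X_s, key_offsets
```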
Specifically, in this embodiment, judging a key subgraph only requires determining that it contains a target of interest. After image cutting, the target of interest may be a complete ship or an incomplete ship present in the subgraph; that is, a target of interest may be a complete target detection object or only part of one.
The set number is a numerical value determined in advance based on the size of the target area image; the set range is an empirical numerical range for determining whether the confidence level meets the requirement.
Step 102, performing primary image recognition on the key subgraph to acquire the relative position of the target of interest in the key subgraph.
The relative position specifically refers to where the target of interest lies within the key subgraph: for example, an identified target of interest may be located in the middle of the image, at an edge of the image, or toward its upper, lower, left or right side, and so on.
Here, as shown in fig. 3, after the input image is acquired and the detection region generation processing is performed on the input image to obtain a key sub-image, image recognition needs to be performed on the key sub-image, which may be specifically implemented by a hierarchical detector.
In this step, a strong detector is used as the first-level detector of the hierarchical detector to perform target detection on the key subgraph; when the key subgraph undergoes image recognition, the target detection frames corresponding to the targets of interest in it are output.
Step 103, aggregating the first targets of interest located at the edges of the key subgraphs according to the relative positions to obtain an aggregated image.
A first target of interest located at a subgraph edge is image content whose pixels include pixels on the subgraph edge; that is, the first target of interest wholly or partly covers the subgraph edge.
Specifically, when the target of interest in one key subgraph is the hull portion of a ship, the bow and stern portions connected to that hull lie in other key subgraphs, so the hull portion is an incomplete detection object. In that case the hull's pixels inevitably reach the edge of the subgraph, and this characteristic confirms that the target of interest (namely the hull portion) in the current key subgraph is an incomplete target detection object. The first targets of interest located at subgraph edges therefore need to be aggregated, so that the incomplete targets of interest in the cut key subgraphs can be integrated into a complete image area. During the subsequent secondary image detection, accurate and complete detection of the target object in the large-scale target area image is achieved simply by detecting the content of this integrated complete image area. In addition, this process means that the integrity of the targets need not be considered when segmenting the target area image, so the cutting overlap ratio can be very small or even 0, which lowers the overlap rate during image segmentation, reduces image data redundancy during content detection, and avoids a large amount of repeated detection work.
A second target of interest that is not located at a subgraph edge is regarded as a complete detection object and does not need to be aggregated.
Aggregation here refers to combining the detected targets of interest according to a set index, specifically the relative distance between different targets of interest. When the first targets of interest located at subgraph edges are aggregated, targets whose relative distances fall within a set distance range are divided into one group, and that group is combined into a single image (specifically, an image containing the group is cut from the target area image, or the subgraph edge areas where the group's targets lie are spliced); the resulting image is the aggregated image.
In a specific implementation, to aggregate the targets of interest, the image positions of the first targets of interest in the target area image can be obtained from their relative positions in the key subgraphs; the relative distances between the first targets of interest are then derived from these image positions, and the targets are grouped accordingly. Specifically, targets whose relative distances fall within the set range, or within a certain distribution density range, are gathered into one group of target objects; the image area where the group lies is cut from the target area image, or the subgraph edge areas where the group's targets lie are spliced, to obtain an aggregated image. When the targets of interest fall into multiple groups based on their relative distances, multiple aggregated images are obtained, as the grouping sketch below illustrates.
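An illustrative sketch of the distance-based grouping just described (a greedy grouping under an assumed `max_dist` parameter; the patent's clustering module uses mean shift, sketched further below):

```python
def group_edge_targets(centers, max_dist):
    """Gather edge targets whose centers lie within max_dist of an existing group.

    `centers` are detection-frame centers already mapped into target-area-image
    coordinates. Greedy single pass: groups are not re-merged afterwards.
    """
    groups = []
    for c in centers:
        for g in groups:
            if any((c[0] - q[0]) ** 2 + (c[1] - q[1]) ** 2 <= max_dist ** 2 for q in g):
                g.append(c)
                break
        else:                  # no existing group close enough: start a new one
            groups.append([c])
    return groups
```

Each resulting group corresponds to one aggregated image.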
The aggregation of the targets of interest can be realized by means of a clustering algorithm: a primary clustering pass is applied to all targets of interest located at subgraph edges. As shown in fig. 3, the clustering algorithm can be executed by the clustering module in the hierarchical detector, which aggregates the incomplete target detection frames corresponding to the first targets of interest at the edges of adjacent key subgraphs and synthesizes a new detection area; the relative distances between these first targets of interest in adjacent key subgraphs lie within the target distance range, several aggregated images are obtained, and the aggregation of the first targets of interest at subgraph edges is realized. Alternatively, all targets of interest located at subgraph edges may be aggregated directly according to their relative distances, or through a set distance range parameter.
Step 104, performing secondary image detection on the aggregated image to obtain a first detection object.
This step performs image detection on the aggregated, now complete targets of interest, obtaining the corresponding detection objects and realizing accurate and complete detection of the target object in the large-scale target area image.
In a specific implementation, as shown in fig. 3, a weak detector may be used as a secondary detector in the hierarchical detector to perform secondary detection on the synthesized detection region.
In the above steps, the hierarchical detector first receives the key subgraphs input by the key subgraph retention module and detects the first-level target objects; it then introduces the clustering algorithm to implement second-stage cascaded detection, connecting the two levels of detection networks in series through clustering to guarantee detection precision. This not only filters out a large number of useless subgraphs and accelerates the detection process, but also removes the influence of the subgraph cutting overlap rate on detection performance, ensuring both detection precision and speed.
Step 105, mapping the first detection object into the target area image and acquiring the target object in the target area image.
After the first detection object is obtained, it can be mapped back to the original image (i.e., the target area image) to obtain the target object in the target area image, completing the detection of the target object.
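Mapping a detection back to the original image amounts to adding the crop's top-left offset in the target area image to the box coordinates; a minimal sketch under an assumed (x1, y1, x2, y2) box convention:

```python
def map_to_original(box, offset):
    """Translate a box from subgraph (or aggregated-image) coordinates into
    target-area-image coordinates, given the crop's top-left offset."""
    x0, y0 = offset
    x1, y1, x2, y2 = box
    return (x1 + x0, y1 + y0, x2 + x0, y2 + y0)
```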
As an optional implementation manner, referring to fig. 4, after performing the primary image recognition on the key sub-image and acquiring the relative position of the object of interest in the key sub-image, the detection method further includes:
and step 106, acquiring a second interested target which is not positioned at the edge of the key subgraph according to the relative position, and taking the second interested target as a second detection object.
That is, the key subgraph is multiple, and the positions of the objects of interest contained in different key subgraphs in the graph are different. For the interested target which is not located at the edge of the sub-image, the interested target can be regarded as a complete detection object.
The first interested target which is not located at the edge of the sub-image is the image content of the pixel points which does not include the pixel points at the edge of the sub-image. I.e. no part of the first object of interest that covers the edge of the sub-graph is present.
Correspondingly, the step 105 of mapping the first detection object into the target area image to obtain the target object in the target area image further includes:
and mapping the first detection object and the second detection object to a target area image to acquire a target object in the target area image.
That is, the complete detection objects directly recognized by the primary image recognition and the complete detection objects recognized through aggregation and secondary image detection are mapped together to the original image (i.e., the target area image) to obtain the target object in the target area image, thereby realizing the detection of the target object.
Specifically, the mapping the first detection object and the second detection object to the target area image to obtain the target object in the target area image includes:
carrying out duplicate removal processing on the first detection object and the second detection object to obtain a target detection object obtained after processing; and mapping the target detection object into the target area image to obtain a target object in the target area image.
The detection results of the first detection object and the second detection object may overlap, so deduplication processing is performed on them; specifically, a non-maximum suppression strategy may be used. The deduplicated target detection objects are then mapped to the original image to obtain the target object in the target area image, realizing the detection of the target object in the target area image.
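A standard non-maximum suppression routine of the kind this deduplication step may rely on (a sketch; the IoU threshold is an assumed parameter, not fixed by the patent):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Deduplicate overlapping detections by non-maximum suppression.

    boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) confidences.
    Returns the indices of the boxes that are kept.
    """
    boxes = np.asarray(boxes, dtype=float)
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]   # highest-scoring boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the best box with the remaining candidates.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter + 1e-9)
        order = order[1:][iou <= iou_thresh]  # drop near-duplicates
    return keep
```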
In the embodiment of the application, a key subgraph containing a target of interest is obtained by segmenting the target area image; primary image recognition is performed on the key subgraph to acquire the relative position of the target of interest within it; the first targets of interest located at subgraph edges are aggregated according to these relative positions to obtain an aggregated image; secondary image detection is performed on the aggregated image to obtain a first detection object; and the first detection object is mapped into the target area image to obtain the target object. The process identifies the targets of interest through primary image recognition, aggregates the identified targets, and performs secondary detection on the aggregated image, thereby ensuring the detection accuracy and the integrity of the detected target object and realizing accurate and efficient target detection on large-scale images.
The embodiment of the application also provides different implementation modes of the detection method of the target object in the large-scale image.
Referring to fig. 5, fig. 5 is a third flowchart of a method for detecting a target object in a large-scale image according to an embodiment of the present application. As shown in fig. 5, a method for detecting a target object in a large-scale image includes the following steps:
step 501, segmenting the target area image to obtain a key subgraph containing the interested target.
The implementation process of this step is the same as that of step 101 in the foregoing embodiment, and is not described here again.
Step 502, performing primary image detection on the key subgraph to obtain detection frames in which the targets of interest are framed.
Here the primary image detection uses a strong detector and outputs the target detection frames of the key subgraph; each frame encloses a detection result, namely a detected target of interest.
In the embodiment of the application, whether a target frame lies at the subgraph edge is judged from its position, thereby determining whether it is a target detection frame at the edge of the subgraph.
Step 503, obtaining the center coordinates of the detection frame and the size of the frame body.
The frame dimensions specifically include frame length and frame width.
Step 504, according to the center coordinate and the size, a relative distance between the detection frame and the edge of the key sub-image is determined.
This step judges the relative distance between the detection frame and the edge of the key subgraph, thereby determining whether the target of interest in the current key subgraph is at the subgraph edge.
Step 505, determining the relative position of the target of interest in the key subgraph according to the relative distance.
Specifically, here, the relative distance between the detection frame and the edge of the key sub-image is the relative distance between the object of interest and the edge of the key sub-image.
And judging whether the current interested target is at the edge position of the key subgraph or not according to the relative distance.
Specifically, when the relative distance is 0, it is determined that the target of interest is located at the edge of the key subgraph and is an incomplete detection object; when the relative distance is greater than 0, it is determined that the target of interest is not located at the edge of the key subgraph and is a complete detection object. A sketch of this edge judgment follows.
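Judging whether a detection frame lies at the subgraph edge reduces to checking its minimum distance to the L × L subgraph border; a sketch (the `margin` tolerance is an assumption, not from the patent):

```python
def is_edge_target(box, L, margin=0):
    """Return True if a detection frame touches the border of an L x L subgraph.

    box is (x1, y1, x2, y2) in subgraph coordinates. A relative distance of 0
    (within `margin` pixels) marks the target as cut off and thus incomplete.
    """
    x1, y1, x2, y2 = box
    return min(x1, y1, L - x2, L - y2) <= margin
```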
Step 506, according to the relative position and the central coordinate of the detection frame, aggregating the first interested targets positioned at the adjacent edges between the sub-images in the key sub-image to obtain an aggregated central point coordinate.
During the specific implementation of this step, mean shift iteration of the center coordinates is performed according to the relative positions and the center coordinates of the detection frames, using the following formulas:

m(Ci^t) = [ Σ_{Cj ∈ N(Ci^t)} K(Cj − Ci^t) · Cj ] / [ Σ_{Cj ∈ N(Ci^t)} K(Cj − Ci^t) ] − Ci^t;

Ci^(t+1) = Ci^t + m(Ci^t);

wherein Ci is the center coordinate of the detection frame i corresponding to the first target of interest; N is the preset image aggregation area, i.e. the neighborhood of centers considered in each update; K is the kernel function; m is the mean shift vector of the clustering algorithm; t is the number of iterations. When

‖ m(Ci^t) ‖ < e,

the iteration is stopped; e is a set value.
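A minimal sketch of this mean shift iteration on detection-frame centers, in Python with NumPy (a flat kernel over the aggregation radius is assumed; the patent does not fix the kernel K):

```python
import numpy as np

def mean_shift_centers(centers, bandwidth, e=1e-3, max_iter=100):
    """Shift each detection-frame center toward its local density peak.

    Flat kernel within `bandwidth` (the preset aggregation area N); iteration
    for a center stops when the mean shift vector m is shorter than e.
    Centers that converge to (nearly) the same point form one cluster.
    """
    pts = np.asarray(centers, dtype=float)
    shifted = pts.copy()
    for i in range(len(shifted)):
        c = shifted[i]
        for _ in range(max_iter):
            d = np.linalg.norm(pts - c, axis=1)
            nbr = pts[d <= bandwidth]      # N(c): neighbours under the flat kernel
            if nbr.size == 0:
                break
            m = nbr.mean(axis=0) - c       # mean shift vector m(c)
            c = c + m
            if np.linalg.norm(m) < e:      # stop when ||m|| < e
                break
        shifted[i] = c
    return shifted
```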
Step 507, determining the size of the aggregated image area according to the size of the detection frame corresponding to the first interested target.
Step 508, cutting the target area image according to the aggregated center point coordinates and the size of the aggregated image area to obtain the aggregated image.
The clustering module clusters the target detection frames at the edges of adjacent key subgraphs to synthesize new detection areas. Specifically, meanshift clustering is performed on all target frames located at subgraph edges to obtain several clustering regions whose number of categories is variable, thereby generating the regions to be detected and realizing the aggregation of the first targets of interest located at adjacent edges between subgraphs.
In this step, the generation of the aggregated image is done with the aid of the original target area image: given the aggregated center point coordinates and the size of the aggregated image area, the aggregated image is cut from the target area image, as sketched below.
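A sketch of cutting the aggregated image out of the target area image, given a converged cluster center and an area size derived from the cluster's detection frames (names and the clamping behaviour are assumptions):

```python
def crop_aggregated_image(image, center, size):
    """Cut the aggregated region from the target area image.

    `center` is the converged cluster center (cx, cy); `size` is (w, h), e.g.
    the bounding extent of the cluster's detection frames. The crop is clamped
    to the image bounds; the returned offset allows mapping detections back.
    """
    H, W = image.shape[:2]
    cx, cy = center
    w, h = size
    x1 = max(0, int(cx - w / 2)); y1 = max(0, int(cy - h / 2))
    x2 = min(W, int(cx + w / 2)); y2 = min(H, int(cy + h / 2))
    return image[y1:y2, x1:x2], (x1, y1)
```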
Through the clustering operation, the incomplete targets in one region are detected in a single pass. The first-stage detector detects the global key subgraphs to realize basic detection; at this stage the pictures are numerous and of consistent size, so the detection load is large. The second-stage detector detects the clustered regions of incomplete targets, which improves detection precision; here the pictures are fewer and the clustered images vary in size, so the detection load is small. Finally, the results of the two detectors are filtered by the non-maximum suppression method and mapped back to the original image.
Step 509, performing secondary image detection on the aggregate image to obtain a first detection object.
The implementation process of this step is the same as that of step 104 in the foregoing embodiment, and is not described here again.
Step 510, mapping the first detection object to the target area image, and acquiring a target object in the target area image.
The implementation process of this step is the same as that of step 105 in the foregoing embodiment, and is not described here again.
In the embodiment of the application, a key subgraph containing a target of interest is obtained by segmenting the target area image, and primary image recognition is performed on the key subgraph; the relative distance between each detection frame and the edge of the key subgraph is judged to determine the relative position of the target of interest within the subgraph. The first targets of interest located at adjacent edges between subgraphs are aggregated according to the relative positions and the center coordinates of the detection frames to obtain aggregated center point coordinates; the size of the aggregated image area is determined from the sizes of the detection frames corresponding to the first targets of interest; the target area image is then cut according to the aggregated center point coordinates and the aggregated image area size to obtain the aggregated image. Secondary image detection is performed on the aggregated image to obtain a first detection object, which is mapped into the target area image to obtain the target object. The process identifies the targets of interest through primary image recognition, aggregates the identified targets, and performs secondary detection on the aggregated image, thereby ensuring the detection accuracy and the integrity of the detected target object and realizing accurate and efficient target detection on large-scale images.
Referring to fig. 6, fig. 6 is a structural diagram of a device for detecting a target object in a large-scale image according to an embodiment of the present application, and for convenience of description, only a part related to the embodiment of the present application is shown.
The apparatus 600 for detecting a target object in a large-scale image includes:
the image segmentation module 601 is configured to segment an image of a target region to obtain a key sub-image including an interested target;
a first image recognition module 602, configured to perform primary image recognition on the key sub-image, and acquire a relative position of the target of interest in the key sub-image;
the image aggregation module 603 is configured to aggregate the first interest target located at the edge of the sub-image in the key sub-image according to the relative position to obtain an aggregated image;
the second image recognition module 604 is configured to perform secondary image detection on the aggregate image to obtain a first detection object;
an obtaining module 605, configured to map the first detection object into the target area image, and obtain a target object in the target area image.
The image segmentation module 601 is specifically configured to:
segmenting the target area image to obtain a set number of sub-images;
adopting a two-classification network model to respectively judge whether the sub-graph contains the interested target, and obtaining the confidence of the judgment result;
and selecting the subgraph with the confidence coefficient within a set range from the subgraphs as a key subgraph containing the target of interest.
The first image recognition module 602 is specifically configured to:
performing primary image detection on the key subgraph to obtain a detection frame of the interesting target selected by the frame;
acquiring the central coordinate of the detection frame and the size of the frame body;
judging the relative distance between the detection frame and the edge of the key subgraph according to the central coordinate and the size;
and determining the relative position of the target of interest in the key subgraph according to the relative distance.
The image aggregation module 603 is specifically configured to:
according to the relative position and the central coordinate of the detection frame, aggregating the first interested targets positioned at the adjacent edges between the sub-images in the key sub-image to obtain an aggregated central point coordinate;
determining the size of the aggregated image area according to the size of the detection frame corresponding to the first interested target;
and according to the coordinates of the aggregation central point and the size of the aggregated image area, cutting the target area image to obtain the aggregated image.
The device also includes:
the detection object acquisition module is used for acquiring a second interested target which is not positioned at the edge of the key subgraph in the key subgraph according to the relative position and taking the second interested target as a second detection object;
correspondingly, the obtaining module 605 is further configured to:
and mapping the first detection object and the second detection object to the target area image to acquire the target object in the target area image.
The obtaining module 605 is specifically configured to:
carrying out duplicate removal processing on the first detection object and the second detection object to obtain a target detection object obtained after processing;
and mapping the target detection object to the target area image to acquire the target object in the target area image.
The device for detecting a target object in a large-scale image provided by the embodiment of the application can realize each process of the embodiments of the method for detecting a target object in a large-scale image and can achieve the same technical effect; to avoid repetition, the details are not described again here.
Fig. 7 is a block diagram of a terminal according to an embodiment of the present application. As shown in the figure, the terminal 7 of this embodiment includes: at least one processor 70 (only one shown in fig. 7), a memory 71, and a computer program 72 stored in the memory 71 and executable on the at least one processor 70, the processor 70 implementing the steps in any of the various method embodiments described above when executing the computer program 72.
The terminal 7 may be a desktop computer, a notebook, a palm computer, a cloud server, or another computing device. The terminal 7 may include, but is not limited to, a processor 70 and a memory 71. It will be appreciated by those skilled in the art that fig. 7 is only an example of the terminal 7 and does not constitute a limitation of it; the terminal may comprise more or fewer components than shown, combine certain components, or use different components; for example, it may further comprise input/output devices, network access devices, buses, etc.
The Processor 70 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 71 may be an internal storage unit of the terminal 7, such as a hard disk or a memory of the terminal 7. The memory 71 may also be an external storage device of the terminal 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) and the like provided on the terminal 7. Further, the memory 71 may also include both an internal storage unit and an external storage device of the terminal 7. The memory 71 is used for storing the computer program and other programs and data required by the terminal. The memory 71 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal and method may be implemented in other ways. For example, the above-described apparatus/terminal embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow in the method of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and can realize the steps of the embodiments of the methods described above when the computer program is executed by a processor. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
The present application realizes all or part of the processes in the method of the above embodiments, and may also be implemented by a computer program product, when the computer program product runs on a terminal, the steps in the above method embodiments may be implemented when the terminal executes the computer program product.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method for detecting a target object in a large-scale image is characterized by comprising the following steps:
segmenting the target area image to obtain a key subgraph containing the interested target;
performing primary image recognition on the key subgraph, and acquiring the relative position of the target of interest in the key subgraph;
according to the relative position, aggregating the first interested target positioned at the edge of the key subgraph to obtain an aggregated image;
performing secondary image detection on the aggregated image to obtain a first detection object;
and mapping the first detection object to the target area image to acquire a target object in the target area image.
2. The detection method according to claim 1, wherein the segmenting the target region image to obtain a key sub-image containing the target of interest comprises:
segmenting the target area image to obtain a set number of sub-images;
adopting a two-classification network model to respectively judge whether the sub-graph contains the interested target, and obtaining the confidence of the judgment result;
and selecting the subgraph with the confidence coefficient within a set range from the subgraphs as a key subgraph containing the target of interest.
3. The detection method according to claim 1, wherein the performing primary image recognition on the key sub-image to obtain the relative position of the object of interest in the key sub-image comprises:
performing primary image detection on the key subgraph to obtain a detection frame of the interesting target selected by the frame;
acquiring the central coordinate of the detection frame and the size of the frame body;
judging the relative distance between the detection frame and the edge of the key subgraph according to the central coordinate and the size;
and determining the relative position of the target of interest in the key subgraph according to the relative distance.
4. The detection method according to claim 3, wherein the aggregating the first interest objects located at the sub-graph edges in the key sub-graph according to the relative positions to obtain an aggregated image comprises:
according to the relative position and the central coordinate of the detection frame, aggregating the first interested targets positioned at the adjacent edges between the sub-images in the key sub-image to obtain an aggregated central point coordinate;
determining the size of the aggregated image area according to the size of the detection frame corresponding to the first interested target;
and according to the coordinates of the aggregation central point and the size of the aggregated image area, cutting the target area image to obtain the aggregated image.
5. The detection method according to claim 1, wherein after the primary image recognition of the key sub-image and the obtaining of the relative position of the object of interest in the key sub-image, further comprising:
according to the relative position, acquiring a second interested target which is not positioned at the edge of the key subgraph in the key subgraph, and taking the second interested target as a second detection object;
the mapping the first detection object to the target area image to obtain the target object in the target area image further includes:
and mapping the first detection object and the second detection object to the target area image to acquire the target object in the target area image.
6. The detection method according to claim 5, wherein the mapping the first detection object and the second detection object into the target area image to obtain the target object in the target area image comprises:
carrying out duplicate removal processing on the first detection object and the second detection object to obtain a target detection object obtained after processing;
and mapping the target detection object to the target area image to acquire the target object in the target area image.
7. An apparatus for detecting a target object in a large-scale image, comprising:
the image segmentation module is used for segmenting the target area image to obtain a key subgraph containing the interested target;
the first image recognition module is used for carrying out primary image recognition on the key subgraph and acquiring the relative position of the target of interest in the key subgraph;
the image aggregation module is used for aggregating the first interested target positioned at the edge of the key subgraph according to the relative position to obtain an aggregated image;
the second image identification module is used for carrying out secondary image detection on the aggregated image to obtain a first detection object;
and the acquisition module is used for mapping the first detection object to the target area image and acquiring a target object in the target area image.
8. The detection apparatus according to claim 7, wherein the image segmentation module is specifically configured to:
segmenting the target area image to obtain a set number of sub-images;
adopting a two-classification network model to respectively judge whether the sub-graph contains the interested target, and obtaining the confidence of the judgment result;
and selecting the subgraph with the confidence coefficient within a set range from the subgraphs as a key subgraph containing the target of interest.
9. A terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN202011628816.0A 2020-12-30 2020-12-30 Method, terminal and device for detecting target object in large-scale image Pending CN112668577A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011628816.0A CN112668577A (en) 2020-12-30 2020-12-30 Method, terminal and device for detecting target object in large-scale image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011628816.0A CN112668577A (en) 2020-12-30 2020-12-30 Method, terminal and device for detecting target object in large-scale image

Publications (1)

Publication Number Publication Date
CN112668577A true CN112668577A (en) 2021-04-16

Family

ID=75412711

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011628816.0A Pending CN112668577A (en) 2020-12-30 2020-12-30 Method, terminal and device for detecting target object in large-scale image

Country Status (1)

Country Link
CN (1) CN112668577A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221895A (en) * 2021-05-31 2021-08-06 北京灵汐科技有限公司 Small target detection method, device, equipment and medium
CN113344198A (en) * 2021-06-09 2021-09-03 北京三快在线科技有限公司 Model training method and device
CN113344198B (en) * 2021-06-09 2022-08-26 北京三快在线科技有限公司 Model training method and device
CN113344957A (en) * 2021-07-19 2021-09-03 北京城市网邻信息技术有限公司 Image processing method, image processing apparatus, and non-transitory storage medium
CN113344957B (en) * 2021-07-19 2022-03-01 北京城市网邻信息技术有限公司 Image processing method, image processing apparatus, and non-transitory storage medium
CN115170967A (en) * 2022-07-22 2022-10-11 中科三清科技有限公司 Tea leaf picking method and device, electronic equipment and storage medium
CN115965872A (en) * 2022-07-22 2023-04-14 中科三清科技有限公司 Tea leaf picking method and device, electronic equipment and storage medium
CN115965872B (en) * 2022-07-22 2023-08-15 中科三清科技有限公司 Tea picking method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112668577A (en) Method, terminal and device for detecting target object in large-scale image
CN111144242B (en) Three-dimensional target detection method, device and terminal
CN108596944B (en) Method and device for extracting moving target and terminal equipment
TWI821671B (en) A method and device for positioning text areas
CN109345553B (en) Palm and key point detection method and device thereof, and terminal equipment
CN109215037B (en) Target image segmentation method and device and terminal equipment
CN110751218B (en) Image classification method, image classification device and terminal equipment
CN108564550B (en) Image processing method and device and terminal equipment
CN112102164A (en) Image processing method, device, terminal and storage medium
CN109657543B (en) People flow monitoring method and device and terminal equipment
CN114169381A (en) Image annotation method and device, terminal equipment and storage medium
CN116168038B (en) Image reproduction detection method and device, electronic equipment and storage medium
CN112990318A (en) Continuous learning method, device, terminal and storage medium
CN110163095B (en) Loop detection method, loop detection device and terminal equipment
CN112651953A (en) Image similarity calculation method and device, computer equipment and storage medium
CN115937167A (en) Method and device for detecting defects of battery pole piece ceramic and electronic equipment
CN108776959B (en) Image processing method and device and terminal equipment
CN110705653A (en) Image classification method, image classification device and terminal equipment
Yu et al. An effective method for figures and tables detection in academic literature
WO2023146470A2 (en) Dual-level model for segmentation
CN114360053A (en) Action recognition method, terminal and storage medium
US20240012966A1 (en) Method and system for providing a three-dimensional computer aided-design (cad) model in a cad environment
CN110544221B (en) Training method and device, rain removing method, terminal device and storage medium
CN113298122A (en) Target detection method and device and electronic equipment
CN112084364A (en) Object analysis method, local image search method, device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination