CN114049499A - Target object detection method, apparatus and storage medium for continuous contour - Google Patents

Target object detection method, apparatus and storage medium for continuous contour

Info

Publication number
CN114049499A
CN114049499A
Authority
CN
China
Prior art keywords
image
target
target area
target object
area image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111371848.1A
Other languages
Chinese (zh)
Inventor
He Tao (何涛)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang Black Shark Technology Co Ltd
Original Assignee
Nanchang Black Shark Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanchang Black Shark Technology Co Ltd filed Critical Nanchang Black Shark Technology Co Ltd
Priority to CN202111371848.1A
Publication of CN114049499A
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a target object detection method, device and storage medium for continuous contours, relating to the technical field of target detection and comprising the following steps: acquiring an image set containing a target object, and performing contour recognition and segmentation on each image in the image set to obtain region images; marking each region image with a category by means of a picture similarity recognition algorithm, to serve as training data; constructing a convolutional neural network model and training it to obtain a target model for classifying input images; acquiring an image to be recognized and processing it with an edge extraction algorithm to obtain a target area image of the image to be recognized and the position information of that target area image; and inputting the target area image into the target model, acquiring its category, and integrating the position information of the target area image to generate a detection result. This solves the problem that training-data labeling in the existing target detection process must be completed manually, which is time-consuming.

Description

Target object detection method, apparatus and storage medium for continuous contour
Technical Field
The present invention relates to the field of target detection technologies, and in particular, to a method, an apparatus, and a storage medium for detecting a target object with a continuous contour.
Background
Object detection, one of the basic tasks in the field of computer vision, locates a target object in an image and assigns it a corresponding label. At present, commonly used detection frameworks such as SSD, R-CNN and YOLO require the position annotations and classifications of the training data to be completed manually before a model is constructed and trained. Because recognition objects differ in characteristics and recognition-task requirements, the position of a recognized object can be output only after accurate manual annotation, which is time-consuming and burdensome.
Disclosure of Invention
In order to overcome these technical defects, the invention aims to provide a method, a device and a storage medium for detecting a target object with a continuous contour, which solve the problem that training-data labeling in the existing target detection process must be completed manually, making the task heavy and time-consuming.
The invention discloses a target object detection method for continuous contours, which comprises the following steps:
acquiring an image set containing a target object, and performing contour recognition and segmentation on each image in the image set by adopting an edge extraction algorithm to obtain a region image corresponding to each image;
adopting a picture similarity recognition algorithm to carry out category marking on each region image so as to obtain an image set with category labels as training data;
constructing a convolutional neural network model, and training the convolutional neural network model by adopting the training data to obtain a target model for classifying input images;
acquiring an image to be identified, and processing the image to be identified by adopting an edge extraction algorithm to acquire a target area image of the image to be identified and position information of the target area image;
and inputting the target area image into the target model, acquiring the category of the target area image, and combining it with the position information of the target area image to generate a detection result.
Preferably, after the image to be recognized is processed by using an edge extraction algorithm and before a target area image of the image to be recognized and position information of the target area image are obtained, the following steps are included:
processing the image to be identified by adopting an edge extraction algorithm to obtain a plurality of sub-area images and their position information;
splicing the sub-area images to obtain a target area image of the image to be identified and position information of the target area image;
and the position information of the target area image is a set of the position information of each sub-area image.
Preferably, the image similarity recognition algorithm is adopted to perform category marking on each region image to obtain an image set with category labels, and the method includes the following steps:
carrying out format processing on each area image, and calculating a gray average value;
comparing the gray value of each pixel in each region image with the gray average value to obtain a corresponding hash value for each region image;
and calculating a Hamming distance based on the hash value of each region image to obtain the similarity of each region image, and classifying and marking each region image according to the similarity to obtain a sample image set with a class label.
Preferably, the image similarity recognition algorithm is adopted to perform category marking on each region image to obtain an image set with category labels, and the method includes the following steps:
carrying out format processing on each area image, and calculating a discrete cosine transform matrix;
reducing the discrete cosine transform matrix and calculating the average value of the discrete cosine transform matrix;
calculating a hash value corresponding to each area image according to the average value of the discrete cosine transform matrix;
and calculating a Hamming distance based on the hash value of each region image to obtain the similarity of each region image, and classifying and marking each region image according to the similarity to obtain a sample image set with a class label.
Preferably, the acquiring a set of images containing a target object comprises the following:
carrying out video recording on a target object to obtain video data containing the target object at a plurality of positions;
and generating an image set containing the target object by framing the video data.
Preferably, training the convolutional neural network model using the training data to obtain a target model for classifying the input image includes:
inputting the training data into the convolutional neural network model, classifying each image in the training data, and generating an output;
and comparing the output with the class label of the training data, and reversely adjusting the convolutional neural network model until the training is completed to obtain a target model.
Preferably, the processing of the image to be recognized by using an edge extraction algorithm to obtain a target area image of the image to be recognized and position information of the target area image includes the following steps:
performing graying processing on the image to be identified and then performing filtering operation to obtain a first processed image;
obtaining a gradient image by using an edge detection operator based on the first processed image, and performing a binarization operation on the gradient image to obtain a second processed image;
performing an AND operation on the pixel positions in the second processed image, and then iteratively performing dilation, erosion and filling operations to generate a target area image of the image to be recognized;
and comparing the target area image with the image to be identified to generate the position information of the target area image.
Preferably, the edge extraction algorithm is executed by calling a preset edge extraction script, and the category marking of each region image by the picture similarity recognition algorithm is executed by calling a marking script.
The present invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above target object detection method when executing the computer program.
The present invention also provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the above-mentioned target object detection method.
After the technical scheme is adopted, compared with the prior art, the method has the following beneficial effects:
the method uses the edge detection algorithm to replace manual labeling, and adopts an automatic script to finish the classification of the region images extracted by the edge detection algorithm, so as to solve the problems that training data labeling needs to be finished manually in the existing target detection process, the task is heavy, and the consumed time is more. And constructing a CNN model to finish the training of the classified target, generating a target model for classifying the target area image in the image to be recognized, and outputting the position information of the target area image extracted by combining an edge detection algorithm to obtain a detection result.
Drawings
FIG. 1 is a flowchart of a first embodiment of a method, an apparatus and a storage medium for detecting a target object of a continuous contour according to the present invention;
FIG. 2 is a module and flow diagram of the algorithms and operations in the first embodiment of the method, apparatus and storage medium for detecting a target object of a continuous contour according to the present invention;
FIG. 3 is a flowchart, in the first embodiment of the method, apparatus and storage medium for detecting a target object of a continuous contour according to the present invention, of processing a target object of interest that occupies a large region of a picture and comprises elements of a plurality of discrete units, for different implementation scenarios;
fig. 4 is a block diagram illustrating a second embodiment of a method, an apparatus, and a storage medium for detecting a target object with a continuous contour according to the present invention.
Reference numerals: 6 - computer device; 61 - memory; 62 - processor; 63 - processing module for carrying out the target object detection method for a continuous contour.
Detailed Description
The advantages of the invention are further illustrated in the following description of specific embodiments in conjunction with the accompanying drawings.
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "upon" or "when" or "in response to determining", depending on the context.
In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used merely for convenience of description and for simplicity of description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, are not to be construed as limiting the present invention.
In the description of the present invention, unless otherwise specified and limited, it is to be noted that the terms "mounted," "connected," and "connected" are to be interpreted broadly, and may be, for example, a mechanical connection or an electrical connection, a communication between two elements, a direct connection, or an indirect connection via an intermediate medium, and specific meanings of the terms may be understood by those skilled in the art according to specific situations.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the explanation of the present invention and have no specific meaning in themselves. Thus, "module" and "component" may be used interchangeably.
The first embodiment is as follows: this embodiment provides a target object detection method for continuous contours, applied to a server side. Referring to fig. 1 and fig. 2 (fig. 2 is a module and flow diagram of the algorithms and operations involved), the method includes the following steps:
s100: acquiring an image set containing a target object, and performing contour recognition and segmentation on each image in the image set by adopting an edge extraction algorithm to obtain a region image corresponding to each image;
specifically, the acquiring of the image set including the target object includes the following steps:
s110: carrying out video recording on a target object to obtain video data containing the target object at a plurality of positions;
In this step, video acquisition of the target object may be implemented with a mobile terminal, or performed at the client, with the acquired video data then sent to the server for the processing of steps S100 to S500 of this embodiment.
S120: and generating an image set containing the target object based on the video data framing.
In the above steps, the acquired video data may be framed with software such as Adobe Premiere (PR) or After Effects (AE), or a program may be called on the server side to frame the video according to a preset rule, as sketched below.
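As a minimal illustration only, the server-side framing could be sketched as follows in Python with OpenCV; the sampling step (every 10th frame) stands in for the preset rule and is an assumption, not a value fixed by this embodiment.

```python
# Hedged sketch: decode a recorded video and keep every `step`-th frame.
# The step value is an illustrative stand-in for the preset framing rule.
import cv2

def frames_from_video(video_path, step=10):
    capture = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:                # end of stream
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames
```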
It should be emphasized that, in step S100, the edge extraction algorithm is executed by calling a preset edge extraction script. In this embodiment the edge detection algorithm includes, but is not limited to, the Canny algorithm, the Laplacian operator, the Fast Edge algorithm, and the HED algorithm (Holistically-Nested Edge Detection); an appropriate edge detection algorithm may be selected according to actual use requirements, and other existing algorithms or network structures capable of implementing edge detection may also be used here.
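By way of a hedged sketch, the contour recognition and segmentation of step S100 might be realized as below, using OpenCV's Canny detector as one of the interchangeable algorithms named above; the thresholds (50, 150) and the minimum contour area are illustrative assumptions.

```python
# Hedged sketch of the edge extraction script: detect edges, find contours,
# and crop one region image (with its bounding Rect) per retained contour.
import cv2

def segment_region_images(image, min_area=100):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)             # illustrative thresholds
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    regions = []
    for contour in contours:
        if cv2.contourArea(contour) < min_area:  # drop noise contours
            continue
        x, y, w, h = cv2.boundingRect(contour)
        regions.append((image[y:y + h, x:x + w], (x, y, w, h)))
    return regions
```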
S200: adopting a picture similarity recognition algorithm to carry out category marking on each region image so as to obtain an image set with category labels as training data;
Specifically, in the above steps, the category marking of each region image by the picture similarity recognition algorithm is executed by calling a marking script. The algorithm may adopt, but is not limited to, aHash (mean hash), dHash (difference hash) and pHash (perceptual hash); other algorithms that calculate picture similarity or implement image classification may also be used here. The aHash algorithm and the pHash algorithm are described below as the two specific methods of this embodiment:
Processing with mean hash: specifically, the method of using a picture similarity recognition algorithm to perform category marking on each region image to obtain an image set with category labels includes the following steps:
s211: carrying out format processing on each area image, and calculating a gray average value;
the format processing includes reducing the size (reducing the picture to 8 × 8 for removing details of the picture), simplifying the color (converting to 64-level gray), and the like, and then calculating the gray average value of each pixel in the area image, and different processing methods can be selected according to the specific implementation scenes in the process of compressing the picture, for example, opencv calls different interpolation, including but not limited to nearest neighbor interpolation, bilinear interpolation, bicubic interpolation, Lanczos interpolation in the 8 × 8 pixel neighborhood, and the like, so as to reduce the situation that the difference of the image before and after compression is too large.
S212: comparing the gray value of each regional image with the average gray value to obtain a corresponding hash value of each regional image;
Specifically, a pixel whose gray value is greater than or equal to the average is recorded as 1, and one whose gray value is smaller than the average is recorded as 0; this forms a 64-bit matrix, i.e., the hash value, which is regarded as the fingerprint of the corresponding region image.
S213: and calculating a Hamming distance based on the hash value of each region image to obtain the similarity of each region image, and classifying and marking each region image according to the similarity to obtain a sample image set with a class label.
Specifically, the Hamming distance is the number of positions at which two strings differ. The 64-bit hash sequence is usually split into groups of 4 bits, with each group converted to hexadecimal for comparison. If no more than 5 data bits differ, the two pictures are very similar; if more than 10 differ, they are two different pictures. The region images are classified and marked accordingly. As a supplementary note, other similarity measures than the Hamming distance that achieve the same effect may be used here.
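The mean-hash marking of steps S211-S213 could look like the following sketch; the 8 × 8 size, 64-level gray, and the 5/10 Hamming thresholds follow the text, while the choice of INTER_AREA interpolation is just one of the options mentioned in S211.

```python
# Hedged aHash sketch: shrink to 8x8, quantize to 64-level gray, and
# threshold each pixel against the mean to form a 64-bit fingerprint.
import cv2
import numpy as np

def ahash(region_image):
    small = cv2.resize(region_image, (8, 8), interpolation=cv2.INTER_AREA)
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY) // 4   # 64-level gray
    return (gray >= gray.mean()).flatten()                # 64 bits

def hamming(hash_a, hash_b):
    return int(np.count_nonzero(hash_a != hash_b))

# Per the text: a distance of at most 5 marks two region images as very
# similar (same class); a distance above 10 marks them as different.
```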
Processing with perceptual hashing: adopting a picture similarity recognition algorithm to carry out category marking on each region image so as to obtain an image set with category labels, wherein the method comprises the following steps:
s221: carrying out format processing on each area image, and calculating a discrete cosine transform matrix;
In the above step, the format processing includes size reduction (to 32 × 32; the compression method may or may not be the same as in S211) and graying (converting to 64-level gray). The discrete cosine transform (DCT) matrix in the above step can then be obtained from a preset one-dimensional or two-dimensional DCT formula.
S222: reducing the discrete cosine transform matrix and calculating the average value of the discrete cosine transform matrix;
The DCT is a special Fourier-related transform that maps the picture from the pixel domain to the frequency domain. In the DCT matrix, the coefficients represent increasingly high frequencies from the upper-left corner toward the lower-right corner; outside the upper-left corner the coefficients are 0 or close to 0, so the upper-left 8 × 8 block is selected here.
S223: calculating a hash value corresponding to each area image according to the average value of the discrete cosine transform matrix;
In the above step, each DCT value is compared with the average: a value greater than or equal to the average is recorded as 1, and one smaller than the average is recorded as 0, generating a binary array (similar to step S212 above).
S224: and calculating a Hamming distance based on the hash value of each region image to obtain the similarity of each region image, and classifying and marking each region image according to the similarity to obtain a sample image set with a class label.
Step S224 is similar to step S213: if no more than 5 data bits differ in the Hamming distance, the two pictures are very similar; if more than 10 differ, they are two different pictures, and the region images are classified and marked accordingly. Other similarity measures than the Hamming distance that achieve the same effect may be used here.
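A matching pHash sketch for steps S221-S224 follows, with cv2.dct assumed as the way to obtain the two-dimensional DCT matrix; the 32 × 32 input and the top-left 8 × 8 low-frequency block follow the text, and the same Hamming comparison as in the aHash sketch applies.

```python
# Hedged pHash sketch: gray, shrink to 32x32, take the 2-D DCT, keep the
# top-left 8x8 low-frequency block, and threshold against its mean.
import cv2
import numpy as np

def phash(region_image):
    gray = cv2.cvtColor(region_image, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (32, 32)).astype(np.float32)
    dct = cv2.dct(small)       # frequency rises toward the lower right
    low = dct[:8, :8]          # keep only the low-frequency corner
    return (low >= low.mean()).flatten()   # 64-bit fingerprint
```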
Based on the above steps S100-S200, the objects separated by the edge extraction algorithm in this embodiment have clear feature uniformity and coordinate information. Owing to the feature uniformity of the identified objects, an automatic script can perform the category labeling, for example with the hash-value methods above. This replaces manual labeling and shields its randomness, thereby overcoming the problem that training-data labeling in the existing target detection process must be completed manually, making the task heavy and time-consuming.
S300: constructing a convolutional neural network model, and training the convolutional neural network model by adopting the training data to obtain a target model for classifying input images;
In this embodiment, the training data for target object detection is classified by combining the edge detection algorithm of steps S100-S200 with the hash-value calculation. A CNN (the convolutional neural network model above) is used to classify pictures during target object detection, and the training data generated by S100-S200 is used to train the CNN to obtain the target model.
Specifically, the training of the convolutional neural network model by using the training data to obtain a target model for classifying the input image includes the following steps:
s310: inputting the training data into the convolutional neural network model, classifying each image in the training data, and generating an output;
Specifically, the images with category labels are input into the convolutional neural network model; the model classifies them, and the classification is then compared with the category labels to train the model.
S320: and comparing the output with the class label of the training data, and reversely adjusting the convolutional neural network model until the training is completed to obtain a target model.
In the above steps, training completion may be defined as reaching a preset training period or as the comparison between the output and the category labels falling within a preset range. If the amount of initial training data is small, the trained model can be used to continue selecting segmented contour maps (i.e., region images), and this can be iterated indefinitely.
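As one possible realization of steps S310-S320 (the embodiment fixes neither a framework nor an architecture), a lightweight PyTorch sketch is given below; the layer sizes, the 64 × 64 input resolution, and the epoch count are assumptions.

```python
# Hedged sketch of a lightweight CNN and its training loop; the labels come
# from the hash-based category marking of step S200.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)  # 64x64 input

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def train(model, loader, epochs=10):
    optimizer = torch.optim.Adam(model.parameters())
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):                         # preset training period
        for images, labels in loader:
            loss = loss_fn(model(images), labels)   # compare with labels
            optimizer.zero_grad()
            loss.backward()                         # reverse adjustment (S320)
            optimizer.step()
    return model
```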
S400: acquiring an image to be identified, and processing the image to be identified by adopting an edge extraction algorithm to acquire a target area image of the image to be identified and position information of the target area image;
specifically, the processing the image to be recognized by using the edge extraction algorithm to obtain the target area image of the image to be recognized and the position information of the target area image includes the following steps:
s410: performing graying processing on the image to be identified and then performing filtering operation to obtain a first processed image;
In the above steps, the image to be recognized is grayed to facilitate data processing. Edge detection algorithms are mainly based on the first and second derivatives of image intensity, but derivatives are usually sensitive to noise, so a filter must be used to improve the noise-related performance of the edge detector; common filters include Gaussian filtering and median filtering.
S420: obtaining a gradient image by using an edge detection operator based on the first processed image, and performing a binarization operation on the gradient image to obtain a second processed image;
As a further illustration, by way of example and not limitation, Sobel gradient operators in the x and y directions can respectively produce gradient images; these are converted to a preset type, and the converted x- and y-direction gradient images are binarized to obtain a binary image, i.e., the second processed image.
S430: performing an AND operation on the pixel positions in the second processed image, and then iteratively performing dilation, erosion and filling operations to generate the target area image of the image to be recognized;
Specifically, the AND operation highlights points whose intensity values change significantly within the gray-level neighborhood of the image, and the iterative dilation, erosion and filling operations determine the position and orientation of the edges.
S440: and comparing the target area image with the image to be identified to generate the position information of the target area image.
That is, the target area image is generated according to steps S410-S430, and its position information can be obtained from the coordinates at which it was cropped. It should be noted that in this embodiment a Rect object stores the position of the target area image; a Rect stores parameters that appear in pairs, such as the upper-left corner coordinates of a rectangular frame together with its width and height, as in Rect(upper-left x, upper-left y, width, height).
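The pipeline of steps S410-S440 might be sketched as follows with OpenCV; the Gaussian kernel, Otsu thresholding, morphology kernel and iteration counts are illustrative assumptions, and the returned tuple mirrors the Rect(x, y, width, height) storage described above.

```python
# Hedged sketch of S410-S440: gray + filter, Sobel gradients, binarize,
# AND the two binary maps, dilate/erode iteratively, and take the Rect.
import cv2
import numpy as np

def extract_target_region(image):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)                # S410
    gx = cv2.convertScaleAbs(cv2.Sobel(blurred, cv2.CV_16S, 1, 0))
    gy = cv2.convertScaleAbs(cv2.Sobel(blurred, cv2.CV_16S, 0, 1))
    _, bx = cv2.threshold(gx, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    _, by = cv2.threshold(gy, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    mask = cv2.bitwise_and(bx, by)                             # S430: AND
    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.dilate(mask, kernel, iterations=2)              # dilation
    mask = cv2.erode(mask, kernel, iterations=2)               # erosion
    points = cv2.findNonZero(mask)         # assumes a non-empty mask
    x, y, w, h = cv2.boundingRect(points)  # S440: Rect(x, y, width, height)
    return image[y:y + h, x:x + w], (x, y, w, h)
```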
However, depending on the implementation scenario and the task at hand, the target object of interest may occupy a large region of the picture and comprise elements of a plurality of discrete units, and recognition of the large region can be completed by splicing the position information of the small units. Therefore, after the image to be recognized is processed with the edge extraction algorithm and before the target area image and its position information are obtained, the following steps are performed (referring to fig. 3):
S400-1: processing the image to be identified with the edge extraction algorithm to obtain a plurality of sub-area images and their position information;
In the above step, the edge extraction algorithm identifies the contours containing the target object in the image to be recognized; there may be only one contour or several. When only one contour is identified, the target area image can be cropped and obtained directly, as in steps S410-S440; when several contours are identified, the image to be recognized contains several portions of the target.
S400-2: splicing the sub-area images to obtain the target area image of the image to be identified and its position information; the position information of the target area image is the set of the position information of each sub-area image.
If several contours are identified, each is cropped and the crops are spliced; that is, the sum of all portions of the target object in the image to be recognized is taken as the target area image, and the set of the position information of the corresponding contours is taken as the position information of the target area image. The sub-image cropped from each contour and its corresponding position information can be obtained by the operations of steps S410-S420, and the recombination is sketched below.
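Under the same assumptions, the recombination of the small-unit Rects into the large-unit position information can be sketched as below; sub_rects is a hypothetical list of (x, y, w, h) tuples produced by the edge extraction step.

```python
# Hedged sketch of S400-2: keep the set of per-unit Rects as the position
# information and also compute the enclosing large-unit rectangle.
def merge_sub_regions(sub_rects):
    xs = [x for x, y, w, h in sub_rects]
    ys = [y for x, y, w, h in sub_rects]
    x2 = [x + w for x, y, w, h in sub_rects]
    y2 = [y + h for x, y, w, h in sub_rects]
    union = (min(xs), min(ys), max(x2) - min(xs), max(y2) - min(ys))
    return union, sub_rects   # large-unit Rect plus the per-unit set
```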
S500: and inputting the target area image into the target model, acquiring the type of the target area image, and collecting the position information of the target area image to generate a detection result.
According to the above steps S300-S400, the image to be recognized is segmented by the edge detection algorithm to obtain the target area image; the target area image is sent to the target model as input, the corresponding category is output, and the position information from step S400 is attached to the recognition result, completing the target detection.
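Putting S400-S500 together under the assumptions above (reusing the hypothetical extract_target_region and SmallCNN sketches), a single recognition pass might look like:

```python
# Hedged end-to-end sketch: segment the target area, classify it with the
# trained model, and pair the category with the stored Rect.
import cv2
import torch

def detect(image, model, class_names, input_size=64):
    region, rect = extract_target_region(image)                 # S400
    blob = cv2.resize(region, (input_size, input_size))
    tensor = torch.from_numpy(blob).permute(2, 0, 1).float().unsqueeze(0) / 255
    with torch.no_grad():
        label = class_names[model(tensor).argmax(dim=1).item()] # S500
    return {"category": label, "rect": rect}    # detection result
```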
In the above embodiment, steps S100-S200 use an edge extraction script to segment the contours of recognized objects into region images and label them; instead of manual labeling, a classification script (the hash-value calculation of step S200) classifies the segmented data set (the region images), yielding the training data used to train the convolutional neural network model in the subsequent step S300. Steps S400-S500 rely on the target model obtained from that training: in a single recognition pass, the pre-recognized object segmented from the image to be recognized (the target area image) is taken as input to the target model, and the Rect information of the target area image is attached, completing the recognition and detection of the target object. The edge detection algorithm yields the region images and their position information with definite feature uniformity, replacing manual labeling and shielding its randomness; owing to this feature uniformity of the target object, an automatic script can classify the edge-extracted detection targets. A lightweight CNN model is built to train on the classified data and quickly output a recognition model (the target model). During recognition, the edge extraction algorithm separates the contour (the target area image); in particular, some target objects are dispersed into independent small units to be recognized, and the recognized small-unit position information is recombined according to the task requirement to obtain the large-unit position information. The target area image and position information obtained by edge extraction are input into the target model, the category is output, and the detection result is generated in combination with the position information, quickly and accurately.
Example two: in order to achieve the above object, the present invention further provides a computer device 6. Referring to fig. 4, the computer device may comprise a plurality of computer devices, and the components of the processing module 63 that executes the target object detection method for continuous contours of the first embodiment may be distributed across different computer devices 6. The computer device 6 may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server or a cabinet server (either an independent server or a server cluster formed by a plurality of servers) that executes programs. The computer device of this embodiment at least includes, but is not limited to: the memory 61, the processor 62 and a cache, communicatively connected to each other through a system bus, which execute the processing module 63 for the target object detection method of the first embodiment, as shown in fig. 4. It should be noted that fig. 4 only shows a computer device with these components, but not all of the shown components are required; more or fewer components may be implemented instead.
In this embodiment, the memory 61 may include a program storage area and a data storage area, wherein the program storage area may store an application program required for at least one function of the system, and the data storage area may store data created during use of the computer device. Further, the memory 61 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 61 may optionally include memory located remotely from the processor, which may be connected to the server system via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor 62 may in some embodiments be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 62 is typically used to control the overall operation of the computer device. In this embodiment, the processor 62 is configured to run the program code stored in the memory 61 or to process data, for example to run the processing module 63 that executes the target object detection method for continuous contours described in the first embodiment, thereby implementing that method.
In this embodiment, the processing module stored in the memory 61 for executing the target object detection method for continuous contours of the first embodiment may be further divided into one or more program modules, which are stored in the memory 61 and executed by one or more processors (the processor 62 in this embodiment) to complete the present invention.
Example three:
To achieve the above objects, the present invention also provides a computer-readable storage medium, including a plurality of storage media such as flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, server or App application store, on which a computer program is stored that implements the corresponding functions when executed by the processor 62. The computer-readable storage medium of this embodiment stores the processing module for the target object detection method of the first embodiment, and when executed by the processor 62 implements the target object detection method of the first embodiment.
It should be noted that the embodiments of the present invention are described above by way of preference and not limitation, and those skilled in the art can make modifications and variations to the above embodiments without departing from the spirit of the invention.

Claims (10)

1. A method for continuous contour target object detection, comprising:
acquiring an image set containing a target object, and performing contour recognition and segmentation on each image in the image set by adopting an edge extraction algorithm to obtain a region image corresponding to each image;
adopting a picture similarity recognition algorithm to carry out category marking on each region image so as to obtain an image set with category labels as training data;
constructing a convolutional neural network model, and training the convolutional neural network model by adopting the training data to obtain a target model for classifying input images;
acquiring an image to be identified, and processing the image to be identified by adopting an edge extraction algorithm to acquire a target area image of the image to be identified and position information of the target area image;
and inputting the target area image into the target model, acquiring the category of the target area image, and combining it with the position information of the target area image to generate a detection result.
2. The method for detecting the target object according to claim 1, wherein after the image to be recognized is processed by adopting an edge extraction algorithm and before a target area image of the image to be recognized and position information of the target area image are obtained, the method comprises the following steps:
processing the image to be identified by adopting an edge extraction algorithm to obtain a plurality of sub-area images and their position information; splicing the sub-area images to obtain a target area image of the image to be identified and position information of the target area image;
and the position information of the target area image is a set of the position information of each sub-area image.
3. The target object detection method of claim 1, wherein the image similarity recognition algorithm is used to perform class labeling on each region image to obtain a class-labeled image set, and the method comprises the following steps:
carrying out format processing on each area image, and calculating a gray average value;
comparing the gray value of each pixel in each region image with the gray average value to obtain a corresponding hash value for each region image; and calculating a Hamming distance based on the hash value of each region image to obtain the similarity of each region image, and classifying and marking each region image according to the similarity to obtain a sample image set with a class label.
4. The target object detection method of claim 1, wherein the image similarity recognition algorithm is used to perform class labeling on each region image to obtain a class-labeled image set, and the method comprises the following steps:
carrying out format processing on each area image, and calculating a discrete cosine transform matrix;
reducing the discrete cosine transform matrix and calculating the average value of the discrete cosine transform matrix;
calculating a hash value corresponding to each area image according to the average value of the discrete cosine transform matrix;
and calculating a Hamming distance based on the hash value of each region image to obtain the similarity of each region image, and classifying and marking each region image according to the similarity to obtain a sample image set with a class label.
5. The method of claim 1, wherein the obtaining a set of images containing a target object comprises:
carrying out video recording on a target object to obtain video data containing the target object at a plurality of positions;
and generating an image set containing the target object based on the video data framing.
6. The method of claim 1, wherein training the convolutional neural network model with the training data to obtain a target model for classifying the input image comprises: inputting the training data into the convolutional neural network model, classifying each image in the training data, and generating an output;
and comparing the output with the class label of the training data, and reversely adjusting the convolutional neural network model until the training is completed to obtain a target model.
7. The method for detecting the target object according to claim 1, wherein the step of processing the image to be recognized by using an edge extraction algorithm to obtain a target area image of the image to be recognized and position information of the target area image comprises the following steps:
performing graying processing on the image to be identified and then performing filtering operation to obtain a first processed image;
obtaining a gradient image by using an edge detection operator based on the first processed image, and performing a binarization operation on the gradient image to obtain a second processed image;
performing an AND operation on the pixel positions in the second processed image, and then iteratively performing dilation, erosion and filling operations to generate a target area image of the image to be recognized;
and comparing the target area image with the image to be identified to generate the position information of the target area image.
8. The target object detection method of claim 1, comprising:
the edge extraction algorithm is executed by calling a preset edge extraction script, and the category marking of each region image by the picture similarity recognition algorithm is executed by calling a marking script.
9. A computer device, characterized in that the computer device comprises a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the target object detection method according to any one of claims 1 to 8 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the target object detection method according to one of the preceding claims 1 to 8.
CN202111371848.1A 2021-11-18 2021-11-18 Target object detection method, apparatus and storage medium for continuous contour Pending CN114049499A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111371848.1A CN114049499A (en) 2021-11-18 2021-11-18 Target object detection method, apparatus and storage medium for continuous contour

Publications (1)

Publication Number Publication Date
CN114049499A true CN114049499A (en) 2022-02-15

Family

ID=80209801

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111371848.1A Pending CN114049499A (en) 2021-11-18 2021-11-18 Target object detection method, apparatus and storage medium for continuous contour

Country Status (1)

Country Link
CN (1) CN114049499A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110659646A (en) * 2019-08-21 2020-01-07 北京三快在线科技有限公司 Automatic multitask certificate image processing method, device, equipment and readable storage medium
CN112926654A (en) * 2021-02-25 2021-06-08 平安银行股份有限公司 Pre-labeling model training and certificate pre-labeling method, device, equipment and medium
CN113628159A (en) * 2021-06-16 2021-11-09 维库(厦门)信息技术有限公司 Full-automatic training method and device based on deep learning network and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114972916A (en) * 2022-05-26 2022-08-30 广州市影擎电子科技有限公司 Fire-fighting escape action detection model training method and detection method thereof
CN115482533A (en) * 2022-09-19 2022-12-16 北京百度网讯科技有限公司 Method and device for splitting showcase, electronic equipment and storage medium
CN115690132A (en) * 2022-10-20 2023-02-03 北京霍里思特科技有限公司 Image processing method and system
CN116452621A (en) * 2023-03-10 2023-07-18 广州市易鸿智能装备有限公司 Ideal contour generating algorithm, device and storage medium based on reinforcement learning
CN116452621B (en) * 2023-03-10 2023-12-15 广州市易鸿智能装备有限公司 Ideal contour generating algorithm, device and storage medium based on reinforcement learning
CN116993743A (en) * 2023-09-28 2023-11-03 南方电网数字电网研究院有限公司 Method, device, equipment and storage medium for detecting galloping amplitude of power transmission line
CN116993743B (en) * 2023-09-28 2024-03-19 南方电网数字电网研究院有限公司 Method, device, equipment and storage medium for detecting galloping amplitude of power transmission line

Similar Documents

Publication Publication Date Title
CN114049499A (en) Target object detection method, apparatus and storage medium for continuous contour
CN110414507B (en) License plate recognition method and device, computer equipment and storage medium
CN108229475B (en) Vehicle tracking method, system, computer device and readable storage medium
CN108564579B (en) Concrete crack detection method and detection device based on time-space correlation
KR101182173B1 (en) Method and system for recognizing vehicle plate
CN109492642B (en) License plate recognition method, license plate recognition device, computer equipment and storage medium
CN107480585B (en) Target detection method based on DPM algorithm
CN112967341B (en) Indoor visual positioning method, system, equipment and storage medium based on live-action image
CN109447117B (en) Double-layer license plate recognition method and device, computer equipment and storage medium
EP2808828A2 (en) Image matching method, image matching device, model template generation method, model template generation device, and program
CN108229232B (en) Method and device for scanning two-dimensional codes in batch
CN112686248B (en) Certificate increase and decrease type detection method and device, readable storage medium and terminal
CN108961262B (en) Bar code positioning method in complex scene
CN114444565A (en) Image tampering detection method, terminal device and storage medium
CN110516731B (en) Visual odometer feature point detection method and system based on deep learning
CN113657370B (en) Character recognition method and related equipment thereof
CN113228105A (en) Image processing method and device and electronic equipment
CN108647605B (en) Human eye gaze point extraction method combining global color and local structural features
WO2022121021A1 (en) Identity card number detection method and apparatus, and readable storage medium and terminal
Fang et al. 1-D barcode localization in complex background
US9392146B2 (en) Apparatus and method for extracting object
CN111797832A (en) Automatic generation method and system of image interesting region and image processing method
Satiro et al. Super-resolution of facial images in forensics scenarios
CN110276260B (en) Commodity detection method based on depth camera
CN112052859B (en) License plate accurate positioning method and device in free scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination