CN112989924B - Target detection method, target detection device and terminal equipment

Info

Publication number
CN112989924B
CN112989924B (application CN202110104667.6A)
Authority
CN
China
Prior art keywords
training
infrared image
target detection
target
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110104667.6A
Other languages
Chinese (zh)
Other versions
CN112989924A (en)
Inventor
赵雨佳
郭奎
程骏
庞建新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ubtech Technology Co ltd
Original Assignee
Shenzhen Ubtech Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Ubtech Technology Co ltd filed Critical Shenzhen Ubtech Technology Co ltd
Priority to CN202110104667.6A
Publication of CN112989924A
Application granted
Publication of CN112989924B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a target detection method, which comprises the following steps: acquiring an infrared image to be processed; obtaining a target detection result of the infrared image to be processed through a trained first target detection model, wherein the anchor box setting parameters of the trained first target detection model are obtained by clustering the annotation boxes of at least two preset training images; and determining a target object in the infrared image to be processed according to the target detection result. The method can improve the accuracy of target recognition in infrared images.

Description

Target detection method, target detection device and terminal equipment
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a target detection method, a target detection apparatus, a terminal device, and a computer readable storage medium.
Background
Because infrared thermal imaging is unaffected by illumination and background and remains usable at night, it can be applied in a variety of detection scenarios to perform functions such as patrol and temperature measurement.
However, because of the unique imaging characteristics of infrared thermal imaging, information in infrared images is often difficult to identify. For example, an infrared image lacks the color information, texture information and other feature information present in a visible light image, so in target detection scenarios such as human body detection on infrared images, accurately identifying the target object is more difficult and the accuracy is lower.
Disclosure of Invention
The embodiment of the application provides a target detection method, a target detection device, terminal equipment and a computer readable storage medium, which can improve the accuracy of target identification on infrared images.
In a first aspect, an embodiment of the present application provides a target detection method, including:
Acquiring an infrared image to be processed;
Obtaining a target detection result of the infrared image to be processed through a trained first target detection model, wherein the anchor box setting parameters of the trained first target detection model are obtained by clustering the annotation boxes of at least two preset training images;
And determining a target object in the infrared image to be processed according to the target detection result.
In a second aspect, an embodiment of the present application provides an object detection apparatus, including:
The acquisition module is used for acquiring an infrared image to be processed;
The processing module is used for obtaining a target detection result of the infrared image to be processed through a trained first target detection model, wherein the anchor box setting parameters of the trained first target detection model are obtained by clustering the annotation boxes of at least two preset training images;
and the determining module is used for determining a target object in the infrared image to be processed according to the target detection result.
In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the target detection method according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the object detection method as described in the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the target detection method of the first aspect.
Compared with the prior art, the embodiments of the application have the following beneficial effects: an infrared image to be processed is acquired; a target detection result of the infrared image is then obtained through a trained first target detection model, where the anchor box setting parameters of the trained model are obtained by clustering the annotation boxes of at least two preset training images. Because more reasonable anchor boxes are determined in advance, the trained first target detection model can detect targets in the infrared image according to these anchor boxes and obtain a more accurate target detection result, from which the target object in the infrared image can be determined. The embodiments of the application therefore enable accurate and efficient recognition of infrared images and improve the accuracy of infrared image recognition.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a target detection method according to an embodiment of the present application;
FIG. 2 is a flow chart of another object detection method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an object detection device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in the present specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrases "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Specifically, fig. 1 shows a flowchart of a target detection method provided by an embodiment of the present application, where the target detection method may be applied to a terminal device.
By way of example, the terminal device may be a robot, a server, a desktop computer, a mobile phone, a tablet computer, a wearable device, an in-vehicle device, an Augmented Reality (AR)/Virtual Reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA), or the like. The embodiment of the application does not limit the specific type of the terminal equipment.
As shown in fig. 1, the target detection method may include:
Step S101, acquiring an infrared image to be processed.
In the embodiment of the present application, the infrared image to be processed may be a local image pre-stored in the terminal device, or may be transmitted to the terminal device by another terminal communicatively connected to it. The infrared image to be processed may be an image captured by a specified infrared camera or a video frame extracted from a specified video. The specific source of the infrared image to be processed is not limited herein.
The infrared image to be processed can be obtained in various ways. For example, an image selected by the user from the images displayed by the terminal device may be determined as the infrared image to be processed according to a specified user operation on the terminal device. Alternatively, an infrared image currently captured by an infrared camera externally connected to the terminal device may be used as the infrared image to be processed. The specific acquisition method is not limited herein.
The infrared image to be processed can also take various image forms. In some embodiments, the image format of the infrared image to be processed may be a specified format. For example, the pixel values of the infrared image to be processed may be mapped in advance to a specified value range: the pixel values of the original image corresponding to the infrared image to be processed can be divided by 127.5 and then reduced by 1, so that the value range of the pixel values is linearly scaled to [-1, 1], which reduces the data scale and facilitates subsequent image processing. Alternatively, the size of the infrared image to be processed may be a preset size.
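As a minimal illustrative sketch of the scaling just described (assuming 8-bit source pixels in [0, 255]; the function name is chosen for this example only):

    import numpy as np

    def normalize_ir_image(raw: np.ndarray) -> np.ndarray:
        # Linearly scale pixel values from [0, 255] to [-1, 1]:
        # 0 maps to -1.0, 127.5 maps to 0.0, 255 maps to 1.0.
        return raw.astype(np.float32) / 127.5 - 1.0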
In addition, in some examples, an original infrared image may be obtained and at least one of noise reduction, sharpening, saturation adjustment, brightness adjustment and the like may be performed on it to obtain the infrared image to be processed, improving the feature expression in the image and thereby the accuracy of subsequent target detection.
In some embodiments, the acquiring the infrared image to be processed includes:
Acquiring an original infrared image;
And performing median filtering processing on the original infrared image to obtain the infrared image to be processed.
Median filtering is a nonlinear smoothing technique that can filter noise out of the original infrared image while avoiding the loss of edge information. In target detection on infrared images, the edge information of the region where the target is located is an important basis for feature extraction during target recognition. Since median filtering preserves the target's edge information comparatively well while effectively filtering noise from the original infrared image, the resulting infrared image to be processed lays a good data foundation for subsequent target detection.
In the embodiment of the present application, other processing such as sharpening, saturation adjustment and brightness adjustment may be performed on the original infrared image in addition to median filtering; the preprocessing of the original infrared image is not limited to median filtering.
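An illustrative sketch of this preprocessing step, using OpenCV's medianBlur as one common implementation (the file-path interface and the kernel size of 3 are assumptions of this example):

    import cv2
    import numpy as np

    def preprocess_ir_image(path: str, ksize: int = 3) -> np.ndarray:
        # Read the original infrared image and apply median filtering;
        # the kernel size must be odd. Median filtering suppresses
        # impulse noise while preserving target edge information.
        raw = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if raw is None:
            raise FileNotFoundError(path)
        return cv2.medianBlur(raw, ksize)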
Step S102, obtaining a target detection result of the infrared image to be processed through a trained first target detection model, wherein the anchor box setting parameters of the trained first target detection model are obtained by clustering the annotation boxes of at least two preset training images.
In the embodiment of the application, the first target detection model may be used to detect target categories in an image. The first target detection model may be an existing target detection model such as the YOLOv4 model, the Fast R-CNN model or the RefineDet model, or another target detection model that appears later; its specific type is not limited herein. The target detection result may indicate whether the infrared image to be processed contains a target object and, if so, the target class of each contained target object, the region where it is located, and so on.
In one example, the specific terminal inference framework to which the first target detection model is applied may be determined according to the application scenario. Correspondingly, the specific version of the first target detection model may also be determined according to the requirements of the application scenario.
For example, the embodiment of the application may be applied to a robot, i.e., the terminal device is a robot. In some scenarios, the robot's runtime platform is x86, in which case the OpenVINO inference framework can be employed; in other scenarios, the robot's runtime platform is an NVIDIA GPU, in which case the TensorRT inference framework can be employed. For the OpenVINO inference framework, the original model corresponding to the first target detection model needs to be converted into an ONNX model and then into an OpenVINO IR model to obtain the first target detection model. For the TensorRT inference framework, the original model is converted into an ONNX model and then into a TensorRT engine model to obtain the first target detection model. When converting model formats, the conversion must follow the version requirements of the different inference frameworks, so that the resulting first target detection model matches the corresponding inference framework.
In some examples, configuration such as the activation functions in the first target detection model may also be determined according to the compatibility of the terminal inference framework used in the embodiment of the present application. For example, in one scenario, the first target detection model is obtained based on the YOLOv4 detection model and the embodiment is applied to a robot. Since the robot's terminal inference frameworks (such as OpenVINO and TensorRT) do not support the Mish activation function used in YOLOv4, the activation functions in the first target detection model need to be changed from the Mish function of the YOLOv4 detection model to the Leaky ReLU and Sigmoid activation functions, so that the first target detection model can run on the robot's terminal inference framework.
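One way to realize such a substitution is sketched below in PyTorch, under the assumptions that the original network uses torch.nn.Mish modules (available in PyTorch 1.9 and later) and that the sigmoid activations at the detection output heads are left untouched; the slope value is an assumed, typical choice:

    import torch.nn as nn

    def replace_mish_with_leaky_relu(model: nn.Module, slope: float = 0.1) -> nn.Module:
        # Recursively swap every Mish activation for Leaky ReLU so the
        # network only uses operations the target inference framework supports.
        for name, child in model.named_children():
            if isinstance(child, nn.Mish):
                setattr(model, name, nn.LeakyReLU(negative_slope=slope))
            else:
                replace_mish_with_leaky_relu(child, slope)
        return model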
The specific training mode of the first target detection model can also be determined according to the actual scene.
For example, the first target detection model may be trained based on a target data set. During training, the first target detection model is iterated continuously until the loss function value corresponding to the detection result of some iteration falls below a preset loss threshold, or the number of iterations reaches a preset count threshold; the first target detection model is then determined to be trained, and the trained first target detection model is obtained.
The anchor box setting parameters include parameters such as the size and aspect ratio of the anchor boxes (anchors) in the trained first target detection model. When the infrared image to be processed is detected by the trained first target detection model, information such as a target category and offsets can be predicted for each anchor box, and the target category and the region where the target object is located in the infrared image to be processed are then determined according to the category and offsets corresponding to each anchor box.
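For concreteness, the following sketch shows one common YOLO-family decoding scheme in which the predicted offsets (tx, ty, tw, th) are applied to an anchor box to obtain an absolute box; the grid-cell coordinates, stride and exponential size parameterization are assumptions of this example, not details fixed by the embodiment:

    import numpy as np

    def decode_box(anchor_w, anchor_h, cell_x, cell_y, tx, ty, tw, th, stride):
        # Map predicted offsets plus an anchor box onto image coordinates.
        sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
        bx = (cell_x + sigmoid(tx)) * stride  # box centre, x
        by = (cell_y + sigmoid(ty)) * stride  # box centre, y
        bw = anchor_w * np.exp(tw)            # width scaled from the anchor
        bh = anchor_h * np.exp(th)            # height scaled from the anchor
        return bx, by, bw, bh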
In existing target detection models, anchor boxes are usually determined empirically for generic targets, which is quite limiting: unreasonable anchor box settings can cause missed detections, reduce the recall of the model, and degrade the accuracy of the target detection result.
In the embodiment of the application, the annotation boxes of at least two preset training images can be clustered, and the anchor box setting parameters are then determined according to the clustering result.
Here, the preset training images may be acquired according to the actual detection scene. For example, if the embodiment of the application is used in the human body detection field, the preset training images may be infrared image samples containing human body information; accordingly, the anchor box setting parameters may be determined according to the sizes and aspect ratios of the image regions where human bodies are located in those samples, so that the trained first target detection model can recognize human targets more efficiently and accurately with these anchor box settings.
The specific algorithm for clustering the annotation boxes of the at least two preset training images may be one of a K-Means clustering algorithm, a mean-shift clustering algorithm, a density-based clustering algorithm, and the like. The particular clustering algorithm is not limited herein.
In some embodiments, before obtaining the target detection result of the infrared image to be processed through the trained first target detection model, the method further includes:
Step S201, clustering the at least two preset training images based on a preset clustering algorithm to obtain at least one cluster, wherein each cluster contains at least two preset training images whose mutual distances meet a preset distance condition; the distance between any two preset training images (a first preset training image and a second preset training image) is the intersection over union between the annotation box of the first preset training image and the annotation box of the second preset training image;
Step S202, determining the anchor box setting parameters according to each cluster.
In the embodiment of the present application, the intersection over union (IoU) is the ratio of the intersection between the annotation box of the first preset training image and the annotation box of the second preset training image to the union of the two. The intersection is the overlapping area between the two annotation boxes; the union is the sum of the first area and the second area minus the overlapping area, where the first area is the area of the annotation box of the first preset training image and the second area is the area of the annotation box of the second preset training image.
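The IoU of two axis-aligned boxes can be computed as in the following sketch (the corner-format box representation (x1, y1, x2, y2) is an assumption of this example):

    def box_iou(a, b):
        # Intersection over union of two boxes given as (x1, y1, x2, y2).
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0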
In the embodiment of the present application, the preset clustering algorithm may be one of a K-Means clustering algorithm, a mean shift clustering algorithm, a density-based clustering algorithm, and the like. The specific type of the preset clustering algorithm is not limited herein.
In the embodiment of the application, the IoU effectively measures how related the annotation boxes of any two preset training images are, so the preset training images can be clustered effectively, and the clustering result (i.e., the obtained clusters) reflects the size distribution of the annotation boxes across the preset training images.
For example, the annotation box corresponding to the center of a cluster may represent the average size of the annotation boxes in that cluster, and the distribution of distances between the cluster center's annotation box and the annotation boxes of the other preset training images in the cluster may represent the variation of the annotation boxes within the cluster. Based on the average annotation box size and the variation corresponding to each cluster, the value ranges of the anchor box sizes, aspect ratios and so on can be determined, i.e., the anchor box setting parameters.
In the embodiment of the application, when the at least two preset training images are clustered based on the preset clustering algorithm, the distance between any two preset training images is determined based on the IoU. This yields clusters that reflect the size distribution of the annotation boxes across the preset training images, so accurate anchor box setting parameters can be derived from the average annotation box size and the annotation box variation of each cluster, enabling the trained first target detection model to recognize target objects of specific types more efficiently and accurately.
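A minimal sketch of such IoU-based clustering, in the style of the K-Means variant commonly used to derive YOLO anchors (the cluster count k = 9, the iteration count and the width/height-only IoU convention are assumptions of this example):

    import numpy as np

    def iou_wh(whs: np.ndarray, centroids: np.ndarray) -> np.ndarray:
        # IoU computed on (width, height) pairs only, as if all boxes
        # shared a common corner, the usual convention for anchor clustering.
        inter = (np.minimum(whs[:, None, 0], centroids[None, :, 0]) *
                 np.minimum(whs[:, None, 1], centroids[None, :, 1]))
        union = ((whs[:, 0] * whs[:, 1])[:, None] +
                 (centroids[:, 0] * centroids[:, 1])[None, :] - inter)
        return inter / union

    def kmeans_anchors(whs: np.ndarray, k: int = 9, iters: int = 100) -> np.ndarray:
        # Cluster annotation-box sizes with distance d = 1 - IoU; the
        # cluster centres become the anchor box setting parameters.
        rng = np.random.default_rng(0)
        centroids = whs[rng.choice(len(whs), size=k, replace=False)].astype(np.float64)
        for _ in range(iters):
            assign = np.argmax(iou_wh(whs, centroids), axis=1)  # max IoU = min distance
            for j in range(k):
                members = whs[assign == j]
                if len(members) > 0:
                    centroids[j] = members.mean(axis=0)
        return centroids[np.argsort(centroids.prod(axis=1))]  # sorted by area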
In some embodiments, before obtaining the target detection result of the infrared image to be processed through the trained first target detection model, the method further includes:
Training the first target detection model based on a target data set to obtain the trained first target detection model, wherein the target data set includes the at least two preset training images, each preset training image has a corresponding annotation box and category label, each preset training image corresponds to at least one scene, and the target data set corresponds to at least two scenes.
During training of the first target detection model, the model can be iterated continuously until the loss function value corresponding to the detection result of some iteration falls below a preset loss threshold, or the number of iterations reaches a preset count threshold; the first target detection model is then determined to be trained, and the trained first target detection model is obtained. The loss function of the first target detection model is not limited herein; for example, it may be a cross-entropy function, a mean squared error function, or the like.
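A schematic, PyTorch-style training loop with the two stopping conditions just described (the threshold values and the data-loader interface are assumptions of this example):

    def train(model, loader, optimizer, loss_fn, loss_threshold=0.05, max_iters=50000):
        # Iterate until the loss falls below a preset threshold or the
        # iteration count reaches a preset limit.
        it = 0
        while it < max_iters:
            for images, targets in loader:
                optimizer.zero_grad()
                loss = loss_fn(model(images), targets)
                loss.backward()
                optimizer.step()
                it += 1
                if loss.item() < loss_threshold or it >= max_iters:
                    return model
        return model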
In the embodiment of the application, the scene can be determined according to actual requirements.
For example, if the embodiment of the present application is used in the human body detection field, the scenes may include at least two of indoor, outdoor, daytime, night, various crowd densities, different human body postures, and the like.
With preset training images corresponding to different scenes, the data in the target data set becomes richer and the feature diversity of the data set increases, which effectively improves the performance of the trained first target detection model.
The target data set may be acquired in various ways. For example, a third-party data set may be obtained as the target data set, or the target data set may be collected independently. The annotation box and category label corresponding to each preset training image in the target data set may be obtained through manual annotation, or through automatic annotation with a specified target detection algorithm.
In some embodiments, before training the first target detection model based on the target data set to obtain the trained first target detection model, the method further comprises:
Training a second target detection model through a third training data set to obtain a trained second target detection model, wherein the third training data set includes at least two infrared image samples, each with a corresponding annotation box and category label;
Respectively performing target detection on at least two unlabeled infrared image samples through the trained second target detection model to obtain an initial annotation box and an initial category label for each unlabeled infrared image sample;
Determining the annotation box and category label corresponding to each unlabeled infrared image sample according to its initial annotation box and initial category label;
And determining the target data set according to the annotation boxes and category labels corresponding to the infrared image samples and/or the annotation boxes and category labels corresponding to the unlabeled infrared image samples.
The second target detection model may be the same as or different from the first target detection model.
In the embodiment of the application, a small number of accurately annotated infrared image samples can be obtained first, and the second target detection model is trained on them to obtain the trained second target detection model. A large number of unlabeled infrared image samples are then predicted by the trained second target detection model to obtain their preliminary annotation results, and the initial annotation boxes and initial category labels are calibrated manually to determine accurate annotation boxes and category labels for each previously unlabeled sample. In this way, the embodiment of the application greatly reduces the time and labor cost of manual annotation, improves annotation efficiency, and efficiently yields a large number of accurately annotated preset training images.
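A sketch of this semi-automatic annotation step (the detector is assumed, for this example only, to return (box, label, score) triples; the confidence threshold is likewise an assumption):

    def pseudo_label(trained_model, unlabeled_images, score_threshold=0.5):
        # Produce initial annotation boxes and category labels for each
        # unlabeled infrared sample; these are manually calibrated
        # afterwards before joining the target data set.
        initial_annotations = []
        for image in unlabeled_images:
            detections = trained_model(image)
            kept = [(box, label) for box, label, score in detections
                    if score >= score_threshold]
            initial_annotations.append(kept)
        return initial_annotations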
In some embodiments, before training the first target detection model based on the target data set to obtain the trained first target detection model, the method further comprises:
acquiring a first training data set, wherein the first training data set comprises at least two first training infrared images;
For each set of saturation adjustment parameters in at least one set of saturation adjustment parameters, respectively processing each first training infrared image in the first training data set according to the set of saturation adjustment parameters to obtain a second training infrared image corresponding to the set of saturation adjustment parameters;
And obtaining the target data set according to the second training infrared image corresponding to each group of saturation adjustment parameters.
In the embodiment of the application, the saturation adjustment parameters are used to adjust the saturation of the first training infrared images, and different sets of saturation adjustment parameters may adjust the saturation to different degrees. Illustratively, there may be five sets of saturation adjustment parameters: the first set reduces the saturation of the first training infrared image by 40%, the second set reduces it by 20%, the third set increases it by 20%, the fourth set increases it by 40%, and the fifth set increases it by 60%. Of course, other settings of the saturation adjustment parameters are also possible.
In order to enable the trained first target detection model to accurately identify the characteristic information in the infrared image to be processed as much as possible, in the training stage, richer characteristic information can be input into the first target detection model.
In the embodiment of the application, at least one set of saturation adjustment parameters can be preconfigured, and each first training infrared image in the first training data set is processed according to each set of saturation adjustment parameters to obtain the second training infrared images corresponding to that set. This produces training infrared images with the same content at different saturation levels, so that during training the first target detection model learns to recognize content under different saturation states, improving its ability to recognize targets in infrared images of varying saturation.
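An illustrative sketch of one way to apply such parameter sets, assuming the training infrared images are stored as 3-channel BGR arrays (as pseudo-color thermal output often is); the factors correspond to the five example sets above:

    import cv2
    import numpy as np

    SATURATION_FACTORS = [0.6, 0.8, 1.2, 1.4, 1.6]  # -40%, -20%, +20%, +40%, +60%

    def adjust_saturation(image_bgr: np.ndarray, factor: float) -> np.ndarray:
        # Scale the HSV saturation channel by the given factor.
        hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
        hsv[..., 1] = np.clip(hsv[..., 1] * factor, 0, 255)
        return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)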
In some embodiments, before training the first target detection model based on the target data set to obtain the trained first target detection model, the method further comprises:
Acquiring a second training data set, wherein the second training data set comprises at least two third training infrared images;
For each set of brightness adjustment parameters in at least one set of brightness adjustment parameters, respectively processing each third training infrared image in the second training data set according to the set of brightness adjustment parameters to obtain a fourth training infrared image corresponding to the set of brightness adjustment parameters;
And obtaining the target data set according to the fourth training infrared image corresponding to each group of brightness adjustment parameters.
The second training data set may be the same as the first training data set, or may differ.
In the embodiment of the present application, the brightness adjustment parameters are used to adjust the brightness of the third training infrared images, and different sets of brightness adjustment parameters may adjust the brightness to different degrees. Illustratively, there may be five sets of brightness adjustment parameters: the first set reduces the brightness of the third training infrared image by 40%, the second set reduces it by 20%, the third set increases it by 20%, the fourth set increases it by 40%, and the fifth set increases it by 60%. Of course, other settings of the brightness adjustment parameters are also possible.
In order to enable the trained first target detection model to accurately identify the characteristic information in the infrared image to be processed as much as possible, in the training stage, richer characteristic information can be input into the first target detection model.
In the embodiment of the application, at least one set of brightness adjustment parameters can be preconfigured, and each third training infrared image in the second training data set is processed according to each set of brightness adjustment parameters to obtain the fourth training infrared images corresponding to that set. This produces training infrared images with the same content at different brightness levels, so that during training the first target detection model learns to recognize content under different brightness states, improving its ability to recognize targets in infrared images of varying brightness.
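Analogously to the saturation case, a sketch of brightness adjustment via the HSV value channel (the same BGR-input assumption applies; the factors mirror the five example sets above):

    import cv2
    import numpy as np

    BRIGHTNESS_FACTORS = [0.6, 0.8, 1.2, 1.4, 1.6]  # -40%, -20%, +20%, +40%, +60%

    def adjust_brightness(image_bgr: np.ndarray, factor: float) -> np.ndarray:
        # Scale the HSV value (brightness) channel by the given factor.
        hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
        hsv[..., 2] = np.clip(hsv[..., 2] * factor, 0, 255)
        return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)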
It can be seen that, based on the above embodiment, the target data set may be obtained according to the second training infrared image corresponding to each set of saturation adjustment parameters and/or the fourth training infrared image corresponding to each set of brightness adjustment parameters. That is, the target data set may be obtained by saturation preprocessing and brightness preprocessing.
However, since an infrared image contains little color information in practical applications, changing the chrominance of an infrared image may cause target features to disappear, resulting in missed detections or unstable regions for detected target objects. Therefore, in some embodiments, the target data set is not obtained through chrominance preprocessing; in actual operation, the chrominance preprocessing parameters are configured to leave the chrominance unchanged when setting the training parameters of the first target detection model.
Step S103, determining a target object in the infrared image to be processed according to the target detection result.
The category of the target object can be determined according to an actual application scene. By way of example, the target object may be a human body, a particular type of animal, or the like.
In the embodiment of the application, the target object in the infrared image to be processed can be directly determined according to the target detection result. In addition, the target object in the infrared image to be processed may also be determined by combining the target detection result and other detection information (for example, a reference detection result for a visible light image corresponding to the infrared image to be processed, etc.).
In some embodiments, the determining the target object in the infrared image to be processed according to the target detection result includes:
obtaining a visible light image corresponding to the infrared image to be processed;
Performing target detection on the visible light image through a third target detection model to obtain a reference detection result aiming at the visible light image;
and determining a target object in the infrared image to be processed according to the target detection result and the reference detection result.
According to the embodiment of the application, the target object in the scene corresponding to the infrared image to be processed can be determined by combining the infrared image features of the infrared image to be processed with the visible light image features of the visible light image. Combining such multi-dimensional feature data allows the target object to be detected more comprehensively and accurately, improving the accuracy of target detection.
For example, if the embodiment of the application is applied to the human body detection field, combining the target detection result and the reference detection result can effectively avoid identifying monkeys and other human-like animals, or human-shaped display boards, as human bodies, significantly improving the accuracy of human body detection.
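One simple fusion rule, sketched under the assumptions that the infrared and visible light images are spatially registered and that detections are (box, label) pairs; it reuses the box_iou helper from the earlier sketch, and the IoU threshold is an assumption of this example:

    def fuse_detections(ir_dets, vis_dets, iou_threshold=0.5):
        # Keep an infrared detection only when some visible-light detection
        # of the same class overlaps it sufficiently (box_iou as defined above).
        confirmed = []
        for ir_box, ir_label in ir_dets:
            for vis_box, vis_label in vis_dets:
                if ir_label == vis_label and box_iou(ir_box, vis_box) >= iou_threshold:
                    confirmed.append((ir_box, ir_label))
                    break
        return confirmed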
In the embodiment of the application, an infrared image to be processed can be acquired; a target detection result of the infrared image is then obtained through a trained first target detection model, where the anchor box setting parameters of the trained model are obtained by clustering the annotation boxes of at least two preset training images. Because more reasonable anchor boxes are determined in advance, the trained first target detection model can detect targets in the infrared image according to these anchor boxes and obtain a more accurate target detection result, from which the target object in the infrared image can be determined. The embodiment of the application therefore enables accurate and efficient recognition of infrared images and improves the accuracy of infrared image recognition.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Corresponding to the object detection method described in the foregoing embodiments, fig. 3 shows a block diagram of the object detection device provided by an embodiment of the present application. For convenience of explanation, only the parts related to the embodiment of the present application are shown.
Referring to fig. 3, the object detection device 3 includes:
an acquisition module 301, configured to acquire an infrared image to be processed;
The processing module 302 is configured to obtain a target detection result of the infrared image to be processed through a trained first target detection model, where the anchor box setting parameters of the trained first target detection model are obtained by clustering the annotation boxes of at least two preset training images;
and the determining module 303 is configured to determine a target object in the infrared image to be processed according to the target detection result.
Optionally, the object detection device 3 further includes:
The clustering module is used for clustering the at least two preset training images based on a preset clustering algorithm to obtain at least one cluster, wherein each cluster contains at least two preset training images whose mutual distances meet a preset distance condition; the distance between any two preset training images (a first preset training image and a second preset training image) is the intersection over union between the annotation box of the first preset training image and the annotation box of the second preset training image;
And the second determining module is used for determining the anchor box setting parameters according to each cluster.
Optionally, the acquiring module 301 includes:
the first acquisition unit is used for acquiring an original infrared image;
And the processing unit is used for carrying out median filtering processing on the original infrared image to obtain the infrared image to be processed.
Optionally, the object detection device 3 further includes:
The training module is used for training the first target detection model based on a target data set to obtain the trained first target detection model; the target data set includes the at least two preset training images, each preset training image has a corresponding annotation box and category label, each preset training image corresponds to at least one scene, and the target data set corresponds to at least two scenes.
Optionally, the object detection device 3 further includes:
The second acquisition module is used for acquiring a first training data set, and the first training data set comprises at least two first training infrared images;
The second processing module is used for processing each first training infrared image in the first training data set according to each group of saturation adjustment parameters in at least one group of saturation adjustment parameters to obtain a second training infrared image corresponding to the group of saturation adjustment parameters;
And the third processing module is used for obtaining the target data set according to the second training infrared image corresponding to each group of saturation adjustment parameters.
Optionally, the object detection device 3 further includes:
The third acquisition module is used for acquiring a second training data set, and the second training data set comprises at least two third training infrared images;
The fourth processing module is used for respectively processing the third training infrared images in the second training data set according to each group of brightness adjustment parameters in at least one group of brightness adjustment parameters to obtain a fourth training infrared image corresponding to the group of brightness adjustment parameters;
and the fifth processing module is used for obtaining the target data set according to the fourth training infrared image corresponding to each group of brightness adjustment parameters.
Optionally, the determining module 303 specifically includes:
The second acquisition unit is used for acquiring a visible light image corresponding to the infrared image to be processed;
The detection unit is used for carrying out target detection on the visible light image through a third target detection model to obtain a reference detection result aiming at the visible light image;
And the determining unit is used for determining the target object in the infrared image to be processed according to the target detection result and the reference detection result.
In the embodiment of the application, an infrared image to be processed can be acquired; a target detection result of the infrared image is then obtained through a trained first target detection model, where the anchor box setting parameters of the trained model are obtained by clustering the annotation boxes of at least two preset training images. Because more reasonable anchor boxes are determined in advance, the trained first target detection model can detect targets in the infrared image according to these anchor boxes and obtain a more accurate target detection result, from which the target object in the infrared image can be determined. The embodiment of the application therefore enables accurate and efficient recognition of infrared images and improves the accuracy of infrared image recognition.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 4, the terminal device 4 of this embodiment includes: at least one processor 40 (only one is shown in fig. 4), a memory 41, and a computer program 42 stored in the memory 41 and executable on the at least one processor 40, the processor 40 implementing the steps in any of the various target detection method embodiments described above when executing the computer program 42.
The terminal device 4 may be a robot, a server, a mobile phone, a wearable device, an augmented reality (AR)/virtual reality (VR) device, a desktop computer, a notebook computer, a palmtop computer, or another computing device. The terminal device may include, but is not limited to, the processor 40 and the memory 41. It will be appreciated by those skilled in the art that fig. 4 is merely an example of the terminal device 4 and does not constitute a limitation of it; the terminal device may include more or fewer components than illustrated, combine certain components, or use different components, and may, for example, also include input devices, output devices, network access devices, etc. The input devices may include a keyboard, a touch pad, a fingerprint sensor (for collecting a user's fingerprint information and fingerprint orientation information), a microphone, a camera, and the like; the output devices may include a display, a speaker, and the like.
The processor 40 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 41 may, in some embodiments, be an internal storage unit of the terminal device 4, such as a hard disk or memory of the terminal device 4. In other embodiments, the memory 41 may also be an external storage device of the terminal device 4, for example a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card provided on the terminal device 4. Further, the memory 41 may include both the internal storage unit and the external storage device of the terminal device 4. The memory 41 is used to store an operating system, application programs, a boot loader, data and other programs, such as the program code of the computer program described above. The memory 41 may also be used to temporarily store data that has been output or is to be output.
In addition, although not shown, the terminal device 4 may further include network connection modules, such as a Bluetooth module, a Wi-Fi module, a cellular network module, and so on, which will not be described herein.
In the embodiment of the present application, when executing the computer program 42 to implement the steps of any of the target detection method embodiments described above, the processor 40 may acquire an infrared image to be processed; a target detection result of the infrared image is then obtained through a trained first target detection model, where the anchor box setting parameters of the trained model are obtained by clustering the annotation boxes of at least two preset training images. Because more reasonable anchor boxes are determined in advance, the trained first target detection model can detect targets in the infrared image according to these anchor boxes and obtain a more accurate target detection result, from which the target object in the infrared image can be determined. The embodiment of the application therefore enables accurate and efficient recognition of infrared images and improves the accuracy of infrared image recognition.
The embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements steps for implementing the various method embodiments described above.
Embodiments of the present application provide a computer program product enabling a terminal device to carry out the steps of the method embodiments described above when the computer program product is run on the terminal device.
The integrated units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the present application may implement all or part of the flow of the methods of the above embodiments by instructing the relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing device/terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer-readable media may not be electrical carrier signals or telecommunications signals.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not described or detailed in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative, e.g., the division of modules or elements described above is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of this embodiment.
The above embodiments are only for illustrating the technical solutions of the present application, not for limiting them. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (9)

1. A target detection method, comprising:
acquiring an infrared image to be processed;
obtaining a target detection result of the infrared image to be processed through a trained first target detection model, wherein setting parameters of anchor frames in the trained first target detection model are obtained by clustering labeling frames of at least two preset training images according to the intersection-over-union (IoU) between the labeling frames;
determining a target object in the infrared image to be processed according to the target detection result;
wherein, before obtaining the target detection result of the infrared image to be processed through the trained first target detection model, the method further comprises:
clustering the at least two preset training images based on a preset clustering algorithm to obtain at least one cluster, wherein each cluster comprises at least two preset training images, the distance between the preset training images contained in a cluster meets a preset distance condition, the distance between any two preset training images is the intersection-over-union (IoU) between a labeling frame of a first preset training image and a labeling frame of a second preset training image, and the any two preset training images comprise the first preset training image and the second preset training image;
and determining the setting parameters of the anchor frames according to each cluster.
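For orientation, a minimal Python sketch of this kind of IoU-driven clustering, in the style of YOLO anchor estimation: each labeling frame is reduced to a (width, height) pair and k-means is run with 1 − IoU as the distance. The function names, the choice of k-means, the value of k, and the toy data are illustrative assumptions, not details fixed by the claims.

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (w, h) box sizes and (w, h) anchor sizes, top-left corners aligned."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def cluster_anchor_settings(boxes, k=3, iters=100, seed=0):
    """k-means using 1 - IoU as the distance between labeling-frame sizes;
    returns k (w, h) pairs usable as anchor-frame setting parameters."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        nearest = np.argmax(iou_wh(boxes, anchors), axis=1)  # highest IoU = smallest distance
        updated = np.array([boxes[nearest == i].mean(axis=0) if np.any(nearest == i)
                            else anchors[i] for i in range(k)])
        if np.allclose(updated, anchors):
            break
        anchors = updated
    return anchors

# Toy labeling-frame (width, height) sizes gathered from annotated training images
boxes = np.array([[32, 48], [30, 52], [120, 90], [115, 95], [60, 200], [58, 190]], dtype=float)
print(cluster_anchor_settings(boxes, k=3))
```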
2. The target detection method according to claim 1, wherein the acquiring an infrared image to be processed comprises:
acquiring an original infrared image;
and performing median filtering on the original infrared image to obtain the infrared image to be processed.
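A hedged sketch of the preprocessing in claim 2: the OpenCV call is a standard way to median-filter an image, while the file path and the 5×5 kernel size are assumptions not specified by the claims.

```python
import cv2

# Read the original infrared image as a single-channel grayscale image (path is illustrative)
raw = cv2.imread("raw_infrared.png", cv2.IMREAD_GRAYSCALE)

# Median filtering suppresses salt-and-pepper noise typical of infrared sensors;
# the 5x5 kernel is an assumed value, not one given in the patent
to_process = cv2.medianBlur(raw, 5)
```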
3. The target detection method according to claim 1, further comprising, before obtaining the target detection result of the infrared image to be processed through the trained first target detection model:
training the first target detection model based on a target data set to obtain the trained first target detection model, wherein the target data set comprises at least two preset training images, each preset training image corresponds to a labeling frame and a category label, each preset training image corresponds to at least one scene, and the target data set corresponds to at least two scenes.
4. The target detection method according to claim 3, further comprising, before training the first target detection model based on the target data set to obtain the trained first target detection model:
acquiring a first training data set, wherein the first training data set comprises at least two first training infrared images;
for each set of saturation adjustment parameters in at least one set of saturation adjustment parameters, processing each first training infrared image in the first training data set according to the set of saturation adjustment parameters to obtain a second training infrared image corresponding to the set of saturation adjustment parameters;
and obtaining the target data set according to the second training infrared images corresponding to each set of saturation adjustment parameters.
5. The target detection method according to claim 3, further comprising, before training the first target detection model based on the target data set to obtain the trained first target detection model:
acquiring a second training data set, wherein the second training data set comprises at least two third training infrared images;
for each set of brightness adjustment parameters in at least one set of brightness adjustment parameters, processing each third training infrared image in the second training data set according to the set of brightness adjustment parameters to obtain a fourth training infrared image corresponding to the set of brightness adjustment parameters;
and obtaining the target data set according to the fourth training infrared images corresponding to each set of brightness adjustment parameters.
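Claims 4 and 5 describe parallel augmentations; below is a minimal combined sketch, assuming the training infrared images are stored as 3-channel pseudo-color BGR arrays and that each adjustment parameter acts as a simple gain on the HSV saturation or value channel. The helper names and parameter values are illustrative, not taken from the patent.

```python
import cv2
import numpy as np

def adjust_sat_brightness(image_bgr, sat_gain=1.0, bright_gain=1.0):
    """Scale the S (saturation) and V (brightness) channels in HSV space."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 1] = np.clip(hsv[..., 1] * sat_gain, 0, 255)     # claim 4: saturation
    hsv[..., 2] = np.clip(hsv[..., 2] * bright_gain, 0, 255)  # claim 5: brightness
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)

# Illustrative (saturation_gain, brightness_gain) parameter sets; each set yields
# one augmented copy of every training infrared image
param_sets = [(0.8, 1.0), (1.2, 1.0), (1.0, 0.7), (1.0, 1.3)]

def build_target_data_set(train_images):
    """Assemble the target data set from originals plus all augmented copies."""
    augmented = [adjust_sat_brightness(img, s, b)
                 for (s, b) in param_sets for img in train_images]
    return list(train_images) + augmented
```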
6. The target detection method according to any one of claims 1 to 5, wherein the determining a target object in the infrared image to be processed according to the target detection result comprises:
obtaining a visible light image corresponding to the infrared image to be processed;
performing target detection on the visible light image through a third target detection model to obtain a reference detection result for the visible light image;
and determining the target object in the infrared image to be processed according to the target detection result and the reference detection result.
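Claim 6 does not fix how the two results are combined. The sketch below assumes one simple rule, confirming an infrared detection when a same-class visible-light detection overlaps it above an IoU threshold; the dictionary keys, the (x1, y1, x2, y2) box format, and the threshold are all illustrative assumptions.

```python
def confirm_targets(ir_dets, vis_dets, iou_thresh=0.5):
    """Keep an infrared detection only when a visible-light detection of the same
    class overlaps it sufficiently; the IoU gate is an assumed combination rule."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1]) +
                 (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0
    return [d for d in ir_dets
            if any(d["label"] == v["label"] and iou(d["box"], v["box"]) >= iou_thresh
                   for v in vis_dets)]

# Example: an infrared detection confirmed by an overlapping visible-light detection
ir = [{"label": "person", "box": (10, 10, 50, 90)}]
vis = [{"label": "person", "box": (12, 8, 52, 88)}]
print(confirm_targets(ir, vis))  # keeps the infrared "person" detection
```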
7. A target detection apparatus, comprising:
an acquisition module, configured to acquire an infrared image to be processed;
a processing module, configured to obtain a target detection result of the infrared image to be processed through a trained first target detection model, wherein setting parameters of anchor frames in the trained first target detection model are obtained by clustering labeling frames of at least two preset training images according to the intersection-over-union (IoU) between the labeling frames;
a determining module, configured to determine a target object in the infrared image to be processed according to the target detection result;
wherein the target detection apparatus further comprises:
a clustering module, configured to cluster the at least two preset training images based on a preset clustering algorithm to obtain at least one cluster, wherein each cluster comprises at least two preset training images, the distance between the preset training images contained in a cluster meets a preset distance condition, the distance between any two preset training images is the intersection-over-union (IoU) between a labeling frame of a first preset training image and a labeling frame of a second preset training image, and the any two preset training images comprise the first preset training image and the second preset training image;
and a second determining module, configured to determine the setting parameters of the anchor frames according to each cluster.
8. A terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the target detection method according to any one of claims 1 to 6 when executing the computer program.
9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the target detection method according to any one of claims 1 to 6.
CN202110104667.6A 2021-01-26 2021-01-26 Target detection method, target detection device and terminal equipment Active CN112989924B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110104667.6A CN112989924B (en) 2021-01-26 2021-01-26 Target detection method, target detection device and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110104667.6A CN112989924B (en) 2021-01-26 2021-01-26 Target detection method, target detection device and terminal equipment

Publications (2)

Publication Number Publication Date
CN112989924A (en) 2021-06-18
CN112989924B (en) 2024-05-24

Family

ID=76345451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110104667.6A Active CN112989924B (en) 2021-01-26 2021-01-26 Target detection method, target detection device and terminal equipment

Country Status (1)

Country Link
CN (1) CN112989924B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049907B (en) * 2022-08-17 2022-10-28 四川迪晟新达类脑智能技术有限公司 FPGA-based YOLOV4 target detection network implementation method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674878A (en) * 2019-09-26 2020-01-10 苏州航韧光电技术有限公司 Target detection method and device for dual-mode decision-level image fusion
CN110795991A (en) * 2019-09-11 2020-02-14 西安科技大学 Mining locomotive pedestrian detection method based on multi-information fusion
CN110889324A (en) * 2019-10-12 2020-03-17 南京航空航天大学 Thermal infrared image target identification method based on YOLO V3 terminal-oriented guidance
WO2020164282A1 (en) * 2019-02-14 2020-08-20 平安科技(深圳)有限公司 Yolo-based image target recognition method and apparatus, electronic device, and storage medium
CN111597967A (en) * 2020-05-12 2020-08-28 北京大学 Infrared image multi-target pedestrian identification method
CN111950329A (en) * 2019-05-16 2020-11-17 长沙智能驾驶研究院有限公司 Target detection and model training method and device, computer equipment and storage medium
CN112036464A (en) * 2020-08-26 2020-12-04 国家电网有限公司 Insulator infrared image fault detection method based on YOLOv3-tiny algorithm
CN112183212A (en) * 2020-09-01 2021-01-05 深圳市识农智能科技有限公司 Weed identification method and device, terminal equipment and readable storage medium

Also Published As

Publication number Publication date
CN112989924A (en) 2021-06-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant