CN112989924A - Target detection method, target detection device and terminal equipment - Google Patents

Target detection method, target detection device and terminal equipment

Info

Publication number
CN112989924A
CN112989924A (application CN202110104667.6A)
Authority
CN
China
Prior art keywords
target detection
infrared image
training
target
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110104667.6A
Other languages
Chinese (zh)
Other versions
CN112989924B (en)
Inventor
赵雨佳 (Zhao Yujia)
郭奎 (Guo Kui)
程骏 (Cheng Jun)
庞建新 (Pang Jianxin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ubtech Technology Co ltd
Original Assignee
Shenzhen Ubtech Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Ubtech Technology Co ltd
Priority to CN202110104667.6A
Publication of CN112989924A
Application granted
Publication of CN112989924B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a target detection method comprising: acquiring an infrared image to be processed; obtaining a target detection result for the infrared image through a trained first target detection model, wherein the anchor box setting parameters of the trained first target detection model are obtained by clustering the annotation boxes of at least two preset training images; and determining a target object in the infrared image according to the target detection result. The method can improve the accuracy of target recognition in infrared images.

Description

Target detection method, target detection device and terminal equipment
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a target detection method, a target detection apparatus, a terminal device, and a computer-readable storage medium.
Background
Infrared thermal imaging is unaffected by illumination and background and remains usable at night, so it can be applied to various detection scenarios and supports functions such as patrol and temperature measurement.
However, due to the unique imaging characteristics of infrared thermal imaging, recognizing information in infrared images is often difficult. For example, an infrared image lacks the color information of a visible-light image and contains noticeably less characteristic information such as texture. Consequently, in target detection scenarios such as human body detection on infrared images, accurately identifying a target object is difficult and accuracy is low.
Disclosure of Invention
The embodiment of the application provides a target detection method, a target detection device, a terminal device and a computer readable storage medium, which can improve the accuracy of target identification on an infrared image.
In a first aspect, an embodiment of the present application provides a target detection method, including:
acquiring an infrared image to be processed;
obtaining a target detection result of the infrared image to be processed through a trained first target detection model, wherein the anchor box setting parameters in the trained first target detection model are obtained by clustering the annotation boxes of at least two preset training images;
and determining a target object in the infrared image to be processed according to the target detection result.
In a second aspect, an embodiment of the present application provides an object detection apparatus, including:
the acquisition module is used for acquiring an infrared image to be processed;
the processing module is used for obtaining a target detection result of the infrared image to be processed through a trained first target detection model, wherein the anchor box setting parameters in the trained first target detection model are obtained by clustering the annotation boxes of at least two preset training images;
and the determining module is used for determining a target object in the infrared image to be processed according to the target detection result.
In a third aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the target detection method according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the target detection method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the target detection method according to the first aspect.
Compared with the prior art, the embodiments of the application have the following advantages. An infrared image to be processed is acquired; a target detection result is then obtained through a trained first target detection model whose anchor box setting parameters are obtained by clustering the annotation boxes of at least two preset training images. Because more reasonable anchor boxes are determined in advance, the trained first target detection model can detect targets in the infrared image according to these anchor boxes and produce a more accurate detection result, from which the target object in the infrared image is determined. The method therefore enables accurate and efficient target recognition in infrared images, improving recognition accuracy.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a target detection method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of another target detection method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an object detection apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when," "upon," "in response to determining," or "in response to detecting." Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining," "in response to determining," "upon detecting [the described condition or event]," or "in response to detecting [the described condition or event]."
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
Specifically, fig. 1 shows a flowchart of a target detection method provided in an embodiment of the present application, where the target detection method may be applied to a terminal device.
For example, the terminal device may be a robot, a server, a desktop computer, a mobile phone, a tablet computer, a wearable device, an in-vehicle device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or the like. The embodiment of the present application does not limit the specific type of the terminal device.
As shown in fig. 1, the target detection method may include:
and step S101, acquiring an infrared image to be processed.
In this embodiment of the application, for example, the to-be-processed infrared image may be a local image stored in the terminal device in advance, or the to-be-processed infrared image may also be transmitted to the terminal device by another terminal in communication connection with the terminal device. The infrared image to be processed can be an image acquired by a designated infrared camera or a video frame extracted from a designated video. The specific source of the infrared image to be processed is not limited herein.
The acquisition mode of the infrared image to be processed can be various. For example, the image selected by the user may be determined to be the to-be-processed infrared image from the images displayed by the terminal device according to a specified operation of the user on the terminal device. Or, an infrared image currently captured by an infrared camera externally connected to the terminal device may be used as the to-be-processed infrared image. The specific acquisition mode of the infrared image to be processed is not limited herein.
The infrared image to be processed may take various forms. In some embodiments, its image format may be a specified format. For example, the pixel values may be mapped to a specified value range in advance: each pixel value of the original image may be divided by 127.5 and then reduced by 1, linearly scaling the pixel values of the infrared image to be processed to [-1, 1]. This reduces the data scale and facilitates subsequent image processing. Alternatively, the size of the infrared image to be processed may be a preset size.
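As a minimal sketch of the mapping just described (assuming an 8-bit image held as a NumPy array; the function name is illustrative):

```python
import numpy as np

def normalize(image):
    # Map 8-bit pixel values from [0, 255] to [-1, 1]: x / 127.5 - 1
    return image.astype(np.float32) / 127.5 - 1.0
```

The extremes 0 and 255 land exactly on -1 and 1, so the whole dynamic range is preserved at a smaller numeric scale.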
In addition, in some examples, an original infrared image may be obtained and at least one of noise reduction, sharpening, saturation adjustment, brightness adjustment, and the like may be applied to it to obtain the infrared image to be processed, improving the feature expressiveness of the image and thus the subsequent target detection accuracy.
In some embodiments, the acquiring the infrared image to be processed includes:
acquiring an original infrared image;
and carrying out median filtering processing on the original infrared image to obtain the infrared image to be processed.
Median filtering is a nonlinear smoothing technique that can filter out noise in the original infrared image while keeping edge information intact. In target detection on infrared images, the edge information of the region where the target is located is an important basis for feature extraction during recognition; median filtering preserves target edges well while effectively suppressing noise in the original infrared image, so the resulting infrared image to be processed lays a good data foundation for subsequent detection.
It should be noted that, in the embodiment of the present application, besides median filtering, other processing such as sharpening, saturation adjustment, and brightness adjustment may be performed on the original infrared image; the preprocessing is not limited to median filtering.
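A minimal sketch of the median-filtering step described above, assuming the original infrared image is a 2-D NumPy array (a naive loop for illustration; a library routine would normally be used in practice):

```python
import numpy as np

def median_filter(img, k=3):
    # Pad with edge values, then replace each pixel by the
    # median of its k x k neighborhood
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(padded[y:y + k, x:x + k])
    return out
```

An isolated impulse (salt noise) is removed because it is never the median of its neighborhood, while a step edge survives, which is exactly the property the text relies on.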
Step S102, obtaining a target detection result of the infrared image to be processed through a trained first target detection model, wherein the anchor box setting parameters in the trained first target detection model are obtained by clustering the annotation boxes of at least two preset training images.
In the embodiment of the present application, the first target detection model may be used to detect target categories in an image. For example, it may be a current detection model such as YOLOv4, Faster R-CNN, or RefineDet, or a target detection model that appears later; the specific type of the first target detection model is not limited here. The target detection result may indicate whether the infrared image to be processed contains a target object and, if so, the target category of that object, the region where it is located, and the like.
In one example, the terminal inference framework on which the first target detection model runs may be determined according to the specific application scenario; accordingly, the specific version of the first target detection model may also be determined according to the scenario's requirements.
For example, the embodiment of the present application may be applied to a robot; that is, the terminal device is a robot. In some scenarios, the robot's terminal computing platform is x86, where an OpenVINO inference framework may be adopted; in other scenarios, the platform is an NVIDIA GPU, where a TensorRT inference framework may be employed. For the OpenVINO inference framework, the original model corresponding to the first target detection model is first converted into an ONNX model and then into an OpenVINO IR model to obtain the first target detection model. For the TensorRT inference framework, the original model is first converted into an ONNX model and then into a TensorRT engine. When converting model formats, the conversion is performed according to the version requirements of the different inference frameworks, so that the first target detection model matches the corresponding inference framework.
In some examples, configuration such as the activation functions in the first target detection model may also be determined according to the compatibility of the terminal inference framework used. For example, in one scenario the first target detection model is obtained from the YOLOv4 detection model and the embodiment is applied to a robot. If the robot's terminal inference framework (e.g., OpenVINO, TensorRT) does not support the Mish activation function in YOLOv4, the activation functions in the first target detection model need to be changed from the Mish function to Leaky ReLU and Sigmoid activation functions, so that the model can run on the robot's terminal inference framework.
The specific training mode of the first target detection model can also be determined according to an actual scene.
For example, the first target detection model may be trained based on a target data set, iterating continuously until the loss function value of an iteration's detection result falls below a preset loss threshold, or until the number of iterations reaches a preset threshold; training is then considered complete and the trained first target detection model is obtained.
The anchor box setting parameters include the size, aspect ratio, and other parameters of the anchor boxes in the trained first target detection model. When the trained first target detection model detects the infrared image to be processed, it predicts the target category, offsets, and other information for each anchor box, and then determines the target category and the region where the target object is located in the infrared image according to the category and offsets corresponding to each anchor box.
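As a hedged illustration of how predicted offsets might be applied to an anchor box, the sketch below uses the common center-size parameterization; the patent does not specify the exact parameterization, so this form is an assumption:

```python
import math

def decode(anchor, offsets):
    # anchor = (cx, cy, w, h); offsets = (tx, ty, tw, th) predicted for that anchor
    cx = anchor[0] + offsets[0] * anchor[2]   # shift center by a fraction of anchor size
    cy = anchor[1] + offsets[1] * anchor[3]
    w = anchor[2] * math.exp(offsets[2])      # scale width/height multiplicatively
    h = anchor[3] * math.exp(offsets[3])
    return (cx, cy, w, h)
```

With zero offsets the decoded box is the anchor itself, which is why well-chosen anchor sizes and aspect ratios make the regression easier.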
In current target detection models, anchor boxes are usually determined empirically for generic targets. This is restrictive: unreasonable anchor box settings cause missed detections, reduce the model's recall, and degrade the accuracy of the detection result.
In the embodiment of the application, the annotation boxes of at least two preset training images can be clustered, and the anchor box setting parameters determined according to the clustering result.
The preset training images can be collected according to the actual detection scenario. For example, if the embodiment is used in the field of human body detection, the preset training images may be infrared image samples containing human bodies; the anchor box setting parameters can then be determined from the size, aspect ratio, etc. of the image regions where the bodies are located, so that the trained first target detection model can identify human targets more efficiently and accurately according to those parameters.
The algorithm used to cluster the annotation boxes of the at least two preset training images may be, for example, K-Means clustering, mean-shift clustering, a density-based clustering algorithm, or the like. The particular clustering algorithm is not limited here.
In some embodiments, before obtaining the target detection result of the infrared image to be processed through the trained first target detection model, the method further includes:
step S201, aiming at the at least two preset training images, clustering the at least two preset training images based on a preset clustering algorithm to obtain at least one cluster, wherein each cluster comprises at least two preset training images, the distance between the preset training images contained in the cluster meets a preset distance condition, the distance between any two preset training images is the intersection and parallel ratio between a mark frame of a first preset training image and a mark frame of a second preset training image, and the any two preset training images comprise the first preset training image and the second preset training image;
step S202, according to each cluster, determining the setting parameters of the anchoring frame.
In an embodiment of the present application, the Intersection over Union (IoU) is the ratio of the intersection to the union of the annotation box of the first preset training image and the annotation box of the second preset training image. The intersection is the overlapping area of the two annotation boxes; the union is the sum of the first area and the second area minus the overlapping area, where the first area is the area of the annotation box of the first preset training image and the second area is the area of the annotation box of the second preset training image.
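The IoU definition above can be sketched directly for two corner-format boxes (the tuple layout `(x1, y1, x2, y2)` is an illustrative choice, not specified by the text):

```python
def iou(box_a, box_b):
    # Boxes given as (x1, y1, x2, y2) corner coordinates
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)           # overlapping area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])  # first area
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])  # second area
    union = area_a + area_b - inter                         # sum of areas minus the overlap
    return inter / union if union else 0.0
```

For two 2x2 boxes overlapping in a unit square, the IoU is 1 / (4 + 4 - 1) = 1/7; disjoint boxes yield 0.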
In the embodiment of the present application, the preset clustering algorithm may be, for example, one of a K-Means clustering algorithm, a mean shift clustering algorithm, a density-based clustering algorithm, and the like. The specific type of the predetermined clustering algorithm is not limited herein.
In the embodiment of the application, the relevance between the annotation boxes of any two preset training images can be effectively evaluated through the IoU, so the preset training images can be clustered effectively, and the clustering result (the obtained clusters) reflects the size distribution of the annotation boxes in the preset training images.
For example, the annotation box corresponding to a cluster center may represent the average size of the annotation boxes in that cluster, and the distribution of distances between the other annotation boxes in the cluster and the center may represent their variation. Based on the average sizes and variations of the annotation boxes of the clusters, the value ranges of anchor box size, aspect ratio, and so on can be determined; that is, the anchor box setting parameters are determined.
It can be seen that, when the at least two preset training images are clustered based on the preset clustering algorithm, with the distance between any two images determined from the IoU, the resulting clusters reflect the size distribution of the annotation boxes in the preset training images. Accurate anchor box setting parameters can then be derived from the average size and variation of the annotation boxes in each cluster, enabling the trained first target detection model to identify target objects of a specific type more efficiently and accurately according to those parameters.
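Putting steps S201 and S202 together, the sketch below runs K-Means over annotation-box sizes. The text defines the distance via the IoU; here `1 - IoU` is used so that similar boxes are close, and the cluster median is used as the centroid update. Both choices are common conventions assumed for illustration, not mandated by the text:

```python
import numpy as np

def iou_wh(box, clusters):
    # IoU between one (w, h) box and k cluster boxes, with top-left corners aligned
    inter = np.minimum(clusters[:, 0], box[0]) * np.minimum(clusters[:, 1], box[1])
    union = box[0] * box[1] + clusters[:, 0] * clusters[:, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=300, seed=0):
    # boxes: (n, 2) array of annotation-box (width, height) pairs
    boxes = np.asarray(boxes, dtype=np.float64)
    rng = np.random.default_rng(seed)
    clusters = boxes[rng.choice(len(boxes), size=k, replace=False)].copy()
    assign = np.full(len(boxes), -1)
    for _ in range(iters):
        # distance between a box and a cluster center is 1 - IoU,
        # so boxes of similar size and aspect ratio land in the same cluster
        dists = np.stack([1.0 - iou_wh(b, clusters) for b in boxes])
        new_assign = dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
        for c in range(k):
            members = boxes[assign == c]
            if len(members):
                clusters[c] = np.median(members, axis=0)  # median update
    return clusters
```

The returned (w, h) centroids are the candidate anchor box sizes; their aspect ratios follow directly from the pairs.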
In some embodiments, before obtaining the target detection result of the infrared image to be processed through the trained first target detection model, the method further includes:
the method comprises the steps of training a first target detection model based on a target data set to obtain the trained first target detection model, wherein the target data set comprises at least two preset training images, each preset training image corresponds to a labeling frame and a category label, each preset training image corresponds to at least one scene, and the target data set corresponds to at least two scenes.
During training, the first target detection model may be iterated continuously until the loss function value of an iteration's detection result falls below a preset loss threshold, or until the number of iterations reaches a preset threshold; training is then considered complete and the trained first target detection model is obtained. The loss function of the first target detection model is not limited here; it may be, for example, a cross-entropy function or a mean squared error function.
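The two stopping rules above (loss below a preset threshold, or a preset iteration cap) can be sketched generically; `step_fn`, standing in for one training iteration that returns the current loss, is a hypothetical placeholder rather than anything named in the text:

```python
def train_until(step_fn, loss_threshold, max_iters):
    # Run training iterations until the loss drops below the preset
    # threshold or the preset iteration count is reached.
    loss = float("inf")
    for i in range(1, max_iters + 1):
        loss = step_fn()
        if loss < loss_threshold:
            return i, loss
    return max_iters, loss
```

Either criterion alone can terminate training; the cap guards against a loss that never crosses the threshold.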
In the embodiment of the application, the scene can be determined according to actual requirements.
For example, if the embodiment of the present application is used in the field of human body detection, the scenes may include at least two of: indoor, outdoor, daytime, nighttime, various crowd densities, various human postures, and the like.
By collecting corresponding preset training images in different scenes, the data in the target data set become richer and more diverse in features, which effectively improves the performance of the trained first target detection model.
The target data set may be acquired in various ways: a third-party data set may be used as the target data set, or the data may be collected directly. The annotation boxes and category labels corresponding to the preset training images in the target data set may be obtained through manual annotation or through automatic annotation by a specified target detection algorithm.
In some embodiments, before training the first target detection model based on the target data set and obtaining the trained first target detection model, the method further includes:
training a second target detection model through a third training data set to obtain the trained second target detection model, wherein the third training data set comprises at least two infrared image samples, and each infrared image sample corresponds to an annotation box and a category label;
respectively performing target detection on at least two unannotated infrared image samples through the trained second target detection model to obtain an initial annotation box and an initial category label corresponding to each unannotated infrared image sample;
determining the annotation box and category label corresponding to each unannotated infrared image sample according to its initial annotation box and initial category label;
and determining the target data set according to the annotation boxes and category labels corresponding to the infrared image samples and/or those corresponding to the unannotated infrared image samples.
The second target detection model may be of the same type as, or a different type from, the first target detection model.
In the embodiment of the application, a small number of accurately annotated infrared image samples can be obtained first, and the second target detection model trained on them to obtain the trained second target detection model. A large number of unannotated infrared image samples can then be predicted by the trained second target detection model to obtain their annotation results, after which the initial annotation boxes and initial category labels are corrected, for example by manual review, to determine accurate annotation boxes and category labels for each unannotated sample. This greatly reduces the time and labor cost of manual annotation, improves annotation efficiency, and efficiently produces a large number of accurately annotated preset training images.
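A sketch of the semi-automatic annotation flow described above; `detector` (standing in for the trained second model) and `review` (standing in for the manual correction step) are hypothetical callables, not APIs from the text:

```python
def pseudo_label(detector, review, unlabeled_images):
    # Predict initial annotation boxes and category labels with the
    # trained second model, then pass them through a correction step.
    labeled = []
    for image in unlabeled_images:
        boxes, labels = detector(image)               # initial boxes and labels
        labeled.append((image, *review(image, boxes, labels)))
    return labeled
```

The human reviewer only corrects predictions instead of drawing every box from scratch, which is where the labor saving comes from.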
In some embodiments, before training the first target detection model based on the target data set and obtaining the trained first target detection model, the method further includes:
acquiring a first training data set, wherein the first training data set comprises at least two first training infrared images;
for each set of saturation adjustment parameters in at least one set of saturation adjustment parameters, processing each first training infrared image in the first training data set according to that set of parameters to obtain second training infrared images corresponding to that set;
and obtaining the target data set according to the second training infrared images corresponding to each set of saturation adjustment parameters.
In an embodiment of the application, the saturation adjustment parameter is used to adjust the saturation of the first training infrared image. Different groups of saturation adjustment parameters may adjust the saturation of the first training infrared image to different degrees. Illustratively, there may be five groups of saturation adjustment parameters: the first group reduces the saturation of the first training infrared image by 40%, the second group reduces it by 20%, the third group increases it by 20%, the fourth group increases it by 40%, and the fifth group increases it by 60%. Of course, the saturation adjustment parameters may also be set in other manners.
In order to enable the trained first target detection model to recognize the feature information in the infrared image to be processed as accurately as possible, in the training stage, relatively rich feature information can be input into the first target detection model.
In the embodiment of the present application, at least one set of saturation adjustment parameters may be configured in advance for the acquired first training infrared images, and each first training infrared image in the first training data set may be processed according to each set of saturation adjustment parameters, so as to obtain the second training infrared images corresponding to that set of saturation adjustment parameters. At this time, training infrared images of the same content in different saturation states can be acquired, so that during training, the first target detection model can learn to effectively identify content in different saturation states, improving its target recognition capability for infrared images of different saturations.
In some embodiments, before training the first target detection model based on the target data set and obtaining the trained first target detection model, the method further includes:
acquiring a second training data set, wherein the second training data set comprises at least two third training infrared images;
aiming at each group of brightness adjustment parameters in at least one group of brightness adjustment parameters, respectively processing each third training infrared image in the second training data set according to the group of brightness adjustment parameters to obtain a fourth training infrared image corresponding to the group of brightness adjustment parameters;
and obtaining the target data set according to the fourth training infrared image corresponding to each group of brightness adjustment parameters.
The second training data set may be the same as the first training data set or may differ.
In an embodiment of the application, the brightness adjustment parameter is used to adjust the brightness of the third training infrared image. Different groups of brightness adjustment parameters may adjust the brightness of the third training infrared image to different degrees. Illustratively, there may be five groups of brightness adjustment parameters: the first group reduces the brightness of the third training infrared image by 40%, the second group reduces it by 20%, the third group increases it by 20%, the fourth group increases it by 40%, and the fifth group increases it by 60%. Of course, the brightness adjustment parameters may also be set in other manners.
In order to enable the trained first target detection model to recognize the feature information in the infrared image to be processed as accurately as possible, in the training stage, relatively rich feature information can be input into the first target detection model.
In the embodiment of the present application, at least one group of brightness adjustment parameters may be configured in advance for the acquired third training infrared images, and each third training infrared image in the second training data set may be processed according to each group of brightness adjustment parameters, so as to obtain the fourth training infrared images corresponding to that group of brightness adjustment parameters. At this time, training infrared images of the same content in different brightness states can be acquired, so that during training, the first target detection model can learn to effectively identify content in different brightness states, improving its target recognition capability for infrared images of different brightness.
As can be seen, based on the above embodiment, the target data set may be obtained according to the second training infrared image corresponding to each set of saturation adjustment parameters, and/or the fourth training infrared image corresponding to each set of brightness adjustment parameters. That is, the target data set may be obtained by saturation preprocessing and brightness preprocessing.
However, since an infrared image carries little color information in practical applications, changing the chromaticity of the infrared image may cause target features to disappear, resulting in missed detections or an unstable region for the detected target object. Thus, in some embodiments, the target data set is not obtained through chroma preprocessing; in actual operation, the chroma preprocessing parameter is set to leave the chroma unchanged when configuring the training parameters for the first target detection model.
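As a minimal sketch of the saturation and brightness preprocessing described above — varying saturation and brightness while leaving the hue (chroma) unchanged — the following adjusts a single RGB pixel through HSV space. The per-pixel interface and the factor values mirroring the five example groups are illustrative assumptions:

```python
import colorsys

def adjust_pixel(rgb, sat_factor=1.0, val_factor=1.0):
    """Scale saturation and brightness of one RGB pixel (values in [0, 1]),
    leaving the hue unchanged, consistent with the chroma setting above."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    s = min(1.0, s * sat_factor)
    v = min(1.0, v * val_factor)
    return colorsys.hsv_to_rgb(h, s, v)  # hue h is untouched

# The five example groups from the embodiment: -40%, -20%, +20%, +40%, +60%
SATURATION_FACTORS = [0.6, 0.8, 1.2, 1.4, 1.6]

pixel = (0.5, 0.3, 0.2)
variants = [adjust_pixel(pixel, sat_factor=f) for f in SATURATION_FACTORS]
```

Brightness preprocessing follows the same pattern with `val_factor`; applying the function over every pixel of an image yields one augmented training image per parameter group.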
And S103, determining a target object in the infrared image to be processed according to the target detection result.
Wherein the category of the target object can be determined according to an actual application scene. Illustratively, the target object may be a human body, a particular type of animal, or the like.
In the embodiment of the application, the target object in the infrared image to be processed can be directly determined according to the target detection result. In addition, the target detection result and other detection information (for example, a reference detection result for a visible light image corresponding to the infrared image to be processed, etc.) may also be combined to determine the target object in the infrared image to be processed.
In some embodiments, the determining a target object in the to-be-processed infrared image according to the target detection result includes:
acquiring a visible light image corresponding to the infrared image to be processed;
performing target detection on the visible light image through a third target detection model to obtain a reference detection result aiming at the visible light image;
and determining a target object in the infrared image to be processed according to the target detection result and the reference detection result.
In the embodiment of the application, the target object in the scene corresponding to the infrared image to be processed can be determined by combining the infrared image features in the infrared image to be processed and the visible light image features in the visible light image, so that the target object in the scene corresponding to the infrared image to be processed can be comprehensively and accurately detected by combining multi-dimensional feature data, and the accuracy of target detection is improved.
For example, when the embodiment of the present application is applied to the field of human body detection, combining the target detection result with the reference detection result can effectively avoid recognizing human-shaped animals such as monkeys, or human-shaped display boards, as human bodies, so that the accuracy of human body detection is significantly improved.
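One possible way to combine the target detection result with the reference detection result, in the spirit of the human body example above, is to keep an infrared detection only when a visible-light reference detection overlaps it sufficiently. The IoU-based matching rule and the 0.5 threshold below are assumptions for illustration; the embodiment does not fix a specific combination rule:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def confirm_targets(ir_detections, visible_detections, iou_threshold=0.5):
    """Keep an infrared detection only if a visible-light reference
    detection overlaps it above the threshold (cross-modal agreement)."""
    confirmed = []
    for ir_box in ir_detections:
        if any(iou(ir_box, vis_box) >= iou_threshold
               for vis_box in visible_detections):
            confirmed.append(ir_box)
    return confirmed

ir = [(10, 10, 50, 90), (200, 200, 230, 260)]
vis = [(12, 8, 52, 88)]
result = confirm_targets(ir, vis)  # only the first box is confirmed
```

Other fusion rules (e.g. score averaging across the two modalities) would fit the same interface.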
In the embodiment of the application, the infrared image to be processed can be obtained; then, obtaining a target detection result of the infrared image to be processed through a trained first target detection model, wherein the anchor frame setting parameters in the trained first target detection model are obtained by clustering label frames of at least two preset training images; at this time, a more reasonable anchoring frame can be predetermined, so that the trained first target detection model can perform target detection on the infrared image to be processed according to the more reasonable anchoring frame to obtain a more accurate target detection result, and thus, the target object in the infrared image to be processed can be determined according to the target detection result. Therefore, through the method and the device, the target identification can be accurately and efficiently carried out on the infrared image, and the accuracy of the target identification of the infrared image is improved.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 3 shows a block diagram of a target detection device provided in an embodiment of the present application, corresponding to the target detection method described in the above embodiment. For convenience of description, only the parts relevant to the embodiment of the present application are shown.
Referring to fig. 3, the object detection device 3 includes:
an obtaining module 301, configured to obtain an infrared image to be processed;
a processing module 302, configured to obtain a target detection result of the infrared image to be processed through a trained first target detection model, where an anchor frame setting parameter in the trained first target detection model is obtained by clustering label frames of at least two preset training images;
a determining module 303, configured to determine, according to the target detection result, a target object in the to-be-processed infrared image.
Optionally, the target detection apparatus 3 further includes:
a clustering module, configured to cluster the at least two preset training images based on a preset clustering algorithm to obtain at least one cluster, where each cluster includes at least two preset training images, the distance between the preset training images included in a cluster meets a preset distance condition, the distance between any two preset training images is the intersection-over-union ratio between the labeling frame of a first preset training image and the labeling frame of a second preset training image, and the any two preset training images include the first preset training image and the second preset training image;
and the second determining module is used for determining the setting parameters of the anchoring frame according to each cluster.
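The clustering module's role — clustering labeling frames with an IoU-based distance to derive the anchor frame setting parameters — can be sketched as a small k-means over (width, height) pairs. Using 1 − IoU as the distance and averaging cluster members are common choices in YOLO-style anchor estimation, assumed here for illustration; the patent's "preset clustering algorithm" is not otherwise specified:

```python
import random

def wh_iou(wh_a, wh_b):
    """IoU of two boxes compared by width/height only (anchored at the
    origin), the usual metric for anchor clustering."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union

def cluster_anchors(label_boxes, k, iterations=100, seed=0):
    """k-means on label-box (w, h) pairs with 1 - IoU as the distance,
    returning k anchor (w, h) settings sorted by size."""
    rng = random.Random(seed)
    centers = rng.sample(label_boxes, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in label_boxes:
            # assign to the center with the highest IoU (lowest 1 - IoU)
            best = max(range(k), key=lambda i: wh_iou(box, centers[i]))
            clusters[best].append(box)
        centers = [
            (sum(b[0] for b in c) / len(c), sum(b[1] for b in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centers)

boxes = [(10, 20), (12, 22), (50, 80), (48, 78), (11, 19), (52, 82)]
anchors = cluster_anchors(boxes, k=2)
```

The resulting cluster centers would serve as the anchor frame setting parameters for the first target detection model.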
Optionally, the obtaining module 301 includes:
the first acquisition unit is used for acquiring an original infrared image;
and the processing unit is used for carrying out median filtering processing on the original infrared image to obtain the infrared image to be processed.
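The median filtering performed by the processing unit can be sketched as follows; the 3×3 window size and the copy-unchanged border handling are illustrative assumptions (in practice a library routine such as OpenCV's median blur would typically be used):

```python
def median_filter_3x3(image):
    """Apply a 3x3 median filter to a 2-D list of pixel values,
    suppressing the salt-and-pepper noise common in raw infrared
    frames. Border pixels are copied unchanged for simplicity."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(
                image[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            )
            out[y][x] = window[4]  # the median of the 9 values
    return out

noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],  # a single hot (salt) pixel
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
clean = median_filter_3x3(noisy)  # the hot pixel is replaced by 10
```

The filtered output is then used as the infrared image to be processed in the subsequent detection step.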
Optionally, the target detection apparatus 3 further includes:
the training module is used for training a first target detection model based on a target data set to obtain the trained first target detection model, the target data set comprises at least two preset training images, each preset training image corresponds to a labeling frame and a category label, each preset training image corresponds to at least one scene, and the target data set corresponds to at least two scenes.
Optionally, the target detection apparatus 3 further includes:
the second acquisition module is used for acquiring a first training data set, and the first training data set comprises at least two first training infrared images;
a second processing module, configured to, for each saturation adjustment parameter of at least one set of saturation adjustment parameters, respectively process each first training infrared image in the first training data set according to the set of saturation adjustment parameters, and obtain a second training infrared image corresponding to the set of saturation adjustment parameters;
and the third processing module is used for obtaining the target data set according to the second training infrared image corresponding to each group of saturation adjusting parameters.
Optionally, the target detection apparatus 3 further includes:
the third acquisition module is used for acquiring a second training data set, and the second training data set comprises at least two third training infrared images;
a fourth processing module, configured to, for each group of brightness adjustment parameters in the at least one group of brightness adjustment parameters, respectively process each third training infrared image in the second training data set according to the group of brightness adjustment parameters, and obtain a fourth training infrared image corresponding to the group of brightness adjustment parameters;
and the fifth processing module is used for obtaining the target data set according to the fourth training infrared image corresponding to each group of brightness adjusting parameters.
Optionally, the determining module 303 specifically includes:
the second acquisition unit is used for acquiring the visible light image corresponding to the infrared image to be processed;
the detection unit is used for carrying out target detection on the visible light image through a third target detection model to obtain a reference detection result aiming at the visible light image;
and the determining unit is used for determining a target object in the infrared image to be processed according to the target detection result and the reference detection result.
In the embodiment of the application, the infrared image to be processed can be obtained; then, obtaining a target detection result of the infrared image to be processed through a trained first target detection model, wherein the anchor frame setting parameters in the trained first target detection model are obtained by clustering label frames of at least two preset training images; at this time, a more reasonable anchoring frame can be predetermined, so that the trained first target detection model can perform target detection on the infrared image to be processed according to the more reasonable anchoring frame to obtain a more accurate target detection result, and thus, the target object in the infrared image to be processed can be determined according to the target detection result. Therefore, through the method and the device, the target identification can be accurately and efficiently carried out on the infrared image, and the accuracy of the target identification of the infrared image is improved.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned functions may be distributed as different functional units and modules according to needs, that is, the internal structure of the apparatus may be divided into different functional units or modules to implement all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 4, the terminal device 4 of this embodiment includes: at least one processor 40 (only one shown in fig. 4), a memory 41, and a computer program 42 stored in the memory 41 and executable on the at least one processor 40, wherein the steps of any of the above-described object detection method embodiments are implemented when the computer program 42 is executed by the processor 40.
The terminal device 4 may be a robot, a server, a mobile phone, a wearable device, an Augmented Reality (AR)/Virtual Reality (VR) device, a desktop computer, a notebook computer, a palmtop computer, or other computing devices. The terminal device may include, but is not limited to, a processor 40 and a memory 41. Those skilled in the art will appreciate that fig. 4 is merely an example of the terminal device 4 and does not constitute a limitation of the terminal device 4, which may include more or fewer components than those shown, or combine some of the components, or include different components, such as input devices, output devices, network access devices, etc. The input device may include a keyboard, a touch pad, a fingerprint sensor (for collecting fingerprint information of a user and direction information of a fingerprint), a microphone, a camera, and the like, and the output device may include a display, a speaker, and the like.
The Processor 40 may be a Central Processing Unit (CPU), and the Processor 40 may also be other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field-Programmable Gate arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 41 may be an internal storage unit of the terminal device 4, such as a hard disk or a memory of the terminal device 4. In other embodiments, the memory 41 may also be an external storage device of the terminal device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, or a Flash memory Card (Flash Card) provided on the terminal device 4. Further, the memory 41 may include both an internal storage unit and an external storage device of the terminal device 4. The memory 41 is used for storing an operating system, an application program, a Boot Loader, data, and other programs, such as the program code of the above computer program. The memory 41 may also be used to temporarily store data that has been output or is to be output.
In addition, although not shown, the terminal device 4 may further include a network connection module, such as a Bluetooth module, a Wi-Fi module, a cellular network module, and the like, which is not described herein again.
In this embodiment, when the processor 40 executes the computer program 42 to implement the steps in any of the above-mentioned target detection method embodiments, an infrared image to be processed may be acquired; then, obtaining a target detection result of the infrared image to be processed through a trained first target detection model, wherein the anchor frame setting parameters in the trained first target detection model are obtained by clustering label frames of at least two preset training images; at this time, a more reasonable anchoring frame can be predetermined, so that the trained first target detection model can perform target detection on the infrared image to be processed according to the more reasonable anchoring frame to obtain a more accurate target detection result, and thus, the target object in the infrared image to be processed can be determined according to the target detection result. Therefore, through the method and the device, the target identification can be accurately and efficiently carried out on the infrared image, and the accuracy of the target identification of the infrared image is improved.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above method embodiments.
The embodiments of the present application provide a computer program product, which when running on a terminal device, enables the terminal device to implement the steps in the above method embodiments when executed.
The integrated unit may be stored in a computer-readable storage medium if it is implemented in the form of a software functional unit and sold or used as a separate product. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, can implement the steps of the method embodiments described above. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing apparatus/terminal apparatus, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash disk, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, computer-readable media may not include electrical carrier signals or telecommunications signals, in accordance with legislation and patent practice.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the above-described apparatus/network device embodiments are merely illustrative, and for example, the division of the above modules or units is only one logical function division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method of object detection, comprising:
acquiring an infrared image to be processed;
obtaining a target detection result of the infrared image to be processed through a trained first target detection model, wherein anchor frame setting parameters in the trained first target detection model are obtained by clustering labeling frames of at least two preset training images;
and determining a target object in the infrared image to be processed according to the target detection result.
2. The target detection method of claim 1, before obtaining the target detection result of the infrared image to be processed through the trained first target detection model, further comprising:
clustering the at least two preset training images based on a preset clustering algorithm to obtain at least one cluster, wherein each cluster comprises at least two preset training images, the distance between the preset training images contained in a cluster meets a preset distance condition, the distance between any two preset training images is the intersection-over-union ratio between a labeling frame of a first preset training image and a labeling frame of a second preset training image, and the any two preset training images comprise the first preset training image and the second preset training image;
and determining the setting parameters of the anchoring frame according to each cluster.
3. The target detection method of claim 1, wherein said acquiring the infrared image to be processed comprises:
acquiring an original infrared image;
and carrying out median filtering processing on the original infrared image to obtain the infrared image to be processed.
4. The target detection method of claim 1, before obtaining the target detection result of the infrared image to be processed through the trained first target detection model, further comprising:
the method comprises the steps of training a first target detection model based on a target data set to obtain the trained first target detection model, wherein the target data set comprises at least two preset training images, each preset training image corresponds to a labeling frame and a category label, each preset training image corresponds to at least one scene, and the target data set corresponds to at least two scenes.
5. The method of claim 4, wherein before training the first object detection model based on the object data set to obtain the trained first object detection model, further comprising:
acquiring a first training data set, wherein the first training data set comprises at least two first training infrared images;
aiming at each group of saturation adjustment parameters in at least one group of saturation adjustment parameters, respectively processing each first training infrared image in the first training data set according to the group of saturation adjustment parameters to obtain a second training infrared image corresponding to the group of saturation adjustment parameters;
and obtaining the target data set according to the second training infrared image corresponding to each group of saturation adjusting parameters.
6. The method of claim 4, wherein before training the first object detection model based on the object data set to obtain the trained first object detection model, further comprising:
acquiring a second training data set, wherein the second training data set comprises at least two third training infrared images;
aiming at each group of brightness adjustment parameters in at least one group of brightness adjustment parameters, respectively processing each third training infrared image in the second training data set according to the group of brightness adjustment parameters to obtain a fourth training infrared image corresponding to the group of brightness adjustment parameters;
and obtaining the target data set according to the fourth training infrared image corresponding to each group of brightness adjustment parameters.
7. The target detection method of any one of claims 1 to 6, wherein the determining the target object in the infrared image to be processed according to the target detection result comprises:
acquiring a visible light image corresponding to the infrared image to be processed;
performing target detection on the visible light image through a third target detection model to obtain a reference detection result aiming at the visible light image;
and determining a target object in the infrared image to be processed according to the target detection result and the reference detection result.
8. An object detection device, comprising:
the acquisition module is used for acquiring an infrared image to be processed;
the processing module is used for obtaining a target detection result of the infrared image to be processed through a trained first target detection model, and the anchor frame setting parameters in the trained first target detection model are obtained by clustering the labeling frames of at least two preset training images;
and the determining module is used for determining a target object in the infrared image to be processed according to the target detection result.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the object detection method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the object detection method according to any one of claims 1 to 7.
CN202110104667.6A 2021-01-26 2021-01-26 Target detection method, target detection device and terminal equipment Active CN112989924B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110104667.6A CN112989924B (en) 2021-01-26 2021-01-26 Target detection method, target detection device and terminal equipment

Publications (2)

Publication Number Publication Date
CN112989924A true CN112989924A (en) 2021-06-18
CN112989924B CN112989924B (en) 2024-05-24

Family

ID=76345451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110104667.6A Active CN112989924B (en) 2021-01-26 2021-01-26 Target detection method, target detection device and terminal equipment

Country Status (1)

Country Link
CN (1) CN112989924B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049907A (en) * 2022-08-17 2022-09-13 四川迪晟新达类脑智能技术有限公司 FPGA-based YOLOV4 target detection network implementation method
CN115049907B (en) * 2022-08-17 2022-10-28 四川迪晟新达类脑智能技术有限公司 FPGA-based YOLOV4 target detection network implementation method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674878A (en) * 2019-09-26 2020-01-10 苏州航韧光电技术有限公司 Target detection method and device for dual-mode decision-level image fusion
CN110795991A (en) * 2019-09-11 2020-02-14 西安科技大学 Mining locomotive pedestrian detection method based on multi-information fusion
CN110889324A (en) * 2019-10-12 2020-03-17 南京航空航天大学 Thermal infrared image target identification method based on YOLO V3 terminal-oriented guidance
WO2020164282A1 (en) * 2019-02-14 2020-08-20 平安科技(深圳)有限公司 Yolo-based image target recognition method and apparatus, electronic device, and storage medium
CN111597967A (en) * 2020-05-12 2020-08-28 北京大学 Infrared image multi-target pedestrian identification method
CN111950329A (en) * 2019-05-16 2020-11-17 长沙智能驾驶研究院有限公司 Target detection and model training method and device, computer equipment and storage medium
CN112036464A (en) * 2020-08-26 2020-12-04 国家电网有限公司 Insulator infrared image fault detection method based on YOLOv3-tiny algorithm
CN112183212A (en) * 2020-09-01 2021-01-05 深圳市识农智能科技有限公司 Weed identification method and device, terminal equipment and readable storage medium

Also Published As

Publication number Publication date
CN112989924B (en) 2024-05-24

Similar Documents

Publication Publication Date Title
CN112561080B (en) Sample screening method, sample screening device and terminal equipment
CN108961267B (en) Picture processing method, picture processing device and terminal equipment
CN112348765A (en) Data enhancement method and device, computer readable storage medium and terminal equipment
CN113160257A (en) Image data labeling method and device, electronic equipment and storage medium
CN111355941A (en) Image color real-time correction method, device and system
CN112966725B (en) Method and device for matching template images and terminal equipment
CN107273838A (en) Traffic lights capture the processing method and processing device of picture
CN111290684B (en) Image display method, image display device and terminal equipment
CN108805838B (en) Image processing method, mobile terminal and computer readable storage medium
CN112149583A (en) Smoke detection method, terminal device and storage medium
CN111191582A (en) Three-dimensional target detection method, detection device, terminal device and computer-readable storage medium
CN108052869B (en) Lane line recognition method, lane line recognition device and computer-readable storage medium
CN111080665B (en) Image frame recognition method, device, equipment and computer storage medium
CN113158773B (en) Training method and training device for living body detection model
CN112686314B (en) Target detection method and device based on long-distance shooting scene and storage medium
CN112989924A (en) Target detection method, target detection device and terminal equipment
CN111539975B (en) Method, device, equipment and storage medium for detecting moving object
CN113034449B (en) Target detection model training method and device and communication equipment
CN110610178A (en) Image recognition method, device, terminal and computer readable storage medium
CN113391779B (en) Parameter adjusting method, device and equipment for paper-like screen
CN112214639A (en) Video screening method, video screening device and terminal equipment
CN113139617A (en) Power transmission line autonomous positioning method and device and terminal equipment
CN111784607A (en) Image tone mapping method, device, terminal equipment and storage medium
CN112348905B (en) Color recognition method and device, terminal equipment and storage medium
CN111124862A (en) Intelligent equipment performance testing method and device and intelligent equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant