CN110956118B - Target object detection method and device, storage medium and electronic device - Google Patents

Target object detection method and device, storage medium and electronic device

Info

Publication number
CN110956118B
CN110956118B (application CN201911175584.5A)
Authority
CN
China
Prior art keywords
target
image
reliability
size
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911175584.5A
Other languages
Chinese (zh)
Other versions
CN110956118A (en)
Inventor
杨志强
卢伍平
湛杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Huagan Technology Co ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN201911175584.5A priority Critical patent/CN110956118B/en
Publication of CN110956118A publication Critical patent/CN110956118A/en
Application granted granted Critical
Publication of CN110956118B publication Critical patent/CN110956118B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/20 Scenes; Scene-specific elements in augmented reality scenes
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01J MEASUREMENT OF INTENSITY, VELOCITY, SPECTRAL CONTENT, POLARISATION, PHASE OR PULSE CHARACTERISTICS OF INFRARED, VISIBLE OR ULTRAVIOLET LIGHT; COLORIMETRY; RADIATION PYROMETRY
    • G01J5/00 Radiation pyrometry, e.g. infrared or optical thermometry
    • G01J5/0014 Radiation pyrometry, e.g. infrared or optical thermometry for sensing the radiation from gases, flames
    • G01J5/0018 Flames, plasma or welding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464 Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01J MEASUREMENT OF INTENSITY, VELOCITY, SPECTRAL CONTENT, POLARISATION, PHASE OR PULSE CHARACTERISTICS OF INFRARED, VISIBLE OR ULTRAVIOLET LIGHT; COLORIMETRY; RADIATION PYROMETRY
    • G01J5/00 Radiation pyrometry, e.g. infrared or optical thermometry
    • G01J2005/0077 Imaging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Abstract

The invention discloses a target object detection method and device, a storage medium and an electronic device. The method includes the following steps. When a first object whose temperature exceeds a first threshold and which has a predetermined shape is detected in a first image, a first region where the first object is located and a first size of the first object are acquired, the first image being an image of a target area captured by an infrared detector. When a second object is detected in a second image, a second region where a target part of the second object is located and a second size of the target part are acquired, the second image being a visualized image corresponding to the first image and the second object being a human-shaped object with a predetermined posture. When the distance between the first region and the second region is smaller than a target distance threshold and the ratio of the first size to the second size falls within a target ratio range, the first object is determined to be the target object.

Description

Target object detection method and device, storage medium and electronic device
Technical Field
The invention relates to the field of computers, in particular to a target object detection method and device, a storage medium and an electronic device.
Background
At present, in some public places and offices, smoking is typically discouraged either by posting "no smoking" signs or by deploying smoke-based detection devices (for example, voice reminders) that monitor the smoke concentration in the surrounding environment and issue an audible warning.

However, the effectiveness of "no smoking" signs depends entirely on the self-discipline of smokers, so the reminding effect cannot be guaranteed. Smoke-based detectors, in turn, are accurate only when the smoke is within the detector's range and its concentration is high; they detect fine particles poorly, and their sensitivity degrades with prolonged use, so they cannot reliably determine whether someone is smoking.

The related art therefore suffers from inaccurate target object detection results caused by the detector's poor ability to detect fine particles.
Disclosure of Invention
The embodiments of the invention provide a target object detection method and device, a storage medium and an electronic device, so as to at least solve the technical problem in the related art that target object detection results are inaccurate because detectors have poor capability to detect fine particles.
According to an aspect of the embodiments of the present invention, a target object detection method is provided, including: when a first object whose temperature exceeds a first threshold and which has a predetermined shape is detected in a first image, acquiring a first region where the first object is located and a first size of the first object, the first image being an image of a target area captured by an infrared detector; when a second object is detected in a second image, acquiring a second region where a target part of the second object is located and a second size of the target part, the second image being a visualized image corresponding to the first image and the second object being a human-shaped object with a predetermined posture; and when the distance between the first region and the second region is smaller than a target distance threshold and the ratio of the first size to the second size falls within a target ratio range, determining that the first object is the target object.
According to another aspect of the embodiments of the present invention, a target object detection device is also provided, including: a first acquisition unit configured to acquire, when a first object whose temperature exceeds a first threshold and which has a predetermined shape is detected in a first image, a first region where the first object is located and a first size of the first object, the first image being an image of a target area captured by an infrared detector; a second acquisition unit configured to acquire, when a second object is detected in a second image, a second region where a target part of the second object is located and a second size of the target part, the second image being a visualized image corresponding to the first image and the second object being a human-shaped object with a predetermined posture; and a first determining unit configured to determine that the first object is the target object when the distance between the first region and the second region is smaller than a target distance threshold and the ratio of the first size to the second size falls within a target ratio range.
According to a further aspect of the embodiments of the present invention, a storage medium is also provided, storing a computer program that, when executed, performs the above method.

According to another aspect of the embodiments of the present invention, an electronic device is also provided, including a memory and a processor, where the memory stores a computer program and the processor is configured to execute the above method by running the computer program.
In the embodiments of the invention, when a first object whose temperature exceeds a first threshold and which has a predetermined shape is detected in a temperature-sensitive image (the first image), the first region where the first object is located and its first size are acquired. When a human-shaped object with a predetermined posture is detected in a visualized image (the second image), the second region where a target part of the human-shaped object (such as the head or face) is located and the second size of that part are acquired. Whether the first object is the target object (a burning cigarette) is then determined from the positional relation between the first and second regions and the size relation between the first and second sizes. Because the characteristics of a burning cigarette butt are combined with the specific posture a person adopts while smoking, interference from high-temperature false alarm sources that are not cigarette butts can be eliminated and the detection accuracy of the target object improved, solving the technical problem in the related art that detection results are inaccurate because detectors have poor capability to detect fine particles.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention and do not constitute a limitation of the invention. In the drawings:
fig. 1 is a block diagram of a hardware structure of a detection device of a target object detection method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a network architecture of a target object detection method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart diagram illustrating an alternative target object detection method according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart diagram of an alternative shape detection method in accordance with embodiments of the present invention;
FIG. 5 is a schematic flow chart diagram of an alternative shape detection method in accordance with embodiments of the present invention;
FIG. 6 is a schematic diagram of an alternative shape detection method of an embodiment of the present invention;
FIG. 7 is a schematic diagram of an alternative shape detection method according to an embodiment of the invention;
FIG. 8 is a schematic flow chart diagram of an alternative target object detection method in accordance with an embodiment of the present invention;
FIG. 9 is a schematic flow chart diagram illustrating an alternative target object detection method according to an embodiment of the present invention;
FIG. 10 is a schematic flow chart of an alternative multispectral detection method that does not require training data, according to an embodiment of the present invention;
FIG. 11 is a schematic flow chart diagram illustrating an alternative target object detection method according to an embodiment of the present invention;
FIG. 12 is a flowchart illustrating an alternative target object detection method according to an embodiment of the present invention; and
fig. 13 is a block diagram of an alternative target object detection apparatus according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
According to an aspect of an embodiment of the present invention, there is provided a target object detection method. The method may be executed on a detection device, a computer terminal or a similar computing apparatus. Taking the detection device as an example, fig. 1 is a block diagram of the hardware structure of a detection device for the target object detection method according to an embodiment of the present invention. As shown in fig. 1, the detection device may include one or more processors 102 (only one is shown in fig. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. Optionally, the detection device may further include a transmission device 106 for communication functions and an input/output device 108. Those skilled in the art will understand that the structure shown in fig. 1 is only illustrative and does not limit the structure of the detection device; for example, the detection device may include more or fewer components than shown in fig. 1, or have a different configuration.
The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as computer programs corresponding to the target object detection method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer programs stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the detection device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of such a network may include a wireless network provided by the communication provider of the detection device. In one example, the transmission device 106 includes a network interface controller (NIC) that can connect to other network devices through a base station so as to communicate with the internet. In another example, the transmission device 106 may be a radio frequency (RF) module used to communicate with the internet wirelessly.
The embodiments of the present application may run on the network architecture shown in fig. 2. As shown in fig. 2, the network architecture includes: an acquisition device (for acquiring the first image and/or the second image), a detection device (for detecting the target object) and an alarm device, where the detection device may interact with the acquisition device and the alarm device (for example, over a network).

Optionally, the acquisition device may provide the captured first image and second image to the detection device. After receiving them, the detection device may analyze the first image and the second image to determine whether the target object (a burning cigarette) appears in them, and send an alarm signal to the alarm device if the target object is detected.
According to an embodiment of the present invention, a target object detection method is provided, which may be applied to the detection device shown in fig. 2, or a combination of the acquisition device, the detection device, and the alarm device. As shown in fig. 3, the method includes:
Step S302, when a first object whose temperature exceeds a first threshold and which has a predetermined shape is detected in a first image, acquire a first region where the first object is located and a first size of the first object, the first image being an image of a target area captured by an infrared detector;

Step S304, when a second object is detected in a second image, acquire a second region where a target part of the second object is located and a second size of the target part, the second image being a visualized image corresponding to the first image and the second object being a human-shaped object with a predetermined posture;

Step S306, when the distance between the first region and the second region is smaller than a target distance threshold and the ratio of the first size to the second size falls within a target ratio range, determine that the first object is the target object.
Through the above steps, when a first object whose temperature exceeds the first threshold and which has a predetermined shape is detected in the temperature-sensitive image (the first image), the first region where the first object is located and its first size are acquired; when a human-shaped object with a predetermined posture is detected in the visualized image (the second image), the second region where a target part of the human-shaped object (such as the head or face) is located and the second size of that part are acquired; and whether the first object is the target object (a burning cigarette) is determined from the positional relation between the first and second regions and the size relation between the first and second sizes. This improves the accuracy of target object detection and solves the technical problem in the related art that detection results are inaccurate because detectors have poor capability to detect fine particles.
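The fusion decision of step S306 can be sketched as follows. This is an illustrative example only: the function name, the centre-point distance measure, and the threshold values `dist_thresh` and `ratio_range` are assumptions chosen for demonstration, not values specified by this disclosure.

```python
import math

def is_target_object(first_center, first_size, second_center, second_size,
                     dist_thresh=80.0, ratio_range=(0.01, 0.2)):
    """Fuse the infrared detection (first_*) with the visible/visualized
    detection of the target part (second_*).

    first_center / second_center: (x, y) centres of the two regions, in pixels.
    first_size / second_size: areas of the hot object and of the target part.
    Returns True when the regions are close enough and the size ratio of the
    hot object to the target part lies in the expected range.
    """
    distance = math.hypot(first_center[0] - second_center[0],
                          first_center[1] - second_center[1])
    ratio = first_size / second_size
    return distance < dist_thresh and ratio_range[0] <= ratio <= ratio_range[1]
```

For instance, a small hot spot near a detected head would satisfy both conditions, while a distant hot spot, or one comparable in size to the head, would not.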
Alternatively, the main body for performing the above steps may be a detection device or the like, but is not limited thereto.
Alternatively, the target object detection method may be applied to a target object detection process in a target scene, for example, to a detection process of burning cigarettes in various scenes, such as a shopping mall, an exhibition hall, a conference room, a traffic control center, and the like.
In step S302, in a case where a first object having a temperature exceeding a first threshold and having a predetermined shape is detected from a first image, the detection apparatus acquires a first area where the first object is located and a first size of the first object, where the first image is an image of a target area captured by an infrared detector.
The first image may be an image of the target area captured by the infrared detector. It may be a frame acquired by the acquisition device in real time, or a frame taken from a buffer queue that stores one or more frames acquired by the acquisition device. The manner of obtaining the first image may be set as required and is not detailed in this embodiment.
Optionally, in this embodiment, before acquiring the first area where the first object is located and the first size of the first object, a first image obtained by shooting the target area by the infrared detector may be acquired; a first object having a temperature exceeding a first threshold and having a predetermined shape is detected from the first image.
Before the first region where the first object is located and the first size of the first object are acquired, one frame of infrared raw data (one frame of infrared image, i.e., the first image) may be acquired.
After the first image is acquired, whether a high-temperature target exists in the picture can be detected. If not, the step of acquiring the first image is performed again. If so, it can further be detected whether the shape of the high-temperature target is a preset shape (for example, a circle-like shape); if it is not, the step of acquiring the first image is performed again, and if it is, the position (the first region where the first object is located) and the size (the first size of the first object) of the high-temperature target (the first object) are acquired.
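The high-temperature pre-filtering described above can be sketched as a simple scan over a temperature frame. The representation of the frame as a list of rows of per-pixel temperatures, the function name, and the threshold are illustrative assumptions; shape checking would follow on the returned pixels.

```python
def find_hot_pixels(temp_frame, temp_thresh):
    """Scan a 2-D temperature frame (a list of rows, one temperature value
    per pixel) and return the (x, y) coordinates of every pixel whose
    temperature exceeds temp_thresh -- the high-temperature pre-filter.
    """
    hot = []
    for y, row in enumerate(temp_frame):
        for x, t in enumerate(row):
            if t > temp_thresh:
                hot.append((x, y))
    return hot
```

If no hot pixels are returned, the frame can be discarded cheaply and the next frame acquired, which is what makes this pre-filter save computing resources.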
Optionally, in this embodiment, detecting from the first image the first object whose temperature exceeds the first threshold and which has a predetermined shape may include: detecting, from the first image, a reference object whose temperature exceeds the first threshold; and determining the reference object to be the first object when the highest temperature of the reference object fluctuates and the shape of the reference object is the predetermined shape. The highest temperature of the reference object is considered to fluctuate when the highest temperatures of the reference object in consecutive multi-frame images (which include the first image) are not all the same, and the ratio of the standard deviation to the mean of those highest temperatures is greater than or equal to a predetermined threshold.
Since detecting a single high-temperature target alone may produce false alarms, the temperature fluctuation characteristic of a burning cigarette butt is also examined. The temperature of a cigarette butt fluctuates during smoking: when the smoker inhales, the butt receives more oxygen and burns hotter. This fluctuation characteristic can therefore be used to reduce false alarms.

To detect the fluctuation, the maximum temperature of the high-temperature target is recorded over multiple frames, its mean and standard deviation are computed, and the ratio of the standard deviation to the mean is compared against a threshold to decide whether the maximum temperature fluctuates.
By the embodiment, whether the reference object is the first object or not is judged by utilizing the temperature fluctuation characteristic of the target object, so that the false alarm rate of the target object can be reduced.
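The fluctuation criterion above (the per-frame maxima are not all identical, and the standard deviation divided by the mean reaches a threshold) can be sketched as follows; the function name and the threshold value `cv_thresh` are illustrative assumptions, not values from this disclosure.

```python
import statistics

def max_temp_fluctuates(max_temps, cv_thresh=0.02):
    """Return True when the per-frame maximum temperatures fluctuate:
    they are not all identical, and the coefficient of variation
    (population standard deviation / mean) is at least cv_thresh.
    max_temps: sequence of the hot object's maximum temperature per frame.
    """
    if len(set(max_temps)) == 1:  # completely constant -> no fluctuation
        return False
    mean = statistics.fmean(max_temps)
    std = statistics.pstdev(max_temps)
    return std / mean >= cv_thresh
```

A steady high-temperature source (such as a lamp) would yield a near-zero ratio and be rejected, while a cigarette butt brightening on each inhalation would pass.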
Optionally, the preset shape may be detected in various ways, for example by extracting features with the Scale-Invariant Feature Transform (SIFT) or Speeded-Up Robust Features (SURF) algorithm and then performing feature matching. The detection process may be as shown in fig. 4 and includes the following steps:
Step S402, initialization.

Step S404, binarize the image by filtering out low-temperature objects.

Step S406, extract the preset shape features using a target algorithm (e.g., the SIFT or SURF algorithm).

Step S408, extract the shape features of the target to be detected using the target algorithm.

Step S410, match the preset shape features against the shape features of the target to be detected.

Step S412, determine whether the preset target is present in the image; if yes, execute step S414, otherwise acquire the next frame of image and re-execute the detection process.

Step S414, output the detection result.
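The feature matching of step S410 can be illustrated with a minimal nearest-neighbour matcher using Lowe's ratio test, which is commonly applied to SIFT/SURF descriptors. In practice the descriptors would come from a SIFT or SURF extractor; the short descriptor vectors and the ratio value below are illustrative assumptions.

```python
def match_descriptors(ref_desc, query_desc, ratio=0.75):
    """Match each reference descriptor to its nearest query descriptor,
    accepting a match only when it passes Lowe's ratio test (the best
    squared distance is clearly smaller than the second best).
    Returns a list of (reference_index, query_index) pairs.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    matches = []
    for i, d in enumerate(ref_desc):
        # Sort query descriptors by squared distance to this reference one.
        dists = sorted((dist2(d, q), j) for j, q in enumerate(query_desc))
        if len(dists) >= 2 and dists[0][0] < (ratio ** 2) * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches
```

A high number of accepted matches between the preset shape's descriptors and the candidate target's descriptors would indicate that the preset target is present (step S412).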
In the case where the predetermined shape is a circle-like shape, various detection methods may be used, for example the Hough transform method, the barycenter (center of gravity) method or the least squares method. The procedure for detecting the circle-like shape may be as shown in fig. 5 and includes the following steps:
Step S502, initialization.

Step S504, binarize the image by filtering out low-temperature objects.

Step S506, perform edge detection on the binarized image.

Step S508, process the image with a circle detection algorithm (e.g., the barycenter method or the Hough transform method).

Step S510, determine whether a circle-like shape exists in the processed image; if yes, execute step S512, otherwise acquire the next frame of image and re-execute the detection process.

Step S512, output the detection result.
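The barycenter (center of gravity) circle check of step S508 can be sketched as follows: compute the centroid of the edge points, then require the distances from the centroid to the edge points to be nearly constant. The function name and the tolerance `tol` are illustrative assumptions.

```python
import math

def looks_circular(points, tol=0.15):
    """Barycenter circularity check on edge points.

    points: list of (x, y) edge coordinates of one connected hot region.
    Returns True when the relative spread of centroid-to-point distances
    is below tol, i.e. the outline is approximately a circle.
    """
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    radii = [math.hypot(x - cx, y - cy) for x, y in points]
    mean_r = sum(radii) / len(radii)
    spread = (max(radii) - min(radii)) / mean_r
    return spread < tol
```

Points sampled on a circle pass the check, while an elongated outline (for example an ellipse twice as wide as it is tall) is rejected, filtering out non-circular hot objects.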
Through this embodiment, a first object whose temperature exceeds the first threshold and which has a preset shape is detected from the first image obtained by the infrared detector shooting the target area. The high-temperature characteristic allows a great deal of scene data to be pre-filtered with little computation, saving computing resources; and the preset shape (circle-like, or another specified shape characteristic) filters out many high-temperature false alarm sources that are not cigarette butts, improving target object detection efficiency.
The first region where the first object is located and the first size of the first object may be acquired using a model obtained from training data. Optionally, in this embodiment, acquiring the first region and the first size may further include acquiring a first reliability, which represents the degree of similarity between the shape of the first object and the predetermined shape.

Specifically, the first image may be input into a first target model, which extracts the region and size of the first object from the first image and outputs the shape similarity between the shape of the first object and the predetermined shape.

Optionally, before step S302, each first training image in a first training image set may be input into a first training model, and the labeling information of each first training image (which labels the target object and the predetermined shape of the target object) may be used to adjust the model parameters of the first training model until the degree of correlation (for example, the degree of similarity) between the model output and the labeling information exceeds a first threshold (that is, satisfies the objective function).

With this embodiment, the position and shape of the first object in the first image and its shape similarity to the preset shape are obtained using the first target model, which can improve the accuracy of target object judgment and reduce the false alarm rate and the missed alarm rate.
Alternatively, in step S304, the detection apparatus may acquire a second region in which the target portion of the second object is located and a second size of the target portion in a case where the second object is detected from a second image, wherein the second image is a visualized image corresponding to the first image, and the second object is a human-shaped object having a predetermined posture.
The first image and the second image may be images of different types: the first image is a temperature-sensitive image, while the second image is a visualized image. The second image may be a separately captured image or an image converted from the first image. Accordingly, the infrared detector may be part of an infrared camera, which may be either a single infrared camera or a multispectral camera that captures both infrared and visible light.
As an alternative embodiment, before the second region where the target part of the second object is located and the second size of the target part are acquired, the first image may be converted into a visualized image to obtain the second image.

For example, one frame of infrared raw data can be converted into an 8-bit visualized image.

With this embodiment, a single infrared camera captures the infrared image, which is then converted into a visualized image. This allows detection around the clock in all weather and extends the time span over which detection is possible.
As another alternative embodiment, in the case where the infrared camera is a multispectral camera, the second image captured by the multispectral camera may be acquired before the first region where the first object is located and the first size of the first object are acquired, and before the second region where the target part of the second object is located and the second size of the target part are acquired. The multispectral camera includes the infrared detector, and the first image and the second image are, respectively, an infrared image and a visible light image captured by the multispectral camera of the target area at the same moment.
For example, the multispectral camera may shoot the target area at one moment to obtain an infrared image and a visible light image for that moment, and the two images are then registered so that they spatially correspond to each other.
According to this embodiment, using the multispectral camera to acquire the visible light image allows evidence to be collected and retained, improving the accuracy and credibility of target object detection.
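Registering the infrared and visible light images means mapping coordinates from one image into the other. As a simplifying assumption, the sketch below uses a pre-calibrated per-axis scale and offset; a real system would typically estimate a full homography from calibration data, and the function name and values here are illustrative.

```python
def register_point(pt, scale, offset):
    """Map a pixel coordinate from the infrared image into the visible
    light image using a pre-calibrated linear model:
        visible_x = ir_x * scale_x + offset_x   (and likewise for y).
    pt: (x, y) in the infrared image; scale, offset: per-axis calibration.
    """
    return (pt[0] * scale[0] + offset[0], pt[1] * scale[1] + offset[1])
```

Once the first region (from the infrared image) is mapped into the visible image's coordinate frame, its distance to the second region can be measured directly for the step S306 comparison.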
It should be noted that, the single infrared camera and the multispectral camera can be divided into: single infrared image detection and infrared visible light multispectral image detection. The single infrared scheme cannot capture visible light images and can detect all weather all the day; the multispectral scheme cannot detect all weather all the day, and can capture visible light images and keep evidences; the multispectral device can use the multispectral scheme during the day and can switch to the single infrared scheme at night. Of course, the multispectral scheme can also be used at night by adding a white light lamp.
Optionally, in this embodiment, before the second region where the target part of the second object is located and the second size of the target part are acquired, it may first be determined that the second image contains a human-shaped object when a target feature representing the target part is detected in the second image; then, when the posture of the human-shaped object is determined to match the predetermined posture, it is determined that the second object is detected in the second image.
After the second image is acquired, it may be detected whether the second image includes a target feature representing the target part (for example, a human head or a human face); when the target feature is detected, it may be determined that the second image includes a human-shaped object. For the detected human-shaped object, its posture may then be matched against a predetermined posture, and when the two match, it is determined that the second object has been detected in the second image. Depending on the predetermined posture, other features corresponding to the target feature may also be detected from the second image (for example, features representing hands and shoulders; the predetermined posture may be a posture formed by the human head or face together with the hands and shoulders).
It should be noted that the target feature may be used to represent a human face. Combining the characteristics of a human face, the target feature may include at least one of: features representing the eyes, nose, mouth, or facial contour, or the relative size of the face (its relationship to the corresponding body size), and so on. The target feature may be set as needed, and this embodiment is not particularly limited in this regard.
For the second image, the person may be detected by face detection or body detection; the face, head, or body may be detected from either the visible light image or the infrared image. Meanwhile, when a person smokes, the head or face, the hands, and the shoulders form a specific posture, and this posture can also be used for smoking detection.
It should be noted that there are many ways to detect whether the face, shoulders, and hands in the picture conform to a smoking posture; for example, a SIFT or SURF algorithm may be used to extract features from sample pictures and from the picture to be detected, followed by feature matching. This detection is similar to the detection of the predetermined shape and is not described again here.
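As a purely illustrative sketch (the patent does not specify an implementation), the "specific posture" idea can be approximated with a simple keypoint heuristic: the hand must be raised close to the face, with distance normalised by shoulder width. The function name, keypoint format, and the 1.5 threshold below are assumptions, not from the patent.

```python
import math

def matches_smoking_posture(face, hand, shoulder_l, shoulder_r, max_ratio=1.5):
    """Heuristic posture check: while smoking, the hand is raised close
    to the face. Keypoints are (x, y) pixel coordinates; the distance is
    normalised by shoulder width so the check is scale-invariant."""
    shoulder_width = math.dist(shoulder_l, shoulder_r)
    if shoulder_width == 0:
        return False
    return math.dist(face, hand) / shoulder_width <= max_ratio

# Hand raised near the face: posture matches
print(matches_smoking_posture((100, 50), (110, 70), (80, 100), (140, 100)))
# Hand hanging low by the hip: posture does not match
print(matches_smoking_posture((100, 50), (150, 200), (80, 100), (140, 100)))
```

A feature-matching approach such as SIFT/SURF, as mentioned above, would compare against labeled sample pictures instead of using fixed geometry; the heuristic here only illustrates the underlying geometric intuition.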
According to this embodiment, matching the posture of the human-shaped object against the predetermined posture assists the judgment of the target object and improves the accuracy of that judgment.
Acquiring the second area in which the target portion of the second object is located and the second size of the target portion may be based on training data. Optionally, in this embodiment, the acquiring includes: acquiring the second area where the target portion of the second object is located, the second size of the target portion, a second reliability, and a third reliability, where the second reliability is used to represent the degree of confidence that the second object is a humanoid object, and the third reliability is the degree of similarity between the posture of the second object and the predetermined posture.
The second region where the target portion of the second object is located and the second size of the target portion may be obtained by inputting the second image into a second target model, which extracts the region and size of the target portion from the second image and also outputs a detection reliability. The detection reliability may include the second reliability, indicating the degree of confidence that the second object is a humanoid object (or, alternatively, that the detected position and size of the target portion are the second region and the second size). The second target model may further output the third reliability, representing the degree of similarity between the posture of the second object and the predetermined posture.
Optionally, before step S302, each second training image in a second training image set may be input into a second training model, and the model parameters of the second training model may be adjusted using the labeling information of each second training image (which labels the position and size of the humanoid object in that image) until the correlation (e.g., degree of similarity) between the model output and the labeling information exceeds a first threshold (i.e., satisfies the target function), yielding the first sub-model of the second target model.
A second submodel in the second object model is used to obtain how similar the pose of the second object is to the predetermined pose. The training process of the second submodel is similar to that of the first submodel, and is not described herein.
According to this embodiment, the second target model is used to obtain the position, size, and reliability of the target portion of the second object in the second image, as well as the similarity between the posture of the second object and the predetermined posture, which improves the accuracy of target object judgment and reduces the false alarm rate and the missed alarm rate.
Optionally, before step S306, a fourth reliability may be determined according to the positional relationship between the first area and the target area, where the fourth reliability indicates the degree of confidence, matched to that positional relationship, that the first object is the target object; and a fifth reliability may be determined according to the size ratio of the first size to the second size, where the fifth reliability indicates the degree of confidence, matched to that size ratio, that the first object is the target object.
After the second area is determined, a target area including the second area may be determined from the range in which the second area lies, according to preset parameters (one or more, set in advance). For example, the target area may be a square centered on the second area, with sides parallel to the up, down, left, and right directions and a side length that is a predetermined multiple (e.g., 1.2) of the distance between the farthest boundaries of the second area. As another example, it may be a rectangle centered on the second area, with sides parallel to those four directions, where the distance from the center of the second area to each side is a predetermined multiple (e.g., 1.2) of the farthest extent of the second area from its center in the corresponding direction. As yet another example, the second region as a whole may be expanded outward by a predetermined factor.
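The "expand the second region outward by a predetermined factor" variant can be sketched as follows. This is a minimal illustration: the (x, y, w, h) box format is an assumption, while the 1.2 factor follows the example above.

```python
def expand_region(box, factor=1.2):
    """Build a target area from the second area (x, y, w, h) by scaling
    the box about its centre by `factor`, i.e. the 'expand outward by a
    predetermined multiple' variant described above."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0   # centre of the second area
    nw, nh = w * factor, h * factor     # expanded width and height
    return (cx - nw / 2.0, cy - nh / 2.0, nw, nh)

# A 50x40 face box grows to a 60x48 target area around the same centre
print(expand_region((100, 100, 50, 40)))
```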
The one or more parameters may be set based on empirical values; alternatively, a third training model may be trained using a third training image set labeled with humanoid objects and burning cigarettes to obtain a third target model (whose model parameters may serve as the one or more parameters). The second image is then processed with the trained third target model to obtain the target area.
There is a relatively fixed relationship between the person and the burning cigarette in terms of position and size: when smoking, the burning cigarette is generally near the person's face, as shown in fig. 6, where fig. 6 (a) shows a high-temperature target within the reasonable range and fig. 6 (b) shows one outside it. The size ratio between the burning cigarette and the human face is also generally within a certain reasonable range, as shown in fig. 7, where fig. 7 (a) shows an appropriately sized high-temperature target area, fig. 7 (b) one that is too small, and fig. 7 (c) one that is too large. In this way, many disturbances can be excluded, such as a person holding a cup of hot water, since the relative size of the cup and the person significantly exceeds the relative size of a cigarette butt and the person.
That is, from the positional relationship between the first area and the target area, the fourth reliability may be determined, indicating the degree of confidence, matched to that positional relationship, that the first object is the target object; and from the size ratio of the first size to the second size, the fifth reliability may be determined, indicating the degree of confidence, matched to that size ratio, that the first object is the target object.
The determination of the fourth confidence level and the fifth confidence level is similar to the determination process of the first confidence level, the second confidence level and the third confidence level, except that the trained model outputs the fourth confidence level and/or the fifth confidence level.
After the first confidence level, the second confidence level, the third confidence level, the fourth confidence level, and the fifth confidence level are obtained, a target confidence level may be obtained, where the target confidence level is a weighted sum of the first confidence level, the second confidence level, the third confidence level, the fourth confidence level, and the fifth confidence level.
The total reliability of the smoking detection is calculated as P = w1*P1 + w2*P2 + w3*P3 + w4*P4 + w5*P5, where P1, P2, P3, P4, and P5 denote the first through fifth reliabilities, respectively, and w1, w2, w3, w4, and w5 are their weights.
The coefficients w1, w2, w3, w4, and w5 may be obtained by training infrared image data of smoking people through a machine learning method, and the training method may refer to related technologies and is not described herein again.
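A minimal sketch of this weighted-sum computation follows. The weight values and the 0.6 threshold are illustrative stand-ins; the patent obtains w1 through w5 and the threshold P0 by training on infrared image data of smoking people.

```python
def total_confidence(p, w):
    """P = w1*P1 + w2*P2 + w3*P3 + w4*P4 + w5*P5, the weighted sum of
    the five reliabilities described above."""
    assert len(p) == len(w) == 5
    return sum(wi * pi for wi, pi in zip(w, p))

p = [0.9, 0.8, 0.7, 0.95, 0.85]    # P1..P5 from the individual detectors
w = [0.25, 0.25, 0.15, 0.2, 0.15]  # w1..w5, illustrative weights summing to 1
P = total_confidence(p, w)
print(P)                           # roughly 0.8475
smoking_detected = P >= 0.6        # comparison against an illustrative P0
```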
By the embodiment, the total credibility of the target object contained in the picture is calculated by using the training data, so that the accuracy of judging the target object can be improved, and the false alarm rate and the missing alarm rate are reduced.
Optionally, in step S306, in a case that the distance between the first area and the second area is smaller than the target distance threshold, and the ratio of the first size and the second size is within the target ratio range, it is determined that the first object is the target object.
If the distance between the first area and the second area is smaller than the target distance threshold and the ratio of the first size to the second size is within the target ratio range, the first object may be determined to be the target object. This determination may be direct (the first object is simply determined to be the target object) or probabilistic (a probability, i.e., a reliability, that the first object is the target object is determined).
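The direct form of the distance-and-ratio test can be sketched as follows. The (x, y, w, h) box format, the centre-to-centre distance metric, the area-based size ratio, and all threshold values are illustrative assumptions; the patent leaves the exact distance and size measures open.

```python
import math

def is_target_object(first_box, second_box, max_dist, ratio_range):
    """Return True when the high-temperature object (first_box) is close
    enough to the target part (second_box) and their area ratio falls
    within ratio_range. Boxes are (x, y, w, h)."""
    def centre(b):
        return (b[0] + b[2] / 2.0, b[1] + b[3] / 2.0)
    dist = math.dist(centre(first_box), centre(second_box))
    ratio = (first_box[2] * first_box[3]) / (second_box[2] * second_box[3])
    return dist < max_dist and ratio_range[0] <= ratio <= ratio_range[1]

face = (100, 100, 40, 40)
cigarette = (118, 128, 6, 6)   # small hot spot near the face -> True
cup = (100, 200, 30, 30)       # large hot object far from the face -> False
print(is_target_object(cigarette, face, 60, (0.005, 0.05)))
print(is_target_object(cup, face, 60, (0.005, 0.05)))
```

The cup case illustrates the interference exclusion described earlier: it fails both the distance test and the size-ratio test.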
Optionally, in this embodiment, determining that the first object is the target object includes: and determining the first object as the target object when the target credibility is greater than or equal to the second threshold.
After the target reliability is obtained, the reliability degree of the first object as the target object may be determined based on a set second threshold, and the first object may be determined as the target object when the target reliability is greater than or equal to the second threshold.
Optionally, after determining that the first object is the target object, the detection device may send alarm information to an alarm device, where the alarm information indicates that the target object has been detected in the target area; and may control a target image acquisition device to photograph or record the target area, obtaining a target image or a target recording.
After the detection device determines that the target object is detected in the target area, the detection device can send out an alarm and trigger a thermal imaging image capture or video recording action.
According to this embodiment, triggering an alarm after the target object is detected improves the timeliness of the alarm; triggering the photographing or recording of the target area improves evidence-collection efficiency and helps avoid possible disputes with the subject to whom the target object corresponds.
The above target object detection method is explained below with reference to alternative examples. Smoking behaviour mainly involves two elements: a person and a burning cigarette. The target object detection method in these examples exploits the high-temperature characteristic of a burning cigarette, the necessary presence of a person during smoking, and the characteristic position and size relationship between the two.
Two cases are distinguished according to whether training data are available. Without training data, a small amount of test data can be used to manually derive empirical thresholds for the judgment; the advantage is that the product can be integrated as quickly as possible, without big data. With training data, big data can be used to train suitable models and parameters; the advantage is lower false alarm and missed alarm rates.
Depending on whether training data are available and whether a single infrared camera or a multispectral camera is used, there are four target object detection methods: single-infrared without training data, single-infrared with training data, multispectral without training data, and multispectral with training data. In addition, to reduce the false alarm rate, a detection method based on the temperature fluctuation characteristic of a burning cigarette is also provided. These are explained below with reference to the alternative examples.
Alternative example 1
In this alternative example, a detection method for the single-infrared-without-training-data case is provided; as shown in fig. 8, the method includes the following steps:
step 1, acquiring a frame of infrared Raw data;
step 2, detecting whether a high-temperature target exists in the picture, if so, entering step 3, otherwise, returning to step 1;
step 3, detecting whether the shape of the high-temperature target is a preset shape (such as a circular shape), if so, recording the position and the size of the high-temperature target, entering step 4, and otherwise, returning to the step 1;
step 4, converting the Raw data in the step 1 into an 8bit visual image;
step 5, detecting whether the image generated in the step 4 has a human head or a human face, if so, recording the position and the size of the human head or the human face, entering the step 6, otherwise, returning to the step 1;
step 6, detecting whether the head or face, shoulders and hands in the image generated in the step 4 conform to the smoking posture, if so, recording and entering the step 7, otherwise, returning to the step 1;
step 7, judging whether the positional relationship between the high-temperature-like target and the nearest human head or human face is in a reasonable range, if so, entering step 8, otherwise, returning to step 1;
step 8, judging whether the relative sizes of the high-temperature-like target and the nearest human head or human face are in a reasonable range, if so, entering step 9, otherwise, returning to step 1;
step 9, considering that smoking behavior exists in the scene;
and step 10, sending an alarm and triggering thermal imaging image capturing or video recording actions.
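The Raw-to-8-bit conversion in step 4 is not specified further in this example; one common choice is a linear min-max stretch, sketched below in pure Python. This is an assumption for illustration, not the patented conversion.

```python
def raw_to_8bit(raw):
    """Linear min-max stretch of infrared Raw values (e.g. 14/16-bit
    sensor counts) to the 0..255 range of an 8-bit visualised image."""
    flat = [v for row in raw for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:                       # flat frame: avoid division by zero
        return [[0 for _ in row] for row in raw]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in raw]

frame = [[1000, 3000], [2000, 5000]]   # toy 2x2 frame of Raw counts
print(raw_to_8bit(frame))              # coldest pixel -> 0, hottest -> 255
```

Human head/face detection (step 5) is then run on the resulting 8-bit image with a conventional detector.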
Alternative example 2
In this alternative example, a detection method for the single-infrared-with-training-data case is provided; as shown in fig. 9, the method includes the following steps:
step 1, acquiring a frame of infrared Raw data;
step 2, detecting whether a high-temperature target exists in the picture, if so, entering step 3, otherwise, returning to step 1;
step 3, detecting whether the shape of the high-temperature target is a preset shape (such as a circular shape), if so, recording the position and size of the high-temperature target and the shape similarity P1, entering step 4, and if not, returning to step 1;
step 4, converting the Raw data in the step 1 into an 8bit visual image;
step 5, detecting whether the image generated in step 4 contains a human head or a human face, if so, recording the position and size of the human head or human face and the detection reliability P2, entering step 6, otherwise, returning to step 1;
step 6, detecting whether the human head or the human face, the shoulders and the hands in the image generated in the step 4 conform to the smoking posture, if so, recording the smoking posture similarity P3, entering the step 7, otherwise, returning to the step 1;
step 7, calculating a position relation coefficient P4 of the high-temperature-like target and the nearest human head or human face;
step 8, calculating a size proportion coefficient P5 of the high-temperature-like target and the nearest human head or human face;
step 9, calculating the total reliability of the smoking detection, P = P1*w1 + P2*w2 + P3*w3 + P4*w4 + P5*w5;
step 10, judging whether the total reliability P is greater than a threshold value P0, if so, determining that smoking behavior exists in the scene, and entering step 11, otherwise, returning to step 1;
and step 11, sending an alarm and triggering thermal imaging image capture or video recording action.
The coefficients w1, w2, w3, w4, w5 and the threshold P0 in step 9 are obtained by training with a machine learning method using infrared image data of smoking people.
Alternative example 3
In this alternative example, a detection method for the multispectral-without-training-data case is provided; as shown in fig. 10, the method includes the following steps:
step 1, registering an infrared image and a visible light image;
step 2, acquiring a frame of infrared Raw data;
step 3, detecting whether a high-temperature target exists in the picture, if so, entering step 4, otherwise, returning to step 2;
step 4, detecting whether the shape of the high-temperature target is a preset shape (such as a round shape), if so, recording the position and the size of the high-temperature target, entering step 5, otherwise, returning to step 2;
step 5, acquiring a frame of visible light image;
step 6, detecting whether the head or the face exists in the visible light image picture in the step 5, if so, recording the position and the size of the head or the face, entering the step 7, otherwise, returning to the step 2;
step 7, detecting whether the head or the face, the shoulders and the hands in the visible light image picture in the step 5 conform to the smoking posture, if so, entering the step 8, otherwise, returning to the step 2;
step 8, converting the head or face coordinates detected in step 6 into coordinates in the infrared image according to the configuration parameters, thereby obtaining the position and size of the head or face in the infrared image;
step 9, judging whether the position relation between the high-temperature target in the infrared image and the nearest human head or human face is in a reasonable range, if so, entering step 10, otherwise, returning to step 2;
step 10, judging whether the relative sizes of the high-temperature target and the nearest human head or human face are in a reasonable range, if so, entering step 11, otherwise, returning to step 2;
step 11, regarding that smoking behaviors exist in the scene;
and step 12, sending an alarm and triggering a visible light image capturing or video recording action.
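The coordinate conversion in step 8 can be sketched as a per-axis scale-and-offset mapping between the registered images. This is a simplification: a real registration may require a full affine or projective transform, and the parameter names here stand in for the device's unspecified configuration parameters.

```python
def visible_to_infrared(pt, scale, offset):
    """Map a point from the visible-light image into the registered
    infrared image with a per-axis scale and offset (stand-ins for the
    device's configuration parameters)."""
    return (pt[0] * scale[0] + offset[0], pt[1] * scale[1] + offset[1])

# Visible image at twice the infrared resolution, shifted by (10, 5)
print(visible_to_infrared((200, 120), scale=(0.5, 0.5), offset=(10, 5)))
```

A face box's size can be converted the same way by mapping its corner points; steps 9 and 10 then run entirely in infrared-image coordinates.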
Alternative example 4
In this alternative example, a detection method for the multispectral-with-training-data case is provided; as shown in fig. 11, the method includes the following steps:
step 1, registering an infrared image and a visible light image;
step 2, acquiring a frame of infrared Raw data;
step 3, detecting whether a high-temperature target exists in the picture, if so, entering step 4, otherwise, returning to step 2;
step 4, detecting whether the shape of the high-temperature target is a preset shape (such as a circular shape), if so, recording the position and size of the high-temperature target and the shape similarity P1, entering step 5, and otherwise, returning to step 2;
step 5, acquiring a frame of visible light image;
step 6, detecting whether the head or the face exists in the visible light image picture in the step 5, if so, recording the position and the size of the head or the face and the detection reliability P2, entering the step 7, otherwise, returning to the step 2;
step 7, detecting whether the human head or the human face, the shoulders and the hands in the visible light image picture in the step 5 conform to the smoking posture, if so, recording the smoking posture similarity P3, entering the step 8, otherwise, returning to the step 2;
step 8, converting the head or face coordinates detected in step 6 into coordinates in the infrared image according to the configuration parameters, thereby obtaining the position and size of the head or face in the infrared image;
step 9, calculating a position relation coefficient P4 of the high-temperature target and the nearest human head or human face;
step 10, calculating the size proportion coefficient P5 of the high-temperature target and the nearest human head or human face;
step 11, calculating the total reliability of the smoking detection, P = P1*w1 + P2*w2 + P3*w3 + P4*w4 + P5*w5;
step 12, judging whether the total reliability P is greater than a threshold value P0, if so, determining that smoking behavior exists in the scene, and entering step 13, otherwise, returning to step 2;
and step 13, sending an alarm and triggering a visible light image capturing or video recording action.
The coefficients w1, w2, w3, w4, w5 and the threshold P0 in step 11 are obtained by training with a machine learning method using infrared image data of smoking people.
Alternative example 5
In this alternative example, a detection method based on the temperature fluctuation characteristic of a burning cigarette is provided; as shown in fig. 12, the method includes the following steps:
step 1, acquiring a frame of infrared Raw data;
step 2, detecting whether a high-temperature target exists in the picture, if so, entering step 3, otherwise, returning to step 1;
step 3, detecting whether the highest temperature of the high-temperature target in the picture fluctuates, if so, entering step 4, otherwise, returning to step 1;
step 4, detecting whether the shape of the high-temperature target is a preset shape (such as a round shape), if so, recording the position and the size of the high-temperature target, entering step 5, otherwise, returning to step 1;
step 5, converting the Raw data in the step 1 into an 8bit visual image;
step 6, detecting whether the image generated in the step 5 has a human head or a human face, if so, recording the position and the size of the human head or the human face, entering the step 7, otherwise, returning to the step 1;
step 7, detecting whether the head or face, shoulders and hands in the image generated in the step 5 conform to the smoking posture, if so, recording and entering the step 8, otherwise, returning to the step 1;
step 8, judging whether the positional relationship between the high-temperature-like target and the nearest human head or human face is in a reasonable range, if so, entering step 9, otherwise, returning to step 1;
step 9, judging whether the relative sizes of the high-temperature-like target and the nearest human head or human face are in a reasonable range, if so, entering step 10, otherwise, returning to step 1;
step 10, regarding that smoking behaviors exist in a scene;
and step 11, sending an alarm and triggering a thermal imaging image capture or video recording action.
In step 3, to detect whether the maximum temperature of the high-temperature target fluctuates, the maximum temperature is recorded over multiple frames of data, the mean and standard deviation of those maxima are calculated, and the ratio of the standard deviation to the mean is compared against a threshold; if the ratio exceeds the threshold, the maximum temperature is judged to fluctuate.
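A minimal sketch of this fluctuation test, i.e. a coefficient-of-variation check over the per-frame maxima. The 0.01 threshold is illustrative only; the patent leaves the threshold value open.

```python
import statistics

def max_temp_fluctuates(max_temps, threshold=0.01):
    """Flag fluctuation when the standard deviation of the per-frame
    maximum temperatures divided by their mean exceeds a threshold
    (the ratio test described in step 3 above)."""
    mean = statistics.fmean(max_temps)
    if mean == 0:
        return False
    return statistics.pstdev(max_temps) / mean > threshold

steady = [310.0, 310.1, 310.0, 309.9, 310.0]   # e.g. a cup of hot water
burning = [320.0, 360.0, 335.0, 370.0, 325.0]  # tip temperature while puffing
print(max_temp_fluctuates(steady))   # False
print(max_temp_fluctuates(burning))  # True
```

Normalising by the mean makes the test insensitive to the absolute temperature of the hot object, so a uniformly hot but steady object (such as the cup above) is rejected.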
Through these examples, smoking behavior detection combines the temperature characteristic of the burning cigarette, the shape characteristic of the cigarette, the action characteristic of the smoker, and the positional relationship between the smoker and the cigarette. Four detection schemes are provided, according to whether training data are available and whether a single infrared or a multispectral scheme is used. When a high-temperature target is detected, it is further detected whether its maximum temperature fluctuates, using the temperature fluctuation characteristic of a burning cigarette to reduce false detections. The method offers high detection speed, high accuracy, and wide coverage; it can detect smoking behavior with little or no smoke and is suitable for around-the-clock indoor and outdoor smoking monitoring in all weather.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
According to another aspect of the embodiments of the present application, there is also provided a target object detection apparatus, which is used to implement the foregoing embodiments and preferred embodiments, and which has already been described and will not be described again. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 13 is a block diagram of an alternative target object detection apparatus according to an embodiment of the present invention, as shown in fig. 13, the apparatus includes:
(1) A first obtaining unit 1302, configured to obtain a first area where a first object is located and a first size of the first object when the first object having a predetermined shape and a temperature exceeding a first threshold is detected from a first image, where the first image is an image obtained by shooting a target area with an infrared detector;
(2) A second acquiring unit 1304, configured to acquire, when a second object is detected from a second image, a second region in which a target portion of the second object is located and a second size of the target portion, where the second image is a visualized image corresponding to the first image and the second object is a human-shaped object having a predetermined posture;
(3) A first determining unit 1306, configured to determine that the first object is the target object when a distance between the first area and the second area is smaller than a target distance threshold and a ratio of the first size and the second size is within a target ratio range.
Alternatively, the target object detection device may be a detection device, and may also be disposed in the detection device.
In an optional embodiment, the apparatus further comprises:
(1) The third acquisition unit is used for acquiring a first image obtained by shooting a target area by the infrared detector before acquiring a first area where the first object is located and the first size of the first object;
(2) A first detection unit configured to detect a reference object whose temperature exceeds a first threshold value from the first image;
(3) A second determination unit configured to determine the reference object as the first object if the maximum temperature of the reference object fluctuates and the shape of the reference object is the predetermined shape, where the fluctuation of the maximum temperature of the reference object means that the maximum temperatures of the reference object in consecutive multi-frame images are not all identical and the ratio of the standard deviation to the mean of the maximum temperatures in the multi-frame images is greater than or equal to a preset threshold, the multi-frame images including the first image.
In an optional embodiment, the apparatus further comprises one of:
(1) The conversion unit is used for converting the first image into a visual image to obtain a second image before acquiring a second area where a target part of the second object is located and a second size of the target part;
(2) And the fourth acquisition unit is used for acquiring a second image shot by the multispectral camera before acquiring a second area where the target part of the second object is located and the second size of the target part, wherein the multispectral camera comprises an infrared detector, and the first image and the second image are respectively an infrared image and a visible light image which are shot by the multispectral camera to the target area at the same time.
In an optional embodiment, the apparatus further comprises:
(1) A third determining unit, configured to determine that the second image includes the human-shaped object when detecting that the second image includes a target feature before acquiring a second area where a target portion of the second object is located and a second size of the target portion, where the target feature is used for representing the target portion;
(2) A fourth determination unit configured to determine that the second object is detected from the second image in a case where the posture of the human-shaped object matches the predetermined posture.
In an optional embodiment, the apparatus further comprises a fifth determining unit, a sixth determining unit, and a fourth acquiring unit; the first acquiring unit 1302 includes a first acquiring module; the second acquiring unit 1304 includes a second acquiring module; and the first determining unit 1306 includes a determining module, wherein,
(1) The first acquiring module is configured to acquire the first area where the first object is located, the first size of the first object, and a first reliability, where the first reliability is used to represent the degree of similarity between the shape of the first object and the predetermined shape;
(2) A second obtaining module, configured to obtain a second region where a target portion of a second object is located, a second size of the target portion, a second degree of reliability, and a third degree of reliability, where the second degree of reliability is used to represent a degree of reliability that the second object is a humanoid object, and the third degree of reliability is a degree of similarity between a posture of the second object and a predetermined posture;
(3) A fifth determining unit, configured to determine a fourth reliability according to a position relationship between the first area and the second area before determining that the first object is the target object, where the fourth reliability is used to indicate a credibility degree, which matches the position relationship and in which the first object is the target object;
(4) A sixth determining unit, configured to determine a fifth reliability according to a size ratio of the first size to the second size, where the fifth reliability is used to indicate a reliability degree, which is matched with the size ratio and in which the first object is a target object;
(5) A fourth obtaining unit, configured to obtain a target reliability, where the target reliability is a weighted sum of the first reliability, the second reliability, the third reliability, the fourth reliability, and the fifth reliability;
(6) And the determining module is used for determining the first object as the target object under the condition that the target credibility is greater than or equal to the second threshold.
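The fusion performed by items (1) through (6) above can be sketched as follows. This is an illustrative outline, not the patented implementation; the weight values and the value of the second threshold are assumptions chosen for the example.

```python
def target_reliability(reliabilities, weights=(0.2, 0.2, 0.2, 0.2, 0.2),
                       second_threshold=0.6):
    """Fuse the five reliabilities (shape, human-shape, posture,
    position, size ratio) into a target reliability by weighted sum;
    the first object is judged to be the target object when the sum
    reaches the second threshold."""
    assert len(reliabilities) == 5 and len(weights) == 5
    score = sum(w * r for w, r in zip(weights, reliabilities))
    return score, score >= second_threshold
```

In practice the weights would be tuned so that, for example, a hot region of the predetermined shape lying close to a person in the predetermined posture scores well above the threshold, while an isolated hot spot does not.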
In an optional embodiment, the apparatus further comprises:
(1) A sending unit, configured to send alarm information to an alarm device after the first object is determined to be the target object, where the alarm information indicates that the target object has been detected in the target area;
(2) A control unit, configured to control a target image acquisition device to photograph or record the target area to obtain a target image or a target video.
It should be noted that the above modules may be implemented by software or by hardware; in the latter case, they may be implemented in, but are not limited to, the following forms: the modules are all located in the same processor, or the modules are distributed among different processors in any combination.
According to a further aspect of an embodiment of the present application, there is also provided a storage medium having a computer program stored therein, wherein the computer program is configured to perform the steps of any of the above method embodiments when executed.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing a computer program, such as a USB flash drive, a ROM (Read-Only Memory), a RAM (Random Access Memory), a removable hard disk, a magnetic disk, or an optical disk.
According to yet another aspect of an embodiment of the present application, there is also provided an electronic apparatus, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any one of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations; details are not repeated here.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by a computing device; in some cases, the steps shown or described may be performed in an order different from that described herein. Alternatively, they may be separately fabricated into individual integrated circuit modules, or multiple of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description presents only preferred embodiments of the present invention and is not intended to limit it; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention shall fall within the protection scope of the present invention.

Claims (12)

1. A method of detecting a target object, comprising:
in a case where a first object whose temperature exceeds a first threshold and which has a predetermined shape is detected from a first image, acquiring a first area where the first object is located and a first size of the first object, wherein the first image is an image obtained by an infrared detector photographing a target area;
in a case where a second object is detected from a second image, acquiring a second area where a target portion of the second object is located and a second size of the target portion, wherein the second image is a visualized image corresponding to the first image, and the second object is a human-shaped object having a predetermined posture;
determining that the first object is a target object when the distance between the first area and the second area is smaller than a target distance threshold and the ratio of the first size to the second size is within a target ratio range;
wherein acquiring the first area where the first object is located and the first size of the first object comprises: acquiring the first area where the first object is located, the first size of the first object, and a first reliability, wherein the first reliability represents the degree of similarity between the shape of the first object and the predetermined shape;
acquiring the second area where the target portion of the second object is located and the second size of the target portion comprises: acquiring the second area where the target portion of the second object is located, the second size of the target portion, a second reliability, and a third reliability, wherein the second reliability represents the degree of confidence that the second object is the human-shaped object, and the third reliability represents the degree of similarity between the posture of the second object and the predetermined posture;
before determining that the first object is the target object, the method further comprises: determining a fourth reliability according to the positional relationship between the first area and the second area, wherein the fourth reliability represents the degree of confidence, matching the positional relationship, that the first object is the target object; determining a fifth reliability according to the size ratio of the first size to the second size, wherein the fifth reliability represents the degree of confidence, matching the size ratio, that the first object is the target object; and acquiring a target reliability, wherein the target reliability is a weighted sum of the first reliability, the second reliability, the third reliability, the fourth reliability, and the fifth reliability;
determining that the first object is the target object comprises: determining that the first object is the target object in a case where the target reliability is greater than or equal to a second threshold.
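The decision in claim 1 combines a distance test on the two regions with a size-ratio test. A minimal sketch, assuming region centers in pixel coordinates; the distance threshold and ratio range values are illustrative assumptions, not values fixed by the claims:

```python
def matches_target(first_center, second_center, first_size, second_size,
                   target_distance_threshold=50.0,
                   target_ratio_range=(0.02, 0.2)):
    """Return True when the first area (hot object in the infrared
    image) lies close enough to the second area (target portion in
    the visible image) and their size ratio falls in the target range."""
    dx = first_center[0] - second_center[0]
    dy = first_center[1] - second_center[1]
    distance = (dx * dx + dy * dy) ** 0.5      # Euclidean center distance
    ratio = first_size / second_size
    low, high = target_ratio_range
    return distance < target_distance_threshold and low <= ratio <= high
```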
2. The method of claim 1, wherein prior to obtaining the first area in which the first object is located and the first size of the first object, the method further comprises:
acquiring the first image obtained by the infrared detector photographing the target area;
detecting, from the first image, a reference object whose temperature exceeds the first threshold;
determining the reference object as the first object in a case where the maximum temperature of the reference object fluctuates and the shape of the reference object is the predetermined shape, wherein the fluctuation of the maximum temperature of the reference object means that: the maximum temperatures of the reference object in consecutive multiple frames of images are not all identical, and the ratio of the standard deviation to the mean of those maximum temperatures is greater than or equal to a preset threshold, the multiple frames of images comprising the first image.
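The fluctuation condition in claim 2 amounts to a coefficient-of-variation test on the per-frame maximum temperatures of the reference object. A minimal sketch; the preset threshold value here is an assumption for illustration:

```python
import statistics

def max_temperature_fluctuates(max_temps, preset_threshold=0.05):
    """Per claim 2: the maximum temperature fluctuates when the
    per-frame maxima are not all identical and the ratio of their
    standard deviation to their mean meets the preset threshold."""
    if len(set(max_temps)) <= 1:        # all identical -> no fluctuation
        return False
    mean = statistics.mean(max_temps)
    std = statistics.pstdev(max_temps)  # population std over the frame window
    return std / mean >= preset_threshold
```

Such a test helps distinguish an object whose hot spot varies frame to frame from a steadily hot object of similar shape.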
3. The method of claim 1, wherein prior to acquiring the second region of the second subject in which the target site is located and the second size of the target site, the method further comprises one of:
converting the first image into a visualized image to obtain the second image; or
acquiring the second image photographed by a multispectral camera, wherein the multispectral camera comprises the infrared detector, and the first image and the second image are respectively an infrared image and a visible-light image photographed of the target area by the multispectral camera at the same time.
4. The method of claim 1, wherein prior to obtaining the second region of the second subject in which the target site is located and the second size of the target site, the method further comprises:
in a case where it is detected that the second image contains a target feature, determining that the second image contains a human-shaped object, wherein the target feature is used for representing the target portion;
in a case where it is determined that the posture of the human-shaped object matches the predetermined posture, determining that the second object is detected from the second image.
5. The method of any of claims 1-4, wherein after determining that the first object is the target object, the method further comprises:
sending alarm information to an alarm device, wherein the alarm information is used for indicating that the target object is detected in the target area;
and controlling the target image acquisition equipment to shoot or record the target area to obtain a target image or a target video.
6. An apparatus for detecting a target object, comprising:
a first acquiring unit, configured to acquire, in a case where a first object whose temperature exceeds a first threshold and which has a predetermined shape is detected from a first image, a first area where the first object is located and a first size of the first object, wherein the first image is an image obtained by an infrared detector photographing a target area;
a second acquisition unit configured to acquire, when a second object is detected from a second image that is a visualized image corresponding to the first image, a second region in which a target portion of the second object is located and a second size of the target portion, the second object being a human-shaped object having a predetermined posture;
a first determining unit, configured to determine that the first object is a target object when a distance between the first area and the second area is smaller than a target distance threshold and a ratio of the first size to the second size is within a target ratio range;
wherein the apparatus further comprises a fifth determining unit, a sixth determining unit, and a fourth acquiring unit; the first acquiring unit includes a first acquiring module; the second acquiring unit includes a second acquiring module; and the first determining unit includes a determining module, wherein:
the first obtaining module is configured to obtain the first area where the first object is located, the first size of the first object, and a first reliability, where the first reliability is used to indicate a similarity degree between a shape of the first object and the predetermined shape;
the second obtaining module is configured to obtain the second region of the second object in which the target portion is located, the second size of the target portion, a second degree of reliability, and a third degree of reliability, where the second degree of reliability is used to represent a degree of reliability that the second object is the humanoid object, and the third degree of reliability is a degree of similarity between the posture of the second object and the predetermined posture;
the fifth determining unit is configured to determine, before the first object is determined to be the target object, a fourth reliability according to the positional relationship between the first area and the second area, wherein the fourth reliability represents the degree of confidence, matching the positional relationship, that the first object is the target object;
the sixth determining unit is configured to determine a fifth reliability according to the size ratio of the first size to the second size, wherein the fifth reliability represents the degree of confidence, matching the size ratio, that the first object is the target object;
the fourth obtaining unit is configured to obtain a target reliability, where the target reliability is a weighted sum of the first reliability, the second reliability, the third reliability, the fourth reliability, and the fifth reliability;
the determining module is configured to determine that the first object is the target object when the target reliability is greater than or equal to a second threshold.
7. The apparatus of claim 6, further comprising:
a third acquiring unit, configured to acquire the first image obtained by shooting the target area with the infrared detector before acquiring the first area where the first object is located and the first size of the first object;
a first detection unit configured to detect a reference object whose temperature exceeds the first threshold from the first image;
a second determining unit, configured to determine the reference object as the first object in a case where the maximum temperature of the reference object fluctuates and the shape of the reference object is the predetermined shape, wherein the fluctuation of the maximum temperature of the reference object means that: the maximum temperatures of the reference object in consecutive multiple frames of images are not all identical, and the ratio of the standard deviation to the mean of those maximum temperatures is greater than or equal to a preset threshold, the multiple frames of images comprising the first image.
8. The apparatus of claim 6, further comprising one of:
a conversion unit, configured to convert the first image into the visual image to obtain a second image before acquiring the second area of the second object where the target portion is located and the second size of the target portion;
a fourth acquiring unit, configured to acquire the second image captured by a multispectral camera before acquiring the second area where the target portion of the second object is located and the second size of the target portion, where the multispectral camera includes the infrared detector, and the first image and the second image are an infrared image and a visible light image captured by the multispectral camera on the target area at the same time, respectively.
9. The apparatus of claim 6, further comprising:
a third determining unit, configured to determine that a human-shaped object is included in the second image when a target feature is detected to be included in the second image before acquiring the second area of the second object where the target portion is located and the second size of the target portion, where the target feature is used for representing the target portion;
a fourth determination unit configured to determine that the second object is detected from the second image if the posture of the human-shaped object matches the predetermined posture.
10. The apparatus of any one of claims 6 to 9, further comprising:
a sending unit, configured to send alarm information to an alarm device after determining that the first object is the target object, where the alarm information is used to indicate that the target object is detected in the target area;
and a control unit, configured to control a target image acquisition device to photograph or record the target area to obtain a target image or a target video.
11. A storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of any of claims 1 to 5 when executed.
12. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 5 by means of the computer program.
CN201911175584.5A 2019-11-26 2019-11-26 Target object detection method and device, storage medium and electronic device Active CN110956118B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911175584.5A CN110956118B (en) 2019-11-26 2019-11-26 Target object detection method and device, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911175584.5A CN110956118B (en) 2019-11-26 2019-11-26 Target object detection method and device, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN110956118A CN110956118A (en) 2020-04-03
CN110956118B true CN110956118B (en) 2023-04-18

Family

ID=69976898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911175584.5A Active CN110956118B (en) 2019-11-26 2019-11-26 Target object detection method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN110956118B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507268B (en) * 2020-04-17 2024-02-20 浙江华感科技有限公司 Alarm method and device, storage medium and electronic device
CN112101115B (en) * 2020-08-17 2023-12-12 深圳数联天下智能科技有限公司 Temperature control method and device based on thermal imaging, electronic equipment and medium
CN112037451A (en) * 2020-08-19 2020-12-04 浙江大华技术股份有限公司 Early warning method and device
CN113343859A (en) * 2021-06-10 2021-09-03 浙江大华技术股份有限公司 Smoking behavior detection method and device, storage medium and electronic device
CN114199387A (en) * 2021-12-21 2022-03-18 吉林大学 High formwork concrete pouring monitoring device and method thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108700468A (en) * 2017-09-29 2018-10-23 深圳市大疆创新科技有限公司 Method for checking object, object detection terminal and computer-readable medium
CN110081983A (en) * 2019-04-18 2019-08-02 珠海格力电器股份有限公司 A kind of monitoring method, device, electronic equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110279475A1 (en) * 2008-12-24 2011-11-17 Sony Computer Entertainment Inc. Image processing device and image processing method
US8884229B2 (en) * 2012-02-22 2014-11-11 Excelitas Technologies Singapore Pte. Ltd. Passive infrared range finding proximity detector
TWI479430B (en) * 2012-10-08 2015-04-01 Pixart Imaging Inc Gesture identification with natural images

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108700468A (en) * 2017-09-29 2018-10-23 深圳市大疆创新科技有限公司 Method for checking object, object detection terminal and computer-readable medium
CN110081983A (en) * 2019-04-18 2019-08-02 珠海格力电器股份有限公司 A kind of monitoring method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN110956118A (en) 2020-04-03

Similar Documents

Publication Publication Date Title
CN110956118B (en) Target object detection method and device, storage medium and electronic device
US10812761B2 (en) Complex hardware-based system for video surveillance tracking
CN105745687B (en) Context aware Moving target detection
US20170039455A1 (en) Computer-vision based security system using a depth camera
US8737688B2 (en) Targeted content acquisition using image analysis
CN110659397B (en) Behavior detection method and device, electronic equipment and storage medium
WO2016004673A1 (en) Intelligent target recognition device, system and method based on cloud service
US9183431B2 (en) Apparatus and method for providing activity recognition based application service
CN109635673B (en) Smoking behavior recognition method and device
CN111354024B (en) Behavior prediction method of key target, AI server and storage medium
CN107920223B (en) Object behavior detection method and device
CN108009466B (en) Pedestrian detection method and device
CN105096223A (en) Double-induction safety monitoring system of RFID technology and application method of system
CN107330414A (en) Act of violence monitoring method
KR102511287B1 (en) Image-based pose estimation and action detection method and appratus
CN111126411B (en) Abnormal behavior identification method and device
CN111241926A (en) Attendance checking and learning condition analysis method, system, equipment and readable storage medium
CN110505438B (en) Queuing data acquisition method and camera
CN112818919A (en) Smoking behavior recognition method and device
JP2021007055A (en) Discriminator learning device, discriminator learning method, and computer program
CN112070185A (en) Re-ID-based non-contact fever person tracking system and tracking method thereof
JP2021149687A (en) Device, method and program for object recognition
CN111611966A (en) Target person detection method, device, equipment and storage medium
CN111612815A (en) Infrared thermal imaging behavior intention analysis method and system
JP6893812B2 (en) Object detector

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230926

Address after: Room 201, Building A, Integrated Circuit Design Industrial Park, No. 858, Jianshe 2nd Road, Economic and Technological Development Zone, Xiaoshan District, Hangzhou City, Zhejiang Province, 311200

Patentee after: Zhejiang Huagan Technology Co.,Ltd.

Address before: No.1187 Bin'an Road, Binjiang District, Hangzhou City, Zhejiang Province

Patentee before: ZHEJIANG DAHUA TECHNOLOGY Co.,Ltd.
