CN114898239A - Target object detection method and device thereof - Google Patents

Target object detection method and device thereof

Info

Publication number
CN114898239A
CN114898239A (application CN202110099193.0A)
Authority
CN
China
Prior art keywords
image
target object
target
evaluation
evaluation value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110099193.0A
Other languages
Chinese (zh)
Inventor
孔令广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202110099193.0A priority Critical patent/CN114898239A/en
Priority to PCT/CN2022/073151 priority patent/WO2022156763A1/en
Publication of CN114898239A publication Critical patent/CN114898239A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30168 Image quality inspection

Abstract

A target object detection method and a device thereof are provided. The method performs target object detection on images captured by a camera: only images containing a target object are encoded and sent to a monitoring platform server, while images without a target object are neither encoded nor sent to the monitoring platform server.

Description

Target object detection method and device thereof
Technical Field
The present application relates to the field of monitoring, and in particular, to a target object detection method and apparatus.
Background
With the development of video surveillance technology, video surveillance has evolved from simple video recording toward intelligent analysis. This trend has also deeply influenced the development of cameras, which are becoming increasingly intelligent. Meanwhile, thanks to the rapid development of technologies such as computer vision and deep learning, the automatic recognition accuracy of smart cameras now exceeds that of humans.
The increasing intelligence of cameras improves monitoring efficiency, saves labor cost, and plays an important role in promoting the modernization of safe cities and smart cities, the automation of industrial production, and the like. One of the difficulties in the related art is how to use the camera itself to reduce the analysis load on the background analysis system while improving that system's accuracy.
Disclosure of Invention
In view of the above, a target object detection method and an apparatus thereof are provided, which can reduce the amount of encoded image data.
In a first aspect, an embodiment of the present application provides a target object detection method, where the method includes: acquiring a first image; detecting whether a target object is included in the first image; when the target object is not included in the first image, not performing encoding on the first image; acquiring a second image; detecting whether a target object is included in the second image; and when the second image comprises the target object, encoding the second image and sending the encoded second image to a monitoring platform server.
As cameras become increasingly intelligent, a camera can perform image processing on each image of the video stream it captures. If the smart camera can identify the valuable images and transmit only those to the monitoring platform server, the data transmission load is reduced, as are the storage and data processing loads on the monitoring platform server. Based on this, the target detection method performs target object detection on the images captured by the camera and encodes only the images containing a target object; images without a target object need not be encoded, which reduces the number of encoded images. In addition, images without a target object need not be sent to the monitoring platform server, which reduces the subsequent transmission volume and the storage cost of the monitoring platform server.
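By way of illustration only (not part of the application itself), the following Python sketch shows the per-image decision flow just described; the detect_target, encode, and send_to_server helpers are hypothetical stand-ins for the camera's detection module, encoder, and uplink.

```python
# Hedged sketch of the per-image decision flow described above.
# detect_target(), encode() and send_to_server() are assumed stand-ins,
# not functions defined by this application.

def process_image(image, detect_target, encode, send_to_server):
    """Encode and upload an image only if it contains a target object."""
    if not detect_target(image):
        return None  # no target object: skip encoding; image may be discarded
    encoded = encode(image)    # e.g. JPEG image or H.264/H.265 video encoding
    send_to_server(encoded)    # upload to the monitoring platform server
    return encoded
```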
According to the first aspect, in a first possible implementation manner of the first aspect, acquiring the first image includes: acquiring at least one image within a preset interval, where the preset interval includes a preset time interval and a preset number interval; and selecting the first image from the at least one image in an image selection manner. The method further includes: determining not to perform encoding on the at least one image.
In implementation, since real-time per-frame detection heavily taxes the processing capability of the smart camera, multiple images within a preset interval can be acquired and one of them selected as a representative image, which reduces the processing load on the smart camera. If the first image is not encoded, none of the images are encoded, which further relieves that load. In other words, the handling of the representative image is applied to the whole group of images: encoding or not encoding, and sending or not sending to the monitoring platform server.
According to the first aspect, in a second possible implementation manner of the first aspect, the method further includes: and determining an evaluation value of the second image by using an evaluation mode, wherein the evaluation value is used for describing the image quality of the second image.
In the related art, an image is generally evaluated with respect to a single target object, which ignores the relevance among multiple target objects and thus underestimates the importance of the image. To better measure the value of an image, an evaluation value of the second image, which already contains the target object, may be determined using an evaluation manner. That is, the second image is evaluated with a quantized value (the evaluation value), allowing the user to assess the second image more intuitively.
According to the first aspect, in a third possible implementation manner of the first aspect, the evaluating manner includes: when a plurality of target objects are included in the second image, the evaluation value of the second image is correlated with the sub-evaluation values of the plurality of target objects in the second image.
That is, when the second image includes multiple target objects, this manner determines the evaluation of the second image from the evaluations of the individual target objects it contains. In particular, because it takes the correlation among the multiple target objects into account, it measures the value of the second image more accurately.
According to the first aspect, in a fourth possible implementation manner of the first aspect, the evaluation value is related to one or more of the following: the sharpness of the target object region where the target object is located; the number of pixels of the target object region; the shooting angle at which the target object region was captured; and the number of key features possessed by the target object.
In practice, the second image may be evaluated from one or more of the four aspects above, enabling a more accurate evaluation of the second image.
According to the first aspect, in a fifth possible implementation manner of the first aspect, the method further includes: judging whether the evaluation value meets a preset threshold value or not; and if the preset threshold value is met, sending the second image to a monitoring platform server.
That is, the method can separately send images of high image quality to the monitoring platform server, so that the server can analyze them individually or with priority, reducing its processing load and improving processing efficiency.
According to the first aspect, in a sixth possible implementation form of the first aspect, the method further comprises discarding the first image.
That is, after deciding not to encode a first image that does not include the target object, the method may discard it, thereby saving storage space.
In a second aspect, embodiments of the present application provide a camera, including: a lens for receiving light for generating an image; a camera body for performing the target object detection method of the first aspect or one or more of the many possible implementations of the first aspect.
In a third aspect, an embodiment of the present application provides a target object detection apparatus, including: an image acquisition unit for acquiring at least one image; a target object detection unit for determining, by performing target object detection on the at least one image, a target image having a target object, and discarding images without the target object; an encoding unit for encoding the target image to generate an encoded image; and a sending unit for sending the encoded image to the monitoring platform server.
According to a third aspect, in a first possible implementation manner of the third aspect, the apparatus further includes: and the image quality evaluation unit is used for evaluating the image quality of the target image and determining the evaluation value of the target image.
According to the third aspect, in a second possible implementation manner of the third aspect, the image quality evaluation unit is specifically configured to determine the evaluation value using sub-evaluation values of a plurality of target objects when the plurality of target objects are included in the target image.
According to the third aspect, in a third possible implementation manner of the third aspect, the image quality evaluation unit is further configured to determine whether the evaluation value satisfies a preset threshold; and sending the target image to a cache unit when the preset threshold is met.
According to the third aspect, in a fourth possible implementation manner of the third aspect, the apparatus further includes: and the cache unit is used for storing the target image meeting the preset threshold value.
According to the third aspect, in a fifth possible implementation manner of the third aspect, the sending unit is further configured to send the image in the caching unit to the monitoring platform server.
According to the third aspect, in a sixth possible implementation manner of the third aspect, the evaluation value is related to one or more of: the sharpness of the target object region where the target object is located; the number of pixels of the target object region; the shooting angle at which the target object region was captured; and the number of key features possessed by the target object.
According to the third aspect, in a seventh possible implementation manner of the third aspect, the target object detection unit is specifically configured to select a representative image from the at least one image; performing target object detection on the representative image; and if the representative image has the target object, determining that the at least one image is the target image with the target object.
In a fourth aspect, an embodiment of the present application provides a camera, including: a lens for collecting light; a sensor for performing photoelectric conversion on the light collected by the lens to generate an image; and a processor or a cluster of processors for performing the target object detection method of the first aspect or of one or more of its many possible implementations.
In a fifth aspect, an embodiment of the present application provides a non-transitory computer-readable storage medium on which computer program instructions are stored, where the computer program instructions, when executed by a processor, implement the target object detection method of the first aspect or of one or more of its many possible implementations.
These and other aspects of the present application will be more readily apparent from the following description of the embodiment(s).
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the application and, together with the description, serve to explain the principles of the application.
FIG. 1 shows a schematic diagram of an application scenario according to an embodiment of the present application;
FIG. 2 shows a diagram of data processing for a smart camera according to an embodiment of the present application;
FIG. 3 illustrates a schematic structural diagram of an object detection system according to an embodiment of the present application;
FIG. 4 shows a flow chart of steps of a target object detection method according to an embodiment of the present application;
FIG. 5 shows a block diagram of a target object detection apparatus according to an embodiment of the present application.
Detailed Description
Various exemplary embodiments, features and aspects of the present application will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
In the embodiments of the present application, "/" may indicate an "or" relationship between the associated objects; for example, A/B may indicate A or B. "And/or" describes three possible relationships between associated objects; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone, where A and B may be singular or plural. For convenience of description, terms such as "first" and "second" may be used in the embodiments of the present application to distinguish technical features with the same or similar functions; these terms do not limit number or execution order, and objects labeled "first" and "second" are not necessarily different. In the embodiments of the present application, words such as "exemplary" or "such as" are used to present examples or illustrations; any embodiment or design described as "exemplary" or given as an example should not be construed as preferred or advantageous over other embodiments or designs. Rather, these words are intended to present related concepts concretely for ease of understanding.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present application. It will be understood by those skilled in the art that the present application may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present application.
The technical solution of the present application is applicable to the field of video surveillance, which is an important component of security and protection systems. For ease of understanding, an application scenario of the technical solution is briefly described below with reference to fig. 1.
Fig. 1 shows an application scenario diagram to which the technical solution provided by the present application is applicable. As shown in fig. 1, a video surveillance system may be used to monitor road conditions. Before performing video surveillance, a target object of the video surveillance needs to be set. Target objects include, by way of example, pedestrians, non-motor vehicles, and motor vehicles.
The video surveillance system may include devices with audio/video capture capabilities and a surveillance platform server that performs data communication with the devices. In fig. 1, the video surveillance system includes only four video capture devices, but in implementation, the video surveillance system may include more or fewer video/audio capture devices as desired.
As an example, the video capture device may be a camera, which may be an ordinary camera or a smart camera. An ordinary camera converts the captured video data to a suitable bit rate and uploads it to the monitoring platform server; that is, it relies on the monitoring platform server for data processing such as object recognition. A smart camera, by contrast, uses an embedded intelligent processing module to perform image processing on the video data first and uploads the processed video data to the monitoring platform server. The intelligent processing module may include modules such as a face recognition module and a license plate recognition module.
Whether an ordinary camera or a smart camera, a camera in the present application includes a lens and a camera body. The lens receives the light used to generate an image; specifically, it forms an optical image of the observed object on the camera's sensor, a process also called optical imaging. A lens combines optical parts (mirrors, transmissive lenses, and prisms) of different shapes and different media (plastic, glass, or crystal) in a certain arrangement, so that after the light is transmitted or reflected by these parts, its propagation direction is changed as required and the light is received by the sensing device, completing the optical imaging of the object. Generally, each lens is formed from several groups of lens elements with different surface curvatures, combined at different spacings. The choice of these spacings, the curvature of the elements, the light transmission coefficient, and similar indexes determines the focal length of the lens. The main parameter indexes of a lens include effective focal length, aperture, maximum image plane, field of view, distortion, relative illumination, and so on; together these index values determine the overall performance of the lens.
The camera body may include a sensor and a processor. A sensor (also called an image sensor) converts an optical image into an electronic signal and is widely used in digital cameras and other electro-optical devices. Common sensors include the charge-coupled device (CCD) and the complementary metal oxide semiconductor (CMOS). Both CCDs and CMOS sensors contain a large number (e.g., tens of millions) of photodiodes, each called a photosite and each corresponding to a pixel. During exposure, a photodiode receives light and converts the optical signal into an electrical signal containing brightness (or brightness and color), from which the image is reconstructed. The Bayer array is a common image sensor technology applicable to both CCD and CMOS: Bayer filters make each pixel sensitive to only one of the red, blue, and green primary colors; the pixels are interleaved and then demosaiced to recover the original image. A sensor using a Bayer array is also called a Bayer sensor. Besides the Bayer sensor, there are other sensor technologies such as X3 (developed by Foveon), which uses three stacked layers of photosensitive elements, each recording one of the RGB color channels, so that all color information can be captured at a single pixel location.
A processor (also called an image processor), such as a system on a chip (SoC), converts the image generated by the sensor into a three-channel format (e.g., YUV), improves image quality, detects whether a target object appears in the image, and also encodes the image. In a smart camera, the intelligent processing module described above may be included in the processor. In the embodiments of the present application, there may be a single processor (for example, a multifunctional integrated SoC) or a cluster of multiple processors (for example, an ISP, an encoder, and other processors).
As shown in fig. 2, the smart camera 201 may transmit data to the monitoring platform server over a communication network; as an example, the data may be transmitted to and stored in a storage unit 202 (e.g., a hard disk) of the monitoring platform server. The monitoring platform server refers to a device capable of receiving data sent by the cameras, performing relevant processing on the data, and storing it. In implementation, the monitoring platform server may be a single computing device or multiple computing devices, such as a server, a server cluster, or a public/private cloud.
Since the smart camera can perform image processing on captured images and transmit the processed images to the monitoring platform server, the video monitoring system can preset which data is sent to the monitoring platform server. As an example, the data sent to the monitoring platform server may include the video stream captured by the smart camera, target images determined by the intelligent processing module, and regions of interest (ROI) in the target images. A region of interest is usually the region where the target object is located, and may therefore be called a target object region. In implementation, the smart camera may also transmit preset content, such as the captured video stream and recognized face images, to the storage unit 202 through the server. In an exemplary embodiment of the present application, the smart camera may transmit to the monitoring platform an encoded video stream containing only target images (images that include a target object), together with target images that satisfy a preset image quality.
As an example, fig. 3 shows a schematic structural diagram of the object detection system. As shown in fig. 3, the object detection system 300 includes a plurality of cameras 301 to 305 and a monitoring platform server 310, wherein each of the cameras 301 to 305 may be a common camera or a smart camera, and in the case that the cameras 301 to 305 are smart cameras, the smart processing module embedded in each smart camera may be the same or different.
The cameras 301 to 305 may transmit the captured video data to the monitoring platform server, and the connections between the monitoring platform server 310 and the cameras 301 to 305 may be wired or wireless. Wired modes may include Transmission Control Protocol/Internet Protocol (TCP/IP) communication over Ethernet, the User Datagram Protocol (UDP), or standard ports such as Universal Serial Bus (USB) and COM interfaces. Wireless modes may include WiFi, Bluetooth, ZigBee, or Ultra Wideband (UWB). The appropriate connection mode can be selected according to the actual application scenario and the hardware form of the camera.
Fig. 4 shows a flowchart of steps of a target object detection method according to an embodiment of the present application, and in implementation, the target object detection method shown in fig. 4 may be executed by a smart camera in a target detection system.
In step S410, a first image is acquired, wherein the first image is an image acquired by the smart camera described above, or an image acquired by a general camera and transmitted to the corresponding smart camera.
As an example, the method may acquire at least one image within a preset interval and select the first image from the at least one image in an image selection manner. That is, since real-time per-frame detection heavily taxes the processing capability of the smart camera, multiple images within a preset interval may be acquired and one of them selected as a representative image.
The preset interval mentioned here may be a time interval, for example the frames captured within five seconds, or a number interval, for example 10 consecutively captured frames. The image selection manner is a user-preset rule for choosing a representative image from the multiple images; for example, the middle frame or the first frame may be selected as the representative image, which is not limited in this application. A minimal sketch of such selection follows.
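The sketch below illustrates interval-based representative-image selection under stated assumptions: the interval length of 10 frames and the "middle"/"first" rules are examples of user-preset choices, and select_representative and group_frames are hypothetical helper names, not an interface defined by this application.

```python
# Hedged sketch: pick one representative frame per preset number interval.

def select_representative(frames, mode="middle"):
    """Pick one representative image from a group of frames."""
    if not frames:
        raise ValueError("empty frame group")
    if mode == "first":
        return frames[0]
    if mode == "middle":
        return frames[len(frames) // 2]
    raise ValueError(f"unknown selection mode: {mode}")

def group_frames(stream, interval=10):
    """Group a frame stream into preset number intervals and yield
    one representative frame per group."""
    group = []
    for frame in stream:
        group.append(frame)
        if len(group) == interval:
            yield select_representative(group)
            group.clear()
```

Only the yielded representative frames then go through detection; the whole group inherits the representative's encode/send decision, as described above.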
In step S420, it is detected whether a target object is included in the first image. The target object is preset; in implementation, its type may be a pedestrian, a motor vehicle, and/or a non-motor vehicle, and there may be a single target object or multiple target objects. With multiple target objects, the first image can be determined to include a target object as soon as any one of them is detected. For example, when the image contains several non-motor vehicles, or contains a pedestrian, a motor vehicle, and a non-motor vehicle at the same time, detecting a non-motor vehicle in the first image is sufficient to determine that the first image includes the target object.
The set target object corresponds to the intelligent processing module embedded in the smart camera; that is, the smart camera has an intelligent processing module for detecting the target object, so this module can be used to determine whether the first image includes the target object. The intelligent processing module may be implemented by a SoC.
By way of example, the intelligent processing module may be an artificial intelligence (AI) module corresponding to the target, such as a machine learning module or a deep learning module. Here, an AI module refers to loading a large amount of data into the computing device and selecting a model to "fit" the data so that the computing device can derive predictions or inferences. The models used range from simple equations (e.g., a straight-line equation) to very complex logical/mathematical systems; once a model is selected and adjusted (i.e., refined through tuning), the computing device uses it to learn the patterns in the data. Finally, the model can be used to process the input data.
In implementation, the AI module may be a module with the corresponding target detection capability, and the model it uses may be determined by the user or technician according to actual needs, for example models for face recognition, pedestrian recognition, and license plate recognition.
In step S430, when the target object is not included in the first image, encoding is not performed on the first image. That is, in the case where it is determined that the first image does not include the target object, encoding is not performed on the first image. Optionally, the first image may be deleted (discarded), so as to reduce subsequent transmission amount and storage cost of the monitoring platform server. In implementation, if the first image is taken as a representative image, the plurality of images within the preset interval represented by the first image are not encoded and are deleted.
Encoding, sometimes referred to as compression, converts three-channel images (e.g., YUV images) into a form that is convenient to transmit and for the user to view, for example images encoded in JPEG format, or video encoded as H.264/H.265.
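As one possible illustration, a single target image can be JPEG-encoded with OpenCV as sketched below; this is an assumption for demonstration only, not the encoder mandated by this application, and a real smart camera would typically use the SoC's hardware encoder for H.264/H.265 video.

```python
import cv2
import numpy as np

# Minimal JPEG-encoding sketch using OpenCV (illustrative only).
image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in BGR image

ok, jpeg_buf = cv2.imencode(".jpg", image, [cv2.IMWRITE_JPEG_QUALITY, 90])
if ok:
    payload = jpeg_buf.tobytes()  # bytes ready to send to the server
```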
As shown in fig. 4, the method may further perform step S440 to obtain a second image. If, after step S420 is performed, the second image is determined to include the target object, step S450 is performed: the second image is encoded and sent to the monitoring platform server. In implementation, the second image may be encoded in H.264/H.265 format before transmission. In the present application, the second image having the target object may also be described as a target image. Note that in fig. 4 steps S410 and S440 are executed in parallel; in other embodiments they may also be performed in sequence, that is, steps S410, S420, and S430 are performed first, and then steps S440, S420, and S450.
Further, if the second image is determined to include the target object, the method evaluates the image quality of the second image. One existing evaluation approach evaluates an image with respect to a single target object: for example, for an image containing only pedestrian A, or containing pedestrians A and B, the evaluation of the whole image is represented by the evaluation of pedestrian A (or pedestrian B) alone. This practice ignores the relevance among multiple target objects. As an example, suppose pedestrians A, B, and C cross a road together and finish crossing after some time; if, during this process, only the image containing pedestrian A, or only the relevant region of the image (the region containing pedestrian A), is evaluated, the relevance among the pedestrians in the image is omitted and the importance of the image is underestimated.
For this purpose, the method may determine an evaluation value of the second image using an evaluation method, wherein the evaluation value may be used to describe the image quality of the second image. In particular, for a case where the second image includes a plurality of target objects, the evaluation value of the second image may be determined using an evaluation method.
As an example, the method may determine the number of target objects included in the second image and then choose the evaluation manner accordingly: a single-target evaluation manner when the second image includes only a single target object, or a multi-target evaluation manner when it includes at least two target objects.
In implementation, the intelligent processing module described above may be used to detect the target objects and then determine how many were detected. When the second image is determined to include only a single target object, the single-target evaluation manner may be employed: the target object is evaluated from different dimensions according to evaluation indexes of image quality, and the evaluation value of the second image is determined. In other words, the evaluation value is correlated with the evaluation indexes, which include the sharpness of the target object region where the target object is located, the number of key features possessed by the target object, the number of pixels of the target object region, and the shooting angle at which the target object region was captured.
The sharpness of the target object region refers to how clearly each detail and its boundary are rendered in the image; in implementation it can be used to measure each detail of the image. The AI module uses the feature values of the target object's key features during recognition, so the number of key features determines the AI module's recognition rate. The more pixels the target object region has, the larger the area the target object occupies in the image, which benefits subsequent image processing. The shooting angle indicates the angle at which the target object was captured, for example its front or its side. As an example, the second image may be evaluated using one or more of the above evaluation indexes to determine its evaluation value, as sketched below.
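The following is a hedged sketch of such an index-based evaluation. The Laplacian-variance measure is a common stand-in for the sharpness index, and the weights and normalization constants (1000.0, 256×256, max_features) are illustrative assumptions, not values specified by this application.

```python
import cv2
import numpy as np

def evaluate_region(region_bgr, num_key_features, max_features=68):
    """Combine several of the evaluation indexes named above into one
    score in [0, 1]. All scales and weights are assumptions."""
    gray = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2GRAY)
    # Sharpness index: variance of the Laplacian (a common proxy).
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
    sharpness_score = min(sharpness / 1000.0, 1.0)   # assumed scale
    # Pixel-count index of the target object region.
    pixel_score = min(gray.size / (256 * 256), 1.0)  # assumed scale
    # Key-feature index (e.g. detected facial landmarks).
    feature_score = min(num_key_features / max_features, 1.0)
    # Assumed equal weighting of the three indexes.
    return (sharpness_score + pixel_score + feature_score) / 3.0
```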
Subsequently, the method determines whether the evaluation value is higher than a predetermined threshold: if it is, the image quality of the second image is high; if it is not, the image quality is not high. When the evaluation value is above the predetermined threshold, the second image is sent to the cache unit; when it is below the threshold, the second image may be discarded. As an example, the method may send the images stored in the cache unit to the monitoring platform server together with the video stream generated by encoding.
In implementation, the evaluation values may be divided into levels, each corresponding to a range of evaluation values; the second image belongs to the level whose range contains its evaluation value. As an example, five levels may be used: excellent, good, medium, pass, and fail. The method may then send images whose evaluation value reaches a certain level, for example good, to the cache unit, discard images below that level, and finally send the images stored in the cache unit to the monitoring platform server separately. A hedged sketch of such a level mapping follows.
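In the sketch below, the five-level split matches the example above, but the range boundaries and the "good" cutoff are illustrative assumptions; this application does not fix particular numeric ranges.

```python
# Hedged mapping from an evaluation value in [0, 1] to the five levels
# mentioned above; the range boundaries are assumptions.
LEVELS = [(0.9, "excellent"), (0.75, "good"), (0.6, "medium"),
          (0.4, "pass"), (0.0, "fail")]

def evaluation_level(value):
    for threshold, name in LEVELS:
        if value >= threshold:
            return name
    return "fail"

def should_cache(value, min_level="good"):
    """Cache the image only if its level is at or above min_level."""
    order = [name for _, name in LEVELS]
    return order.index(evaluation_level(value)) <= order.index(min_level)
```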
When the second image is determined to include at least two target objects, that is, multiple target objects, the method may determine its evaluation value in the multi-target evaluation manner: each target object included in the second image is evaluated first, and the evaluation value of the second image is then determined. That is, the multi-target evaluation manner correlates the evaluation value with the sub-evaluation values of the multiple target objects in the second image.
Specifically, the evaluation value of the second image can be determined from the sub-evaluation value obtained for each target object and its corresponding sub-evaluation weight. In implementation, the sub-evaluation value corresponding to each of the multiple target objects may be calculated; that is, the target object region corresponding to each target object is evaluated from different dimensions using the evaluation indexes described above. For example, when the second image is determined to include a first target object, a second target object, and a third target object, the corresponding first, second, and third target object regions in the second image can be determined.
The sub-evaluation values of the first, second, and third target objects are then calculated using the first, second, and third target object regions, respectively. In practice, for each target object, the index values corresponding to the respective evaluation indexes may be calculated, and the sub-evaluation value of the target object finally computed from these index values. Note that in implementation the evaluation manner may be the same for every target object or may differ between them; as an example, different evaluation manners may be chosen according to the categories of the target objects, for example evaluating pedestrians differently from motor vehicles.
After the sub-evaluation values of the first, second, and third target objects have been calculated, the evaluation value of the second image can be calculated using the following equation (1):

S = ∑_i a_i · b_i    (1)

where S denotes the evaluation value of the second image, a_i denotes the sub-evaluation value of the i-th target object in the second image, and b_i denotes the sub-evaluation weight of the i-th target object. In implementation, the sub-evaluation weight may be preset by the user, or determined from features of the target object; for example, different sub-evaluation weights may be assigned according to the category of each target object, or according to the size of each target object region.
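Equation (1) is a plain weighted sum and can be implemented directly, as in the sketch below; the category-based weight table is an illustrative assumption standing in for the user-preset sub-evaluation weights.

```python
# Direct implementation of equation (1): S = sum_i a_i * b_i.
# The category-based weights are assumptions, not values set by this
# application; a user-preset table would take their place.
ASSUMED_CATEGORY_WEIGHTS = {"pedestrian": 1.0, "motor": 0.8, "non_motor": 0.6}

def image_evaluation(targets):
    """targets: list of (sub_evaluation_value, category) pairs."""
    return sum(a * ASSUMED_CATEGORY_WEIGHTS.get(cat, 0.5)
               for a, cat in targets)

# Usage: three detected targets with their sub-evaluation values.
S = image_evaluation([(0.9, "pedestrian"), (0.7, "motor"), (0.4, "non_motor")])
```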
Subsequently, whether the evaluation value is higher than a predetermined threshold is determined: if it is, the image quality of the second image is high; if it is not, the image quality is not high. When the evaluation value is above the threshold, the second image is sent to the cache unit; when it is below, the second image may be discarded.
Likewise, different levels may be defined according to the evaluation value, each corresponding to a range; the second image belongs to the level whose range contains its evaluation value. Five levels may be used: excellent, good, medium, pass, and fail. Images whose evaluation value reaches a certain level, for example good, may then be sent to the cache unit, and images below that level discarded. As an example, the method may send the images meeting the preset requirement, that is, the images stored in the cache unit, to the monitoring platform server together with the encoded video stream.
It is to be understood that the above-mentioned terminal and the like include hardware structures and/or software modules corresponding to the respective functions for realizing the above-mentioned functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is performed in hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
In the embodiment of the present application, the terminal and the like may be divided into functional modules according to the method example, for example, each functional module may be divided corresponding to each function, or two or at least two functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, in the embodiment of the present application, the division of the module is schematic, and is only one logic function division, and another division manner may be available in actual implementation.
Fig. 5 shows a block diagram of a target object detection apparatus according to an embodiment of the present application, for the case where functional modules are divided according to their functions. The target object detection apparatus 500 includes an image acquisition unit 510, a target object detection unit 520, an encoding unit 530, and a sending unit 540. The image acquisition unit 510 is configured to acquire at least one image; the target object detection unit 520 is configured to determine, by performing target object detection on the at least one image, a target image having a target object, and to discard images without the target object; the encoding unit 530 is configured to encode the target image to generate an encoded image; and the sending unit 540 is configured to send the encoded image to the monitoring platform server.
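A hedged structural sketch of how these units might cooperate follows; the class and method names are hypothetical illustrations, not an interface defined by this application.

```python
# Assumed structural sketch of apparatus 500; unit interfaces are
# stand-ins passed in as callables.
class TargetObjectDetector500:
    def __init__(self, image_acquisition, detection, encoder, sender):
        self.image_acquisition = image_acquisition  # unit 510
        self.detection = detection                  # unit 520
        self.encoder = encoder                      # unit 530
        self.sender = sender                        # unit 540

    def run_once(self):
        images = self.image_acquisition()   # acquire at least one image
        target = self.detection(images)     # None if no target object
        if target is None:
            return                          # non-target images discarded
        self.sender(self.encoder(target))   # encode, then upload
```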
Optionally, the target object detection apparatus 500 further includes an image quality evaluation unit configured to perform evaluation on the image quality of the target image and determine an evaluation value of the target image.
Alternatively, the image quality evaluation unit is specifically configured to determine the evaluation value using sub-evaluation values of a plurality of target objects when the plurality of target objects are included in the target image.
Optionally, the image quality evaluation unit is further configured to determine whether the evaluation value meets a preset threshold, and to send the target image to a cache unit when the preset threshold is met.
Optionally, the target object detection apparatus 500 further comprises: and the cache unit is used for storing the target image meeting the preset threshold value.
Optionally, the sending unit 540 is further configured to send the image in the cache unit to the monitoring platform server.
Optionally, the evaluation value is related to one or more of: the sharpness of the target object region where the target object is located; the number of pixels of the target object region; the shooting angle at which the target object region was captured; and the number of key features possessed by the target object.
Optionally, the target object detection unit 520 is specifically configured to select a representative image from the at least one image, perform target object detection on the representative image, and, if the representative image has the target object, determine that the at least one image is the target image with the target object.
An embodiment of the present application provides a camera, including: a lens for collecting light; a sensor for performing photoelectric conversion on the light collected by the lens to generate an image; and a processor or a cluster of processors for performing the method described above.
Embodiments of the present application provide a non-transitory computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method. The computer program instructions may be executed by a camera or general purpose computer.
Embodiments of the present application provide a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of an electronic device, the processor in the electronic device performs the above method.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an erasable Programmable Read-Only Memory (EPROM or flash Memory), a Static Random Access Memory (SRAM), a portable Compact Disc Read-Only Memory (CD-ROM), a Digital Versatile Disc (DVD), a Memory stick, a floppy disk, a mechanical coding device, a punch card or an in-groove protrusion structure, for example, having instructions stored thereon, and any suitable combination of the foregoing.
The computer readable program instructions or code described herein may be downloaded to the respective computing/processing device from a computer readable storage medium, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present application may be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuits, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), can execute computer-readable program instructions to implement aspects of the present application by utilizing state information of the computer-readable program instructions to personalize the custom electronic circuitry.
Various aspects of the present application are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
It is also noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by hardware (e.g., a Circuit or an ASIC) for performing the corresponding function or action, or by combinations of hardware and software, such as firmware.
While the invention has been described in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a review of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the word "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Having described embodiments of the present application, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (18)

1. A target object detection method, comprising:
acquiring a first image;
detecting whether a target object is included in the first image;
when the target object is not included in the first image, not performing encoding on the first image;
acquiring a second image;
detecting whether a target object is included in the second image;
and when the second image comprises the target object, encoding the second image and sending the encoded second image to a monitoring platform server.
2. The method of claim 1, wherein acquiring the first image comprises:
acquiring at least one image in a preset interval, wherein the preset interval comprises a preset time interval and a preset number interval;
selecting the first image from the at least one image in an image selection manner, the method further comprising:
determining not to perform encoding on the at least one image.
3. The method of claim 1, further comprising:
and determining an evaluation value of the second image by using an evaluation mode, wherein the evaluation value is used for describing the image quality of the second image.
4. The method of claim 3, wherein the evaluation mode comprises:
when a plurality of target objects are included in the second image, the evaluation value of the second image is correlated with the sub-evaluation values of the plurality of target objects.
5. The method according to claim 3 or 4, wherein the evaluation value relates to one or more of:
the sharpness of a target object region where the target object is located;
the number of pixels of the target object region;
the shooting angle at which the target object region was captured; and
the number of key features possessed by the target object.
6. The method of any of claims 3 to 5, further comprising:
judging whether the evaluation value meets a preset threshold value or not;
and if the preset threshold value is met, sending the second image to a monitoring platform server.
7. The method of any of claims 1 to 6, further comprising:
discarding the first image.
8. A camera, characterized in that the camera comprises:
a lens for receiving light for generating an image;
a camera body for performing the method of any one of claims 1 to 7.
9. A target object detection apparatus, characterized by comprising:
an image acquisition unit for acquiring at least one image;
a target object detection unit for determining a target image having a target object by performing target object detection on the at least one image, and discarding an image without the target object;
an encoding unit configured to perform encoding on the target image to generate an encoded image;
and the sending unit is used for sending the coded image to the monitoring platform server.
10. The apparatus of claim 9, further comprising:
and the image quality evaluation unit is used for evaluating the image quality of the target image and determining the evaluation value of the target image.
11. The apparatus according to claim 10, wherein the image quality evaluation unit is specifically configured to determine the evaluation value using sub-evaluation values of a plurality of target objects when the plurality of target objects are included in the target image.
12. The apparatus according to claim 10 or 11, wherein the image quality evaluation unit is further configured to determine whether the evaluation value satisfies a preset threshold; and sending the target image to a cache unit under the condition that the preset threshold value is met.
13. The apparatus as recited in claim 12, further comprising:
and the cache unit is used for storing the target image meeting the preset threshold value.
14. The apparatus of claim 13, wherein the sending unit is further configured to send the image in the caching unit to a monitoring platform server.
15. An apparatus as claimed in any one of claims 11 to 14, wherein the evaluation value relates to one or more of:
the sharpness of a target object region where the target object is located;
the number of pixels of the target object region;
the shooting angle at which the target object region was captured; and
the number of key features possessed by the target object.
16. The device according to any of the claims 9 to 15, characterized in that the target object detection unit is specifically adapted to select a representative image from the at least one image; performing target object detection on the representative image; and if the representative image has the target object, determining that the at least one image is the target image with the target object.
17. A camera, comprising:
the lens is used for collecting light;
the sensor is used for performing photoelectric conversion on the light collected by the lens to generate an image;
a processor or a cluster of processors for performing the method of any one of claims 1 to 7.
18. A non-transitory computer readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method of any of claims 1-7.
CN202110099193.0A 2021-01-25 2021-01-25 Target object detection method and device thereof Pending CN114898239A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110099193.0A CN114898239A (en) 2021-01-25 2021-01-25 Target object detection method and device thereof
PCT/CN2022/073151 WO2022156763A1 (en) 2021-01-25 2022-01-21 Target object detection method and device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110099193.0A CN114898239A (en) 2021-01-25 2021-01-25 Target object detection method and device thereof

Publications (1)

Publication Number Publication Date
CN114898239A true CN114898239A (en) 2022-08-12

Family

ID=82548515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110099193.0A Pending CN114898239A (en) 2021-01-25 2021-01-25 Target object detection method and device thereof

Country Status (2)

Country Link
CN (1) CN114898239A (en)
WO (1) WO2022156763A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060078217A1 (en) * 2004-05-20 2006-04-13 Seiko Epson Corporation Out-of-focus detection method and imaging device control method
CN107291810B (en) * 2017-05-18 2018-05-29 深圳云天励飞技术有限公司 Data processing method, device and storage medium
CN110868600B (en) * 2019-11-11 2022-04-26 腾讯云计算(北京)有限责任公司 Target tracking video plug-flow method, display method, device and storage medium
CN111340140A (en) * 2020-03-30 2020-06-26 北京金山云网络技术有限公司 Image data set acquisition method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2022156763A1 (en) 2022-07-28

Similar Documents

Publication Publication Date Title
JP5882455B2 (en) High resolution multispectral image capture
US9491440B2 (en) Depth-sensing camera system
JP6732902B2 (en) Imaging device and imaging system
JP4702441B2 (en) Imaging apparatus and imaging method
CN107704798B (en) Image blurring method and device, computer readable storage medium and computer device
US8798369B2 (en) Apparatus and method for estimating the number of objects included in an image
TWI522967B (en) Method and apparatus for moving object detection based on cerebellar model articulation controller network
US10235972B2 (en) Bit rate controller and a method for limiting output bit rate
CN103905727A (en) Object area tracking apparatus, control method, and program of the same
CN112004029B (en) Exposure processing method, exposure processing device, electronic apparatus, and computer-readable storage medium
CN107172352A (en) Focusing control method, device, computer can storage medium and mobile terminal
US20210142101A1 (en) System and method for preprocessing sequential video images for fire detection based on deep learning and method of training deep learning network for fire detection
US20120033854A1 (en) Image processing apparatus
CN107194901B (en) Image processing method, image processing device, computer equipment and computer readable storage medium
CN108737797B (en) White balance processing method and device and electronic equipment
US9189863B2 (en) Method and system for detecting motion capable of removing shadow by heat
CN107464225B (en) Image processing method, image processing device, computer-readable storage medium and mobile terminal
CN114898239A (en) Target object detection method and device thereof
CN107770446B (en) Image processing method, image processing device, computer-readable storage medium and electronic equipment
CN110475044B (en) Image transmission method and device, electronic equipment and computer readable storage medium
CN109582811B (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN114125311A (en) Automatic switching method and device for wide dynamic mode
CN113792708B (en) ARM-based remote target clear imaging system and method
WO2023109867A1 (en) Camera image transmission method and apparatus, and camera
CN115278191B (en) Image white balance method and device, computer readable medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination