WO2022156763A1 - Target object detection method and device thereof - Google Patents

Target object detection method and device thereof

Info

Publication number
WO2022156763A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
target object
target
evaluation
evaluation value
Prior art date
Application number
PCT/CN2022/073151
Other languages
French (fr)
Chinese (zh)
Inventor
孔令广
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2022156763A1 publication Critical patent/WO2022156763A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30168Image quality inspection

Definitions

  • the present application relates to the field of monitoring, and in particular, to a target object detection method and device thereof.
  • a target object detection method and device thereof are proposed, which can reduce the data amount of the encoded image.
  • an embodiment of the present application provides a target object detection method, the method including: acquiring a first image; detecting whether the first image includes a target object; when the first image does not include the target object, not performing encoding on the first image; acquiring a second image; detecting whether the second image includes the target object; and when the second image includes the target object, performing encoding on the second image and sending the encoded second image to the monitoring platform server.
  • as cameras become more and more intelligent, a camera can perform relevant image processing on each image in a video stream after acquiring the video stream. Therefore, if the smart camera can identify valid images and transmit only those to the monitoring platform server, this not only reduces the data transmission pressure, but also reduces the storage pressure and data processing pressure of the monitoring platform server.
  • the target detection method performs target object detection on the images captured by the camera and encodes only the images that contain a target object; images without a target object need not be encoded, thus reducing the number of encoded images.
  • images without a target object may not be sent to the monitoring platform server, thereby reducing the subsequent transmission amount and the storage cost of the monitoring platform server.
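As a non-authoritative illustration of this first-aspect flow, the sketch below encodes and uploads an image only when a target object is detected; the names detect_target_objects, encode_for_transmission and send_to_platform are hypothetical placeholders rather than functions defined by this application.

```python
# Illustrative sketch only: detect_target_objects, encode_for_transmission and
# send_to_platform are hypothetical placeholders for the camera's detection,
# encoding and transmission functions described above.
def process_image(image, detect_target_objects, encode_for_transmission, send_to_platform):
    """Encode and upload an image only when it contains a target object."""
    detections = detect_target_objects(image)    # e.g. pedestrians, vehicles
    if not detections:
        return None                              # no target object: do not encode; image may be discarded
    encoded = encode_for_transmission(image)     # encode only images that contain a target object
    send_to_platform(encoded)                    # send the encoded image to the monitoring platform server
    return encoded
```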
  • acquiring the first image includes: acquiring at least one image within a preset interval, where the preset interval includes a preset time interval and a preset number interval; and selecting a first image from the at least one image in an image selection manner; the method further includes: determining not to perform encoding on the at least one image.
  • the processing applied to the representative image is taken as the processing applied to the multiple images as a whole, where the processing includes: encoding/not encoding, and sending/not sending to the monitoring platform server.
  • the method further includes: using an evaluation method to determine an evaluation value of the second image, where the evaluation value is used to describe the image quality of the second image.
  • an evaluation method may be used to determine the evaluation value of the second image that already has the target object. That is, evaluating the second image by a quantized value (the evaluation value) enables the user to evaluate the second image more intuitively.
  • the evaluation method includes: when the second image includes multiple target objects, the evaluation value of the second image is correlated with the sub-evaluation values of the multiple target objects in the second image.
  • the evaluation of the second image involves each target object in the second image.
  • this method takes into account the correlation between the multiple target objects, so that the value of the second image can be more accurately measured.
  • the evaluation value is related to one or more of the following: the sharpness of the target object area where the target object is located; the number of pixels in the target object area; the shooting angle at which the target object area is captured; and the number of key features possessed by the target object.
  • the second image may be evaluated from one or more of the above four aspects, so that the second image can be more accurately evaluated.
  • the method further includes: judging whether the evaluation value satisfies a preset threshold; and if the preset threshold is satisfied, sending the second image to the monitoring platform server.
  • the method can separately send images with high image quality to the monitoring platform server, so that the monitoring platform server can perform separate or focused analysis on these images, thereby reducing the processing pressure of the monitoring platform server and improving processing efficiency.
  • the method further includes discarding the first image.
  • the first image can be discarded, thereby saving storage space.
  • embodiments of the present application provide a camera, where the camera includes: a lens for receiving light used to generate an image; and a camera body for performing the target object detection method of the first aspect or of one or more of the possible implementations of the first aspect.
  • the embodiments provide a target object detection device, the device includes: an image acquisition unit for acquiring at least one image; a target object detection unit for performing target object detection on the at least one image, A target image with a target object is determined, and images without a target object are discarded; an encoding unit is used to perform encoding on the target image to generate an encoded image; and a sending unit is used to send the encoded image to the monitoring platform server.
  • the device further includes: an image quality evaluation unit, configured to perform evaluation on the image quality of the target image, and determine an evaluation value of the target image .
  • the image quality evaluation unit is specifically configured to, when the target image includes multiple target objects, determine the evaluation value by using the sub-evaluation values of the multiple target objects.
  • the image quality evaluation unit is further configured to determine whether the evaluation value satisfies a preset threshold, and if the preset threshold is satisfied, send the target image to the cache unit.
  • the device further includes: a cache unit, configured to store a target image that meets the preset threshold.
  • the sending unit is further configured to send the image in the cache unit to the monitoring platform server.
  • the evaluation value is related to one or more of the following: the sharpness of the target object area where the target object is located; the number of pixels in the target object area; the shooting angle at which the target object area is captured; and the number of key features possessed by the target object.
  • the target object detection unit is specifically configured to select a representative image from the at least one image and perform target object detection on the representative image; if the representative image contains a target object, it is determined that each of the at least one image is a target image with the target object.
  • an embodiment of the present application provides a camera, including: a lens for collecting light; a sensor for generating an image by performing photoelectric conversion on the light collected by the lens; and a processor or processor cluster for executing the target object detection method of the first aspect or of one or more of the possible implementations of the first aspect.
  • embodiments of the present application provide a non-volatile computer-readable storage medium on which computer program instructions are stored, where the computer program instructions, when executed by a processor, implement the target object detection method of the first aspect or of one or more of the possible implementations of the first aspect.
  • FIG. 1 shows a schematic diagram of an application scenario according to an embodiment of the present application
  • FIG. 2 shows a diagram of data processing of a smart camera according to an embodiment of the present application
  • FIG. 3 shows a schematic structural diagram of a target detection system according to an embodiment of the present application
  • FIG. 4 shows a flow chart of steps of a target object detection method according to an embodiment of the present application
  • FIG. 5 shows a block diagram of a target object detection apparatus according to an embodiment of the present application.
  • “/” may indicate an “or” relationship between the associated objects; for example, A/B may indicate A or B. “and/or” may be used to describe three possible relationships between associated objects; for example, A and/or B can mean that A exists alone, A and B exist at the same time, or B exists alone, where A and B can be singular or plural.
  • words such as “first” and “second” may be used to distinguish technical features with the same or similar functions. The words “first”, “second” and the like do not limit the quantity or execution order, nor do they require that the features be different.
  • words such as “exemplary” or “for example” are used to represent examples, illustrations or explanations, and any embodiment or design solution described as “exemplary” or “for example” should not be construed as preferred or advantageous over other embodiments or designs.
  • the use of words such as “exemplary” or “such as” is intended to present the relevant concepts in a specific manner to facilitate understanding.
  • the technical solution of the present application is applicable to the field of video surveillance, and video surveillance is an important part of a security protection system.
  • the application scenario of the technical solution will be briefly described below with reference to FIG. 1 .
  • FIG. 1 is a schematic diagram of an application scenario to which the technical solution provided by the present application is applicable.
  • a video surveillance system can be used to monitor road conditions. Before performing video surveillance, it is necessary to set the target object of video surveillance.
  • target objects include pedestrians, non-motor vehicles, and motor vehicles.
  • the video surveillance system may include devices with audio/video capture functions and a surveillance platform server that performs data communication with these devices.
  • the video surveillance system shown includes only four video capture devices, but in practice the video surveillance system may include more or fewer video/audio capture devices as needed.
  • the video capture device may be a camera
  • the camera may include a common camera and a smart camera
  • an ordinary camera refers to a device that converts the captured video data into a suitable bit rate and uploads it to the monitoring platform server; that is, ordinary cameras rely on the monitoring platform server to process the captured video data (data processing such as object recognition), while smart cameras can use an embedded intelligent processing module to first perform image processing on the video data and then upload the processed video data to the monitoring platform server, where the intelligent processing module may include modules such as a face recognition module and a license plate recognition module.
  • the camera of the present application includes a lens and a camera body.
  • the lens is used to receive the light used to generate the image.
  • the function of the lens is to present the light image of the observed target on the sensor of the camera, also known as optical imaging.
  • the lens combines optical parts of various shapes (reflectors, transmission mirrors, prisms) and different media (plastic, glass or crystal) in a certain way, so that after being transmitted or reflected by these optical parts, the light changes its transmission direction as required and is received by the receiving device, completing the optical imaging of the object.
  • each lens is composed of multiple groups of lens elements with different curvatures combined at different spacings.
  • the focal length of the lens is determined by the selection of indicators such as spacing, lens curvature, and light transmittance.
  • the main parameters of the lens include: effective focal length, aperture, maximum image plane, field of view, distortion, relative illumination, etc. The value of each index determines the overall performance of the lens.
  • the camera body may include a sensor and a processor.
  • the sensor, also known as an image sensor, may be, for example, a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) sensor.
  • Both CCD and CMOS sensors have a large number (e.g., tens of millions) of photodiodes; each photodiode is called a photosensitive cell, and each photosensitive cell corresponds to a pixel.
  • the photodiode converts the light signal into an electrical signal containing brightness (or brightness and color) after receiving light, and the image is reconstructed accordingly.
  • Bayer array is a common image sensor technology that can be used in CCD and CMOS.
  • a Bayer array uses a Bayer color filter so that each pixel is sensitive to only one of the three primary colors (red, green and blue); these interleaved pixel values are then interpolated by demosaicing to restore the original image.
  • Bayer arrays can be applied to CCD or CMOS, and sensors using Bayer arrays are also called Bayer sensors.
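To make the demosaicing idea concrete, the following sketch shows a deliberately naive interpolation that averages each 2x2 RGGB block into a single RGB pixel; real demosaicing algorithms (and the layout of any particular sensor) are more sophisticated, so this is an assumption-laden illustration only.

```python
import numpy as np

def naive_demosaic_rggb(raw):
    """Collapse each 2x2 RGGB block of a raw Bayer image into one RGB pixel.
    Deliberately naive, for illustration only; assumes an RGGB layout and
    even image dimensions."""
    raw = raw.astype(np.float32)
    r = raw[0::2, 0::2]                              # red samples
    g = (raw[0::2, 1::2] + raw[1::2, 0::2]) / 2.0    # average of the two green samples
    b = raw[1::2, 1::2]                              # blue samples
    return np.stack([r, g, b], axis=-1)              # (H/2, W/2, 3) color image
```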
  • other sensor technologies also exist, such as the X3 technology developed by Foveon.
  • X3 technology uses three layers of photosensitive elements, each layer recording one of the RGB color channels, so the image sensor can capture all colors at a single pixel location.
  • the processor (also called an image processor), such as a system-on-chip (SoC), is used to convert the image produced by the sensor into a three-channel format (such as YUV), improve the image quality, detect whether there is a target object in the image, and also encode the image.
  • the above-mentioned smart processing module may be included in the processor.
  • there may be only one processor (e.g., a multi-function integrated SoC), or a cluster composed of multiple processors (e.g., multiple processors including an ISP and an encoder).
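As background for the YUV conversion mentioned above, the sketch below applies the widely used BT.601 full-range RGB-to-YCbCr formulas to a single pixel; the actual conversion performed by the SoC/ISP is not specified here, so this is a generic example.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel (0-255 per channel) to YUV/YCbCr using the common
    BT.601 full-range coefficients; shown only to illustrate what a
    'three-channel YUV format' means, not the camera's actual ISP pipeline."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.331 * g + 0.500 * b + 128.0   # Cb, offset so values stay non-negative
    v = 0.500 * r - 0.419 * g - 0.081 * b + 128.0    # Cr
    return y, u, v
```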
  • the smart camera 201 can transmit the data to the monitoring platform server by using the communication network.
  • the data can be transmitted to the storage unit 202 (such as a hard disk) of the monitoring platform server and stored in the storage unit 202.
  • the monitoring platform server refers to a device that can receive the data sent by the camera, perform related processing on the data, and store the data.
  • the monitoring platform server may be a single computing device or multiple computing devices, such as a server, server cluster, public cloud/private cloud.
  • the video monitoring system can preset the data sent to the monitoring platform server.
  • the data sent to the monitoring platform server may include a video stream collected by the smart camera, a target image determined by an intelligent processing module, and a region of interest (ROI) in the target image.
  • the region of interest is often the region where the target object is located, so it can be called the target object region.
  • the smart camera can also send the set content to the storage unit 202 through the server, for example, the captured video stream and the recognized face image.
  • the smart camera may send to the monitoring platform server an encoded video stream that includes only target images containing the target object, as well as target images that meet a preset image quality.
  • Figure 3 shows a schematic structural diagram of the target detection system.
  • the target detection system 300 includes a plurality of cameras 301 to 305 and a monitoring platform server 310, wherein each camera in the plurality of cameras 301 to 305 may be an ordinary camera or a smart camera, In the case where the cameras 301 to 305 are smart cameras, the smart processing modules embedded in each smart camera may be the same or different.
  • the cameras 301 to 305 can transmit the acquired video data to the monitoring platform server, and the interface connecting the monitoring platform server 310 and the cameras 301 to 305 can be wired or wireless communication.
  • the wired mode may include transmission control protocol/internet protocol (TCP/IP) communication, user datagram protocol (UDP) communication, or standard ports such as a universal serial bus (USB) port, a COM interface and other similar standard ports.
  • the wireless communication method may include technologies such as WiFi, Bluetooth, ZigBee or ultra wideband (UWB). The corresponding connection method can be selected according to the actual application scenario and the hardware form of the camera.
  • FIG. 4 shows a flow chart of steps of a target object detection method according to an embodiment of the present application.
  • the target object detection method shown in FIG. 4 can be executed by a smart camera in a target detection system.
  • in step S410, a first image is acquired, where the first image is an image acquired by the smart camera described above, or an image acquired by an ordinary camera and sent to a corresponding smart camera.
  • the method may acquire at least one image within a preset interval, and select the first image from the at least one image in an image selection manner. That is to say, considering that real-time detection is very challenging for the processing capability of the smart camera, it is possible to acquire multiple images within a preset interval, and then select an image from the multiple images as a representative image.
  • the preset interval mentioned here may be a time interval, such as multiple frames of images captured within five seconds, or may be a number interval, such as 10 frames of images captured continuously.
  • the image selection manner refers to the way in which a representative image is selected from the multiple images; for example, the intermediate image may be selected as the representative image, or the first frame may be selected as the representative image, which is not limited in this application.
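A rough sketch of this batch-plus-representative logic is given below; the middle-frame choice and the has_target_object callback are illustrative assumptions, since the application leaves the selection manner open.

```python
def select_representative(frames):
    """One possible image selection manner: take the middle frame of the batch."""
    return frames[len(frames) // 2]

def process_batch(frames, has_target_object):
    """Apply the representative frame's detection result to the whole batch:
    if it has no target object, none of the frames go on to encoding."""
    representative = select_representative(frames)
    if has_target_object(representative):
        return list(frames)   # whole batch proceeds to encoding / sending
    return []                 # whole batch skipped (not encoded, may be discarded)
```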
  • step S420 it is detected whether a target object is included in the first image, wherein the target object is preset, and in implementation, the type of the target object may be a pedestrian, a motor vehicle and/or a non-motor vehicle.
  • the number of target objects may be a single target object or multiple target objects. In the case of multiple target objects, as long as one target object is detected, it can be determined that the first image includes the target object. For example, if the image includes multiple non-motor vehicles, or if the same image includes pedestrians, motor vehicles and non-motor vehicles at the same time, then as long as it is detected that the first image includes a non-motor vehicle, it is determined that the first image includes the target object.
  • the set target object corresponds to the intelligent processing module embedded in the smart camera, that is, the smart camera has an intelligent processing module for detecting the target object, so that the intelligent processing module can be used to determine whether the first image includes the target object.
  • the intelligent processing module can be implemented by an SoC.
  • the intelligent processing module may be an artificial intelligence (AI) module corresponding to the target, and may also be a machine learning module or a deep learning module, etc., where an AI module refers to loading a large amount of data into a computing device and choosing a model to “fit” the data so that the computing device produces predictions/inferences.
  • the models used by computing devices range from simple equations (such as the equation of a straight line) to very complex logical/mathematical systems; once the model to be used is selected and adjusted (that is, the model is improved through adjustment), the computing device uses the model to learn patterns in the data, and finally the model can be used to process the input data.
  • the AI module may be a module with corresponding target detection capability, and the model used by it may be determined by actual users or technicians, for example, models corresponding to face recognition, pedestrian recognition, and license plate recognition.
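Step S420 can be sketched as below, assuming a hypothetical AI-module interface that returns (class, confidence, box) detections; the class names and the confidence threshold are placeholders, not values given in this application.

```python
# Hypothetical detector interface: detector(image) yields (class_label, score, box) tuples.
TARGET_CLASSES = {"pedestrian", "motor_vehicle", "non_motor_vehicle"}  # preset target types

def image_has_target(image, detector, score_threshold=0.5):
    """Return True as soon as one sufficiently confident detection of a preset class is found."""
    for class_label, score, box in detector(image):
        if class_label in TARGET_CLASSES and score >= score_threshold:
            return True
    return False
```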
  • step S430 when the target object is not included in the first image, encoding is not performed on the first image. That is, in a case where it is determined that the first image does not include the target object, encoding is not performed on the first image.
  • the first image may be deleted (discarded), thereby reducing the subsequent transmission amount and the storage cost of the monitoring platform server.
  • encoding is not performed on any of the multiple images within the preset interval represented by the first image, and all of them may be deleted.
  • encoding refers to encoding the three-channel image (such as a YUV-format image) for ease of transmission and viewing by users, for example, producing an image encoded in JPG format or a video encoded in H.264/H.265.
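For a single still image, the JPG encoding mentioned above might look like the following OpenCV-based sketch; the library choice and quality setting are assumptions for illustration, and a video stream would instead be passed to an H.264/H.265 encoder.

```python
import cv2

def encode_jpg(image_bgr, quality=90):
    """Encode a single BGR image as JPG bytes; an illustrative OpenCV-based example,
    not the encoder actually used by the camera."""
    ok, buffer = cv2.imencode(".jpg", image_bgr, [int(cv2.IMWRITE_JPEG_QUALITY), quality])
    if not ok:
        raise RuntimeError("JPG encoding failed")
    return buffer.tobytes()
```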
  • the method may further perform step S440 to acquire a second image. If, after performing step S420, it is determined that the second image includes the target object, step S450 is performed: the second image is encoded and the encoded second image is sent to the monitoring platform server.
  • the second image may be encoded in H.264/H.265 format and transmitted to the monitoring platform server.
  • the second image with the target object may also be described as the target image.
  • steps S410 and S440 are executed in parallel. In other embodiments, the two may also be performed sequentially, that is, steps S410, S420 and S430 are performed first, and then steps S440, S420 and S450 are performed.
  • the method further includes performing an evaluation on the image quality of the second image if it is determined that the second image includes the target object.
  • one evaluation method is to evaluate the image based on a single target object in it; for example, for an image including only pedestrian A, or for an image including pedestrian A and pedestrian B, the evaluation of pedestrian A (or pedestrian B) is used to represent the evaluation of the entire image. This approach ignores the association between multiple target objects. As an example, pedestrian A and pedestrian B walk together, and after a period of time pedestrian A, pedestrian B and pedestrian C walk together and cross the road. During this process, if only the images including pedestrian A, or only the relevant area in the image (that is, the area including pedestrian A), are evaluated, the correlation between the multiple pedestrians in the image is ignored, and the importance of the image is thus underestimated.
  • the method may utilize an evaluation method to determine an evaluation value of the second image, wherein the evaluation value may be used to describe the image quality of the second image.
  • the evaluation value of the second image may be determined by using an evaluation method.
  • the method may first determine the number of target objects included in the second image, and then determine a specific evaluation manner according to the number of target objects.
  • the evaluation method may include a single target object evaluation method in which only a single target object is included in the second image or a multi-target object evaluation method in which at least two target objects are included in the second image.
  • the above-mentioned intelligent processing module can be used to detect target objects, and then determine the number of detected target objects.
  • when the second image includes only a single target object, a single-target object evaluation method may be adopted.
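The selection between the two evaluation manners can be sketched as follows; evaluate_single and evaluate_multi stand in for the single-target and multi-target evaluation methods described below.

```python
def evaluate_image(detections, evaluate_single, evaluate_multi):
    """Dispatch to the single-target or multi-target evaluation method
    according to how many target objects were detected in the second image."""
    if not detections:
        return None                       # evaluation only applies to images with targets
    if len(detections) == 1:
        return evaluate_single(detections[0])
    return evaluate_multi(detections)
```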
  • the target object can be evaluated from different dimensions according to the evaluation index of image quality, and the evaluation value of the second image can be determined.
  • the evaluation value is related to the evaluation index.
  • the evaluation indices include the sharpness of the target object area where the target object is located, the number of key features possessed by the target object, the number of pixels in the target object area, and the shooting angle at which the target object area is captured.
  • the sharpness of the target object area refers to how clearly each fine detail and its boundary are rendered in the image; in implementation, sharpness can be used to measure each detail in the image.
  • the AI module needs to use the feature values of the key features of the target object in the process of recognizing the target object; therefore, the number of key features possessed by the target object determines the recognition rate of the AI module. The more pixels in the target object area, the larger the area occupied by the target object in the image, which is more conducive to various kinds of image processing. The shooting angle reflects the angle at which the target object area is captured, for example whether the front or the profile of the target object is photographed.
  • the second image may be evaluated by using one or more of the above-mentioned evaluation indicators to determine the evaluation value of the second image.
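One way to turn these indicators into a sub-evaluation value is sketched below; the Laplacian-variance sharpness metric, the normalization constants and the equal weighting are assumptions made for illustration only.

```python
import cv2
import numpy as np

def evaluate_target_region(region_bgr, angle_score, num_key_features,
                           max_features=10, max_pixels=128 * 128):
    """Combine the four indicators into a single score in [0, 1].
    angle_score (in [0, 1]) and num_key_features are assumed to come from
    upstream modules; the constants here are illustrative placeholders."""
    gray = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2GRAY)
    sharpness_score = min(cv2.Laplacian(gray, cv2.CV_64F).var() / 100.0, 1.0)  # region sharpness
    pixel_score = min(gray.size / max_pixels, 1.0)                             # more pixels, larger region
    feature_score = min(num_key_features / max_features, 1.0)                  # key features aid recognition
    return float(np.mean([sharpness_score, pixel_score, feature_score, angle_score]))
```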
  • the method determines whether the evaluation value is higher than a predetermined threshold, if higher than the predetermined threshold, the image quality of the second image is high, and if it is lower than the predetermined threshold, the image quality of the second image is not high. If the evaluation value is higher than the predetermined threshold, the second image is sent to the buffer unit, and if the evaluation value is lower than the predetermined threshold, the second image may be discarded. As an example, the method may send the image stored in the buffer unit together with the video stream generated by encoding to the monitoring platform server.
  • different grades may be divided according to the evaluation value, each grade corresponds to a range of evaluation values, and if the evaluation value of the second image falls within the range of evaluation values, the second image corresponds to this grade.
  • the evaluation value may be divided into five grades, which may include excellent, good, moderate, pass, and fail.
  • the method may send an image with an evaluation value at or above a certain grade to the cache unit, and discard the image if it is below that grade; as an example, the certain grade may be the “good” grade. Finally, the images stored in the cache unit are separately sent to the monitoring platform server.
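The grading and caching step can be sketched as follows; the grade boundaries and the "good" cutoff are placeholders for whatever thresholds are actually configured, not values from this application.

```python
# Illustrative grade boundaries for an evaluation value in [0, 1].
GRADES = [(0.9, "excellent"), (0.75, "good"), (0.6, "moderate"), (0.4, "pass"), (0.0, "fail")]

def grade_of(evaluation_value):
    """Map an evaluation value to one of the five grades."""
    for lower_bound, name in GRADES:
        if evaluation_value >= lower_bound:
            return name
    return "fail"

def maybe_cache(image, evaluation_value, cache, cutoff_grade="good"):
    """Keep the image in the cache unit for separate upload if its grade is at or above the cutoff."""
    order = [name for _, name in GRADES]             # best to worst
    if order.index(grade_of(evaluation_value)) <= order.index(cutoff_grade):
        cache.append(image)
```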
  • the method may adopt a multi-target object evaluation method to determine the evaluation value of the second image, where the multi-target object evaluation method is a method in which the evaluation value of the second image is determined after each target object included in the second image is evaluated separately. That is to say, in the multi-target evaluation method, the evaluation value is related to the sub-evaluation values of the multiple target objects in the second image.
  • the evaluation value of the second image may be determined by using the sub-evaluation value obtained for each target object and the corresponding sub-evaluation weight.
  • a corresponding sub-evaluation value may be calculated for each target object in the plurality of target objects, that is, the target object area corresponding to each target object may be evaluated from different dimensions by using the evaluation indicators described above.
  • in this way, the sub-evaluation value of each target object is obtained. For example, in the case where it is determined that the second image includes a first target object, a second target object and a third target object, the second image contains a first target object area corresponding to the first target object, a second target object area corresponding to the second target object, and a third target object area corresponding to the third target object.
  • the sub-evaluation values of the first target object, the second target object and the third target object are calculated respectively.
  • the index value corresponding to each evaluation index can be calculated, and the sub-evaluation value of the target object can finally be calculated by using these index values.
  • the evaluation methods for each target object may be the same or different. As an example, different evaluation methods may be determined according to the category of the target object. For example, the evaluation method for pedestrians is different from the evaluation method for motor vehicles.
  • for example, the evaluation value of the second image may be calculated as a weighted sum of the sub-evaluation values, S = Σ_i a_i · b_i, where S indicates the evaluation value of the second image, a_i indicates the sub-evaluation value of the i-th target object in the second image, and b_i indicates the sub-evaluation weight of the i-th target object.
  • the sub-evaluation weight may be a weight preset by a user. It may also be a weight determined according to the characteristic value of the target object. For example, different sub-evaluation weights may be assigned according to the category of the target object, and different sub-evaluation weights may also be assigned according to the size of each target object area, and so on.
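A minimal sketch of the multi-target combination is given below, using the weighted-sum reading of S, a_i and b_i; the area-proportional weights are an assumption chosen for illustration, since weights could equally be preset or assigned per object category.

```python
def multi_object_evaluation(sub_values, region_areas):
    """Compute S = sum_i a_i * b_i, here with weights b_i proportional to each
    target object region's area (one possible weighting option)."""
    total_area = float(sum(region_areas))
    weights = [area / total_area for area in region_areas]   # b_i, normalized to sum to 1
    return sum(a * b for a, b in zip(sub_values, weights))   # S
```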
  • it is then determined whether the evaluation value is higher than a predetermined threshold. If the evaluation value is higher than the predetermined threshold, the image quality of the second image is high; if it is lower than the predetermined threshold, the image quality of the second image is not high. If the evaluation value is higher than the predetermined threshold, the second image is sent to the buffer unit, and if the evaluation value is lower than the predetermined threshold, the second image may be discarded.
  • alternatively, different grades may be divided according to the evaluation value; each grade corresponds to a range of evaluation values, and if the evaluation value of the second image falls within a given range, the second image corresponds to that grade.
  • It can be divided into five grades according to the evaluation value, and the five grades can include excellent, good, medium, pass and fail.
  • an image whose evaluation value is above a certain level may be sent to the buffer unit, and if it is lower than a certain level, the image may be discarded.
  • the certain level may be a good level.
  • the method may send an image that meets the preset requirements together with a video stream generated by encoding to the monitoring platform server, that is, send the image stored in the cache unit to the monitoring platform server together with the video stream.
  • the above-mentioned terminal and the like include corresponding hardware structures and/or software modules for executing each function.
  • the embodiments of the present application can be implemented in hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered to be beyond the scope of the embodiments of the present application.
  • each functional module may be divided corresponding to each function, or two or at least two functions may be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware, and can also be implemented in the form of software function modules. It should be noted that the division of modules in the embodiments of the present application is schematic, and is only a logical function division, and other division methods may be used in actual implementation.
  • FIG. 5 shows a block diagram of a target object detection device according to an embodiment of the present application.
  • the target object detection device 500 includes an image acquisition unit 510, a target object detection unit 520, an encoding unit 530 and a sending unit 540. The image acquisition unit 510 is used for acquiring at least one image; the target object detection unit 520 is used for performing target object detection on the at least one image, determining a target image with a target object, and discarding images without a target object; the encoding unit 530 is configured to perform encoding on the target image to generate an encoded image; and the sending unit 540 is used for sending the encoded image to the monitoring platform server.
  • the target object detection device 500 further includes an image quality evaluation unit, configured to perform evaluation on the image quality of the target image, and determine the evaluation value of the target image.
  • an image quality evaluation unit configured to perform evaluation on the image quality of the target image, and determine the evaluation value of the target image.
  • the image quality evaluation unit is specifically configured to use the sub-evaluation values of the multiple target objects to determine the evaluation value when the target image includes multiple target objects.
  • the image quality evaluation unit is further configured to determine whether the evaluation value satisfies a preset threshold; if the preset threshold is met, the target image is sent to a cache unit.
  • the target object detection device 500 further includes: a cache unit, configured to store the target image satisfying the preset threshold.
  • the sending unit 540 is further configured to send the image in the cache unit to the monitoring platform server.
  • the evaluation value is related to one or more of the following: the sharpness of the target object area where the target object is located; the number of pixels in the target object area; the shooting angle at which the target object area is captured; and the number of key features the target object possesses.
  • the target object detection unit 520 is specifically configured to select a representative image from the at least one image and perform target object detection on the representative image; if there is a target object in the representative image, it is determined that each of the at least one image is a target image with a target object.
  • An embodiment of the present application provides a camera, including: a lens for collecting light; a sensor for generating an image by performing photoelectric conversion on the light collected by the lens; and a processor or processor cluster for executing the above-mentioned method.
  • Embodiments of the present application provide a non-volatile computer-readable storage medium on which computer program instructions are stored, and when the computer program instructions are executed by a processor, implement the above method.
  • Computer program instructions can be executed by a video camera or by a general purpose computer.
  • Embodiments of the present application provide a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code, where when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
  • a computer-readable storage medium may be a tangible device that can hold and store instructions for use by the instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital video discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in grooves on which instructions are stored, and any suitable combination of the foregoing.
  • the computer readable program instructions or code described herein may be downloaded to various computing/processing devices from a computer readable storage medium, or to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from a network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device .
  • the computer program instructions used to perform the operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the “C” language or similar programming languages.
  • the computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • in some embodiments, electronic circuits, such as programmable logic circuits, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA), can be personalized by utilizing state information of the computer-readable program instructions, and the electronic circuits can execute the computer-readable program instructions to implement various aspects of the present application.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer-readable program instructions may also be stored in a computer-readable storage medium; the instructions cause a computer, programmable data processing apparatus and/or other device to operate in a specific manner, so that the computer-readable medium storing the instructions constitutes an article of manufacture that includes instructions for implementing various aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • Computer-readable program instructions can also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer-implemented process, so that the instructions executed on the computer, other programmable apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • each block in the flowcharts or block diagrams may represent a module, a segment, or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in hardware (e.g., circuits or application-specific integrated circuits (ASICs) that perform the corresponding functions or actions), or can be implemented by a combination of hardware and software, such as firmware.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Studio Devices (AREA)
  • Train Traffic Observation, Control, And Security (AREA)

Abstract

A target object detection method and a device thereof. The method comprises performing target object detection on images photographed by a camera, wherein only an image having a target object is encoded and transmitted to a monitoring platform server, and an image having no target object is not encoded and is not transmitted to the monitoring platform server.

Description

Target object detection method and device thereof
This application claims priority to Chinese patent application No. 202110099193.0, entitled “Target object detection method and device thereof” and filed with the Chinese Patent Office on January 25, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of monitoring, and in particular, to a target object detection method and a device thereof.
Background
With the development of video surveillance technology, video surveillance has evolved from simply recording video toward intelligence. This intelligent trend also deeply affects the development of cameras, and current cameras show an increasingly intelligent trend. At the same time, thanks to the rapid development of computer vision, deep learning and other technologies, the automatic recognition accuracy of smart cameras has surpassed that of humans.
As cameras become more and more intelligent, monitoring efficiency is improved and labor costs are saved, which plays an important role in promoting the modernization of safe cities and smart cities as well as the automation of industrial production. One of the difficulties in the related art is how to use the camera to reduce the analysis pressure on the background analysis system and to improve its accuracy.
Summary of the Invention
In view of this, a target object detection method and a device thereof are proposed, which can reduce the amount of encoded image data.
In a first aspect, an embodiment of the present application provides a target object detection method, the method including: acquiring a first image; detecting whether the first image includes a target object; when the first image does not include the target object, not performing encoding on the first image; acquiring a second image; detecting whether the second image includes the target object; and when the second image includes the target object, performing encoding on the second image and sending the encoded second image to the monitoring platform server.
As cameras become more and more intelligent, a camera can perform relevant image processing on each image in a video stream after acquiring the video stream. Therefore, if the smart camera can identify valid images and transmit only those to the monitoring platform server, this not only reduces the data transmission pressure, but also reduces the storage pressure and data processing pressure of the monitoring platform server. Based on this, the target detection method performs target object detection on the images captured by the camera and encodes only the images that contain a target object; images without a target object need not be encoded, thus reducing the number of encoded images. In addition, images without a target object may not be sent to the monitoring platform server, thereby reducing the subsequent transmission amount and the storage cost of the monitoring platform server.
According to the first aspect, in a first possible implementation of the first aspect, acquiring the first image includes: acquiring at least one image within a preset interval, where the preset interval includes a preset time interval and a preset number interval; and selecting a first image from the at least one image in an image selection manner; the method further includes: determining not to perform encoding on the at least one image.
In implementation, considering that real-time detection is very demanding on the processing capability of the smart camera, multiple images within a preset interval can be acquired, and then one of these images is selected as a representative image, which can reduce the processing pressure of the smart camera. If the first image is not encoded, none of these images are encoded, which can further reduce the processing pressure of the smart camera. In other words, the processing applied to the representative image is taken as the processing applied to the multiple images as a whole, where the processing includes: encoding/not encoding, and sending/not sending to the monitoring platform server.
According to the first aspect, in a second possible implementation of the first aspect, the method further includes: determining, by using an evaluation method, an evaluation value of the second image, where the evaluation value is used to describe the image quality of the second image.
In the related art, usually only images including a single target object are evaluated, which ignores the correlation among multiple target objects and thus underestimates the importance of the image. In order to better measure the value of the second image, an evaluation method may be used to determine the evaluation value of the second image that already contains the target object. That is, evaluating the second image by a quantized value (the evaluation value) enables the user to evaluate the second image more intuitively.
According to the first aspect, in a third possible implementation of the first aspect, the evaluation method includes: when the second image includes multiple target objects, the evaluation value of the second image is correlated with the sub-evaluation values of the multiple target objects in the second image.
That is to say, when the second image includes multiple target objects, the evaluation of the second image involves each target object in the second image, so the value of the second image can be determined more accurately; in particular, this approach takes the correlation between the multiple target objects into account, so that the value of the second image can be measured more accurately.
According to the first aspect, in a fourth possible implementation of the first aspect, the evaluation value is related to one or more of the following: the sharpness of the target object area where the target object is located; the number of pixels in the target object area; the shooting angle at which the target object area is captured; and the number of key features possessed by the target object.
In implementation, the second image may be evaluated from one or more of the above four aspects, so that the second image can be evaluated more accurately.
According to the first aspect, in a fifth possible implementation of the first aspect, the method further includes: judging whether the evaluation value satisfies a preset threshold; and if the preset threshold is satisfied, sending the second image to the monitoring platform server.
That is, the method can separately send images with high image quality to the monitoring platform server, so that the monitoring platform server can perform separate or focused analysis on these images, thereby reducing the processing pressure of the monitoring platform server and improving processing efficiency.
According to the first aspect, in a sixth possible implementation of the first aspect, the method further includes discarding the first image.
That is, after deciding not to encode the first image that does not include the target object, the method can discard the first image, thereby saving storage space.
In a second aspect, an embodiment of the present application provides a camera, the camera including: a lens for receiving light used to generate an image; and a camera body for performing the target object detection method of the first aspect or of one or more of the possible implementations of the first aspect.
In a third aspect, an embodiment provides a target object detection device, the device including: an image acquisition unit configured to acquire at least one image; a target object detection unit configured to perform target object detection on the at least one image, determine a target image with a target object, and discard images without a target object; an encoding unit configured to perform encoding on the target image to generate an encoded image; and a sending unit configured to send the encoded image to the monitoring platform server.
According to the third aspect, in a first possible implementation of the third aspect, the device further includes: an image quality evaluation unit configured to evaluate the image quality of the target image and determine an evaluation value of the target image.
According to the third aspect, in a second possible implementation of the third aspect, the image quality evaluation unit is specifically configured to, when the target image includes multiple target objects, determine the evaluation value by using the sub-evaluation values of the multiple target objects.
According to the third aspect, in a third possible implementation of the third aspect, the image quality evaluation unit is further configured to determine whether the evaluation value satisfies a preset threshold, and if the preset threshold is satisfied, send the target image to a cache unit.
According to the third aspect, in a fourth possible implementation of the third aspect, the device further includes: a cache unit configured to store target images that satisfy the preset threshold.
According to the third aspect, in a fifth possible implementation of the third aspect, the sending unit is further configured to send the images in the cache unit to the monitoring platform server.
According to the third aspect, in a sixth possible implementation of the third aspect, the evaluation value is related to one or more of the following: the sharpness of the target object area where the target object is located; the number of pixels in the target object area; the shooting angle at which the target object area is captured; and the number of key features possessed by the target object.
According to the third aspect, in a seventh possible implementation of the third aspect, the target object detection unit is specifically configured to select a representative image from the at least one image and perform target object detection on the representative image; if the representative image contains a target object, it is determined that each of the at least one image is a target image with a target object.
In a fourth aspect, an embodiment of the present application provides a camera, including: a lens for collecting light; a sensor for generating an image by performing photoelectric conversion on the light collected by the lens; and a processor or processor cluster for executing the target object detection method of the first aspect or of one or more of the possible implementations of the first aspect.
In a fifth aspect, embodiments of the present application provide a non-volatile computer-readable storage medium on which computer program instructions are stored, where the computer program instructions, when executed by a processor, implement the target object detection method of the first aspect or of one or more of the possible implementations of the first aspect.
These and other aspects of the present application will be more clearly understood in the following description of the embodiment(s).
Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features and aspects of the present application and, together with the description, serve to explain the principles of the present application.
FIG. 1 is a schematic diagram of an application scenario according to an embodiment of the present application;
FIG. 2 is a diagram of the data processing of a smart camera according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a target detection system according to an embodiment of the present application;
FIG. 4 is a flowchart of the steps of a target object detection method according to an embodiment of the present application;
FIG. 5 is a block diagram of a target object detection device according to an embodiment of the present application.
Detailed Description of Embodiments
Various exemplary embodiments, features and aspects of the present application are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.
In the embodiments of the present application, "/" may indicate an "or" relationship between the associated objects; for example, A/B may mean A or B. "And/or" may describe three possible relationships between associated objects; for example, "A and/or B" may mean: A alone, both A and B, or B alone, where A and B may be singular or plural. To facilitate the description of the technical solutions of the embodiments, words such as "first" and "second" may be used to distinguish technical features with the same or similar functions. The words "first", "second" and the like do not limit quantity or order of execution, nor do they require the features to be different. In the embodiments of the present application, words such as "exemplary" or "for example" are used to present an example, illustration or explanation; any embodiment or design described as "exemplary" or "for example" should not be construed as preferred or more advantageous than other embodiments or designs. Rather, such words are intended to present the relevant concepts in a concrete manner to aid understanding.
The word "exemplary" is used here exclusively to mean "serving as an example, embodiment or illustration". Any embodiment described here as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In addition, numerous specific details are given in the following detailed description in order to better illustrate the present application. Those skilled in the art will understand that the present application can be practiced without certain of these details. In some instances, methods, means, elements and circuits well known to those skilled in the art are not described in detail, so as to highlight the subject matter of the present application.
The technical solution of the present application is applicable to the field of video surveillance, which is an important part of security protection systems. For ease of understanding, the application scenario of the technical solution is briefly described below with reference to FIG. 1.
FIG. 1 is a schematic diagram of an application scenario to which the technical solution provided by the present application is applicable. As shown in FIG. 1, a video surveillance system can be used to monitor road conditions. Before video surveillance is performed, the target objects of the surveillance need to be set. As an example, the target objects include pedestrians, non-motor vehicles and motor vehicles.
The video surveillance system may include devices with audio/video capture functions and a monitoring platform server in data communication with these devices. In FIG. 1 the video surveillance system includes only four video capture devices, but in practice it may include more or fewer video/audio capture devices as needed.
As an example, the video capture device may be a camera, and cameras include ordinary cameras and smart cameras. An ordinary camera converts the captured video data to a suitable bit rate and uploads it to the monitoring platform server; that is, it relies on the monitoring platform server to process the captured video data (data processing such as object recognition). A smart camera, by contrast, can use its embedded intelligent processing module to perform image processing on the video data first and then upload the processed video data to the monitoring platform server. The intelligent processing module may include modules such as a face recognition module and a license plate recognition module.
Whether an ordinary camera or a smart camera, the camera of the present application includes a lens and a camera body. The lens receives the light used to generate an image. Specifically, the function of the lens is to project the optical image of the observed target onto the sensor of the camera, a process also known as optical imaging. A lens combines optical components of various shapes and media (plastic, glass or crystal), such as reflectors, transmission mirrors and prisms, in a certain arrangement, so that after transmission or reflection through these components the light changes its direction of propagation as required and is received by the receiving device, completing the optical imaging of the object. In general, each lens consists of several groups of lens elements with different surface curvatures combined at different spacings. The choice of spacing, element curvature, light transmittance and other parameters determines the focal length of the lens. The main parameters of a lens include effective focal length, aperture, maximum image plane, field of view, distortion and relative illumination; the values of these parameters determine the overall performance of the lens.
The camera body may include a sensor and a processor. A sensor (also called an image sensor) is a device that converts an optical image into an electronic signal, and is widely used in digital cameras and other electro-optical devices. Common sensors include the charge-coupled device (CCD) and the complementary metal-oxide-semiconductor (CMOS) sensor. Both CCD and CMOS sensors contain a large number (for example, tens of millions) of photodiodes; each photodiode is called a photosensitive cell and corresponds to one pixel. During exposure, each photodiode converts the received light into an electrical signal containing brightness (or brightness and color), from which the image is reconstructed. The Bayer array is a common image sensor technology that can be applied to both CCD and CMOS sensors. It uses Bayer color filters so that each pixel is sensitive to only one of the three primary colors (red, green or blue); these pixels are interleaved, and the original image is then recovered by demosaicing interpolation. A sensor that uses a Bayer array is also called a Bayer sensor. Besides Bayer sensors, there are other sensor technologies such as X3 (developed by Foveon), which uses three layers of photosensitive elements, each layer recording one of the RGB color channels, so that all colors can be captured at a single pixel.
The processor (also called an image processor), for example a system on chip (SoC), converts the image produced by the sensor into a three-channel format (for example YUV), improves the image quality, detects whether the image contains a target object, and encodes the image. In the case of a smart camera, the processor may include the intelligent processing module described above. In embodiments of the present invention there may be a single processor (for example a multi-function integrated SoC) or a cluster of multiple processors (for example an ISP, an encoder and other processors).
As shown in FIG. 2, the smart camera 201 can transmit data to the monitoring platform server over a communication network. As an example, the data can be transmitted to a storage unit 202 (for example a hard disk) of the monitoring platform server and stored there. The monitoring platform server is a device that can receive the data sent by the cameras, perform related processing on the data, and store the data. In practice, the monitoring platform server may be a single computing device or multiple computing devices, such as a server, a server cluster, or a public/private cloud.
Since a smart camera can perform image processing on the captured images and send the processed images to the monitoring platform server, the video surveillance system can configure in advance the data to be sent to the monitoring platform server. As an example, the data sent to the monitoring platform server may include the video stream collected by the smart camera, the target images determined by the intelligent processing module, and the regions of interest (ROI) in those target images. The region of interest is usually the region in which the target object is located, so it may also be called the target object region. In addition, the smart camera can also send configured content, for example the captured video stream and recognized face images, to the storage unit 202 through the server. In an exemplary embodiment of the present application, the smart camera may send to the monitoring platform a video stream encoded only from target images containing the target object, together with target images that meet a preset image quality.
As an example, FIG. 3 is a schematic structural diagram of a target detection system. As shown in FIG. 3, the target detection system 300 includes a plurality of cameras 301 to 305 and a monitoring platform server 310. Each of the cameras 301 to 305 may be an ordinary camera or a smart camera; when the cameras 301 to 305 are smart cameras, the intelligent processing modules embedded in the individual cameras may be the same or different.
The cameras 301 to 305 can transmit the acquired video data to the monitoring platform server. The interface connecting the monitoring platform server 310 and the cameras 301 to 305 may be wired or wireless. Wired options include transmission control protocol/internet protocol (TCP/IP) over Ethernet, user datagram protocol (UDP), standard universal serial bus (USB) ports, COM interfaces and other similar standard ports. Wireless options include technologies such as WiFi, Bluetooth, ZigBee or ultra wideband (UWB). The connection can be chosen according to the actual application scenario and the hardware form of the camera.
FIG. 4 is a flowchart of the steps of a target object detection method according to an embodiment of the present application. In practice, the target object detection method shown in FIG. 4 may be performed by a smart camera in the target detection system.
In step S410, a first image is acquired. The first image may be an image acquired by the smart camera described above, or an image acquired by an ordinary camera and then sent to the corresponding smart camera.
As an example, the method may acquire at least one image within a preset interval and select the first image from the at least one image according to an image selection scheme. That is, since detecting every frame in real time places heavy demands on the processing capability of the smart camera, multiple images within a preset interval can be acquired and one of them selected as a representative image.
The preset interval mentioned here may be a time interval, for example, the frames captured within five seconds, or a count interval, for example, 10 consecutively captured frames. The image selection scheme indicates a user-preset way of choosing a representative image from the multiple images; for example, the middle image or the first frame may be selected as the representative image, which is not limited in the present application.
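As a non-limiting illustration, the following Python sketch shows one way such frame sampling and representative-image selection could be implemented. The batch size, the selection strategies and the function names are assumptions made here for illustration only and are not prescribed by the described method.

```python
from typing import List, Sequence

def select_representative(frames: Sequence, strategy: str = "middle"):
    """Pick one representative frame from the frames gathered in a preset interval.

    frames:   the images captured within the preset time/count interval
    strategy: "middle" picks the middle frame, "first" picks the first frame
    """
    if not frames:
        raise ValueError("the preset interval produced no frames")
    if strategy == "first":
        return frames[0]
    # default: the middle frame of the interval
    return frames[len(frames) // 2]

def batch_frames(stream: Sequence, count_interval: int = 10) -> List[List]:
    """Group a frame stream into consecutive batches of `count_interval` frames."""
    return [list(stream[i:i + count_interval])
            for i in range(0, len(stream), count_interval)]
```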
In step S420, it is detected whether the first image includes a target object. The target object is set in advance; in practice, the type of the target object may be a pedestrian, a motor vehicle and/or a non-motor vehicle. There may be a single target object or multiple target objects. In the case of multiple target objects, it is sufficient to detect one of them to determine that the first image includes a target object. For example, when the image contains several non-motor vehicles, or when the same image contains pedestrians, motor vehicles and non-motor vehicles at the same time, if it is detected that the first image contains a non-motor vehicle, it can be determined that the first image includes a target object.
The set target object corresponds to the intelligent processing module embedded in the smart camera; that is, the smart camera contains an intelligent processing module capable of detecting the target object, and this module can be used to determine whether the first image includes the target object. The intelligent processing module may be implemented by the SoC.
As an example, the intelligent processing module may refer to an artificial intelligence (AI) module corresponding to the target, or to a machine learning or deep learning module. An AI module here means loading a large amount of data into a computing device and selecting a model to "fit" the data so that the computing device can make predictions/inferences. The models used by computing devices range from simple equations (such as the equation of a straight line) to very complex logical/mathematical systems. Once a model is selected and tuned (that is, improved through adjustment), the computing device uses it to learn the patterns in the data. Finally, the model can be used to process the input data.
In practice, the AI module may be a module with the corresponding target detection capability, and the model it uses may be determined by the actual user or by technicians, for example, models for face recognition, pedestrian recognition or license plate recognition.
In step S430, when the first image does not include the target object, the first image is not encoded. That is, when it is determined that the first image does not include the target object, no encoding is performed on the first image. Optionally, the first image may be deleted (discarded), thereby reducing the subsequent transmission volume and the storage cost of the monitoring platform server. In practice, if the first image serves as a representative image, none of the images in the preset interval it represents are encoded, and all of them are deleted.
Encoding, sometimes also called compression, converts three-channel images (for example YUV images) into an encoded form convenient for transmission and viewing, for example a JPG image or an H.264/H.265 video.
As shown in FIG. 4, the method may further perform step S440 to acquire a second image. If, after step S420 is performed, it is determined that the second image includes the target object, step S450 is performed: the second image is encoded and the encoded second image is sent to the monitoring platform server. In practice, the second image may be encoded in the H.264/H.265 format and transmitted to the monitoring platform server. In the present application, a second image containing the target object may also be described as a target image. It should be noted that in FIG. 4 steps S410 and S440 are performed in parallel. In other embodiments the two may also be performed sequentially, that is, steps S410, S420 and S430 are performed first, and then steps S440, S420 and S450.
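As a rough illustration of the per-image decision in steps S410 to S450, the sketch below assumes hypothetical `detector`, `encoder` and `uploader` callables; the actual detection model, codec and transport used by the smart camera are not specified here.

```python
def process_image(image, detector, encoder, uploader) -> bool:
    """Encode and upload an image only if a target object is detected in it.

    Returns True if the image was sent to the monitoring platform server.
    """
    detections = detector(image)          # e.g. detected pedestrians / vehicles
    if not detections:
        # No target object: skip encoding entirely and discard the image.
        return False
    encoded = encoder(image)              # e.g. H.264/H.265 or JPG encoding
    uploader(encoded)                     # transmit to the monitoring platform server
    return True
```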
In addition, the method further includes evaluating the image quality of the second image when it is determined that the second image includes the target object. One evaluation approach is to evaluate the image using a single target object in it, for example, evaluating an image that contains only pedestrian A, or, for an image containing pedestrian A and pedestrian B, using the evaluation of pedestrian A or pedestrian B to represent the evaluation of the whole image. This approach ignores the association between multiple target objects. As an example, pedestrian A, pedestrian B and pedestrian C cross the road together: pedestrian A first walks with pedestrian B, and some time later pedestrians A, B and C walk together. In this process, if only the image containing pedestrian A, or only the relevant region of the image (that is, the region containing pedestrian A), is evaluated, the association between the multiple pedestrians in the image is ignored and the importance of the image is underestimated.
Therefore, the method may use an evaluation scheme to determine an evaluation value of the second image, where the evaluation value can be used to describe the image quality of the second image. In particular, when the second image includes multiple target objects, the evaluation value of the second image can be determined by the evaluation scheme.
As an example, the method may first determine the number of target objects included in the second image and then determine the specific evaluation scheme according to that number. The evaluation scheme may be a single-target-object evaluation scheme, used when the second image includes only a single target object, or a multi-target-object evaluation scheme, used when the second image includes at least two target objects.
In practice, the intelligent processing module mentioned above can be used to detect the target objects and then determine their number. When it is determined that the second image includes only a single target object, the single-target-object evaluation scheme can be used. Specifically, the target object can be evaluated in different dimensions according to image quality evaluation indicators to determine the evaluation value of the second image. In other words, the evaluation value is related to the evaluation indicators. The evaluation indicators include the sharpness of the target object region in which the target object is located, the number of key features possessed by the target object, the number of pixels in the target object region, and the shooting angle of the captured target object region.
The sharpness of the target object region refers to how clearly the fine details and their boundaries are rendered in the image; in practice, sharpness can be used to measure the details in the image. As mentioned above, the AI module needs the feature values of the key features of the target object when recognizing it, so the number of key features possessed by the target object determines the recognition rate of the AI module. The more pixels the target object region has, the larger the area the target object occupies in the image, which is more favorable for various kinds of image processing. The shooting angle indicates the angle at which the target object is captured, for example, the frontal face or the profile of the captured target object. As an example, the second image may be evaluated using one or more of the above evaluation indicators to determine its evaluation value.
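The sketch below shows one possible way to combine the four indicators into a single evaluation value. The individual metric functions, the normalization of the pixel count and the weights are illustrative assumptions (the embodiment does not prescribe a specific formula for the single-target case), and the region is assumed to be a NumPy-style H×W array.

```python
def single_target_score(region,
                        sharpness_fn, key_feature_fn, angle_fn,
                        weights=(0.4, 0.3, 0.2, 0.1)) -> float:
    """Combine the four evaluation indicators into one value in [0, 1].

    region:         the target object region (assumed to be an H x W image patch)
    sharpness_fn:   returns the sharpness of the region, normalized to [0, 1]
    key_feature_fn: returns the fraction of expected key features that are visible
    angle_fn:       returns how favorable the shooting angle is, in [0, 1]
    """
    w_sharp, w_feat, w_pixels, w_angle = weights
    # Normalize the pixel count against an assumed reference size of 128 x 128.
    pixel_score = min(1.0, region.shape[0] * region.shape[1] / (128 * 128))
    return (w_sharp * sharpness_fn(region)
            + w_feat * key_feature_fn(region)
            + w_pixels * pixel_score
            + w_angle * angle_fn(region))
```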
The method then determines whether the evaluation value is above a predetermined threshold. If it is above the threshold, the image quality of the second image is high; if it is below the threshold, the image quality of the second image is not high. When the evaluation value is above the predetermined threshold, the second image is sent to a cache unit; when it is below the threshold, the second image may be discarded. As an example, the method may send the images stored in the cache unit to the monitoring platform server together with the encoded video stream.
In practice, the evaluation values may be divided into different grades, each corresponding to a range of evaluation values; if the evaluation value of the second image falls within a range, the second image corresponds to that grade. As an example, the evaluation values may be divided into five grades: excellent, good, medium, pass and fail. The method may then send images whose evaluation value is at or above a particular grade to the cache unit and discard images below that grade; as an example, the particular grade may be the "good" grade. Finally, the images stored in the cache unit are sent separately to the monitoring platform server.
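A minimal sketch of this grading step follows. The grade boundaries are hypothetical; the embodiment only requires that each grade correspond to a range of evaluation values and that images below a chosen grade be discarded.

```python
GRADE_BOUNDARIES = [          # hypothetical lower bounds for the five grades
    ("excellent", 0.9),
    ("good", 0.75),
    ("medium", 0.6),
    ("pass", 0.5),
    ("fail", 0.0),
]

def grade(score: float) -> str:
    """Map an evaluation value to one of the five grades."""
    for name, lower in GRADE_BOUNDARIES:
        if score >= lower:
            return name
    return "fail"

def keep_for_cache(score: float, min_grade: str = "good") -> bool:
    """Keep the image only if its grade is at or above `min_grade`."""
    order = [name for name, _ in GRADE_BOUNDARIES]
    return order.index(grade(score)) <= order.index(min_grade)
```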
When it is determined that the second image includes at least two target objects, that is, the second image includes multiple target objects, the method may use the multi-target-object evaluation scheme to determine the evaluation value of the second image. The multi-target-object evaluation scheme determines the evaluation value of the second image after evaluating each target object included in the second image separately. In other words, the multi-target-object evaluation scheme is related to the sub-evaluation values of the multiple target objects in the second image.
Specifically, the evaluation value of the second image can be determined from the sub-evaluation value obtained for each target object and the corresponding sub-evaluation weight. In practice, a sub-evaluation value can be calculated for each of the multiple target objects; that is, the evaluation indicators described above are used to evaluate, in different dimensions, the target object region corresponding to each target object and obtain its sub-evaluation value. For example, when it is determined that the second image includes a first target object, a second target object and a third target object, a first target object region corresponding to the first target object, a second target object region corresponding to the second target object and a third target object region corresponding to the third target object can be determined in the second image.
Using the first, second and third target object regions, the sub-evaluation values of the first, second and third target objects are calculated respectively. In practice, for each target object, the indicator values corresponding to the individual evaluation indicators can be calculated and then used to compute the sub-evaluation value of that target object. It should be noted that the evaluation scheme may be the same or different for each target object. As an example, different evaluation schemes may be chosen according to the category of the target object; for example, the evaluation scheme for pedestrians differs from that for motor vehicles.
After the sub-evaluation values of the first, second and third target objects have been calculated, the evaluation value of the second image can be calculated using the following formula (1):
S = ∑ᵢ aᵢbᵢ    (1)
where S denotes the evaluation value of the second image, aᵢ denotes the sub-evaluation value of the i-th target object in the second image, and bᵢ denotes the sub-evaluation weight of the i-th target object. In practice, the sub-evaluation weight may be preset by the user, or it may be determined from characteristics of the target object; for example, different sub-evaluation weights may be assigned according to the category of the target object, or according to the size of each target object region, and so on.
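Formula (1) translates directly into code. In the sketch below, the per-category table used to pick the sub-evaluation weights bᵢ is an assumption shown for illustration only; the embodiment equally allows user-preset weights or weights based on region size.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical per-category sub-evaluation weights (b_i); the values are illustrative.
CATEGORY_WEIGHTS: Dict[str, float] = {"pedestrian": 1.0, "motor": 0.8, "non_motor": 0.9}

def image_evaluation(targets: Sequence[Tuple[str, float]]) -> float:
    """Formula (1): S = sum_i a_i * b_i.

    targets: (category, sub_evaluation_value) for every target object in the image.
    """
    return sum(CATEGORY_WEIGHTS.get(category, 1.0) * sub_value
               for category, sub_value in targets)
```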
It is then determined whether the evaluation value is above the predetermined threshold. If the evaluation value is above the threshold, the image quality of the second image is high; if it is below the threshold, the image quality of the second image is not high. When the evaluation value is above the predetermined threshold, the second image is sent to the cache unit; when it is below the threshold, the second image may be discarded.
Similarly, the evaluation values may be divided into different grades, each corresponding to a range of evaluation values; if the evaluation value of the second image falls within a range, the second image corresponds to that grade. The values may be divided into five grades: excellent, good, medium, pass and fail. Images whose evaluation value is at or above a particular grade, for example the "good" grade, can then be sent to the cache unit, and images below that grade are discarded. As an example, the method may send the images that meet the preset requirement to the monitoring platform server together with the encoded video stream, that is, the images stored in the cache unit are sent to the monitoring platform server together with the video stream.
It can be understood that, in order to implement the above functions, the terminal and the like described above include corresponding hardware structures and/or software modules for performing each function. Those skilled in the art will readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed here, the embodiments of the present application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementation should not be considered to go beyond the scope of the embodiments of the present application.
In the embodiments of the present application, the terminal and the like may be divided into functional modules according to the above method examples; for example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiments of the present application is schematic and is only a logical functional division; other divisions are possible in actual implementation.
In the case where each functional module is divided according to each function, FIG. 5 is a block diagram of a target object detection device according to an embodiment of the present application. The target object detection device 500 includes an image acquisition unit 510, a target object detection unit 520, an encoding unit 530 and a sending unit 540. The image acquisition unit 510 is configured to acquire at least one image; the target object detection unit 520 is configured to perform target object detection on the at least one image, determine target images containing a target object, and discard images without a target object; the encoding unit 530 is configured to encode the target images to generate encoded images; and the sending unit 540 is configured to send the encoded images to the monitoring platform server.
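Purely as an illustration of this unit decomposition, device 500 could be mirrored in code roughly as follows; the constructor arguments stand in for the concrete detector, encoder and transport, which the embodiment does not fix.

```python
class TargetObjectDetectionDevice:
    """Schematic counterpart of device 500 and its units 510 to 540."""

    def __init__(self, acquire, detect, encode, send):
        self.acquire = acquire   # image acquisition unit 510
        self.detect = detect     # target object detection unit 520
        self.encode = encode     # encoding unit 530
        self.send = send         # sending unit 540

    def run_once(self):
        for image in self.acquire():
            if self.detect(image):             # keep only target images
                self.send(self.encode(image))  # encode and upload
            # images without a target object are simply discarded
```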
Optionally, the target object detection device 500 further includes an image quality evaluation unit, configured to evaluate the image quality of the target image and determine an evaluation value of the target image.
Optionally, the image quality evaluation unit is specifically configured to, when the target image includes multiple target objects, determine the evaluation value using the sub-evaluation values of the multiple target objects.
Optionally, the image quality evaluation unit is further configured to determine whether the evaluation value of the target image satisfies a preset threshold and, when the preset threshold is satisfied, send the target image to a cache unit.
Optionally, the target object detection device 500 further includes a cache unit, configured to store target images that satisfy the preset threshold.
Optionally, the sending unit 540 is further configured to send the images in the cache unit to the monitoring platform server.
Optionally, the evaluation value is related to one or more of the following: the sharpness of the target object region in which the target object is located; the number of pixels in the target object region; the shooting angle at which the target object region is captured; and the number of key features possessed by the target object.
Optionally, the target object detection unit 520 is specifically configured to select a representative image from the at least one image, perform target object detection on the representative image, and, if the representative image contains a target object, determine that all of the at least one image are target images containing the target object.
An embodiment of the present application provides a camera, including: a lens, configured to collect light; a sensor, configured to generate an image by photoelectric conversion of the light collected by the lens; and a processor or processor cluster, configured to perform the method described above.
An embodiment of the present application provides a non-volatile computer-readable storage medium on which computer program instructions are stored, the computer program instructions implementing the above method when executed by a processor. The computer program instructions may be executed by a camera or by a general-purpose computer.
An embodiment of the present application provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device performs the above method.
The computer-readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove with instructions recorded on it, and any suitable combination of the foregoing.
The computer-readable program instructions or code described here can be downloaded from a computer-readable storage medium to the respective computing/processing device, or to an external computer or external storage device via a network, for example the Internet, a local area network, a wide area network and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions used to carry out the operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuits, such as programmable logic circuits, field-programmable gate arrays (FPGA) or programmable logic arrays (PLA), are personalized using the state information of the computer-readable program instructions, and these electronic circuits can execute the computer-readable program instructions to implement various aspects of the present application.
Aspects of the present application are described here with reference to flowcharts and/or block diagrams of methods, apparatuses (systems) and computer program products according to embodiments of the present application. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus and/or other devices to operate in a particular manner, so that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus or another device, so that a series of operational steps is performed on the computer, other programmable apparatus or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable apparatus or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality and operation of possible implementations of apparatuses, systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment or a portion of instructions that contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two consecutive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented with hardware that performs the corresponding functions or acts (for example, circuits or an application-specific integrated circuit (ASIC)), or with a combination of hardware and software, such as firmware.
Although the present invention has been described here in connection with various embodiments, those skilled in the art, in practicing the claimed invention, can understand and effect other variations of the disclosed embodiments by studying the drawings, the disclosure and the appended claims. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit may fulfil the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
The embodiments of the present application have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope of the described embodiments. The terminology used here was chosen to best explain the principles of the embodiments, their practical application or their improvement over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed here.

Claims (18)

  1. A target object detection method, characterized in that it comprises:
    acquiring a first image;
    detecting whether the first image includes a target object;
    when the first image does not include the target object, not encoding the first image;
    acquiring a second image;
    detecting whether the second image includes the target object;
    when the second image includes the target object, encoding the second image and sending the encoded second image to a monitoring platform server.
  2. The method according to claim 1, characterized in that acquiring the first image comprises:
    acquiring at least one image within a preset interval, wherein the preset interval comprises a preset time interval and a preset count interval;
    selecting the first image from the at least one image according to an image selection scheme; the method further comprising:
    determining not to encode the at least one image.
  3. The method according to claim 1, characterized by further comprising:
    determining an evaluation value of the second image by using an evaluation scheme, the evaluation value being used to describe the image quality of the second image.
  4. The method according to claim 3, characterized in that the evaluation scheme comprises:
    when the second image includes multiple target objects, the evaluation value of the second image is related to the sub-evaluation values of the multiple target objects.
  5. The method according to claim 3 or 4, characterized in that the evaluation value is related to one or more of the following:
    the sharpness of the target object region in which the target object is located;
    the number of pixels in the target object region;
    the shooting angle at which the target object region is captured; and
    the number of key features possessed by the target object.
  6. The method according to any one of claims 3 to 5, characterized by further comprising:
    determining whether the evaluation value satisfies a preset threshold;
    if the preset threshold is satisfied, sending the second image to the monitoring platform server.
  7. The method according to any one of claims 1 to 6, characterized by further comprising:
    discarding the first image.
  8. A camera, characterized in that the camera comprises:
    a lens, configured to receive light used to generate an image;
    a camera body, configured to perform the method according to any one of claims 1 to 7.
  9. A target object detection device, characterized by comprising:
    an image acquisition unit, configured to acquire at least one image;
    a target object detection unit, configured to determine a target image containing a target object by performing target object detection on the at least one image, and to discard images that do not contain a target object;
    an encoding unit, configured to encode the target image to generate an encoded image;
    a sending unit, configured to send the encoded image to a monitoring platform server.
  10. The device according to claim 9, characterized by further comprising:
    an image quality evaluation unit, configured to evaluate the image quality of the target image and determine an evaluation value of the target image.
  11. The device according to claim 10, characterized in that the image quality evaluation unit is specifically configured to, when the target image includes multiple target objects, determine the evaluation value using the sub-evaluation values of the multiple target objects.
  12. The device according to claim 10 or 11, characterized in that the image quality evaluation unit is further configured to determine whether the evaluation value satisfies a preset threshold and, when the preset threshold is satisfied, send the target image to a cache unit.
  13. The device according to claim 12, characterized by further comprising:
    a cache unit, configured to store target images that satisfy the preset threshold.
  14. The device according to claim 13, characterized in that the sending unit is further configured to send the images in the cache unit to the monitoring platform server.
  15. The device according to any one of claims 11 to 14, wherein the evaluation value is related to one or more of the following:
    the clarity of the target object region where the target object is located;
    the number of pixels in the target object region;
    the shooting angle at which the target object region is photographed; and
    the number of key features that the target object has.
  16. The device according to any one of claims 9 to 15, wherein the target object detection unit is specifically configured to: select a representative image from the at least one image; perform target object detection on the representative image; and, if the representative image contains a target object, determine that each of the at least one image is a target image having the target object.
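A sketch of the representative-image shortcut in claim 16; choosing the middle image of the group as the representative is an assumption, since the selection rule is not fixed by the claim.

```python
def label_group(images: list, detector) -> list:
    """Run detection once on a representative image and apply the result to the group."""
    if not images:
        return []
    representative = images[len(images) // 2]  # assumed selection strategy
    if detector.has_target(representative):
        return list(images)  # every image in the group is treated as a target image
    return []                # no target in the representative: the group yields no target images
```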
  17. A camera, comprising:
    a lens, configured to collect light;
    a sensor, configured to generate an image by performing photoelectric conversion on the light collected by the lens; and
    a processor and a processor cluster, configured to perform the method according to any one of claims 1 to 7.
  18. A non-volatile computer-readable storage medium storing computer program instructions, wherein, when the computer program instructions are executed by a processor, the method according to any one of claims 1 to 7 is implemented.
PCT/CN2022/073151 2021-01-25 2022-01-21 Target object detection method and device thereof WO2022156763A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110099193.0 2021-01-25
CN202110099193.0A CN114898239A (en) 2021-01-25 2021-01-25 Target object detection method and device thereof

Publications (1)

Publication Number Publication Date
WO2022156763A1 (en)

Family

ID=82548515

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/073151 WO2022156763A1 (en) 2021-01-25 2022-01-21 Target object detection method and device thereof

Country Status (2)

Country Link
CN (1) CN114898239A (en)
WO (1) WO2022156763A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060078217A1 (en) * 2004-05-20 2006-04-13 Seiko Epson Corporation Out-of-focus detection method and imaging device control method
CN107291810A (en) * 2017-05-18 2017-10-24 深圳云天励飞技术有限公司 Data processing method, device and storage medium
CN110868600A (en) * 2019-11-11 2020-03-06 腾讯云计算(北京)有限责任公司 Target tracking video plug-flow method, display method, device and storage medium
CN111340140A (en) * 2020-03-30 2020-06-26 北京金山云网络技术有限公司 Image data set acquisition method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114898239A (en) 2022-08-12

Similar Documents

Publication Publication Date Title
US10986338B2 (en) Thermal-image based video compression systems and methods
JP5882455B2 (en) High resolution multispectral image capture
JP6732902B2 (en) Imaging device and imaging system
TWI713794B (en) Method for identifying events in a motion video
EP3579145A1 (en) Method and device for image processing, computer readable storage medium, and electronic device
WO2019196539A1 (en) Image fusion method and apparatus
US8798369B2 (en) Apparatus and method for estimating the number of objects included in an image
WO2020094088A1 (en) Image capturing method, monitoring camera, and monitoring system
TWI522967B (en) Method and apparatus for moving object detection based on cerebellar model articulation controller network
JP2016129347A (en) Method for automatically determining probability of image capture with terminal using contextual data
CN107704798B (en) Image blurring method and device, computer readable storage medium and computer device
CN103905727A (en) Object area tracking apparatus, control method, and program of the same
WO2022237591A1 (en) Moving object identification method and apparatus, electronic device, and readable storage medium
JP7024736B2 (en) Image processing equipment, image processing method, and program
CN107613216A (en) Focusing method, device, computer-readable recording medium and electronic equipment
US20230127009A1 (en) Joint objects image signal processing in temporal domain
TWI521473B (en) Device, method for image analysis and computer-readable medium
US20120033854A1 (en) Image processing apparatus
CN110929615B (en) Image processing method, image processing apparatus, storage medium, and terminal device
US9189863B2 (en) Method and system for detecting motion capable of removing shadow by heat
WO2022156763A1 (en) Target object detection method and device thereof
CN107959840A (en) Image processing method, device, computer-readable recording medium and computer equipment
CN111800605A (en) Gun-ball linkage based vehicle shape and license plate transmission method, system and equipment
Law et al. Performance enhancement of PRNU-based source identification for smart video surveillance
US12035033B2 (en) DNN assisted object detection and image optimization

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22742242; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 22742242; Country of ref document: EP; Kind code of ref document: A1)