CN111753609B

CN111753609B - Target identification method and device and camera

Info

Publication number: CN111753609B
Application number: CN201910713004.7A
Authority: CN
Inventors: 张睿轩
Original assignee: Hangzhou Hikvision Digital Technology Co Ltd
Current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority date: 2019-08-02
Filing date: 2019-08-02
Publication date: 2023-12-26
Anticipated expiration: 2039-08-02
Also published as: WO2021023106A1; CN111753609A

Abstract

The application provides a target identification method, a target identification device and a camera, wherein the method comprises the following steps: acquiring a video frame and a depth image of a monitoring area; detecting an object in motion from the video frame; when the target triggering preset event rule is tracked, acquiring a target video frame and a target depth image at the moment when the target triggers the event rule; identifying a target type of a target in the target video frame; determining a physical size of the target based on the target depth image; and determining whether the target is a specified target or not according to the physical size and the target type. According to the embodiment of the application, the targets with unreasonable sizes can be filtered by combining the depth map, the appointed targets can be accurately determined, the false alarm probability is reduced, and the accuracy of perimeter precaution is improved.

Description

Target identification method and device and camera

Technical Field

The present disclosure relates to the field of data processing, and in particular, to a method and apparatus for target recognition, and a camera.

Background

With the continuous progress of society, the application range of video monitoring systems is becoming wider and wider. The existing digital monitoring system can not meet the requirements of many application occasions, mainly reflected in the fact that the intelligent degree is not high enough, and is in a semi-automatic state, and many occasions also need manual intervention, for example, when abnormal conditions (moving targets and the like) are found, the targets can not be automatically identified, and because of hysteresis of manual operation, a lot of important information is lost. In order to solve the problems, the perimeter guard is applied, a moving target can be automatically detected through the perimeter guard, alarm information of the target is generated according to rules configured by a user, and relevant personnel are immediately notified to process the alarm information, so that the method is an active monitoring mode, and the practical application value of monitoring is greatly exerted.

The information of the man-made objects in the video is a concerned part in video monitoring, so that the detection and alarm of the human body are also a core function of perimeter precaution. The problems to be solved under the current technical framework are as follows:

1. the target at a far distance is inaccurate in detection, and whether the target is a real person or a vehicle or only one false alarm is difficult to distinguish.

2. The interference of leaves, lights, animals, rainwater and the like is large, and false alarms are easy to generate.

Disclosure of Invention

In view of the foregoing, the present application provides a method, an apparatus, and a camera for object recognition.

Specifically, the application is realized by the following technical scheme:

in a first aspect, an embodiment of the present application provides a method for identifying an object, where the method includes:

acquiring a video frame and a depth image of a monitoring area;

detecting an object in motion from the video frame;

when the target triggering preset event rule is tracked, acquiring a target video frame and a target depth image at the moment when the target triggers the event rule;

identifying a target type of a target in the target video frame;

determining a physical size of the target based on the target depth image;

and determining whether the target is a specified target or not according to the physical size and the target type.

Optionally, the detected object includes a circumscribed rectangular box of the object, and the identifying the object type of the object in the object video frame includes:

intercepting a local image containing the target from the target video frame, wherein the local image is intercepted by taking a circumscribed rectangular frame of the target as a boundary;

and inputting the local image into a trained deep learning model, carrying out target recognition on the local image by the deep learning model, and outputting the target type of the target.

Optionally, the determining the physical size of the target based on the target depth image includes:

mapping each pixel point of the circumscribed rectangular frame of the target in the target depth image to determine a corresponding mapping point of each pixel point in the target depth image;

acquiring point cloud data of each mapping point;

and calculating the boundary length of the circumscribed rectangular frame according to the point cloud data of the mapping points corresponding to the vertex pixel points of the circumscribed rectangular frame, and taking the boundary length as the physical size of the target.

Optionally, the determining whether the target is a specified target according to the physical size and the target type includes:

when the target type and the physical size are matched with the specified target, judging that the target is the specified target;

and when the object type is not matched with any of the physical dimensions and the specified object, judging that the object is not the specified object.

Optionally, detecting whether the target triggers a preset event rule by:

and if the circumscribed rectangle frame of the target is detected to be intersected with a preset warning line or a warning area, judging that the target frame object triggers a preset event rule.

Optionally, the method further comprises:

and triggering alarm processing after determining that the target is a specified target.

In a second aspect, embodiments of the present application provide an object recognition apparatus, including:

the image acquisition module is used for acquiring video frames and depth images of the monitoring area;

the target detection module is used for detecting a target in a motion state from the video frame;

the target image determining module is used for acquiring a target video frame and a target depth image at the moment when the target triggers the event rule when the target triggers the preset event rule is tracked;

the target type identification module is used for identifying the target type of the target in the target video frame;

a physical size determining module for determining a physical size of the target based on the target depth image;

and the target judging module is used for combining the physical size and the target type to determine whether the target is a specified target.

Optionally, the detected object includes a circumscribed rectangular box of the object, and the object type identification module includes:

a local image intercepting sub-module, configured to intercept a local image including the target from the target video frame, where the local image is intercepted with a circumscribed rectangular frame of the target as a boundary;

and the target type determining submodule is used for inputting the local image into a trained deep learning model so as to carry out target identification on the local image by the deep learning model and output the target type of the target.

Optionally, the physical size determining module includes:

the pixel point mapping sub-module is used for mapping each pixel point of the circumscribed rectangular frame of the target in the target depth image so as to determine a corresponding mapping point of each pixel point in the target depth image;

the point cloud data acquisition sub-module is used for acquiring point cloud data of each mapping point;

and the physical dimension calculation sub-module is used for calculating the boundary length of the circumscribed rectangular frame according to the point cloud data of the mapping points corresponding to the vertex pixel points of the circumscribed rectangular frame and taking the boundary length as the physical dimension of the target.

In a third aspect, embodiments of the present application provide a video camera comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method described above when the program is executed.

The embodiment of the application has the following beneficial effects:

the embodiment can acquire the video frame and the depth image of the monitoring area, detect the target in the motion state according to the video frame, and acquire the target video frame and the target depth image at the moment when the target triggers the event rule when the target triggers the preset event rule is tracked. The method comprises the steps of determining the target type of a target through a target video frame, determining the physical size of the target through a target depth image, finally combining the physical size of the target and the target type to determine whether the target is a specified target, combining a depth image can filter out the target with unreasonable size, accurately determining the specified target, reducing the false alarm probability and improving the accuracy of perimeter precaution.

Drawings

FIG. 1 is a flow chart illustrating steps of an embodiment of a method for object recognition according to an exemplary embodiment of the present application;

FIG. 2 is a schematic diagram of area intrusion event detection as shown in an exemplary embodiment of the present application;

FIG. 3 is a schematic diagram illustrating cross-line intrusion event detection according to an exemplary embodiment of the present application;

FIG. 4 is a hardware configuration diagram of the apparatus in which the device of the present application is located;

fig. 5 is a block diagram illustrating an embodiment of an object recognition apparatus according to an exemplary embodiment of the present application.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.

The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first message may also be referred to as a second message, and similarly, a second message may also be referred to as a first message, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "at … …" or "at … …" or "responsive to a determination", depending on the context.

Referring to fig. 1, a flowchart illustrating steps of an embodiment of a method for object recognition according to an exemplary embodiment of the present application may include the following steps:

step 101, obtaining a video frame and a depth image of a monitoring area.

In this step, the monitoring area may be a monitoring range of the camera, and the video frame may be acquired by an image sensor of the camera.

The gray value of each pixel of the Depth map may be used to characterize how far or near a point in the scene is from the camera. When the method is implemented, a general depth image acquisition mode can be adopted to acquire the depth image of the monitoring area. For example, the depth image of the monitoring area may be acquired as follows, but the embodiment is not limited thereto:

a passive ranging sensing method. One such passive ranging sensing method is binocular stereo vision, two images of the same scene are acquired simultaneously by two sensors separated by a certain distance, corresponding pixels in the two images are found by a stereo matching algorithm, parallax information is calculated according to the principle of trigonometry, and the parallax information can be used for representing depth information of objects in the scene by conversion. Based on the stereo matching algorithm, a depth image of the scene can also be obtained by taking a group of images of different angles under the same scene. In addition, the depth image of the scene can be obtained by analyzing and indirectly estimating the photometric features, the bright and dark features and other features of the image.

An active ranging sensing method. The difference between active ranging sensing and passive ranging sensing is: the device itself needs to transmit energy to complete the acquisition of depth information. This also ensures that the depth image is acquired independently of the color image. Methods of active ranging sensing mainly include TOF (Time of Flight), structured light, lidar scanning, etc. Wherein,

the principle of acquiring depth images by a TOF camera is as follows: by emitting successive near infrared pulses to the target scene, the light pulses reflected back by the object are then received by the sensor. By comparing the phase difference between the emitted light pulses and the light pulses reflected by the object, the transmission delay between the light pulses can be calculated, so that the distance between the object and the emitter can be obtained, and finally a depth image can be obtained.

The depth image acquisition principle based on structured light is: structured light is projected onto the scene and a corresponding pattern with structured light is captured by the image sensor. Because the pattern of the structured light deforms according to the shape of the object, the depth information of each point in the scene can be obtained by calculating the position and deformation degree of the pattern image in the captured image by using the triangle principle.

The laser radar ranging technology obtains three-dimensional information of a scene in a laser scanning mode. The basic principle is that laser is emitted to the space according to a certain time interval, signals of all scanning points are recorded, the signals reach objects in a tested scene from the laser radar, and then the distance between the surface of the object and the laser radar is calculated through the interval time that the objects reflect back to the laser radar.

It should be noted that, when acquiring the video frame and the depth image of the monitoring area, if at least two sensors are involved, the at least two sensors may be integrated into one camera or may be disposed in different cameras, which is not limited in this embodiment.

In one possible implementation scenario, if the at least two sensors are integrated in the same camera, this embodiment may be performed by the camera when the camera is mounted with a processing chip. When the camera is not processing a chip, different sensors in the camera may transmit the acquired data to the designated platform, which performs the present embodiment.

In another possible implementation scenario, when multiple sensors are deployed at different cameras, respectively, different ones of the different cameras may transmit the acquired data to a designated platform, which performs the present embodiment.

The embodiments of the present application may be applied to any of the above scenarios, but the embodiments of the present application are not limited thereto, and all embodiments according to the ideas of the present application are within the scope of protection of the present application.

Step 102, detecting an object in a motion state from the video frame.

For example, the detected object may include a circumscribed rectangular box of the object.

In one embodiment, the object in motion may be identified from the video frame by a method of moving object detection. The moving object detection refers to a process of effectively extracting an object generating spatial position change by subtracting redundant information in time and space in a video through a computer vision method. For example, a background model may be established, and then the background difference method is used to classify the pixels of the moving object and the background, so as to detect the object in the moving state. For example, a median method may be used to build the background model, i.e. using the median of the pixel values of a sequence of consecutive N frames of images as the background model. Or background modeling can be performed in a single Gaussian or mixed Gaussian mode, and whether the pixel points belong to the foreground is judged by utilizing a threshold value; alternatively, the pixel point is estimated by using a standard kernel function through the latest image sample information, and the moving object is extracted. Or, randomly selecting a certain number of pixel values for background modeling according to a certain rule for each pixel point, and classifying the foreground and the background for the pixel points by adopting Euclidean distance.

In one embodiment, after moving objects are detected, each object may be tracked to determine whether each object is a stable object, i.e., whether the object is in motion. For example, an object in motion may be detected in every video frame and have a stable displacement for the object.

And step 103, when the target triggering preset event rule is tracked, acquiring a target video frame and a target depth image at the moment when the target triggers the event rule.

In this step, during the tracking of the target, logic analysis may also be performed on the target to determine whether the target triggers a preset event rule.

In one possible implementation manner of this embodiment, the following manner may be adopted to determine whether the target triggers a preset event rule:

By way of example, the preset event rules may include alarm events such as area intrusion, cross-line intrusion, etc.

For example, as shown in the area intrusion event detection schematic diagram of fig. 2, assuming that the warning area generated in advance is F1, the circumscribed rectangular frame of the target human body is F2, when the target human body starts to enter the limiting area F1, and when the F2 intersects with the F1, the circumscribed rectangular frame F2 is divided into A, B parts by the boundary of the F1, so that it can be determined that the position of the target human body is within the limiting area F1, and then it can be determined that the target triggers the event rule of the preset area intrusion event.

As another example, as shown in the cross line intrusion event detection schematic diagram of fig. 3, assuming that the pre-generated warning line is L, and the circumscribed rectangular frame of the target human body is F2, when the target human body crosses the warning line L, the warning line L intersects with F2, that is, the L divides the circumscribed rectangular frame F2 into two parts A, B, so that it can be determined that the position of the target human body triggers the warning line L, and then it can be determined that the target triggers the event rule of the preset cross line intrusion event.

In this embodiment, when it is determined that the target triggers the preset event rule, the target video frame and the target depth image at the moment of the target triggering event rule may be acquired.

In one possible implementation, the rule judging module may judge whether the target triggers a preset event rule, if the rule judging module judges that the target triggers the preset event rule, the rule judging module may send an alarm signal to the video frame identifying module and the depth image identifying module, the video frame identifying module may take a video frame obtained at the moment of receiving the alarm signal as a target video frame, and the depth image identifying module may take a depth image obtained at the moment of receiving the alarm signal as a target depth image.

Step 104, identifying a target type of a target in the target video frame.

In this step, after the target video frame is determined, target detection may be performed on the target video frame to determine a target type of the target.

In one example, the object type of the object may be determined by the video frame identification module based on the object video frame. As one example, the target type may include a human, motor vehicle, small animal, or the like type.

In one possible implementation of this embodiment, step 104 may include the following sub-steps:

and S11, intercepting a local image containing the target from the target video frame, wherein the local image is intercepted by taking a circumscribed rectangular frame of the target as a boundary.

In one example, after the video frame identification module determines the target video frame according to the alarm signal, a matting process may be performed, and the circumscribed rectangular frame containing the target is cut out from the target video frame, so as to obtain a local image.

And a substep S12, inputting the local image into a trained deep learning model to perform target recognition on the local image by the deep learning model, and outputting the target type of the target.

In this step, after the partial image is cut out according to the circumscribed rectangular frame of the target, the partial image may be input into a trained deep learning model, and target recognition may be performed on the partial image by the deep learning model, so as to output a target type of the target as a person, a motor vehicle, a small animal, or a false alarm.

Step 105, determining a physical size of the target based on the target depth image.

In this step, after the target depth image is determined, the physical size of the target may be determined from the target depth image.

In one example, the physical size of the target may be determined based on the target depth image by a depth image recognition module.

In one possible implementation of the present embodiment, step 105 may include the following sub-steps:

and S21, mapping each pixel point of the circumscribed rectangular frame of the target in the target depth image to determine a corresponding mapping point of each pixel point in the target depth image.

In implementation, the video frame and the depth image corresponding to the same moment are accurate, and the pixels of the video frame and the depth image are mapped one by one, that is, for one pixel on the depth image, one pixel on the color image (i.e., the video frame) can be found to correspond to the pixel.

For the present embodiment, the target video frame and the target depth image are also aligned with each other, and correspond to each other on the pixels one by one. After the circumscribed rectangle frame of the target is determined in the target video frame, according to the positions of the pixel points in the circumscribed rectangle frame of the target, the pixel point corresponding to each pixel point can be searched in the target depth image and used as a mapping point corresponding to the pixel point in the target depth image.

In one embodiment, each pixel in the circumscribed rectangular frame of the object may include all pixels within the circumscribed rectangular frame and pixels at the boundary of the circumscribed rectangular frame. In other embodiments, in order to reduce the calculated data amount, the mapping process may be performed by taking only the pixels that circumscribe the boundary of the rectangular frame.

In the substep S22, point cloud data of each mapping point is acquired.

In one possible implementation, the target depth image may first be converted into a point cloud image.

The point cloud image is a massive set of points which are characterized by the surface of a target, each point in the point cloud image represents the position of the actual space geometry, each point comprises three-dimensional coordinates, and some points possibly also comprise color information (RGB) or reflection Intensity information (Intensity).

In this embodiment, a general method may be employed to convert the depth image into a point cloud image. In one implementation, the internal parameters and the external parameters of the camera can be combined, and the corresponding three-dimensional point cloud image is calculated according to the depth image, so that three-dimensional coordinate information of any position is obtained.

By way of example, one of the conversion processes may be:

the depth image is a depth value matrix of each pixel, and the depth value matrix is two-dimensionally arranged, and the two dimensions represent the number of rows and columns of the pixel in the depth image. If the camera cannot collect the depth value of a certain pixel, the depth value of the pixel is set to a specific value, such as 0. The point cloud coordinates of each pixel point in the depth image can be calculated according to the internal parameters of the camera, and the point cloud coordinates are three-dimensional coordinates and can be expressed as (x, y, z).

Specifically, a UV rectangular coordinate system may be defined in the depth image, where the pixel coordinates (u, v) of each pixel point in the depth image respectively represent the number of rows and columns of the pixel point in the depth image. The origin O of the UV rectangular coordinate system represents the intersection point of the camera optical axis and the plane in which the depth image is located, and the coordinates of the origin O in the UV rectangular coordinate system are (u) ₀ ,v ₀ ) The physical dimensions of each pixel point in the direction u and the direction v are dx and dy, and the focal length of the camera lens is f. Generally u will ₀ 、v ₀ F/dx and f/dy are referred to as camera references.

After determining the pixel coordinates (u, v) of each pixel point and the depth value z of the pixel point in the depth image, the following formula may be used to calculate x and y corresponding to each pixel point, where the formula is: x=z (u-u) ₀ )*dx/f；y＝z*(v-v ₀ )*dy/f。

Further, point cloud coordinates (x, y, z) of each pixel point in the depth image may be determined.

In other embodiments, in order to save the calculation amount, the point cloud data corresponding to each mapping point of the circumscribed rectangular frame may be directly calculated according to the above-mentioned manner of calculating the point cloud coordinates, without calculating the point cloud map of the entire target depth image.

And S23, calculating the boundary length of the circumscribed rectangular frame according to the point cloud data of the mapping points corresponding to the vertex pixel points of the circumscribed rectangular frame, and taking the boundary length as the physical size of the target.

In this step, after the point cloud data of the mapping point corresponding to each pixel point of the circumscribed rectangular frame of the target is obtained, the boundary length of the circumscribed rectangular frame may be calculated according to the point cloud data corresponding to the vertex pixel point located at the vertex of the circumscribed rectangular frame, for example, the width and the height of the circumscribed rectangular frame may be calculated as the physical size of the target.

And 106, determining whether the target is a specified target or not according to the physical size and the target type.

In one possible implementation of this embodiment, step 106 may include the following sub-steps:

when the target type and the physical size are matched with the specified target, judging that the target is the specified target; and when the object type is not matched with any of the physical dimensions and the specified object, judging that the object is not the specified object.

For example, assuming that the target type of the judgment target is a human body, but its physical size is clearly different from the size of the human body, and assuming that the size of the motor vehicle is the size, it may be determined that the current target is not the human body. On the contrary, if the target type of the target is determined to be a human body and its physical size is matched with the size of the human body, it can be determined that the current target is a human body.

In one possible implementation manner of the embodiment, the method may further include the following steps:

In this step, when the target is determined to be the specified target, alarm processing may be performed, for example, an alarm signal such as an alarm sound is sent, or alarm information is sent to related personnel, etc., so as to implement a function of perimeter precaution.

Corresponding to the embodiments of the method described above, the present application also provides an embodiment of the object recognition device.

The device embodiment of the application can be applied to electronic equipment such as radar or cameras. The apparatus embodiments may be implemented by software, or may be implemented by hardware or a combination of hardware and software. Taking a software implementation as an example, the device in a logic sense is formed by reading corresponding computer program instructions in a nonvolatile memory into a memory by a processor of a device where the device is located for operation. In terms of hardware, as shown in fig. 4, a hardware structure diagram of a device where an apparatus of the present application is located is shown in fig. 4, and in addition to a processor, a memory, a network interface, and a nonvolatile memory shown in fig. 4, the device where the apparatus is located in an embodiment generally may include other hardware according to an actual function of the apparatus, which is not described herein again.

Referring to fig. 5, a block diagram illustrating an embodiment of an object recognition device according to an exemplary embodiment of the present application may specifically include the following modules:

the image acquisition module 501 is configured to acquire a video frame and a depth image of a monitored area;

an object detection module 502, configured to detect an object in a motion state from the video frame;

a target image determining module 503, configured to acquire a target video frame and a target depth image at a moment when the target triggers a preset event rule when the target triggers the event rule;

a target type identifying module 504, configured to identify a target type of a target in the target video frame;

a physical size determining module 505, configured to determine a physical size of the target based on the target depth image;

and a target determining module 506, configured to determine whether the target is a specified target in combination with the physical size and the target type.

In one possible implementation manner of this embodiment, the detected object includes a circumscribed rectangular frame of the object, and the object type identifying module 504 may include the following sub-modules:

In one possible implementation manner of this embodiment, the physical dimension determining module includes:

In one possible implementation manner of this embodiment, the target determining module 506 is specifically configured to:

In one possible implementation manner of this embodiment, the following manner is adopted to detect whether the target triggers a preset event rule:

In a possible implementation manner of this embodiment, the apparatus further includes:

and the alarm module is used for triggering alarm processing after determining that the target is the designated target.

For the device embodiments, reference is made to the description of the method embodiments for the relevant points, since they essentially correspond to the method embodiments.

The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present application. Those of ordinary skill in the art will understand and implement the present invention without undue burden.

The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method embodiments described above.

The embodiment of the application also provides a camera, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of the embodiment of the method when executing the program.

Embodiments of the subject matter and the functional operations described in this specification can be implemented in: digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware including the structures disclosed in this specification and structural equivalents thereof, or a combination of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on a manually-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode and transmit information to suitable receiver apparatus for execution by data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.

The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).

Computers suitable for executing computer programs include, for example, general purpose and/or special purpose microprocessors, or any other type of central processing unit. Typically, the central processing unit will receive instructions and data from a read only memory and/or a random access memory. The essential elements of a computer include a central processing unit for carrying out or executing instructions and one or more memory devices for storing instructions and data. Typically, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks, etc. However, a computer does not have to have such a device. Furthermore, the computer may be embedded in another device, such as a vehicle-mounted terminal, a mobile phone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.

Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disk or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features of specific embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. On the other hand, the various features described in the individual embodiments may also be implemented separately in the various embodiments or in any suitable subcombination. Furthermore, although features may be acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. Furthermore, the processes depicted in the accompanying drawings are not necessarily required to be in the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.

The foregoing description of the preferred embodiments of the present invention is not intended to limit the invention to the precise form disclosed, and any modifications, equivalents, improvements and alternatives falling within the spirit and principles of the present invention are intended to be included within the scope of the present invention.

Claims

1. A method of object recognition, the method comprising:

acquiring a video frame and a depth image of a monitoring area;

detecting an object in motion from the video frame;

identifying a target type of a target in the target video frame;

determining a physical size of the target based on the target depth image;

determining whether the target is a specified target by combining the physical size and the target type;

wherein the detected object includes a circumscribed rectangular box of the object, and the identifying the object type of the object in the object video frame includes:

2. The method of claim 1, wherein the determining the physical size of the target based on the target depth image comprises:

acquiring point cloud data of each mapping point;

3. The method according to claim 1 or 2, wherein said determining whether the target is a specified target based on the physical size and the target type comprises:

4. The method of claim 1, wherein detecting whether the target triggers a preset event rule is performed by:

5. The method according to claim 1 or 2 or 4, wherein the method further comprises:

6. An object recognition apparatus, characterized in that the apparatus comprises:

the target judging module is used for combining the physical size and the target type to determine whether the target is a specified target or not;

wherein the detected object includes a circumscribed rectangular frame of the object, and the object type identification module includes:

7. The apparatus of claim 6, wherein the physical size determining module comprises:

8. A video camera comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1-5 when the program is executed by the processor.