CN109727275B - Object detection method, device, system and computer readable storage medium - Google Patents


Info

Publication number
CN109727275B
Authority
CN
China
Prior art keywords
image
detection
images
frame
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811630396.2A
Other languages
Chinese (zh)
Other versions
CN109727275A (en)
Inventor
李姣
刘朋樟
刘通
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN201811630396.2A
Publication of CN109727275A
Application granted
Publication of CN109727275B
Legal status: Active

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a target detection method, device, system, and computer-readable storage medium, relating to the technical field of image processing. The target detection method comprises the following steps: inputting a sequence to be detected comprising multiple frames of images into a target detection model to obtain multiple images with detection frames output by the target detection model; determining the pixels in each detection frame that have motion optical flow; retaining part or all of the detection frames according to the number of pixels with motion optical flow in each detection frame; and determining the object in each retained detection frame as the target object in the corresponding image. Embodiments of the invention can improve the accuracy of target detection.

Description

Object detection method, device, system and computer readable storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method, an apparatus, a system, and a computer-readable storage medium for target detection.
Background
Target detection refers to accurately finding the position of an object in a given image and marking the category of the object. Target detection is one of the research hotspots in the current fields of computer vision and machine learning. Today, research on target detection falls into two main categories. The first is traditional target detection, which mainly comprises target feature extraction, target recognition, and target positioning. The second is target detection based on deep learning, which completes the extraction of deep features, target identification, and target positioning through a deep network model.
Disclosure of Invention
The inventor recognizes that current deep-learning-based detection models have low generalization capability across different application scenes, so that in complex scenes a detection model can produce a large number of false detection results, and the detection accuracy is low.
The embodiment of the invention aims to solve the technical problem that: how to improve the accuracy of the target detection method.
According to a first aspect of some embodiments of the present invention, there is provided a target detection method, comprising: inputting a sequence to be detected comprising a plurality of frames of images into a target detection model to obtain a plurality of images with detection frames output by the target detection model; determining pixels in the detection frame having motion optical flow; retaining part or all of the detection frames according to the number of pixels with motion optical flow in the detection frame; and determining the object in the retained detection frame as the target object in the corresponding image.
In some embodiments, determining pixels in the detection frame that have motion optical flow comprises: determining the displacement of a pixel according to the position of the pixel in the detection frame of one frame of image and the position of the same pixel in the previous frame of image; and in the event that the displacement of the pixel is greater than a displacement threshold, determining that the pixel has motion optical flow.
In some embodiments, the target detection method further comprises: detecting the feature points in the detection frame of the image, so as to calculate the displacement of each feature point in the detection frame relative to the same feature point in the previous frame of image; and in the case that the displacement of a feature point is greater than a displacement threshold, determining that the pixel corresponding to the feature point has motion optical flow.
In some embodiments, retaining part or all of the detection frames according to the number of pixels with motion optical flow in the detection frame comprises: deleting the detection frame of the image in the case that the number of pixels with motion optical flow in the detection frame of the image is less than a preset threshold and the number of pixels with motion optical flow in the corresponding range of the detection frame in the preceding several frames of images is also less than the preset threshold.
In some embodiments, the target detection model is a neural network model; the target detection method further includes: training a neural network model by adopting a training image to obtain a target detection model, wherein the training image comprises a positive sample image and a negative sample image, each positive sample image has the position information of a marked target object, each negative sample image does not have the target object, and the negative sample image comprises an image in a detection frame which is mistakenly identified by the target detection model.
In some embodiments, the target detection method further comprises: generating a virtual image by adopting a generative adversarial network based on the collected real image; and taking the virtual image as a training image.
In some embodiments, the target detection method further comprises: in response to a cabinet door of a vending apparatus being opened, collecting a video or continuously collecting a plurality of images as the sequence to be detected, so as to detect a target object in the images, wherein the target object is a picked-up article; and recognizing an image of the target object to determine the identity of the picked-up article.
According to a second aspect of some embodiments of the present invention, there is provided a target object detection apparatus comprising: the detection frame output module is configured to input a sequence to be detected comprising a plurality of frames of images into a target detection model, and obtain a plurality of images with detection frames output by the target detection model; a motion optical flow determination module configured to determine pixels in the detection frame having motion optical flows; a detection frame screening module configured to retain part or all of the detection frames according to the number of pixels having the moving optical flow in the detection frames; and the target object determining module is configured to determine the object in the reserved detection frame as the target object in the corresponding image.
According to a third aspect of some embodiments of the present invention, there is provided a target object detection apparatus comprising: a memory; and a processor coupled to the memory, the processor configured to execute, based on instructions stored in the memory, a target object detection method for performing operations comprising: inputting a sequence to be detected comprising a plurality of frames of images into a target detection model to obtain a plurality of images with detection frames output by the target detection model; determining pixels in the detection frame having motion optical flow; retaining part or all of the detection frames according to the number of pixels with motion optical flow in the detection frame; and determining the object in the retained detection frame as the target object in the corresponding image.
In some embodiments, determining pixels in the detection frame that have motion optical flow comprises: determining the displacement of a pixel according to the position of the pixel in the detection frame of one frame of image and the position of the same pixel in the previous frame of image; and in the event that the displacement of the pixel is greater than a displacement threshold, determining that the pixel has motion optical flow.
In some embodiments, the operations further comprise: detecting the feature points in the detection frame of the image, so as to calculate the displacement of each feature point in the detection frame relative to the same feature point in the previous frame of image; and in the case that the displacement of a feature point is greater than a displacement threshold, determining that the pixel corresponding to the feature point has motion optical flow.
In some embodiments, retaining part or all of the detection frames according to the number of pixels with motion optical flow in the detection frame comprises: deleting the detection frame of the image in the case that the number of pixels with motion optical flow in the detection frame of the image is less than a preset threshold and the number of pixels with motion optical flow in the corresponding range of the detection frame in the preceding several frames of images is also less than the preset threshold.
In some embodiments, the target detection model is a neural network model; the operations further comprise:
training a neural network model by adopting a training image to obtain a target detection model, wherein the training image comprises a positive sample image and a negative sample image, each positive sample image has the position information of a marked target object, each negative sample image does not have the target object, and the negative sample image comprises an image in a detection frame which is mistakenly identified by the target detection model.
In some embodiments, the operations further comprise: generating a virtual image by adopting a generative adversarial network based on the collected real image; and taking the virtual image as a training image.
According to a fourth aspect of some embodiments of the present invention, there is provided a target object detection system comprising: the target object detection device, configured to input a sequence to be detected comprising a plurality of frames of images into a target detection model to obtain a plurality of images with detection frames output by the target detection model; determine pixels in the detection frame having motion optical flow; retain part or all of the detection frames according to the number of pixels with motion optical flow in the detection frame; and determine the object in the retained detection frame as the target object in the corresponding image; and a camera device configured to collect the sequence to be detected comprising a plurality of frames of images.
In some embodiments, the target object detection system further comprises: a vending apparatus; the camera device is positioned in the vending apparatus and is further configured to collect a video or continuously collect a plurality of images as a sequence to be detected in response to the cabinet door of the vending apparatus being opened; the target object detection device is further configured to detect a target object in the image and recognize the image of the target object to determine an identity of the picked item, wherein the target object is the picked item.
According to a fifth aspect of some embodiments of the present invention, there is provided a computer readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements any one of the aforementioned target object detection methods.
Some embodiments of the above invention have the following advantages or benefits: the embodiment of the invention can adopt a target detection model to identify the target objects possibly existing in the single-frame image based on the static image characteristics, and then adopt the motion optical flow based on the inter-frame dynamic characteristics to screen the target objects possibly existing to obtain the detection result. Even under the condition of complex scene, the embodiment of the invention can also carry out detection in the secondary screening mode, thereby improving the accuracy of target detection.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow diagram of a target detection method according to some embodiments of the invention.
FIG. 2 is a flow diagram illustrating a method for pixel determination with motion optical flow according to some embodiments of the invention.
FIG. 3 is a flow diagram illustrating a method for pixel determination with motion optical flow according to further embodiments of the present invention.
FIG. 4 is a schematic flow chart of a method of screening test frames according to some embodiments of the invention.
FIG. 5 is a schematic flow chart of a method for screening test frames according to other embodiments of the present invention.
FIG. 6 is a flow diagram of a method of training a target detection model according to some embodiments of the invention.
FIG. 7 is a flow diagram of a training image generation method according to some embodiments of the invention.
FIG. 8 is a flow chart illustrating a vending method according to some embodiments of the present invention.
FIG. 9 is a schematic diagram of a target detection apparatus according to some embodiments of the invention.
FIG. 10 is a schematic diagram of a target object detection system according to some embodiments of the invention.
Fig. 11 is a schematic structural diagram of a target object detection apparatus according to other embodiments of the present invention.
Fig. 12 is a schematic structural diagram of a target object detection apparatus according to further embodiments of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
FIG. 1 is a schematic flow diagram of a target detection method according to some embodiments of the invention. As shown in fig. 1, the object detection method of this embodiment includes steps S102 to S108.
In step S102, a sequence to be detected including multiple frames of images is input into a target detection model, and multiple images with detection frames output by the target detection model are obtained. The object in the detection box is considered as a target object by the target detection model.
The sequence to be detected may be, for example, a video or an image sequence consisting of a plurality of images captured by a camera continuously over a period of time. The sequence to be detected can be collected through a shooting device with a fixed view angle, and the shooting device can be a monitoring camera which is fixed at a preset position in the same shooting process and has unchanged shooting angle and focal length. For example, the camera may be a camera placed at an unmanned sales counter for photographing the behavior of the user to take the article.
In some embodiments, the target detection model is a neural network model, for example, a model based on the MobileNet-SSD (Single Shot MultiBox Detector) network framework. There may be one or more detection boxes in the image output by the target detection model.
The inventors have recognized that the target detection model determines the target object based on image features within a single frame, which is a recognition approach based on static features. When the target object to be identified is an object in motion, the output of the target detection model can be further screened to further improve the accuracy of target detection.
In step S104, pixels in the detection frame having a moving optical flow are determined.
The motion optical flow reflects the motion information of an object between adjacent frames. In embodiments of the present invention, whether a pixel has motion optical flow may be determined, for example, using the change information generated by the same pixel across different frames.
In step S106, a part or all of the detection frames are retained in accordance with the number of pixels having a moving optical flow in the detection frame.
In some embodiments, part or all of the detection frames in a frame of image may be retained according to the number of pixels with motion optical flow in the detection frames of that frame and of a preset number of preceding frames. When the number of pixels with motion optical flow is zero or small, there is a low possibility that a moving object exists in the detection frame: the detection is likely a misrecognition, or a background object was recognized instead of a target object in motion.
In step S108, the object in the remaining detection frame is determined as the target object in the corresponding image.
By the method of the embodiment, the target detection model is adopted to identify the target objects which may exist in the single-frame image based on the static image features, and then the moving optical flow based on the inter-frame dynamic features is adopted to screen the target objects which may exist to obtain the detection result. Even under the condition of complex scene, the embodiment of the invention can also carry out detection in the secondary screening mode, thereby improving the accuracy of target detection.
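The two-stage flow of steps S102 to S108 can be sketched as follows. This is a minimal illustration, not the patent's concrete implementation: `run_detector` and `count_motion_flow_pixels` are assumed placeholder callables standing in for the detection model and the optical-flow counter, and the threshold value is arbitrary.

```python
def filter_detections(frames, run_detector, count_motion_flow_pixels,
                      min_flow_pixels=20):
    """Keep only detection boxes whose region contains enough moving pixels.

    frames: ordered sequence of images (the sequence to be detected);
    run_detector(frame) -> list of detection boxes (step S102);
    count_motion_flow_pixels(prev, cur, box) -> number of pixels in `box`
    with motion optical flow between `prev` and `cur` (step S104).
    """
    results = []
    prev = None
    for frame in frames:
        kept = []
        for box in run_detector(frame):
            if prev is None:
                # First frame has no predecessor, so no flow can be computed.
                kept.append(box)
                continue
            n = count_motion_flow_pixels(prev, frame, box)
            if n >= min_flow_pixels:          # step S106: screen by flow count
                kept.append(box)
        results.append(kept)                  # step S108: kept boxes = targets
        prev = frame
    return results
```

A box suppressed in one frame may still be kept in another frame where the object is moving, which matches the per-frame screening described above.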
An embodiment of the pixel determination method with motion optical flow of the present invention is described below with reference to fig. 2 and 3.
FIG. 2 is a flow diagram illustrating a method for pixel determination with motion optical flow according to some embodiments of the invention. As shown in fig. 2, the pixel determination method with a moving optical flow of this embodiment includes steps S202 to S204.
In step S202, the displacement of the pixel is determined according to the position of the pixel in the detection frame of one frame image and the position of the same pixel in the previous frame image.
In step S204, in the case where the displacement of the pixel is larger than the displacement threshold, it is determined that the pixel has a motion optical flow.
When the position change of the pixel at different time is larger than the displacement threshold, the pixel is indicated to have motion information, so that the motion optical flow corresponding to the pixel can reflect the motion characteristics of the object. With the method of this embodiment, pixels in the detection frame having a moving optical flow can be accurately detected.
In some embodiments, the same pixel in different frames may be identified by feature points. FIG. 3 is a flow diagram illustrating a method for pixel determination with motion optical flow according to further embodiments of the present invention. As shown in fig. 3, the pixel determination method with a moving optical flow of this embodiment includes steps S302 to S306.
In step S302, feature points in a detection frame of an image are detected. The Feature points may be, for example, corner features, SIFT (Scale-Invariant Feature Transform) features, and the like, and may be selected by those skilled in the art as needed.
In some embodiments, all feature points in the image may be detected from all pixels of each frame, and the feature points falling into the detection frames are then screened out; this allows the feature points to be detected more comprehensively. In other embodiments, feature points may be detected only from the pixels within the detection frames, which improves the detection speed of the feature points.
In step S304, the displacement of each feature point in the detection frame of the image with respect to the same feature point in the image of the previous frame of the image is calculated.
In some embodiments, the image may be mapped into a coordinate system, with each pixel location in the image corresponding to a coordinate point in the coordinate system. Thus, the distance of the coordinate points of the same feature point in two adjacent frames of images can be determined as the displacement of the feature point.
In step S306, in the case where the displacement of the feature point is larger than the displacement threshold, it is determined that the pixel corresponding to the feature point has a motion optical flow.
By the method of the embodiment, the same pixel in different image frames can be detected more accurately, and the accuracy of detection of the pixel with the motion optical flow is improved.
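The displacement test of steps S302 to S306 can be sketched as below. Feature detection and matching (e.g. corner or SIFT features, as mentioned above; in practice something like OpenCV's Lucas-Kanade tracker) is abstracted away: the inputs are assumed to be already-matched (x, y) positions of the same feature points in two adjacent frames, and the threshold value is illustrative.

```python
import math

def moving_feature_points(prev_pts, cur_pts, displacement_threshold=2.0):
    """Return indices of feature points whose displacement between the
    previous and current frame exceeds the threshold (i.e. whose
    corresponding pixels are deemed to have motion optical flow).

    prev_pts, cur_pts: equal-length lists of (x, y) tuples, where index i
    refers to the same feature point in both frames.
    """
    moving = []
    for i, ((x0, y0), (x1, y1)) in enumerate(zip(prev_pts, cur_pts)):
        # Euclidean distance between the two coordinate points (step S304).
        d = math.hypot(x1 - x0, y1 - y0)
        if d > displacement_threshold:        # step S306
            moving.append(i)
    return moving
```

The length of the returned list is the per-box count of pixels with motion optical flow used by the screening step.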
After determining the pixels with motion optical flow in the detection frame, part or all of the detection frames may be retained according to the number of such pixels. When deciding whether to delete a detection frame in one frame of image, the processing may be based not only on the number of pixels with motion optical flow in that frame's detection frame but also on the number of pixels with motion optical flow in the corresponding detection frames of several preceding frames. An embodiment of the detection frame screening method of the present invention is described below with reference to fig. 4 and 5.
FIG. 4 is a schematic flow chart of a method of screening test frames according to some embodiments of the invention. As shown in fig. 4, the detection box screening method of this embodiment includes steps S402 to S404.
In step S402, pixels in the detection frame having a moving optical flow are determined.
In step S404, in the case where the number of pixels with motion optical flow in the detection frame of the image is smaller than a preset threshold, and the number of pixels with motion optical flow in the corresponding range of the detection frame in the preceding several frames of images is also smaller than the preset threshold, the detection frame of the image is deleted.
If no moving object appears in the detection frame within a period of time, the content in the detection frame is likely to be a background scene, and thus the detection frame can be deleted.
FIG. 5 is a schematic flow chart of a method for screening test frames according to other embodiments of the present invention. As shown in fig. 5, the detection box screening method of this embodiment includes steps S502 to S510.
In step S502, a multi-frame image having a detection frame output by the target detection model is acquired.
In step S504, pixels having a moving optical flow in the detection frame in the multi-frame image are determined.
Steps S506 to S510 are an exemplary processing manner for one detection frame in one to-be-processed image of the multi-frame images. The other detection frames in the to-be-processed image, as well as the other frames, can be processed in the same or a similar manner.
In step S506, it is determined whether the number of pixels having a moving optical flow within the to-be-processed detection frame of the to-be-processed image is smaller than a preset threshold. If not, judging that a moving object exists in the to-be-processed detection frame of the to-be-processed image, and reserving the to-be-processed detection frame; if so, go to step S508.
In step S508, it is determined whether the number of pixels having motion optical flow in the range corresponding to the to-be-processed detection frame in the previous frame of the to-be-processed image is smaller than the preset threshold. If not, although a moving object may not currently exist in the to-be-processed detection frame, a moving object did exist at the same position in the previous frame; the current frame is therefore likely an image captured after the target object stopped moving, and the to-be-processed detection frame can be retained. If so, go to step S510.
In step S510, it is determined whether the number of pixels having a moving optical flow in a range corresponding to the detection frame to be processed in the first N (N is a positive integer) frame images of the image to be processed is smaller than a preset threshold. If not, the detection box to be processed can be reserved; if the value is less than the threshold value, the fact that no moving object exists at the position of the detection frame to be processed within a period of time is indicated, the detection frame to be processed is likely to be a background object, and therefore the detection frame to be processed can be deleted.
By the method of the embodiment, whether the target object exists in the detection frame can be comprehensively judged by combining the optical flow number of the multi-frame images, and the accuracy of target object detection is improved.
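The decision chain of steps S506 to S510 reduces to one rule: keep the box if the flow-pixel count reaches the threshold in the current frame or in any of the N preceding frames, otherwise delete it as background. A hedged sketch of that rule, with illustrative names:

```python
def keep_detection_box(flow_counts, threshold, n_history):
    """Decide whether to keep a detection box (steps S506-S510).

    flow_counts: flow-pixel counts for the box's region, ordered from the
    current frame backwards (flow_counts[0] is the current frame,
    flow_counts[1] the previous frame, and so on).
    Returns True if the box should be retained.
    """
    # Consider the current frame plus up to N preceding frames.
    window = flow_counts[:n_history + 1]
    # Any frame in the window with enough moving pixels means a moving
    # object was present recently, so the box is not background.
    return any(count >= threshold for count in window)
```

A box whose region showed motion two frames ago but is static now is retained, matching the "object stopped after moving" case in step S508.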
In order to further improve the accuracy of the target detection model, the training of the target detection model can be optimized. In some embodiments, the neural network model may be trained using training images to obtain a target detection model, where the training images include positive sample images and negative sample images, each positive sample image having location information of a labeled target object, and each negative sample image having no target object therein. The negative sample image comprises an image in a detection frame which is mistakenly identified by the target detection model. An embodiment of the object detection model training method of the present invention is described below with reference to fig. 6.
FIG. 6 is a flow diagram of a method of training a target detection model according to some embodiments of the invention. As shown in fig. 6, the target detection model training method of this embodiment includes steps S602 to S608.
In step S602, training images are acquired, the training images including positive and negative sample images.
In step S604, the training image is input to the neural network model, and an output prediction image is obtained. The partial prediction image has a detection frame.
In step S606, the model of the neural network is adjusted according to the prediction accuracy of the neural network model.
In step S608, in response to the predicted image corresponding to the negative sample image output by the neural network model having the detection frame, the image in the detection frame of the predicted image corresponding to the negative sample image is added to the training image as a new negative sample.
Therefore, images that were wrongly identified during training can be used for retraining, which improves the neural network model's ability to recognize hard examples (also called hard negatives) and thereby improves the recognition accuracy of the target detection model.
In some embodiments, negative examples may also be enriched during use to continually update the target detection model. For example, steps S610 to S614 may be further included.
In step S610, a sequence to be detected including multiple frames of images is input into a target detection model, and multiple images with detection frames output by the target detection model are obtained.
In step S612, some or all of the detection frames are retained based on the moving optical flow, and the object in the retained detection frame is determined as the target object in the corresponding image.
In step S614, the image in the deleted detection frame is added to the training image as a new negative sample.
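The hard-negative mining cycle of steps S602 to S614 can be sketched as an iterative loop in which false detections are folded back into the negative set. This is an assumed simplification: `train` and `predict_boxes` are stand-in callables, not the patent's concrete network, and any box predicted on a negative sample image is treated as a false detection.

```python
def mine_hard_negatives(positives, negatives, train, predict_boxes, rounds=2):
    """Iteratively retrain the model, adding misdetected crops as negatives.

    train(positives, negatives) -> trained model;
    predict_boxes(model, image) -> list of detection-box crops in `image`.
    """
    for _ in range(rounds):
        model = train(positives, negatives)
        new_negatives = []
        for image in negatives:
            # Negative samples contain no target object, so every predicted
            # box is a misrecognition (step S608) -> new negative sample.
            for crop in predict_boxes(model, image):
                new_negatives.append(crop)
        if not new_negatives:
            break                      # no false detections left to mine
        negatives = negatives + new_negatives
    return train(positives, negatives), negatives
```

The same feedback path applies at inference time (steps S610 to S614): crops from boxes deleted by the optical-flow screening can be appended to `negatives` before the next retraining.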
In some embodiments, the training images may include real images, and may also include virtual images. The virtual image may be generated from the real image. An embodiment of the training image generation method of the present invention is described below with reference to fig. 7.
FIG. 7 is a flow diagram of a training image generation method according to some embodiments of the invention. As shown in fig. 7, the training image generation method of this embodiment includes steps S702 to S704.
In step S702, a virtual image is generated using a generative adversarial network based on the collected real image.
In some embodiments, a plurality of generative adversarial networks corresponding to a plurality of scenarios, for example, the scenarios to which the target detection method in the foregoing embodiments may be applied, may be trained in advance. By inputting the collected real images into a generative adversarial network, the network can generate virtual images for the corresponding scene.
In step S704, the virtual image is used as a training image. The training image may include a real image in addition to the virtual image.
Therefore, a large number of virtual images can be generated for training based on a small number of real images, so that the trained target detection model can have good adaptability to various scenes, and the training efficiency and the target detection accuracy are improved.
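The augmentation of steps S702 to S704 can be sketched as below, under the assumption that each scene's pre-trained generator is available as a callable that maps a real image to a virtual one; the GAN internals and all names here are illustrative, not from the patent.

```python
def build_training_images(real_images, scene_generators, per_real=3):
    """Assemble a training set of real images plus GAN-generated virtual
    images (one pre-trained generator per scene, per step S702).

    scene_generators: callables, each mapping a real image to a virtual
    image in its scene.
    """
    training = list(real_images)          # real images are kept as-is
    for generate in scene_generators:
        for image in real_images:
            for _ in range(per_real):     # several virtual images per real one
                training.append(generate(image))
    return training
```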
Embodiments of the present invention may be applied, for example, to vending scenarios of a vending apparatus. When a user opens the cabinet door of an unmanned sales counter to take goods, a camera mounted inside the counter captures video or images of the user taking the goods. The item in the user's hand can then be identified by the target detection method of the invention. Because an item is in motion most of the time while it is being taken, after the target detection model identifies candidate items in the image, the surrounding environment and the static items still placed in the counter can be screened out based on motion optical flow, so that the items actually taken by the user are determined. An embodiment of the vending method of the unmanned sales counter according to the present invention is described below with reference to fig. 8.
FIG. 8 is a flow chart illustrating a vending method according to some embodiments of the present invention. As shown in fig. 8, the vending method of this embodiment includes steps S802 to S812.
In step S802, in response to the cabinet door of the sales counter being opened, a video is captured or a plurality of images are continuously captured as a sequence to be detected.
In some embodiments, the capturing of video or images may be stopped in response to a user closing the cabinet door.
In step S804, the sequence to be detected is input into the target detection model, and a plurality of images with detection frames output by the target detection model are obtained.
In step S806, pixels in the detection frame having a moving optical flow are determined.
In step S808, some or all of the detection frames are retained in accordance with the number of pixels having a moving optical flow in the detection frame.
In step S810, the object in the retained detection frame is determined as the target object in the corresponding image. The target object is the taken item.
In step S812, the image of the target object is recognized to determine the identity of the taken item. Information such as the SKU (Stock Keeping Unit), name, price, and specification of the goods taken by the user can thus be determined, so that the taken goods can be settled, realizing the automatic vending process.
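The settlement step can be illustrated with a minimal sketch; the catalog layout, function name, and prices below are assumptions for illustration, not part of the patent:

```python
def settle(recognized_skus, catalog):
    """Total the price of the items recognized in step S812.

    recognized_skus: SKU codes returned by the recognition step.
    catalog: mapping SKU -> (name, unit price); contents are illustrative.
    Returns an itemized receipt and the total amount due.
    """
    total = 0.0
    receipt = []
    for sku in recognized_skus:
        name, price = catalog[sku]
        receipt.append((sku, name, price))  # one receipt line per item taken
        total += price
    return receipt, total
```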
With the method of this embodiment, the items taken by a user can be accurately identified by exploiting the fact that an item is in motion most of the time while it is being taken from the vending apparatus, thereby improving the vending efficiency of the vending apparatus and the accuracy of commodity settlement.
An embodiment of the object detection device of the present invention is described below with reference to fig. 9.
FIG. 9 is a schematic diagram of a target detection apparatus according to some embodiments of the invention. As shown in fig. 9, the object detection device 90 of this embodiment includes: a detection frame output module 910 configured to input a sequence to be detected including multiple frames of images into a target detection model, and obtain multiple images with detection frames output by the target detection model; a motion optical flow determination module 920 configured to determine pixels in the detection frame having motion optical flow; a detection frame screening module 930 configured to retain part or all of the detection frames according to the number of pixels having motion optical flow in the detection frames; and a target object determination module 940 configured to determine the object in the retained detection frame as the target object in the corresponding image.
In some embodiments, the motion optical flow determination module 920 is further configured to determine the displacement of a pixel according to the position of the pixel in the detection frame of one frame of image and the position of the same pixel in the previous frame of image, and to determine that the pixel has motion optical flow in the case that the displacement is greater than a displacement threshold.
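A minimal sketch of the displacement test described above; the function name and default threshold are illustrative assumptions, and how the "same pixel" is matched across frames (e.g. via feature points) is left to the caller:

```python
import math

def has_motion_optical_flow(prev_pos, curr_pos, displacement_threshold=2.0):
    """Decide whether a pixel has motion optical flow.

    prev_pos / curr_pos: (x, y) of the same pixel in the previous and
    current frame. The pixel is considered to have motion optical flow
    when its Euclidean displacement exceeds the threshold.
    """
    dx = curr_pos[0] - prev_pos[0]
    dy = curr_pos[1] - prev_pos[1]
    return math.hypot(dx, dy) > displacement_threshold
```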
In some embodiments, the object detection device 90 further comprises: a feature point detection module 950 configured to detect feature points in the detection frame of an image, calculate the displacement of each feature point in the detection frame relative to the same feature point in the previous frame of image, and determine that the pixel corresponding to a feature point has motion optical flow in the case that the displacement of the feature point is greater than a displacement threshold.
In some embodiments, the detection frame screening module 930 is further configured to delete the detection frame of an image in the case that the number of pixels with motion optical flow in the detection frame is less than a preset threshold and the number of pixels with motion optical flow in the corresponding region of the preceding several frames of images is also less than the threshold.
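The temporal screening rule above can be sketched as follows; the list-of-counts interface and the threshold value are illustrative assumptions:

```python
def keep_detection_box(moving_pixel_counts, flow_count_threshold=50):
    """Temporal screening rule of the detection frame screening module.

    moving_pixel_counts: counts of motion-optical-flow pixels inside the
    box region for the current frame and the preceding several frames.
    The box is deleted only when every count is below the threshold;
    a single frame at or above the threshold keeps the box.
    """
    return any(count >= flow_count_threshold for count in moving_pixel_counts)
```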
In some embodiments, the target detection model is a neural network model, and the object detection device 90 further includes: a training module 960 configured to train the neural network model with training images to obtain the target detection model, wherein the training images include positive sample images and negative sample images, each positive sample image has labeled position information of the target object, no negative sample image contains the target object, and the negative sample images include images in detection boxes misrecognized by the target detection model.
In some embodiments, the object detection device 90 further comprises: a virtual image generation module 970 configured to generate virtual images using a generative adversarial network based on the acquired real images, and to use the virtual images as training images.
An embodiment of the target object detection system of the present invention is described below with reference to fig. 10.
FIG. 10 is a schematic diagram of a target object detection system according to some embodiments of the invention. As shown in fig. 10, the target object detection system 100 of this embodiment includes: a target object detection means 1010 and an image pickup apparatus 1020. The image pickup apparatus 1020 is configured to capture a sequence to be detected including a plurality of frame images. The specific implementation of the target object detection apparatus 1010 can refer to the target detection apparatus 90 in the embodiment of fig. 9, and is not described herein again.
Fig. 11 is a schematic structural diagram of a target object detection apparatus according to other embodiments of the present invention. As shown in fig. 11, the target object detection apparatus 110 of this embodiment includes: a memory 1110 and a processor 1120 coupled to the memory 1110, the processor 1120 being configured to perform a target object detection method in any of the embodiments described above based on instructions stored in the memory 1110.
Memory 1110 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, an application program, a Boot Loader (Boot Loader), and other programs.
Fig. 12 is a schematic structural diagram of a target object detection apparatus according to further embodiments of the invention. As shown in fig. 12, the target object detection apparatus 120 of this embodiment includes: the memory 1210 and the processor 1220, and may further include an input/output interface 1230, a network interface 1240, a storage interface 1250, and the like. These interfaces 1230, 1240, 1250, as well as the memory 1210 and the processor 1220, may be connected via a bus 1260, for example. The input/output interface 1230 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 1240 provides a connection interface for a variety of networking devices. The storage interface 1250 provides a connection interface for external storage devices such as an SD card and a USB flash drive.
An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored, wherein the program is configured to implement any one of the aforementioned target object detection methods when executed by a processor.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (13)

1. A method of target detection, comprising:
inputting a sequence to be detected comprising a plurality of frames of images into a target detection model to obtain a plurality of images with detection frames output by the target detection model;
determining pixels in a detection frame having motion optical flow, comprising: detecting feature points in a detection frame of an image, wherein the feature points comprise corner features and Scale Invariant Feature Transform (SIFT) features, identifying the same pixel in different frames through the feature points, calculating the displacement of each feature point in the detection frame of the image relative to the same feature point in the previous frame of image, and determining that the pixel corresponding to the feature point has motion optical flow in the case that the displacement of the feature point is greater than a displacement threshold;
according to the number of pixels with motion optical flow in the detection frame, retaining part or all of the detection frames, comprising: deleting the detection frame of an image in the case that the number of pixels with motion optical flow in the detection frame of the image is less than a preset threshold and the number of pixels with motion optical flow in the corresponding range of the detection frame in the preceding several frames of images is also less than the preset threshold;
and determining the object in the retained detection frame as the target object in motion in the corresponding image.
2. The object detection method according to claim 1, wherein detecting the feature points in the detection frame of the image comprises: detecting all feature points in the image according to all pixels in each frame of image, and selecting the feature points that fall within the detection frame.
3. The object detection method according to claim 1, wherein detecting the feature points in the detection frame of the image comprises: detecting the feature points only from the pixels within the detection frame.
4. The object detection method according to claim 1, wherein the object detection model is a neural network model;
the target detection method further includes:
training a neural network model with training images to obtain the target detection model, wherein the training images comprise positive sample images and negative sample images, each positive sample image has labeled position information of the target object, no negative sample image contains the target object, and the negative sample images comprise images in detection frames misrecognized by the target detection model.
5. The object detection method of claim 4, further comprising:
generating a virtual image using a generative adversarial network based on the acquired real image;
and taking the virtual image as a training image.
6. The object detection method of claim 1, further comprising:
in response to a cabinet door of the vending apparatus being opened, capturing a video or continuously capturing a plurality of images as a sequence to be detected, so as to detect a target object in the images, wherein the target object is a taken item;
and identifying an image of the target object to determine an identity of the taken item.
7. A target object detection apparatus comprising:
the detection frame output module is configured to input a sequence to be detected comprising a plurality of frames of images into a target detection model, and obtain a plurality of images with detection frames output by the target detection model;
a motion optical flow determination module configured to determine pixels in the detection frame having motion optical flow, comprising: detecting feature points in a detection frame of an image, wherein the feature points comprise corner features and Scale Invariant Feature Transform (SIFT) features, identifying the same pixel in different frames through the feature points, calculating the displacement of each feature point in the detection frame of the image relative to the same feature point in the previous frame of image, and determining that the pixel corresponding to the feature point has motion optical flow in the case that the displacement of the feature point is greater than a displacement threshold;
a detection frame screening module configured to retain part or all of the detection frames according to the number of pixels with motion optical flow in the detection frames, comprising: deleting the detection frame of an image in the case that the number of pixels with motion optical flow in the detection frame of the image is less than a preset threshold and the number of pixels with motion optical flow in the corresponding range of the detection frame in the preceding several frames of images is also less than the preset threshold;
and a target object determination module configured to determine the object in the retained detection frame as the target object in motion in the corresponding image.
8. A target object detection apparatus comprising:
a memory; and
a processor coupled to the memory, the processor configured to execute, based on instructions stored in the memory, a target object detection method for performing operations comprising:
inputting a sequence to be detected comprising a plurality of frames of images into a target detection model to obtain a plurality of images with detection frames output by the target detection model;
determining pixels in a detection frame having motion optical flow, comprising: detecting feature points in a detection frame of an image, wherein the feature points comprise corner features and Scale Invariant Feature Transform (SIFT) features, identifying the same pixel in different frames through the feature points, calculating the displacement of each feature point in the detection frame of the image relative to the same feature point in the previous frame of image, and determining that the pixel corresponding to the feature point has motion optical flow in the case that the displacement of the feature point is greater than a displacement threshold;
according to the number of pixels with motion optical flow in the detection frame, retaining part or all of the detection frames, comprising: deleting the detection frame of an image in the case that the number of pixels with motion optical flow in the detection frame of the image is less than a preset threshold and the number of pixels with motion optical flow in the corresponding range of the detection frame in the preceding several frames of images is also less than the preset threshold;
and determining the object in the retained detection frame as the target object in motion in the corresponding image.
9. The target object detection apparatus of claim 8, wherein the target detection model is a neural network model;
the operations further include:
training a neural network model with training images to obtain the target detection model, wherein the training images comprise positive sample images and negative sample images, each positive sample image has labeled position information of the target object, no negative sample image contains the target object, and the negative sample images comprise images in detection frames misrecognized by the target detection model.
10. The target object detection apparatus of claim 9, wherein the operations further comprise:
generating a virtual image using a generative adversarial network based on the acquired real image;
and taking the virtual image as a training image.
11. A target object detection system comprising:
the target object detection device of any one of claims 8 to 10; and
an image pickup device configured to acquire a sequence to be detected including a plurality of frames of images.
12. The target object detection system of claim 11,
further comprising: a vending apparatus;
the camera device is located in the vending apparatus and is further configured to capture a video or continuously capture a plurality of images as a sequence to be detected in response to a cabinet door of the vending apparatus being opened.
13. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the object detection method of any one of claims 1 to 6.
CN201811630396.2A 2018-12-29 2018-12-29 Object detection method, device, system and computer readable storage medium Active CN109727275B (en)
