WO2018090912A1 - Target object detection method, apparatus and system, and neural network structure - Google Patents

Target object detection method, apparatus and system, and neural network structure

Info

Publication number
WO2018090912A1
Authority
WO
WIPO (PCT)
Prior art keywords
target object
region
interest
current frame
frame
Prior art date
Application number
PCT/CN2017/110953
Other languages
English (en)
French (fr)
Inventor
康恺 (Kai Kang)
李鸿升 (Hongsheng Li)
欧阳万里 (Wanli Ouyang)
王晓刚 (Xiaogang Wang)
Original Assignee
北京市商汤科技开发有限公司 (Beijing SenseTime Technology Development Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing SenseTime Technology Development Co., Ltd. (北京市商汤科技开发有限公司)
Publication of WO2018090912A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/24: Aligning, centring, orientation detection or correction of the image
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07: Target detection

Definitions

  • the present application relates to the field of video image processing, and in particular, to a target object detection method, apparatus and system, and neural network structure.
  • video target object detection/tracking extends static-image target object detection into the video domain: multi-category, multi-target object detection/tracking is performed in every frame of a video.
  • in the prior art, video target object detection/tracking systems are mainly based on static target object detection, adding some post-processing techniques on top of the static detection results to implement video target object detection/tracking.
  • the embodiments of the present application provide a target object detection method, device, system, and neural network structure, to enable the reuse of time-domain information between different frame images.
  • a target object detecting method, including:
  • determining at least one region of interest to be detected in a current frame of a video image sequence, each region of interest at least partially containing information of at least one target object; separately extracting the features of the target object in the at least one region of interest of the current frame; predicting the at least one region of interest of the current frame according to the features of the target object to obtain a prediction result; and determining, according to the prediction result, the region of interest to be detected in a subsequent frame.
  • the prediction result includes: the probability that the region of interest contains a target object, and the predicted location of the target object.
  • the determining, according to the prediction result, of the region of interest to be detected in the subsequent frame includes: using the predicted location of the target object of the current frame as the region of interest to be detected in the subsequent frame.
  • the determining, according to the prediction result, of the region of interest to be detected in the subsequent frame includes: acquiring the boundary position of the region where the target object is located within the region of interest of the current frame; and weighting the boundary position corresponding to the region where the target object is located, to obtain the boundary position of the target object region of the subsequent frame corresponding to the region of interest of the current frame.
  • the region of interest of the current frame at least partially contains information of multiple target objects; acquiring the boundary position of the region where the target object is located includes: separately acquiring the boundary positions of the regions where the various classes of target objects in the region of interest are located; and the weighting includes: weighting the boundary positions of the regions where the various classes of target objects are located, to obtain the boundary position of the target object region of the subsequent frame corresponding to the region of interest.
  • alternatively, the weighting includes: weighting both the boundary positions of the regions where the various classes of target objects are located and the probabilities that the various classes of target objects are contained in the region of interest of the current frame, to obtain the boundary position of the target object region of the subsequent frame corresponding to the region of interest.
  • the method further includes: setting the at least one region of interest in a start frame of the sequence of video images based on a preset rule.
  • separately extracting the features of the target object in the at least one region of interest of the current frame includes: separately extracting the feature trajectories of the target object memorized for the at least one region of interest of the current frame.
  • the feature trajectory contains: the features of the target object in the at least one region of interest of the current frame, and the feature trajectory of the target object memorized for the region of interest of a previous frame of the current frame.
  • the predicting of the at least one region of interest according to the features of the target object includes: predicting the at least one region of interest through the feature trajectory of the target object of the current frame, to obtain the prediction result.
  • a target object detecting apparatus including:
  • a first interest module, configured to determine at least one region of interest to be detected in a current frame of a video image sequence, each region of interest at least partially containing information of at least one target object; a feature extraction module, configured to separately extract the features of the target object in the at least one region of interest of the current frame; a prediction module, configured to predict the at least one region of interest of the current frame according to the features of the target object, to obtain a prediction result; and
  • a second interest module, configured to determine, according to the prediction result, the region of interest to be detected in a subsequent frame.
  • the prediction result includes: the probability that the region of interest contains a target object, and the predicted location of the target object.
  • the second interest module is configured to use the predicted location of the target object of the current frame as the region of interest to be detected in the subsequent frame.
  • the second interest module includes: a location acquiring unit, configured to acquire the boundary position of the region where the target object is located within the region of interest of the current frame; and a location generating unit, configured to weight the boundary position corresponding to the region where the target object is located, to obtain the boundary position of the target object region of the subsequent frame corresponding to the region of interest.
  • the region of interest of the current frame at least partially includes information of a plurality of target objects;
  • the location acquiring unit is configured to separately acquire the boundary positions of the regions where the various classes of target objects in the region of interest are located;
  • the location generating unit is configured to weight the boundary positions of the regions where the various classes of target objects are located, to obtain the boundary positions of the target object regions of the subsequent frame corresponding to the region of interest.
  • alternatively, the region of interest of the current frame at least partially contains information of multiple target objects; the location acquiring unit is configured to separately acquire the boundary positions of the regions where the various classes of target objects in the region of interest are located; and the location generating unit is configured to weight both those boundary positions and the probabilities that the various classes of target objects are contained in the region of interest of the current frame, to obtain the boundary position of the target object region of the subsequent frame corresponding to the region of interest.
  • the target object detecting apparatus further includes: a starting module, configured to set the at least one region of interest in a start frame of the video image sequence based on a preset rule.
  • the feature extraction module is configured to separately extract the feature trajectories of the target object memorized for the at least one region of interest of the current frame.
  • the feature trajectory contains: the features of the target object in the at least one region of interest of the current frame and the feature trajectory of the target object memorized for the region of interest of a previous frame of the current frame.
  • the prediction module is further configured to predict the at least one region of interest by using a feature trajectory of the target object of the current frame to obtain a prediction result.
  • a target object detection system including:
  • an image acquiring device, configured to acquire video image sequence data of a video image to be detected; a processor, configured to receive the video image sequence data of the video image to be detected and perform the operations of the foregoing method; and a memory for storing at least one executable instruction, the executable instruction causing the processor to perform the operations corresponding to the foregoing method.
  • a neural network structure for target object detection including:
  • a cascaded multi-layer neural network, where each layer of the neural network receives one frame of image data in a video image sequence, generates at least one region of interest for that image data, and performs target object detection on the at least one region of interest to obtain a prediction result;
  • the prediction result includes the location of the target object;
  • the prediction result of the current layer of the neural network serves as the input of the next layer of the neural network;
  • the next layer generates at least one region of interest for the image data it receives according to the prediction result of the current layer, and performs target object detection to obtain its own prediction result.
  • an electronic device including:
  • a processor and the target object detecting apparatus according to any embodiment of the present application;
  • when the processor runs the target object detecting apparatus, the units in the target object detecting apparatus according to any embodiment of the present application are operated.
  • another electronic device including:
  • a processor and the target object detection system according to any embodiment of the present application;
  • when the processor runs the target object detection system, the units in the target object detection system according to any embodiment of the present application are operated.
  • a further electronic device including:
  • a processor and the neural network structure according to any embodiment of the present application;
  • when the processor runs the neural network structure, the units in the neural network structure according to any embodiment of the present application are operated.
  • another electronic device, comprising: one or more processors, a memory, a communication component, and a communication bus, where the processor, the memory, and the communication component communicate with each other through the communication bus;
  • the memory is configured to store at least one executable instruction that causes the processor to perform an operation corresponding to the target object detection method as described in any of the embodiments of the present application.
  • a computer program comprising computer readable code; when the computer readable code runs on a device, a processor in the device executes instructions for implementing the steps of the target object detection method according to any embodiment of the present application.
  • a computer readable storage medium for storing computer readable instructions which, when executed, implement the operations of the steps of the target object detection method according to any embodiment of the present application.
  • the instructions include: an instruction for determining, in a current frame of a video image sequence, at least one region of interest to be detected, each region of interest at least partially containing information of at least one target object; an instruction for separately extracting the features of the target object in the at least one region of interest of the current frame; an instruction for predicting the at least one region of interest of the current frame according to the features of the target object to obtain a prediction result; an instruction for determining, according to the prediction result, the region of interest to be detected in a subsequent frame; and so on.
  • the technical solution provided by the embodiments of the present application determines at least one region of interest to be detected in the current frame of a video image sequence, then predicts the at least one region of interest according to its features to obtain a prediction result, and determines the region of interest of a subsequent frame according to the prediction result of the at least one region of interest of the current frame. Therefore, when detecting the target object, the information of the current frame can be transmitted to the subsequent frame;
  • this multiplexing of time-domain information exploits long-range time-domain features, providing a temporal basis for handling complex situations such as changes in object appearance;
  • moreover, since at least one region of interest is determined for the image frame and predictions are made per region of interest, the technical solution of the embodiments of the present application is based on prediction over regionalized features of the image data itself, and can detect/track target objects in parallel, reducing detection time.
  • FIG. 1 is a flowchart of an embodiment of a method for detecting a target object of the present application.
  • FIG. 2 is a flowchart of another embodiment of a target object detecting method of the present application.
  • FIG. 3 is a schematic structural diagram of an embodiment of a neural network structure for target object detection according to the present application.
  • FIG. 4 is a schematic structural diagram of a memory model in an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an embodiment of a target object detecting apparatus according to the present application.
  • FIG. 6 is a schematic structural diagram of a computer system suitable for implementing an embodiment of the present application.
  • Embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems, servers, etc., which can operate with numerous other general purpose or special purpose computing system environments or configurations.
  • examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the above, and the like.
  • Electronic devices such as terminal devices, computer systems, servers, etc., can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
  • program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
  • program modules may be located on a local or remote computing system storage medium including storage devices.
  • FIG. 1 is a flowchart of an embodiment of a method for detecting a target object of the present application.
  • the method for detecting a target object of the embodiment includes the following steps:
  • Step S100 determining at least one region of interest of the current frame.
  • At least one Region of Interest is determined in a current frame of the sequence of video images, wherein each region of interest at least partially contains information of at least one target object.
  • the at least one region of interest of the current frame may be determined and generated according to a previous frame (e.g., the immediately preceding frame) of the current frame; optionally, refer to the expanded description in step S400 below on determining the region of interest of a subsequent frame, which is not repeated here.
  • each frame image of the video image sequence may contain one target object or multiple target objects; among the generated at least one region of interest, each region of interest may partially contain the information of one or more target objects, or may completely contain the information of one or more target objects.
  • a subsequent frame is an image frame in the same video image sequence whose detection order comes after the current frame; it may be an image frame that lags the current frame in the time domain when detecting forward along the time domain, or an image frame that precedes the current frame in the time domain when detecting in reverse along the time domain.
  • step S100 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by the first interest module 100 executed by the processor.
  • Step S200 extracting target object features in the at least one region of interest of the current frame, respectively.
  • the features of each region of interest may be extracted in parallel, thereby separating the target object in each region of interest from the background.
  • feature extraction may be implemented through a neural network, for example a convolutional neural network such as GoogLeNet, VGG, or ResNet.
  • in alternative embodiments, other algorithms may also be used to implement the feature extraction of each region of interest.
  • the extracted features may be, for example, appearance features of the target object.
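  • as a rough illustration of the parallel per-RoI extraction described above (a sketch, not the patent's implementation; the backbone callable and the frame/RoI representations are assumptions):

```python
def crop_roi(frame, roi):
    """Crop one region of interest (x_min, y_min, x_max, y_max) from a frame
    given as an H x W x 3 array."""
    x1, y1, x2, y2 = roi
    return frame[y1:y2, x1:x2]

def extract_roi_features(frame, rois, backbone):
    """Extract a feature vector for each RoI independently.

    Each RoI is processed on its own, so this loop can be parallelized
    (e.g., batched through the network) as the text suggests. `backbone`
    stands in for a CNN such as VGG or ResNet and is an assumption.
    """
    return [backbone(crop_roi(frame, roi)) for roi in rois]
```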
  • step S200 may be performed by the processor invoking a corresponding instruction stored in the memory, or may be performed by feature extraction module 200 executed by the processor.
  • Step S300 predicting the at least one region of interest of the current frame according to the feature of the target object, to obtain a prediction result.
  • the prediction result includes the probability p that the region of interest contains the target object, and the predicted location of the target object.
  • the target object may be one or more target objects of the same type, such as multiple vehicles or multiple aircraft; or any combination of different types of target objects, such as airplanes, automobiles, bicycles, and people.
  • the number of target objects of each category may also be one or more.
  • in an optional embodiment, after the neural network has been trained, the probability that each object is contained in each region of interest (RoI) and the position of each object may be predicted according to the features of the target object.
  • the position of each target object may be represented by the coordinates of the boundary (e.g., the bounding box or its vertices) of the pixel region where the target object is located.
  • when the regions of interest are of the same size, have a regular shape, or the pixel coverage of a region of interest can be inferred, the position of each class of object may also be characterized in a rule-based manner (for example, by the center coordinates of the region of interest).
  • the positions of the target objects predicted for the regions of interest of the current frame generally have a certain positional offset relative to the regions of interest generated for the current frame.
  • as an example, referring to FIG. 3, prediction is performed through the convolutional layers of the neural network; the prediction result includes the bounding box regression and the predicted probability p of each class of target object.
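  • a minimal sketch of such a per-RoI prediction head, assuming a generic feature vector and hypothetical weight matrices W_cls and W_box (the patent does not specify them); it outputs the per-class probability p and a per-class box regression, mirroring the prediction result described above:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_roi(feature, W_cls, W_box):
    """Per-RoI prediction head (illustrative only).

    feature : (D,) feature vector extracted from one region of interest
    W_cls   : (C, D) classification weights -> probability p_c per class
    W_box   : (C, 4, D) regression weights -> one box offset per class

    Returns per-class probabilities and per-class box offsets relative to
    the input RoI, i.e. the positional offset mentioned in the text.
    """
    p = softmax(W_cls @ feature)   # probability each class is present
    offsets = W_box @ feature      # (C, 4) bounding box regression per class
    return p, offsets
```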
  • step S300 may be performed by the processor invoking a corresponding instruction stored in the memory, or may be performed by the prediction module 300 executed by the processor.
  • Step S400 determining a region of interest of the subsequent frame to be detected.
  • the region of interest of the subsequent frame is determined according to the prediction result of the at least one region of interest of the current frame.
  • step S400 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by the second interest module 400 executed by the processor.
  • in this embodiment, the subsequent frame is taken to be the next frame of the current frame as an example.
  • the predicted position of the target object of the current frame may be used as the region of interest of the subsequent frame (for example, the next frame); that is, each target position region predicted from the current frame directly generates the region where the corresponding target object of the subsequent frame is located, to serve as the region of interest of the subsequent frame to be detected.
  • alternatively, the boundary positions of the regions where the target objects are located within the at least one region of interest of the current frame may be acquired, and the boundary positions weighted to obtain the boundary positions of the target object regions of the subsequent frame (for example, the next frame) corresponding to the regions of interest, thereby generating the regions where the target objects of the subsequent frame are located.
  • after the boundary positions of the regions where the target objects of the subsequent frame are located have been determined, the regions delimited by those boundary positions can serve as the regions of interest of the subsequent frame (e.g., the next frame) to be detected.
  • referring to FIG. 3, after the predicted position regions of the target objects are obtained for the current frame, Frame t, the predicted coordinates (or the weighted coordinate regions) may be used as the regions where the target objects of the subsequent frame (e.g., the next frame) Frame t+1 are located, yielding its regions of interest; then the position regions predicted for Frame t+1 are in turn used as the regions where the target objects of the next subsequent frame (e.g., two frames after the current frame) Frame t+2 are located, yielding its regions of interest.
  • in the above description, the “subsequent frame” is taken to be the “next frame” of the “current frame” as an example; in other embodiments, the “subsequent frame” may also be several frames after the “current frame”.
  • in an optional implementation, the weighting coefficients may be determined reasonably according to the frame-count difference between the “subsequent frame” and the “current frame”, or the region of interest of the “subsequent frame” may be determined more accurately in combination with motion estimation and the like.
  • when applying the target object detection method of this embodiment: tracking of the target object can be achieved by detecting continuously in the time domain; several image frames may also be sampled for detection at equal or unequal intervals in the time domain; some image frame sub-sequences to be detected may be determined in the video image sequence and then detected and/or tracked; and a single frame image may also be detected.
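  • tying the above steps together, a hedged sketch of the detect-then-propagate loop, in which the boxes predicted for the current frame seed the regions of interest of the subsequent frame (the extract/predict callables and the score threshold are assumptions for the example):

```python
def detect_video(frames, initial_rois, extract, predict, score_threshold=0.5):
    """Propagate regions of interest frame to frame (illustrative sketch).

    `extract(frame, roi)` returns an RoI feature; `predict(feature)` returns
    (p, box), the containment probability and the predicted box. The predicted
    box of each RoI in the current frame is reused directly as the RoI to be
    detected in the subsequent frame, as described above.
    """
    rois = initial_rois            # e.g. proposals generated on the start frame
    results = []
    for frame in frames:
        detections, next_rois = [], []
        for roi in rois:
            p, box = predict(extract(frame, roi))
            if p >= score_threshold:   # RoI still appears to contain an object
                detections.append((box, p))
                next_rois.append(box)  # predicted box -> next frame's RoI
        results.append(detections)
        rois = next_rois
    return results
```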
  • at least one region of interest to be detected is determined in the current frame of the video image sequence; the at least one region of interest is then predicted according to its features to obtain a prediction result, and the region of interest of the subsequent frame is determined according to the prediction result of the at least one region of interest of the current frame. Therefore, when the target object is detected, the information of the current frame can be transmitted to the subsequent frame, enabling the reuse of time-domain information between different frame images; long-range time-domain features are exploited, which in turn provides a temporal basis for handling complex situations such as changes in object appearance;
  • moreover, since at least one region of interest is determined for the image frame and predictions are made per region of interest, the technical solution of the embodiments of the present application is based on prediction over regionalized features of the image data itself, and can detect/track target objects in parallel, reducing detection time.
  • to enable detection of multiple classes of target objects, in an optional embodiment, when multiple target objects (of the same class or of different classes) are detected/tracked, a region of interest of the current frame may at least partially contain information of multiple target objects.
  • for each region of interest, the boundary position d_c of the region where each target object is located may be acquired separately, where c is an integer, 1 ≤ c ≤ C, and C is the number of target objects; then, for the target objects contained in each region of interest of the current frame, the boundary positions d_c are weighted to obtain the boundary position of the region where the target object of the subsequent frame is located, corresponding to the region of interest of the current frame;
  • the region delimited by the weighted boundary position is taken as the target object region of the subsequent frame, serving as the region of interest of the subsequent frame corresponding to the region of interest of the current frame.
  • the weighting may use the probability that each target object is contained in the region of interest: the probability p_c that each target object is contained in the region of interest of the current frame is acquired separately;
  • the boundary positions d_c of the regions where the target objects are located and the probabilities p_c are weighted, to obtain the boundary position of the region of the target object of the subsequent frame corresponding to the region of interest of the current frame.
  • as an example, for one region of interest of the current frame: acquire the probabilities p_c, c = 1, 2, ..., C, and the predicted position d_c of each target object, whose components are the horizontal and vertical coordinates of the upper-left and lower-right corners of the region where the c-th target object is located (other boundary coordinates may of course be used instead); then weight the boundary positions by the containment probabilities.
  • the boundary position of the target object region of the subsequent frame may be obtained by weighting with the following formula:
  • d* = Σ_{c=1}^{C} p_c · d_c
  • where d* is the boundary position of the target object region of the subsequent frame corresponding to the region of interest of the current frame; c is an integer with 1 ≤ c ≤ C, C being the number of target objects; d_c is the boundary position of the region where each target object is located; and p_c is the probability that each target object is contained in this region of interest of the current frame.
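  • a small sketch of this weighting over per-class boxes; normalizing by the sum of the probabilities is an assumption, used here to keep d* on the image scale when the probabilities do not sum to one:

```python
import numpy as np

def weighted_boundary(boxes, probs, normalize=True):
    """Combine per-class boundary positions into one subsequent-frame boundary.

    boxes : (C, 4) array, d_c = (x_min, y_min, x_max, y_max) per class c
    probs : (C,)   array, p_c = probability that class c is in the current RoI

    Implements d* = sum_c p_c * d_c from the text; `normalize` divides by
    sum(p_c) (an assumption, not stated in the text).
    """
    boxes = np.asarray(boxes, dtype=float)
    probs = np.asarray(probs, dtype=float)
    d = (probs[:, None] * boxes).sum(axis=0)
    if normalize:
        d = d / max(probs.sum(), 1e-8)
    return d

# Example: two classes; the more probable box dominates the weighted result.
d_star = weighted_boundary([[10, 10, 50, 50], [14, 12, 54, 52]], [0.8, 0.2])
```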
  • for the start frame of the video image sequence, at least one region of interest may be set based on a preset rule, so that each region of interest of the start frame can be predicted to obtain a prediction result.
  • the regions of interest of the start frame may be set using, for example, a Region Proposal Network (RPN).
  • in other embodiments, other proposal networks may also be used to set the regions of interest of the start frame.
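  • as a crude stand-in for the "preset rule" alternative to a proposal network, a dense regular grid of candidate regions over the start frame might look as follows (box size and stride are illustrative assumptions):

```python
def grid_proposals(width, height, box=64, stride=32):
    """Enumerate a regular grid of candidate RoIs over the start frame:
    every `stride` pixels, emit a `box` x `box` candidate kept inside
    the image bounds."""
    rois = []
    for y in range(0, height - box + 1, stride):
        for x in range(0, width - box + 1, stride):
            rois.append((x, y, x + box, y + box))
    return rois
```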
  • separately extracting the features of the target object in the regions of interest of the current frame includes: separately extracting the feature trajectories of the target object memorized for the regions of interest of the current frame; a feature trajectory may contain the features of the target object in a region of interest of the current frame and the feature trajectory of the target object memorized for the region of interest of a previous frame of the current frame. Therefore, when each region of interest is predicted according to the features of the target object, each region of interest can be predicted through the feature trajectory of the target object of the current frame to obtain a prediction result.
  • a previous frame refers to an image frame, or a set of image frames, in the same video image sequence whose detection order comes before the current frame; it may be a single image frame that leads the current frame in the time domain, or an image sequence set formed by several image frames leading the current frame; in addition, when detecting in reverse along the time domain, a previous frame may also be an image frame or image frame set located after the current frame in the time domain.
  • in an optional embodiment, referring to FIG. 2, after step S200 is performed, the method may further include:
  • Step S510, memorizing, based on a preset duration, the features of the target object in the at least one region of interest of the current frame corresponding to the current time.
  • FIG. 4 is a schematic structural diagram of an embodiment of a memory model in the embodiment of the present application.
  • this may be implemented by, for example, a Long Short-Term Memory (LSTM) network, such as the LSTM marked in FIG. 3.
  • the model memorizes the features (x_{t-1}, x_t, x_{t+1}) of the corresponding current frames through the memory cells c_{t-1}, c_t, c_{t+1}, where the memory cell c_t memorizes the features of the current frame corresponding to time t, c_{t-1} memorizes the features of the current frame corresponding to time t-1, and c_{t+1} memorizes the features of the current frame corresponding to time t+1, and so on.
  • the control of the preset duration can be implemented through the forget gates: the forget gate f_{t-1} controls the memorization of the features at time t-1, the forget gate f_t controls the memorization of the features at time t, and the forget gate f_{t+1} controls the memorization of the features at time t+1.
  • the posture change frequency of the target object may be acquired, and the length of the preset duration adjusted according to the posture change frequency, to complete the forget gate's memory control of the features.
  • optionally, when the extracted features change significantly in posture relative to previous frames, the forget gate may be closed to memorize the features of the current frame more quickly, achieving a fast update of the features.
  • Step S520, using the memorized features of the target object in the at least one region of interest as the memory input of the subsequent frame.
  • the memory cell of the current time can pass the features it memorizes to the memory cell of the next time; for example, referring to FIG. 4, c_{t-1} is passed to c_t, and c_t is passed to c_{t+1}, so that the features of the trajectory are stored in the time domain. It should be noted that storing the features of the trajectory in the time domain makes it possible to judge more effectively whether the posture change of the features is significant.
  • after the memorized features of the target object in each region of interest are used as the memory input of the subsequent frame, when the region of interest is determined in the subsequent frame, whether the features of the target object have changed can be judged according to the memory-input features, thereby determining whether the features memorized at the previous time can be inherited in the time domain.
  • since the memory cell of the previous time can pass the features it memorizes to the memory cell of the next time, the features of the target object memorized in previous frames can be memorized as features of the current frame, which reduces the probability of tracking failure caused by the disappearance of the target object's features.
  • the features memorized by the memory cells at each time can be controlled through the input gates i_{t-1}, i_t, i_{t+1} (corresponding to times t-1, t, and t+1 in FIG. 3, respectively); the input gate controls whether the memory cell needs to be changed by the current input.
  • therefore, under object occlusion or motion blur in the current frame, the input gate can be closed to retain the features of the previous frames, so that the storage of the target object's features in the time domain is not affected.
  • the flow of information may also be controlled by adding other logic gate structures; referring to FIG. 4, for example, the output gates o_{t-1}, o_t, o_{t+1} (corresponding to times t-1, t, and t+1 in FIG. 3, respectively) control whether the predicted output features h_{t-1}, h_t, h_{t+1} of each time need to be output. When tracking fails, the corresponding output gate can be closed so that the corresponding output feature is empty, stopping the tracking of subsequent times.
  • in this embodiment, through the control of the output gate, detection/tracking is exited in time when tracking fails, which can effectively reduce the operating load of the system.
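  • a minimal NumPy sketch of the gated memory cell described above, using the standard LSTM equations (the single stacked weight matrix is an assumption about layout): driving the input gate i toward zero, as under occlusion or motion blur, leaves c_t close to c_{t-1}; driving the forget gate f toward zero discards the old state so the current features are memorized quickly; a closed output gate suppresses h_t, stopping the track's output:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of the memory model; x_t is the current frame's RoI feature.

    W maps the concatenated [h_prev, x_t] to the four gate pre-activations
    (forget, input, output, candidate), each of the hidden size.
    """
    z = W @ np.concatenate([h_prev, x_t]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget / input / output gates
    c_t = f * c_prev + i * np.tanh(g)             # keep old memory + add new input
    h_t = o * np.tanh(c_t)                        # gated output feature
    return h_t, c_t

# Tiny usage example with random weights (hidden size 8, feature size 16).
H, D = 8, 16
rng = np.random.default_rng(0)
W, b = rng.normal(size=(4 * H, H + D)), np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(3, D)):  # three consecutive frames' RoI features
    h, c = lstm_step(x, h, c, W, b)
```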
  • Any target object detection method provided by the embodiment of the present application may be performed by any suitable device having data processing capability, including but not limited to: a terminal device, a server, and the like.
  • any target object detection method provided by the embodiment of the present application may be executed by a processor, such as the processor executing any one of the target object detection methods mentioned in the embodiments of the present application by calling corresponding instructions stored in the memory. This will not be repeated below.
  • the foregoing program may be stored in a computer readable storage medium; when executed, the program performs the steps of the foregoing method embodiments.
  • the foregoing storage medium includes media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • the embodiment further discloses a target object detecting device.
  • the target object detecting device of each embodiment of the present application can be used to implement the foregoing target object detecting method embodiments of the present application.
  • FIG. 5 is a schematic structural diagram of an embodiment of the target object detecting apparatus.
  • referring to FIG. 5, the target object detecting apparatus of this embodiment includes: a first interest module 100, a feature extraction module 200, a prediction module 300, and a second interest module 400, where:
  • the first interest module 100 is configured to determine at least one region of interest to be detected in a current frame of the video image sequence, each region of interest at least partially containing information of at least one target object;
  • the feature extraction module 200 is configured to separately extract the features of the target object in the at least one region of interest of the current frame;
  • the prediction module 300 is configured to predict the at least one region of interest of the current frame according to the features of the target object, to obtain a prediction result;
  • the second interest module 400 is configured to determine, according to the prediction result of the at least one region of interest of the current frame, the region of interest to be detected in the subsequent frame.
  • the prediction result may include the probability that the region of interest includes the target object and the predicted location of the target object.
  • the second interest module 400 is configured to use the predicted position of the target object of the current frame as the region of interest of the subsequent frame to be detected.
  • the second interest module 400 includes: a location acquiring unit, configured to acquire the boundary position of the region where the target object is located within the region of interest of the current frame; and a location generating unit, configured to weight the boundary position corresponding to the region where the target object is located, to obtain the boundary position of the target object region of the subsequent frame corresponding to the region of interest.
  • the region of interest of the current frame at least partially contains information of multiple classes of target objects; the location acquiring unit is configured to separately acquire the boundary positions of the regions where the various classes of target objects in the region of interest are located; and the location generating unit is configured to weight the boundary positions of the regions where the various classes of target objects are located, to obtain the boundary position of the target object region of the subsequent frame corresponding to the region of interest.
  • alternatively, the location generating unit is configured to weight both the boundary positions of the regions where the various classes of target objects are located and the probabilities that the various classes of target objects are contained in the region of interest of the current frame, to obtain the boundary position of the target object region of the subsequent frame corresponding to the region of interest.
  • the apparatus further includes: a start module, configured to set at least one region of interest in a start frame of the sequence of video images based on a preset rule.
  • the feature extraction module 200 is configured to separately extract the feature trajectories of the target object memorized for the regions of interest of the current frame.
  • the feature trajectory comprises: a feature of the target object in the at least one region of interest of the current frame and a feature trajectory of the target object in the region of interest of the previous frame of the current frame.
  • the prediction module is further configured to predict the at least one region of interest by using a feature trajectory of the target object of the current frame to obtain a prediction result.
  • the embodiment of the present application further discloses a target object detection system, including:
  • an image acquiring device, configured to acquire video image sequence data of a video image to be detected; a processor, configured to receive the video image sequence data of the video image to be detected and perform the operations of the target object detection method according to any embodiment of the present application; and a memory for storing at least one executable instruction, the executable instruction causing the processor to perform the operations corresponding to the target object detection method of any of the foregoing embodiments of the present application.
  • the embodiment of the present application further discloses a neural network structure for target object detection, including:
  • a cascaded multi-layer neural network, where each layer of the neural network receives one frame of image data in a video image sequence, generates at least one region of interest for that image data, and performs target object detection on the at least one region of interest to obtain a prediction result;
  • the prediction result includes the location of the target object;
  • the prediction result of the current layer of the neural network serves as the input of the next layer; the next layer generates at least one region of interest for the image data it receives according to the prediction result of the current layer, and performs target object detection to obtain its own prediction result.
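  • as a structural sketch of this cascade (layer internals are abstracted away; the per-frame stage callables are assumptions), each stage consumes the previous stage's prediction to seed the regions of interest for its own frame:

```python
def cascade_forward(frames, stages, initial_prediction):
    """Run a cascaded multi-layer network, one stage per video frame.

    Each stage receives one frame plus the previous stage's prediction,
    generates its regions of interest from that prediction, and outputs
    its own prediction for the next stage - the structure described above.
    """
    prediction = initial_prediction   # e.g. proposals for the start frame
    outputs = []
    for frame, stage in zip(frames, stages):
        prediction = stage(frame, prediction)
        outputs.append(prediction)
    return outputs
```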
  • the embodiment of the present application further discloses an electronic device, including:
  • a processor and the target object detecting apparatus of any of the above embodiments of the present application;
  • when the processor runs the target object detecting apparatus, the units in the target object detecting apparatus of any of the above embodiments of the present application are operated.
  • Another embodiment of the present application further discloses another electronic device, including:
  • a processor and the target object detection system of any of the above embodiments of the present application;
  • when the processor runs the target object detection system, the units in the target object detection system of any of the above embodiments of the present application are executed.
  • the embodiment of the present application further discloses yet another electronic device, including:
  • a processor and the neural network structure of any of the above embodiments of the present application; when the processor runs the neural network structure, the units in the neural network structure of any of the above embodiments of the present application are executed.
  • the embodiment of the present application further discloses another electronic device, including: one or more processors, a memory, a communication component, and a communication bus, where the processor, the memory, and the communication component complete communication with each other through the communication bus;
  • the memory is configured to store at least one executable instruction that causes the processor to perform an operation corresponding to the target object detection method of any of the above embodiments of the present application.
  • the embodiment of the present application further provides an electronic device, such as a mobile terminal, a personal computer (PC), a tablet computer, a server, an industrial computer (IPC), and the like.
  • the computer system 600 includes one or more processors and a communication unit.
  • the one or more processors are, for example: one or more central processing units (CPUs) 601, and/or one or more graphics processing units (GPUs) 613, and the like; the processor may execute executable instructions stored in read-only memory (ROM) 602 or executable instructions loaded from the storage portion 608 into random access memory (RAM) 603.
  • the communication unit 612 can include, but is not limited to, a network card, which can include, but is not limited to, an IB (InfiniBand) network card.
  • the processor can communicate with the read-only memory 602 and/or the random access memory 603 to execute executable instructions, connects to the communication unit 612 via the bus 604, and communicates with other target devices via the communication unit 612, thereby completing operations corresponding to any method provided by the embodiments of the present application,
  • for example: determining at least one region of interest to be detected in a current frame of a video image sequence, each region of interest at least partially containing information of at least one target object; separately extracting the features of the target object in each region of interest of the current frame; predicting each region of interest of the current frame according to the features of the target object to obtain a prediction result; determining, according to the prediction result of each region of interest of the current frame, the region of interest of the subsequent frame to be detected; and so on.
  • in the RAM 603, various programs and data required for the operation of the device can be stored.
  • the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • ROM 602 is an optional module.
  • the RAM 603 stores executable instructions, or executable instructions are written into the ROM 602 at runtime; the executable instructions cause the processor 601 to perform the operations corresponding to the above-described methods.
  • An input/output (I/O) interface 605 is also coupled to bus 604.
  • the communication unit 612 may be integrated, or may be provided with a plurality of sub-modules (e.g., a plurality of IB network cards) linked on the bus.
  • the following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including, for example, a cathode ray tube (CRT), a liquid crystal display (LCD), and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet.
  • Driver 610 is also coupled to I/O interface 605 as needed.
  • a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like, is mounted on the drive 610 as needed so that a computer program read therefrom is installed into the storage portion 608 as needed.
  • it should be noted that the architecture shown in FIG. 6 is only an optional implementation; in practice, the number and types of components in FIG. 6 may be selected, reduced, added, or replaced according to actual needs;
  • different functional components may also be implemented separately or in an integrated manner; for example, the GPU and the CPU may be set separately, or the GPU may be integrated on the CPU, and the communication unit may be set separately, or integrated on the CPU or the GPU, and so on.
  • the embodiment of the present application further provides a computer program comprising computer readable code; when the computer readable code runs on a device, the processor in the device executes instructions for implementing the steps of the target object detection method of any of the above embodiments of the present application.
  • the process described above with reference to the flowchart may be implemented as a computer software program.
  • an embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine readable medium; the computer program comprises program code for executing the method illustrated in the flowchart, and the program code may include instructions corresponding to the method steps provided by the embodiments of the present application, for example: an instruction for determining, in a current frame of a video image sequence, at least one region of interest to be detected, each region of interest at least partially containing information of at least one target object; an instruction for separately extracting the features of the target object in each region of interest of the current frame; an instruction for predicting each region of interest of the current frame according to the features of the target object to obtain a prediction result; an instruction for determining, according to the prediction result of each region of interest of the current frame, the region of interest of the subsequent frame to be detected; and so on.
  • the computer program can be downloaded and installed from the network via communication portion 609, and/or installed from removable media 611.
  • the computer program is executed by the central processing unit (CPU) 601, the above-described functions defined in the method of the present application are performed.
  • the embodiment of the present application further provides a computer readable storage medium for storing computer readable instructions, which when executed, implements the operations of the steps in the target object detecting method of any of the above embodiments of the present application.
  • the methods, apparatus, and apparatus of the present application may be implemented in a number of ways.
  • the methods, apparatus, and apparatus of the present application can be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware.
  • the above-described order of the steps of the method is for illustration only; the steps of the method of the present application are not limited to the order described above unless otherwise specified.
  • the present application can also be implemented as a program recorded in a recording medium, the programs including machine readable instructions for implementing the method according to the present application.
  • the present application also covers a recording medium storing a program for executing the method according to the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A target object detection method, apparatus and system, and a neural network structure. The method includes: determining at least one region of interest to be detected in a current frame of a video image sequence (S100), each region of interest at least partially containing information of at least one target object; separately extracting the features of the target object in the at least one region of interest of the current frame (S200); predicting the at least one region of interest of the current frame according to the features of the target object to obtain a prediction result (S300); and determining, according to the prediction result, the region of interest to be detected in a subsequent frame (S400). Therefore, when the target object is detected, the information of the current frame can be transmitted to the subsequent frame, enabling the reuse of time-domain information between different frame images and making full use of long-range time-domain features, which in turn provides a temporal basis for handling complex situations such as changes in object appearance.

Description

Target object detection method, apparatus and system, and neural network structure
This application claims priority to Chinese patent application No. CN 201611013117.9, filed with the Chinese Patent Office on November 15, 2016 and entitled "Target object detection method, apparatus and system, and neural network structure", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of video image processing, and in particular to a target object detection method, apparatus and system, and a neural network structure.
Background
Video target object detection/tracking extends static-image target object detection into the video domain: multi-category, multi-target object detection/tracking is performed in every frame of a video.
In the prior art, video target object detection/tracking systems are mainly based on static target object detection, adding some post-processing techniques on top of the static detection results to implement video target object detection/tracking.
Summary
Embodiments of the present application provide a target object detection method, apparatus and system, and a neural network structure, to enable the reuse of time-domain information between different frame images.
According to one aspect of the embodiments of the present application, a target object detection method is provided, including:
determining at least one region of interest to be detected in a current frame of a video image sequence, each region of interest at least partially containing information of at least one target object; separately extracting the features of the target object in the at least one region of interest of the current frame; predicting the at least one region of interest of the current frame according to the features of the target object to obtain a prediction result; and determining, according to the prediction result, the region of interest to be detected in a subsequent frame.
Optionally, the prediction result includes: the probability that the region of interest contains a target object, and the predicted position of the target object.
Optionally, determining the region of interest to be detected in the subsequent frame according to the prediction result includes: using the predicted position of the target object of the current frame as the region of interest to be detected in the subsequent frame.
Optionally, determining the region of interest to be detected in the subsequent frame according to the prediction result includes: acquiring the boundary position of the region where the target object is located within the region of interest of the current frame; and weighting the boundary position corresponding to the region where the target object is located, to obtain the boundary position of the target object region of the subsequent frame corresponding to the region of interest of the current frame.
Optionally, the region of interest of the current frame at least partially contains information of multiple target objects; acquiring the boundary position of the region where the target object is located within the region of interest of the current frame includes: separately acquiring the boundary positions of the regions where the various classes of target objects in the region of interest are located; and the weighting includes: weighting the boundary positions of the regions where the various classes of target objects are located, to obtain the boundary position of the target object region of the subsequent frame corresponding to the region of interest.
Optionally, the region of interest of the current frame at least partially contains information of multiple target objects; acquiring the boundary position of the region where the target object is located within the region of interest of the current frame includes: separately acquiring the boundary positions of the regions where the various classes of target objects in the region of interest are located; and the weighting includes: weighting both the boundary positions of the regions where the various classes of target objects are located and the probabilities that the various classes of target objects are contained in the region of interest of the current frame, to obtain the boundary position of the target object region of the subsequent frame corresponding to the region of interest.
Optionally, the method further includes: setting the at least one region of interest in a start frame of the video image sequence based on a preset rule.
Optionally, separately extracting the features of the target object in the at least one region of interest of the current frame includes: separately extracting the feature trajectories of the target object memorized for the at least one region of interest of the current frame.
Optionally, the feature trajectory contains: the features of the target object in the at least one region of interest of the current frame, and the feature trajectory of the target object memorized for the region of interest of a previous frame of the current frame.
Optionally, predicting the at least one region of interest according to the features of the target object to obtain a prediction result includes: predicting the at least one region of interest through the feature trajectory of the target object of the current frame, to obtain the prediction result.
According to another aspect of the embodiments of the present application, a target object detection apparatus is provided, including:
a first interest module, configured to determine at least one region of interest to be detected in a current frame of a video image sequence, each region of interest at least partially containing information of at least one target object; a feature extraction module, configured to separately extract the features of the target object in the at least one region of interest of the current frame; a prediction module, configured to predict the at least one region of interest of the current frame according to the features of the target object, to obtain a prediction result; and a second interest module, configured to determine, according to the prediction result, the region of interest to be detected in a subsequent frame.
Optionally, the prediction result includes: the probability that the region of interest contains a target object, and the predicted position of the target object.
Optionally, the second interest module is configured to use the predicted position of the target object of the current frame as the region of interest to be detected in the subsequent frame.
Optionally, the second interest module includes: a location acquiring unit, configured to acquire the boundary position of the region where the target object is located within the region of interest of the current frame; and a location generating unit, configured to weight the boundary position corresponding to the region where the target object is located, to obtain the boundary position of the target object region of the subsequent frame corresponding to the region of interest.
Optionally, the region of interest of the current frame at least partially contains information of multiple target objects; the location acquiring unit is configured to separately acquire the boundary positions of the regions where the various classes of target objects in the region of interest are located; and the location generating unit is configured to weight the boundary positions of the regions where the various classes of target objects are located, to obtain the boundary position of the target object region of the subsequent frame corresponding to the region of interest.
Optionally, the region of interest of the current frame at least partially contains information of multiple target objects; the location acquiring unit is configured to separately acquire the boundary positions of the regions where the various classes of target objects in the region of interest are located; and the location generating unit is configured to weight both the boundary positions of the regions where the various classes of target objects are located and the probabilities that the various classes of target objects are contained in the region of interest of the current frame, to obtain the boundary position of the target object region of the subsequent frame corresponding to the region of interest.
Optionally, the target object detection apparatus further includes: a start module, configured to set the at least one region of interest in a start frame of the video image sequence based on a preset rule.
Optionally, the feature extraction module is configured to separately extract the feature trajectories of the target object memorized for the at least one region of interest of the current frame.
Optionally, the feature trajectory contains: the features of the target object in the at least one region of interest of the current frame, and the feature trajectory of the target object memorized for the region of interest of a previous frame of the current frame.
Optionally, the prediction module is further configured to predict the at least one region of interest through the feature trajectory of the target object of the current frame, to obtain the prediction result.
According to yet another aspect of the embodiments of the present application, a target object detection system is provided, including:
an image acquiring device, configured to acquire video image sequence data of a video image to be detected; a processor, configured to receive the video image sequence data of the video image to be detected and perform the operations of the above method; and a memory for storing at least one executable instruction, the executable instruction causing the processor to perform the operations corresponding to the above method.
According to still another aspect of the embodiments of the present application, a neural network structure for target object detection is provided, including:
a cascaded multi-layer neural network, where each layer of the neural network receives one frame of image data in a video image sequence, generates at least one region of interest for that image data, and performs target object detection on the at least one region of interest to obtain a prediction result, the prediction result including the location of the target object; the prediction result of the current layer serves as the input of the next layer, and the next layer generates at least one region of interest for the image data it receives according to the prediction result of the current layer, and performs target object detection to obtain its own prediction result.
According to still another aspect of the embodiments of the present application, an electronic device is provided, including:
a processor and the target object detection apparatus according to any embodiment of the present application;
when the processor runs the target object detection apparatus, the units in the target object detection apparatus according to any embodiment of the present application are operated.
According to still another aspect of the embodiments of the present application, another electronic device is provided, including:
a processor and the target object detection system according to any embodiment of the present application;
when the processor runs the target object detection system, the units in the target object detection system according to any embodiment of the present application are operated.
According to still another aspect of the embodiments of the present application, yet another electronic device is provided, including:
a processor and the neural network structure according to any embodiment of the present application;
when the processor runs the neural network structure, the units in the neural network structure according to any embodiment of the present application are operated.
According to still another aspect of the embodiments of the present application, a further electronic device is provided, including: one or more processors, a memory, a communication component, and a communication bus, where the processor, the memory, and the communication component communicate with each other through the communication bus;
the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the target object detection method according to any embodiment of the present application.
According to still another aspect of the embodiments of the present application, a computer program is provided, comprising computer readable code; when the computer readable code runs on a device, a processor in the device executes instructions for implementing the steps of the target object detection method according to any embodiment of the present application.
According to still another aspect of the embodiments of the present application, a computer readable storage medium is provided for storing computer readable instructions which, when executed, implement the operations of the steps of the target object detection method according to any embodiment of the present application.
For example, the instructions include: an instruction for determining, in a current frame of a video image sequence, at least one region of interest to be detected, each region of interest at least partially containing information of at least one target object; an instruction for separately extracting the features of the target object in the at least one region of interest of the current frame; an instruction for predicting the at least one region of interest of the current frame according to the features of the target object to obtain a prediction result; an instruction for determining, according to the prediction result, the region of interest to be detected in a subsequent frame; and so on.
The technical solutions of the embodiments of the present application can achieve at least one of the following technical effects:
The technical solution provided by the embodiments of the present application determines at least one region of interest to be detected in the current frame of a video image sequence, then predicts the at least one region of interest according to its features to obtain a prediction result, and determines the region of interest of a subsequent frame according to the prediction result of the at least one region of interest of the current frame. Therefore, when the target object is detected, the information of the current frame can be transmitted to the subsequent frame, enabling the reuse of time-domain information between different frame images; long-range time-domain features are exploited, which in turn provides a temporal basis for handling complex situations such as changes in object appearance.
In addition, at least one region of interest to be detected is determined in the current frame of the video image sequence, and the at least one region of interest is then predicted according to its features to obtain a prediction result; since at least one region of interest is determined for the image frame and the regions of interest are predicted, the technical solution of the embodiments of the present application is based on prediction over regionalized features of the image data itself, and can detect/track target objects in parallel, reducing detection time.
The technical solutions of the present application are described in further detail below through the drawings and embodiments.
Brief Description of the Drawings
To explain more clearly the optional embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the optional embodiments or the prior art are briefly introduced below. The drawings, which constitute a part of the specification, illustrate embodiments of the present application and, together with the description, serve to explain the principles of the present application. Obviously, the drawings described below are only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of an embodiment of the target object detection method of the present application;
FIG. 2 is a flowchart of another embodiment of the target object detection method of the present application;
FIG. 3 is a schematic structural diagram of an embodiment of the neural network structure for target object detection of the present application;
FIG. 4 is a schematic structural diagram of a memory model in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an embodiment of the target object detection apparatus of the present application;
FIG. 6 is a schematic structural diagram of a computer system suitable for implementing an embodiment of the present application.
Detailed Description
The technical solutions of the present application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present application. It should be noted that, unless otherwise optionally stated, the relative arrangement of components and steps, numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present application. Meanwhile, it should be understood that, for ease of description, the dimensions of the parts shown in the drawings are not drawn according to actual proportions.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be regarded as part of the specification.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.
In addition, the technical features involved in the different embodiments of the present application described below may be combined with each other as long as they do not conflict.
Embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the above, and the like.
Electronic devices such as terminal devices, computer systems, and servers can be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types. The computer system/server can be implemented in a distributed cloud computing environment, in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
Video images are typically a temporally continuous collection of images, and the features of different frame images are correlated to some extent. Therefore, to make full use of time-domain information and reuse the features (e.g., time-domain information) of different frame images, improving the efficiency and accuracy of target object detection, this embodiment discloses a target object detection method. It should be noted that when continuous detection is performed in the time domain, tracking of the target object can be achieved. Referring to FIG. 1, a flowchart of an embodiment of the target object detection method of the present application, the method of this embodiment includes the following steps:
Step S100, determining at least one region of interest of the current frame.
In the embodiments of the present application, at least one region of interest (RoI) is determined in the current frame of a video image sequence, where each region of interest at least partially contains information of at least one target object. In an optional embodiment, the at least one region of interest of the current frame may be determined and generated according to a previous frame (e.g., the immediately preceding frame) of the current frame; optionally, refer to the expanded description in step S400 below on determining the region of interest of a subsequent frame according to the prediction result of the current frame, which is not repeated here.
In the embodiments of the present application, each frame image of the video image sequence may contain one target object or multiple target objects; among the generated at least one region of interest, each region of interest may partially contain the information of one or more target objects, or may completely contain the information of one or more target objects.
In the embodiments of the present application, a subsequent frame is an image frame in the same video image sequence whose detection order comes after the current frame; a subsequent frame may be an image frame that lags the current frame in the time domain when detecting forward along the time domain, or an image frame that precedes the current frame in the time domain when detecting in reverse along the time domain.
In an optional example, step S100 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by the first interest module 100 executed by the processor.
Step S200, separately extracting the target object features in the above at least one region of interest of the current frame.
In the embodiments of the present application, the features of each region of interest may, for example, be extracted in parallel, thereby extracting the target object in the region of interest from the background. Optionally, feature extraction may be implemented through a neural network; as an example, networks such as a convolutional neural network, GoogLeNet, VGG, or ResNet may be used. Of course, in alternative embodiments, other algorithms may also be used to implement the feature extraction of each region of interest. In the embodiments of the present application, the extracted features may be, for example, appearance features of the target object.
In an optional example, step S200 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by the feature extraction module 200 executed by the processor.
Step S300, predicting the above at least one region of interest of the current frame according to the features of the target object, to obtain a prediction result.
In the embodiments of the present application, the prediction result includes the probability p that the region of interest contains the target object and the predicted position of the target object. The target object may be one or more target objects of the same class, such as multiple cars or multiple airplanes; or any combination of target objects of different classes, such as airplanes, cars, bicycles, and people. Among target objects of different classes, the number of target objects of each class may also be one or more. In an optional embodiment, after the neural network has been trained, the possibility (probability) that each object is contained in each region of interest (RoI) and the prediction of each object's position may be obtained according to the features of the target object. Optionally, the position of each target object may be represented by the coordinates of the boundary (e.g., the bounding box or its vertices) of the pixel region where the target object is located; of course, when the regions of interest are of the same size, have a regular shape, or the pixel coverage of a region of interest can be inferred, the position of each class of object may also be characterized in a rule-based manner (for example, by the center coordinates of the region of interest). Generally, the position of the target object predicted for a region of interest of the current frame has a certain positional offset relative to the regions of interest of the multiple target objects generated for the current frame. As an example, referring to FIG. 3, prediction is performed through the convolutional layers of the neural network to obtain a prediction result, which includes the predicted position (bounding box regression) and the predicted probability p of each class of target object.
In an optional example, step S300 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by the prediction module 300 executed by the processor.
步骤S400,确定后续帧的待检测的感兴趣区域。
可选地,根据当前帧的上述至少一个感兴趣区域的预测结果,确定后续帧的感兴趣区域。
在一个可选示例中,步骤S400可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第二感兴趣模块400执行。
In this embodiment, the case where the subsequent frame is the next frame of the current frame is taken as an example for description. In one embodiment, the predicted position of the target object of the current frame may be used as the region of interest to be detected of the subsequent frame (for example, the next frame); that is, the predicted position region of each target object obtained from the current frame is directly used to generate the region where each target object of the subsequent frame (for example, the next frame) is located, to serve respectively as the regions of interest to be detected of the subsequent frame (for example, the next frame). In another embodiment, the boundary positions of the regions where the target objects are located in the at least one region of interest of the current frame may be obtained; the boundary positions corresponding to the regions where the target objects are located are then weighted to obtain the boundary positions of the target object regions of the subsequent frame (for example, the next frame) corresponding to the regions of interest, thereby generating the regions where the target objects of the subsequent frame (for example, the next frame) are located.
In the embodiments of the present application, after the boundary position of the region where a target object of the subsequent frame (for example, the next frame) is located has been determined, the determined boundary position can serve as the region of interest to be detected of the subsequent frame (for example, the next frame). Referring to FIG. 3, after the predicted position region of a target object of the current frame Frame t is obtained, the coordinates of the predicted position region, or the weighted coordinate region, may be used as the region where the target object of the subsequent frame (for example, the next frame) Frame t+1 is located, so as to obtain a region of interest; then, the predicted position region obtained from the subsequent frame (for example, the next frame) Frame t+1 is in turn used as the region where the target object of the next subsequent frame (for example, two frames after the current frame) Frame t+2 is located, so as to obtain a region of interest.
It should be noted that, in the above embodiment, the "subsequent frame" being the "next frame" of the "current frame" is taken as an example for description; in other embodiments, the "subsequent frame" may also be several frames after the "current frame". In an optional implementation process, the weighting coefficients may be reasonably determined according to the frame-number difference between the "subsequent frame" and the "current frame", or motion estimation and the like may additionally be combined to determine the region of interest of the "subsequent frame" more accurately.
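The first variant above (directly reusing predicted positions as the next frame's regions of interest) can be sketched as follows; the probability threshold for pruning is an assumption added here for illustration and is not part of the patent:

```python
import numpy as np

def propagate_rois(pred_boxes, pred_probs, min_prob=0.05):
    """Use the current frame's predicted boxes directly as the next
    frame's RoIs to be detected. pred_boxes: (N, 4) array of
    (x1, y1, x2, y2); pred_probs: (N,) predicted probabilities p."""
    keep = pred_probs >= min_prob   # assumed pruning rule, not from the patent
    return pred_boxes[keep]

rois_t1 = propagate_rois(np.array([[32., 48., 160., 200.]]), np.array([0.9]))
```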
It should be noted that, when applying the target object detection method of this embodiment: tracking of a target object can be achieved by performing continuous detection of the target object in the time domain; sampled detection may also be performed on several image frames at equal or unequal intervals in the time domain; some image frame subsequences to be detected may also be determined in the video image sequence, and detection and/or tracking is then performed on these determined image frame subsequences; a single frame image may also be detected.
In this embodiment, at least one region of interest to be detected is determined in the current frame of the video image sequence; then, the at least one region of interest is predicted according to its features to obtain a prediction result, and the region of interest of the subsequent frame is determined according to the prediction result of the at least one region of interest of the current frame. Therefore, when detecting a target object, information of the current frame can be passed on to the subsequent frame, time-domain information can be reused across different frame images, and long-range time-domain features are exploited, which provides a temporal basis for handling complex situations such as changes in the appearance of an object;
In addition, since at least one region of interest to be detected is determined in the current frame of the video image sequence and the at least one region of interest is then predicted according to its features to obtain a prediction result, that is, at least one region of interest is determined for an image frame and prediction is performed on the regions of interest, the technical solutions of the embodiments of the present application perform prediction based on regionalized features of the image data itself, and can detect/track target objects in parallel, reducing detection time.
In order to detect multiple classes of target objects, as an optional embodiment, when there are multiple target objects to be detected/tracked (which may be of the same class or of different classes), a region of interest of the current frame may at least partially contain information of multiple target objects. For each region of interest, the boundary position d_c of the region where each target object in the region of interest is located may be obtained respectively, where c is an integer, 1 ≤ c ≤ C, and C is the number of target objects; then, for each target object contained in each region of interest of the current frame, the boundary positions d_c of the regions where the target objects are located are weighted to obtain the boundary position of the region where the target object of the subsequent frame corresponding to the region of interest of the current frame is located. From the weighted boundary position, the region where the target object of the subsequent frame is located can be obtained, to serve as the region of interest of the subsequent frame corresponding to the region of interest of the current frame.
As an optional embodiment, the weighting may be performed using the probability that each target object is contained in the region of interest. Optionally, the probability p_c that each target object is contained in the region of interest of the current frame may be obtained respectively; the boundary positions d_c of the regions where the target objects are located and the probabilities p_c are weighted to obtain the boundary position of the target object region of the subsequent frame corresponding to the region of interest of the current frame. As an example, taking a certain region of interest in the current frame for description: the probability p_c that each target object is contained in the region of interest of the current frame is obtained respectively, c = 1, 2, 3, ..., C; and the predicted position of each target object in the region of interest is obtained:

d_c = (x_1^c, y_1^c, x_2^c, y_2^c),

where (x_1^c, y_1^c) and (x_2^c, y_2^c) are respectively the horizontal and vertical coordinates of the top-left corner and the bottom-right corner of the region of interest where the c-th target object is located; of course, other boundary coordinates may be used instead. Then, the boundary positions of the target objects and the probabilities that the target objects are contained in the region of interest are weighted to obtain the boundary position of the target object region of the subsequent frame corresponding to the region of interest of the current frame. Optionally, the boundary position of the target object region of the subsequent frame may be obtained by weighting with the following formula:

d* = Σ_{c=1}^{C} p_c · d_c,

where d* is the boundary position of the target object region of the subsequent frame corresponding to the region of interest of the current frame; c is an integer, 1 ≤ c ≤ C, and C is the number of target objects; d_c is the boundary position of the region where each target object is located; and p_c is the probability that each target object is contained in this region of interest of the current frame.
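The weighted combination above can be computed directly, as in the short sketch below; the box values and probabilities are illustrative:

```python
import numpy as np

def weighted_boundary(d, p):
    """d: (C, 4) boundary positions d_c = (x1, y1, x2, y2);
    p: (C,) probabilities p_c. Returns d* = sum_c p_c * d_c."""
    return (p[:, None] * d).sum(axis=0)

d = np.array([[10., 20., 50., 80.],
              [12., 22., 54., 84.]])
p = np.array([0.7, 0.3])
print(weighted_boundary(d, p))   # [10.6, 20.6, 51.2, 81.2]
```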
In order to determine the region of interest of the initial frame: in an optional embodiment, for the initial frame of the video image sequence, at least one region of interest may be set in the initial frame of the video image sequence based on a preset rule, so that prediction is performed on each region of interest of the initial frame to obtain a prediction result. Optionally, for the prediction of the regions of interest of the initial frame, reference may be made to the prediction manner for the regions of interest of the current frame in the above embodiments, which will not be repeated here. In an optional embodiment, the regions of interest of the initial frame may be set using, for example, a Region Proposal Network (RPN); of course, in other embodiments, other network proposals may also be used to set the regions of interest of the initial frame.
In order to memorize the features of target objects in the time domain and reduce the probability of tracking failure caused by the disappearance of target object features: in an optional embodiment, respectively extracting the features of the target objects in the regions of interest of the current frame includes: respectively extracting the feature trajectories of the target objects memorized by the regions of interest of the current frame, where a feature trajectory may contain the features of the target objects in the regions of interest of the current frame and the feature trajectories of the target objects memorized by the regions of interest of a previous frame of the current frame. Thus, when predicting each region of interest according to the features of the target objects, each region of interest may be predicted through the feature trajectories of the target objects of the current frame to obtain a prediction result. In the embodiments of the present application, a previous frame refers to an image frame or set of image frames in the same video image sequence that precedes the current frame in detection order; it may be an image frame or set of image frames ahead of the current frame in the time domain, that is, the previous frame may be one image frame ahead of the current frame in the time domain, or an image sequence set composed of several image frames ahead of the current frame; in addition, the previous frame may also be an image frame or set of image frames located after the current frame in the time domain when detecting backward along the time domain. In an optional embodiment, referring to FIG. 2, after step S200 is performed, the method may further include:
Step S510: memorizing, for a preset duration, the features of the target objects in the at least one region of interest of the current frame corresponding to the current time.
Referring to FIG. 4, which is a schematic structural diagram of an embodiment of a memory model in an embodiment of the present application: optionally, this may be implemented by, for example, a Long Short-Term Memory (LSTM), as labeled LSTM in FIG. 3. This model can memorize the features (x_t, x_{t-1}, x_{t+1}) of the respective corresponding current frames through memory cells c_t, c_{t-1}, c_{t+1}, where the memory cell c_t memorizes the features of the current frame corresponding to time t, c_{t-1} memorizes the features of the current frame corresponding to time t-1, c_{t+1} memorizes the features of the current frame corresponding to time t+1, and so on. In the embodiments of the present application, the control of the preset duration may be implemented through forget gates; as an example, the memory control of the features at time t-1 is implemented through the forget gate f_{t-1}, the memory control of the features at time t is implemented through the forget gate f_t, and the memory control of the features at time t+1 is implemented through the forget gate f_{t+1}. In the embodiments of the present application, the posture change frequency of a target object may be obtained, and the length of the preset duration may then be adjusted according to the posture change frequency, so as to complete the memory control of the features by the forget gates. Optionally, when the features extracted in step S200 change significantly in posture relative to the previous frame, the forget gate may be closed, so as to memorize the features of the current frame more quickly and achieve rapid feature updating.
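As an illustrative sketch (not the patent's implementation), per-RoI features can be memorized across frames with a standard LSTM cell, whose forget, input, and output gates correspond to f_t, i_t, and o_t above; the feature dimension is an assumption:

```python
import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=256, hidden_size=256)   # assumed feature size

h = torch.zeros(1, 256)   # output feature h_t
c = torch.zeros(1, 256)   # memory cell c_t

for x_t in torch.randn(5, 1, 256):    # RoI features x_t over 5 frames
    # The cell internally computes the forget gate f_t, the input gate
    # i_t, and the output gate o_t from (x_t, h), then updates (h, c),
    # passing the memorized features c_t on to the next time step.
    h, c = cell(x_t, (h, c))
```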
Step S520: using the memorized features of the target objects in the at least one region of interest as the memory input of the subsequent frame.
In the embodiments of the present application, the memory cell at the current time can pass the features it memorizes to the memory cell at the next time; for example, referring to FIG. 4, c_{t-1} is passed to c_t, and c_t is passed to c_{t+1}, so that the features of the trajectory are stored in the time domain. It should be noted that, by storing the features of the trajectory in the time domain, whether the posture change of the features is significant can be judged more effectively. After the memorized features of the target objects in the regions of interest are used as the memory input of the subsequent frame, when determining the regions of interest in the subsequent frame, whether the features of the target objects have changed can be judged according to the features of the memory input; thus, it can be determined whether the features memorized at the previous time can be inherited in the time domain.
In the embodiments of the present application, since the memory cell at the previous time can pass the features it memorizes to the memory cell at the next time, the features of the target objects memorized in the previous frame can be memorized as the features of the current frame, which can reduce the probability of tracking failure caused by the disappearance of target object features.
It should be noted that, in an optional embodiment, referring to FIG. 4, the features memorized by the memory cell at each time may be controlled through input gates (for example, the input gates i_{t-1}, i_t, i_{t+1} respectively corresponding to times t-1, t, t+1 in FIG. 3); the input gate controls whether the current input needs to be used to change the memory cell. Therefore, in the case of object occlusion and motion blur in the current frame, the input gate can be closed so as to memorize the features of the previous frame, without affecting the storage of target object features in the time domain.
It should be noted that, in an optional embodiment, the information flow may also be controlled by adding other logic gate structures; referring to FIG. 4, for example, output gates (the output gates o_{t-1}, o_t, o_{t+1} respectively corresponding to times t-1, t, t+1 in FIG. 3) control whether the predicted output features h_{t-1}, h_t, h_{t+1} corresponding to each time need to be output. When tracking fails, the corresponding output gate can be closed, the corresponding output feature is output as empty, and tracking at subsequent times can be stopped. In this embodiment, through the control of the output gates, detection/tracking is exited promptly when tracking fails, which can effectively reduce the operating load of the system.
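To make the gate control just described concrete, the sketch below writes out one LSTM step with explicit gates, so that the input gate can be forced closed under occlusion or motion blur and the output gate closed on tracking failure; the override flags are assumptions added for illustration:

```python
import torch

def lstm_step(x, h, c, W, U, b, occluded=False, lost=False):
    """One manually written LSTM step. W: (4H, D), U: (4H, H), b: (4H,)."""
    z = x @ W.T + h @ U.T + b
    i, f, g, o = z.chunk(4, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    if occluded:                 # occlusion / motion blur: close input gate,
        i = torch.zeros_like(i)  # keep memorizing the previous frame's features
    if lost:                     # tracking failure: close output gate,
        o = torch.zeros_like(o)  # output is empty and tracking can stop
    c = f * c + i * torch.tanh(g)
    h = o * torch.tanh(c)
    return h, c
```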
Any target object detection method provided in the embodiments of the present application may be executed by any appropriate device with data processing capability, including but not limited to: a terminal device, a server, and the like. Alternatively, any target object detection method provided in the embodiments of the present application may be executed by a processor; for example, the processor executes any target object detection method mentioned in the embodiments of the present application by invoking corresponding instructions stored in a memory. This will not be repeated below.
A person of ordinary skill in the art can understand that all or some of the steps for implementing the above method embodiments may be completed by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium; when the program is executed, the steps including the above method embodiments are performed. The aforementioned storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
In addition, this embodiment also discloses a target object detection apparatus. The target object detection apparatus of the embodiments of the present application may be used to implement the above target object detection method embodiments of the present application. Referring to FIG. 5, which is a schematic structural diagram of an embodiment of the target object detection apparatus, the target object detection apparatus of this embodiment includes: a first interest module 100, a feature extraction module 200, a prediction module 300, and a second interest module 400, where:
the first interest module 100 is configured to determine at least one region of interest to be detected in the current frame of a video image sequence, each region of interest at least partially containing information of at least one target object;
the feature extraction module 200 is configured to respectively extract the features of the target objects in the above at least one region of interest of the current frame;
the prediction module 300 is configured to predict the above at least one region of interest of the current frame according to the features of the target objects to obtain a prediction result;
the second interest module 400 is configured to determine the region of interest to be detected of a subsequent frame according to the prediction result of the above at least one region of interest of the current frame.
In an optional embodiment, the prediction result may include: the probability that a region of interest contains a target object and the predicted position of the target object.
In an optional embodiment, the second interest module 400 is configured to use the predicted position of the target object of the current frame as the region of interest to be detected of the subsequent frame.
In an optional embodiment, the second interest module 400 includes: a position obtaining unit, configured to obtain the boundary positions of the regions where the target objects are located in the regions of interest of the current frame; and a position generation unit, configured to weight the boundary positions corresponding to the regions where the target objects are located, to obtain the boundary positions of the target object regions of the subsequent frame corresponding to the regions of interest.
In an optional embodiment, a region of interest of the current frame at least partially contains information of multiple classes of target objects; the position obtaining unit is configured to respectively obtain the boundary positions of the regions where the various classes of target objects in the region of interest are located; and the position generation unit is configured to weight the boundary positions of the regions where the various classes of target objects are located, to obtain the boundary position of the target object region of the subsequent frame corresponding to the region of interest.
In an optional embodiment, a region of interest of the current frame at least partially contains information of multiple classes of target objects; the position obtaining unit is configured to respectively obtain the boundary positions of the regions where the various classes of target objects in the region of interest are located; and the position generation unit is configured to weight the boundary positions of the regions where the various classes of target objects are located and the probabilities that the various classes of target objects are contained in the region of interest of the current frame, to obtain the boundary position of the target object region of the subsequent frame corresponding to the region of interest.
In an optional embodiment, the apparatus further includes: an initial module, configured to set at least one region of interest in the initial frame of the video image sequence based on a preset rule.
In an optional embodiment, the feature extraction module 200 is configured to respectively extract the feature trajectories of the target objects memorized by the regions of interest of the current frame.
In an optional embodiment, a feature trajectory contains: the features of the target objects in the at least one region of interest of the current frame and the feature trajectories of the target objects memorized by the regions of interest of a previous frame of the current frame.
In an optional embodiment, the prediction module is further configured to predict the at least one region of interest through the feature trajectories of the target objects of the current frame to obtain a prediction result.
An embodiment of the present application also discloses a target object detection system, including:
an image obtaining apparatus, configured to obtain video image sequence data of a video image to be detected; a processor, which receives the video image sequence data of the video image to be detected and is configured to execute the operations in the target object detection method of any of the above embodiments of the present application; and a memory, configured to store at least one executable instruction, the executable instruction causing the processor to execute the operations corresponding to the target object detection method of any of the above embodiments of the present application.
This embodiment also discloses a neural network structure for object detection. Referring to FIG. 3, the neural network structure includes:
a cascaded multi-layer neural network, each layer of which is configured to receive one frame of image data in a video image sequence, generate at least one region of interest for the image data, and perform target object detection on the at least one region of interest to obtain a prediction result, the prediction result including the position of the target object; the prediction result of the current layer of the neural network serves as the input of the next layer of the neural network, and the next layer of the neural network generates multiple regions of interest for the image data received by the next layer according to the prediction result of the current layer, and performs target object detection to obtain a prediction result.
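The cascade can be sketched at inference time as a loop in which each layer's prediction result is fed to the next layer as its regions of interest; the function arguments below are placeholders for the components sketched earlier and are assumptions, not names from the patent:

```python
def detect_sequence(frames, initial_rois, backbone_fn, predict_fn, propagate_fn):
    """frames: iterable of frame tensors; initial_rois: proposals for frame 0.
    Each layer's prediction result becomes the next layer's RoI input."""
    rois, results = initial_rois, []
    for frame in frames:
        features = backbone_fn(frame, rois)     # per-RoI features
        probs, boxes = predict_fn(features)     # prediction result
        results.append((probs, boxes))
        rois = propagate_fn(boxes, probs)       # RoIs for the next frame
    return results
```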
An embodiment of the present application also discloses an electronic device, including:
a processor and the target object detection apparatus of any of the above embodiments of the present application;
when the processor runs the target object detection apparatus, the units in the target object detection apparatus of any of the above embodiments of the present application are run.
An embodiment of the present application also discloses another electronic device, including:
a processor and the target object detection system of any of the above embodiments of the present application;
when the processor runs the target object detection system, the units in the target object detection system of any of the above embodiments of the present application are run.
An embodiment of the present application also discloses yet another electronic device, including:
a processor and the neural network structure of any of the above embodiments of the present application;
when the processor runs the neural network structure, the units in the neural network structure of any of the above embodiments of the present application are run.
An embodiment of the present application also discloses still another electronic device, including: one or more processors, a memory, a communication component, and a communication bus, the processor, the memory, and the communication component communicating with one another through the communication bus;
the memory is configured to store at least one executable instruction, the executable instruction causing the processor to execute the operations corresponding to the target object detection method of any of the above embodiments of the present application.
An embodiment of the present application also provides an electronic device, which may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, a server, or an industrial personal computer (IPC). Referring now to FIG. 6, which shows a schematic structural diagram of an electronic device 600 suitable for implementing the terminal device or server of the embodiments of the present application: as shown in FIG. 6, the computer system 600 includes one or more processors, a communication part, and the like. The one or more processors are, for example, one or more central processing units (CPUs) 601 and/or one or more graphics processing units (GPUs) 613, and the processors may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 602 or executable instructions loaded from a storage section 608 into a random access memory (RAM) 603. The communication part 612 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card.
The processor may communicate with the read-only memory 602 and/or the random access memory 603 to execute executable instructions, is connected to the communication part 612 through a bus 604, and communicates with other target devices via the communication part 612, thereby completing the operations corresponding to any method provided in the embodiments of the present application, for example: an operation of determining at least one region of interest to be detected in the current frame of a video image sequence, each region of interest at least partially containing information of at least one target object; an operation of respectively extracting the features of the target objects in the regions of interest of the current frame; an operation of predicting each region of interest of the current frame according to the features of the target objects to obtain a prediction result; an operation of determining the region of interest to be detected of a subsequent frame according to the prediction results of the regions of interest of the current frame; and so on.
In addition, the RAM 603 may also store various programs and data required for the operation of the apparatus. The CPU 601, the ROM 602, and the RAM 603 are connected to one another through the bus 604. Where a RAM 603 is present, the ROM 602 is an optional module. The RAM 603 stores executable instructions, or writes executable instructions into the ROM 602 at runtime, and the executable instructions cause the processor 601 to execute the operations corresponding to the above communication method. An input/output (I/O) interface 605 is also connected to the bus 604. The communication part 612 may be integrated, or may be configured to have multiple sub-modules (for example, multiple IB network cards) linked on the bus.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it is installed into the storage section 608 as needed.
It should be noted that the architecture shown in FIG. 6 is only an optional implementation; in optional practice, the number and types of the components in FIG. 6 may be selected, reduced, increased, or replaced according to actual needs; for the arrangement of different functional components, separate or integrated arrangements may also be adopted; for example, the GPU and the CPU may be arranged separately, or the GPU may be integrated on the CPU, and the communication part may be arranged separately or integrated on the CPU or GPU, and so on. These alternative implementations all fall within the protection scope disclosed in the present application.
An embodiment of the present application also provides a computer program, including computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the steps in the target object detection method of any of the above embodiments of the present application. In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the embodiments of the present disclosure include a computer program product, which includes a computer program tangibly contained on a machine-readable medium; the computer program contains program code for executing the method shown in the flowchart, and the program code may include instructions corresponding to the method steps provided in the embodiments of the present application, for example: an instruction for determining at least one region of interest to be detected in the current frame of a video image sequence, each region of interest at least partially containing information of at least one target object; an instruction for respectively extracting the features of the target objects in the regions of interest of the current frame; an instruction for predicting each region of interest of the current frame according to the features of the target objects to obtain a prediction result; an instruction for determining the region of interest to be detected of a subsequent frame according to the prediction results of the regions of interest of the current frame; and so on. In such embodiments, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the method of the present application are executed.
An embodiment of the present application also provides a computer-readable storage medium for storing computer-readable instructions; when executed, the instructions implement the operations of the steps in the target object detection method of any of the above embodiments of the present application.
The methods, apparatuses, and devices of the present application may be implemented in many ways. For example, the methods, apparatuses, and devices of the present application may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the methods is for illustration only, and the steps of the methods of the present application are not limited to the order optionally described above, unless otherwise specified in another manner. In addition, in some embodiments, the present application may also be implemented as programs recorded in a recording medium, these programs including machine-readable instructions for implementing the methods according to the present application. Thus, the present application also covers recording media storing programs for executing the methods according to the present application.
The embodiments in this specification are all described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts between the embodiments, reference may be made to one another. As for the system embodiments, since they basically correspond to the method embodiments, their description is relatively simple, and for related parts, reference may be made to the description of the method embodiments.
The description of the present application is given for the sake of example and description, and is not exhaustive or intended to limit the present application to the disclosed forms. Many modifications and variations are obvious to a person of ordinary skill in the art. The embodiments were selected and described in order to better illustrate the principles and practical applications of the present application, and to enable a person of ordinary skill in the art to understand the present application so as to design various embodiments with various modifications suited to particular uses.

Claims (28)

  1. A target object detection method, characterized by comprising the following steps:
    determining at least one region of interest to be detected in a current frame of a video image sequence, each of the regions of interest at least partially containing information of at least one target object;
    respectively extracting features of target objects in the at least one region of interest of the current frame;
    predicting the at least one region of interest of the current frame according to the features of the target objects, to obtain a prediction result;
    determining, according to the prediction result, a region of interest to be detected of a subsequent frame.
  2. The target object detection method according to claim 1, characterized in that the prediction result comprises: a probability that the region of interest contains a target object and a predicted position of the target object.
  3. The target object detection method according to claim 2, characterized in that the determining, according to the prediction result, a region of interest to be detected of a subsequent frame comprises:
    using the predicted position of the target object of the current frame as the region of interest to be detected of the subsequent frame.
  4. The target object detection method according to any one of claims 1-3, characterized in that the determining, according to the prediction result, a region of interest to be detected of a subsequent frame comprises:
    obtaining boundary positions of regions where target objects are located in the region of interest of the current frame;
    weighting the boundary positions corresponding to the regions where the target objects are located, to obtain boundary positions of target object regions of the subsequent frame corresponding to the region of interest of the current frame.
  5. The target object detection method according to claim 4, characterized in that the region of interest of the current frame at least partially contains information of multiple target objects;
    the obtaining boundary positions of regions where target objects are located in the region of interest of the current frame comprises: respectively obtaining boundary positions of regions where the various classes of target objects in the region of interest are located;
    the weighting the boundary positions corresponding to the regions where the target objects are located, to obtain boundary positions of target object regions of the subsequent frame corresponding to the region of interest, comprises: weighting the boundary positions of the regions where the various classes of target objects are located, to obtain the boundary positions of the target object regions of the subsequent frame corresponding to the region of interest.
  6. The target object detection method according to claim 4, characterized in that the region of interest of the current frame at least partially contains information of multiple target objects;
    the obtaining boundary positions of regions where target objects are located in the region of interest of the current frame comprises: respectively obtaining boundary positions of regions where the various classes of target objects in the region of interest are located;
    the weighting the boundary positions corresponding to the regions where the target objects are located, to obtain boundary positions of target object regions of the subsequent frame corresponding to the region of interest, comprises:
    weighting the boundary positions of the regions where the various classes of target objects are located and probabilities that the various classes of target objects are contained in the region of interest of the current frame, to obtain the boundary positions of the target object regions of the subsequent frame corresponding to the region of interest.
  7. The target object detection method according to any one of claims 1-6, characterized by further comprising: setting the at least one region of interest in an initial frame of the video image sequence based on a preset rule.
  8. The target object detection method according to any one of claims 1-7, characterized in that respectively extracting features of target objects in the at least one region of interest of the current frame comprises: respectively extracting feature trajectories of target objects memorized by the at least one region of interest of the current frame.
  9. The target object detection method according to claim 8, characterized in that the feature trajectory comprises: features of the target objects in the at least one region of interest of the current frame and feature trajectories of the target objects memorized by a region of interest of a previous frame of the current frame.
  10. The target object detection method according to claim 8 or 9, characterized in that the predicting the at least one region of interest according to the features of the target objects to obtain a prediction result comprises: predicting the at least one region of interest through the feature trajectories of the target objects of the current frame to obtain the prediction result.
  11. A target object detection apparatus, characterized by comprising:
    a first interest module, configured to determine at least one region of interest to be detected in a current frame of a video image sequence, each of the regions of interest at least partially containing information of at least one target object;
    a feature extraction module, configured to respectively extract features of target objects in the at least one region of interest of the current frame;
    a prediction module, configured to predict the at least one region of interest of the current frame according to the features of the target objects, to obtain a prediction result;
    a second interest module, configured to determine, according to the prediction result, a region of interest to be detected of a subsequent frame.
  12. The target object detection apparatus according to claim 11, characterized in that the prediction result comprises: a probability that the region of interest contains a target object and a predicted position of the target object.
  13. The target object detection apparatus according to claim 12, characterized in that the second interest module is configured to use the predicted position of the target object of the current frame as the region of interest to be detected of the subsequent frame.
  14. The target object detection apparatus according to any one of claims 11-13, characterized in that the second interest module comprises:
    a position obtaining unit, configured to obtain boundary positions of regions where target objects are located in the region of interest of the current frame;
    a position generation unit, configured to weight the boundary positions corresponding to the regions where the target objects are located, to obtain boundary positions of target object regions of the subsequent frame corresponding to the region of interest.
  15. The target object detection apparatus according to claim 14, characterized in that the region of interest of the current frame at least partially contains information of multiple target objects;
    the position obtaining unit is configured to respectively obtain boundary positions of regions where the various classes of target objects in the region of interest are located;
    the position generation unit is configured to weight the boundary positions of the regions where the various classes of target objects are located, to obtain the boundary positions of the target object regions of the subsequent frame corresponding to the region of interest.
  16. The target object detection apparatus according to claim 14, characterized in that the region of interest of the current frame at least partially contains information of multiple target objects;
    the position obtaining unit is configured to respectively obtain boundary positions of regions where the various classes of target objects in the region of interest are located;
    the position generation unit is configured to weight the boundary positions of the regions where the various classes of target objects are located and probabilities that the various classes of target objects are contained in the region of interest of the current frame, to obtain the boundary positions of the target object regions of the subsequent frame corresponding to the region of interest.
  17. The target object detection apparatus according to any one of claims 11-16, characterized by further comprising:
    an initial module, configured to set the at least one region of interest in an initial frame of the video image sequence based on a preset rule.
  18. The target object detection apparatus according to any one of claims 11-17, characterized in that the feature extraction module is configured to respectively extract feature trajectories of target objects memorized by the at least one region of interest of the current frame.
  19. The target object detection apparatus according to claim 18, characterized in that the feature trajectory comprises: features of the target objects in the at least one region of interest of the current frame and feature trajectories of the target objects memorized by a region of interest of a previous frame of the current frame.
  20. The target object detection apparatus according to claim 18 or 19, characterized in that the prediction module is further configured to predict the at least one region of interest through the feature trajectories of the target objects of the current frame to obtain a prediction result.
  21. A target object detection system, characterized by comprising:
    an image obtaining apparatus, configured to obtain video image sequence data of a video image to be detected;
    a processor, configured to receive the video image sequence data of the video image to be detected and to execute the operations in the method according to any one of claims 1-10;
    a memory, configured to store at least one executable instruction, the executable instruction causing the processor to execute the operations corresponding to the method according to any one of claims 1-10.
  22. A neural network structure for target object detection, characterized by comprising:
    a cascaded multi-layer neural network, each layer of which is configured to receive one frame of image data in a video image sequence, generate at least one region of interest for the image data, and perform target object detection on the at least one region of interest to obtain a prediction result, the prediction result comprising a position of the target object;
    the prediction result of a current layer of the neural network serves as an input of a next layer of the neural network, and the next layer of the neural network generates at least one region of interest for the image data received by the next layer according to the prediction result of the current layer, and performs target object detection to obtain a prediction result.
  23. An electronic device, characterized by comprising:
    a processor and the target object detection apparatus according to any one of claims 11-20;
    when the processor runs the target object detection apparatus, the units in the target object detection apparatus according to any one of claims 11-20 are run.
  24. An electronic device, characterized by comprising:
    a processor and the target object detection system according to claim 21;
    when the processor runs the target object detection system, the units in the target object detection system according to claim 21 are run.
  25. An electronic device, characterized by comprising:
    a processor and the neural network structure according to claim 22;
    when the processor runs the neural network structure, the units in the neural network structure according to claim 22 are run.
  26. An electronic device, characterized by comprising: one or more processors, a memory, a communication component, and a communication bus, the processor, the memory, and the communication component communicating with one another through the communication bus;
    the memory is configured to store at least one executable instruction, the executable instruction causing the processor to execute the operations corresponding to the target object detection method according to any one of claims 1-10.
  27. A computer program, comprising computer-readable code, characterized in that, when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the steps in the target object detection method according to any one of claims 1-10.
  28. A computer-readable storage medium, for storing computer-readable instructions, characterized in that, when executed, the instructions implement the operations of the steps in the target object detection method according to any one of claims 1-10.
PCT/CN2017/110953 2016-11-15 2017-11-14 Target object detection method, apparatus and system, and neural network structure WO2018090912A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611013117.9A CN108073864B (zh) 2016-11-15 2016-11-15 Target object detection method, apparatus and system, and neural network structure
CN201611013117.9 2016-11-15

Publications (1)

Publication Number Publication Date
WO2018090912A1 true WO2018090912A1 (zh) 2018-05-24

Family

ID=62146084

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/110953 WO2018090912A1 (zh) 2016-11-15 2017-11-14 Target object detection method, apparatus and system, and neural network structure

Country Status (2)

Country Link
CN (1) CN108073864B (zh)
WO (1) WO2018090912A1 (zh)


Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108810538B (zh) * 2018-06-08 2022-04-05 腾讯科技(深圳)有限公司 Video encoding method and apparatus, terminal, and storage medium
CN108900804B (zh) * 2018-07-09 2020-11-03 南通世盾信息技术有限公司 Adaptive video stream processing method based on video entropy
US11514585B2 (en) * 2018-09-17 2022-11-29 Nokia Solutions And Networks Oy Object tracking
CN109948611B (zh) * 2019-03-14 2022-07-08 腾讯科技(深圳)有限公司 Information region determination method, and information display method and apparatus
CN112285111A (zh) * 2019-07-09 2021-01-29 株洲变流技术国家工程研究中心有限公司 Pantograph front carbon contact strip defect detection method, apparatus, system, and medium
CN110955243B (zh) * 2019-11-28 2023-10-20 新石器慧通(北京)科技有限公司 Travel control method, apparatus and device, readable storage medium, and mobile device
CN111447449B (zh) * 2020-04-01 2022-05-06 北京奥维视讯科技有限责任公司 ROI-based video encoding method and system, and video transmission and encoding system
CN111626263B (zh) * 2020-06-05 2023-09-05 北京百度网讯科技有限公司 Video region-of-interest detection method, apparatus, device, and medium
CN112017155B (zh) * 2020-07-13 2023-12-26 浙江华锐捷技术有限公司 Health sign data measurement method, apparatus, system, and storage medium
CN112348894B (zh) * 2020-11-03 2022-07-29 中冶赛迪重庆信息技术有限公司 Scrap steel wagon position and state recognition method, system, device, and medium
CN112733650B (zh) * 2020-12-29 2024-05-07 深圳云天励飞技术股份有限公司 Target face detection method, apparatus, terminal device, and storage medium
CN113723305A (zh) * 2021-08-31 2021-11-30 北京百度网讯科技有限公司 Image and video detection method, apparatus, electronic device, and medium
CN115511818B (zh) * 2022-09-21 2023-06-13 北京医准智能科技有限公司 Optimization method, apparatus, device, and storage medium for a pulmonary nodule detection model
CN116614631B (zh) * 2023-05-17 2024-03-19 北京百度网讯科技有限公司 Video processing method, apparatus, device, and medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101295405A (zh) * 2008-06-13 2008-10-29 西北工业大学 Person and vehicle recognition, alarm, and tracking method
CN102214359A (zh) * 2010-04-07 2011-10-12 北京智安邦科技有限公司 Target tracking apparatus and method based on hierarchical feature matching
CN102646279A (zh) * 2012-02-29 2012-08-22 北京航空航天大学 Anti-occlusion tracking method combining motion prediction with multi-sub-block template matching
CN103324977A (zh) * 2012-03-21 2013-09-25 日电(中国)有限公司 Target quantity detection method and device
CN104200495A (zh) * 2014-09-25 2014-12-10 重庆信科设计有限公司 Multi-target tracking method in video surveillance

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101739551B (zh) * 2009-02-11 2012-04-18 北京智安邦科技有限公司 Moving target recognition method and system
CN101699862B (zh) * 2009-11-16 2011-04-13 上海交通大学 Method for a PTZ camera to acquire a high-resolution image of a region of interest
JP2012244437A (ja) * 2011-05-19 2012-12-10 Canon Inc Image processing apparatus and image processing method


Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111127510A (zh) * 2018-11-01 2020-05-08 杭州海康威视数字技术股份有限公司 Target object position prediction method and apparatus
CN111127510B (zh) * 2018-11-01 2023-10-27 杭州海康威视数字技术股份有限公司 Target object position prediction method and apparatus
CN111353597A (zh) * 2018-12-24 2020-06-30 杭州海康威视数字技术股份有限公司 Target detection neural network training method and apparatus
CN111353597B (zh) * 2018-12-24 2023-12-05 杭州海康威视数字技术股份有限公司 Target detection neural network training method and apparatus
CN111860533B (zh) * 2019-04-30 2023-12-12 深圳数字生命研究院 Image recognition method and apparatus, storage medium, and electronic apparatus
CN111860533A (zh) * 2019-04-30 2020-10-30 深圳数字生命研究院 Image recognition method and apparatus, storage medium, and electronic apparatus
CN110246160B (zh) * 2019-06-20 2022-12-06 腾讯科技(深圳)有限公司 Video target detection method, apparatus, device, and medium
CN110246160A (zh) * 2019-06-20 2019-09-17 腾讯科技(深圳)有限公司 Video target detection method, apparatus, device, and medium
CN113538517B (zh) * 2019-06-25 2024-04-12 北京市商汤科技开发有限公司 Target tracking method and apparatus, electronic device, and storage medium
CN113538517A (zh) * 2019-06-25 2021-10-22 北京市商汤科技开发有限公司 Target tracking method and apparatus, electronic device, and storage medium
CN110516528A (zh) * 2019-07-08 2019-11-29 杭州电子科技大学 Moving target detection and tracking method under a moving background
CN110472728A (zh) * 2019-07-30 2019-11-19 腾讯科技(深圳)有限公司 Target information determination method and apparatus, medium, and electronic device
CN111241340B (zh) * 2020-01-17 2023-09-08 Oppo广东移动通信有限公司 Video tag determination method, apparatus, terminal, and storage medium
CN111241340A (zh) * 2020-01-17 2020-06-05 Oppo广东移动通信有限公司 Video tag determination method, apparatus, terminal, and storage medium
CN111582060B (zh) * 2020-04-20 2023-04-18 浙江大华技术股份有限公司 Automatic line-drawing perimeter alarm method, computer device, and storage apparatus
CN111582060A (zh) * 2020-04-20 2020-08-25 浙江大华技术股份有限公司 Automatic line-drawing perimeter alarm method, computer device, and storage apparatus
CN111986126A (zh) * 2020-07-17 2020-11-24 浙江工业大学 Multi-target detection method based on an improved VGG16 network
CN112528932A (zh) * 2020-12-22 2021-03-19 北京百度网讯科技有限公司 Method and apparatus for optimizing position information, roadside device, and cloud control platform
CN112528932B (zh) * 2020-12-22 2023-12-08 阿波罗智联(北京)科技有限公司 Method and apparatus for optimizing position information, roadside device, and cloud control platform
CN113011398A (zh) * 2021-04-28 2021-06-22 北京邮电大学 Target change detection method and apparatus for multi-temporal remote sensing images
CN115719468A (zh) * 2023-01-10 2023-02-28 清华大学 Image processing method, apparatus, and device

Also Published As

Publication number Publication date
CN108073864B (zh) 2021-03-09
CN108073864A (zh) 2018-05-25

Similar Documents

Publication Publication Date Title
WO2018090912A1 (zh) Target object detection method, apparatus and system, and neural network structure
US11798271B2 (en) Depth and motion estimations in machine learning environments
US10796452B2 (en) Optimizations for structure mapping and up-sampling
US10733431B2 (en) Systems and methods for optimizing pose estimation
US10692243B2 (en) Optimizations for dynamic object instance detection, segmentation, and structure mapping
WO2018153323A1 (zh) Method and apparatus for detecting object in video, and electronic device
EP3509014A1 (en) Detecting objects in images
WO2019137104A1 (zh) Deep learning-based recommendation method and apparatus, electronic device, medium, and program
KR20220062338A (ko) Hand pose estimation from stereo cameras
US10572072B2 (en) Depth-based touch detection
CN108399383B (zh) Expression transfer method and apparatus, storage medium, and program
WO2019105337A1 (zh) Video-based face recognition method, apparatus, device, medium, and program
US10762644B1 (en) Multiple object tracking in video by combining neural networks within a bayesian framework
WO2018054329A1 (zh) Object detection method and apparatus, electronic device, computer program, and storage medium
EP3493105A1 (en) Optimizations for dynamic object instance detection, segmentation, and structure mapping
US20160343146A1 (en) Real-time object analysis with occlusion handling
US10037624B2 (en) Calibrating object shape
EP3493106A1 (en) Optimizations for dynamic object instance detection, segmentation, and structure mapping
US20210166402A1 (en) Adaptive object tracking policy
CN110163171B (zh) Method and apparatus for recognizing face attributes
CN112949512B (zh) Dynamic gesture recognition method, gesture interaction method, and interaction system
US11544498B2 (en) Training neural networks using consistency measures
Ait Abdelali et al. An adaptive object tracking using Kalman filter and probability product kernel
McLeod et al. Globally optimal event-based divergence estimation for ventral landing
JP2021089778A (ja) Information processing apparatus, information processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17871713

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 04.09.2019)

122 Ep: pct application non-entry in european phase

Ref document number: 17871713

Country of ref document: EP

Kind code of ref document: A1