WO2019080747A1 - Target tracking and neural network training method, apparatus, storage medium, and electronic device

Target tracking and neural network training method, apparatus, storage medium, and electronic device

Info

Publication number
WO2019080747A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
target object
detected
neural network
sample image
Prior art date
Application number
PCT/CN2018/110433
Other languages
English (en)
French (fr)
Inventor
李博
武伟
Original Assignee
北京市商汤科技开发有限公司
Priority date
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司 filed Critical 北京市商汤科技开发有限公司
Publication of WO2019080747A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving reference images or patches
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30241 Trajectory

Definitions

  • the embodiments of the present invention relate to the field of artificial intelligence technologies, and in particular, to a target tracking method, apparatus, storage medium, and electronic device, and a neural network training method, apparatus, storage medium, and electronic device.
  • Target tracking technology is an important part of intelligent video surveillance technology. For a still image, only the position of the bounding box of an object in the still image needs to be detected; for video, however, after the position of the bounding box of the object has been detected in each video frame, the bounding boxes of objects across frames must be matched to determine the trajectory of the target object.
  • the embodiments of the present application provide a technical solution for target tracking and a technical solution for neural network training.
  • a target tracking method includes: acquiring, by a first neural network, position data of a target object in a non-detected image according to a detected image and the non-detected image in a sequence of video frames containing the target object, where the first neural network is used to regress the position of the target object in the non-detected image according to the detected image, and the non-detected image is a subsequent image of the detected image; and determining a trajectory of the target object according to the position data of the target object in the detected image and the position data of the target object in the non-detected image.
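  • For illustration only, the following Python sketch shows the tracking flow just described; detect_position and regress_position are hypothetical placeholders for a position detector and for the first neural network, not names used in this application.

```python
# Illustrative sketch only: detect_position(frame) stands in for a detector that yields the
# target object's bounding box in a detected image, and regress_position(prev_frame, prev_box,
# frame) stands in for the first neural network, which regresses the box in a subsequent
# non-detected image from the detected image. Both callables are assumptions.

def track(frames, detect_position, regress_position):
    """Return the trajectory: one bounding box (length, width, cx, cy) per frame."""
    trajectory = []
    prev_frame, prev_box = None, None
    for i, frame in enumerate(frames):
        if i == 0:
            box = detect_position(frame)  # detected image: box comes from the detector
        else:
            # non-detected image: box is regressed from the preceding image and its box
            box = regress_position(prev_frame, prev_box, frame)
        trajectory.append(box)
        prev_frame, prev_box = frame, box
    return trajectory
```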
  • the acquiring, by the first neural network, the position data of the target object in the non-detected image according to the detected image and the non-detected image in the sequence of video frames containing the target object includes: acquiring, by the first neural network, position data of the target object in a first non-detected image according to the detected image in the sequence of video frames and the first non-detected image following the detected image.
  • the method further includes: acquiring, by the first neural network, position data of the target object in a second non-detected image according to the first non-detected image in the sequence of video frames and the second non-detected image following the first non-detected image.
  • before the acquiring, by the first neural network, the position data of the target object in the non-detected image according to the detected image and the non-detected image in the sequence of video frames containing the target object, the method further includes: cropping the detected image and the non-detected image respectively according to the position data of the target object in the detected image to obtain a first region image corresponding to the detected image and a second region image corresponding to the non-detected image, where both the first region image and the second region image contain the target object; and the acquiring, by the first neural network, the position data of the target object in the non-detected image includes: acquiring, by the first neural network, position data of the target object in the second region image according to the first region image and the second region image containing the target object.
  • the method further includes: dividing the sequence of video frames into multiple groups of video frames in chronological order, each group of video frames including at least one video frame; for each of the multiple groups of video frames, acquiring position data of the target object from the first video frame of the group, and acquiring, by the first neural network, position data of the target object in the video frames following the first video frame, thereby obtaining position data of the target object in the at least one video frame of the group; and determining the trajectory of the target object according to the position data of the target object in the at least one video frame of each of the multiple groups of video frames.
  • the acquiring the position data of the target object from the first video frame includes: acquiring, by a second neural network for target position detection, the position data of the target object from the first video frame, where the second neural network includes a Faster R-CNN.
  • before the acquiring, by the first neural network, the position data of the target object in the non-detected image according to the detected image and the non-detected image in the sequence of video frames containing the target object, the method further includes: determining, according to the category of the target object, the first neural network corresponding to that category.
  • before the acquiring, by the first neural network, the position data of the target object in the non-detected image according to the detected image and the non-detected image in the sequence of video frames containing the target object, the method further includes: training the first neural network according to a detected sample image and a non-detected sample image in a sequence of video frame samples containing the target object, where the non-detected sample image is a subsequent image of the detected sample image.
  • the training the first neural network according to the detected sample image and the non-detected sample image in the sequence of video frame samples containing the target object includes: acquiring, by the first neural network to be trained, position data of the target object in the non-detected sample image according to the detected sample image and the non-detected sample image in the sequence of video frame samples containing the target object; determining second position offset data of the target object between the detected sample image and the non-detected sample image according to the position data of the target object in the detected sample image and the position data of the target object in the non-detected sample image; and training the first neural network according to first position offset data and the second position offset data, where the first position offset data is a standard position offset of the target object between the detected sample image and the non-detected sample image.
  • before the acquiring, by the first neural network to be trained, the position data of the target object in the non-detected sample image according to the detected sample image and the non-detected sample image in the sequence of video frame samples containing the target object, the method further includes: cropping the detected sample image and the non-detected sample image respectively according to the position data of the target object in the detected sample image to obtain a third region image corresponding to the detected sample image and a fourth region image corresponding to the non-detected sample image, where both the third region image and the fourth region image contain the target object; and the acquiring, by the first neural network to be trained, the position data of the target object in the non-detected sample image includes: acquiring, by the first neural network to be trained, position data of the target object in the fourth region image according to the third region image and the fourth region image containing the target object.
  • the first neural network to be trained includes a convolution layer, a concatenation (splicing) layer connected to the end of the convolution layer, and a fully connected layer connected to the end of the concatenation layer, where the acquiring, by the first neural network to be trained, the position data of the target object in the fourth region image according to the third region image and the fourth region image containing the target object includes: performing feature extraction on the third region image and the fourth region image by the convolution layer to obtain position feature vectors of the target object in the third region image and the fourth region image; concatenating, by the concatenation layer, the position feature vectors of the target object in the third region image and the fourth region image to obtain a concatenated position feature vector; and performing a mapping operation on the concatenated position feature vector by the fully connected layer to obtain the position data of the target object in the fourth region image.
  • before the acquiring, by the first neural network to be trained, the position data of the target object in the non-detected sample image according to the detected sample image and the non-detected sample image in the sequence of video frame samples containing the target object, the method further includes: determining the first position offset data according to the position data of the target object in the detected sample image and position calibration data of the target object in the non-detected sample image.
  • the position data includes a length, a width, and center position coordinates of the bounding box of the target object.
  • a neural network training method includes: acquiring, by a neural network to be trained, position data of a target object in a non-detected sample image according to a detected sample image and the non-detected sample image in a sequence of video frame samples containing the target object, where the non-detected sample image is a subsequent image of the detected sample image; determining second position offset data of the target object between the detected sample image and the non-detected sample image according to the position data of the target object in the detected sample image and the position data of the target object in the non-detected sample image; and training the neural network according to first position offset data and the second position offset data, where the first position offset data is a standard position offset of the target object between the detected sample image and the non-detected sample image.
  • before the acquiring, by the neural network to be trained, the position data of the target object in the non-detected sample image according to the detected sample image and the non-detected sample image in the sequence of video frame samples containing the target object, the method further includes: cropping the detected sample image and the non-detected sample image respectively according to the position data of the target object in the detected sample image to obtain a third region image corresponding to the detected sample image and a fourth region image corresponding to the non-detected sample image, where both the third region image and the fourth region image contain the target object; and the acquiring, by the neural network to be trained, the position data of the target object in the non-detected sample image includes: acquiring, by the neural network to be trained, position data of the target object in the fourth region image according to the third region image and the fourth region image containing the target object.
  • the neural network to be trained includes a convolution layer, a concatenation layer connected to the end of the convolution layer, and a fully connected layer connected to the end of the concatenation layer, where the acquiring, by the neural network to be trained, the position data of the target object in the fourth region image according to the third region image and the fourth region image containing the target object includes: performing feature extraction on the third region image and the fourth region image by the convolution layer to obtain position feature vectors of the target object in the third region image and the fourth region image; concatenating, by the concatenation layer, the position feature vectors of the target object in the third region image and the fourth region image to obtain a concatenated position feature vector; and performing a mapping operation on the concatenated position feature vector by the fully connected layer to obtain the position data of the target object in the fourth region image.
  • before the acquiring, by the neural network to be trained, the position data of the target object in the non-detected sample image according to the detected sample image and the non-detected sample image in the sequence of video frame samples containing the target object, the method further includes: determining the first position offset data according to the position data of the target object in the detected sample image and position calibration data of the target object in the non-detected sample image.
  • the position data includes a length, a width, and center position coordinates of the bounding box of the target object.
  • a target tracking apparatus includes: a first acquiring module, configured to acquire, by a first neural network, position data of a target object in a non-detected image according to a detected image and the non-detected image in a sequence of video frames containing the target object, where the first neural network is used to regress the position of the target object in the non-detected image according to the detected image, and the non-detected image is a subsequent image of the detected image; and a first determining module, configured to determine a trajectory of the target object according to the position data of the target object in the detected image and the position data of the target object in the non-detected image.
  • the first acquiring module includes: a first acquiring submodule, configured to acquire, by the first neural network, position data of the target object in a first non-detected image according to the detected image in the sequence of video frames and the first non-detected image following the detected image.
  • the apparatus further includes: a second acquiring module, configured to acquire, by the first neural network, position data of the target object in a second non-detected image according to the first non-detected image in the sequence of video frames and the second non-detected image following the first non-detected image.
  • before the operation of the first acquiring module, the apparatus further includes: a first cropping module, configured to crop the detected image and the non-detected image respectively according to the position data of the target object in the detected image to obtain a first region image corresponding to the detected image and a second region image corresponding to the non-detected image, where both the first region image and the second region image contain the target object; and the first acquiring module includes: a second acquiring submodule, configured to acquire, by the first neural network, position data of the target object in the second region image according to the first region image and the second region image containing the target object.
  • the apparatus further includes: a dividing module, configured to divide the sequence of video frames into multiple groups of video frames in chronological order, each group of video frames including at least one video frame; a third acquiring module, configured to, for each of the multiple groups of video frames, acquire position data of the target object from the first video frame of the group and acquire, by the first neural network, position data of the target object in the video frames following the first video frame, thereby obtaining position data of the target object in the at least one video frame of the group; and a second determining module, configured to determine the trajectory of the target object according to the position data of the target object in the at least one video frame of each of the multiple groups of video frames.
  • the third acquiring module includes: a third acquiring submodule, configured to acquire, by a second neural network for target position detection, the position data of the target object from the first video frame, where the second neural network includes a Faster R-CNN.
  • before the operation of the first acquiring module, the apparatus further includes: a selecting module, configured to determine, according to the category of the target object, the first neural network corresponding to that category.
  • before the operation of the first acquiring module, the apparatus further includes: a first training module, configured to train the first neural network according to a detected sample image and a non-detected sample image in a sequence of video frame samples containing the target object, where the non-detected sample image is a subsequent image of the detected sample image.
  • the first training module includes: a fourth acquiring submodule, configured to acquire, by the first neural network to be trained, position data of the target object in the non-detected sample image according to the detected sample image and the non-detected sample image in the sequence of video frame samples containing the target object; a first determining submodule, configured to determine second position offset data of the target object between the detected sample image and the non-detected sample image according to the position data of the target object in the detected sample image and the position data of the target object in the non-detected sample image; and a first training submodule, configured to train the first neural network according to first position offset data and the second position offset data, where the first position offset data is a standard position offset of the target object between the detected sample image and the non-detected sample image.
  • before the operation of the fourth acquiring submodule, the apparatus further includes: a first cropping submodule, configured to crop the detected sample image and the non-detected sample image respectively according to the position data of the target object in the detected sample image to obtain a third region image corresponding to the detected sample image and a fourth region image corresponding to the non-detected sample image, where both the third region image and the fourth region image contain the target object; and the fourth acquiring submodule includes: an acquiring unit, configured to acquire, by the first neural network to be trained, position data of the target object in the fourth region image according to the third region image and the fourth region image containing the target object.
  • the first neural network to be trained includes a convolution layer, a concatenation layer connected to the end of the convolution layer, and a fully connected layer connected to the end of the concatenation layer, where the acquiring unit is configured to: perform feature extraction on the third region image and the fourth region image by the convolution layer to obtain position feature vectors of the target object in the third region image and the fourth region image; concatenate, by the concatenation layer, the position feature vectors of the target object in the third region image and the fourth region image to obtain a concatenated position feature vector; and perform a mapping operation on the concatenated position feature vector by the fully connected layer to obtain the position data of the target object in the fourth region image.
  • before the operation of the fourth acquiring submodule, the apparatus further includes: a second determining submodule, configured to determine the first position offset data according to the position data of the target object in the detected sample image and position calibration data of the target object in the non-detected sample image.
  • the position data includes a length, a width, and center position coordinates of the bounding box of the target object.
  • a neural network training apparatus includes: a fourth acquiring module, configured to acquire, by a neural network to be trained, position data of a target object in a non-detected sample image according to a detected sample image and the non-detected sample image in a sequence of video frame samples containing the target object, where the non-detected sample image is a subsequent image of the detected sample image; a third determining module, configured to determine second position offset data of the target object between the detected sample image and the non-detected sample image according to the position data of the target object in the detected sample image and the position data of the target object in the non-detected sample image; and a second training module, configured to train the neural network according to first position offset data and the second position offset data, where the first position offset data is a standard position offset of the target object between the detected sample image and the non-detected sample image.
  • before the operation of the fourth acquiring module, the apparatus further includes: a second cropping module, configured to crop the detected sample image and the non-detected sample image respectively according to the position data of the target object in the detected sample image to obtain a third region image corresponding to the detected sample image and a fourth region image corresponding to the non-detected sample image, where both the third region image and the fourth region image contain the target object; and the fourth acquiring module includes: a fifth acquiring submodule, configured to acquire, by the neural network to be trained, position data of the target object in the fourth region image according to the third region image and the fourth region image containing the target object.
  • the neural network to be trained includes a convolution layer, a concatenation layer connected to the end of the convolution layer, and a fully connected layer connected to the end of the concatenation layer, where the fifth acquiring submodule is configured to: perform feature extraction on the third region image and the fourth region image by the convolution layer to obtain position feature vectors of the target object in the third region image and the fourth region image; concatenate, by the concatenation layer, the position feature vectors to obtain a concatenated position feature vector; and perform a mapping operation on the concatenated position feature vector by the fully connected layer to obtain the position data of the target object in the fourth region image.
  • before the operation of the fourth acquiring module, the apparatus further includes: a fourth determining module, configured to determine the first position offset data according to the position data of the target object in the detected sample image and position calibration data of the target object in the non-detected sample image.
  • the position data includes a length, a width, and center position coordinates of the bounding box of the target object.
  • a computer readable storage medium has computer program instructions stored thereon, where the program instructions, when executed by a processor, implement the steps of the target tracking method according to the first aspect of the embodiments of the present application, or the steps of the neural network training method according to the second aspect of the embodiments of the present application.
  • a computer program product includes computer program instructions, where the program instructions, when executed by a processor, implement the steps of the target tracking method according to the first aspect of the embodiments of the present application, or the steps of the neural network training method according to the second aspect of the embodiments of the present application.
  • a computer program includes computer readable code, where when the computer readable code runs on a device, a processor in the device executes instructions for implementing the steps of the target tracking method according to the first aspect of the embodiments of the present application; or the processor in the device executes instructions for implementing the steps of the neural network training method according to the second aspect of the embodiments of the present application.
  • an electronic device includes: a processor and a memory
  • the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the target tracking method according to the first aspect of the embodiments of the present application; or the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the neural network training method according to the second aspect of the embodiments of the present application.
  • an electronic device includes: a processor and the target tracking apparatus according to the third aspect of the embodiments of the present application, where when the processor runs the target tracking apparatus, the modules in the target tracking apparatus according to the third aspect are operated; or a processor and the neural network training apparatus according to the fourth aspect of the embodiments of the present application, where when the processor runs the neural network training apparatus, the modules in the neural network training apparatus according to the fourth aspect are operated.
  • According to the embodiments of the present application, the position data of the target object in the non-detected image is acquired, according to the detected image and the non-detected image in the sequence of video frames containing the target object, by the first neural network used to regress the position of the target object in the non-detected image according to the detected image.
  • The embodiments of the present application can therefore regress the position of the target object in the non-detected image from the detected image, which improves the detection efficiency of target tracking while improving the accuracy of target tracking.
  • FIG. 1 is a schematic flow chart of an embodiment of a target tracking method according to an embodiment of the present application.
  • FIG. 2 is a flow chart of another embodiment of a target tracking method according to an embodiment of the present application.
  • FIG. 3 is a schematic flow chart of an embodiment of a neural network training method according to an embodiment of the present application.
  • FIG. 4 is a flow chart showing another embodiment of a neural network training method according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an embodiment of an object tracking apparatus according to an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of another embodiment of a target tracking device according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of still another embodiment of a target tracking device according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an embodiment of a neural network training apparatus according to an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of another embodiment of a neural network training apparatus according to an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of an embodiment of an electronic device according to an embodiment of the present application.
  • Embodiments of the present application can be applied to computer systems/servers that can operate with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known computing systems, environments, and/or configurations suitable for use with the computer system/server include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, small computer systems, mainframe computer systems, and distributed cloud computing environments including any of the above.
  • the computer system/server can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
  • program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
  • program modules may be located on a local or remote computing system storage medium including storage devices.
  • FIG. 1 is a schematic flow chart of an embodiment of a target tracking method according to an embodiment of the present application.
  • the method may be performed by any target tracking device, such as a terminal device, a server, a mobile device, etc., which is not limited in this embodiment of the present application.
  • the target tracking method of this embodiment includes the following steps:
  • Step S101: position data of the target object in the non-detected image is acquired by the first neural network according to the detected image and the non-detected image in the sequence of video frames containing the target object.
  • the first neural network is used to regress the position of the target object in the non-detected image according to the detected image.
  • Target objects may include, but are not limited to, vehicles, pedestrians, drones, and the like.
  • the position data of the target object in an image may include, but is not limited to, the vertex coordinates and the center position coordinates of the bounding box of the target object.
  • the bounding box of the target object may be a square or a rectangle.
  • the vertex coordinates of the bounding box of the target object may be the coordinates of its four corner points.
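  • As a small illustration (not part of the application), the two representations of the position data mentioned above, center plus length and width versus the four vertex coordinates, can be related as follows; the axis convention is an assumption.

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    """Axis-aligned bounding box stored as length (x extent), width (y extent) and center."""
    length: float
    width: float
    cx: float
    cy: float

    def corners(self):
        """Vertex coordinates of the four corner points of the box."""
        hl, hw = self.length / 2.0, self.width / 2.0
        return [(self.cx - hl, self.cy - hw), (self.cx + hl, self.cy - hw),
                (self.cx + hl, self.cy + hw), (self.cx - hl, self.cy + hw)]
```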
  • the detected image may be an image in the sequence of video frames in which the position of the target object has been detected by a detector, and the non-detected image may be a subsequent image of the detected image in which the position of the target object has not been detected by the detector.
  • the detected image and the non-detected image may be adjacent video frames in the sequence of video frames, or may be non-adjacent video frames, that is, there may be other video frames between the detected image and the non-detected image.
  • the step S101 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by the first obtaining module 501 being executed by the processor.
  • Step S102: the trajectory of the target object is determined based on the position data of the target object in the detected image and the position data of the target object in the non-detected image.
  • the position data of the target object in the detected image is determined in advance and does not need to be acquired by the first neural network.
  • For example, the position data of the target object in the detected image may be detected in advance by a neural network for target position detection, which is not limited in the embodiments of the present application.
  • the position data of the target object in the non-detected image is obtained from the detected image and the non-detected image through the first neural network.
  • step S102 may be performed by the processor invoking a corresponding instruction stored in the memory or by the first determining module 502 being executed by the processor.
  • the sequence of video frames containing the target object includes multiple video frames. Since the position data of the target object in the preceding detected image is known and the position data of the target object in the subsequent non-detected image has been acquired, the position data of the target object in each frame of the sequence of video frames can be obtained, and the trajectory of the target object can then be determined from this per-frame position data.
  • According to this embodiment, the position data of the target object in the non-detected image is acquired, according to the detected image and the non-detected image in the sequence of video frames containing the target object, by the first neural network used to regress the position of the target object in the non-detected image according to the detected image.
  • The embodiment of the present application can therefore regress the position of the target object in the non-detected image from the detected image, which improves the detection efficiency of target tracking while improving the accuracy of target tracking.
  • the target tracking method of this embodiment may be performed by any suitable device having image or data processing capability, including but not limited to: a camera, a terminal, a mobile terminal, a PC, a server, an in-vehicle device, an entertainment device, an advertising device, a personal digital assistant (PDA), a tablet computer, a laptop computer, a handheld game console, smart glasses, a smart watch, a wearable device, a virtual display device, or a display enhancement device (such as Google Glass, Oculus Rift, HoloLens, Gear VR).
  • the target tracking method of this embodiment includes the following steps:
  • Step S201: the detected image and the non-detected image are respectively cropped based on the position data of the target object in the detected image to obtain a first region image corresponding to the detected image and a second region image corresponding to the non-detected image.
  • the position data of the target object may include, but is not limited to, the length, the width, and the center position coordinates of the bounding box of the target object.
  • both the first region image and the second region image contain the target object.
  • the cropping position data of the image may first be determined according to the position data of the target object in the detected image.
  • For example, the center position coordinates of the crop box may be set to be the same as the center position coordinates of the bounding box of the target object, and the length and width of the bounding box of the target object may be expanded by a certain ratio to obtain the length and width of the crop box, thereby obtaining the cropping position data of the image.
  • the detected image and the non-detected image may then be cropped respectively according to the cropping position data of the image to obtain the first region image corresponding to the detected image and the second region image corresponding to the non-detected image.
  • the detected image and the non-detected image are cropped because the number of video frames between the detected image and the non-detected image is usually small, for example between 0 and 3, so the displacement of the target object in the non-detected image relative to its position in the detected image is also small, and the bounding box of the target object in the non-detected image falls within the crop box of the non-detected image.
  • In this way, the data processing amount of the first neural network can be reduced, and the first neural network can quickly regress the position of the target object in the subsequent non-detected image in the sequence of video frames based on the position of the target object in the preceding detected image.
  • since the cropping position data of the image is determined based on the position data of the target object in the detected image, the position data of the target object in the detected image is implicit in the cropped detected image (the first region image).
  • the length, width, and center position coordinates of the bounding box of the target object in the first region image may be determined according to the center position coordinates, length, and width of the first region image.
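  • A minimal sketch of the cropping just described is given below; the expansion ratio value and the clipping of the crop box to the image bounds are illustrative assumptions, since the application only specifies expanding the bounding box by a certain ratio around the same center.

```python
def crop_box_from_target(target_box, expand_ratio=2.0):
    """Crop box shares the target's center; its sides are the target's sides scaled up.

    target_box: (length, width, cx, cy) of the target's bounding box in the detected image.
    expand_ratio: assumed value; the application only says 'a certain ratio'.
    """
    length, width, cx, cy = target_box
    return (length * expand_ratio, width * expand_ratio, cx, cy)

def crop(image, crop_box):
    """Crop an H x W (x C) NumPy-style array to the crop box, clipped to the image bounds."""
    length, width, cx, cy = crop_box
    x0, x1 = int(max(cx - length / 2, 0)), int(min(cx + length / 2, image.shape[1]))
    y0, y1 = int(max(cy - width / 2, 0)), int(min(cy + width / 2, image.shape[0]))
    return image[y0:y1, x0:x1]

# The same crop box, computed from the detected image, is applied to both the detected image
# and the non-detected image, so the slightly displaced target still falls inside the crop.
```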
  • step S201 may be performed by the processor invoking a corresponding instruction stored in the memory, or may be performed by the first cropping module 601 being executed by the processor.
  • Step S202: position data of the target object in the second region image is acquired by the first neural network according to the first region image and the second region image containing the target object.
  • the first neural network is used to regress the position of the target object in the second region image according to the first region image.
  • the position data of the target object in the second region image may include, but is not limited to, the length, width, and center position coordinates of the bounding box of the target object.
  • the step S202 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a second acquisition sub-module 6022 being executed by the processor.
  • the position data of the target object in the non-detected image is thus acquired by the first neural network according to the detected image and the non-detected image in the sequence of video frames containing the target object.
  • the acquiring, by the first neural network, the position data of the target object in the non-detected image according to the detected image and the non-detected image in the sequence of video frames containing the target object includes: acquiring, by the first neural network, position data of the target object in the first non-detected image according to the detected image in the sequence of video frames and the first non-detected image following the detected image. Thereby, the position data of the target object in the first non-detected image following the detected image can be predicted very accurately.
  • the method of the embodiment of the present application further includes: acquiring, by the first neural network, position data of the target object in a second non-detected image according to the first non-detected image in the sequence of video frames and the second non-detected image following the first non-detected image.
  • Thereby, the position data of the target object in the second non-detected image following the first non-detected image can be predicted relatively accurately.
  • the detected image and the first non-detected image may be adjacent video frames in the sequence of video frames, or may be non-adjacent video frames, that is, there may be other video frames between them; likewise, the first non-detected image and the second non-detected image may be adjacent or non-adjacent video frames in the sequence of video frames.
  • the accuracy with which the first neural network regresses the position of the target object in the first non-detected image from the detected image is higher, while the accuracy with which it regresses the position of the target object in the subsequent second non-detected image from the first non-detected image is lower.
  • the method of the embodiment of the present application further includes: determining, according to the category of the target object, a first neural network corresponding to the category of the target object.
  • the accuracy of the target tracking can be further improved.
  • the corresponding first neural network may be separately trained for different categories of the target object.
  • For example, a corresponding first neural network may be trained separately for fast-moving vehicles and another for slow-moving vehicles, thereby further improving the accuracy of target vehicle tracking.
  • Step S203: the trajectory of the target object is determined according to the position data of the target object in the first region image and the position data of the target object in the second region image.
  • because the first region image is obtained by cropping the detected image, the position data of the target object in the first region image is the position data of the target object in the detected image.
  • the position data of the target object in the second region image is obtained from the first region image and the second region image by the first neural network.
  • the step S203 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a second determining module 607 executed by the processor.
  • the sequence of video frames containing the target object includes multiple video frames. Since the position data of the target object in the first region image and in the second region image is known, the position data of the target object in the preceding detected image and in the subsequent non-detected image is correspondingly also known, so the position data of the target object in each frame of the sequence of video frames can be obtained, and the trajectory of the target object can be determined from this per-frame position data.
  • some optional embodiments of the present application further include: dividing the sequence of video frames into multiple groups of video frames in chronological order, each group of video frames including at least one video frame; for each group of video frames, acquiring position data of the target object from the first video frame of the group, and acquiring, by the first neural network, position data of the target object in the video frames following the first video frame, thereby obtaining position data of the target object in the at least one video frame of the group; and determining the trajectory of the target object according to the position data of the target object in the at least one video frame of each of the multiple groups of video frames.
  • Thereby, the accuracy of target tracking can be further improved.
  • the acquiring the position data of the target object from the first video frame includes: acquiring the position data of the target object from the first video frame by a second neural network for target position detection.
  • the second neural network includes a Faster Region-based Convolutional Neural Network (Faster R-CNN).
  • the acquiring, by the first neural network, the position data of the target object in the video frames following the first video frame includes: acquiring, by the first neural network, the position data of the target object in a subsequent video frame according to the first video frame and that subsequent video frame.
  • For example, each group of video frames may include four video frames. The first video frame is a key frame, and the second neural network is needed to detect the position data of the target object in the first video frame; for the three video frames following the first video frame, the first neural network regresses the position data of the target object in each subsequent frame according to the first video frame and that subsequent frame. The video can therefore be processed in a segmented form: the first frame of each segment is a key frame, and the several video frames following it are obtained by regression, so that the overall detection time of one segment is close to the detection time of a single video frame in the prior art, which shortens the response time of target tracking.
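  • The segmented scheme just described can be sketched as follows; detect_position (the second neural network) and regress_position (the first neural network) are hypothetical wrappers, and the group size of four mirrors the example above.

```python
def track_segmented(frames, detect_position, regress_position, group_size=4):
    """Segmented tracking sketch: detect on the key frame of each group, regress the rest.

    detect_position / regress_position are assumed wrappers around the second and first
    neural networks respectively; group_size=4 follows the four-frame example above.
    """
    trajectory = []
    for start in range(0, len(frames), group_size):
        group = frames[start:start + group_size]
        key_frame = group[0]
        key_box = detect_position(key_frame)  # key frame: full detection
        trajectory.append(key_box)
        for frame in group[1:]:
            # remaining frames of the group: position regressed from the key frame
            trajectory.append(regress_position(key_frame, key_box, frame))
    return trajectory
```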
  • before the position data of the target object in the second region image is acquired by the first neural network according to the first region image and the second region image containing the target object, the first neural network needs to be trained.
  • the first neural network is trained according to the detected sample image and the non-detected sample image in the sequence of video frame samples containing the target object, and the non-detected sample image is the subsequent image of the detected sample image.
  • the training the first neural network according to the detected sample image and the non-detected sample image in the sequence of video frame samples containing the target object includes: acquiring, by the first neural network to be trained, position data of the target object in the non-detected sample image according to the detected sample image and the non-detected sample image in the sequence of video frame samples containing the target object; determining second position offset data of the target object between the detected sample image and the non-detected sample image according to the position data of the target object in the detected sample image and the position data of the target object in the non-detected sample image; and training the first neural network according to first position offset data and the second position offset data, where the first position offset data is a standard position offset of the target object between the detected sample image and the non-detected sample image.
  • the standard position offset is measured from the actual positions of the target object in the detected sample image and the non-detected sample image.
  • before the position data of the target object in the non-detected sample image is acquired by the first neural network to be trained according to the detected sample image and the non-detected sample image in the sequence of video frame samples containing the target object, the method of this embodiment further includes: cropping the detected sample image and the non-detected sample image respectively according to the position data of the target object in the detected sample image to obtain a third region image corresponding to the detected sample image and a fourth region image corresponding to the non-detected sample image, where both the third region image and the fourth region image contain the target object.
  • the acquiring, by the first neural network to be trained, the position data of the target object in the non-detected sample image according to the detected sample image and the non-detected sample image in the sequence of video frame samples containing the target object includes: acquiring, by the first neural network to be trained, position data of the target object in the fourth region image according to the third region image and the fourth region image containing the target object.
  • the first neural network to be trained includes a convolution layer, a concatenation layer connected to the end of the convolution layer, and a fully connected layer connected to the end of the concatenation layer. The acquiring, by the first neural network to be trained, the position data of the target object in the fourth region image according to the third region image and the fourth region image containing the target object includes: performing feature extraction on the third region image and the fourth region image by the convolution layer to obtain position feature vectors of the target object in the third region image and the fourth region image; concatenating, by the concatenation layer, the position feature vectors of the target object in the third region image and the fourth region image to obtain a concatenated position feature vector; and performing a mapping operation on the concatenated position feature vector by the fully connected layer to obtain the position data of the target object in the fourth region image.
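  • A PyTorch sketch of this convolution, concatenation and fully connected structure is shown below; the layer sizes, the shared convolutional branch, and the four-value output (length, width, cx, cy) are assumptions made for illustration, not details given in the application.

```python
import torch
import torch.nn as nn

class PositionRegressionNet(nn.Module):
    """Illustrative sketch: a convolution stage extracts a position feature vector from each
    of the third and fourth region images, the two vectors are concatenated (the 'splicing'
    step), and a fully connected layer maps the result to the position data of the target
    object in the fourth region image. All sizes are assumed."""

    def __init__(self, in_channels=3, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.fc = nn.Linear(2 * feat_dim, 4)  # maps the concatenated vector to (length, width, cx, cy)

    def forward(self, third_region, fourth_region):
        f3 = self.conv(third_region)          # position feature vector of the third region image
        f4 = self.conv(fourth_region)         # position feature vector of the fourth region image
        spliced = torch.cat([f3, f4], dim=1)  # concatenation layer
        return self.fc(spliced)               # fully connected mapping to the position data
```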
  • before the position data of the target object in the non-detected sample image is acquired by the first neural network to be trained according to the detected sample image and the non-detected sample image in the sequence of video frame samples containing the target object, the method of this embodiment further includes: determining the first position offset data according to the position data of the target object in the detected sample image and the position calibration data of the target object in the non-detected sample image.
  • Compared with the prior-art method of detecting every video frame in the sequence of video frames, the target tracking method provided by the embodiments of the present application can improve the speed of target tracking while ensuring its accuracy; compared with the prior-art method of frame-skipping detection on a sequence of video frames, it can comprehensively utilize the position information of the target object in every frame of the sequence, giving higher target tracking accuracy.
  • In addition, the target tracking method provided by the embodiments of the present application ensures that the acquired position data of the object in the non-detected image corresponds one-to-one with the target object, so there is no need to first obtain the object position data of every video frame in the sequence and then match the object position data across frames to obtain the position data of the target object in each frame and hence its trajectory.
  • the target tracking method provided by the embodiments of the present application can be applied to practical scenarios. For example, on real traffic roads, if a traffic management department wants to confirm vehicle trajectories through target tracking but cannot afford expensive equipment for every surveillance camera, the target tracking method provided by the embodiments of the present application allows a single device to track the feeds of several or even dozens of surveillance cameras in real time, which reduces the cost of target tracking.
  • According to this embodiment, the detected image and the non-detected image are cropped respectively according to the position data of the target object in the detected image to obtain the first region image corresponding to the detected image and the second region image corresponding to the non-detected image.
  • The embodiment of the present application can therefore regress the position of the target object in the second region image from the first region image, which improves the detection efficiency of target tracking while improving the accuracy of target tracking.
  • the target tracking method of this embodiment may be performed by any suitable device having image or data processing capability, including but not limited to: a camera, a terminal, a mobile terminal, a PC, a server, an in-vehicle device, an entertainment device, an advertising device, a personal digital assistant (PDA), a tablet computer, a laptop computer, a handheld game console, smart glasses, a smart watch, a wearable device, a virtual display device, or a display enhancement device (such as Google Glass, Oculus Rift, HoloLens, Gear VR).
  • FIG. 3 is a schematic flow chart of an embodiment of a neural network training method according to an embodiment of the present application.
  • the method may be performed by any neural network training device, such as a terminal device, a server, a mobile device, etc., which is not limited in this embodiment of the present application.
  • the neural network training method of this embodiment includes the following steps:
  • Step S301: position data of the target object in the non-detected sample image is acquired by the neural network to be trained according to the detected sample image and the non-detected sample image in the sequence of video frame samples containing the target object.
  • the neural network may be any suitable neural network that can implement feature extraction or target object detection, including but not limited to a convolutional neural network, a reinforcement learning neural network, a generator network in a generative adversarial network, and the like.
  • The structure of the neural network (for example, the number of convolution layers, the size of the convolution kernels, and the number of channels) can be set appropriately by a person skilled in the art according to actual needs, which is not limited in the embodiments of the present application.
  • the target object may include a vehicle, a pedestrian, a drone, and the like.
  • The position data of the target object in the sample image may include the vertex coordinates and center position coordinates of the bounding box of the target object.
  • The bounding box of the target object may be a square or a rectangle.
  • The vertex coordinates of the bounding box of the target object may be the coordinates of the points at the four corners of the rectangle.
  • The detected sample image may be an image in the video frame sample sequence in which the position of the target object has been obtained by a detector, and the non-detected sample image may be a subsequent image of the detected sample image in which the position of the target object is not obtained by a detector.
  • The detected sample image and the non-detected sample image may be adjacent video images in the video frame sample sequence, or non-adjacent video images, that is, there may be intervening video images between the detected sample image and the non-detected sample image.
  • For a better training effect, not only adjacent pairs but also non-adjacent pairs of detected and non-detected sample images may be selected for training, for example by sampling pairs with a small number of intervening frames (see the sketch below).
  • The resulting neural network can then acquire the position of the target object in sample images with a larger change in target position; that is, the trained neural network can accurately acquire the position of the target object in the current video frame according to the target object position from several frames earlier, rather than only from the position of the target object in the immediately preceding video frame.
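  • As an illustration of selecting both adjacent and non-adjacent sample pairs, the following minimal sketch (in Python; `sample_training_pairs` and `max_gap` are hypothetical names, not part of the embodiment) draws frame-index pairs with 0 to 3 intervening frames:

```python
import random

def sample_training_pairs(num_frames, max_gap=3, num_pairs=1000):
    """Sample (detected_idx, non_detected_idx) frame-index pairs.

    Pairs are separated by 0..max_gap intervening frames, so the network
    sees both adjacent and non-adjacent detected/non-detected sample images.
    Assumes num_frames >= 2.
    """
    pairs = []
    for _ in range(num_pairs):
        gap = random.randint(0, min(max_gap, num_frames - 2))  # intervening frames
        det = random.randint(0, num_frames - gap - 2)          # detected sample frame
        pairs.append((det, det + gap + 1))                     # non-detected sample frame
    return pairs
```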
  • the step S301 may be performed by the processor invoking a corresponding instruction stored in the memory, or may be performed by the fourth obtaining module 801 being executed by the processor.
  • step S302 second position offset data of the target object between the detected sample image and the non-detected sample image is determined based on the position data of the target object in the detected sample image and the position data of the target object in the non-detected sample image.
  • The position data of the target object in the detected sample image is determined in advance and does not need to be acquired by the neural network to be trained.
  • the position data of the target object in the detected sample image may be detected in advance by a neural network for target position detection.
  • Other methods may also be used to obtain the position data of the target object in the detected sample image in advance, which is not limited in the embodiments of the present application.
  • the position data of the target object in the non-detected sample image is obtained by the neural network to be trained, based on the detected sample image and the non-detected sample image.
  • The position data of the target object in the detected sample image may be subtracted from the position data of the target object in the non-detected sample image to obtain the second position offset data of the target object between the detected sample image and the non-detected sample image.
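  • A minimal sketch of this offset computation, assuming boxes are given as (cx, cy, w, h) tuples and that the offset is the non-detected position minus the detected position (consistent with the definition below); the function name is illustrative only:

```python
def position_offset(box_detected, box_non_detected):
    """Offset of the target between a detected and a non-detected image.

    Boxes are (cx, cy, w, h): centre coordinates plus width and height of the
    bounding box.  The offset is the non-detected position minus the detected
    position, i.e. how far the box moved and how much its size changed.
    """
    dcx = box_non_detected[0] - box_detected[0]
    dcy = box_non_detected[1] - box_detected[1]
    dw = box_non_detected[2] - box_detected[2]
    dh = box_non_detected[3] - box_detected[3]
    return (dcx, dcy, dw, dh)
```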
  • step S302 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a third determining module 802 executed by the processor.
  • step S303 the neural network is trained based on the first positional offset data and the second positional offset data.
  • the first positional offset data is a standard position offset between the detected sample image and the non-detected sample image of the target object.
  • the first positional offset data is determined according to the position of the target object in the detected sample image and the labeled position of the target object in the non-detected sample image, and can be used as a supervised amount of neural network training.
  • the step S303 may include: determining a position difference of the target object according to the first position offset data and the second position offset data, and then adjusting a network parameter of the neural network according to the position difference of the target object. By calculating the position difference of the target object, the currently obtained second position offset data is evaluated as a basis for subsequent training of the neural network.
  • the step S303 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a second training module 803 executed by the processor.
  • the positional difference of the target object may be transmitted back to the neural network to iteratively train the neural network.
  • the training of the neural network is an iterative process.
  • the embodiment of the present application only describes one of the training processes, but those skilled in the art should understand that the training mode can be used for each training of the neural network until the completion of the training.
  • An exemplary embodiment of the present application is thus directed to a neural network training method in which, through the neural network to be trained, the position data of the target object in the non-detected sample image is acquired according to the detected sample image and the non-detected sample image in a video frame sample sequence containing the target object; the second position offset data of the target object between the detected sample image and the non-detected sample image is determined according to the position data of the target object in the detected sample image and in the non-detected sample image; and the neural network is trained according to the standard position offset of the target object between the detected sample image and the non-detected sample image together with the second position offset data. Compared with the prior art, the trained network can regress the target object position in a subsequent video image of a video frame sequence from the target object position in a preceding video image.
  • the training method of the neural network of this embodiment may be performed by any suitable device having image or data processing capability, including but not limited to: a camera, a terminal, a mobile terminal, a PC, a server, an in-vehicle device, an entertainment device, an advertising device, Personal digital assistants (PDAs), tablets, laptops, handheld game consoles, smart glasses, smart watches, wearables, virtual display devices or display enhancement devices (such as Google Glass, Oculus Rift, Hololens, Gear VR).
  • the training method of the neural network of this embodiment includes the following steps:
  • In step S401, the detected sample image and the non-detected sample image are respectively cropped according to the position data of the target object in the detected sample image, obtaining a third region image corresponding to the detected sample image and a fourth region image corresponding to the non-detected sample image.
  • The position data of the target object may include the length, width, and center position coordinates of the bounding box of the target object.
  • the third area image and the fourth area image contain the target object.
  • First, the crop position data is determined according to the position data of the target object in the detected sample image.
  • For example, the center coordinates of the crop box are set equal to the center coordinates of the bounding box of the target object, and the length and width of the bounding box of the target object are expanded by a certain ratio to obtain the length and width of the crop box, yielding the crop position data of the sample image.
  • The detected sample image and the non-detected sample image may then be cropped separately according to this crop position data, obtaining the third region image corresponding to the detected sample image and the fourth region image corresponding to the non-detected sample image, for example as in the sketch below.
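  • The cropping step could look like the following sketch, assuming (cx, cy, w, h) boxes, NumPy image arrays, and an illustrative expansion ratio of 2.0 (the embodiment only says "a certain ratio"); all names here are hypothetical:

```python
import numpy as np

def crop_around_box(image, box, expand_ratio=2.0):
    """Crop `image` around a (cx, cy, w, h) bounding box.

    The crop box shares the centre of the target's bounding box, and its
    width/height are the box's width/height enlarged by `expand_ratio`,
    so that the target in a later frame still falls inside the crop.
    `image` is an H x W x C numpy array.
    """
    cx, cy, w, h = box
    crop_w, crop_h = w * expand_ratio, h * expand_ratio
    x1 = int(max(cx - crop_w / 2, 0))
    y1 = int(max(cy - crop_h / 2, 0))
    x2 = int(min(cx + crop_w / 2, image.shape[1]))
    y2 = int(min(cy + crop_h / 2, image.shape[0]))
    return image[y1:y2, x1:x2], (x1, y1, x2, y2)

# The same crop box (computed from the detected sample image) is applied to
# both images, e.g.:
#   region_3, crop = crop_around_box(detected_img, detected_box)
#   region_4 = non_detected_img[crop[1]:crop[3], crop[0]:crop[2]]
```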
  • The detected sample image and the non-detected sample image are cropped because the number of video frames between them is usually small (for example, between 0 and 3), so the displacement of the target object in the non-detected sample image relative to its position in the detected sample image is also small, and the bounding box of the target object in the non-detected sample image falls within the crop box of the non-detected sample image.
  • Cropping also reduces the amount of data the neural network must process, so that the trained neural network can quickly regress the target object position in a subsequent video image of the video frame sequence from the target object position in a preceding video image.
  • Because the crop position data is determined from the position data of the target object in the detected sample image, the position data of the target object in the detected sample image is implicitly contained in the cropped detected sample image (the third region image).
  • The length, width, and center position coordinates of the bounding box of the target object in the third region image can therefore be determined from the center position coordinates, length, and width of the third region image.
  • step S401 can be performed by the processor invoking a corresponding instruction stored in the memory or by the second cropping module 902 being executed by the processor.
  • step S402 the position data of the target object in the fourth area image is acquired according to the third area image and the fourth area image containing the target object through the neural network to be trained.
  • The neural network to be trained has convolution layers, a concatenation layer connected at the end of the convolution layers, and a fully connected layer connected at the end of the concatenation layer.
  • In one example, the neural network has six consecutive convolution layers, so that the trained neural network can quickly regress the target object position in a subsequent video image of the video frame sequence from the target object position in a preceding video image.
  • the neural network does not use a pooling layer.
  • The neural network to be trained has two inputs and one output: one input for the third region image, the other input for the fourth region image, and the output for the position data of the target object in the fourth region image.
  • the step S402 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a fifth acquisition sub-module 9031 that is executed by the processor.
  • Acquiring, by the neural network to be trained, the position data of the target object in the fourth region image according to the third region image and the fourth region image containing the target object includes: performing feature extraction on the third region image and the fourth region image through the convolution layers to obtain position feature vectors of the target object in the third region image and the fourth region image; concatenating, through the concatenation layer, the position feature vectors of the target object in the third region image and the fourth region image to obtain a concatenated position feature vector; and mapping the concatenated position feature vector through the fully connected layer to obtain the position data of the target object in the fourth region image.
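  • A possible PyTorch sketch of such a two-input network is shown below; the layer widths, the 128x128 input size, the shared branch weights, and the class name are assumptions rather than details fixed by the embodiment:

```python
import torch
import torch.nn as nn

class OffsetRegressionNet(nn.Module):
    """Sketch of the two-input regression network described above.

    Assumptions: six stride-2 3x3 convolutions (no pooling), weights shared
    between the two branches, 128x128 RGB input crops, and a 4-dimensional
    output (cx, cy, w, h) for the target in the fourth region image.
    """

    def __init__(self):
        super().__init__()
        chans = [3, 16, 32, 64, 64, 128, 128]
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                       nn.ReLU(inplace=True)]
        self.features = nn.Sequential(*layers)      # six conv layers, no pooling
        self.fc = nn.Sequential(
            nn.Linear(2 * 128 * 2 * 2, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 4))                      # position data in region 4

    def forward(self, region3, region4):
        f3 = self.features(region3).flatten(1)      # position features, region 3
        f4 = self.features(region4).flatten(1)      # position features, region 4
        fused = torch.cat([f3, f4], dim=1)          # concatenation layer
        return self.fc(fused)                       # fully connected mapping
```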
  • step S403 determining second positional offset data of the target object between the third area image and the fourth area image according to the position data of the target object in the third area image and the position data of the target object in the fourth area image.
  • the position data of the target object in the image of the third region is the position data of the target object in the detected sample image, because the image of the third region is obtained by cutting the image of the detected sample.
  • the position data of the target object in the fourth area image is obtained by the neural network to be trained according to the third area image and the fourth area image.
  • step S403 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a third determining module 904 that is executed by the processor.
  • the second positional offset data is an offset of the position of the target object in the non-detected sample image relative to the position of the target object in the detected sample image.
  • The position data of the target object in the third region image may be subtracted from the position data of the target object in the fourth region image to obtain the second position offset data of the target object between the third region image and the fourth region image.
  • The second position offset data includes the change in the center position coordinates of the bounding box of the target object and the change in the length and width of the bounding box of the target object.
  • step S404 the neural network is trained based on the first positional offset data and the second positional offset data.
  • The first position offset data is the standard position offset of the target object between the detected sample image and the non-detected sample image; that is, in this embodiment the first position offset data is the standard position offset of the target object between the third region image and the fourth region image.
  • The method of the embodiment further comprises: determining the first position offset data according to the position data of the target object in the detected sample image and the position calibration data of the target object in the non-detected sample image.
  • the step S404 may be performed by a processor invoking a corresponding instruction stored in the memory, or may be performed by a second training module 905 executed by the processor.
  • The position data of the target object in the detected sample image may be determined in advance and does not need to be acquired by the neural network to be trained.
  • the position data of the target object in the detected sample image may be detected in advance by a neural network for target position detection.
  • Other methods may also be used to obtain the position data of the target object in the detected sample image in advance, which is not limited in the embodiments of the present application.
  • the position calibration data of the target object in the non-detected sample image may also be determined in advance.
  • the position calibration data of the target object in the non-detected sample image may be detected in advance by a neural network for target position detection.
  • The position of the bounding box of the target object in the non-detected sample image may also be calibrated manually, thereby obtaining the position calibration data of the target object in the non-detected sample image.
  • Other approaches may likewise be used to obtain the position calibration data of the target object in the non-detected sample image in advance, which is not limited in the embodiments of the present application.
  • the first positional offset data is an offset of the nominal position of the target object in the non-detected sample image relative to the position of the target object in the detected sample image.
  • The position data of the target object in the detected sample image may be subtracted from the position calibration data of the target object in the non-detected sample image to obtain the first position offset data of the target object between the detected sample image and the non-detected sample image.
  • The first position offset data includes the change in the center position coordinates of the bounding box of the target object and the change in the length and width of the bounding box of the target object.
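  • One sketched training iteration is shown below, under the assumptions that `net` is a network like the OffsetRegressionNet sketch above, that boxes are (cx, cy, w, h) tensors, and that a smooth L1 loss compares the first and second position offsets (the embodiment does not fix the loss function):

```python
import torch
import torch.nn as nn

def training_step(net, optimizer, region3, region4, box3, box4_calib):
    """One sketched training iteration (all argument names are assumptions).

    `box3` is the target's (cx, cy, w, h) in the third region image (known
    from the detector); `box4_calib` is its calibrated box in the fourth
    region image.  The first (standard) offset is box4_calib - box3; the
    second offset is derived from the network prediction in the same way.
    """
    first_offset = box4_calib - box3                 # supervision signal
    pred_box4 = net(region3, region4)                # predicted position data
    second_offset = pred_box4 - box3

    loss = nn.functional.smooth_l1_loss(second_offset, first_offset)
    optimizer.zero_grad()
    loss.backward()                                  # propagate the position difference
    optimizer.step()                                 # adjust network parameters
    return loss.item()
```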
  • The following approach may also be used to train the neural network. First, through the neural network to be trained, the position data of the target object in the non-detected sample image is acquired according to the detected sample image and the non-detected sample image in the video frame sample sequence containing the target object, where the position of the target object in the non-detected sample image has been calibrated; then the neural network is trained according to the position data of the target object in the non-detected sample image and the position calibration data of the target object contained in the non-detected sample image.
  • Alternatively, the detected sample image and the non-detected sample image are first cropped separately according to the position data of the target object in the detected sample image, obtaining the third region image corresponding to the detected sample image and the fourth region image corresponding to the non-detected sample image; then, through the neural network to be trained, the position data of the target object in the fourth region image is acquired according to the third region image and the fourth region image containing the target object, where the fourth region image carries the position calibration data of the target object; and finally the neural network is trained according to the position data of the target object in the fourth region image and the position calibration data of the target object contained in the fourth region image (see the sketch below).
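  • A corresponding sketch of this alternative, assuming the same hypothetical network and a smooth L1 loss on the predicted position itself rather than on the offset:

```python
import torch.nn.functional as F

def direct_position_loss(net, region3, region4, box4_calib):
    """Supervise the predicted position directly with the calibrated position."""
    pred_box4 = net(region3, region4)   # position data in the fourth region image
    return F.smooth_l1_loss(pred_box4, box4_calib)
```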
  • An exemplary embodiment of the present application is thus directed to a neural network training method in which the detected sample image and the non-detected sample image are respectively cropped according to the position data of the target object in the detected sample image, obtaining a third region image corresponding to the detected sample image and a fourth region image corresponding to the non-detected sample image; through the neural network to be trained, the position data of the target object in the fourth region image is acquired according to the third region image and the fourth region image containing the target object; the second position offset data of the target object between the third region image and the fourth region image is determined according to the position data of the target object in the third region image and in the fourth region image; and the neural network is then trained according to the standard position offset of the target object between the third region image and the fourth region image together with the second position offset data. The trained neural network can thus quickly regress the target object position in a subsequent video image of the video frame sequence from the target object position in a preceding video image.
  • the training method of the neural network of this embodiment may be performed by any suitable device having image or data processing capability, including but not limited to: a camera, a terminal, a mobile terminal, a PC, a server, an in-vehicle device, an entertainment device, an advertising device, Personal digital assistants (PDAs), tablets, laptops, handheld game consoles, smart glasses, smart watches, wearables, virtual display devices or display enhancement devices (such as Google Glass, Oculus Rift, Hololens, Gear VR).
  • FIG. 5 is a schematic structural diagram of an embodiment of a target tracking device according to an embodiment of the present application.
  • the device of this embodiment can be used to perform any of the foregoing target tracking methods in the embodiments of the present application.
  • the target tracking device includes a first obtaining module 501 and a first determining module 502.
  • the first obtaining module 501 is configured to acquire, by using the first neural network, location data of the target object in the non-detected image according to the detected image and the non-detected image in the video frame sequence containing the target object, where the first neural network is used according to The detected image regresses the position of the target object in the non-detected image, and the non-detected image is a subsequent image of the detected image.
  • the first determining module 502 is configured to determine a trajectory of the target object according to the position data of the target object in the detected image and the position data of the target object in the non-detected image.
  • Through the first neural network, which is used to regress the position of the target object in the non-detected image according to the detected image, the position data of the target object in the non-detected image is acquired according to the detected image and the non-detected image in the video frame sequence containing the target object.
  • the embodiment of the present application can regress the position of the target object in the non-detected image according to the detected image, and improve the detection efficiency of the target tracking while improving the accuracy of the target tracking.
  • FIG. 6 is a schematic structural diagram of another embodiment of a target tracking device according to an embodiment of the present application.
  • the device of this embodiment can be used to perform any of the foregoing target tracking methods in the embodiments of the present application.
  • the target tracking device includes a first obtaining module 602 and a first determining module 603.
  • The first obtaining module 602 is configured to acquire, by using the first neural network, position data of the target object in the non-detected image according to the detected image and the non-detected image in the video frame sequence containing the target object, where the first neural network is used to regress the position of the target object in the non-detected image according to the detected image, and the non-detected image is a subsequent image of the detected image; the first determining module 603 is configured to determine the trajectory of the target object according to the position data of the target object in the detected image and the position data of the target object in the non-detected image.
  • the first obtaining module 602 includes: a first acquiring submodule 6021, configured to acquire, by using the first neural network, the target object according to the detected image in the video frame sequence and the first non-detected image after detecting the image. Position data in the first non-detected image.
  • The apparatus of the embodiment of the present application further includes: a second obtaining module 604, configured to acquire, by using the first neural network, position data of the target object in a second non-detected image according to the first non-detected image in the video frame sequence and the second non-detected image after the first non-detected image.
  • The apparatus of the embodiment of the present application further includes: a first cropping module 601, configured to respectively crop the detected image and the non-detected image according to the position data of the target object in the detected image, to obtain a first region image corresponding to the detected image and a second region image corresponding to the non-detected image, where the first region image and the second region image contain the target object.
  • The first obtaining module 602 includes: a second obtaining sub-module 6022, configured to acquire, by using the first neural network, position data of the target object in the second region image according to the first region image and the second region image containing the target object.
  • The apparatus of the embodiment of the present application further includes: a dividing module 605, configured to divide the video frame sequence into multiple groups of video frames in chronological order, each group of video frames including at least one video image; a third obtaining module 606, configured to, for each of the multiple groups of video frames, acquire position data of the target object from the first video frame and acquire, through the first neural network, position data of the target object in the video images following the first video frame, thereby obtaining position data of the target object for at least one video image of the group; and a second determining module 607, configured to determine the trajectory of the target object according to the position data of the target object of at least one video image in the multiple groups of video frames.
  • the third obtaining module 606 includes: a third obtaining submodule 6061, configured to acquire location data of the target object from the first frame video image by using a second neural network for target location detection, the second neural network Includes fast convolutional neural networks.
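  • The overall flow of these modules can be sketched as follows; `detect` and `regress` are placeholders for the second and first neural networks respectively, and the group size of 5 is illustrative rather than specified by the embodiment:

```python
def track(frames, detect, regress, group_size=5):
    """Sketch of the grouped tracking flow (names here are assumptions).

    `detect(frame)` stands for the second neural network (target position
    detection) and returns a (cx, cy, w, h) box; `regress(prev_frame,
    prev_box, frame)` stands for the first neural network and returns the
    box in `frame` given the previous frame and its box.  The sequence is
    split into groups; only the first frame of each group is detected and
    later frames are regressed, yielding the target's trajectory.
    """
    trajectory = []
    for start in range(0, len(frames), group_size):
        group = frames[start:start + group_size]
        prev_frame, prev_box = group[0], detect(group[0])   # detect first frame only
        trajectory.append(prev_box)
        for frame in group[1:]:
            box = regress(prev_frame, prev_box, frame)      # regress later frames
            trajectory.append(box)
            prev_frame, prev_box = frame, box               # chain to the next frame
    return trajectory
```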
  • FIG. 7 is a schematic structural diagram of still another embodiment of a target tracking device according to an embodiment of the present application.
  • the device of this embodiment can be used to perform any of the foregoing target tracking methods in the embodiments of the present application.
  • the target tracking device includes a first obtaining module 703 and a first determining module 704.
  • The first obtaining module 703 is configured to acquire, by using the first neural network, position data of the target object in the non-detected image according to the detected image and the non-detected image in the video frame sequence containing the target object, where the first neural network is used to regress the position of the target object in the non-detected image according to the detected image, and the non-detected image is a subsequent image of the detected image.
  • The first determining module 704 is configured to determine the trajectory of the target object according to the position data of the target object in the detected image and the position data of the target object in the non-detected image.
  • the apparatus of the embodiment of the present application further includes: a selecting module 702, configured to select, according to a category of the target object, a first neural network corresponding to a category of the target object.
  • the apparatus of the embodiment of the present application further includes: a first training module 701, configured to train the first neural network according to the detected sample image and the non-detected sample image in the video frame sample sequence of the target object, the non-detected sample The image is a subsequent image of the detected sample image.
  • The first training module 701 includes: a fourth obtaining sub-module 7013, configured to acquire, through the first neural network to be trained, position data of the target object in the non-detected sample image according to the detected sample image and the non-detected sample image in the video frame sample sequence containing the target object; a first determining sub-module 7014, configured to determine second position offset data of the target object between the detected sample image and the non-detected sample image according to the position data of the target object in the detected sample image and the position data of the target object in the non-detected sample image; and a first training sub-module 7015, configured to train the first neural network according to first position offset data and the second position offset data, the first position offset data being the standard position offset of the target object between the detected sample image and the non-detected sample image.
  • The apparatus of the embodiment of the present application further includes: a first cropping sub-module 7012, configured to respectively crop the detected sample image and the non-detected sample image according to the position data of the target object in the detected sample image, to obtain a third region image corresponding to the detected sample image and a fourth region image corresponding to the non-detected sample image, where the third region image and the fourth region image contain the target object.
  • The fourth acquisition sub-module 7013 includes: an obtaining unit 70131, configured to acquire, through the first neural network to be trained, position data of the target object in the fourth region image according to the third region image and the fourth region image containing the target object.
  • The first neural network has convolution layers, a concatenation layer connected at the end of the convolution layers, and a fully connected layer connected at the end of the concatenation layer.
  • The obtaining unit 70131 is configured to: perform feature extraction on the third region image and the fourth region image through the convolution layers to obtain position feature vectors of the target object in the third region image and the fourth region image; concatenate, through the concatenation layer, the position feature vectors of the target object in the third region image and the fourth region image to obtain a concatenated position feature vector; and map the concatenated position feature vector through the fully connected layer to obtain the position data of the target object in the fourth region image.
  • the apparatus of the embodiment of the present application further includes: a second determining sub-module 7011, configured to determine the first position offset data according to the position data of the target object in the detected sample image and the position calibration data of the target object in the non-detected sample image. .
  • The position data includes the length, width, and center position coordinates of the bounding box of the target object.
  • FIG. 8 is a schematic structural diagram of an embodiment of a neural network training apparatus according to an embodiment of the present application.
  • the apparatus of this embodiment may be used to perform any of the above embodiments of the neural network training method of the embodiment of the present application.
  • the training device of the neural network includes a fourth obtaining module 801, a third determining module 802, and a second training module 803.
  • the fourth obtaining module 801 is configured to acquire, by using a neural network to be trained, location data of the target object in the non-detected sample image according to the detected sample image and the non-detected sample image in the video frame sample sequence containing the target object, and non-detecting The sample image is a subsequent image of the detected sample image.
  • a third determining module 802 configured to determine a second positional offset between the detected sample image and the non-detected sample image according to the position data of the target object in the detected sample image and the position data of the target object in the non-detected sample image. data.
  • a second training module 803 configured to train the neural network according to the first position offset data and the second position offset data, where the first position offset data is a standard position between the detected sample image and the non-detected sample image of the target object. Offset.
  • The training device of the neural network acquires, through the neural network to be trained, the position data of the target object in the non-detected sample image according to the detected sample image and the non-detected sample image in the video frame sample sequence containing the target object; determines the second position offset data of the target object between the detected sample image and the non-detected sample image according to the position data of the target object in the detected sample image and the position data of the target object in the non-detected sample image; and then trains the neural network according to the standard position offset of the target object between the detected sample image and the non-detected sample image together with the second position offset data. Compared with the prior art, the trained neural network can regress the target object position in a subsequent video image of a video frame sequence from the target object position in a preceding video image.
  • FIG. 9 is a schematic structural diagram of another embodiment of a training apparatus for a neural network according to an embodiment of the present application.
  • the apparatus of this embodiment may be used to perform any of the above embodiments of the neural network training method of the embodiment of the present application.
  • the training device of the neural network includes a fourth obtaining module 903, a third determining module 904, and a second training module 905.
  • the fourth obtaining module 903 is configured to acquire, by using a neural network to be trained, location data of the target object in the non-detected sample image according to the detected sample image and the non-detected sample image in the video frame sample sequence of the target object.
  • the non-detected sample image is a subsequent image of the detected sample image;
  • The third determining module 904 is configured to determine second position offset data of the target object between the detected sample image and the non-detected sample image according to the position data of the target object in the detected sample image and the position data of the target object in the non-detected sample image; and the second training module 905 is configured to train the neural network according to first position offset data and the second position offset data, the first position offset data being the standard position offset of the target object between the detected sample image and the non-detected sample image.
  • The apparatus of the embodiment of the present application further includes: a second cropping module 902, configured to respectively crop the detected sample image and the non-detected sample image according to the position data of the target object in the detected sample image, to obtain a third region image corresponding to the detected sample image and a fourth region image corresponding to the non-detected sample image, where the third region image and the fourth region image contain the target object.
  • The fourth obtaining module 903 includes: a fifth obtaining sub-module 9031, configured to acquire, through the neural network to be trained, position data of the target object in the fourth region image according to the third region image and the fourth region image containing the target object.
  • The neural network to be trained includes convolution layers, a concatenation layer connected at the end of the convolution layers, and a fully connected layer connected at the end of the concatenation layer, and the fifth obtaining sub-module 9031 is configured to: perform feature extraction on the third region image and the fourth region image through the convolution layers to obtain position feature vectors of the target object in the third region image and the fourth region image; concatenate, through the concatenation layer, the position feature vectors of the target object in the third region image and the fourth region image to obtain a concatenated position feature vector; and map the concatenated position feature vector through the fully connected layer to obtain the position data of the target object in the fourth region image.
  • the apparatus of the embodiment of the present application further includes: a fourth determining module 901, configured to determine first position offset data according to the position data of the target object in the detected sample image and the position calibration data of the target object in the non-detected sample image.
  • The position data includes the length, width, and center position coordinates of the bounding box of the target object.
  • the embodiment of the present application further provides a computer readable storage medium having stored thereon computer program instructions, wherein the program instructions are executed by the processor to implement the steps of the target tracking method described in the embodiments of the present application, or The steps of the neural network training method described in the embodiments of the present application are implemented.
  • A computer program product is also provided, comprising computer program instructions, where the program instructions, when executed by a processor, implement the steps of the target tracking method described in the embodiments of the present application, or implement the steps of the neural network training method described in the embodiments of the present application.
  • A computer program is also provided, comprising computer readable code; when the computer readable code runs on a device, a processor in the device executes instructions for implementing the steps of the object property detection method described in the embodiments of the present application; or
  • the processor in the device executes instructions for implementing the steps in the neural network training method described in the embodiments of the present application.
  • An electronic device includes: a processor and a memory
  • the memory is configured to store at least one executable instruction, the executable instruction causing the processor to perform an operation corresponding to the target tracking method described in the embodiment of the present application; or the memory is configured to store at least one executable instruction, The executable instructions cause the processor to perform operations corresponding to the neural network training method described in the embodiments of the present application.
  • An embodiment of the present application further provides an electronic device, including: a processor and a target tracking device according to the third aspect of the present application; when the processor runs the target tracking device, The module in the target tracking device is run; or
  • the processor and the neural network training device according to the fourth aspect of the present application; when the processor runs the training device of the neural network, the module in the neural network training device according to the embodiment of the present application is executed.
  • the embodiment of the present application further provides an electronic device, such as a mobile terminal, a personal computer (PC), a tablet computer, a server, and the like.
  • The electronic device 1000 includes one or more first processors, a first communication component, and the like; the one or more first processors are, for example, one or more central processing units (CPUs) 1001 and/or one or more graphics processing units (GPUs) 1013, and the first processor may execute various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 1002 or executable instructions loaded from a storage portion 1008 into a random access memory (RAM) 1003.
  • the first read only memory 1002 and the random access memory 1003 are collectively referred to as a first memory.
  • the first communication component includes a communication component 1012 and/or a communication interface 1009.
  • the communication component 1012 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (Infiniband) network card.
  • The communication interface 1009 includes a communication interface of a network interface card such as a LAN card or a modem, and performs communication processing via a network such as the Internet.
  • the first processor can communicate with read only memory 1002 and/or random access memory 1003 to execute executable instructions, connect to communication component 1012 via first communication bus 1004, and communicate with other target devices via communication component 1012, thereby completing
  • the operation corresponding to any target tracking method provided by the embodiment of the present application, for example, acquiring, by the first neural network, the target object in the non-detected image according to the detected image and the non-detected image in the video frame sequence containing the target object Detecting position data in an image, the first neural network is configured to regress a position of the target object in the non-detection image according to the detection image, and the non-detection image is a subsequent image of the detection image; Determining a trajectory of the target object according to position data of the target object in the detected image and position data of the target object in the non-detected image.
  • the operation corresponding to the training method of any one of the neural networks provided by the embodiment of the present application is completed, for example, by using a neural network to be trained, according to the detected sample image and the non-detected sample image in the sequence of video frame samples containing the target object, Obtaining position data of the target object in the non-detected sample image, the non-detected sample image being a subsequent image of the detected sample image; according to the position data and the location of the target object in the detected sample image Determining the position data of the target object in the non-detected sample image, determining second positional offset data of the target object between the detected sample image and the non-detected sample image; and according to the first position offset data and The second positional offset data is used to train the neural network, and the first positional offset data is a standard positional offset of the target object between the detected sample image and the non-detected sample image.
  • In the RAM 1003, various programs and data required for the operation of the device can also be stored.
  • the CPU 1001 or the GPU 1013, the ROM 1002, and the RAM 1003 are connected to each other through the first communication bus 1004.
  • ROM 1002 is an optional module.
  • the RAM 1003 stores executable instructions, or writes executable instructions to the ROM 1002 at runtime, the executable instructions causing the first processor to perform operations corresponding to the above-described communication methods.
  • An input/output (I/O) interface 1005 is also coupled to the first communication bus 1004.
  • the communication component 1012 can be integrated or can be configured to have multiple sub-modules (e.g., multiple IB network cards) and be on a communication bus link.
  • the following components are connected to the I/O interface 1005: an input portion 1006 including a keyboard, a mouse, etc.; an output portion 1007 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, and a speaker; a storage portion 1008 including a hard disk or the like And a communication interface 1009 including a network interface card such as a LAN card, modem, or the like.
  • Driver 1010 is also coupled to I/O interface 1005 as needed.
  • a removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like is mounted on the drive 1010 as needed so that a computer program read therefrom is installed into the storage portion 1008 as needed.
  • FIG. 10 is only an optional implementation manner.
  • the number and type of components in the foregoing FIG. 10 may be selected, deleted, added, or replaced according to actual needs; Different function component settings may also be implemented by separate settings or integrated settings.
  • The GPU 1013 and the CPU 1001 may be provided separately, or the GPU 1013 may be integrated on the CPU 1001; the communication components may be provided separately or integrated on the CPU 1001 or the GPU 1013; and so on. These alternative embodiments all fall within the protection scope of the present application.
  • Embodiments of the present application include a computer program product comprising a computer program tangibly embodied on a machine readable medium; the computer program contains program code for executing the method illustrated in the flowchart, and the program code may include instructions corresponding to the method steps provided by the embodiments of the present application, for example: acquiring, by the first neural network, position data of the target object in the non-detected image according to the detected image and the non-detected image in the sequence of video frames containing the target object, the first neural network being configured to regress the position of the target object in the non-detected image according to the detected image, the non-detected image being a subsequent image of the detected image; and determining the trajectory of the target object according to the position data of the target object in the detected image and the position data of the target object in the non-detected image.
  • the computer program can be downloaded and installed from the network via a communication component, and/or installed from the removable medium 1011.
  • The above methods according to the embodiments of the present application may be implemented in hardware or firmware, or implemented as software or computer code that can be stored in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or implemented as computer code that is originally stored in a remote recording medium or a non-transitory machine readable medium, downloaded over a network, and stored in a local recording medium, so that the methods described herein can be processed by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or an FPGA.
  • It can be understood that a computer, a processor, a microprocessor controller, or programmable hardware includes a storage component (for example, RAM, ROM, flash memory, and the like) that can store or receive software or computer code; when the software or computer code is accessed and executed by the computer, processor, or hardware, the processing methods described herein are implemented. Moreover, when a general-purpose computer accesses code for implementing the processing shown herein, execution of the code converts the general-purpose computer into a special-purpose computer for performing the processing shown herein.
  • the methods and apparatus of the present application may be implemented in a number of ways.
  • the methods and apparatus of the present application can be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware.
  • the above-described sequence of steps for the method is for illustrative purposes only, and the steps of the method of the present application are not limited to the order specifically described above unless otherwise specifically stated.
  • the present application can also be implemented as a program recorded in a recording medium, the programs including machine readable instructions for implementing the method according to the present application.
  • the present application also covers a recording medium storing a program for executing the method according to the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application provide a target tracking and neural network training method, apparatus, storage medium, and electronic device, relating to the field of artificial intelligence. The target tracking method includes: acquiring, by a first neural network, position data of a target object in a non-detected image according to a detected image and the non-detected image in a video frame sequence containing the target object, the first neural network being used to regress the position of the target object in the non-detected image according to the detected image, and the non-detected image being a subsequent image of the detected image; and determining a trajectory of the target object according to the position data of the target object in the detected image and the position data of the target object in the non-detected image. The embodiments of the present application improve not only the detection efficiency of target tracking but also the accuracy of target tracking.

Description

Target tracking and neural network training method, apparatus, storage medium, and electronic device

This application claims priority to the Chinese patent application filed with the Chinese Patent Office on October 27, 2017, with application number CN 201711031418.9 and entitled "Target tracking and neural network training method, apparatus, storage medium, and electronic device", the entire contents of which are incorporated herein by reference.

Technical Field

The embodiments of the present application relate to the field of artificial intelligence technologies, and in particular to a target tracking method, apparatus, storage medium, and electronic device, and to a neural network training method, apparatus, storage medium, and electronic device.

Background

Target tracking technology is an important component of intelligent video surveillance technology. For a static image, only the position of the bounding box of an object in the image needs to be detected; for intelligent video, however, after the position of the bounding box of the object has been detected in every video frame, the bounding boxes of the objects in the individual frames must also be matched in order to determine the trajectory of the target object.

Summary

The embodiments of the present application provide a technical solution for target tracking and a technical solution for neural network training.

According to a first aspect of the embodiments of the present application, a target tracking method is provided. The method includes: acquiring, by a first neural network, position data of a target object in a non-detected image according to a detected image and the non-detected image in a video frame sequence containing the target object, where the first neural network is used to regress the position of the target object in the non-detected image according to the detected image, and the non-detected image is a subsequent image of the detected image; and determining a trajectory of the target object according to the position data of the target object in the detected image and the position data of the target object in the non-detected image.
Optionally, acquiring, by the first neural network, the position data of the target object in the non-detected image according to the detected image and the non-detected image in the video frame sequence containing the target object includes: acquiring, by the first neural network, position data of the target object in a first non-detected image according to the detected image in the video frame sequence and the first non-detected image after the detected image.

Optionally, the method further includes: acquiring, by the first neural network, position data of the target object in a second non-detected image according to the first non-detected image in the video frame sequence and the second non-detected image after the first non-detected image.

Optionally, before acquiring, by the first neural network, the position data of the target object in the non-detected image according to the detected image and the non-detected image in the video frame sequence containing the target object, the method further includes: cropping the detected image and the non-detected image respectively according to the position data of the target object in the detected image, to obtain a first region image corresponding to the detected image and a second region image corresponding to the non-detected image, where the first region image and the second region image contain the target object; and the acquiring, by the first neural network, the position data of the target object in the non-detected image includes: acquiring, by the first neural network, position data of the target object in the second region image according to the first region image and the second region image containing the target object.

Optionally, the method further includes: dividing the video frame sequence into multiple groups of video frames in chronological order, each group of video frames including at least one video image; for each of the multiple groups of video frames, acquiring position data of the target object from the first video image of the group and acquiring, by the first neural network, position data of the target object in the video images following the first video image, thereby obtaining position data of the target object for at least one video image of the group; and determining the trajectory of the target object according to the position data of the target object of at least one video image in the multiple groups of video frames.

Optionally, acquiring the position data of the target object from the first video image includes: acquiring the position data of the target object from the first video image by a second neural network for target position detection, the second neural network including a fast convolutional neural network.

Optionally, before acquiring, by the first neural network, the position data of the target object in the non-detected image, the method further includes: determining, according to the category of the target object, a first neural network corresponding to the category of the target object.

Optionally, before acquiring, by the first neural network, the position data of the target object in the non-detected image, the method further includes: training the first neural network according to a detected sample image and a non-detected sample image in a video frame sample sequence containing the target object, the non-detected sample image being a subsequent image of the detected sample image.

Optionally, training the first neural network according to the detected sample image and the non-detected sample image in the video frame sample sequence containing the target object includes: acquiring, by the first neural network to be trained, position data of the target object in the non-detected sample image according to the detected sample image and the non-detected sample image in the video frame sample sequence containing the target object; determining second position offset data of the target object between the detected sample image and the non-detected sample image according to the position data of the target object in the detected sample image and the position data of the target object in the non-detected sample image; and training the first neural network according to first position offset data and the second position offset data, the first position offset data being the standard position offset of the target object between the detected sample image and the non-detected sample image.

Optionally, before acquiring, by the first neural network to be trained, the position data of the target object in the non-detected sample image, the method further includes: cropping the detected sample image and the non-detected sample image respectively according to the position data of the target object in the detected sample image, to obtain a third region image corresponding to the detected sample image and a fourth region image corresponding to the non-detected sample image, where the third region image and the fourth region image contain the target object; and the acquiring, by the first neural network to be trained, the position data of the target object in the non-detected sample image includes: acquiring, by the first neural network to be trained, position data of the target object in the fourth region image according to the third region image and the fourth region image containing the target object.

Optionally, the first neural network to be trained includes convolution layers, a concatenation layer connected at the end of the convolution layers, and a fully connected layer connected at the end of the concatenation layer; and the acquiring, by the first neural network to be trained, the position data of the target object in the fourth region image according to the third region image and the fourth region image containing the target object includes: performing feature extraction on the third region image and the fourth region image through the convolution layers to obtain position feature vectors of the target object in the third region image and the fourth region image; concatenating, through the concatenation layer, the position feature vectors of the target object in the third region image and the fourth region image to obtain a concatenated position feature vector; and performing a mapping operation on the concatenated position feature vector through the fully connected layer to obtain the position data of the target object in the fourth region image.

Optionally, before acquiring, by the first neural network to be trained, the position data of the target object in the non-detected sample image, the method further includes: determining the first position offset data according to the position data of the target object in the detected sample image and position calibration data of the target object in the non-detected sample image.

Optionally, the position data includes the length, width, and center position coordinates of the bounding box of the target object.
According to a second aspect of the embodiments of the present application, a neural network training method is provided. The method includes: acquiring, by a neural network to be trained, position data of a target object in a non-detected sample image according to a detected sample image and the non-detected sample image in a video frame sample sequence containing the target object, the non-detected sample image being a subsequent image of the detected sample image; determining second position offset data of the target object between the detected sample image and the non-detected sample image according to the position data of the target object in the detected sample image and the position data of the target object in the non-detected sample image; and training the neural network according to first position offset data and the second position offset data, the first position offset data being the standard position offset of the target object between the detected sample image and the non-detected sample image.

Optionally, before acquiring, by the neural network to be trained, the position data of the target object in the non-detected sample image, the method further includes: cropping the detected sample image and the non-detected sample image respectively according to the position data of the target object in the detected sample image, to obtain a third region image corresponding to the detected sample image and a fourth region image corresponding to the non-detected sample image, where the third region image and the fourth region image contain the target object; and the acquiring, by the neural network to be trained, the position data of the target object in the non-detected sample image includes: acquiring, by the neural network to be trained, position data of the target object in the fourth region image according to the third region image and the fourth region image containing the target object.

Optionally, the neural network to be trained includes convolution layers, a concatenation layer connected at the end of the convolution layers, and a fully connected layer connected at the end of the concatenation layer; and the acquiring, by the neural network to be trained, the position data of the target object in the fourth region image according to the third region image and the fourth region image containing the target object includes: performing feature extraction on the third region image and the fourth region image through the convolution layers to obtain position feature vectors of the target object in the third region image and the fourth region image; concatenating, through the concatenation layer, the position feature vectors of the target object in the third region image and the fourth region image to obtain a concatenated position feature vector; and performing a mapping operation on the concatenated position feature vector through the fully connected layer to obtain the position data of the target object in the fourth region image.

Optionally, before acquiring, by the neural network to be trained, the position data of the target object in the non-detected sample image, the method further includes: determining the first position offset data according to the position data of the target object in the detected sample image and position calibration data of the target object in the non-detected sample image.

Optionally, the position data includes the length, width, and center position coordinates of the bounding box of the target object.
According to a third aspect of the embodiments of the present application, a target tracking apparatus is provided. The apparatus includes: a first obtaining module, configured to acquire, by a first neural network, position data of a target object in a non-detected image according to a detected image and the non-detected image in a video frame sequence containing the target object, where the first neural network is used to regress the position of the target object in the non-detected image according to the detected image, and the non-detected image is a subsequent image of the detected image; and a first determining module, configured to determine a trajectory of the target object according to the position data of the target object in the detected image and the position data of the target object in the non-detected image.

Optionally, the first obtaining module includes: a first obtaining sub-module, configured to acquire, by the first neural network, position data of the target object in a first non-detected image according to the detected image in the video frame sequence and the first non-detected image after the detected image.

Optionally, the apparatus further includes: a second obtaining module, configured to acquire, by the first neural network, position data of the target object in a second non-detected image according to the first non-detected image in the video frame sequence and the second non-detected image after the first non-detected image.

Optionally, the apparatus further includes, before the first obtaining module: a first cropping module, configured to crop the detected image and the non-detected image respectively according to the position data of the target object in the detected image, to obtain a first region image corresponding to the detected image and a second region image corresponding to the non-detected image, where the first region image and the second region image contain the target object; and the first obtaining module includes: a second obtaining sub-module, configured to acquire, by the first neural network, position data of the target object in the second region image according to the first region image and the second region image containing the target object.

Optionally, the apparatus further includes: a dividing module, configured to divide the video frame sequence into multiple groups of video frames in chronological order, each group of video frames including at least one video image; a third obtaining module, configured to, for each of the multiple groups of video frames, acquire position data of the target object from the first video image of the group and acquire, by the first neural network, position data of the target object in the video images following the first video image, thereby obtaining position data of the target object for at least one video image of the group; and a second determining module, configured to determine the trajectory of the target object according to the position data of the target object of at least one video image in the multiple groups of video frames.

Optionally, the third obtaining module includes: a third obtaining sub-module, configured to acquire the position data of the target object from the first video image by a second neural network for target position detection, the second neural network including a fast convolutional neural network.

Optionally, the apparatus further includes, before the first obtaining module: a selection module, configured to determine, according to the category of the target object, a first neural network corresponding to the category of the target object.

Optionally, the apparatus further includes, before the first obtaining module: a first training module, configured to train the first neural network according to a detected sample image and a non-detected sample image in a video frame sample sequence containing the target object, the non-detected sample image being a subsequent image of the detected sample image.

Optionally, the first training module includes: a fourth obtaining sub-module, configured to acquire, by the first neural network to be trained, position data of the target object in the non-detected sample image according to the detected sample image and the non-detected sample image in the video frame sample sequence containing the target object; a first determining sub-module, configured to determine second position offset data of the target object between the detected sample image and the non-detected sample image according to the position data of the target object in the detected sample image and the position data of the target object in the non-detected sample image; and a first training sub-module, configured to train the first neural network according to first position offset data and the second position offset data, the first position offset data being the standard position offset of the target object between the detected sample image and the non-detected sample image.

Optionally, the apparatus further includes, before the fourth obtaining sub-module: a first cropping sub-module, configured to crop the detected sample image and the non-detected sample image respectively according to the position data of the target object in the detected sample image, to obtain a third region image corresponding to the detected sample image and a fourth region image corresponding to the non-detected sample image, where the third region image and the fourth region image contain the target object; and the fourth obtaining sub-module includes: an obtaining unit, configured to acquire, by the first neural network to be trained, position data of the target object in the fourth region image according to the third region image and the fourth region image containing the target object.

Optionally, the first neural network to be trained includes convolution layers, a concatenation layer connected at the end of the convolution layers, and a fully connected layer connected at the end of the concatenation layer, and the obtaining unit is configured to: perform feature extraction on the third region image and the fourth region image through the convolution layers to obtain position feature vectors of the target object in the third region image and the fourth region image; concatenate, through the concatenation layer, the position feature vectors of the target object in the third region image and the fourth region image to obtain a concatenated position feature vector; and perform a mapping operation on the concatenated position feature vector through the fully connected layer to obtain the position data of the target object in the fourth region image.

Optionally, the apparatus further includes, before the fourth obtaining sub-module: a second determining sub-module, configured to determine the first position offset data according to the position data of the target object in the detected sample image and position calibration data of the target object in the non-detected sample image.

Optionally, the position data includes the length, width, and center position coordinates of the bounding box of the target object.
根据本申请实施例的第四方面,提供了一种神经网络训练装置。所述装置包括:第四获取模块,用于通过待训练的神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取所述目标物体在所述非检测样本图像中的位置数据,所述非检测样本图像为所述检测样本图像的在后图像;第三确定模块,用于根据所述检测样本图像中所述目标物体的位置数据和所述非检测样本图像中所述目标物体的位置数据,确定所述目标物体在所述检测样本图像和所述非检测样本图像之间的第二位置偏移数据;第二训练模块,用于根据第一位置偏移数据和所述第二位置偏移数据,训练所述神经网络,所述第一位置偏移数据为所述目标物体在所述检测样本图像和所述非检测样本图像之间的标准位置偏移量。
可选地,所述第四获取模块之前,所述装置还包括:第二裁剪模块,用于根据所述检测样本图像中目标物体的位置数据,分别对所述检测样本图像和所述非检测样本图像进行裁剪,获得与所述检测样本图像对应的第三区域图像以及与所述非检测样本图像对应的第四区域图像,其中,所述第三区域图像与所述第四区域图像包含所述目标物体;所述第四获取模块,包括:第五获取子模块,用于通过所述待训练的神经网络,根据含有所述目标物体的第三区域图像和第四区域图像,获取所述目标物体在所述第四区域图像中的位置数据。
可选地,所述待训练的神经网络包括卷积层、连接在所述卷积层末端的拼接层,以及连接在所述拼接层末端的全连接层,所述第五获取子模块,具体用于:通过所述卷积层,对所述第三区域图像和所述第四区域图像进行特征提取,获得所述第三区域图像和所述第四区域图像中所述目标物体的位置特征向量;通过所述拼接层,对所述第三区域图像和所述第四区域图像中所述目标物体的位置特征向量进行拼接,获得拼接后的位置特征向量;通过所述全连接层,对所述拼接后的位置特征向量进行映射操作,获得所述目标物体在所述第四区域图像中的位置数据。
可选地,所述第四获取模块之前,所述装置还包括:第四确定模块,用于根据所述检测样本图像中目标物体的位置数据和所述非检测样本图像中目标物体的位置标定数据确定所述第一位置偏移数据。
可选地,所述位置数据包括所述目标物体的限位框的长度、宽度以及中心位置坐标。
根据本申请实施例的第五方面,提供了一种计算机可读存储介质,其上存储有计算机程序指令,其中,所述程序指令被处理器执行时实现本申请实施例第一方面所述的目标跟踪方法的步骤,或者实现本申请实施例第二方面所述的神经网络训练方法的步骤。
根据本申请实施例的第六方面,提供了一种计算机程序产品,其包括有计算机程序指令,其中,所述程序指令被处理器执行时实现本申请实施例第一方面所述的目标跟踪方法的步骤,或者实现本申请实施例第二方面所述的神经网络训练方法的步骤。
根据本申请实施例的第七方面,提供了一种计算机程序,包括计算机可读代码,当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现本申请实施例第一方面所述的目标跟踪方法中各步骤的指令;或者
当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现本申请实施例第二方面所述的神经网络训练方法中各步骤的指令。
根据本申请实施例的第八方面,提供了一种电子设备,包括:处理器和存储器;
所述存储器用于存放至少一可执行指令,所述可执行指令使所述处理器执行本申请实施例第一方面所述的目标跟踪方法对应的操作;或者,所述存储器用于存放至少一可执行指令,所述可执行指令使所述处理器执行本申请实施例第二方面所述的神经网络训练方法对应的操作。
根据本申请实施例的第九方面,提供了一种电子设备,包括:处理器和本申请实施例第三方面所述的目标跟踪装置;在处理器运行所述目标跟踪装置时,本申请实施例第三方面所述的目标跟踪装置中的模块被运行;或者
处理器和本申请实施例第四方面所述的神经网络训练装置;在处理器运行所述神经网络的训练装置时,本申请实施例第四方面所述的神经网络训练装置中的模块被运行。
根据本申请实施例提供的技术方案,通过用于根据检测图像回归目标物体在非检测图像中的位置的第一神经网络,根据含有目标物体的视频帧序列中的检测图像和非检测图像,获取目标物体在非检测图像中的位置数据;并根据目标物体在检测图像中的位置数据和目标物体在非检测图像中的位置数据确定目标物体的轨迹,相比于现有隔帧检测的技术,本申请实施例可以根据检测图像回归非检测图像中目标物体的位置,在兼顾目标跟踪的检测效率的同时,还提高了目标跟踪的精度。
下面通过附图和实施例,对本申请的技术方案做进一步的详细描述。
附图说明
构成说明书的一部分的附图描述了本申请的实施例,并且连同描述一起用于解释本申请的原理。
参照附图,根据下面的详细描述,可以更加清楚地理解本申请,其中:
图1是根据本申请实施例目标跟踪方法的一个实施例的流程示意图。
图2是根据本申请实施例目标跟踪方法的另一个实施例的流程示意图。
图3是根据本申请实施例神经网络训练方法的一个实施例的流程示意图。
图4是根据本申请实施例神经网络训练方法的另一个实施例的流程示意图。
图5是根据本申请实施例目标跟踪装置的一个实施例的结构示意图。
图6是根据本申请实施例目标跟踪装置的另一个实施例的结构示意图。
图7是根据本申请实施例目标跟踪装置的又一实施例的结构示意图。
图8是根据本申请实施例神经网络训练装置的一个实施例的结构示意图。
图9是根据本申请实施例神经网络训练装置的另一实施例的结构示意图。
图10是根据本申请实施例电子设备的一个实施例的结构示意图。
具体实施方式
现在将参照附图来详细描述本申请的各种示例性实施例。应注意到:除非另外具体说明,否则在这些实施例中阐述的部件和步骤的相对布置、数字表达式和数值不限制本申请的范围。
同时,应当明白,为了便于描述,附图中所示出的各个部分的尺寸并不是按照实际的比例关系绘制的。
以下对至少一个示例性实施例的描述实际上仅仅是说明性的,决不作为对本申请及其应用或使用的任何限制。
对于相关领域普通技术人员已知的技术、方法和设备可能不作详细讨论,但在适当情况下,所述技术、方法和设备应当被视为说明书的一部分。
应注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行进一步讨论。
本申请实施例可以应用于计算机系统/服务器,其可与众多其它通用或专用计算系统环境或配置一起操作。适于与计算机系统/服务器一起使用的众所周知的计算系统、环境和/或配置的例子包括但不限于:个人计算机系统、服务器计算机系统、瘦客户机、厚客户机、手持或膝上设备、基于微处理器的系统、机顶盒、可编程消费电子产品、网络个人电脑、小型计算机系统、大型计算机系统和包括上述任何系统的分布式云计算技术环境,等等。
计算机系统/服务器可以在由计算机系统执行的计算机系统可执行指令(诸如程序模块)的一般语境下描述。通常,程序模块可以包括例程、程序、目标程序、组件、逻辑、数据结构等等,它们执行特定的任务或者实现特定的抽象数据类型。计算机系统/服务器可以在分布式云计算环境中实施,分布式云计算环境中,任务是由通过通信网络链接的远程处理设备执行的。在分布式云计算环境中,程序模块可以位于包括存储设备的本地或远程计算系统存储介质上。
图1是根据本申请实施例目标跟踪方法的一种实施例的流程示意图。该方法可以由任意目标跟踪设备执行,例如终端设备、服务器、移动设备等等,本申请实施例对此不做限定。如图1所示,本实施例的目标跟踪方法包括以下步骤:
在步骤S101中,通过第一神经网络,根据含有目标物体的视频帧序列中的检测图像和非检测图像,获取目标物体在非检测图像中的位置数据。
其中,第一神经网络用于根据检测图像回归目标物体在非检测图像中的位置。目标物体可包括但不限于交通工具、行人、无人机等。目标物体在图像中的位置数据可包括但不限于目标物体的限位框的顶点坐标和中心位置坐标。可选地,目标物体的限位框可为正方形或长方形。当目标物体的限位框为正方形时,目标物体的限位框的顶点坐标可为正方形的四个角所在的点的坐标。
在可选的实施方式中,检测图像可为在视频帧序列中利用检测器检测得到目标物体的位置的图像,非检测图像可为检测图像的在后图像,且非利用检测器检测得到目标物体的位置的图像。检测图像与非检测图像可为视频帧序列中相邻的视频图像,也可为视频帧序列中不相邻的视频图像,即检测图像与非检测图像之间具有相隔的视频图像。
在一个可选示例中,该步骤S101可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一获取模块501执行。
在步骤S102中,根据目标物体在检测图像中的位置数据和目标物体在非检测图像中的位置数据确定目标物体的轨迹。
其中,目标物体在检测图像中的位置数据是事先确定好的,不需要第一神经网络进行获取。可选地,可事先通过用于目标位置检测的神经网络检测出所述检测图像中目标物体的位置数据。当然,也可以采用其它的实施方式事先检测出检测图像中的目标物体的位置数据,本申请实施例对此不作限制。非检测图像中目标物体的位置数据是通过第一神经网络,根据检测图像和非检测图像获取得到的。
在一个可选示例中,该步骤S102可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一确定模块502执行。
在可选的实施方式中,含有目标物体的视频帧序列中包括多帧视频图像。由于目标物体在在前的检测图像中的位置数据以及目标物体在在后的非检测图像中的位置数据是已知的,可得到目标物体在视频帧序列的每一帧视频图像中的位置数据。根据目标物体在视频帧序列的每一帧视频图像中的位置数据可确定得到目标物体的轨迹。
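作为理解上述轨迹确定过程的一个简化示意,下面给出按时间顺序把每帧限位框中心点串成轨迹的 Python 草稿;其中位置数据采用(中心横坐标、中心纵坐标、长、宽)的形式,函数名 build_trajectory 及示例数值均为说明用的假设,并非本申请限定的实现方式。

```python
# 简化示意:根据每帧视频图像中目标物体限位框的位置数据,
# 按时间顺序取出限位框中心点坐标,依次连接即得到目标物体的轨迹。
# 这里假设每帧的位置数据为 (cx, cy, w, h),即中心坐标与长宽。

def build_trajectory(positions_per_frame):
    """positions_per_frame: [(cx, cy, w, h), ...],按帧时间顺序排列。"""
    trajectory = []
    for cx, cy, w, h in positions_per_frame:
        trajectory.append((cx, cy))  # 轨迹由各帧限位框中心点按序连接而成
    return trajectory

if __name__ == "__main__":
    # 假设的示例数据:目标物体在四帧中缓慢右移
    demo_positions = [(100.0, 80.0, 40.0, 60.0),
                      (104.0, 81.0, 40.0, 60.0),
                      (109.0, 82.0, 41.0, 61.0),
                      (115.0, 83.0, 41.0, 61.0)]
    print(build_trajectory(demo_positions))
```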
根据本申请实施例提供的技术方案,通过用于根据检测图像回归目标物体在非检测图像中的位置的第一神经网络,根据含有目标物体的视频帧序列中的检测图像和非检测图像,获取目标物体在非检测图像中的位置数据;并根据目标物体在检测图像中的位置数据和目标物体在非检测图像中的位置数据确定目标物体的轨迹,相比于现有隔帧检测的技术,本申请实施例可以根据检测图像回归非检测图像中目标物体的位置,在兼顾目标跟踪的检测效率的同时,还提高了目标跟踪的精度。
本实施例的目标跟踪方法可以由任意适当的具有图像或数据处理能力的设备执行,包括但不限于:摄像头、终端、移动终端、PC机、服务器、车载设备、娱乐设备、广告设备、个人数码助理(PDA)、平板电脑、笔记本电脑、掌上游戏机、智能眼镜、智能手表、可穿戴设备、虚拟显示设备或显示增强设备(如Google Glass、Oculus Rift、Hololens、Gear VR)等。
图2是根据本申请实施例目标跟踪方法的另一个实施例的流程示意图。该方法可以由任意目标跟踪设备执行,例如终端设备、服务器、移动设备等等,本申请实施例对此不做限定。如图2所示,本实施例的目标跟踪方法包括以下步骤:
在步骤S201中,根据检测图像中目标物体的位置数据,分别对检测图像和非检测图像进行裁剪,获得与检测图像对应的第一区域图像以及与非检测图像对应的第二区域图像。
其中,目标物体的位置数据可包括但不限于目标物体的限位框的长度、宽度以及中心位置坐标。第一区域图像与第二区域图像包含目标物体。
在可选的实施方式中,首先可根据检测图像中目标物体的位置数据确定得到图像的裁剪位置数据。可选地,可保证裁剪框的中心位置坐标与目标物体的限位框的中心位置坐标相同,并将目标物体的限位框的长度和宽度按照一定的比例进行扩大,获得裁剪框的长度和宽度,从而得到图像的裁剪位置数据。在获得图像的裁剪位置数据之后,可根据图像的裁剪位置数据,分别对检测图像和非检测图像进行裁剪,获得与检测图像对应的第一区域图像以及与非检测图像对应的第二区域图像。之所以对检测图像和非检测图像进行裁剪,是因为检测图像与非检测图像之间相隔的视频图像的帧数通常较小,例如:在0到3之间,那么目标物体在非检测图像中的位置相对于目标物体在检测图像中的位置的变化也很小,目标物体在非检测图像中的限位框的位置会落入非检测图像的裁剪框内。籍此,可减轻第一神经网络的数据处理量,第一神经网络能够基于视频帧序列中在前的检测图像的目标物体位置快速回归出视频帧序列中在后的非检测图像的目标物体位置。此外,由于图像的裁剪位置数据是根据检测图像中目标物体的位置数据确定得到的,因此,检测图像中目标物体的位置数据隐含在裁剪后的检测图像(第一区域图像)中。可选地,可根据第一区域图像的中心位置坐标、长度和宽度确定得到第一区域图像中目标物体的限位框的长度、宽度和中心位置坐标。
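下面给出按上述描述计算裁剪框的一个简化草稿:裁剪框中心与检测图像中目标物体限位框中心相同,长宽按一定比例扩大并截断到图像边界内;其中扩大比例取 2.0、图像尺寸取 1920×1080 仅为演示用的假设取值,本申请实施例对此不作限定。

```python
def crop_box_from_detection(cx, cy, w, h, scale=2.0, img_w=1920, img_h=1080):
    """根据检测图像中目标物体限位框 (cx, cy, w, h) 计算裁剪框。
    裁剪框中心与限位框中心相同,长宽按 scale 比例扩大,并截断到图像边界内。"""
    crop_w, crop_h = w * scale, h * scale
    x1 = max(0.0, cx - crop_w / 2)
    y1 = max(0.0, cy - crop_h / 2)
    x2 = min(float(img_w), cx + crop_w / 2)
    y2 = min(float(img_h), cy + crop_h / 2)
    return x1, y1, x2, y2  # 对检测图像与非检测图像使用同一裁剪框进行裁剪

if __name__ == "__main__":
    print(crop_box_from_detection(300.0, 200.0, 80.0, 120.0))
```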
在一个可选示例中,该步骤S201可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一裁剪模块601执行。
在步骤S202中,通过第一神经网络,根据含有目标物体的第一区域图像和第二区域图像,获取目标物体在第二区域图像中的位置数据。
其中,第一神经网络用于根据第一区域图像回归目标物体在第二区域图像中的位置。目标物体在第二区域图像中的位置数据可包括但不限于目标物体的限位框的长度、宽度以及中心位置坐标。
在一个可选示例中,该步骤S202可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第二获取子模块6022执行。
在不对检测图像和非检测图像进行裁剪的情况下,也可以通过第一神经网络,直接根据含有目标物体的视频帧序列中的检测图像和非检测图像,获取目标物体在非检测图像中的位置数据。在本申请一可选实施方式中,通过第一神经网络,根据含有目标物体的视频帧序列中的检测图像和非检测图像,获取目标物体在非检测图像中的位置数据,包括:通过第一神经网络,根据视频帧序列中的检测图像和在检测图像之后的第一非检测图像,获取目标物体在第一非检测图像中的位置数据。籍此,能够非常准确地预测出在检测图像之后的第一非检测图像中的目标物体的位置数据。
可选地,在本申请一可选实施方式中,本申请实施例方法还包括:通过第一神经网络,根据视频帧序列中的第一非检测图像和在第一非检测图像之后的第二非检测图像,获取目标物体在第二非检测图像中的位置数据。籍此,能够较为准确地预测出在第一非检测图像之后的第二非检测图像中的目标物体的位置数据。
其中,检测图像与第一非检测图像可为视频帧序列中相邻的视频图像,也可为视频帧序列中不相邻的视频图像,即检测图像与第一非检测图像之间具有相隔的视频图像。第一非检测图像与第二非检测图像可为视频帧序列中相邻的视频图像,也可为视频帧序列中不相邻的视频图像,即第一非检测图像与第二非检测图像之间具有相隔的视频图像。第一神经网络根据检测图像回归出目标物体在检测图像之后的第一非检测图像中的位置的准确度较高,第一神经网络根据第一非检测图像回归出目标物体在第一非检测图像之后的第二非检测图像中的位置的准确度较低。
可选地,在该步骤S202之前,本申请实施例方法还包括:根据目标物体的类别确定与目标物体的类别对应的第一神经网络。籍此,能够进一步提高目标跟踪的精度。
在可选的实施方式中,可以针对目标物体的不同类别分别训练对应的第一神经网络。例如,对于移动较快的车辆可以单独训练一个相应的第一神经网络,而对于移动较慢的车辆可以单独训练一个相应的第一神经网络,从而能够进一步提高目标车辆跟踪的精度。
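下面是按目标物体类别选取对应第一神经网络的一个极简示意(查表方式);其中的类别名称与网络占位对象均为说明用的假设,实际应替换为各类别下已训练好的网络对象。

```python
# 简化示意:为不同类别(如移动较快/较慢的车辆、行人)分别维护训练好的第一神经网络,
# 跟踪时按目标物体的类别查表选用对应网络。类别划分与命名均为示例性假设。
nets_by_category = {
    "fast_vehicle": "regression_net_fast",        # 实际应为已训练的网络对象
    "slow_vehicle": "regression_net_slow",
    "pedestrian": "regression_net_pedestrian",
}

def select_first_network(category, default_key="slow_vehicle"):
    """根据目标物体的类别确定与其对应的第一神经网络,未知类别回退到默认网络。"""
    return nets_by_category.get(category, nets_by_category[default_key])

print(select_first_network("fast_vehicle"))
```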
在步骤S203中,根据目标物体在第一区域图像中的位置数据和目标物体在第二区域图像中的位置数据确定目标物体的轨迹。
其中,第一区域图像中目标物体的位置数据就是检测图像中目标物体的位置数据,因为第一区域图像是通过对检测图像进行裁剪得到的。第二区域图像中目标物体的位置数据是通过第一神经网络,根据第一区域图像和第二区域图像获取得到的。
在一个可选示例中,该步骤S203可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第二确定模块607执行。
在可选的实施方式中,含有目标物体的视频帧序列中包括多帧视频图像。由于目标物体在第一区域图像中的位置数据和目标物体在第二区域图像中的位置数据是已知的,相应地,目标物体在在前的检测图像中的位置数据以及目标物体在在后的非检测图像中的位置数据也是已知的,可得到目标物体在视频帧序列的每一帧视频图像中的位置数据。根据目标物体在视频帧序列的每一帧视频图像中的位置数据可确定得到目标物体的轨迹。
可选地,本申请的一些可选实施例方法中还包括:按照时间顺序,将视频帧序列划分为多组视频帧,每组视频帧包括至少一帧视频图像;针对多组视频帧,从首帧视频图像中获取目标物体的位置数据,并通过第一神经网络,获取首帧视频图像后续的视频图像中目标物体的位置数据,从而获得该组视频帧中至少一帧视频图像的目标物体的位置数据;根据多组视频帧中至少一帧视频图像的目标物体的位置数据确定目标物体的轨迹。籍此,能够进一步提高目标跟踪的精度。
其中,从首帧视频图像中获取目标物体的位置数据,包括:通过用于目标位置检测的第二神经网络,从首帧视频图像中获取目标物体的位置数据。可选地,第二神经网络包括快速卷积神经网络(Faster Region-based Convolutional Neural Network,Faster R-CNN)。通过第一神经网络,获取首帧视频图像后续的视频图像中目标物体的位置数据,包括:通过第一神经网络,根据首帧视频图像和后续的视频图像获取后续的视频图像中目标物体的位置数据。
在一些可选的实施方式中,每组视频帧包括四帧视频图像。首帧视频图像为关键帧,需要第二神经网络从首帧视频图像中检测出目标物体的位置数据,首帧视频图像后续的三帧视频图像需要第一神经网络根据首帧视频图像和后续的视频图像回归出后续的视频图像中目标物体的位置数据。籍此,视频可以以分段的形式进行检测,一个分段内,首帧为关键帧,首帧后面的几个视频帧都做回归,这样一个分段的整体检测时间和现有技术中一帧视频图像的检测时间接近,可以让目标跟踪的反应时间更短。
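结合上述分段检测方式,下面给出一个示意性的处理流程草稿:每组 4 帧,组内首帧(关键帧)由第二神经网络(检测器)得到目标位置,其后各帧由第一神经网络回归得到位置;其中 detector_net、regression_net 为占位的假设函数,并非真实检测器或回归网络的接口。

```python
# 简化示意:将视频帧序列按时间顺序划分为每组 4 帧,组内首帧为关键帧,
# 由第二神经网络(检测器)检测目标位置,其后帧由第一神经网络回归位置。

def detector_net(frame):
    return (0.0, 0.0, 10.0, 10.0)   # 占位:返回关键帧中目标的 (cx, cy, w, h)

def regression_net(prev_frame, prev_pos, cur_frame):
    cx, cy, w, h = prev_pos
    return (cx + 1.0, cy, w, h)     # 占位:根据在前帧位置回归当前帧位置

def track_by_segments(frames, group_size=4):
    all_positions = []
    for start in range(0, len(frames), group_size):
        group = frames[start:start + group_size]
        pos = detector_net(group[0])            # 关键帧:检测
        all_positions.append(pos)
        prev_frame = group[0]
        for frame in group[1:]:                 # 后续帧:回归
            pos = regression_net(prev_frame, pos, frame)
            all_positions.append(pos)
            prev_frame = frame
    return all_positions

print(track_by_segments(list(range(8))))
```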
在本实施例中,通过第一神经网络,根据含有目标物体的第一区域图像和第二区域图像,获取目标物体在第二区域图像中的位置数据之前,需要对第一神经网络进行训练。在训练第一神经网络时,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像训练第一神经网络,非检测样本图像为检测样本图像的在后图像。
在本申请一些可选实施方式中,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像训练第一神经网络,包括:通过待训练的第一神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取目标物体在非检测样本图像中的位置数据;并根据检测样本图像中目标物体的位置数据和非检测样本图像中目标物体的位置数据,确定目标物体在检测样本图像和非检测样本图像之间的第二位置偏移数据;再根据第一位置偏移数据和第二位置偏移数据,训练第一神经网络,第一位置偏移数据为目标物体在检测样本图像和非检测样本图像之间的标准位置偏移量。其中,标准位置偏移量为根据目标物体在检测样本图像和非检测样本图像中的实际位置测量得到的。
在本申请一些可选实施方式中,通过待训练的第一神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取目标物体在非检测样本图像中的位置数据之前,该实施例方法还包括:根据检测样本图像中目标物体的位置数据,分别对检测样本图像和非检测样本图像进行裁剪,获得与检测样本图像对应的第三区域图像以及与非检测样本图像对应的第四区域图像,其中,第三区域图像与第四区域图像包含目标物体。相应地,通过待训练的第一神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取目标物体在非检测样本图像中的位置数据,包括:通过待训练的第一神经网络,根据含有目标物体的第三区域图像和第四区域图像,获取目标物体在第四区域图像中的位置数据。
在本申请一些可选实施方式中,待训练的第一神经网络包括卷积层、连接在卷积层末端的拼接层,以及连接在拼接层末端的全连接层,其中,通过第一神经网络,根据含有目标物体的第三区域图像和第四区域图像,获取目标物体在第四区域图像中的位置数据,包括:通过卷积层,对第三区域图像和第四区域图像进行特征提取,获得第三区域图像和第四区域图像中目标物体的位置特征向量;通过拼接层,对第三区域图像和第四区域图像中目标物体的位置特征向量进行拼接,获得拼接后的位置特征向量;通过全连接层,对拼接后的位置特征向量进行映射操作,获得目标物体在第四区域图像中的位置数据。
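下面用 PyTorch 给出上述"卷积层—拼接层—全连接层"结构的一个示意性草稿:卷积层分别提取两幅区域图像的位置特征向量,拼接后经全连接层映射为位置数据;其中卷积层数、通道数、输入尺寸以及两路卷积共享权重等均为便于演示的假设,并非本申请限定的网络配置。

```python
import torch
import torch.nn as nn

class RegressionNet(nn.Module):
    """示意:卷积层提取两幅区域图像的位置特征,拼接后经全连接层映射为位置数据。"""
    def __init__(self, in_size=64):
        super().__init__()
        # 卷积主干(层数、通道数为假设值;文中提到不使用池化层,这里同样不加池化)
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        feat_dim = 64 * (in_size // 8) * (in_size // 8)
        # 全连接层:把拼接后的位置特征向量映射为 4 维位置数据(如中心坐标与长宽)
        self.fc = nn.Sequential(
            nn.Linear(feat_dim * 2, 256), nn.ReLU(),
            nn.Linear(256, 4),
        )

    def forward(self, region_a, region_b):
        fa = self.conv(region_a).flatten(1)   # 第三(或第一)区域图像的位置特征向量
        fb = self.conv(region_b).flatten(1)   # 第四(或第二)区域图像的位置特征向量
        feat = torch.cat([fa, fb], dim=1)     # 拼接层:位置特征向量拼接
        return self.fc(feat)                  # 全连接层:映射为位置数据

net = RegressionNet()
out = net(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 4])
```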
在本申请一些可选实施方式中,通过待训练的第一神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取目标物体在非检测样本图像中的位置数据之前,该实施例方法还包括:根据检测样本图像中目标物体的位置数据和非检测样本图像中目标物体的位置标定数据确定第一位置偏移数据。
本申请实施例提供的目标跟踪方法,与现有技术中对视频帧序列中每帧视频图像都做检测的方法相比,不仅能够提高目标跟踪的速度,还能够保证目标跟踪的精度,并且与现有技术中对视频帧序列进行跳帧检测的方法相比,能够综合运用视频帧序列中每一帧视频图像的目标物体的位置信息,目标跟踪的精度更高。此外,本申请实施例提供的目标跟踪方法可以保证获取得到的在后的非检测图像中物体的位置数据与目标物体是一对一的关系,而不需要在得到视频帧序列中每一帧视频图像的物体位置数据之后,再通过对每一帧视频图像中的物体的位置数据进行匹配来获得每一帧视频图像中目标物体的位置数据,从而得到目标物体的轨迹。
本申请实施例提供的目标跟踪方法可以应用于实际的场景中。例如,在实时的交通路面上,如果交通管理部门希望通过目标跟踪来确认车辆的运行轨迹,而又无法为每个监控摄像头都支付一笔昂贵的设备费用的时候,通过本申请实施例提供的目标跟踪方法则可以实现一台设备实时跟踪数个甚至数十个监控摄像头,降低了目标跟踪的成本。
根据本实施例提供的目标跟踪方法,根据检测图像中目标物体的位置数据,分别对检测图像和非检测图像进行裁剪,获得与检测图像对应的第一区域图像以及与非检测图像对应的第二区域图像,再通过用于根据第一区域图像回归目标物体在第二区域图像中的位置的第一神经网络,根据含有目标物体的第一区域图像和第二区域图像,获取目标物体在第二区域图像中的位置数据;并根据目标物体在第一区域图像中的位置数据和目标物体在第二区域图像中的位置数据确定目标物体的轨迹,相比于现有隔帧检测的技术,本申请实施例可以根据第一区域图像回归第二区域图像中目标物体的位置,在兼顾目标跟踪的检测效率的同时,还提高了目标跟踪的精度。
本实施例的目标跟踪方法可以由任意适当的具有图像或数据处理能力的设备执行,包括但不限于:摄像头、终端、移动终端、PC机、服务器、车载设备、娱乐设备、广告设备、个人数码助理(PDA)、平板电脑、笔记本电脑、掌上游戏机、智能眼镜、智能手表、可穿戴设备、虚拟显示设备或显示增强设备(如Google Glass、Oculus Rift、Hololens、Gear VR)等。
图3是根据本申请实施例神经网络训练方法的一个实施例的流程示意图。该方法可以由任意神经网络训练设备执行,例如终端设备、服务器、移动设备等等,本申请实施例对此不做限定。如图3所示,本实施例的神经网络训练方法包括以下步骤:
在步骤S301中,通过待训练的神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取目标物体在非检测样本图像中的位置数据。
在本申请实施例中,神经网络可以是任意适当的可实现特征提取或目标对象检测的神经网络,包括但不限于卷积神经网络、增强学习神经网络、对抗神经网络中的生成网络等等。神经网络中结构的设置可以由本领域技术人员根据实际需求适当设定,如卷积层的层数、卷积核的大小、通道数等等,本申请实施例对此不作限制。其中,目标物体可包括交通工具、行人、无人机等。样本图像中目标物体的位置数据可包括目标物体的限位框的顶点坐标和中心位置坐标。可选地,目标物体的限位框可为正方形或长方形。例如,当目标物体的限位框为长方形时,目标物体的限位框的顶点坐标可为长方形的四个角所在的点的坐标。
在可选的实施方式中,检测样本图像可为在视频帧样本序列中利用检测器检测得到目标物体的位置的图像,非检测样本图像可为检测样本图像的在后图像,且非利用检测器检测得到目标物体的位置的图像。检测样本图像与非检测样本图像可为视频帧样本序列中相邻的视频图像,也可为视频帧样本序列中不相邻的视频图像,即检测样本图像与非检测样本图像之间具有相隔的视频图像。为了让训练得到的神经网络的适应性较好,效果较佳,不仅限于选择相邻的检测样本图像和非检测样本图像,还可以选择不相邻的检测样本图像和非检测样本图像,让训练得到的神经网络能够获取目标位置变化更大的样本图像中目标物体的位置,即可以让训练得到的神经网络能够根据过去几帧的视频图像中的目标物体位置更精确地获取当前视频帧图像中的物体的位置,而不是只能通过前一帧视频图像中目标物体的位置获取当前视频帧图像中的物体的位置。
在一个可选示例中,该步骤S301可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第四获取模块801执行。
在步骤S302中,根据检测样本图像中目标物体的位置数据和非检测样本图像中目标物体的位置数据,确定目标物体在检测样本图像和非检测样本图像之间的第二位置偏移数据。
其中,检测样本图像中目标物体的位置数据是事先确定好的,不需要待训练的神经网络进行获取。可选地,可事先通过用于目标位置检测的神经网络检测出检测样本图像中目标物体的位置数据。当然,也可以采用其它的实施方式事先检测出检测样本图像中的目标物体的位置数据,本申请实施例对此不作限制。非检测样本图像中目标物体的位置数据是通过待训练的神经网络,根据检测样本图像和非检测样本图像获取得到的。
在可选的实施方式中,可将非检测样本图像中目标物体的位置数据减去检测样本图像中目标物体的位置数据,获得目标物体在检测样本图像和非检测样本图像之间的第二位置偏移数据。
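按照"非检测样本图像中的位置数据减去检测样本图像中的位置数据"的描述,下面给出第二位置偏移数据计算的简化示意;位置数据采用(中心横坐标、中心纵坐标、长、宽)形式,函数名与示例数值均为说明用的假设。

```python
def position_offset(pos_det, pos_non_det):
    """第二位置偏移数据 = 非检测样本图像中的位置 - 检测样本图像中的位置。
    返回限位框中心坐标的改变量与长宽的改变量 (dcx, dcy, dw, dh)。"""
    return tuple(b - a for a, b in zip(pos_det, pos_non_det))

# 示例:检测样本图像与非检测样本图像中目标限位框的位置(假设数据)
print(position_offset((100.0, 80.0, 40.0, 60.0), (106.0, 82.0, 42.0, 60.0)))
# -> (6.0, 2.0, 2.0, 0.0)
```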
在一个可选示例中,该步骤S302可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第三确定模块802执行。
在步骤S303中,根据第一位置偏移数据和第二位置偏移数据,训练神经网络。
其中,第一位置偏移数据为目标物体在检测样本图像和非检测样本图像之间的标准位置偏移量。可选地,第一位置偏移数据是根据检测样本图像中目标物体的位置和非检测样本图像中目标物体的标注位置确定得到的,可作为神经网络训练的监督量。在可选的实施方式中,该步骤S303可包括:根据第一位置偏移数据和第二位置偏移数据确定目标物体的位置差异,再根据目标物体的位置差异调整神经网络的网络参数。通过计算目标物体的位置差异,对当前获得的第二位置偏移数据进行评估,以作为后续训练神经网络的依据。
在一个可选示例中,该步骤S303可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第二训练模块803执行。
可选地,可将目标物体的位置差异反向传输给神经网络,从而迭代地训练该神经网络。神经网络的训练是一个迭代的过程,本申请实施例仅对其中的一次训练过程进行了说明,但本领域技术人员应当明了,对神经网络的每次训练都可采用该训练方式,直至完成所述神经网络的训练。
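结合上文,下面给出一次迭代训练的示意性草稿:以第一位置偏移数据(标准位置偏移量)为监督,对网络预测得到的第二位置偏移数据计算位置差异并反向传播以调整网络参数;其中沿用前文 RegressionNet 草稿,损失函数取 Smooth L1、优化器与学习率等均为演示用的假设选择,并非本申请限定的训练配置。

```python
import torch
import torch.nn as nn

# 沿用上文 RegressionNet 草稿;crop_a / crop_b 为裁剪得到的第三、第四区域图像张量,
# pos_det 为检测样本图像中目标的位置,pos_gt 为非检测样本图像中目标的位置标定数据。
def train_step(net, optimizer, crop_a, crop_b, pos_det, pos_gt):
    first_offset = pos_gt - pos_det            # 第一位置偏移数据(标准位置偏移量,监督量)
    pred_pos = net(crop_a, crop_b)             # 网络获取的目标在第四区域图像中的位置数据
    second_offset = pred_pos - pos_det         # 第二位置偏移数据
    loss = nn.functional.smooth_l1_loss(second_offset, first_offset)  # 位置差异
    optimizer.zero_grad()
    loss.backward()                            # 将位置差异反向传播,调整网络参数
    optimizer.step()
    return loss.item()

# 用法示意(假设数据):
# net = RegressionNet()
# opt = torch.optim.SGD(net.parameters(), lr=1e-3)
# loss = train_step(net, opt,
#                   torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64),
#                   torch.tensor([[100., 80., 40., 60.]]),
#                   torch.tensor([[106., 82., 42., 60.]]))
```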
本申请的示例性实施例旨在提出一种神经网络的训练方法,通过待训练的神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取目标物体在非检测样本图像中的位置数据,并根据检测样本图像中目标物体的位置数据和非检测样本图像中目标物体的位置数据,确定目标物体在检测样本图像和非检测样本图像之间的第二位置偏移数据,再根据目标物体在检测样本图像和非检测样本图像之间的标准位置偏移量和所述第二位置偏移数据,训练神经网络,与现有技术相比,使得训练得到的神经网络能够基于视频帧序列中在前视频图像的目标物体位置回归出视频帧序列中在后视频图像的目标物体位置。
本实施例的神经网络的训练方法可以由任意适当的具有图像或数据处理能力的设备执行,包括但不限于:摄像头、终端、移动终端、PC机、服务器、车载设备、娱乐设备、广告设备、个人数码助理(PDA)、平板电脑、笔记本电脑、掌上游戏机、智能眼镜、智能手表、可穿戴设备、虚拟显示设备或显示增强设备(如Google Glass、Oculus Rift、Hololens、Gear VR)等。
图4是根据本申请实施例神经网络训练方法的另一个实施例的流程示意图。该方法可以由任意神经网络训练设备执行,例如终端设备、服务器、移动设备等等,本申请实施例对此不做限定。如图4所示,本实施例的神经网络的训练方法包括以下步骤:
在步骤S401中,根据检测样本图像中目标物体的位置数据,分别对检测样本图像和非检测样本图像进行裁剪,获得与检测样本图像对应的第三区域图像以及与非检测样本图像对应的第四区域图像。
其中,目标物体的位置数据可包括目标物体的限位框的长度、宽度以及中心位置坐标。第三区域图像与第四区域图像包含目标物体。
在可选的实施方式中,首先可根据检测样本图像中目标物体的位置数据确定得到样本图像的裁剪位置数据。可选地,可保证裁剪框的中心位置坐标与目标物体的限位框的中心位置坐标相同,并将目标物体的限位框的长度和宽度按照一定的比例进行扩大,获得裁剪框的长度和宽度,从而得到样本图像的裁剪位置数据。在获得样本图像的裁剪位置数据之后,可根据样本图像的裁剪位置数据,分别对检测样本图像和非检测样本图像进行裁剪,获得与检测样本图像对应的第三区域图像以及与非检测样本图像对应的第四区域图像。之所以对检测样本图像和非检测样本图像进行裁剪,是因为检测样本图像与非检测样本图像之间相隔的视频图像的帧数通常较小,例如:在0到3之间,那么目标物体在非检测样本图像中的位置相对于目标物体在检测样本图像中的位置的变化也很小,目标物体在非检测样本图像中的限位框的位置会落入非检测样本图像的裁剪框内。籍此,可减轻神经网络的数据处理量,从而训练得到的神经网络可基于视频帧序列中在前视频图像的目标物体位置快速回归出视频帧序列中在后视频图像的目标物体位置。此外,由于样本图像的裁剪位置数据是根据检测样本图像中目标物体的位置数据确定得到的,因此,检测样本图像中目标物体的位置数据隐含在裁剪后的检测样本图像(第三区域图像)中。可选地,可根据第三区域图像的中心位置坐标、长度和宽度确定得到第三区域图像中目标物体的限位框的长度、宽度和中心位置坐标。
在一个可选示例中,该步骤S401可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第二裁剪模块902执行。
在步骤S402中,通过待训练的神经网络,根据含有目标物体的第三区域图像和第四区域图像,获取目标物体在第四区域图像中的位置数据。
其中,待训练的神经网络具有卷积层、连接在卷积层末端的拼接层,以及连接在拼接层末端的全连接层。可选地,神经网络具有六层连续的卷积层,为了使得训练得到的神经网络基于视频帧序列中在前视频图像的目标物体位置快速回归出视频帧序列中在后视频图像的目标物体位置,神经网络没有采用池化层。可选地,待训练的神经网络具有两个输入端和一个输出端,一个输入端用于输入第三区域图像,另一个输入端用于输入第四区域图像,输出端用于输出目标物体在第四区域图像中的位置数据。
在一个可选示例中,该步骤S402可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第五获取子模块9031执行。
可选地,通过待训练的神经网络,根据含有目标物体的第三区域图像和第四区域图像,获取目标物体在第四区域图像中的位置数据,包括:通过卷积层,对第三区域图像和第四区域图像进行特征提取,获得第三区域图像和第四区域图像中目标物体的位置特征向量;通过拼接层,对第三区域图像和第四区域图像中目标物体的位置特征向量进行拼接,获得拼接后的位置特征向量;通过全连接层,对拼接后的位置特征向量进行映射操作,获得目标物体在所述第四区域图像中的位置数据。
在步骤S403中,根据第三区域图像中目标物体的位置数据和第四区域图像中目标物体的位置数据,确定目标物体在第三区域图像和第四区域图像之间的第二位置偏移数据。
其中,第三区域图像中目标物体的位置数据就是检测样本图像中目标物体的位置数据,因为第三区域图像是通过对检测样本图像进行裁剪得到的。第四区域图像中目标物体的位置数据是通过待训练的神经网络,根据第三区域图像和第四区域图像获取得到的。
在一个可选示例中,该步骤S403可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第三确定模块904执行。
在可选的实施方式中,第二位置偏移数据是非检测样本图像中目标物体的位置相对于检测样本图像中目标物体的位置的偏移量。可选地,可将第四区域图像中目标物体的位置数据减去第三区域图像中目标物体的位置数据,获得目标物体在第三区域图像和第四区域图像之间的第二位置偏移数据。当目标物体的位置数据包括目标物体的限位框的长度、宽度以及中心位置坐标时,第二位置偏移数据包括目标物体的限位框的中心位置坐标的改变量以及目标物体的限位框的长度和宽度的改变量。
在步骤S404中,根据第一位置偏移数据和第二位置偏移数据,训练神经网络。
其中,第一位置偏移数据为目标物体在检测样本图像和非检测样本图像之间的标准位置偏移量,也即是第一位置偏移数据为目标物体在第三区域图像和第四区域图像之间的标准位置偏移量。可选地,根据检测样本图像中目标物体的位置数据,分别对检测样本图像和非检测样本图像进行裁剪之前,该实施例方法还包括:根据检测样本图像中目标物体的位置数据和非检测样本图像中目标物体的位置标定数据确定第一位置偏移数据。
在一个可选示例中,该步骤S404可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第二训练模块905执行。
在本实施例中,检测样本图像中目标物体的位置数据可以是事先确定好的,不需要待训练的神经网络进行获取。可选地,可事先通过用于目标位置检测的神经网络检测出检测样本图像中目标物体的位置数据。当然,也可以采用其它的实施方式事先检测出检测样本图像中的目标物体的位置数据,本申请实施例对此不作限制。非检测样本图像中目标物体的位置标定数据也可以是事先确定好的。可选地,可事先通过用于目标位置检测的神经网络检测出所述非检测样本图像中目标物体的位置标定数据。在本申请一些可选实施方式中,还可通过人工标定的方式对非检测样本图像中目标物体的限位框的位置进行标定,从而得到非检测样本图像中目标物体的位置标定数据。当然,也可以采用其它的实施方式事先获得非检测样本图像中目标物体的位置标定数据,本申请实施例对此不作限制。
在可选的实施方式中,第一位置偏移数据是非检测样本图像中目标物体的标定位置相对于检测样本图像中目标物体的位置的偏移量。可选地,可将非检测样本图像中目标物体的位置标定数据减去检测样本图像中目标物体的位置数据,获得目标物体在检测样本图像和非检测样本图像之间的第一位置偏移数据。当目标物体的位置数据包括目标物体的限位框的长度、宽度以及中心位置坐标时,第一位置偏移数据包括目标物体的限位框的中心位置坐标的改变量以及目标物体的限位框的长度和宽度的改变量。
在本申请一些可选实施方式中,还可以采取以下方法对神经网络进行训练。例如,首先通过待训练的神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取目标物体在非检测样本图像中的位置数据,其中, 非检测样本图像含有目标物体的位置标定数据;然后,再根据目标物体在非检测样本图像中的位置数据和非检测样本图像含有的目标物体的位置标定数据训练神经网络。
在本申请另一些可选实施方式中,还可以采取以下方法对神经网络进行训练。例如,首先根据检测样本图像中目标物体的位置数据,分别对检测样本图像和非检测样本图像进行裁剪,获得与检测样本图像对应的第三区域图像以及与非检测样本图像对应的第四区域图像,再通过待训练的神经网络,根据含有目标物体的第三区域图像和第四区域图像,获取目标物体在第四区域图像中的位置数据,其中,第四区域图像含有目标物体的位置标定数据。然后,再根据目标物体在第四区域图像中的位置数据和第四区域图像含有的目标物体的位置标定数据训练神经网络。
本申请的示例性实施例旨在提出一种神经网络的训练方法,根据检测样本图像中目标物体的位置数据,分别对检测样本图像和非检测样本图像进行裁剪,获得与检测样本图像对应的第三区域图像以及与非检测样本图像对应的第四区域图像,并通过待训练的神经网络,根据含有目标物体的第三区域图像和第四区域图像,获取目标物体在第四区域图像中的位置数据,再根据第三区域图像中目标物体的位置数据和第四区域图像中目标物体的位置数据,确定目标物体在第三区域图像和第四区域图像之间的第二位置偏移数据,再根据目标物体在第三区域图像和第四区域图像之间的标准位置偏移量和第二位置偏移数据,训练神经网络,使得训练得到的神经网络能够基于视频帧序列中在前视频图像的目标物体位置快速回归出视频帧序列中在后视频图像的目标物体位置。
本实施例的神经网络的训练方法可以由任意适当的具有图像或数据处理能力的设备执行,包括但不限于:摄像头、终端、移动终端、PC机、服务器、车载设备、娱乐设备、广告设备、个人数码助理(PDA)、平板电脑、笔记本电脑、掌上游戏机、智能眼镜、智能手表、可穿戴设备、虚拟显示设备或显示增强设备(如Google Glass、Oculus Rift、Hololens、Gear VR)等。
基于相同的技术构思,图5是根据本申请实施例目标跟踪装置的一个实施例的结构示意图。本实施例装置可用以执行本申请实施例上述目标跟踪方法的任一实施例。
参照图5,该目标跟踪装置包括第一获取模块501和第一确定模块502。
第一获取模块501,用于通过第一神经网络,根据含有目标物体的视频帧序列中的检测图像和非检测图像,获取目标物体在非检测图像中的位置数据,第一神经网络用于根据检测图像回归所述目标物体在所述非检测图像中的位置,非检测图像为检测图像的在后图像。
第一确定模块502,用于根据目标物体在检测图像中的位置数据和目标物体在非检测图像中的位置数据确定目标物体的轨迹。
通过本实施例提供的目标跟踪装置,通过用于根据检测图像回归目标物体在非检测图像中的位置的第一神经网络,根据含有目标物体的视频帧序列中的检测图像和非检测图像,获取目标物体在非检测图像中的位置数据;并根据目标物体在检测图像中的位置数据和目标物体在非检测图像中的位置数据确定目标物体的轨迹,相比于现有隔帧检测的技术,本申请实施例可以根据检测图像回归非检测图像中目标物体的位置,在兼顾目标跟踪的检测效率的同时,还提高了目标跟踪的精度。
基于相同的技术构思,图6是根据本申请实施例目标跟踪装置的另一个实施例的结构示意图。本实施例装置可用以执行本申请实施例上述目标跟踪方法的任一实施例。
参照图6,该目标跟踪装置包括第一获取模块602和第一确定模块603。其中,第一获取模块602,用于通过第一神经网络,根据含有目标物体的视频帧序列中的检测图像和非检测图像,获取目标物体在非检测图像中的位置数据,第一神经网络用于根据检测图像回归目标物体在非检测图像中的位置,非检测图像为检测图像的在后图像;第一确定模块603,用于根据目标物体在检测图像中的位置数据和目标物体在非检测图像中的位置数据确定目标物体的轨迹。
可选地,第一获取模块602,包括:第一获取子模块6021,用于通过第一神经网络,根据视频帧序列中的检测图像和在检测图像之后的第一非检测图像,获取目标物体在第一非检测图像中的位置数据。
可选地,本申请实施例装置还包括:第二获取模块604,用于通过第一神经网络,根据视频帧序列中的第一非检测图像和在第一非检测图像之后的第二非检测图像,获取目标物体在第二非检测图像中的位置数据。
可选地,本申请实施例装置还包括:第一裁剪模块601,用于根据检测图像中目标物体的位置数据,分别对检测图像和非检测图像进行裁剪,获得与检测图像对应的第一区域图像以及与非检测图像对应的第二区域图像,其中,第一区域图像与第二区域图像包含目标物体;第一获取模块602,包括:第二获取子模块6022,用于通过第一神经网络,根据含有目标物体的第一区域图像和第二区域图像,获取目标物体在第二区域图像中的位置数据。
可选地,本申请实施例装置还包括:划分模块605,用于按照时间顺序,将视频帧序列划分为多组视频帧,每组视频帧包括至少一帧视频图像;第三获取模块606,用于针对多组视频帧,从首帧视频图像中获取目标物体的位置数据,并通过第一神经网络,获取首帧视频图像后续的视频图像中目标物体的位置数据,从而获得该组视频帧中的至少一帧视频图像的目标物体的位置数据;第二确定模块607,用于根据多组视频帧中的至少一帧视频图像的目标物体的位置数据确定目标物体的轨迹。
可选地,第三获取模块606,包括:第三获取子模块6061,用于通过用于目标位置检测的第二神经网络,从首帧视频图像中获取目标物体的位置数据,第二神经网络包括快速卷积神经网络。
需要说明的是,对于本申请实施例提供的目标跟踪装置还涉及的具体细节已在本申请实施例提供的目标跟踪方法中作了详细的说明,在此不再赘述。
基于相同的技术构思,图7是根据本申请实施例目标跟踪装置的又一个实施例的结构示意图。本实施例装置可用以执行本申请实施例上述目标跟踪方法的任一实施例。
参照图7,该目标跟踪装置包括第一获取模块703和第一确定模块704。其中,第一获取模块703,用于通过第一神经网络,根据含有目标物体的视频帧序列中的检测图像和非检测图像,获取目标物体在所述非检测图像中的位置数据,第一神经网络用于根据检测图像回归目标物体在非检测图像中的位置,非检测图像为检测图像的在后图像;第一确定模块704,用于根据目标物体在检测图像中的位置数据和目标物体在非检测图像中的位置数据确定目标物体的轨迹。
可选地,本申请实施例装置还包括:选择模块702,用于根据目标物体的类别确定与目标物体的类别对应的第一神经网络。
可选地,本申请实施例装置还包括:第一训练模块701,用于根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像训练所述第一神经网络,非检测样本图像为检测样本图像的在后图像。
可选地,第一训练模块701,包括:第四获取子模块7013,用于通过待训练的第一神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取目标物体在非检测样本图像中的位置数据;第一确定子模块7014,用于根据检测样本图像中目标物体的位置数据和非检测样本图像中目标物体的位置数据,确定目标物体在检测样本图像和非检测样本图像之间的第二位置偏移数据;第一训练子模块7015,用于根据第一位置偏移数据和第二位置偏移数据,训练第一神经网络,第一位置偏移数据为目标物体在检测样本图像和非检测样本图像之间的标准位置偏移量。
可选地,本申请实施例装置还包括:第一裁剪子模块7012,用于根据检测样本图像中目标物体的位置数据,分别对检测样本图像和非检测样本图像进行裁剪,获得与检测样本图像对应的第三区域图像以及与非检测样本图像对应的第四区域图像,其中,第三区域图像与第四区域图像包含目标物体;第四获取子模块7013,包括:获取单元70131,用于通过待训练的第一神经网络,根据含有目标物体的第三区域图像和第四区域图像,获取目标物体在第四区域图像中的位置数据。
可选地,第一神经网络具有卷积层、连接在卷积层末端的拼接层,以及连接在拼接层末端的全连接层,其中,获取单元70131,用于:通过卷积层,对第三区域图像和第四区域图像进行特征提取,获得第三区域图像和第四区域图像中目标物体的位置特征向量;通过拼接层,对第三区域图像和第四区域图像中目标物体的位置特征向量进行拼接,获得拼接后的位置特征向量;通过全连接层,对拼接后的位置特征向量进行映射操作,获得目标物体在第四区域图像中的位置数据。
可选地,本申请实施例装置还包括:第二确定子模块7011,用于根据检测样本图像中目标物体的位置数据和非检测样本图像中目标物体的位置标定数据确定第一位置偏移数据。
可选地,位置数据包括目标物体的限位框的长度、宽度以及中心位置坐标。
需要说明的是,对于本申请实施例提供的目标跟踪装置还涉及的具体细节已在本申请实施例提供的目标跟踪方法中作了详细的说明,在此不再赘述。
基于相同的技术构思,图8是根据本申请实施例神经网络训练装置的一个实施例的结构示意图。本实施例装置可用以执行本申请实施例上述神经网络训练方法的任一实施例。
参照图8,该神经网络的训练装置包括第四获取模块801、第三确定模块802和第二训练模块803。
第四获取模块801,用于通过待训练的神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取目标物体在非检测样本图像中的位置数据,非检测样本图像为检测样本图像的在后图像。
第三确定模块802,用于根据检测样本图像中目标物体的位置数据和非检测样本图像中目标物体的位置数据,确定目标物体在检测样本图像和非检测样本图像之间的第二位置偏移数据。
第二训练模块803,用于根据第一位置偏移数据和第二位置偏移数据,训练神经网络,第一位置偏移数据为目标物体在检测样本图像和非检测样本图像之间的标准位置偏移量。
通过本实施例提供的神经网络的训练装置,通过待训练的神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取目标物体在非检测样本图像中的位置数据,并根据检测样本图像中目标物体的位置数据和非检测样本图像中目标物体的位置数据,确定目标物体在检测样本图像和非检测样本图像之间的第二位置偏移数据,再根据目标物体在检测样本图像和非检测样本图像之间的标准位置偏移量和所述第二位置偏移数据,训练所述神经网络,与现有技术相比,使得训练得到的神经网络能够基于视频帧序列中在前视频图像的目标物体位置回归出视频帧序列中在后视频图像的目标物体位置。
基于相同的技术构思,图9是根据本申请实施例神经网络的训练装置的另一个实施例的结构示意图。本实施例装置可用以执行本申请实施例上述神经网络训练方法的任一实施例。
参照图9,该神经网络的训练装置包括第四获取模块903、第三确定模块904和第二训练模块905。其中,第四获取模块903,用于通过待训练的神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取目标物体在非检测样本图像中的位置数据,非检测样本图像为检测样本图像的在后图像;第三确定模块904,用于根据检测样本图像中目标物体的位置数据和非检测样本图像中目标物体的位置数据,确定目标物体在检测样本图像和非检测样本图像之间的第二位置偏移数据;第二训练模块905,用于根据第一位置偏移数据和第二位置偏移数据,训练神经网络,第一位置偏移数据为目标物体在检测样本图像和非检测样本图像之间的标准位置偏移量。
可选地,本申请实施例装置还包括:第二裁剪模块902,用于根据检测样本图像中目标物体的位置数据,分别对检测样本图像和非检测样本图像进行裁剪,获得与检测样本图像对应的第三区域图像以及与非检测样本图像对应的第四区域图像,其中,第三区域图像与第四区域图像包含目标物体;第四获取模块903,包括:第五获取子模块9031,用于通过待训练的神经网络,根据含有目标物体的第三区域图像和第四区域图像,获取目标物体在第四区域图像中的位置数据。
可选地,待训练的神经网络包括卷积层、连接在卷积层末端的拼接层,以及连接在拼接层末端的全连接层,其中,第五获取子模块9031具体用于:通过卷积层,对第三区域图像和第四区域图像进行特征提取,获得第三区域图像和第四区域图像中目标物体的位置特征向量;通过拼接层,对第三区域图像和第四区域图像中目标物体的位置特征向量进行拼接,获得拼接后的位置特征向量;通过全连接层,对拼接后的位置特征向量进行映射操作,获得目标物体在第四区域图像中的位置数据。
可选地,本申请实施例装置还包括:第四确定模块901,用于根据检测样本图像中目标物体的位置数据和非检测样本图像中目标物体的位置标定数据确定第一位置偏移数据。
可选地,位置数据包括目标物体的限位框的长度、宽度以及中心位置坐标。
需要说明的是,对于本申请实施例提供的神经网络的训练装置还涉及的具体细节已在本申请实施例提供的神经网络的训练方法中作了详细的说明,在此不再赘述。
根据本申请实施例还提供了一种计算机可读存储介质,其上存储有计算机程序指令,其中,所述程序指令被处理器执行时实现本申请实施例所述的目标跟踪方法的步骤,或者实现本申请实施例所述的神经网络训练方法的步骤。
根据本申请实施例还提供了一种计算机程序产品,其包括有计算机程序指令,其中,所述程序指令被处理器执行时实现本申请实施例所述的目标跟踪方法的步骤,或者实现本申请实施例所述的神经网络的训练方法的步骤。
根据本申请实施例还提供了一种计算机程序,包括计算机可读代码,当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现本申请实施例所述的目标跟踪方法中各步骤的指令;或者
当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现本申请实施例所述的神经网络训练方法中各步骤的指令。
根据本申请实施例还提供了一种电子设备,包括:处理器和存储器;
该存储器用于存放至少一可执行指令,所述可执行指令使所述处理器执行本申请实施例所述的目标跟踪方法对应的操作;或者,所述存储器用于存放至少一可执行指令,所述可执行指令使所述处理器执行本申请实施例所述的神经网络训练方法对应的操作。
根据本申请实施例还提供了一种电子设备,包括:处理器和本申请实施例第三方面所述的目标跟踪装置;在处理器运行所述目标跟踪装置时,本申请实施例所述的目标跟踪装置中的模块被运行;或者
处理器和本申请实施例第四方面所述的神经网络训练装置;在处理器运行所述神经网络的训练装置时,本申请实施例所述的神经网络训练装置中的模块被运行。
本申请实施例还提供了一种电子设备,例如可以是移动终端、个人计算机(PC)、平板电脑、服务器等。下面参考图10,其示出了适于用来实现本申请实施例的终端设备或服务器的电子设备1000的结构示意图。如图10所示,电子设备1000包括一个或多个第一处理器、第一通信元件等,所述一个或多个第一处理器例如:一个或多个中央处理单元(CPU)1001,和/或一个或多个图像处理器(GPU)1013等,第一处理器可以根据存储在只读存储器(ROM)1002中的可执行指令或者从存储部分1008加载到随机访问存储器(RAM)1003中的可执行指令而执行各种适当的动作和处理。本实施例中,第一只读存储器1002和随机访问存储器1003统称为第一存储器。第一通信元件包括通信组件1012和/或通信接口1009。其中,通信组件1012可包括但不限于网卡,所述网卡可包括但不限于IB(Infiniband)网卡,通信接口1009包括诸如LAN卡、调制解调器等的网络接口卡的通信接口,通信接口1009经由诸如因特网的网络执行通信处理。
第一处理器可与只读存储器1002和/或随机访问存储器1003中通信以执行可执行指令,通过第一通信总线1004与通信组件1012相连、并经通信组件1012与其他目标设备通信,从而完成本申请实施例提供的任一项目标跟踪方法对应的操作,例如,通过第一神经网络,根据含有目标物体的视频帧序列中的检测图像和非检测图像,获取所述目标物体在所述非检测图像中的位置数据,所述第一神经网络用于根据所述检测图像回归所述目标物体在所述非检测图像中的位置,所述非检测图像为所述检测图像的在后图像;根据所述目标物体在所述检测图像中的位置数据和所述目标物体在所述非检测图像中的位置数据确定所述目标物体的轨迹。或者,完成本申请实施例提供的任一项神经网络的训练方法对应的操作,例如,通过待训练的神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取所述目标物体在所述非检测样本图像中的位置数据,所述非检测样本图像为所述检测样本图像的在后图像;根据所述检测样本图像中所述目标物体的位置数据和所述非检测样本图像中所述目标物体的位置数据,确定所述目标物体在所述检测样本图像和所述非检测样本图像之间的第二位置偏移数据;根据第一位置偏移数据和所述第二位置偏移数据,训练所述神经网络,所述第一位置偏移数据为所述目标物体在所述检测样本图像和所述非检测样本图像之间的标准位置偏移量。
此外,在RAM 1003中,还可存储有装置操作所需的各种程序和数据。CPU1001或GPU1013、ROM1002以及RAM1003通过第一通信总线1004彼此相连。在有RAM1003的情况下,ROM1002为可选模块。RAM1003存储可执行指令,或在运行时向ROM1002中写入可执行指令,可执行指令使第一处理器执行上述方法对应的操作。输入/输出(I/O)接口1005也连接至第一通信总线1004。通信组件1012可以集成设置,也可以设置为具有多个子模块(例如多个IB网卡),并链接在通信总线上。
以下部件连接至I/O接口1005:包括键盘、鼠标等的输入部分1006;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分1007;包括硬盘等的存储部分1008;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信接口1009。驱动器1010也根据需要连接至I/O接口1005。可拆卸介质1011,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器1010上,以便于从其上读出的计算机程序根据需要被安装入存储部分1008。
需要说明的,如图10所示的架构仅为一种可选实现方式,在具体实践过程中,可根据实际需要对上述图10的部件数量和类型进行选择、删减、增加或替换;在不同功能部件设置上,也可采用分离设置或集成设置等实现方式,例如GPU1013和CPU1001可分离设置或者可将GPU1013集成在CPU1001上,通信元件可分离设置,也可集成设置在CPU1001或GPU1013上,等等。这些可替换的实施方式均落入本申请的保护范围。
特别地,根据本申请实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本申请实施例包括一种计算机程序产品,其包括有形地包含在机器可读介质上的计算机程序,计算机程序包含用于执行流程图所示的方法的程序代码,程序代码可包括对应执行本申请实施例提供的方法步骤对应的指令,例如,通过第一神经网络,根据含有目标物体的视频帧序列中的检测图像和非检测图像,获取所述目标物体在所述非检测图像中的位置数据,所述第一神经网络用于根据所述检测图像回归所述目标物体在所述非检测图像中的位置,所述非检测图像为所述检测图像的在后图像;根据所述目标物体在所述检测图像中的位置数据和所述目标物体在所述非检测图像中的位置数据确定所述目标物体的轨迹。或者,例如,通过待训练的神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取所述目标物体在所述非检测样本图像中的位置数据,所述非检测样本图像为所述检测样本图像的在后图像;根据所述检测样本图像中所述目标物体的位置数据和所述非检测样本图像中所述目标物体的位置数据,确定所述目标物体在所述检测样本图像和所述非检测样本图像之间的第二位置偏移数据;根据第一位置偏移数据和所述第二位置偏移数据,训练所述神经网络,所述第一位置偏移数据为所述目标物体在所述检测样本图像和所述非检测样本图像之间的标准位置偏移量。在这些实施例中,该计算机程序可以通过通信元件从网络上被下载和安装,和/或从可拆卸介质1011被安装。在该计算机程序被第一处理器执行时,执行本申请实施例的方法中限定的上述功能。
需要指出,根据实施的需要,可将本申请实施例中描述的各个部件/步骤拆分为更多部件/步骤,也可将两个或多个部件/步骤或者部件/步骤的部分操作组合成新的部件/步骤,以实现本申请实施例的目的。
上述根据本申请实施例的方法可在硬件、固件中实现,或者被实现为可存储在记录介质(诸如CD ROM、RAM、软盘、硬盘或磁光盘)中的软件或计算机代码,或者被实现通过网络下载的原始存储在远程记录介质或非暂时机器可读介质中并将被存储在本地记录介质中的计算机代码,从而在此描述的方法可被存储在使用通用计算机、专用处理器或者可编程或专用硬件(诸如ASIC或FPGA)的记录介质上的这样的软件处理。可以理解,计算机、处理器、微处理器控制器或可编程硬件包括可存储或接收软件或计算机代码的存储组件(例如,RAM、ROM、闪存等),当所述软件或计算机代码被计算机、处理器或硬件访问且执行时,实现在此描述的处理方法。此外,当通用计算机访问用于实现在此示出的处理的代码时,代码的执行将通用计算机转换为用于执行在此示出的处理的专用计算机。
本说明书中各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似的部分相互参见即可。对于系统实施例而言,由于其与方法实施例基本对应,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
可能以许多方式来实现本申请的方法和装置。例如,可通过软件、硬件、固件或者软件、硬件、固件的任何组合来实现本申请的方法和装置。用于所述方法的步骤的上述顺序仅是为了进行说明,本申请的方法的步骤不限于以上具体描述的顺序,除非以其它方式特别说明。此外,在一些实施例中,还可将本申请实施为记录在记录介质中的程序,这些程序包括用于实现根据本申请的方法的机器可读指令。因而,本申请还覆盖存储用于执行根据本申请的方法的程序的记录介质。
本申请的描述是为了示例和描述起见而给出的,而并不是无遗漏的或者将本申请限于所公开的形式。很多修改和变化对于本领域的普通技术人员而言是显然的。选择和描述实施例是为了更好说明本申请的原理和实际应用,并且使本领域的普通技术人员能够理解本申请从而设计适于特定用途的带有各种修改的各种实施例。

Claims (40)

  1. 一种目标跟踪方法,其特征在于,所述方法包括:
    通过第一神经网络,根据含有目标物体的视频帧序列中的检测图像和非检测图像,获取所述目标物体在所述非检测图像中的位置数据,所述第一神经网络用于根据所述检测图像回归所述目标物体在所述非检测图像中的位置,所述非检测图像为所述检测图像的在后图像;
    根据所述目标物体在所述检测图像中的位置数据和所述目标物体在所述非检测图像中的位置数据确定所述目标物体的轨迹。
  2. 根据权利要求1所述的方法,其特征在于,所述通过第一神经网络,根据含有目标物体的视频帧序列中的检测图像和非检测图像,获取所述目标物体在所述非检测图像中的位置数据,包括:
    通过所述第一神经网络,根据所述视频帧序列中的检测图像和在所述检测图像之后的第一非检测图像,获取所述目标物体在所述第一非检测图像中的位置数据。
  3. 根据权利要求2所述的方法,其特征在于,所述方法还包括:
    通过所述第一神经网络,根据所述视频帧序列中的第一非检测图像和在所述第一非检测图像之后的第二非检测图像,获取所述目标物体在所述第二非检测图像中的位置数据。
  4. 根据权利要求1~3中任意一项权利要求所述的方法,其特征在于,所述通过第一神经网络,根据含有目标物体的视频帧序列中的检测图像和非检测图像,获取所述目标物体在所述非检测图像中的位置数据之前,所述方法还包括:
    根据所述检测图像中目标物体的位置数据,分别对所述检测图像和所述非检测图像进行裁剪,获得与所述检测图像对应的第一区域图像以及与所述非检测图像对应的第二区域图像,其中,所述第一区域图像与所述第二区域图像包含所述目标物体;
    所述通过第一神经网络,根据含有目标物体的视频帧序列中的检测图像和非检测图像,获取所述目标物体在所述非检测图像中的位置数据,包括:
    通过所述第一神经网络,根据含有所述目标物体的第一区域图像和第二区域图像,获取所述目标物体在所述第二区域图像中的位置数据。
  5. 根据权利要求1~4中任意一项权利要求所述的方法,其特征在于,所述方法还包括:
    按照时间顺序,将所述视频帧序列划分为多组视频帧,每组所述视频帧包括至少一帧视频图像;
    针对所述多组视频帧,从首帧视频图像中获取所述目标物体的位置数据,并通过所述第一神经网络,获取所述首帧视频图像后续的视频图像中目标物体的位置数据,从而获得该组所述视频帧中的至少一帧视频图像的目标物体的位置数据;
    根据所述多组视频帧中的至少一帧视频图像的目标物体的位置数据确定所述目标物体的轨迹。
  6. 根据权利要求5所述的方法,其特征在于,所述从首帧视频图像中获取所述目标物体的位置数据,包括:
    通过用于目标位置检测的第二神经网络,从首帧视频图像中获取所述目标物体的位置数据,所述第二神经网络包括快速卷积神经网络。
  7. 根据权利要求1~6中任意一项权利要求所述的方法,其特征在于,所述通过第一神经网络,根据含有目标物体的视频帧序列中的检测图像和非检测图像,获取所述目标物体在所述非检测图像中的位置数据之前,所述方法还包括:
    根据所述目标物体的类别确定与所述目标物体的类别对应的第一神经网络。
  8. 根据权利要求1~7中任意一项权利要求所述的方法,其特征在于,所述通过第一神经网络,根据含有目标物体的视频帧序列中的检测图像和非检测图像,获取所述目标物体在所述非检测图像中的位置数据之前,所述方法还包括:
    根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像训练所述第一神经网络,所述非检测样本图像为所述检测样本图像的在后图像。
  9. 根据权利要求8所述的方法,其特征在于,所述根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像训练所述第一神经网络,包括:
    通过待训练的第一神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取所述目标物体在所述非检测样本图像中的位置数据;
    根据所述检测样本图像中所述目标物体的位置数据和所述非检测样本图像中所述目标物体的位置数据,确定所述目标物体在所述检测样本图像和所述非检测样本图像之间的第二位置偏移数据;
    根据第一位置偏移数据和所述第二位置偏移数据,训练所述第一神经网络,所述第一位置偏移数据为所述目标物体在所述检测样本图像和所述非检测样本图像之间的标准位置偏移量。
  10. 根据权利要求9所述的方法,其特征在于,所述通过待训练的第一神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取所述目标物体在所述非检测样本图像中的位置数据之前,所述方法还包括:
    根据所述检测样本图像中目标物体的位置数据,分别对所述检测样本图像和所述非检测样本图像进行裁剪,获得与所述检测样本图像对应的第三区域图像以及与所述非检测样本图像对应的第四区域图像,其中,所述第三区域图像与所述第四区域图像包含所述目标物体;
    所述通过待训练的第一神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取所述目标物体在所述非检测样本图像中的位置数据,包括:
    通过所述待训练的第一神经网络,根据含有所述目标物体的第三区域图像和第四区域图像,获取所述目标物体在所述第四区域图像中的位置数据。
  11. 根据权利要求10所述的方法,其特征在于,所述待训练的第一神经网络包括卷积层、连接在所述卷积层末端的拼接层,以及连接在所述拼接层末端的全连接层,
    所述通过所述待训练的第一神经网络,根据含有所述目标物体的第三区域图像和第四区域图像,获取所述目标物体在所述第四区域图像中的位置数据,包括:
    通过所述卷积层,对所述第三区域图像和所述第四区域图像进行特征提取,获得所述第三区域图像和所述第四区域图像中所述目标物体的位置特征向量;
    通过所述拼接层,对所述第三区域图像和所述第四区域图像中所述目标物体的位置特征向量进行拼接,获得拼接后的位置特征向量;
    通过所述全连接层,对所述拼接后的位置特征向量进行映射操作,获得所述目标物体在所述第四区域图像中的位置数据。
  12. 根据权利要求9~11中任意一项权利要求所述的方法,其特征在于,所述通过待训练的第一神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取所述目标物体在所述非检测样本图像中的位置数据之前,所述方法还包括:
    根据所述检测样本图像中目标物体的位置数据和所述非检测样本图像中目标物体的位置标定数据确定所述第一位置偏移数据。
  13. 根据权利要求1~12中任意一项权利要求所述的方法,其特征在于,所述位置数据包括所述目标物体的限位框的长度、宽度以及中心位置坐标。
  14. 一种神经网络训练方法,其特征在于,所述方法包括:
    通过待训练的神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取所述目标物体在所述非检测样本图像中的位置数据,所述非检测样本图像为所述检测样本图像的在后图像;
    根据所述检测样本图像中所述目标物体的位置数据和所述非检测样本图像中所述目标物体的位置数据,确定所述目标物体在所述检测样本图像和所述非检测样本图像之间的第二位置偏移数据;
    根据第一位置偏移数据和所述第二位置偏移数据,训练所述神经网络,所述第一位置偏移数据为所述目标物体在所述检测样本图像和所述非检测样本图像之间的标准位置偏移量。
  15. 根据权利要求14所述的方法,其特征在于,所述通过待训练的神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取所述目标物体在所述非检测样本图像中的位置数据之前,所述方法还包括:
    根据所述检测样本图像中目标物体的位置数据,分别对所述检测样本图像和所述非检测样本图像进行裁剪,获得与所述检测样本图像对应的第三区域图像以及与所述非检测样本图像对应的第四区域图像,其中,所述第三区域图像与所述第四区域图像包含所述目标物体;
    所述通过待训练的神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取所述目标物体在所述非检测样本图像中的位置数据,包括:
    通过所述待训练的神经网络,根据含有所述目标物体的第三区域图像和第四区域图像,获取所述目标物体在所述第四区域图像中的位置数据。
  16. 根据权利要求15所述的方法,其特征在于,所述待训练的神经网络包括卷积层、连接在所述卷积层末端的拼接层,以及连接在所述拼接层末端的全连接层,
    所述通过所述待训练的神经网络,根据含有所述目标物体的第三区域图像和第四区域图像,获取所述目标物体在所述第四区域图像中的位置数据,包括:
    通过所述卷积层,对所述第三区域图像和所述第四区域图像进行特征提取,获得所述第三区域图像和所述第四区域图像中所述目标物体的位置特征向量;
    通过所述拼接层,对所述第三区域图像和所述第四区域图像中所述目标物体的位置特征向量进行拼接,获得拼接后的位置特征向量;
    通过所述全连接层,对所述拼接后的位置特征向量进行映射操作,获得所述目标物体在所述第四区域图像中的位置数据。
  17. 根据权利要求14~16中任意一项权利要求所述的方法,其特征在于,所述通过待训练的神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取所述目标物体在所述非检测样本图像中的位置数据之前,所述方法还包括:
    根据所述检测样本图像中目标物体的位置数据和所述非检测样本图像中目标物体的位置标定数据确定所述第一位置偏移数据。
  18. 根据权利要求14~17中任意一项权利要求所述的方法,其特征在于,所述位置数据包括所述目标物体的限位框的长度、宽度以及中心位置坐标。
  19. 一种目标跟踪装置,其特征在于,所述装置包括:
    第一获取模块,用于通过第一神经网络,根据含有目标物体的视频帧序列中的检测图像和非检测图像,获取所述目标物体在所述非检测图像中的位置数据,所述第一神经网络用于根据所述检测图像回归所述目标物体在所述非检测图像中的位置,所述非检测图像为所述检测图像的在后图像;
    第一确定模块,用于根据所述目标物体在所述检测图像中的位置数据和所述目标物体在所述非检测图像中的位置数据确定所述目标物体的轨迹。
  20. 根据权利要求19所述的装置,其特征在于,所述第一获取模块,包括:
    第一获取子模块,用于通过所述第一神经网络,根据所述视频帧序列中的检测图像和在所述检测图像之后的第一非检测图像,获取所述目标物体在所述第一非检测图像中的位置数据。
  21. 根据权利要求20所述的装置,其特征在于,所述装置还包括:
    第二获取模块,用于通过所述第一神经网络,根据所述视频帧序列中的第一非检测图像和在所述第一非检测图像之后的第二非检测图像,获取所述目标物体在所述第二非检测图像中的位置数据。
  22. 根据权利要求19~21中任意一项权利要求所述的装置,其特征在于,所述第一获取模块之前,所述装置还包括:
    第一裁剪模块,用于根据所述检测图像中目标物体的位置数据,分别对所述检测图像和所述非检测图像进行裁剪,获得与所述检测图像对应的第一区域图像以及与所述非检测图像对应的第二区域图像,其中,所述第一区域图像与所述第二区域图像包含所述目标物体;
    所述第一获取模块,包括:
    第二获取子模块,用于通过所述第一神经网络,根据含有所述目标物体的第一区域图像和第二区域图像,获取所述目标物体在所述第二区域图像中的位置数据。
  23. 根据权利要求19~22中任意一项权利要求所述的装置,其特征在于,所述装置还包括:
    划分模块,用于按照时间顺序,将所述视频帧序列划分为多组视频帧,每组所述视频帧包括至少一帧视频图像;
    第三获取模块,用于针对所述多组视频帧,从首帧视频图像中获取所述目标物体的位置数据,并通过所述第一神经网络,获取所述首帧视频图像后续的视频图像中目标物体的位置数据,从而获得该组所述视频帧中的至少一帧视频图像的目标物体的位置数据;
    第二确定模块,用于根据所述多组视频帧中的至少一帧视频图像的目标物体的位置数据确定所述目标物体的轨迹。
  24. 根据权利要求23所述的装置,其特征在于,所述第三获取模块,包括:
    第三获取子模块,用于通过用于目标位置检测的第二神经网络,从首帧视频图像中获取所述目标物体的位置数据,所述第二神经网络包括快速卷积神经网络。
  25. 根据权利要求19~24中任意一项权利要求所述的装置,其特征在于,所述第一获取模块之前,所述装置还包括:
    选择模块,用于根据所述目标物体的类别确定与所述目标物体的类别对应的第一神经网络。
  26. 根据权利要求19~25中任意一项权利要求所述的装置,其特征在于,所述第一获取模块之前,所述装置还包括:
    第一训练模块,用于根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像训练所述第一神经网络,所述非检测样本图像为所述检测样本图像的在后图像。
  27. 根据权利要求26所述的装置,其特征在于,所述第一训练模块,包括:
    第四获取子模块,用于通过待训练的第一神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取所述目标物体在所述非检测样本图像中的位置数据;
    第一确定子模块,用于根据所述检测样本图像中所述目标物体的位置数据和所述非检测样本图像中所述目标物体的位置数据,确定所述目标物体在所述检测样本图像和所述非检测样本图像之间的第二位置偏移数据;
    第一训练子模块,用于根据第一位置偏移数据和所述第二位置偏移数据,训练所述第一神经网络,所述第一位置偏移数据为所述目标物体在所述检测样本图像和所述非检测样本图像之间的标准位置偏移量。
  28. 根据权利要求27所述的装置,其特征在于,所述第四获取子模块之前,所述装置还包括:
    第一裁剪子模块,用于根据所述检测样本图像中目标物体的位置数据,分别对所述检测样本图像和所述非检测样本图像进行裁剪,获得与所述检测样本图像对应的第三区域图像以及与所述非检测样本图像对应的第四区域图像,其中,所述第三区域图像与所述第四区域图像包含所述目标物体;
    所述第四获取子模块,包括:
    获取单元,用于通过所述待训练的第一神经网络,根据含有所述目标物体的第三区域图像和第四区域图像,获取所述目标物体在所述第四区域图像中的位置数据。
  29. 根据权利要求28所述的装置,其特征在于,所述待训练的第一神经网络包括卷积层、连接在所述卷积层末端的拼接层,以及连接在所述拼接层末端的全连接层,
    所述获取单元,具体用于:
    通过所述卷积层,对所述第三区域图像和所述第四区域图像进行特征提取,获得所述第三区域图像和所述第四区域图像中所述目标物体的位置特征向量;
    通过所述拼接层,对所述第三区域图像和所述第四区域图像中所述目标物体的位置特征向量进行拼接,获得拼接后的位置特征向量;
    通过所述全连接层,对所述拼接后的位置特征向量进行映射操作,获得所述目标物体在所述第四区域图像中的位置数据。
  30. 根据权利要求27~29中任意一项权利要求所述的装置,其特征在于,所述第四获取子模块之前,所述装置还包括:
    第二确定子模块,用于根据所述检测样本图像中目标物体的位置数据和所述非检测样本图像中目标物体的位置标定数据确定所述第一位置偏移数据。
  31. 根据权利要求19~30中任意一项权利要求所述的装置,其特征在于,所述位置数据包括所述目标物体的限位框的长度、宽度以及中心位置坐标。
  32. 一种神经网络训练装置,其特征在于,所述装置包括:
    第四获取模块,用于通过待训练的神经网络,根据含有目标物体的视频帧样本序列中的检测样本图像和非检测样本图像,获取所述目标物体在所述非检测样本图像中的位置数据,所述非检测样本图像为所述检测样本图像的在后图像;
    第三确定模块,用于根据所述检测样本图像中所述目标物体的位置数据和所述非检测样本图像中所述目标物体的位置数据,确定所述目标物体在所述检测样本图像和所述非检测样本图像之间的第二位置偏移数据;
    第二训练模块,用于根据第一位置偏移数据和所述第二位置偏移数据,训练所述神经网络,所述第一位置偏移数据为所述目标物体在所述检测样本图像和所述非检测样本图像之间的标准位置偏移量。
  33. 根据权利要求32所述的装置,其特征在于,所述第四获取模块之前,所述装置还包括:
    第二裁剪模块,用于根据所述检测样本图像中目标物体的位置数据,分别对所述检测样本图像和所述非检测样本图像进行裁剪,获得与所述检测样本图像对应的第三区域图像以及与所述非检测样本图像对应的第四区域图像,其中,所述第三区域图像与所述第四区域图像包含所述目标物体;
    所述第四获取模块,包括:
    第五获取子模块,用于通过所述待训练的神经网络,根据含有所述目标物体的第三区域图像和第四区域图像,获取所述目标物体在所述第四区域图像中的位置数据。
  34. 根据权利要求33所述的装置,其特征在于,所述待训练的神经网络包括卷积层、连接在所述卷积层末端的拼接层,以及连接在所述拼接层末端的全连接层,
    所述第五获取子模块,具体用于:
    通过所述卷积层,对所述第三区域图像和所述第四区域图像进行特征提取,获得所述第三区域图像和所述第四区域图像中所述目标物体的位置特征向量;
    通过所述拼接层,对所述第三区域图像和所述第四区域图像中所述目标物体的位置特征向量进行拼接,获得拼接后的位置特征向量;
    通过所述全连接层,对所述拼接后的位置特征向量进行映射操作,获得所述目标物体在所述第四区域图像中的位置数据。
  35. 根据权利要求32~34中任意一项权利要求所述的装置,其特征在于,所述第四获取模块之前,所述装置还包括:
    第四确定模块,用于根据所述检测样本图像中目标物体的位置数据和所述非检测样本图像中目标物体的位置标定数据确定所述第一位置偏移数据。
  36. 根据权利要求32~35中任意一项权利要求所述的装置,其特征在于,所述位置数据包括所述目标物体的限位框的长度、宽度以及中心位置坐标。
  37. 一种计算机可读存储介质,其上存储有计算机程序指令,其中,所述程序指令被处理器执行时实现权利要求1~13中任意一项权利要求所述的目标跟踪方法,或者实现权利要求14~18中任意一项权利要求所述的神经网络训练方法。
  38. 一种计算机程序产品,其包括有计算机程序指令,其中,所述程序指令被处理器执行时实现权利要求1~13中任意一项权利要求所述的目标跟踪方法,或者实现权利要求14~18中任意一项权利要求所述的神经网络训练方法。
  39. 一种电子设备,包括:处理器和存储器;
    所述存储器用于存放至少一可执行指令,所述可执行指令使所述处理器执行如权利要求1~13中任意一项权利要求所述的目标跟踪方法对应的操作;或者,所述存储器用于存放至少一可执行指令,所述可执行指令使所述处理器执行如权利要求14~18中任意一项权利要求所述的神经网络训练方法对应的操作。
  40. 一种电子设备,包括:处理器和权利要求19~31中任意一项所述的目标跟踪装置;在处理器运行所述目标跟踪装置时,权利要求19~31中任意一项所述的目标跟踪装置中的模块被运行;或者
    处理器和权利要求32~36中任意一项所述的神经网络训练装置;在处理器运行所述神经网络的训练装置时,权利要求32~36中任意一项所述的神经网络训练装置中的模块被运行。
PCT/CN2018/110433 2017-10-27 2018-10-16 目标跟踪及神经网络训练方法、装置、存储介质、电子设备 WO2019080747A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711031418.9 2017-10-27
CN201711031418.9A CN108230358A (zh) 2017-10-27 2017-10-27 目标跟踪及神经网络训练方法、装置、存储介质、电子设备

Publications (1)

Publication Number Publication Date
WO2019080747A1 true WO2019080747A1 (zh) 2019-05-02

Family

ID=62654718

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/110433 WO2019080747A1 (zh) 2017-10-27 2018-10-16 目标跟踪及神经网络训练方法、装置、存储介质、电子设备

Country Status (2)

Country Link
CN (1) CN108230358A (zh)
WO (1) WO2019080747A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111882579A (zh) * 2020-07-03 2020-11-03 湖南爱米家智能科技有限公司 基于深度学习和目标跟踪的大输液异物检测方法、系统、介质及设备

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108230358A (zh) * 2017-10-27 2018-06-29 北京市商汤科技开发有限公司 目标跟踪及神经网络训练方法、装置、存储介质、电子设备
US11430312B2 (en) * 2018-07-05 2022-08-30 Movidius Limited Video surveillance with neural networks
CN109376594A (zh) 2018-09-11 2019-02-22 百度在线网络技术(北京)有限公司 基于自动驾驶车辆的视觉感知方法、装置、设备以及介质
CN109242801B (zh) * 2018-09-26 2021-07-02 北京字节跳动网络技术有限公司 图像处理方法和装置
WO2021072699A1 (en) * 2019-10-17 2021-04-22 Shenzhen Malong Technologies Co., Ltd. Irregular scan detection for retail systems
CN110660102B (zh) * 2019-06-17 2020-10-27 腾讯科技(深圳)有限公司 基于人工智能的说话人识别方法及装置、系统
CN110619600B (zh) * 2019-09-17 2023-12-08 南京旷云科技有限公司 神经网络模型训练方法及装置、存储介质及电子设备
CN110717593B (zh) * 2019-10-14 2022-04-19 上海商汤临港智能科技有限公司 神经网络训练、移动信息测量、关键帧检测的方法及装置
CN112102615B (zh) * 2020-08-28 2022-03-25 浙江大华技术股份有限公司 交通事故检测方法、电子设备及存储介质
CN112137591B (zh) * 2020-10-12 2021-07-23 平安科技(深圳)有限公司 基于视频流的目标物位置检测方法、装置、设备及介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7277558B2 (en) * 2001-11-27 2007-10-02 Lockheed Martin Corporation Method and system for estimating the position of moving objects in images
CN106326837A (zh) * 2016-08-09 2017-01-11 北京旷视科技有限公司 对象追踪方法和装置
CN107066922A (zh) * 2016-12-30 2017-08-18 西安天和防务技术股份有限公司 用于国土资源监控的目标追踪方法
CN108230358A (zh) * 2017-10-27 2018-06-29 北京市商汤科技开发有限公司 目标跟踪及神经网络训练方法、装置、存储介质、电子设备

Also Published As

Publication number Publication date
CN108230358A (zh) 2018-06-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18870840

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 08.09.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18870840

Country of ref document: EP

Kind code of ref document: A1