CN109598743A

CN109598743A - Pedestrian target tracking, device and equipment

Info

Publication number: CN109598743A
Application number: CN201811386432.5A
Authority: CN
Inventors: 车广富; 董玉新; 安山
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2018-11-20
Filing date: 2018-11-20
Publication date: 2019-04-09
Anticipated expiration: 2038-11-20
Also published as: CN109598743B

Abstract

The embodiment of the present invention provides a kind of pedestrian target tracking, device and equipment.This method comprises: obtaining continuous videos image captured by multiple cameras, camera described in every road presets the best shooting visual field；Pedestrian target in each camera continuous videos image collected is determined using the convolutional neural networks model of target pedestrian detection；Matched jamming is carried out to the pedestrian target in multiple cameras continuous videos image collected.The method of the embodiment of the present invention is realized across accurately tracking under camera to pedestrian target.

Description

Pedestrian target tracking, device and equipment

Technical field

The present embodiments relate to technical field of computer vision more particularly to a kind of pedestrian target trackings, device And equipment.

Background technique

With the fast development of the technical fields such as computer vision, machine learning, big data analysis and artificial intelligence, respectively Kind based on computer vision intelligent use, product and service etc. also obtain quick development, for example, unmanned shop, nobody Dining room, intelligent security guard etc. bring convenience for people's lives.

It is particularly important to the recognition and tracking of target in various applications based on computer vision.It is answered with unmanned shop For, in order to accurately carry out Products Show to user, and the buying behavior of user, consumption habit are accurately analyzed Deng needing in real time accurately to track the target user in unmanned shop.In unmanned shop, in order to avoid vision is blind Area generallys use multiple cameras and obtains video datas, inevitably there is that the visual field is overlapping to ask between adjacent camera Topic.When target user enters visual field overlapping region, need to re-start target user recognition and tracking, and each camera Visual field coordinate system is different, and appearance and form easily change, and existing method is unable to satisfy the target following demand across camera.

Summary of the invention

The embodiment of the present invention provides a kind of pedestrian target tracking, device and equipment, for realizing the row across camera People's target following.

In a first aspect, the embodiment of the present invention provides a kind of pedestrian target tracking, comprising:

Continuous videos image captured by multiple cameras is obtained, every road camera presets the best shooting visual field；

Each camera continuous videos image collected is determined using the convolutional neural networks model of target pedestrian detection In pedestrian target；

Matched jamming is carried out to the pedestrian target in multiple cameras continuous videos image collected.

In one possible implementation, before obtaining continuous videos image captured by multiple cameras, the side Method further include:

There is overlapping camera progress visual field division to the visual field in multiple cameras to determine the best of each camera The visual field is shot, so that same pedestrian target is only located in the best shooting visual field an of camera.

In one possible implementation, each camera shooting is determined using the convolutional neural networks model of target pedestrian detection Before pedestrian target in continuous videos image collected, the method also includes:

Several video image samples are obtained, include position of the pedestrian target in video image sample in video image sample Information；

Pedestrian target head detection model is trained according to several video image samples, obtains neural network model.

In one possible implementation, pedestrian target head detection model is carried out according to several video image samples Training, comprising:

Pedestrian target head detection model is trained using transfer learning algorithm.

In one possible implementation, each camera shooting is determined using the convolutional neural networks model of target pedestrian detection Pedestrian target in continuous videos image collected, specifically includes:

The position of the pedestrian head in each video image is determined using neural network model；

The position of pedestrian target is determined according to the corresponding best shooting visual field in the position of pedestrian head and video image.

In one possible implementation, neural network model includes global neural network model and Local neural network Model；

Global neural network model in the global scope of video image for determining the position of pedestrian head；

Local neural network model is for determining the position of pedestrian head in the preset range of the position determined by previous frame It sets.

In one possible implementation, the pedestrian head in each video image is determined using neural network model Position, comprising:

The first position set of the pedestrian head in each video image is determined using global neural network model；

The second position set of the pedestrian head in each video image is determined using Local neural network model；

The position of pedestrian target, packet are determined according to the corresponding best shooting visual field in the position of pedestrian head and video image It includes:

The view is determined according to the first position of each video image set and the corresponding best shooting visual field of video image The first object location sets of frequency image；

The view is determined according to the second position of each video image set and the corresponding best shooting visual field of video image Gather second target position of frequency image.

In one possible implementation, to the pedestrian target in multiple cameras continuous videos image collected into Row matched jamming, comprising:

Using Hungary Algorithm to the pedestrian target in each camera continuous videos image collected matched with Track.

In one possible implementation, using Hungary Algorithm to each camera continuous videos image collected In pedestrian target carry out matched jamming, specifically include:

Using Hungary Algorithm, according to the first object location sets and second of each camera video image collected The similarity of position frame in the set of target position carries out matched jamming to pedestrian target；

Wherein, the similarity of position frame determines according to the following formula:

Wherein, O_iIndicate that i-th of position frame in first object location sets, N indicate position in first object location sets The sum of frame, O_jIndicate that j-th of position frame in the set of the second target position, M indicate position frame in the set of the second target position Sum.

In one possible implementation, to the pedestrian target in multiple cameras continuous videos image collected into Row matched jamming, further includes:

If the position frame not matched in first object location sets meets the first preset condition, it is determined that the position frame pair The pedestrian answered has just enter into the corresponding best shooting visual field of first object location sets；

If the position frame in the set of the second target position meets the second preset condition, it is determined that the corresponding pedestrian of the position frame It leaves the second target position and gathers the corresponding best shooting visual field；

First preset condition and the second preset condition according to pedestrian out of and into the best shooting visual field at the time of and track Spacing determines.

Second aspect, the embodiment of the present invention provide a kind of pedestrian target tracking device, comprising:

Module is obtained, for obtaining continuous videos image captured by multiple cameras, every road camera is preset most The good shooting visual field；

Determining module, for determining that each camera is collected using the convolutional neural networks model of target pedestrian detection Pedestrian target in continuous videos image；

Tracking module, for the pedestrian target in multiple cameras continuous videos image collected matched with Track.

The third aspect, the embodiment of the present invention provide a kind of electronic equipment, comprising:

At least one processor and memory；

Memory stores computer executed instructions；

At least one processor executes the computer executed instructions of memory storage, so that at least one processor executes such as The described in any item pedestrian target trackings of first aspect.

Fourth aspect, the embodiment of the present invention provide a kind of computer readable storage medium, the computer-readable storage medium Computer executed instructions are stored in matter, for realizing any one of such as first aspect when computer executed instructions are executed by processor The pedestrian target tracking.

Pedestrian target tracking, device and equipment provided in an embodiment of the present invention are clapped by obtaining multiple cameras The continuous videos image taken the photograph, every road camera preset the best shooting visual field, and using the convolutional Neural of target pedestrian detection Network model determines the pedestrian target in each camera continuous videos image collected, to multiple cameras company collected Pedestrian target in continuous video image carries out matched jamming, realizes across accurately tracking under camera to pedestrian target.Pass through The best shooting visual field is preset for every road camera, the visual field for avoiding each road camera overlaps, therefore avoids pair Re-recognizing and tracking for overlapping region pedestrian target, reduces calculating cost, improves tracking velocity.

Detailed description of the invention

The drawings herein are incorporated into the specification and forms part of this specification, and shows and meets implementation of the invention Example, and be used to explain the principle of the present invention together with specification.

Fig. 1 is the flow chart of one embodiment of pedestrian target tracking provided by the invention；

Fig. 2 is the schematic diagram in the best shooting visual field that one embodiment of the invention provides；

Fig. 3 is the schematic diagram for the video image sample that one embodiment of the invention provides；

Fig. 4 is that the Local neural network model sample that one embodiment of the invention provides makes schematic diagram；

Fig. 5 is the structural schematic diagram of one embodiment of pedestrian target tracking device provided by the invention；

Fig. 6 is the structural schematic diagram of one embodiment of electronic equipment provided by the invention.

Through the above attached drawings, it has been shown that the specific embodiment of the present invention will be hereinafter described in more detail.These attached drawings It is not intended to limit the scope of the inventive concept in any manner with verbal description, but is by referring to specific embodiments Those skilled in the art illustrate idea of the invention.

Specific embodiment

Example embodiments are described in detail here, and the example is illustrated in the accompanying drawings.Following description is related to When attached drawing, unless otherwise indicated, the same numbers in different drawings indicate the same or similar elements.Following exemplary embodiment Described in embodiment do not represent all embodiments consistented with the present invention.On the contrary, they be only with it is such as appended The example of device and method being described in detail in claims, some aspects of the invention are consistent.

Term " includes " and " having " and their any deformations in description and claims of this specification, it is intended that It is to cover and non-exclusive includes.Such as the process, method, system, product or equipment for containing a series of steps or units do not have It is defined in listed step or unit, but optionally further comprising the step of not listing or unit, or optionally also wrap Include the other step or units intrinsic for these process, methods, product or equipment.

" first " and " second " in the present invention only plays mark action, be not understood to indicate or imply ordinal relation, Relative importance or the quantity for implicitly indicating indicated technical characteristic." multiple " refer to two or more." and/ Or ", the incidence relation of affiliated partner is described, indicates may exist three kinds of relationships, for example, A and/or B, can indicate: individually depositing In A, A and B, these three situations of individualism B are existed simultaneously.It is a kind of "or" that character "/", which typicallys represent forward-backward correlation object, Relationship.

" one embodiment " or " embodiment " mentioned in the whole text in specification of the invention means related with embodiment A particular feature, structure, or characteristic include at least one embodiment of the application.Therefore, occur everywhere in the whole instruction " in one embodiment " or " in one embodiment " not necessarily refer to identical embodiment.It should be noted that not rushing In the case where prominent, the feature in embodiment and embodiment in the present invention be can be combined with each other.

Fig. 1 is the flow chart of one embodiment of pedestrian target tracking provided by the invention.Method provided in this embodiment It can be executed, can also be executed by network side equipment such as server by terminal device.As shown in Figure 1, row provided in this embodiment People's method for tracking target may include:

S101, continuous videos image captured by multiple cameras is obtained, every road camera presets best shooting view It is wild.

Continuous videos image in the present embodiment can be more by need to install in the scene for carrying out pedestrian target tracking A camera obtains, and multiple cameras can carry out the video acquisition of non-blind area to scene.For example, the view in order to realize non-blind area Frequency acquires, and can carry out video acquisition in multiple and different positions installation camera in unmanned shop, intelligent security guard can be in monitoring area In multiple and different positions installation camera carry out video monitoring.

In order to avoid occurring the overlapping problem in the visual field between adjacent camera, can be preset most for every road camera The good shooting visual field.The best shooting visual field can be formed according to the installation site of camera and the installation site of multiple cameras Network topology structure determine.It is overlapping that there is no the visuals field between the best shooting visual field of each road camera.

S102, each camera continuous videos collected are determined using the convolutional neural networks model of target pedestrian detection Pedestrian target in image.

Each frame of each camera continuous videos image collected is directed in the present embodiment, using for detecting pedestrian Convolutional neural networks model determine pedestrian target.Convolutional neural networks model in the present embodiment is needed previously according to applied field Scape is trained.It for example,, can be according to header information when to overlook visual angle if camera is installed on the eminence in scene It determines pedestrian target, then convolutional neural networks model can be trained according to header information in video is overlooked；If camera The medium position being installed in scene can determine pedestrian target according to human figure and facial information, then when to look squarely visual angle Convolutional neural networks model can be trained according to human figure and facial information in head-up video.

The pedestrian target in each camera continuous videos image collected in the present embodiment can be the camera The best shooting visual field in pedestrian target, it can determine pedestrian target only for the best shooting visual field of the camera.This Embodiment for pedestrian target quantity with no restrictions, can be single row people target, or multiple pedestrian targets.

S103, matched jamming is carried out to the pedestrian target in multiple cameras continuous videos image collected.

Matched jamming is carried out to the pedestrian target in multiple cameras continuous videos image collected in the present embodiment, it can To match by most preferably shooting the pedestrian target in the visual field to each camera, the pedestrian target across camera is determined, And it is tracked.

Pedestrian target tracking provided in this embodiment, by obtaining continuous videos figure captured by multiple cameras Picture, every road camera presets the best shooting visual field, and is determined often using the convolutional neural networks model of target pedestrian detection Pedestrian target in a camera continuous videos image collected, in multiple cameras continuous videos image collected Pedestrian target carries out matched jamming, realizes across accurately tracking under camera to pedestrian target.By for every road camera it is pre- The best shooting visual field of first setting, the visual field for avoiding each road camera overlaps, therefore avoids to overlapping region pedestrian's mesh Target is re-recognized and is tracked, and is reduced calculating cost, is improved tracking velocity.

In some embodiments, on the basis of the above embodiments, method provided in this embodiment is obtaining multiple camera shootings It can also include: there is overlapping camera to the visual field in multiple cameras to carry out before continuous videos image captured by head The visual field divides the best shooting visual field to determine each camera, so that same pedestrian target is only located at a camera most In the good shooting visual field.

Net can be formed by according to the installation site of camera and the installation site of multiple cameras in the present embodiment Network topological structure determines the best shooting visual field of each camera.By taking the application in unmanned shop as an example, each camera can be with It is set as overlooking visual angle.Fig. 2 is the schematic diagram in the best shooting visual field that one embodiment of the invention provides.As shown in Fig. 2, in figure Solid-line rectangle frame can indicate the indoor sectional view in unmanned shop, be mounted with that number is respectively 1,2,3 and 4 in the unmanned shop Four cameras, specific location is as shown in Fig. 2, dotted line is the boundary line for dividing four cameras and most preferably shooting the visual field in figure. The enclosed region being made of dotted line and solid line is the best shooting visual field of each camera.As shown in Fig. 2, four cameras are most The good shooting visual field is mutually indepedent, overlapping region is not present, therefore, same pedestrian target may be positioned only at the best of a camera It shoots in the visual field.

It, in practical applications can be with it should be noted that Fig. 2 illustrates only a kind of best division mode in shooting visual field The concrete conditions such as the network topology structure according to the shape in place, the quantity of camera and camera to the best shooting visual field into The different division of row.

In some embodiments, on the basis of the above embodiments, method provided in this embodiment is using target pedestrian Before the convolutional neural networks model of detection determines the pedestrian target in each camera continuous videos image collected, may be used also It include position of the pedestrian target in video image sample in video image sample to include: to obtain several video image samples Information；Pedestrian target head detection model is trained according to several video image samples, obtains neural network model.

Video image sample in the present embodiment is that the video image sample of the location information of pedestrian target has been marked. For overlooking visual angle scene, the location information of pedestrian target can be identified by the head position of pedestrian.

Optionally, location information of the pedestrian target in video image sample can indicate that position frame can by position frame To select the shapes such as rectangle, circle, ellipse.

Fig. 3 is the schematic diagram for the video image sample that one embodiment of the invention provides.As shown in figure 3, at the vertical view visual angle In scene, the location information of the pedestrian is indicated with the boundary rectangle frame of the head position of pedestrian.Identify different pedestrian positions The size of the boundary rectangle frame of information can be different.

It can according to the detailed process that video image sample is trained pedestrian target head detection model in the present embodiment To be, using video image frame as the input of pedestrian target head detection model, using the location information of label as desired output, Model can be carried out according to the Position Design loss function of location information and pedestrian target head detection the model output of label Repetitive exercise.Such as convolutional neural networks can be based on and learnt using loss function value as guiding by reverse propagated error Model elaboration is trained.Pedestrian target head detection model in the present embodiment for example can using SSD, YOLO, Faster-RCNN etc..

Pedestrian target head detection model is trained using head position information in the present embodiment, due to head position It is small to block probability, profile variation is small, and pedestrian target can be determined more accurately in the model trained.

Optionally, a kind of realization side pedestrian target head detection model being trained according to several video image samples Formula, which may is that, is trained pedestrian target head detection model using transfer learning algorithm.

In the present embodiment, specific application scenarios can be directed to, collects and marks pedestrian head sample, utilize transfer learning Algorithm is finely adjusted pedestrian target head detection model, so that pedestrian target head detection model is for concrete application scene Adaptability is substantially improved.

Pedestrian target head detection model is trained using transfer learning algorithm in the present embodiment, not only can be improved The speed of model training reduces the quantity demand for training sample, and the model trained is more in line with scene demand, energy It is enough that more accurate location information is provided.

In some embodiments, determine that each camera is acquired using the convolutional neural networks model of target pedestrian detection Continuous videos image in a kind of implementation of pedestrian target may is that

Using a frame of video image as the input of neural network model in the present embodiment, then can export wherein included complete The position of portion's pedestrian head, the position that will only fall into the camera and most preferably shoot the pedestrian head in the visual field in the present embodiment, determines For the pedestrian target in camera video image collected.

For the application scenarios shown in Fig. 2 and the best shooting visual field, No. 1 camera institute is determined according to neural network model There are tri- pedestrians of A, B and C in the video image of acquisition, three pedestrians are located at No. 1 camera, No. 2 cameras and No. 4 are taken the photograph As head the best shooting visual field in, be filtered by the best shooting visual field of No. 1 camera, can determine No. 1 camera institute The position of pedestrian target is the position where A in the video image of acquisition.

Optionally, neural network model may include global neural network model and local neural network model.Wherein, entirely Office's neural network model in the global scope of video image for determining the position of pedestrian head；Local neural network model is used In the position for determining pedestrian head in the preset range of the position determined by previous frame.

Optionally, global neural network model and local neural network model are required to carry out using the sample marked in advance Training.For global neural network model, can be trained using the intact video images that pedestrian position is marked, and for Local neural network model can be then trained using the local video image that pedestrian position is marked.

The problems such as in order to avoid missing inspection, erroneous detection due to caused by the background pixel for accounting for specific gravity big absolutely in training sample, The detection position that previous frame is utilized in the present embodiment, in the region that its subrange removal search head most possibly occurs, and with This region come replace detection position；Similarly when next frame, detection position is indicated with the region that previous frame obtains, in its subrange Interior removal search optimum position, to complete location updating.Fig. 4 is the Local neural network model sample that one embodiment of the invention provides This production schematic diagram.As shown in figure 4, on the basis of can be according to the pedestrian target boundary rectangle frame position that previous frame determines, outward Expand 2 times of ranges, using energy coverage goal rectangle frame as condition, the position frame of random selected preset quantity, as local nerve net The training sample of network model.

In some embodiments, the one of the position of the pedestrian head in each video image is determined using neural network model Kind implementation may is that

The position of pedestrian target is determined according to the corresponding best shooting visual field in the position of pedestrian head and video image A kind of implementation may is that

In some embodiments, to the pedestrian target in multiple cameras continuous videos image collected matched with A kind of implementation of track may is that

Wherein, Hungary Algorithm is based on Hall theorem, by finding augmenting path, may be implemented to each camera institute Pedestrian target in the continuous videos image of acquisition carries out fast and accurately matched jamming.

Optionally, the pedestrian target in each camera continuous videos image collected is carried out using Hungary Algorithm Matched jamming can specifically include:

Wherein, O_iIndicate that i-th of position frame in first object location sets, N indicate position in first object location sets The sum of frame, O_jIndicate that j-th of position frame in the set of the second target position, M indicate position frame in the set of the second target position Sum.O_i∩O_jIndicate the intersection of i-th of position frame and j-th of position frame area, O_i∪O_jIndicate i-th of position frame and j-th The union of position frame area.

First preset condition and the second preset condition according to pedestrian out of and into the best shooting visual field at the time of and track Spacing determines.At the time of generation according to pedestrian out of and into the event in the best shooting visual field and the spacing of track, row can be determined People's movement speed, the first preset condition and the second preset condition it should be ensured that the speed in the reasonable scope.

With application scenarios shown in Fig. 2 and the best shooting visual field for example, if pedestrian target A is from the best of No. 1 camera The shooting visual field moves into the best shooting visual field of No. 2 cameras, then will occur in the first object location sets of No. 2 cameras One position frame not matched, if leave the best shooting visual field of No. 1 camera according to A, into No. 2 cameras At the time of the best shooting visual field and the movement speed of A determined by the track spacing of A is reasonable, then can determine the position frame pair The pedestrian A answered has just enter into the best shooting visual field of No. 2 cameras.

Fig. 5 is the structural schematic diagram of one embodiment of pedestrian target tracking device provided by the invention.As shown in figure 5, this reality The pedestrian target tracking device 50 for applying example offer may include: to obtain module 501, determining module 502 and tracking module 503.

Module 501 is obtained, for obtaining continuous videos image captured by multiple cameras, every road camera is preset The best shooting visual field.

Determining module 502, for determining that each camera is adopted using the convolutional neural networks model of target pedestrian detection Pedestrian target in the continuous videos image of collection.

Tracking module 503, for being matched to the pedestrian target in multiple cameras continuous videos image collected Tracking.

The device of the present embodiment can be used for executing the technical solution of embodiment of the method shown in Fig. 1, realization principle and skill Art effect is similar, and details are not described herein again.

Optionally, pedestrian target tracking device 50 is also to include division module, for obtaining captured by multiple cameras Continuous videos image before, there is overlapping camera to the visual field in multiple cameras and carry out the visual field and divide to determine each take the photograph As the best shooting visual field of head, so that same pedestrian target is only located in the best shooting visual field an of camera.

Optionally, pedestrian target tracking device 50 is also to include training module, in the volume using target pedestrian detection Before product neural network model determines the pedestrian target in each camera continuous videos image collected, several videos are obtained Image pattern includes location information of the pedestrian target in video image sample in video image sample；According to several video figures Decent is trained pedestrian target head detection model, obtains neural network model.

Optionally, a kind of realization side pedestrian target head detection model being trained according to several video image samples Formula, which can be, is trained pedestrian target head detection model using transfer learning algorithm.

Optionally, determining module 502 specifically can be used for determining the row in each video image using neural network model The position of head part；The position of pedestrian target is determined according to the corresponding best shooting visual field in the position of pedestrian head and video image It sets.

Optionally, neural network model may include global neural network model and local neural network model；Wherein, entirely Office's neural network model in the global scope of video image for determining the position of pedestrian head；Local neural network model is used In the position for determining pedestrian head in the preset range of the position determined by previous frame.

Optionally, determining module 502 specifically can be used for,

The first position set of the pedestrian head in each video image is determined using global neural network model；Using office Portion's neural network model determines the second position set of the pedestrian head in each video image；

The view is determined according to the first position of each video image set and the corresponding best shooting visual field of video image The first object location sets of frequency image；It is corresponding best according to the second position of each video image set and video image The shooting visual field determines the second target position set of the video image.

Optionally, tracking module 503 specifically can be used for, collected to each camera continuous using Hungary Algorithm Pedestrian target in video image carries out matched jamming.

Optionally, tracking module 503 specifically can be used for, collected according to each camera using Hungary Algorithm The similarity of the first object location sets of video image and the position frame in the set of the second target position, carries out pedestrian target Matched jamming；

Optionally, tracking module 503 specifically can be also used for, if the position frame not matched in first object location sets Meet the first preset condition, it is determined that the corresponding pedestrian of the position frame has just enter into the corresponding best shooting of first object location sets The visual field；

Fig. 6 is the structural schematic diagram of one embodiment of electronic equipment provided by the invention.Electronic equipment provided in this embodiment Including but not limited to computer, individual server, multiple servers composition server group or based on cloud computing by big meter The cloud that calculation machine or server are constituted, wherein cloud computing is one kind of distributed computing, is made of the computer of a group loose couplings A super virtual computer.As shown in fig. 6, the electronic equipment 60 may include:

At least one processor 602 and memory 606；

The memory 606 stores computer executed instructions；

At least one described processor 602 executes the computer executed instructions that the memory 606 stores so that it is described extremely A few processor 602 executes pedestrian target tracking as described above.

The specific implementation process of processor 602 can be found in the embodiment of the method for above-mentioned pedestrian target tracking, realize Principle is similar with technical effect, and details are not described herein again for the present embodiment.Wherein, processor 602 and memory 606 can pass through bus 603 connections.

The embodiment of the present invention also provides a kind of computer readable storage medium, stores in the computer readable storage medium There are computer executed instructions, for realizing pedestrian described in any of the above embodiments when the computer executed instructions are executed by processor Method for tracking target.

In the above-described embodiment, it should be understood that disclosed device and method, it can be real by another way It is existing.For example, apparatus embodiments described above are only illustrative, for example, the division of the module, only a kind of logic Function division, there may be another division manner in actual implementation, such as multiple modules may be combined or can be integrated into separately One system, or some features can be ignored or not executed.Another point, shown or discussed mutual coupling or straight Connecing coupling or communication connection can be through some interfaces, and the indirect coupling or communication connection of device or module can be electrical property, Mechanical or other forms.

The module as illustrated by the separation member may or may not be physically separated, aobvious as module The component shown may or may not be physical unit, it can and it is in one place, or may be distributed over multiple In network unit.Some or all of the modules therein can be selected to realize the mesh of this embodiment scheme according to the actual needs 's.

It, can also be in addition, each functional module in each embodiment of the present invention can integrate in one processing unit It is that modules physically exist alone, can also be integrated in one unit with two or more modules.Above-mentioned module at Unit both can take the form of hardware realization, can also realize in the form of hardware adds SFU software functional unit.

The above-mentioned integrated module realized in the form of software function module, can store and computer-readable deposit at one In storage media.Above-mentioned software function module is stored in a storage medium, including some instructions are used so that a computer Equipment (can be personal computer, server or the network equipment etc.) or processor (English: processor) execute this Shen Please each embodiment the method part steps.

It should be understood that above-mentioned processor can be central processing unit (English: Central Processing Unit, letter Claim: CPU), can also be other general processors, digital signal processor (English: Digital Signal Processor, Referred to as: DSP), specific integrated circuit (English: Application Specific Integrated Circuit, referred to as: ASIC) etc..General processor can be microprocessor or the processor is also possible to any conventional processor etc..In conjunction with hair The step of bright disclosed method, can be embodied directly in hardware processor and execute completion, or with hardware in processor and soft Part block combiner executes completion.

Memory may include high speed RAM memory, it is also possible to and it further include non-volatile memories NVM, for example, at least one Magnetic disk storage can also be USB flash disk, mobile hard disk, read-only memory, disk or CD etc..

Bus can be industry standard architecture (Industry Standard Architecture, ISA) bus, outer Portion's apparatus interconnection (Peripheral Component, PCI) bus or extended industry-standard architecture (Extended Industry Standard Architecture, EISA) bus etc..Bus can be divided into address bus, data/address bus, control Bus etc..For convenient for indicating, the bus in illustrations does not limit only a bus or a type of bus.

Above-mentioned storage medium can be by any kind of volatibility or non-volatile memory device or their combination It realizes, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable Read-only memory (EPROM), programmable read only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, Disk or CD.Storage medium can be any usable medium that general or specialized computer can access.

A kind of illustrative storage medium is coupled to processor, believes to enable a processor to read from the storage medium Breath, and information can be written to the storage medium.Certainly, storage medium is also possible to the component part of processor.It processor and deposits Storage media can be located at specific integrated circuit (Application Specific Integrated Circuits, referred to as: ASIC in).Certainly, pocessor and storage media can also be used as discrete assembly and be present in terminal or server.

Those of ordinary skill in the art will appreciate that: realize that all or part of the steps of above-mentioned each method embodiment can lead to The relevant hardware of program instruction is crossed to complete.Program above-mentioned can be stored in a computer readable storage medium.The journey When being executed, execution includes the steps that above-mentioned each method embodiment to sequence；And storage medium above-mentioned include: ROM, RAM, magnetic disk or The various media that can store program code such as person's CD.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the present invention., rather than its limitations；To the greatest extent Pipe present invention has been described in detail with reference to the aforementioned embodiments, those skilled in the art should understand that: its according to So be possible to modify the technical solutions described in the foregoing embodiments, or to some or all of the technical features into Row equivalent replacement；And these are modified or replaceed, various embodiments of the present invention technology that it does not separate the essence of the corresponding technical solution The range of scheme.

Claims

1. a kind of pedestrian target tracking characterized by comprising

Continuous videos image captured by multiple cameras is obtained, camera described in every road presets the best shooting visual field；

It is determined in each camera continuous videos image collected using the convolutional neural networks model of target pedestrian detection Pedestrian target；

2. the method according to claim 1, wherein described obtain continuous videos figure captured by multiple cameras Before picture, further includes:

There is overlapping camera progress visual field division to the visual field in the multiple camera to determine each camera The best shooting visual field, so that same pedestrian target is only located in the best shooting visual field an of camera.

3. the method according to claim 1, wherein the convolutional neural networks mould using target pedestrian detection Type determines before the pedestrian target in each camera continuous videos image collected, further includes:

Several video image samples are obtained, include pedestrian target in the video image sample in the video image sample Location information；

Pedestrian target head detection model is trained according to several video image samples, obtains the neural network mould Type.

4. according to the method described in claim 3, it is characterized in that, it is described according to several video image samples to pedestrian's mesh Header portion detection model is trained, comprising:

The pedestrian target head detection model is trained using transfer learning algorithm.

5. method according to claim 1-4, which is characterized in that the convolution mind using target pedestrian detection The pedestrian target in each camera continuous videos image collected is determined through network model, is specifically included:

The position of the pedestrian head in each video image is determined using the neural network model；

The pedestrian target is determined according to the corresponding best shooting visual field in the position of the pedestrian head and the video image Position.

6. according to the method described in claim 5, it is characterized in that, the neural network model includes global neural network model With local neural network model；

The overall situation neural network model in the global scope of video image for determining the position of pedestrian head；

The Local neural network model is for determining the position of pedestrian head in the preset range of the position determined by previous frame It sets.

7. according to the method described in claim 6, it is characterized in that, described determining each described using the neural network model The position of pedestrian head in video image, comprising:

The first position set of the pedestrian head in each video image is determined using the global neural network model；

The second position set of the pedestrian head in each video image is determined using the Local neural network model；

It is described that the pedestrian is determined according to the corresponding best shooting visual field in the position of the pedestrian head and the video image The position of target, comprising:

It is determined according to the first position of each video image set and the corresponding best shooting visual field of the video image The first object location sets of the video image；

It is determined according to the second position of each video image set and the corresponding best shooting visual field of the video image Gather second target position of the video image.

8. the method according to the description of claim 7 is characterized in that described to multiple cameras continuous videos image collected In pedestrian target carry out matched jamming, comprising:

Matched jamming is carried out to the pedestrian target in each camera continuous videos image collected using Hungary Algorithm.

9. according to the method described in claim 8, it is characterized in that, described acquire each camera using Hungary Algorithm Continuous videos image in pedestrian target carry out matched jamming, specifically include:

Using the Hungary Algorithm, according to the first object location sets and second of each camera video image collected The similarity of position frame in the set of target position carries out matched jamming to pedestrian target；

Wherein, the similarity of the position frame determines according to the following formula:

I=1,2,3 ... N, j=1,2,3 ... M

Wherein, O_iIndicate that i-th of position frame in the first object location sets, N indicate position in the first object location sets Set the sum of frame, O_jIndicate that j-th of position frame in the second target position set, M indicate the second target position set The sum of middle position frame.

10. according to the method described in claim 9, it is characterized in that, described to multiple cameras continuous videos figure collected Pedestrian target as in carries out matched jamming, further includes:

If the position frame not matched in the first object location sets meets the first preset condition, it is determined that the position frame pair The pedestrian answered has just enter into the corresponding best shooting visual field of the first object location sets；

If the position frame in the second target position set meets the second preset condition, it is determined that the corresponding pedestrian of the position frame It leaves second target position and gathers the corresponding best shooting visual field；

First preset condition and second preset condition according to pedestrian out of and into the best shooting visual field at the time of and Track spacing determines.

11. a kind of pedestrian target tracking device characterized by comprising

Module is obtained, for obtaining continuous videos image captured by multiple cameras, camera described in every road is preset most The good shooting visual field；

Determining module determines that each camera is collected continuous for the convolutional neural networks model using target pedestrian detection Pedestrian target in video image；

Tracking module, for carrying out matched jamming to the pedestrian target in multiple cameras continuous videos image collected.

12. a kind of electronic equipment characterized by comprising at least one processor and memory；

The memory stores computer executed instructions；

At least one described processor executes the computer executed instructions of the memory storage, so that at least one described processing Device executes such as the described in any item pedestrian target trackings of claim 1-10.

13. a kind of computer readable storage medium, which is characterized in that be stored with computer in the computer readable storage medium It executes instruction, for realizing such as described in any item rows of claim 1-10 when the computer executed instructions are executed by processor People's method for tracking target.