CN108596109A

CN108596109A - Target detection method and device based on neural network and motion vectors

Info

Publication number: CN108596109A
Application number: CN201810385675.0A
Authority: CN
Inventors: 王子彤; 姜凯; 聂林川
Original assignee: Jinan Inspur Hi Tech Investment and Development Co Ltd
Current assignee: Inspur Group Co Ltd
Priority date: 2018-04-26
Filing date: 2018-04-26
Publication date: 2018-09-28
Anticipated expiration: 2038-04-26
Also published as: CN108596109B

Abstract

The present invention provides a target detection method and device. The method includes: parsing a video data stream to extract multiple images and recording the serial number of each image; for each image, determining from its serial number whether the image is an intra-prediction coded frame, and if so, executing A1, otherwise executing A2 to A4. A1: recognizing the image with a neural network model to detect at least one target, storing the location information of each target in the image, and marking each target in the image to form a target image. A2: determining a reference image from the serial number of the image and obtaining the corresponding motion vector information. A3: determining the current location information of each target in the image from the location information of each target in the reference image and the motion vector information. A4: marking each target in the image according to its current location information to form a target image. The technical solution provided by the invention achieves higher detection efficiency.

Description

Target detection method and device based on neural network and motion vectors

Technical field

The present invention relates to the field of computer technology, and in particular to a target detection method and device based on neural network and motion vectors.

Background

In vehicle-mounted autonomous driving systems, the video data stream sent by an image capture device usually has to be parsed and recognized in order to determine and mark the positions of the targets (for example, vehicles, pedestrians and traffic signs) in each frame carried by the stream, that is, to perform target detection on the images carried by the video data stream.

At present, target detection on the images carried by a video data stream requires a neural network model to recognize every frame of the stream separately, so as to determine and mark at least one target in each frame, after which each marked frame is output.

In this approach, running neural-network image recognition separately on every frame takes a long time, so detection efficiency is low.

Summary of the invention

Embodiments of the present invention provide a target detection method and device based on neural network and motion vectors with higher detection efficiency.

In a first aspect, the present invention provides a target detection method based on neural network and motion vectors, including:

receiving a video data stream sent by an image capture device;

parsing the video data stream to extract at least two frames of images, and recording the serial number corresponding to each extracted image;

for each image, determining, according to the serial number corresponding to the image, whether the image is an intra-prediction coded frame; if so, executing A1; otherwise, executing A2, A3 and A4;

A1: recognizing the image with a neural network model to detect at least one target carried by the image, determining and storing the location information of each target in the image, and marking the position of each target in the image to form a target image;

A2: determining, according to the serial number corresponding to the image, the reference image corresponding to the image, and obtaining the motion vector information of the image relative to the reference image;

A3: determining the current location information of each target in the image according to the location information of the target in the reference image and the motion vector information;

A4: marking the position of each target in the image according to its current location information to form a target image.

Preferably,

determining, according to the serial number corresponding to the image, whether the image is an intra-prediction coded frame includes:

calculating the evaluation coefficient corresponding to the image by the following formula:

β = (α - 1) / n

where β is the evaluation coefficient, α is the serial number corresponding to the image, and n is a preset constant greater than 1;

when β is an integer, determining that the image is an intra-prediction coded frame; otherwise, determining that the image is an inter-prediction coded frame.
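Assuming the evaluation coefficient has the form β = (α - 1) / n (a form consistent with the worked example in the detailed description, where n = 3 makes frames 1, 4 and 7 intra-prediction coded frames), the integer test reduces to a congruence, and the nearest preceding intra-prediction coded frame has a closed form. This is an illustrative derivation under that assumption, not part of the original disclosure:

```latex
\beta = \frac{\alpha - 1}{n} \in \mathbb{Z}
\iff \alpha \equiv 1 \pmod{n}
\iff \alpha \in \{1,\ 1 + n,\ 1 + 2n,\ \dots\}

\alpha_{\mathrm{ref}} = 1 + n \left\lfloor \frac{\alpha - 1}{n} \right\rfloor,
\qquad \text{e.g. } n = 3,\ \alpha = 6:\ \alpha_{\mathrm{ref}} = 1 + 3 \cdot 1 = 4
```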

Preferably,

the motion vector information includes: integer-pixel motion vectors and intra-prediction coded image blocks.

In a second aspect, an embodiment of the present invention provides a target detection device based on neural network and motion vectors, including:

a bitstream receiving module, configured to receive the video data stream sent by an image capture device;

a bitstream parsing module, configured to parse the video data stream to extract at least two frames of images and to record the serial number corresponding to each extracted image;

an image determining module, configured to determine, for each image and according to the serial number corresponding to the image, whether the image is an intra-prediction coded frame; if so, to trigger the neural network calling module; otherwise, to trigger the information acquisition module;

the neural network calling module, configured to recognize the image with a neural network model to detect at least one target carried by the image, to determine and store the location information of each target in the image, and to mark the position of each target in the image to form a target image;

the information acquisition module, configured to determine, according to the serial number corresponding to the image, the reference image corresponding to the image, and to obtain the motion vector information of the image relative to the reference image;

a position determining module, configured to determine the current location information of each target in the image according to the location information of the target in the reference image and the motion vector information;

a marking processing module, configured to mark the position of each target in the image according to its current location information to form a target image.

Preferably,

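the image determining module is configured to perform the following steps:

calculating the evaluation coefficient β = (α - 1) / n corresponding to the image, where α is the serial number corresponding to the image and n is a preset constant greater than 1;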

when β is an integer, determining that the image is an intra-prediction coded frame; when β is a non-integer, determining that the image is an inter-prediction coded frame.

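Preferably,

the motion vector information includes: integer-pixel motion vectors and intra-prediction coded image blocks.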

Embodiments of the present invention provide a target detection method and device based on neural network and motion vectors. In the method, the received video data stream is parsed to extract at least two frames of images, and the serial number of each extracted image is recorded. For each image, whether it is an intra-prediction coded frame is then determined from its serial number. When the image is an intra-prediction coded frame, a neural network model recognizes the image to detect at least one target carried by it, the position of each target is marked in the image to form a target image, and the location information of each target in the image is stored. When the image is not an intra-prediction coded frame, the reference image corresponding to it is determined from its serial number and the motion vector information of the image relative to the reference image is obtained; the current location information of each target in the image is then determined from the motion vector information and the location information of each target in the reference image, and the position of each target is marked in the image accordingly to form a target image. In summary, the technical solution provided by the embodiments of the present invention detects and marks every target in every frame without the neural network model recognizing each frame carried by the video data stream separately, so detection efficiency is higher.

Description of the drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

Fig. 1 is a flowchart of a target detection method based on neural network and motion vectors provided by an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a target detection device based on neural network and motion vectors provided by an embodiment of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.

As shown in Fig. 1, an embodiment of the present invention provides a target detection method based on neural network and motion vectors, including:

Step 101: receive the video data stream sent by an image capture device;

Step 102: parse the video data stream to extract at least two frames of images, and record the serial number corresponding to each extracted image;

Step 103: select, in order, the image corresponding to a serial number that has not yet been selected;

Step 104: determine, according to the serial number corresponding to the selected image, whether the image is an intra-prediction coded frame; if so, execute step 105; otherwise, execute step 106;

Step 105: recognize the image with a neural network model to detect at least one target carried by the image, determine and store the location information of each target in the image, and mark the position of each target in the image to form a target image;

Step 106: determine, according to the serial number corresponding to the image, the reference image corresponding to the image, and obtain the motion vector information of the image relative to the reference image;

Step 107: determine the current location information of each target in the image according to the location information of the target in the reference image and the motion vector information;

Step 108: mark the position of each target in the image according to its current location information to form a target image;

Step 109: determine whether any unselected image remains; if so, return to step 103; otherwise, end the current process.
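The loop in steps 103 to 109 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the detector, the motion-vector source and the (x, y, width, height) box format are hypothetical stand-ins, the intra-frame test assumes β = (α - 1) / n, and drawing the marks (forming the target image) is left out.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical box format: (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]
# Hypothetical detector: image -> {target_id: box}; stands in for the neural network model.
Detector = Callable[[Any], Dict[str, Box]]
# Hypothetical motion source: (current serial, reference serial, box in reference) -> (dx, dy).
MotionSource = Callable[[int, int, Box], Tuple[float, float]]


def is_intra(alpha: int, n: int) -> bool:
    """Assumed rule: beta = (alpha - 1) / n must be an integer."""
    return (alpha - 1) % n == 0


def detect_stream(frames: List[Any], n: int, detect: Detector,
                  motion: MotionSource) -> Dict[int, Dict[str, Box]]:
    """Steps 103-109: return {serial number: {target_id: box}} for every frame."""
    stored: Dict[int, Dict[str, Box]] = {}
    for alpha, image in enumerate(frames, start=1):  # serial numbers start at 1
        if is_intra(alpha, n):
            # Step 105: run the expensive neural network only on intra frames.
            stored[alpha] = detect(image)
            continue
        # Step 106: walk back to the nearest preceding intra frame (frame 1 always is one).
        ref = alpha - 1
        while not is_intra(ref, n):
            ref -= 1
        # Steps 107-108: shift each stored box by the motion vector information.
        current: Dict[str, Box] = {}
        for target_id, box in stored[ref].items():
            dx, dy = motion(alpha, ref, box)
            x, y, w, h = box
            current[target_id] = (x + dx, y + dy, w, h)
        stored[alpha] = current
    return stored
```

Only frames 1, 1 + n, 1 + 2n, ... reach the detector; every other frame reuses the stored positions of its reference frame, which is where the reduced neural-network workload comes from.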

In the embodiment shown in Fig. 1, the received video data stream is parsed to extract at least two frames of images, and the serial number corresponding to each extracted image is recorded. Each image is then examined, and whether it is an intra-prediction coded frame is determined from its serial number. When the image is an intra-prediction coded frame, the neural network model recognizes it to detect at least one target carried by the image, the position of each target is marked in the image to form a target image, and the location information of each target in the image is stored. When the image is not an intra-prediction coded frame, the reference image corresponding to it is determined from its serial number, the motion vector information of the image relative to the reference image is obtained, the current location information of each target in the image is determined from the motion vector information and the location information of each target in the reference image, and the position of each target is then marked in the image according to that current location information to form a target image. In summary, the technical solution provided by the embodiments of the present invention can detect and mark every target in every frame without the neural network model recognizing each frame carried by the video data stream separately, so detection efficiency is higher.

Correspondingly, because the neural network model does not have to recognize every frame carried by the video data stream separately, its workload is reduced. This lowers the power consumption of the terminal performing target detection on the frames of the stream and leaves sufficient computing and storage resources for the terminal to perform other tasks. It can also improve the terminal's ability to handle video distortion caused by targets being occluded or deformed, improving robustness.

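In one embodiment of the present invention, determining, according to the serial number corresponding to the image, whether the image is an intra-prediction coded frame includes: calculating the evaluation coefficient corresponding to the image by β = (α - 1) / n, where β is the evaluation coefficient, α is the serial number corresponding to the image, and n is a preset constant greater than 1; when β is an integer, determining that the image is an intra-prediction coded frame; otherwise, determining that the image is an inter-prediction coded frame.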

In the above embodiment, n is a preset constant greater than 1 that can be set according to the actual application scenario (for example, the computing or storage capability of the device executing the method). The value of the preset constant usually affects the detection accuracy for targets in the image: to maintain high detection accuracy, the preset constant should be relatively small, and it can usually be set to an integer greater than 1 and less than 6, for example any one of 2, 3, 4 and 5.

In the above embodiment, when α = 1, that is, when the serial number corresponding to the image is 1, the image is the first image extracted from the received video data stream. This image should be an intra-prediction coded frame, so the neural network model recognizes it to detect the one or more targets it carries, the position of each target is marked in the image to form a target image, and the location information of each target in the image is also stored.

It can be understood that every frame extracted from the received video data stream that satisfies the condition defined in the above embodiment is determined to be an intra-prediction coded frame. For example, when n = 3 and the video data stream carries 9 frames, the frames with serial numbers 1, 4 and 7 are determined to be intra-prediction coded frames, and the frames with serial numbers 2, 3, 5, 6, 8 and 9 are inter-prediction coded frames.
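The classification in this example can be checked with a short Python sketch. It assumes the evaluation coefficient is β = (α - 1) / n, a form consistent with frames 1, 4 and 7 listed above; the function names are illustrative only. Exact rational arithmetic is used so that the "is β an integer?" test has no floating-point rounding issues.

```python
from fractions import Fraction
from typing import List, Tuple


def evaluation_coefficient(alpha: int, n: int) -> Fraction:
    """Assumed form of the evaluation coefficient: beta = (alpha - 1) / n."""
    if alpha < 1:
        raise ValueError("serial numbers start at 1")
    if n <= 1:
        raise ValueError("n must be a preset constant greater than 1")
    # Fraction keeps beta exact, so the integer test is reliable.
    return Fraction(alpha - 1, n)


def is_intra_frame(alpha: int, n: int) -> bool:
    """An image is an intra-prediction coded frame when beta is an integer."""
    return evaluation_coefficient(alpha, n).denominator == 1


def classify(num_frames: int, n: int) -> Tuple[List[int], List[int]]:
    """Split serial numbers 1..num_frames into (intra, inter) lists."""
    serials = range(1, num_frames + 1)
    intra = [a for a in serials if is_intra_frame(a, n)]
    inter = [a for a in serials if not is_intra_frame(a, n)]
    return intra, inter


print(classify(9, 3))  # ([1, 4, 7], [2, 3, 5, 6, 8, 9])
```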

In one embodiment of the present invention, the received video data is usually parsed while it is still being received. Therefore, when an image is determined to be an inter-prediction coded frame (that is, not an intra-prediction coded frame), the extracted images can be traversed backwards in descending order of serial number, starting from the image's own serial number, checking whether each traversed image is an intra-prediction coded frame; the first traversed image that is an intra-prediction coded frame is taken as the reference image. For example, when the current image with serial number 6 is determined not to be an intra-prediction coded frame, serial number 5 is traversed first; because the image with serial number 5 is not an intra-prediction coded frame either, serial number 4 is traversed next; because the image with serial number 4 is an intra-prediction coded frame, it is determined to be the reference image corresponding to the current image.
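The backward traversal can be sketched as below. The intra-frame test is passed in as a function so the sketch does not depend on the exact evaluation formula; the n = 3 rule used in the example (`intra_n3`, a hypothetical helper) is an assumption that reproduces the frame numbering above.

```python
from typing import Callable, Optional


def find_reference(alpha: int, is_intra: Callable[[int], bool]) -> Optional[int]:
    """Backtrack from serial number alpha - 1 down to 1 and return the first
    intra-prediction coded frame found, or None if there is none."""
    for candidate in range(alpha - 1, 0, -1):  # descending order of serial number
        if is_intra(candidate):
            return candidate
    return None


def intra_n3(a: int) -> bool:
    """Hypothetical intra-frame rule for n = 3 (frames 1, 4, 7, ...)."""
    return (a - 1) % 3 == 0


print(find_reference(6, intra_n3))  # 4
```

Because frames arrive in order, the traversal only looks at frames that have already been parsed, which fits the receive-while-parsing situation described above.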

In one embodiment of the present invention, the motion vector information includes but is not limited to integer-pixel motion vectors and intra-prediction coded image blocks; for example, it may also include sub-pixel motion vectors.
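Block-based codecs such as H.264/AVC and H.265/HEVC store luma motion vectors in quarter-sample units, so a decoded vector component splits into an integer-pixel part and a sub-pixel remainder. The sketch below assumes that quarter-sample convention; it is not taken from this document, and the function names are illustrative.

```python
from typing import Tuple


def split_quarter_pel(mv_q: int) -> Tuple[int, int]:
    """Split one motion-vector component given in quarter-pixel units into
    (integer-pixel part, quarter-pixel remainder in 0..3).

    Floor division keeps the split consistent for negative vectors:
    -5 quarter pixels = -2 full pixels + 3 quarter pixels.
    """
    return mv_q // 4, mv_q % 4


def to_pixels(mv_q: int) -> float:
    """Full-precision displacement in pixels."""
    return mv_q / 4


print(split_quarter_pel(9), split_quarter_pel(-5), to_pixels(-5))
# (2, 1) (-2, 3) -1.25
```

Using only the integer-pixel part corresponds to the integer-pixel motion vectors named above; keeping the remainder corresponds to the optional sub-pixel motion vectors.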

For example, when the current image A with serial number 6 is an inter-prediction coded frame, its reference image is image B with serial number 4 (an intra-prediction coded frame). When the motion vector information includes an intra-prediction coded image block, this block is present in both image A and image B. From the integer-pixel motion vector of the block in image A relative to its position in image B, the relative motion vector of the same target between image B and image A can be determined. Because the position of each target in image B is already known, the position of each target in image A can be determined from the relative motion vector and the target's position in image B, so that each target can be marked in image A.
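Since target positions are known in reference image B and are needed in current image A, one simple way to turn block motion vectors into a new target position is to collect the vectors of the blocks that fall on the target and shift its box by their median. The block size, the box format, the sign convention of (dx, dy) and the use of the median are all assumptions for illustration. In H.264/AVC and H.265/HEVC a motion vector gives the offset from a block in the current image to its prediction in the reference image, so raw codec vectors would have to be negated to match the B-to-A convention used here.

```python
from statistics import median
from typing import List, Tuple

# Hypothetical formats: box = (x, y, width, height) in reference image B;
# block motion = (block_x, block_y, dx, dy), where (block_x, block_y) is the
# block's top-left corner in B and (dx, dy) its displacement from B to image A.
Box = Tuple[float, float, float, float]
BlockMotion = Tuple[float, float, float, float]


def propagate_box(ref_box: Box, blocks: List[BlockMotion], block_size: int = 16) -> Box:
    """Estimate a target's box in image A from its box in reference image B."""
    x, y, w, h = ref_box
    dxs: List[float] = []
    dys: List[float] = []
    for bx, by, dx, dy in blocks:
        cx, cy = bx + block_size / 2, by + block_size / 2  # block centre
        if x <= cx < x + w and y <= cy < y + h:  # block lies on the target
            dxs.append(dx)
            dys.append(dy)
    if not dxs:
        return ref_box  # no motion information for this target: keep its position
    # The median resists outlier blocks (e.g. background) better than the mean.
    return (x + median(dxs), y + median(dys), w, h)


blocks = [(0, 0, 4, 2), (16, 0, 4, 2), (0, 16, 5, 2), (64, 64, -10, -10)]
print(propagate_box((0, 0, 32, 32), blocks))  # (4, 2, 32, 32)
```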

As shown in Fig. 2, an embodiment of the present invention provides a target detection device based on neural network and motion vectors, including:

a bitstream receiving module 201, configured to receive the video data stream sent by an image capture device;

a bitstream parsing module 202, configured to parse the video data stream to extract at least two frames of images and to record the serial number corresponding to each extracted image;

an image determining module 203, configured to determine, for each image and according to the serial number corresponding to the image, whether the image is an intra-prediction coded frame; if so, to trigger the neural network calling module 204; otherwise, to trigger the information acquisition module 205;

the neural network calling module 204, configured to recognize the image with a neural network model to detect at least one target carried by the image, to determine and store the location information of each target in the image, and to mark the position of each target in the image to form a target image;

the information acquisition module 205, configured to determine, according to the serial number corresponding to the image, the reference image corresponding to the image, and to obtain the motion vector information of the image relative to the reference image;

a position determining module 206, configured to determine the current location information of each target in the image according to the location information of the target in the reference image and the motion vector information;

a marking processing module 207, configured to mark the position of each target in the image according to its current location information to form a target image.

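In one embodiment of the present invention, the image determining module 203 is configured to perform the following steps: calculating the evaluation coefficient β = (α - 1) / n corresponding to the image, where α is the serial number corresponding to the image and n is a preset constant greater than 1; when β is an integer, determining that the image is an intra-prediction coded frame; when β is a non-integer, determining that the image is an inter-prediction coded frame.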

In one embodiment of the present invention, the motion vector information includes: integer-pixel motion vectors and intra-prediction coded image blocks.

Because the information exchange between the units of the above device and their execution processes are based on the same concept as the method embodiments of the present invention, see the description of the method embodiments for details, which are not repeated here.

In summary, the embodiments of the present invention have at least the following beneficial effects:

1. In an embodiment of the present invention, the received video data stream is parsed to extract at least two frames of images, and the serial number corresponding to each extracted image is recorded. Each image is then examined, and whether it is an intra-prediction coded frame is determined from its serial number. When the image is an intra-prediction coded frame, the neural network model recognizes it to detect at least one target carried by the image, the position of each target is marked in the image to form a target image, and the location information of each target in the image is stored. When the image is not an intra-prediction coded frame, the reference image corresponding to it is determined from its serial number, the motion vector information of the image relative to the reference image is obtained, the current location information of each target in the image is determined from the motion vector information and the location information of each target in the reference image, and the position of each target is then marked in the image according to that current location information to form a target image. In summary, the technical solution provided by the embodiments of the present invention can detect and mark every target in every frame without the neural network model recognizing each frame carried by the video data stream separately, so detection efficiency is higher.

2. In an embodiment of the present invention, because the neural network model does not have to recognize each frame carried by the video data stream separately, its workload is reduced. This lowers the power consumption of the terminal performing target detection on the frames of the stream and leaves sufficient computing and storage resources for the terminal to perform other tasks. It can also improve the terminal's ability to handle video distortion caused by targets being occluded or deformed, improving robustness.

It should be noted that relational terms such as first and second are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.

Finally, it should be noted that the foregoing are merely preferred embodiments of the present invention, intended only to illustrate its technical solutions and not to limit its protection scope. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A target detection method based on neural network and motion vectors, characterized by including:

receiving a video data stream sent by an image capture device;

parsing the video data stream to extract at least two frames of images, and recording the serial number corresponding to each extracted image;

for each image, determining, according to the serial number corresponding to the image, whether the image is an intra-prediction coded frame; if so, executing A1; otherwise, executing A2, A3 and A4;

A1: recognizing the image with a neural network model to detect at least one target carried by the image, determining and storing the location information of each target in the image, and marking the position of each target in the image to form a target image;

A2: determining, according to the serial number corresponding to the image, the reference image corresponding to the image, and obtaining the motion vector information of the image relative to the reference image;

A3: determining the current location information of each target in the image according to the location information of the target in the reference image and the motion vector information;

A4: marking the position of each target in the image according to its current location information to form a target image.

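2. The method according to claim 1, characterized in that determining, according to the serial number corresponding to the image, whether the image is an intra-prediction coded frame includes:

calculating the evaluation coefficient corresponding to the image by the following formula:

β = (α - 1) / n

where β is the evaluation coefficient, α is the serial number corresponding to the image, and n is a preset constant greater than 1;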

when β is an integer, determining that the image is an intra-prediction coded frame; otherwise, determining that the image is an inter-prediction coded frame.

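3. The method according to claim 1 or 2, characterized in that the motion vector information includes: integer-pixel motion vectors and intra-prediction coded image blocks.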

4. A target detection device based on neural network and motion vectors, characterized by including:

a bitstream receiving module, configured to receive the video data stream sent by an image capture device;

a bitstream parsing module, configured to parse the video data stream to extract at least two frames of images and to record the serial number corresponding to each extracted image;

an image determining module, configured to determine, for each image and according to the serial number corresponding to the image, whether the image is an intra-prediction coded frame; if so, to trigger the neural network calling module; otherwise, to trigger the information acquisition module;

the neural network calling module, configured to recognize the image with a neural network model to detect at least one target carried by the image, to determine and store the location information of each target in the image, and to mark the position of each target in the image to form a target image;

the information acquisition module, configured to determine, according to the serial number corresponding to the image, the reference image corresponding to the image, and to obtain the motion vector information of the image relative to the reference image;

a position determining module, configured to determine the current location information of each target in the image according to the location information of the target in the reference image and the motion vector information;

a marking processing module, configured to mark the position of each target in the image according to its current location information to form a target image.

5. The device according to claim 4, characterized in that

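the image determining module is configured to perform the following steps:

calculating the evaluation coefficient β = (α - 1) / n corresponding to the image, where α is the serial number corresponding to the image and n is a preset constant greater than 1;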

when β is an integer, determining that the image is an intra-prediction coded frame; when β is a non-integer, determining that the image is an inter-prediction coded frame.

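6. The device according to claim 4 or 5, characterized in that the motion vector information includes: integer-pixel motion vectors and intra-prediction coded image blocks.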