CN109886951A - Video processing method, apparatus, and electronic device - Google Patents

Video processing method, apparatus, and electronic device

Info

Publication number
CN109886951A
CN109886951A
Authority
CN
China
Prior art keywords
image
target object
analyzed
video
quality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910137122.8A
Other languages
Chinese (zh)
Inventor
Sun Peiqin (孙培钦)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd
Priority to CN201910137122.8A
Publication of CN109886951A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention provides a video processing method, an apparatus, and an electronic device, belonging to the technical field of image processing, for performing attribute analysis on a target object in a video. A target object is first determined in an acquired video; multiple to-be-analyzed images containing the target object are extracted from the video; feature extraction is performed on the to-be-analyzed images to obtain multiple feature maps; the feature maps are fused to obtain a fused feature map; and the attribute information of the target object is then determined from the fused feature map. The method can improve the accuracy of the attribute analysis result while ensuring processing efficiency.

Description

Video processing method, apparatus, and electronic device
Technical field
The invention belongs to the technical field of image processing, and in particular relates to a video processing method, an apparatus, and an electronic device.
Background technique
Video structuring technology is used to analyze the attribute information of a target object from video captured by a camera and present the attribute information to the user. For example, in an intelligent transportation scenario, the target object may be a pedestrian or a vehicle; if the target object is a pedestrian, attribute analysis can be performed on traffic video to output attribute information such as the pedestrian's gender, clothing, and age.
In practical applications, performing attribute analysis on the target-object region in every image frame of a video involves a very large amount of computation and places requirements on the computing power of terminal devices that current devices can hardly meet. Therefore, in the prior art, usually only the single image frame of best quality is selected from the image frames containing the target object, and the target-object region extracted from that frame is used for attribute analysis; however, this approach can hardly guarantee the accuracy of the resulting attribute information.
Summary of the invention
In view of this, the object of the present invention is to provide a video processing method, an apparatus, and an electronic device for performing attribute analysis on a target object in a video, capable of improving the accuracy of the attribute analysis result while ensuring processing efficiency.
To achieve the above object, the technical solutions adopted in the embodiments of the present invention are as follows:
In a first aspect, an embodiment of the present invention provides a video processing method, comprising:
determining a target object in an acquired video;
extracting multiple to-be-analyzed images from the video, each to-be-analyzed image containing the target object;
performing feature extraction on the multiple to-be-analyzed images to obtain multiple feature maps;
fusing the multiple feature maps to obtain a fused feature map; and
determining attribute information of the target object according to the fused feature map.
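The five claimed steps form a simple pipeline. The following is a minimal illustrative sketch in NumPy; the block-averaging "feature extractor", mean fusion, and random linear "attribute head" are placeholder stand-ins for the trained networks described later in the specification, not the patented implementation:

```python
import numpy as np

def extract_features(image):
    """Placeholder feature extractor: mean over 8x8 blocks of one channel."""
    h, w = image.shape
    return image.reshape(h // 8, 8, w // 8, 8).mean(axis=(1, 3))

def fuse_feature_maps(feature_maps, weights=None):
    """Fuse per-image feature maps by (weighted) averaging."""
    stack = np.stack(feature_maps)
    if weights is None:
        return stack.mean(axis=0)
    w = np.asarray(weights, dtype=float)
    return (stack * w[:, None, None]).sum(axis=0) / w.sum()

def analyze_attributes(fused, n_attributes=4, seed=0):
    """Placeholder attribute head: random linear map + sigmoid on the flattened map."""
    rng = np.random.default_rng(seed)
    logits = rng.standard_normal((n_attributes, fused.size)) @ fused.ravel()
    return 1.0 / (1.0 + np.exp(-logits))  # pseudo attribute probabilities

# A toy "video": five grayscale frames assumed to contain the target object.
frames = [np.random.default_rng(i).random((32, 32)) for i in range(5)]
feature_maps = [extract_features(f) for f in frames]   # step: feature extraction
fused = fuse_feature_maps(feature_maps)                # step: fusion
attrs = analyze_attributes(fused)                      # step: attribute analysis
print(fused.shape, attrs.shape)
```

The same skeleton applies whichever concrete networks fill the three roles; only feature extraction runs once per image, while attribute analysis runs once on the fused map.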
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, wherein the step of determining a target object in the acquired video comprises:
performing target-object detection on the video through a target detection network to obtain the detection box corresponding to each image frame of the video;
determining, by a target tracker, an identifier for each detection box according to the similarity between the detection boxes in the image frames; and
determining the position of the target object in each image frame of the video according to the identified detection boxes.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, wherein the step of extracting multiple to-be-analyzed images from the video comprises:
extracting a specified number of to-be-analyzed images from the video, the quality of the target object in each to-be-analyzed image meeting a preset requirement.
With reference to the second possible implementation of the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, wherein the step of extracting a specified number of to-be-analyzed images from the video comprises:
extracting, from the image frames containing the target object, region images containing the target object;
saving into an image library the region images or image frames in which the quality of the target object meets the preset requirement;
when the number of images in the image library exceeds the specified number, deleting the earliest-saved image or the image in which the quality of the target object is lowest; and
using the images in the image library as the to-be-analyzed images.
With reference to the third possible implementation of the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, wherein the method further comprises:
performing quality detection on the region image or image frame containing the target object through an image quality detection network; and
judging, according to the image quality information output by the image quality detection network, whether the quality of the target object in the region image meets the preset requirement, the image quality information including at least image sharpness.
With reference to any one of the second to fourth possible implementations of the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, wherein the step of performing feature extraction on the multiple to-be-analyzed images to obtain multiple feature maps comprises:
inputting each to-be-analyzed image into a feature extraction network to obtain the feature map, output by the feature extraction network, corresponding to each to-be-analyzed image.
With reference to the first aspect, an embodiment of the present invention provides a sixth possible implementation of the first aspect, wherein the step of extracting multiple to-be-analyzed images from the video comprises:
selecting from the video, as to-be-analyzed images, the image frames or region images in which the quality of the target object meets the preset requirement, a region image being a region containing the target object extracted from an image frame;
and the step of performing feature extraction on the multiple to-be-analyzed images to obtain multiple feature maps comprises:
performing feature extraction on each to-be-analyzed image through a feature extraction network to obtain the feature map corresponding to each to-be-analyzed image; and
selecting a specified number of feature maps from the obtained feature maps.
With reference to the sixth possible implementation of the first aspect, an embodiment of the present invention provides a seventh possible implementation of the first aspect, wherein the step of selecting from the video, as to-be-analyzed images, the image frames or region images in which the quality of the target object meets the preset requirement comprises:
judging, according to the image quality information of an image frame output by the target detection network, whether the quality of the target object in the image frame meets the preset requirement, the image quality information including at least image sharpness; and
if so, using the image frame as a to-be-analyzed image, or extracting from the image frame a region image containing the target object as a to-be-analyzed image.
With reference to the first aspect, an embodiment of the present invention provides an eighth possible implementation of the first aspect, wherein the step of fusing the multiple feature maps to obtain a fused feature map comprises:
performing weighted fusion of the multiple feature maps according to preset weights to obtain the fused feature map.
With reference to the first aspect, an embodiment of the present invention provides a ninth possible implementation of the first aspect, wherein the step of determining the attribute information of the target object according to the fused feature map comprises:
inputting the fused feature map into an attribute analysis network to obtain the attribute information of the target object output by the attribute analysis network.
In a second aspect, an embodiment of the present invention further provides a video processing apparatus, comprising:
a target determination module, configured to determine a target object in an acquired video;
an image extraction module, configured to extract multiple to-be-analyzed images from the video, each to-be-analyzed image containing the target object; and
an attribute analysis module, configured to perform feature extraction on the multiple to-be-analyzed images to obtain multiple feature maps, fuse the multiple feature maps to obtain a fused feature map, and determine attribute information of the target object according to the fused feature map.
In a third aspect, an embodiment of the present invention further provides an electronic device, comprising a memory and a processor;
the memory stores a computer program executable on the processor, and the processor implements the steps of the method of any one of the implementations of the above first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, the computer program, when run by a processor, performing the steps of the method of any one of the implementations of the above first aspect.
The video processing method, apparatus, and electronic device provided by the embodiments of the present invention perform attribute analysis on a target object in a video. A target object is first determined in an acquired video, multiple to-be-analyzed images containing the target object are extracted from the video, feature extraction is performed on them to obtain multiple feature maps, the feature maps are fused into a fused feature map, and the attribute information of the target object is then determined from the fused feature map. Because feature extraction on an image is fast, the feature extraction process can be executed many times to obtain multiple feature maps, whereas attribute analysis on an image is computationally expensive and slow; by performing attribute analysis on the fused feature map, the attribute analysis process is executed only once or a few times, which ensures the processing efficiency of the method. The fused feature map obtained by fusing the feature maps of multiple to-be-analyzed images contains the features of all of them, so performing attribute analysis on the fused feature map improves the accuracy of the attribute analysis result.
Other features and advantages of the present invention will be set forth in the following description; alternatively, some of these features and advantages can be inferred or unambiguously determined from the description, or learned by practicing the above techniques of the invention.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, preferred embodiments are particularly cited below and described in detail with reference to the accompanying drawings.
Detailed description of the invention
To describe the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. Apparently, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 shows a schematic structural diagram of an electronic device provided by an embodiment of the present invention;
Fig. 2 shows a flowchart of a video processing method provided by an embodiment of the present invention;
Fig. 3 shows a schematic diagram of another video processing method provided by an embodiment of the present invention;
Fig. 4 shows a structural block diagram of a video processing apparatus provided by an embodiment of the present invention.
Specific embodiment
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the drawings. Apparently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Because the prior art usually performs attribute analysis on only the single best-quality image frame selected from a video, it is difficult to guarantee the accuracy of the resulting attribute information. On this basis, the embodiments of the present invention provide a video processing method, an apparatus, an electronic device, and a computer storage medium, which are described in detail below with reference to the drawings and specific embodiments.
Embodiment one:
First, an exemplary electronic device 100 for implementing the video processing method of the embodiments of the present invention is described with reference to Fig. 1. The exemplary electronic device 100 may be a mobile terminal such as a smartphone or tablet computer, or another device such as a computer or server. It may also be a camera, for example a structuring camera, that is, a camera with a video structuring function.
As shown in Fig. 1, the electronic device 100 includes one or more processors 102, one or more memories 104, an input device 106, and an output device 108, and may further include an image acquisition device 110; these components are interconnected through a bus system 112 and/or other forms of connection mechanisms (not shown). It should be noted that the components and structure of the electronic device 100 shown in Fig. 1 are only exemplary rather than limiting; the electronic device may have other components and structures as needed.
The processor 102 may be a central processing unit (CPU), a graphics processing unit (GPU), or another form of processing unit with data processing, image processing, and/or instruction execution capability, and can control the other components of the electronic device 100 to perform desired functions.
In an alternative embodiment, the processor 102 may include a first processor and a second processor. The first processor may be a microcontroller or another microchip processor, used to control the other components of the electronic device 100 and to execute the non-convolution steps of the video processing method provided by the embodiments of the present invention. The second processor, serving as a coprocessor of the first processor, may be a field-programmable gate array (FPGA) chip, used to execute the convolution steps of the video processing method provided by the embodiments of the present invention. An FPGA chip can accelerate the convolution computation and further improve the operating efficiency of the video processing method.
The memory 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory; the non-volatile memory may include, for example, read-only memory (ROM), a hard disk, and flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 102 may run the program instructions to implement the attribute analysis function (implemented by the processor) of the embodiments of the present invention described below and/or other desired functions. Various application programs and various data, such as the images and videos used and/or generated by the application programs, may also be stored on the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (for example, images or sound) to the outside (for example, a user) and may include one or more of a display, a speaker, and the like.
The image acquisition device 110 may capture images or video desired by the user and store the captured images or video in the memory 104 for use by other components.
Embodiment two:
This embodiment provides a video processing method that can improve the accuracy of the attribute analysis result while ensuring processing efficiency; Fig. 2 shows a flowchart of the method. It should be noted that the steps shown in the flowchart of Fig. 2 may be executed in a computer system such as a set of computer-executable instructions, and although a logical order is shown in the flowchart, in some cases the steps may be performed in an order different from the one shown or described here. This embodiment is described in detail below.
As shown in Fig. 2, the video processing method provided by this embodiment includes the following steps:
Step S202: determine a target object in an acquired video.
A piece of video is obtained. The video may be a segment acquired in real time by an image acquisition device performing a shooting task, or a pre-stored video; a pre-stored video may be one obtained after normalizing and smoothing an original video. In some embodiments, the target object may be determined in the video after the complete video has been obtained; in other embodiments, the target object may be determined in real time in the acquired video stream while the video is being captured.
The target object may be any object, including but not limited to a pedestrian, a vehicle, an animal, or a plant; it may also be a part of a human body (such as a face), a part of an animal, or a part of a plant. The present invention imposes no specific limitation on this.
Determining a target object in the acquired video can be understood as determining the position or region of the target object in each image frame of the video, that is, determining the motion track of the target object. The target object can be determined in the acquired video using an existing target tracking method.
Step S204: extract multiple to-be-analyzed images from the video.
Each to-be-analyzed image contains the target object. In some embodiments, a to-be-analyzed image may be an image frame of the video that contains the target object; in other embodiments, it may be a region image containing the target object extracted from such an image frame.
Optionally, for each image frame of the video that contains the target object, a region image containing the target object can be extracted from the frame. Since the position of the target object in each image frame has been determined in step S202, the region image containing the target object can be extracted from the frame using an existing image segmentation or matting method. From the resulting region images, those in which the quality of the target object meets a preset requirement are selected as to-be-analyzed images; alternatively, from the image frames containing the target object, those in which the quality of the target object meets the preset requirement are selected as to-be-analyzed images. The quality of the target object meeting the preset requirement may include the recognizability of the target object reaching a set threshold; for example, the recognizability of the target object can be represented by its sharpness, so whether the quality of the target object meets the preset requirement is determined from the image sharpness of the image frame or region image containing it. If the target object is a pedestrian, the quality meeting the preset requirement may also include the orientation of the face in the image meeting a preset angle, for example a frontal face, or a face turned within a set angular range. The quality of a region image or image frame can be determined using an existing image quality detection method, and whether the quality of the target object in the image meets the preset requirement is determined from that quality.
Illustratively, quality detection may be performed on the region image or image frame containing the target object through an image quality detection network, and whether the quality of the target object in the region image or image frame meets the preset requirement is judged according to the image quality information output by the network. The image quality information includes at least image sharpness, and the recognizability of the target object can be determined from the image sharpness output by the image quality detection network. For example, for a region image, if the image sharpness output by the image quality detection network reaches a set threshold, the recognizability of the target object in that region image reaches the threshold, and the region image can be used as a to-be-analyzed image; conversely, if the output sharpness does not reach the threshold, the recognizability of the target object in the region image does not reach the threshold, the region image cannot be used as a to-be-analyzed image, and it is directly discarded.
It should be noted that the image quality information may also include other parameters depending on the target object. For example, when the target object is a pedestrian, the image quality information may also include facial orientation and the like. If the image quality information contains multiple parameters, the quality of the target object may be deemed to meet the preset requirement when every parameter reaches its set condition; alternatively, the values of the multiple parameters may be weighted and summed to obtain a comprehensive quality score of the image. If the comprehensive quality score is greater than or equal to a set threshold, the quality of the target object in the region image is considered to meet the preset requirement; if the comprehensive quality score is below the set threshold, the quality is considered not to meet the preset requirement. The region images or image frames in which the quality of the target object meets the preset requirement are selected as to-be-analyzed images.
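The weighted-sum scoring described above can be sketched as follows; the parameter names, weights, and threshold are illustrative assumptions, not values from the specification:

```python
def comprehensive_quality(params, weights):
    """Weighted sum of quality parameters (e.g. sharpness, frontal-face score).

    Parameter names and weights are illustrative, not taken from the patent."""
    return sum(weights[k] * params[k] for k in weights)

def meets_preset_requirement(params, weights, threshold):
    """An image passes when its comprehensive quality score reaches the threshold."""
    return comprehensive_quality(params, weights) >= threshold

weights = {"sharpness": 0.7, "face_frontal": 0.3}   # assumed weighting
threshold = 0.6                                      # assumed threshold

sharp_frontal = {"sharpness": 0.9, "face_frontal": 0.8}
blurry = {"sharpness": 0.2, "face_frontal": 0.9}
print(meets_preset_requirement(sharp_frontal, weights, threshold))  # True
print(meets_preset_requirement(blurry, weights, threshold))         # False
```

The per-parameter alternative (every parameter must reach its own condition) would simply replace the weighted sum with a conjunction of per-key checks.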
Step S206: perform feature extraction on the multiple to-be-analyzed images to obtain multiple feature maps.
If the to-be-analyzed images are region images containing the target object extracted from the image frames containing the target object, feature extraction can be performed on them through a feature extraction network. Specifically, each to-be-analyzed image is input into the feature extraction network, and the feature map corresponding to each to-be-analyzed image, output by the network, is obtained.
The feature extraction network is a convolutional neural network and may include at least one convolutional layer for extracting a feature map from a to-be-analyzed image. Feature extraction on an image through a network composed of convolutional layers is very fast, so even when the feature extraction operation is executed for many to-be-analyzed images, it can be completed in a very short time. To further increase the computation speed, the feature extraction network can be implemented on an FPGA chip, which can execute convolution operations in multi-way parallel and thus accelerate the convolution computation.
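The per-image feature extraction step amounts to applying one or more convolutions. A minimal single-channel "valid" convolution plus ReLU, with a fixed edge-detection kernel standing in for a learned filter, might look like this (a sketch, not the patent's network):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2D convolution (no padding, stride 1) of a single channel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 3x3 edge-detection kernel as a stand-in for a learned convolutional filter.
kernel = np.array([[-1., -1., -1.],
                   [-1.,  8., -1.],
                   [-1., -1., -1.]])
image = np.random.default_rng(42).random((16, 16))
feature_map = np.maximum(conv2d_valid(image, kernel), 0.0)  # conv + ReLU
print(feature_map.shape)  # (14, 14)
```

A real feature extraction network would stack several such layers with many learned kernels per layer; an FPGA implementation parallelizes exactly these multiply-accumulate loops.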
If the to-be-analyzed images are image frames of the video containing the target object, a region image containing the target object can be extracted from each frame, and feature extraction is performed on each region image through the feature extraction network to obtain the feature map corresponding to each region image.
Step S208: fuse the multiple feature maps to obtain a fused feature map.
In an alternative embodiment, the multiple feature maps may be fused by averaging: the average of the feature values at the same position across the feature maps is computed and used as the feature value at the corresponding position of the fused feature map.
In another alternative embodiment, the multiple feature maps may be fused by weighting according to preset weights. The weight corresponding to each feature map may be determined from the quality of the target object in the corresponding region image: the higher the quality of the target object, the larger the weight, and the lower the quality, the smaller the weight. According to the preset weights, the weighted average of the feature values at the same position across the feature maps is computed and used as the feature value at the corresponding position of the fused feature map.
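Both fusion variants are element-wise averages over the stacked feature maps. A sketch follows; normalizing raw quality scores into weights is an assumed scheme, since the patent only says higher quality should mean a larger weight:

```python
import numpy as np

def mean_fuse(feature_maps):
    """Element-wise mean across feature maps (the mean-fusion variant)."""
    return np.mean(np.stack(feature_maps), axis=0)

def weighted_fuse(feature_maps, qualities):
    """Element-wise weighted average; higher target-object quality -> larger weight."""
    w = np.asarray(qualities, dtype=float)
    w = w / w.sum()  # normalize quality scores into weights (an assumed scheme)
    return np.tensordot(w, np.stack(feature_maps), axes=1)

maps = [np.full((2, 2), 1.0), np.full((2, 2), 3.0)]
print(mean_fuse(maps))                  # every element 2.0
print(weighted_fuse(maps, [1.0, 3.0]))  # every element 2.5
```

With equal qualities, the weighted variant reduces to the mean variant, which is why both produce a fused map of the same shape as each input map.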
Step S210: determine the attribute information of the target object according to the fused feature map.
The fused feature map is input into an attribute analysis network, and the attribute information of the target object output by the attribute analysis network is obtained.
The attribute analysis network is also a convolutional neural network and may include at least one convolutional layer and at least one fully connected layer, the last fully connected layer outputting the attribute information of the target object. If the target object is a pedestrian, the attribute information may include but is not limited to the person's gender, age range, approximate height, hair accessories, clothing, and belongings. If the target object is a vehicle, the attribute information may include but is not limited to the license plate number, vehicle color, vehicle type, brand, passenger capacity, and vehicle decorations.
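A minimal stand-in for the attribute head: flatten the fused feature map and apply one fully connected layer with a sigmoid, yielding pseudo-probabilities for a set of illustrative pedestrian attributes. Random weights replace a trained network, and the attribute names are assumptions for the example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attribute_head(fused_map, n_attributes, seed=0):
    """Flatten the fused feature map and apply one fully connected layer.

    Random weights stand in for a trained attribute analysis network."""
    rng = np.random.default_rng(seed)
    flat = fused_map.ravel()
    W = rng.standard_normal((n_attributes, flat.size)) * 0.1
    b = np.zeros(n_attributes)
    return sigmoid(W @ flat + b)

PEDESTRIAN_ATTRIBUTES = ["male", "long_hair", "backpack", "coat"]  # illustrative
fused = np.random.default_rng(1).random((8, 8))
scores = attribute_head(fused, len(PEDESTRIAN_ATTRIBUTES))
predicted = {a: bool(s > 0.5) for a, s in zip(PEDESTRIAN_ATTRIBUTES, scores)}
print(predicted)
```

In a trained network, each sigmoid output would be thresholded exactly like this to report which attributes are present.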
The video processing method provided by the embodiment of the present invention performs attribute analysis on a target object in a video. A target object is first determined in an acquired video, multiple to-be-analyzed images containing the target object are extracted from the video, feature extraction is performed on them to obtain multiple feature maps, the feature maps are fused into a fused feature map, and the attribute information of the target object is then determined from the fused feature map. Because feature extraction on an image is fast, the feature extraction process can be executed for many to-be-analyzed images to obtain multiple feature maps, whereas attribute analysis on an image is computationally expensive and slow; by performing attribute analysis on the fused feature map, the attribute analysis process is executed only once or a few times, which ensures the processing efficiency of the method. The fused feature map obtained by fusing the feature maps of multiple to-be-analyzed images contains the features of all of them, so performing attribute analysis on the fused feature map improves the accuracy of the attribute analysis result.
To control the computational cost of the feature extraction process and further reduce the overall computation, in an alternative embodiment a specified number of images may be selected as to-be-analyzed images from among the region images or image frames in which the quality of the target object meets the preset requirement.
One feasible implementation: from the image frames containing the target object, region images containing the target object are extracted, and those in which the quality of the target object meets the preset requirement are saved into an image library. When the number of region images in the library exceeds the specified number, the earliest-saved region image, or the region image in which the quality of the target object is lowest, is deleted, so that the number of region images in the library stays at the specified number; the region images in the library are then used as the to-be-analyzed images. In a practical implementation, region images containing the target object can be extracted from the image frames in their order in the video, and if the quality of the target object in the current region image meets the preset requirement, the current region image is saved into the library. When the library exceeds the specified number, the earliest-saved region image is deleted. Alternatively, when a region image is saved into the library, the quality of the target object in it is saved at the same time; this quality can be embodied by the image sharpness or the comprehensive quality score of the region image. When the library exceeds the specified number, the region image with the lowest target-object quality is deleted. Illustratively, the specified number may be 10: when the number of region images in the library reaches 11, that is, exceeds the specified number, one region image is deleted according to the set redundancy rule. After the video ends, or after the target object disappears from the video, the region images finally stored in the library are used as the to-be-analyzed images. Selecting only a specified number of to-be-analyzed images for the subsequent feature extraction step reduces the computation of that step.
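The bounded image library with lowest-quality eviction can be sketched with a min-heap; the capacity, tie-breaking rule, and entry format below are illustrative choices, not specified by the patent:

```python
import heapq

class ImageLibrary:
    """Keeps at most `capacity` entries; evicts the lowest-quality one on overflow.

    Entries are (quality, seq, image); `seq` breaks ties so that, among entries
    of equal quality, the earliest-saved one is evicted first (one of the
    eviction policies described above)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []   # min-heap ordered on (quality, seq)
        self._seq = 0

    def add(self, image, quality):
        heapq.heappush(self._heap, (quality, self._seq, image))
        self._seq += 1
        if len(self._heap) > self.capacity:
            heapq.heappop(self._heap)  # drop the lowest-quality entry

    def images(self):
        return [img for _, _, img in sorted(self._heap)]

lib = ImageLibrary(capacity=3)
for name, q in [("f1", 0.4), ("f2", 0.9), ("f3", 0.6), ("f4", 0.8)]:
    lib.add(name, q)
print(lib.images())  # ['f3', 'f4', 'f2'] -- f1 (quality 0.4) was evicted
```

The earliest-saved eviction policy would instead use a plain FIFO queue (`collections.deque` with `maxlen`); the heap variant shown here implements the lowest-quality policy.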
Another feasible implementation is as follows: a specified number of image frames, in which the quality of the target object meets the preset requirement, can be chosen from the video as the images to be analyzed. For example, from the image frames containing the target object, the frames whose target-object quality meets the preset requirement are saved into an image library. When the number of frames in the library exceeds the specified quantity, the earliest-saved frame is deleted. Alternatively, when a frame is saved into the library, the quality of the target object in that frame is saved along with it; when the library exceeds the specified quantity, the frame with the lowest target-object quality is deleted. The frames in the library are then used as the images to be analyzed.
Embodiment Three:
On the basis of Embodiment Two above, this embodiment provides a specific implementation of the video processing method; Fig. 3 shows the flowchart of this implementation. As shown in Fig. 3, the video processing method includes the following steps:
Step S302: determine the target object in the acquired video.
Determining the target object in the acquired video can be understood as determining the position or region of the target object in each image frame of the video. An optional embodiment includes the following steps:
First, detect the target object in the video through a target detection network, obtaining the detection box corresponding to each image frame in the video.
The target detection network is a deep convolutional neural network for performing target detection on the video. It can perform target detection on each image frame of the video, detect the image frames that contain the target object, and output the detection box corresponding to each such frame. A detection box is a bounding box enclosing the target object; each detection box contains one target object, and it may be a rectangle or another shape. The detection box indicates the position of the target object in the image frame; when the detection box is a rectangle, the position can be expressed by the coordinates of two diagonally opposite corners of the rectangle.
It should be noted that the video may contain things similar to the target object. For example, when the target object is a vehicle and the video contains vehicle A and vehicle B, the target detection network will output both a detection box for vehicle A and a detection box for vehicle B.
Second, through a target tracker, determine the identifier of each detection box according to the similarity of detection boxes across the image frames.
The target tracker tracks the target object in the video according to the detection results of the target detection network. Consecutive image frames annotated with detection boxes are fed into the target tracker, which determines the identifier of each detection box according to the similarity of the detection boxes across frames. For example, the tracker can determine the identifiers according to the distance similarity of the detection boxes: in adjacent frames, detection boxes that are close to each other are given the same identifier, and boxes with the same identifier indicate the same target object. In other words, the identifier of a detection box indicates which target object the box contains, i.e. the identity of the target object. The target tracker can be implemented with an existing tracking algorithm, for example computing the distance between detection boxes in adjacent frames with the Euclidean distance, or with a deep convolutional neural network. Illustratively, when the video contains vehicle A and vehicle B, the tracker can assign identifier a to every detection box of vehicle A and identifier b to every detection box of vehicle B across the frames.
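The distance-based identifier assignment can be sketched as a greedy nearest-neighbour match between the box centres of two adjacent frames. This is only an illustrative sketch of the idea, not the patent's tracker: the axis-aligned (x1, y1, x2, y2) box convention, the function names, and the distance gate are assumptions of the example:

```python
import math

def center(box):
    # box is (x1, y1, x2, y2); return its centre point
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def assign_ids(prev_boxes, curr_boxes, max_dist=50.0):
    """prev_boxes: {identifier: box} from the previous frame.
    curr_boxes: list of boxes detected in the current frame.
    Greedily give each current box the identifier of the nearest
    previous box (within max_dist); otherwise mint a new identifier."""
    assigned, used, next_id = {}, set(), len(prev_boxes)
    for box in curr_boxes:
        cx, cy = center(box)
        best_id, best_d = None, max_dist
        for ident, pbox in prev_boxes.items():
            if ident in used:
                continue
            px, py = center(pbox)
            d = math.hypot(cx - px, cy - py)
            if d < best_d:
                best_id, best_d = ident, d
        if best_id is None:
            best_id = next_id  # a new object entered the scene
            next_id += 1
        assigned[best_id] = box
        used.add(best_id)
    return assigned
```

Boxes that carry the same identifier across frames trace one object's trajectory. As the description notes later, such a simple scheme can swap identifiers when two objects nearly overlap, which is exactly the failure case that multi-image feature fusion mitigates.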
Third, according to the identified detection boxes, determine the position of the target object in each image frame of the video.
Since detection boxes with the same identifier indicate the same target object, the positions of same-identifier boxes across the frames trace the trajectory of that target object in the video, thereby realizing target tracking. In the embodiment of the present invention, the position of the target object in each image frame of the video can be determined from the positions of its identified detection boxes across the frames. For example, if vehicle A in the video is the target object to be subjected to attribute analysis and the identifier of vehicle A's detection boxes is a, the position of vehicle A in each image frame of the video can be determined from the positions of the boxes carrying identifier a.
Step S304: choose from the video the image frames in which the quality of the target object meets the preset requirement.
In some embodiments, an image quality detection network can perform quality detection on the image frames containing the target object; according to the image quality information output by the network, it is judged whether the quality of the target object in each frame meets the preset requirement. The image quality information includes at least the image sharpness. For a given image frame, if the sharpness output by the quality detection network reaches a set threshold, the quality of the target object in that frame meets the preset requirement; if the sharpness does not reach the threshold, the quality does not meet the requirement and the frame is directly discarded.
In other embodiments, whether the quality of the target object in an image frame meets the preset requirement can be judged from the image quality information output by the above target detection network; that is, the target detection network can also detect image quality while performing target detection.
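The sharpness-threshold test can be illustrated with a classic variance-of-Laplacian score. Note this is only a stand-in for the patent's learned quality detection network: the hand-crafted measure below, its threshold, and the grayscale-as-list-of-rows representation are all assumptions of the example:

```python
def laplacian_variance(gray):
    """gray: 2-D list of pixel intensities. Returns the variance of the
    4-neighbour Laplacian response, a common proxy for image sharpness."""
    h, w = len(gray), len(gray[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x] +
                   gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def quality_meets_requirement(gray, threshold=100.0):
    # Sharper images have stronger edge responses, hence higher variance.
    return laplacian_variance(gray) >= threshold
```

Frames failing the threshold are discarded before any feature extraction, which is what keeps the downstream cost bounded.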
Step S306: extract the region images containing the target object from the chosen image frames, and use the image frames or the obtained region images as the images to be analyzed.
The image frames in which the target-object quality meets the preset requirement may be used directly as the images to be analyzed; alternatively, according to the position in the frame of the detection box output by the above target detection network, the image inside the detection box is extracted with an image segmentation method to obtain the region image containing the target object, and the obtained region images are used as the images to be analyzed.
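Extracting the region image inside a detection box amounts to cropping the frame at the box coordinates. A minimal sketch, assuming the frame is a 2-D list of rows and the box is (x1, y1, x2, y2) with exclusive upper bounds (the coordinate convention is an assumption of this example):

```python
def crop_region(frame, box):
    """Return the sub-image of `frame` enclosed by the detection box."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]
```

Each crop then passes through the quality check of step S304 and, if retained, into the image library.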
Step S308: perform feature extraction on each image to be analyzed through a feature extraction network, obtaining the feature map corresponding to each image to be analyzed.
Step S310: choose a specified number of feature maps from the obtained feature maps.
Illustratively, the feature maps obtained by performing feature extraction on the images to be analyzed can be saved into a feature map library. When the number of feature maps in the library exceeds the specified quantity, either the earliest-saved feature map or the feature map corresponding to the image with the lowest target-object quality is deleted, so that the number of feature maps in the library stays at the specified quantity.
Step S312: perform weighted fusion on the chosen feature maps, obtaining a fused feature map.
After the video ends, or after the target object disappears from the video, the feature maps remaining in the feature map library are fused by weighting according to preset weights, obtaining the fused feature map.
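Weighted fusion of feature maps can be sketched as a per-element weighted average. The normalization of the weights is an assumption of this sketch; the patent only specifies that preset weights are used:

```python
def fuse_feature_maps(feature_maps, weights):
    """feature_maps: list of equally-shaped 2-D lists.
    weights: one preset weight per feature map.
    Returns the element-wise weighted average of the maps."""
    total = sum(weights)
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for fmap, wt in zip(feature_maps, weights):
        for y in range(h):
            for x in range(w):
                fused[y][x] += (wt / total) * fmap[y][x]
    return fused
```

Because the fused map blends all retained images, a single bad feature map contributes only its (small) weight, which is the robustness property the description relies on.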
Step S314: determine the attribute information of the target object according to the fused feature map.
The fused feature map is input into an attribute analysis network, obtaining the attribute information of the target object output by that network. It can be understood that an existing attribute network can be split into a front attribute sub-network and a rear attribute sub-network: the front sub-network contains only convolutional layers, has a small computation cost and a fast running speed, and can serve as the above feature extraction network, being run many times; the rear sub-network serves as the above attribute analysis network and runs more slowly, so to keep attribute analysis real-time it is run fewer times.
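The front/rear split can be sketched as two callables: a cheap per-image front part and an expensive rear part run once on the fused features. This is a toy stand-in only; the "networks" below are plain Python functions with invented behaviour, not real convolutional layers:

```python
def make_split_network():
    """Toy stand-in for splitting an attribute network into a cheap
    front part (run once per image) and an expensive rear part (run
    once on the fused features)."""

    def front(image):
        # cheap per-image "feature extraction": here, just row sums
        return [sum(row) for row in image]

    def rear(fused_features):
        # expensive "attribute analysis": here, a threshold decision
        return {"bright": sum(fused_features) > 10}

    return front, rear

front, rear = make_split_network()
# front runs once per image to be analyzed ...
features = [front(img) for img in ([[1, 2], [3, 4]], [[2, 3], [4, 5]])]
# ... the per-image features are fused, and rear runs only once
fused = [sum(col) / len(col) for col in zip(*features)]
attributes = rear(fused)
```

The design point is the call count: the fast front part runs N times, the slow rear part once, which is how the method keeps attribute analysis real-time.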
It should be noted that the steps or sub-steps above implemented by convolutional neural networks (including but not limited to step S302, step S308 and step S314) can be carried out on a second processor, and the second processor may be a field-programmable gate array (FPGA) chip. An FPGA chip can accelerate the convolution computation, further improving the running efficiency of the video processing method.
So that the above target detection network, feature extraction network and attribute analysis network can be applied directly to attribute analysis of the target object in the video and output more accurate and reliable results, these networks need to be trained in advance. The target detection network, the feature extraction network and the attribute analysis network can each be trained separately. Taking the training of the target detection network as an example, the process includes: acquiring a training image sample set, which contains multiple training images in which the target objects have been manually annotated; training the target detection network with the sample set by inputting the training images into the network, obtaining the detection results output by the network, and determining a loss value from those detection results and the preset manual annotations; and training the target detection network based on the loss value. In general, the loss value measures how close the actual output is to the desired output: the smaller the loss value, the closer the actual output is to the desired output. A back-propagation algorithm can be used to adjust the parameters of the target detection network according to the loss value; when the loss value converges to a preset target value, the training of the target detection network is complete, and a trained target detection network is obtained.
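The loss-driven parameter adjustment can be illustrated with a one-parameter gradient-descent loop. This is a generic sketch of "adjust parameters until the loss converges to a target value", not the patent's training procedure; the toy model y = w·x, the squared-error loss and the learning rate are assumptions:

```python
def train_until_converged(samples, lr=0.1, target_loss=1e-6, max_steps=1000):
    """samples: list of (x, desired_output) pairs for a toy model y = w * x.
    Gradient descent on the mean squared error until the loss value
    converges to the preset target value."""
    w = 0.0
    loss = float("inf")
    for _ in range(max_steps):
        loss = sum((w * x - y) ** 2 for x, y in samples) / len(samples)
        if loss <= target_loss:
            break  # loss has converged: training is complete
        # analytic gradient of the mean squared error w.r.t. w
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= lr * grad
    return w, loss
```

The same shape scales up to a real detector: the loss compares network output with the manual annotations, and back-propagation supplies the gradient.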
In the prior art, suppose the video contains vehicle A and vehicle B, and vehicle A is the target object for attribute analysis. At some moment the distance between vehicle A and vehicle B becomes very small, or the vehicles even overlap completely; in the image frame corresponding to that moment, the identifiers output by the target tracker may be wrong, with vehicle A's detection box labelled b and vehicle B's detection box labelled a. If that frame happens to have the best quality and is the one selected for attribute analysis, the attribute information obtained will be that of vehicle B rather than vehicle A, so the attribute analysis result will be wrong.
In the embodiment of the present invention, since images to be analyzed are chosen from multiple image frames, the multiple feature maps obtained from them are fused, and the resulting fused feature map contains the features of all of those images. When attribute analysis is performed on the fused feature map, even if a single erroneous image is present, its features are outweighed by those of the large majority of correct images. This avoids the phenomenon of the attribute analysis result not corresponding to the target object and improves the accuracy of the final attribute analysis result.
Embodiment Four:
Corresponding to Embodiment Two or Embodiment Three above, this embodiment provides a video processing apparatus; referring to the structural schematic diagram shown in Fig. 4, the apparatus includes:
a target determination module 41, configured to determine a target object in the acquired video;
an image extraction module 42, configured to extract multiple images to be analyzed from the video, the images to be analyzed containing the target object; and
an attribute analysis module 43, configured to perform feature extraction on the multiple images to be analyzed to obtain multiple feature maps, fuse the multiple feature maps to obtain a fused feature map, and determine the attribute information of the target object according to the fused feature map.
In an optional embodiment, the target determination module 41 can further be configured to: detect the target object in the video through a target detection network, obtaining the detection box corresponding to each image frame in the video; input the detection boxes into a target tracker and determine the identifier of each detection box according to the similarity of detection boxes across the image frames; and determine the position of the target object in each image frame of the video according to the identified detection boxes.
The image extraction module 42 can further be configured to extract from the video a specified number of images to be analyzed, in which the quality of the target object meets a preset requirement.
The image extraction module 42 can further be configured to: extract, from the image frames containing the target object, region images containing the target object; save into an image library the region images or image frames whose target-object quality meets the preset requirement; when the number of images in the library exceeds the specified quantity, delete the earliest-saved image or the image with the lowest target-object quality; and use the images in the library as the images to be analyzed.
The image extraction module 42 can further be configured to: perform quality detection on the region images or image frames containing the target object through an image quality detection network, and judge, according to the image quality information output by the network, whether the quality of the target object in the region image meets the preset requirement, the image quality information including at least the image sharpness.
The attribute analysis module 43 can further be configured to input each image to be analyzed into a feature extraction network, obtaining the feature map corresponding to each image to be analyzed output by the network.
In an optional embodiment, the image extraction module 42 can further be configured to: choose from the video, as the images to be analyzed, image frames or region images in which the quality of the target object meets the preset requirement, a region image being a region containing the target object extracted from an image frame; and to judge, according to the image quality information of an image frame output by the target detection network, whether the quality of the target object in the frame meets the preset requirement, the image quality information including at least the image sharpness, and if so, use the frame as an image to be analyzed. The attribute analysis module 43 can further be configured to perform feature extraction on each image to be analyzed through a feature extraction network, obtaining the feature map corresponding to each image to be analyzed, and to choose a specified number of feature maps from the obtained feature maps.
The attribute analysis module 43 can further be configured to perform weighted fusion on the multiple feature maps according to preset weights, obtaining a fused feature map.
The attribute analysis module 43 can further be configured to input the fused feature map into an attribute analysis network, obtaining the attribute information of the target object output by that network.
The video processing apparatus provided by the embodiment of the present invention is used to perform attribute analysis on a target object in a video. The target object is first determined in the acquired video; multiple images to be analyzed containing the target object are extracted from the video; feature extraction is performed on the multiple images to be analyzed to obtain multiple feature maps; the multiple feature maps are fused to obtain a fused feature map; and the attribute information of the target object is then determined according to the fused feature map. Because feature extraction on an image is fast, it can be performed many times to obtain multiple feature maps, whereas attribute analysis on an image is computationally heavy and slow; performing attribute analysis on the fused feature map therefore runs the attribute analysis process only once or a few times, ensuring the processing efficiency of the method. Fusing the multiple feature maps obtained from the multiple images to be analyzed yields a fused feature map containing the features of all of those images, and performing attribute analysis on it improves the accuracy of the attribute analysis result.
The technical effects, implementation principles and results of the apparatus provided by this embodiment are the same as those of the preceding embodiments. For brevity, where this apparatus embodiment is silent, reference may be made to the corresponding content of the preceding method embodiments.
An embodiment of the present invention further provides an electronic device including an image acquisition apparatus, a memory and a processor. The image acquisition apparatus is used to acquire image data; the memory stores a computer program runnable on the processor; and the processor, when executing the computer program, implements the method described in Embodiment Two or Embodiment Three above.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the electronic device described above may refer to the corresponding process in the preceding method embodiments, and is not repeated here.
Further, this embodiment also provides a computer-readable storage medium on which a computer program is stored; when run by a processor, the computer program executes the steps of the method provided by Embodiment Two or Embodiment Three above. For the specific implementation, see Embodiment Two or Embodiment Three; it is not repeated here.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part contributing to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
Finally, it should be noted that the embodiments described above are merely specific embodiments of the present invention, used to illustrate rather than limit its technical solutions, and the scope of protection of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the preceding embodiments, those skilled in the art should understand that anyone familiar with the art can, within the technical scope disclosed by the present invention, still modify the technical solutions recorded in the preceding embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features. Such modifications, variations or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be covered within the scope of protection of the present invention.

Claims (14)

1. A video processing method, comprising:
determining a target object in an acquired video;
extracting multiple images to be analyzed from the video, the images to be analyzed containing the target object;
performing feature extraction on the multiple images to be analyzed to obtain multiple feature maps;
fusing the multiple feature maps to obtain a fused feature map; and
determining attribute information of the target object according to the fused feature map.
2. The method according to claim 1, wherein the step of determining a target object in an acquired video comprises:
detecting the target object in the video through a target detection network, obtaining a detection box corresponding to each image frame in the video;
determining, through a target tracker, an identifier of each detection box according to the similarity of detection boxes across the image frames; and
determining the position of the target object in each image frame of the video according to the identified detection boxes.
3. The method according to claim 1, wherein the step of extracting multiple images to be analyzed from the video comprises:
extracting a specified number of images to be analyzed from the video, the quality of the target object in the images to be analyzed meeting a preset requirement.
4. The method according to claim 3, wherein the step of extracting a specified number of images to be analyzed from the video comprises:
extracting, from image frames containing the target object, region images containing the target object;
saving into an image library the region images or image frames in which the quality of the target object meets the preset requirement;
when the number of images in the image library exceeds the specified quantity, deleting the earliest-saved image or the image with the lowest target-object quality; and
using the images in the image library as the images to be analyzed.
5. The method according to claim 4, further comprising:
performing quality detection on the region images or image frames containing the target object through an image quality detection network; and
judging, according to image quality information output by the image quality detection network, whether the quality of the target object in a region image meets the preset requirement, the image quality information including at least image sharpness.
6. The method according to any one of claims 3 to 5, wherein the step of performing feature extraction on the multiple images to be analyzed to obtain multiple feature maps comprises:
inputting each image to be analyzed into a feature extraction network, obtaining the feature map corresponding to each image to be analyzed output by the feature extraction network.
7. The method according to claim 1, wherein the step of extracting multiple images to be analyzed from the video comprises:
choosing from the video, as images to be analyzed, image frames or region images in which the quality of the target object meets a preset requirement, a region image being a region containing the target object extracted from an image frame;
and wherein the step of performing feature extraction on the multiple images to be analyzed to obtain multiple feature maps comprises:
performing feature extraction on each image to be analyzed through a feature extraction network, obtaining the feature map corresponding to each image to be analyzed; and
choosing a specified number of feature maps from the obtained feature maps.
8. The method according to claim 7, wherein the step of choosing from the video, as images to be analyzed, image frames or region images in which the quality of the target object meets a preset requirement comprises:
judging, according to image quality information of an image frame output by the target detection network, whether the quality of the target object in the image frame meets the preset requirement, the image quality information including at least image sharpness; and
if so, using the image frame as an image to be analyzed, or extracting from the image frame a region image containing the target object as an image to be analyzed.
9. The method according to claim 1, wherein the step of fusing the multiple feature maps to obtain a fused feature map comprises:
performing weighted fusion on the multiple feature maps according to preset weights, obtaining the fused feature map.
10. The method according to claim 1, wherein the step of determining attribute information of the target object according to the fused feature map comprises:
inputting the fused feature map into an attribute analysis network, obtaining the attribute information of the target object output by the attribute analysis network.
11. A video processing apparatus, comprising:
a target determination module, configured to determine a target object in an acquired video;
an image extraction module, configured to extract multiple images to be analyzed from the video, the images to be analyzed containing the target object; and
an attribute analysis module, configured to perform feature extraction on the multiple images to be analyzed to obtain multiple feature maps, fuse the multiple feature maps to obtain a fused feature map, and determine attribute information of the target object according to the fused feature map.
12. An electronic device, comprising a memory and a processor,
wherein the memory stores a computer program runnable on the processor, and the processor, when executing the computer program, implements the steps of the video processing method according to any one of claims 1 to 10.
13. The electronic device according to claim 12, wherein the processor comprises a first processor and a second processor, the first processor being used to execute the non-convolution steps of the video processing method, and the second processor being used to execute the convolution steps of the video processing method.
14. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when run by a processor, executes the steps of the method according to any one of claims 1 to 10.
CN201910137122.8A 2019-02-22 2019-02-22 Method for processing video frequency, device and electronic equipment Pending CN109886951A (en)

Priority Applications (1)

Application Number: CN201910137122.8A
Priority Date / Filing Date: 2019-02-22 / 2019-02-22
Title: Method for processing video frequency, device and electronic equipment

Publications (1)

Publication Number: CN109886951A
Publication Date: 2019-06-14

Family ID: 66929158

Family Applications (1)

Application Number: CN201910137122.8A
Title: Method for processing video frequency, device and electronic equipment

Country Status (1): CN


Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100839568B1 (en) * 2007-10-08 2008-06-20 Republic of Korea Standard digital mapping method and system for forest type map
CN104980681A (en) * 2015-06-15 2015-10-14 联想(北京)有限公司 Video acquisition method and video acquisition device
CN105740758A (en) * 2015-12-31 2016-07-06 上海极链网络科技有限公司 Internet video face recognition method based on deep learning
CN105825235A (en) * 2016-03-16 2016-08-03 博康智能网络科技股份有限公司 Image recognition method based on deep learning with multiple feature maps
CN106127173A (en) * 2016-06-30 2016-11-16 北京小白世纪网络科技有限公司 Human body attribute recognition method based on deep learning
CN106682664A (en) * 2016-12-07 2017-05-17 华南理工大学 Water meter dial area detection method based on a fully convolutional recurrent neural network
CN108021933A (en) * 2017-11-23 2018-05-11 深圳市华尊科技股份有限公司 Neural network recognition model and recognition method
CN108038438A (en) * 2017-12-06 2018-05-15 广东世纪晟科技有限公司 Multi-source face image joint feature extraction method based on singular value decomposition
CN108053410A (en) * 2017-12-11 2018-05-18 厦门美图之家科技有限公司 Moving object segmentation method and device
CN108229302A (en) * 2017-11-10 2018-06-29 深圳市商汤科技有限公司 Feature extraction method, device, computer program, storage medium and electronic equipment
CN108229336A (en) * 2017-12-13 2018-06-29 北京市商汤科技开发有限公司 Video recognition and training method and device, electronic equipment, program and medium
CN108229523A (en) * 2017-04-13 2018-06-29 深圳市商汤科技有限公司 Image detection and neural network training method, device and electronic equipment
CN108388859A (en) * 2018-02-11 2018-08-10 深圳市商汤科技有限公司 Object detection method, network training method, device and computer storage medium
CN108629367A (en) * 2018-03-22 2018-10-09 中山大学 Method for enhancing clothing attribute recognition accuracy based on deep networks
CN108629299A (en) * 2018-04-24 2018-10-09 武汉幻视智能科技有限公司 Long-term multi-object tracking method and system combining face matching
CN108734107A (en) * 2018-04-24 2018-11-02 武汉幻视智能科技有限公司 Face-based multi-object tracking method and system
CN108805203A (en) * 2018-06-11 2018-11-13 腾讯科技(深圳)有限公司 Image processing and object re-identification method, device, equipment and storage medium
CN108805047A (en) * 2018-05-25 2018-11-13 北京旷视科技有限公司 Liveness detection method and device, electronic equipment and computer-readable medium
CN108875536A (en) * 2018-02-06 2018-11-23 北京迈格威科技有限公司 Pedestrian analysis method, device, system and storage medium
CN108875465A (en) * 2017-05-26 2018-11-23 北京旷视科技有限公司 Multi-object tracking method, multi-object tracking device and non-volatile storage medium
CN109146921A (en) * 2018-07-02 2019-01-04 华中科技大学 Pedestrian target tracking method based on deep learning
CN109284812A (en) * 2018-09-19 2019-01-29 哈尔滨理工大学 Video game simulation method based on improved DQN

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112241936B (en) * 2019-07-18 2023-08-25 杭州海康威视数字技术股份有限公司 Image processing method, device and equipment and storage medium
CN112241936A (en) * 2019-07-18 2021-01-19 杭州海康威视数字技术股份有限公司 Image processing method, device and equipment and storage medium
CN110418112A (en) * 2019-08-09 2019-11-05 上海商汤智能科技有限公司 Video processing method and device, electronic equipment and storage medium
CN110807789A (en) * 2019-08-23 2020-02-18 腾讯科技(深圳)有限公司 Image processing method, model, device, electronic equipment and readable storage medium
CN110766096A (en) * 2019-10-31 2020-02-07 北京金山云网络技术有限公司 Video classification method and device and electronic equipment
CN110929622A (en) * 2019-11-15 2020-03-27 腾讯科技(深圳)有限公司 Video classification method, model training method, device, equipment and storage medium
CN110929622B (en) * 2019-11-15 2024-01-05 腾讯科技(深圳)有限公司 Video classification method, model training method, device, equipment and storage medium
US11967151B2 (en) 2019-11-15 2024-04-23 Tencent Technology (Shenzhen) Company Limited Video classification method and apparatus, model training method and apparatus, device, and storage medium
WO2021093468A1 (en) * 2019-11-15 2021-05-20 腾讯科技(深圳)有限公司 Video classification method and apparatus, model training method and apparatus, device and storage medium
CN113065379A (en) * 2019-12-27 2021-07-02 深圳云天励飞技术有限公司 Image detection method and device integrating image quality and electronic equipment
CN113065379B (en) * 2019-12-27 2024-05-07 深圳云天励飞技术有限公司 Image detection method and device integrating image quality and electronic equipment
CN111242034A (en) * 2020-01-14 2020-06-05 支付宝(杭州)信息技术有限公司 Document image processing method and device, processing equipment and client
CN111314733B (en) * 2020-01-20 2022-06-10 北京百度网讯科技有限公司 Method and apparatus for evaluating video sharpness
CN111314733A (en) * 2020-01-20 2020-06-19 北京百度网讯科技有限公司 Method and apparatus for evaluating video sharpness
CN111598164A (en) * 2020-05-15 2020-08-28 北京百度网讯科技有限公司 Method and device for identifying attribute of target object, electronic equipment and storage medium
CN111598164B (en) * 2020-05-15 2023-06-23 北京百度网讯科技有限公司 Method, device, electronic equipment and storage medium for identifying attribute of target object
CN111680646A (en) * 2020-06-11 2020-09-18 北京市商汤科技开发有限公司 Motion detection method and device, electronic device and storage medium
CN111680646B (en) * 2020-06-11 2023-09-22 北京市商汤科技开发有限公司 Action detection method and device, electronic equipment and storage medium
CN111652181B (en) * 2020-06-17 2023-11-17 腾讯科技(深圳)有限公司 Target tracking method and device and electronic equipment
CN111652181A (en) * 2020-06-17 2020-09-11 腾讯科技(深圳)有限公司 Target tracking method and device and electronic equipment
CN113313075B (en) * 2021-06-29 2024-02-02 杭州海康威视系统技术有限公司 Target object position relationship analysis method and device, storage medium and electronic equipment
CN113313075A (en) * 2021-06-29 2021-08-27 杭州海康威视系统技术有限公司 Target object position relation analysis method and device, storage medium and electronic equipment
CN115223020A (en) * 2022-07-20 2022-10-21 腾讯科技(深圳)有限公司 Image processing method, image processing device, electronic equipment and readable storage medium
CN115223020B (en) * 2022-07-20 2024-04-19 腾讯科技(深圳)有限公司 Image processing method, apparatus, device, storage medium, and computer program product

Similar Documents

Publication Publication Date Title
CN109886951A (en) Method for processing video frequency, device and electronic equipment
CN110443210B (en) Pedestrian tracking method and device and terminal
CN109492638A (en) Text detection method, device and electronic equipment
CN107358149B (en) Human body posture detection method and device
CN107679448B (en) Eyeball motion analysis method, device and storage medium
Shen et al. Exemplar-based human action pose correction and tagging
CN110738101A (en) Behavior recognition method and device and computer readable storage medium
CN109376667A (en) Object detection method, device and electronic equipment
CN109447169A (en) Image processing method and model training method, device and electronic system
CN109299658B (en) Face detection method, face image rendering device and storage medium
CN109325456A (en) Target recognition method, device, target recognition equipment and storage medium
CN110555481A (en) Portrait style recognition method and device and computer readable storage medium
CN109214366A (en) Local target re-identification method, apparatus and system
CN109800682B (en) Driver attribute identification method and related product
CN109272016A (en) Object detection method, device, terminal device and computer readable storage medium
CN106971178A (en) Pedestrian detection and re-identification method and device
CN108182695B (en) Target tracking model training method and device, electronic equipment and storage medium
CN113033523B (en) Fall judgment model construction method and system, and fall judgment method and system
CN109670517A (en) Object detection method, device, electronic equipment and object detection model
CN111914665A (en) Face occlusion detection method, device, equipment and storage medium
CN110176024A (en) Method, apparatus, device and storage medium for detecting a target in video
CN108875517A (en) Video processing method, device and system, and storage medium
CN109033955A (en) Face tracking method and system
CN109816694A (en) Target tracking method, device and electronic equipment
CN113065379B (en) Image detection method and device integrating image quality and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20190614)