CN110418112A

CN110418112A - Video processing method and apparatus, electronic device, and storage medium

Info

Publication number: CN110418112A
Application number: CN201910736109.4A
Authority: CN
Inventors: 刘建成; 由光鑫; 辛彦哲; 屈秋竹
Original assignee: Shanghai Sensetime Intelligent Technology Co Ltd
Current assignee: Shanghai Sensetime Intelligent Technology Co Ltd
Priority date: 2019-08-09
Filing date: 2019-08-09
Publication date: 2019-11-05

Abstract

The present disclosure relates to a video processing method and apparatus, an electronic device, and a storage medium. The method includes: decoding a video stream to obtain a plurality of video frames, the video stream containing a target object captured in real time; selecting some of the video frames as to-be-processed video frames according to a frame selection strategy, the to-be-processed video frames being those frames, among the video frames containing the target object, whose image quality meets the real-time monitoring requirement; and extracting structured information corresponding to the target object from the to-be-processed video frames and sending the structured information to a terminal. With the present disclosure, the requirement of real-time monitoring can be met.

Description

Video processing method and apparatus, electronic device, and storage medium

Technical field

The present disclosure relates to the technical field of computer vision, and in particular to a video processing method and apparatus, an electronic device, and a storage medium.

Background

In target object monitoring scenarios, a target object can be captured by an acquisition device (such as a camera), and the captured results are aggregated into a video stream on which monitoring processing such as analysis, tracking, and positioning of the target object is performed. A target generally stays in the video picture for 5 to 10 seconds from appearing to disappearing; at a video frame rate of 25 frames per second, this can produce more than a hundred captures of the target. These captures contain a great deal of redundancy, and when computing resources are limited it is unnecessary to perform feature extraction and similar processing on all of them. Doing so makes the computational load of monitoring processing too large and the resource occupation too high, which at the very least impairs the real-time performance of monitoring, so that the requirement of real-time monitoring cannot be met. The related art offers no effective solution to this problem.

Summary of the invention

The present disclosure proposes a technical solution for video processing.

According to one aspect of the present disclosure, a video processing method is provided, the method comprising:

decoding a video stream to obtain a plurality of video frames, the video stream containing a target object captured in real time;

selecting some of the video frames as to-be-processed video frames according to a frame selection strategy, the to-be-processed video frames being those frames, among the video frames containing the target object, whose image quality meets the real-time monitoring requirement;

extracting structured information corresponding to the target object from the to-be-processed video frames, and sending the structured information to a terminal.

With the present disclosure, only some to-be-processed video frames are extracted through the frame selection strategy instead of all video frames being processed, and the to-be-processed video frames are those frames, among the video frames containing the target object, whose image quality meets the real-time monitoring requirement. The computational load can therefore be reduced while the real-time monitoring requirement is still met, and resource occupancy is reduced accordingly.
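The three steps above can be sketched as a small pipeline. This is only an illustrative outline, not the implementation described in the disclosure: the `Frame` type, the single `quality` score, the `min_quality` threshold, and the `send` callback are all hypothetical stand-ins for the decoder output, the frame selection strategy, and the link to the terminal.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Frame:
    index: int        # position of the frame in the decoded stream
    has_target: bool  # whether a target object was detected in the frame
    quality: float    # hypothetical image quality score in [0, 1]


def select_frames(frames: Iterable[Frame], min_quality: float) -> List[Frame]:
    """Frame selection: keep frames that contain the target and meet the quality bar."""
    return [f for f in frames if f.has_target and f.quality >= min_quality]


def extract_structured_info(frame: Frame) -> dict:
    """Placeholder for feature/attribute extraction on a single selected frame."""
    return {"frame": frame.index, "quality": frame.quality}


def process_stream(frames: Iterable[Frame], min_quality: float,
                   send: Callable[[dict], None]) -> int:
    """Select frames, extract structured information, and send it to the terminal."""
    selected = select_frames(frames, min_quality)
    for frame in selected:
        send(extract_structured_info(frame))
    return len(selected)


# Four decoded frames (simulated); only frames 0 and 3 contain the target with enough quality.
decoded = [Frame(0, True, 0.9), Frame(1, False, 0.95), Frame(2, True, 0.4), Frame(3, True, 0.8)]
outbox: List[dict] = []
processed = process_stream(decoded, min_quality=0.7, send=outbox.append)
```

Only the selected frames reach the extraction step, which is where the saving in computation comes from.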

In a possible implementation, before decoding the video stream to obtain the plurality of video frames, the method further comprises:

receiving a video stream playback request, so as to extract the video stream according to the video stream playback request;

sending the structured information to the terminal comprises:

sending the structured information to the terminal that issued the video stream playback request.

With the present disclosure, frame selection and structured information extraction can be performed in response to the video stream playback request issued by the terminal. This is an adaptive trigger mechanism: frame selection and structured information extraction are not executed continuously while the terminal is idle (that is, while no video playback request has been initiated). This reduces the computational load of monitoring the target object captured in real time and can meet the requirement of real-time monitoring in monitoring scenarios.
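The request-triggered behaviour described above might look roughly like the following sketch. The class name, the `tick` method, and the terminal and stream identifiers are assumptions made for illustration; the disclosure does not prescribe an interface.

```python
from typing import Callable, Dict, List


class RequestTriggeredAnalyzer:
    """Runs analysis only for streams that some terminal has asked to play."""

    def __init__(self, analyze: Callable[[str], List[dict]]):
        self._analyze = analyze
        self._active: Dict[str, str] = {}  # terminal id -> requested stream id

    def on_playback_request(self, terminal_id: str, stream_id: str) -> None:
        self._active[terminal_id] = stream_id

    def on_playback_stopped(self, terminal_id: str) -> None:
        self._active.pop(terminal_id, None)

    def tick(self) -> Dict[str, List[dict]]:
        """One processing round: results are produced per requesting terminal."""
        return {t: self._analyze(s) for t, s in self._active.items()}


calls: List[str] = []


def fake_analyze(stream_id: str) -> List[dict]:
    # Stand-in for frame selection plus structured information extraction.
    calls.append(stream_id)
    return [{"stream": stream_id}]


analyzer = RequestTriggeredAnalyzer(fake_analyze)
idle_result = analyzer.tick()                      # no request yet: nothing is analyzed
analyzer.on_playback_request("terminal-1", "camera-7")
active_result = analyzer.tick()                    # analysis runs for the requesting terminal
analyzer.on_playback_stopped("terminal-1")
stopped_result = analyzer.tick()                   # idle again
```

While no terminal has an open request, `tick` does no analysis work at all, which mirrors the idle-state saving the text describes.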

In a possible implementation, selecting some of the video frames as to-be-processed video frames according to the frame selection strategy comprises:

extracting the frame selection strategy, and performing, according to the frame selection strategy, image quality assessment on each of at least two video frames among the plurality of video frames that contain the same target object, to obtain at least two assessment results;

performing a fusion operation on the at least two assessment results to obtain a fusion calculation result;

filtering the to-be-processed video frames out of the plurality of video frames according to the fusion calculation result.

With the present disclosure, after at least two assessment results are obtained through image quality assessment, a fusion calculation result can be obtained from them. The to-be-processed video frames filtered out of the plurality of video frames according to the fusion calculation result are suboptimal frames. Since the image quality of a suboptimal frame is lower than that of the original video frame, performing structured information extraction on suboptimal frames reduces the computational load and correspondingly the resource occupancy.
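The text leaves both the quality metrics and the fusion operator open. As one possible reading, the sketch below assumes each frame carries hypothetical `sharpness` and `exposure` scores, uses their average as the per-frame assessment result, fuses the assessment results of one object with an arithmetic mean, and keeps the frames that are not below the fused value.

```python
from statistics import mean
from typing import Dict, List


def assess_quality(frame: Dict) -> float:
    # Hypothetical per-frame assessment: average of two quality scores in [0, 1].
    return 0.5 * frame["sharpness"] + 0.5 * frame["exposure"]


def fuse(assessments: List[float]) -> float:
    # Assumed fusion operator: arithmetic mean of the assessment results.
    return mean(assessments)


def select_by_fusion(frames: List[Dict]) -> List[Dict]:
    """Keep frames of one target object whose assessment is not below the fused value."""
    if len(frames) < 2:
        raise ValueError("at least two frames of the same target object are required")
    assessments = [assess_quality(f) for f in frames]
    fused = fuse(assessments)
    return [f for f, a in zip(frames, assessments) if a >= fused]


# Three frames showing the same vehicle (simulated scores).
same_vehicle = [
    {"id": "A", "sharpness": 0.9, "exposure": 0.7},
    {"id": "B", "sharpness": 0.4, "exposure": 0.6},
    {"id": "C", "sharpness": 0.5, "exposure": 0.5},
]
kept = select_by_fusion(same_vehicle)
```

A weighted sum, a maximum, or a learned score would fit the same structure; only `fuse` would change.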

In a possible implementation, filtering the to-be-processed video frames out of the plurality of video frames according to the fusion calculation result comprises:

obtaining a fused image according to the fusion calculation result;

when the fusion calculation result is greater than a target score configured in the frame selection strategy, taking the fused image whose result is greater than the target score as the to-be-processed video frame.

With the present disclosure, for a fused image, when the fusion calculation result is greater than the target score configured in the frame selection strategy, the fused image exceeding the target score is taken as the to-be-processed video frame. Fusion thus ensures that the structured information to be extracted is valid while reducing the computational load and correspondingly the resource occupancy.
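The threshold test can be illustrated with a configured target score. The `FrameSelectionStrategy` type, the image names, and the score values below are hypothetical; the only detail taken from the text is that a fused image qualifies when its result is strictly greater than the configured target score.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FrameSelectionStrategy:
    target_score: float  # configured threshold for the fusion calculation result


def pick_to_be_processed(candidates: List[Tuple[str, float]],
                         strategy: FrameSelectionStrategy) -> List[str]:
    """Return the fused images whose fusion result exceeds the target score."""
    return [name for name, score in candidates if score > strategy.target_score]


strategy = FrameSelectionStrategy(target_score=0.7)
candidates = [("fused-001", 0.82), ("fused-002", 0.70), ("fused-003", 0.55)]
chosen = pick_to_be_processed(candidates, strategy)
```

Note that `fused-002` is rejected because its score equals the target score rather than exceeding it.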

In a possible implementation, after decoding the video stream to obtain the plurality of video frames, the method further comprises:

recognizing the target object in the plurality of video frames and extracting the target object;

classifying the target object to obtain at least one classification result.

With the present disclosure, the target object can be classified, which facilitates subsequent grouping and statistics.

In a possible implementation, before performing image quality assessment on the at least two video frames among the plurality of video frames that contain the same target object to obtain the at least two assessment results, the method further comprises:

grouping the target objects in the plurality of video frames according to the at least one classification result, and placing at least two video frames containing the same target object in the same grouping result.

With the present disclosure, when grouping and statistics are performed according to the at least one classification result, at least two video frames containing the same target object can be placed in the same grouping result, which facilitates organizing by class.

Position detection is performed on the target object in the plurality of video frames to obtain at least one piece of target position information of the target object.

With the present disclosure, position detection can be performed on the target object, so that the target position information can be used in the final composition of the video stream.
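Grouping frames by target object can be sketched with a dictionary keyed by class and object identity. The detection records and the `object_id` values (for instance a license plate) are assumed inputs from an upstream detector and classifier that the sketch does not implement.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_object(detections: List[dict]) -> Dict[Tuple[str, str], List[int]]:
    """Place frames that contain the same target object in the same group."""
    groups: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for d in detections:
        groups[(d["class"], d["object_id"])].append(d["frame"])
    # Quality assessment needs at least two frames of the same object.
    return {key: frames for key, frames in groups.items() if len(frames) >= 2}


detections = [
    {"frame": 0, "class": "vehicle", "object_id": "plate-A"},
    {"frame": 1, "class": "vehicle", "object_id": "plate-A"},
    {"frame": 1, "class": "pedestrian", "object_id": "p-1"},
    {"frame": 2, "class": "vehicle", "object_id": "plate-B"},
    {"frame": 3, "class": "vehicle", "object_id": "plate-B"},
]
grouped = group_by_object(detections)
```

Each resulting group can then be handed to image quality assessment independently of the others.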

In a possible implementation, extracting the structured information corresponding to the target object from the to-be-processed video frames and sending the structured information to the terminal comprises:

performing feature extraction and/or attribute extraction on the target object in the to-be-processed video frames to obtain structured information composed of feature information and/or attribute information;

sending the structured information and the target position information corresponding to the same target object to the terminal, so that after the terminal renders the structured information, the resulting rendering information and the target position information are added to the video stream and played together with it.

With the present disclosure, through grouping statistics and organizing by class, structured information of the same class is extracted, and the detected target position information and the extracted structured information are then composited into the video stream. Playing that video stream achieves real-time monitoring of the target object.
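As an illustration of what might be sent to the terminal, the sketch below bundles attribute information, feature information, and position information for one target object and serializes it as JSON. The field names, the bounding-box layout, and the use of JSON are assumptions; the disclosure does not define a wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Tuple


@dataclass
class StructuredMessage:
    object_id: str
    attributes: Dict[str, str]                   # attribute information, e.g. color
    features: List[float]                        # feature information, e.g. an embedding
    positions: List[Tuple[int, int, int, int]]   # boxes as (x, y, width, height)


def to_wire(message: StructuredMessage) -> str:
    """Serialize the message for sending to the requesting terminal."""
    return json.dumps(asdict(message), sort_keys=True)


message = StructuredMessage(
    object_id="vehicle-17",
    attributes={"class": "vehicle", "color": "white"},
    features=[0.12, 0.98],
    positions=[(10, 20, 50, 80)],
)
wire = to_wire(message)
received = json.loads(wire)  # what the terminal would decode before rendering
```

On the terminal side, `received["positions"]` gives the boxes to draw and `received["attributes"]` the labels to render next to them.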

According to one aspect of the present disclosure, a video processing apparatus is provided, the apparatus comprising:

a decoding unit, configured to decode a video stream to obtain a plurality of video frames, the video stream containing a target object captured in real time;

a frame selection unit, configured to select some of the video frames as to-be-processed video frames according to a frame selection strategy, the to-be-processed video frames being those frames, among the video frames containing the target object, whose image quality meets the real-time monitoring requirement;

an information sending unit, configured to extract structured information corresponding to the target object from the to-be-processed video frames and send the structured information to a terminal.

In a possible implementation, the apparatus further comprises:

a receiving unit, configured to receive a video stream playback request, so as to extract the video stream according to the video stream playback request;

the information sending unit being configured to send the structured information to the terminal that issued the video stream playback request.

In a possible implementation, the frame selection unit is configured to:

obtain a fused image according to the fusion calculation result;

In a possible implementation, the apparatus further comprises a classification unit, configured to:

classify the target object to obtain at least one classification result.

In a possible implementation, the apparatus further comprises a grouping unit, configured to:

In a possible implementation, the apparatus further comprises a detection unit, configured to:

In a possible implementation, the information sending unit is configured to:

According to one aspect of the present disclosure, an electronic device is provided, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to execute the above video processing method.

According to one aspect of the present disclosure, a computer-readable storage medium is provided, on which computer program instructions are stored, the computer program instructions implementing the above video processing method when executed by a processor.

In the embodiments of the present disclosure, a video stream is decoded to obtain a plurality of video frames, the video stream containing a target object captured in real time; some of the video frames are selected as to-be-processed video frames according to a frame selection strategy, the to-be-processed video frames being those frames, among the video frames containing the target object, whose image quality meets the real-time monitoring requirement; and structured information corresponding to the target object is extracted from the to-be-processed video frames and sent to a terminal. With the present disclosure, not all frames are processed; only some to-be-processed video frames are selected according to the frame selection strategy. This reduces the computational load and correspondingly the resource occupancy, thereby meeting the requirement of real-time monitoring in monitoring scenarios.

It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.

Other features and aspects of the present disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.

Brief description of the drawings

The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the technical solution of the present disclosure.

Fig. 1 shows a flowchart of a video processing method according to an embodiment of the present disclosure.

Fig. 2 shows a flowchart of a video processing method according to an embodiment of the present disclosure.

Fig. 3 shows a schematic diagram of obtaining a fused image from a fusion calculation result according to an embodiment of the present invention.

Fig. 4 shows a schematic diagram of a video processing system in an application example according to an embodiment of the present disclosure.

Fig. 5 shows a schematic diagram of a video processing method in an application example according to an embodiment of the present disclosure.

Fig. 6 shows a schematic diagram of a video stream displayed with structured information in an application example according to an embodiment of the present disclosure.

Fig. 7 shows a schematic data comparison of the video stream processing performance of a system with frame selection logic in an application example according to an embodiment of the present disclosure.

Fig. 8 shows a block diagram of a video processing apparatus according to an embodiment of the present disclosure.

Fig. 9 shows a block diagram of an electronic device according to an embodiment of the present disclosure.

Fig. 10 shows a block diagram of an electronic device according to an embodiment of the present disclosure.

Detailed description of the embodiments

Various exemplary embodiments, features, and aspects of the present disclosure are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically noted.

The word "exemplary" as used herein means "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred over or superior to other embodiments.

The term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: A alone, both A and B, or B alone. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of them; for example, "including at least one of A, B, and C" can mean including any one or more elements selected from the set consisting of A, B, and C.

In addition, numerous specific details are given in the following detailed description to better explain the present disclosure. Those skilled in the art will understand that the present disclosure can be practiced without certain of these details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the present disclosure.

Video structured analysis can be applied to scenarios such as target object detection, target object recognition, target object classification, and target object monitoring. Target objects in the acquired video images can be extracted by artificial intelligence analysis means such as deep learning and then classified; the classification results can be different classes such as pedestrians and vehicles. After classification, structured information (such as attribute information) of the target object can be further extracted, including color features, age features, classification features, model features, speed features, and the like. Besides attribute information, structured information can also include feature information, such as clothing, hair, and whether the person is male or female. Trajectory information characterizing the position of the target object can be displayed together with the structured information on the terminal side, or a mapping can be established between the trajectory information and the structured information to facilitate retrieval of the structured information. Displaying in real time the structured information obtained by structured analysis of video requires not only detecting and displaying the structured information corresponding to the target object in real time, but also places higher demands on the real-time detection and processing efficiency of video structured analysis. Taking the target object monitoring scenario as an example, the requirement of real-time monitoring can be met only if the real-time detection and processing efficiency of video structured analysis is improved.

One way to perform structured analysis on video is to analyze every frame of the video stream and provide the corresponding display in real time. This guarantees real-time performance, but the accuracy of the structured information varies with the clarity of the object, and resource occupation is high. Another way is to delay the display of structured information. This cannot guarantee real-time performance: the structured analysis result can be displayed only after the target object has completely left the view of the acquisition device (such as a camera), so no real-time structured information display can be provided. Neither of these two methods of structured video analysis therefore presents a fully satisfactory result.

With the present disclosure, selecting only some video frames for processing, rather than performing structured information analysis on all frames, reduces the computational load of monitoring the target object captured in real time, lowers resource occupancy, and improves the processing performance of real-time display of video stream structured information. Structured information analysis is adaptively triggered only when a user previews the video and issues a video playback request, and the analysis result is sent to the terminal that initiated the request, which reduces the use of computing resources such as the graphics processing unit (GPU).

Fig. 1 shows a flowchart of a video processing method according to an embodiment of the present disclosure. The method is applied to a video processing apparatus; for example, it can be executed by a terminal device, a server, or other processing equipment, where the terminal device can be user equipment (UE), a mobile device, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the method can be implemented by a processor invoking computer-readable instructions stored in a memory. As shown in Fig. 1, the process includes:

Step S101: decode a video stream to obtain a plurality of video frames, the video stream containing a target object captured in real time.

In one example, after a background server receives a video stream playback request, it decodes the requested video stream to obtain a plurality of video frames. The video stream contains target objects captured in real time. For example, in a scenario where vehicles traveling on a road are monitored, the target objects include at least vehicles; besides vehicles, surrounding environmental information is also captured, such as nearby buildings, lane markings, and traffic light facilities. In addition to vehicles, the acquisition device (such as a camera) can also capture pedestrians at the roadside or waiting for traffic lights at intersections, so that various classes of target objects are collected.

Step S102: select some of the video frames as to-be-processed video frames according to a frame selection strategy, the to-be-processed video frames being those frames, among the video frames containing the target object, whose image quality meets the real-time monitoring requirement.

In one example, some of the video frames can be selected as to-be-processed video frames according to a fusion calculation result obtained under the frame selection strategy. It should be noted that the fusion calculation result can be a comprehensive frame selection logic result obtained according to the frame selection strategy. The intelligent frame selection of the present disclosure neither takes all frames nor selects video frames indiscriminately; instead, the frames chosen from the plurality of video frames according to the comprehensive frame selection logic result contain images of the target object whose quality better meets the real-time monitoring requirement.

In one example, the background server extracts the frame selection strategy, which contains the processing logic for intelligent frame selection. According to this strategy, image quality assessment can be performed on each of at least two video frames among the plurality of video frames that contain the same target object (such as a vehicle), yielding at least two assessment results; a fusion operation on the at least two assessment results then yields a fusion calculation result, according to which the to-be-processed video frames are filtered out of the plurality of video frames. For example, suppose there are 100 video frames, 50 of which contain vehicles, and 30 of which show the same vehicle identified by the same license plate. Image quality assessment and information fusion are performed on these 30 video frames to obtain the fusion calculation result, and the to-be-processed video frames, say 10 of them, are filtered out of the 30 according to that result. This series of processing greatly reduces the amount of video frame data to be processed. For the target object captured in real time, less data means less computation, which makes subsequent structured information analysis more efficient.
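The reduction in the vehicle example can be made concrete with a few lines of arithmetic. The frame counts are the ones given in the example; the variable names are illustrative only.

```python
from fractions import Fraction

total_frames = 100         # decoded video frames
frames_with_vehicle = 50   # frames that contain any vehicle
frames_same_vehicle = 30   # frames showing the vehicle with the same license plate
frames_selected = 10       # to-be-processed frames kept after fusion-based filtering

# Share of the same-vehicle frames that still needs structured information analysis.
kept_share = Fraction(frames_selected, frames_same_vehicle)

# Share of the same-vehicle frames that no longer needs to be analyzed.
saved_share = 1 - kept_share
```

In this example one third of the frames of that vehicle are analyzed and two thirds are skipped.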

Step S103: extract structured information corresponding to the target object from the to-be-processed video frames, and send the structured information to a terminal.

In one example, before the video stream is decoded to obtain the plurality of video frames, the method further includes the background server receiving a video stream playback request, so as to extract the video stream according to the request. After the video stream is extracted, it can be decoded to obtain a plurality of video frames, the video stream containing a target object captured in real time, and some of the video frames are selected as to-be-processed video frames according to the frame selection strategy, the to-be-processed video frames being those frames, among the video frames containing the target object, whose image quality meets the real-time monitoring requirement. The background server can also perform structured information analysis on the to-be-processed video frames through a deep-learning-based neural network (a graph convolutional neural network) to obtain structured information corresponding to the target object, and then send the obtained structured information to the terminal that issued the above video stream playback request.

With the present disclosure, on the one hand, intelligent frame selection can be achieved: some video frames are selected from the plurality of video frames containing the target object captured in real time, and structured information analysis is performed on the target object in those frames only, rather than on all frames. This reduces the resource occupancy caused by reading all frames, and because processing is more efficient, clearer real-time structured information analysis results can be obtained. On the other hand, adaptive display of structured information can be achieved: structured information analysis is performed for the terminal that issued the video stream playback request, rather than continuously when there is no request; analysis and display begin only when a user has a preview need, and the information is not displayed all the time. That is, in the present disclosure not all frames are processed; only some to-be-processed video frames are selected according to the frame selection strategy, and the structured information obtained by processing them is sent to the terminal (the terminal that issued the video stream playback request). Frame selection and structured information processing are triggered by the video playback request rather than executed continuously while the terminal is idle (that is, while no video playback request has been initiated). This not only reduces the computational load of monitoring the target object captured in real time but also lowers resource occupancy, so the requirement of real-time monitoring in monitoring scenarios can be met.

Fig. 2 shows a flowchart of a video processing method according to an embodiment of the present disclosure. The method is applied to a video processing apparatus; for example, it can be executed by a terminal device, a server, or other processing equipment, where the terminal device can be user equipment (UE), a mobile device, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the method can be implemented by a processor invoking computer-readable instructions stored in a memory. As shown in Fig. 2, the process includes:

Step S201: decode a video stream to obtain a plurality of video frames, the video stream containing a target object captured in real time.

Step S202: recognize the target objects in the plurality of video frames, extract the target objects, and classify them to obtain at least one classification result.

In one example, the target objects captured in real time in the video stream can include vehicles, pedestrians, and so on; surrounding environmental information (such as nearby buildings, lane markings, and traffic light facilities) can also be captured. The target objects captured in real time can be extracted from the video stream and classified by a deep-learning-based classification network to obtain at least one classification result. According to the at least one classification result, the target objects in the plurality of video frames can be grouped, with at least two video frames containing the same target object placed in the same grouping result, so that, based on the fusion calculation result obtained under the frame selection strategy and the grouping result, some of the video frames are selected as to-be-processed video frames, the to-be-processed video frames being those frames, among the video frames containing the target object, whose image quality meets the real-time monitoring requirement. Structured information corresponding to the target object is extracted from the to-be-processed video frames and sent to the terminal, the terminal being the one that issued the video stream playback request.

Step S203: adaptively trigger the frame selection strategy and apply comprehensive frame selection logic to all frames of the same recognized target, including scoring image quality and choosing the video frames whose image quality score exceeds a configured threshold; according to the comprehensive frame selection logic result obtained under the frame selection strategy and the classification result, select some of the video frames as to-be-processed video frames, the to-be-processed video frames being those frames, among the video frames containing the target object, whose image quality meets the real-time monitoring requirement.

In one example, after the grouping results are processed separately, for example after vehicles, pedestrians, and so on have each been grouped, image quality assessment is performed on the images containing people, on the images containing vehicles, and so on.

Steps S202 and S203 above can be executed before the frame selection of step S204, or after the frame selection of step S204. That is, the present disclosure is not limited to classifying and grouping first and then processing the target objects of a specific class; the target objects can also be processed comprehensively first, and the target objects in the selected to-be-processed video frames then classified and grouped. Any technical solution that selects some frames from all frames falls within the protection scope of the present disclosure.

Step S204: extract structured information corresponding to the target object from the to-be-processed video frames, and send the structured information to a video cache preprocessing module.

In one example, the background server extracts the frame selection strategy, which contains the processing logic for intelligent frame selection, and extracts and processes at least two video frames in the same grouping result that contain the same target object. For example, if the same target object contained in the at least two video frames is a pedestrian, then according to the frame selection strategy image quality assessment can be performed on each of the at least two video frames containing that pedestrian, yielding at least two assessment results; a fusion operation on the at least two assessment results then yields a fusion calculation result, according to which the to-be-processed video frames are filtered out of the plurality of video frames. For example, suppose there are 150 video frames, 70 of which contain pedestrians, and 50 of which show the same pedestrian. Image quality assessment and information fusion are performed on these 50 video frames to obtain the fusion calculation result, and the to-be-processed video frames, say 20 of them, are filtered out of the 50 according to that result. This series of processing greatly reduces the amount of video frame data to be processed. For the target object captured in real time, less data means less computation, which makes subsequent structured information analysis more efficient.

Step S205: the video preprocessing module fuses the location information of the target object, the structured information of the target object and the original video stream, and sends the result to the terminal together with the original video frames; the terminal is the terminal that issued the video render request.

In one example, the backend server may perform structured information analysis on the to-be-processed video frames through a deep-learning-based neural network (a graph convolutional neural network) to obtain the structured information of the corresponding target object. The backend server then sends the obtained structured information to the terminal that issued the video render request.

With the present disclosure, on the one hand, intelligent frame selection can be realized: from the multiple video frames containing the target object captured in real time, a subset of video frames is selected, and structured information analysis is performed on the target objects in that subset. Through classification and grouping, structured information analysis is performed on target objects of the same class, which not only improves recognition accuracy for target objects of that class but also optimizes the processing efficiency of the analysis. Because the analysis targets objects of the same class and is not performed on every frame, the resource occupancy that reading all frames would cause is reduced, and because processing efficiency is higher, higher-definition real-time structured information analysis results can be obtained. On the other hand, adaptive display of structured information can be realized: structured information analysis is performed for the terminal that issued a video render request, rather than running continuously when there is no request. The analysis starts, and its result is displayed, only when a user needs a preview, instead of the information being displayed all the time.

In one example, the processing logic of steps S201 to S205 may be integrated in a single standalone hardware device, or implemented by an integrated software and hardware device or by several cooperating devices; any device whose processor contains the above processing logic falls within the protection scope of the disclosure. First, the backend server accesses the video stream requested by the terminal and decodes it to obtain multiple video frames, then performs target detection and tracking on the target objects (such as pedestrians, vehicles and non-motor vehicles) in those frames; for a given target object, multiple pieces of location information are obtained as time passes. Next, logical frame selection is applied to the captured images of the tracked target object according to the frame selection strategy, picking out a subset of video frames, that is, captured images whose image quality meets the needs of structured information analysis. Structured information analysis is performed on the target objects (pedestrians, vehicles, non-motor vehicles and so on), and the detection and tracking information (such as the multiple pieces of location information for the same target object) is fused with the structured analysis information obtained. Finally, the structured information of the target object is sent to the accessing terminal, and the terminal that initiated the video playback request renders it in real time and dynamically displays the video stream containing the structured information.

With the intelligent frame selection strategy that selects a subset of video frames, the image quality of the target can be computed in real time and only the frames with better image quality (or suboptimal frames) are analyzed, which reduces the computing demand, raises system throughput and improves accuracy. Where the intelligent frame selection strategy is implemented with a deep-learning-based neural network, the network parameters can be trained and dynamically adjusted according to the deep learning results, and the trained network is used for intelligent frame selection. This improves the real-time accuracy of the structured information, and rendering the structured information on the live video stream in real time meets the real-time requirement of monitoring scenarios. Deep learning is used to detect, track and analyze the pedestrians, vehicles and so on in the real-time video, and to extract structured information and display it in real time. The structured attributes of a pedestrian include but are not limited to gender, age and clothing; the structured attributes of a vehicle include but are not limited to vehicle type, vehicle model, body color and license plate number; the attributes of a non-motor vehicle include but are not limited to type and color. Deep learning together with the intelligent frame selection algorithm improves the processing performance of real-time display of structured information on the video stream and raises system throughput, and by providing several suboptimal frames it improves the real-time accuracy of the structured information. Video structuring analysis is adaptively triggered only when a user previews the video, which reduces the system's use of computing resources such as GPUs and helps ensure system reliability, maintainability and availability.

In a possible implementation, during intelligent frame selection, when the image quality score obtained from the fusion calculation result is greater than the target score configured in the frame selection strategy, the video frames whose image quality score is greater than the target score are taken as the to-be-processed video frames. The structured information of the corresponding target object is extracted from the to-be-processed video frames and sent to the terminal, which is the terminal that issued the video render request. In one example, after a fused image is obtained from the fusion calculation result, when the fusion calculation result is greater than the target score configured in the frame selection strategy, the fused image that exceeds the target score is taken as the to-be-processed video frame. The to-be-processed video frame may be a suboptimal frame whose image quality is lower than that of the original video frame. Because structured information processing is performed on the suboptimal frame, whose image quality is lower, the amount of computation is reduced and resources are saved.
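The comparison against the configured target score can be sketched as follows. `FrameSelectionStrategy` and `frames_to_process` are hypothetical names, and the strict "greater than" comparison follows the wording above; the score values in any usage are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameSelectionStrategy:
    target_score: float  # target score configured in the strategy (assumed scale 0..1)


def frames_to_process(scores, strategy):
    """Return the indices of frames whose score is strictly greater than the target."""
    return [i for i, score in enumerate(scores) if score > strategy.target_score]
```

A frame whose score merely equals the target score is not selected under this reading.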

Fig. 3 shows a schematic diagram of obtaining a fused image from the fusion calculation result according to an embodiment of the present disclosure. Take a multi-frame input containing target object A as an example. The input includes two video frames in which target object A appears, denoted video frame 100 and video frame 101. Video frame 100 and video frame 101 may be two video frames whose image quality assessment scores met expectations during the comprehensive frame selection logic that is applied to all frames of the same identified target after the frame selection strategy of the disclosure is adaptively triggered. For example, the score may reflect a case in which the image resolution is slightly lower than that of other video frames but the frame can still be used to extract video structured information without affecting the accuracy of the extraction; it may also reflect a case in which the images in the two video frames are correlated, and so on. Every case that does not affect the accuracy of structured information extraction falls within the protection scope of the disclosure. The image quality assessment scores of the two video frames and the features of target object A in the two video frames are input to the graph convolutional neural network for the fusion operation, and a fused image is obtained from the fusion calculation result. The fused image is denoted video frame 201 and also contains the same target object A. The fused image is a suboptimal frame whose image quality meets the real-time monitoring requirement. Because the image quality of the suboptimal frame is lower than that of the original video frames extracted from the video stream, processing its structured information reduces computation and saves resources; and because the lower resolution of the suboptimal frame relative to the original video does not affect the extraction of structured information, the accuracy of the extraction is not affected.

In a possible implementation, in the course of extracting the structured information of the corresponding target object from the to-be-processed video frames, feature extraction (for example, hair) and/or attribute extraction (for example, the corresponding hair color) is performed on the target object in the to-be-processed video frames, yielding structured information composed of feature information and/or attribute information. The extracted attribute features may also include color, age, classification, model and speed features. After the video stream is decoded into multiple video frames, position detection may be performed on the target object in the multiple video frames to obtain at least one piece of target location information for the target object. The structured information and the target location information corresponding to the same target object are then sent to the terminal; after the terminal renders the structured information, the resulting rendering information and the corresponding target location information are added to the video stream and played together on the terminal.
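One way to picture the record sent to the terminal is a structure that keeps the attribute information and the location information of the same target object together. The sketch below is illustrative only: the field names, the bounding-box representation of a position and the JSON serialization are all assumptions, not part of the disclosure.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Position:
    frame_index: int  # frame in which the target was detected
    x: int            # assumed bounding-box representation of the location
    y: int
    width: int
    height: int


@dataclass
class StructuredInfo:
    target_id: str                                   # hypothetical tracking identifier
    category: str                                    # e.g. "pedestrian" or "vehicle"
    attributes: dict = field(default_factory=dict)   # e.g. {"hair_color": "black"}
    positions: list = field(default_factory=list)    # list of Position records


def to_message(info):
    """Serialize one target's structured and location information as JSON text."""
    # asdict() recurses into the nested Position dataclasses.
    return json.dumps(asdict(info), ensure_ascii=False, sort_keys=True)
```

A terminal-side renderer could decode this text and draw the attributes next to the box given by each position.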

Application example:

Fig. 4 shows a schematic diagram of a video processing system according to an embodiment of the present disclosure. In Fig. 4, a capture device (such as a camera), a video structuring server that performs the video processing of the above embodiments, and a terminal that requests video playback together form a complete software and hardware system implementing the embodiments of the disclosure. The video stream is captured by camera 12. One frame of the video stream, shown as video frame 11, contains target objects such as the traffic flow formed by vehicles on the road, pedestrians on the sidewalk, speed limit signs and parking signs. Camera 12 outputs the video stream to video structuring server 13, which performs target object detection for vehicles, pedestrians and so on (for example, detecting changes in the position of a target object) and, after identifying a target object (such as a vehicle and/or pedestrian), performs structured information analysis. The detected target location information of the target object, the corresponding video structured information and the original video stream are fused together to obtain a fusion calculation result, which is sent to terminal 14 that issued the playback request and output to the display interface of terminal 14 for display. The modules that implement this process are shown in Fig. 4, and the structured information displayed for each target object on the display interface of terminal 14 is shown in Fig. 5.

Fig. 5 shows a schematic diagram of the video processing method in an application example according to an embodiment of the present disclosure. The processing of video structured information can be carried out by the modules that implement this process (video decoding module, target detection module, frame selection logic module, adaptive module, structured analysis module, video preview module and video playback module), using processing logic that includes target detection, frame selection, structured analysis and video display of structured information.

The video decoding module receives the video stream requested by the terminal that issued the playback request and decodes the camera's video stream to obtain multiple video frames containing target objects. The target detection module performs tracking and positioning to detect the target location information of each target object (such as a vehicle or pedestrian) in every video frame, and can accurately identify the real-time position of vehicles, pedestrians and the like against a complex background. The frame selection logic module selects a subset of to-be-processed video frames from all video frames according to the frame selection strategy and outputs them to the structured analysis module, which analyzes the structured information of the target objects contained in those frames.

In video analysis, a target object can produce hundreds of captured images in the video picture. Analyzing the structured information (for example, feature and attribute analysis) of all of these captured images would occupy a large amount of the system's computing resources. When computing resources are limited, only a few suboptimal captured images need to be selected for structured information analysis and extraction. The frame selection logic module can use deep-learning-based neural networks, such as target tracking and classification networks, to score the image quality of the input video frames containing the target object, and structured information (such as feature attributes) is extracted once for the video frames whose image quality score exceeds each configured value (thresholds such as 0.7, 0.8, 0.9 and 0.95). To further reduce the use of computing resources, an adaptive mode can be introduced: for each video stream, the frame selection logic module is enabled only when a terminal requests that stream through the video playback module and/or the video preview module. Note that the frame selection logic module and the adaptive trigger module may be provided separately, as shown in Fig. 4, or a frame selection logic module with an adaptive trigger interface may be provided (that is, adaptive triggering and frame selection are integrated in one frame selection logic module). When the adaptive trigger interface is used, the frame extraction function of a video stream is activated only when a request for that stream is received from the video playback module or the video preview module; the selected to-be-processed frames are then output to the structured analysis module for structured information analysis and extraction.

The structured analysis module can use deep-learning-based neural networks, such as feature extraction and/or attribute extraction networks, to analyze the structured information of each target object. For example, the structured attributes of a vehicle include vehicle type, vehicle model, body color and license plate number; the structured attributes of a non-motor vehicle include type and color; the structured attributes of a pedestrian include gender, age, clothing and hair color. The video preview module fuses the target location information detected by the target detection module, the structured information identified by the structured analysis module and the original video stream. The location information and structured information can be placed in Supplemental Enhancement Information (SEI) messages and output together with the original video frames to the video playback module for display. The video playback module can be a JavaScript (JS) player based on the hypertext markup language HTML5 (H5). (JS here refers to a JavaScript library that can detect the embedded videos on a web page and turn them into responsive elements.) The player can use JS to determine the terminal type and then direct playback through the H5 interface, in order to play the video carrying structured information. Before playback, the video playback module can parse the SEI messages in the video frames and render the structured information of the target in real time, so that the structured information is displayed dynamically in real time.

Fig. 6 shows the signal using the video flowing shown in example with structured message according to the embodiment of the present disclosure Figure.It in practical applications, can also be with the knot of real-time exhibition target object for the relative target object under the scene of background complexity Structure information realizes that structured message shows the real-time synchronization with video playing, the structuring of real-time Dynamic Display target object Information.And adaptive triggering interface is used, and the occupancy to system resources in computation can be more reduced, it is effective to guarantee target pair As monitoring and the real-time of counter structure information Dynamic Display, for example, the GPU card of a P4 can handle the resolution of 12 tunnels simultaneously Rate is the video flowing of 1920x1080.As shown in figure 5, there is multiple target objects (vehicle and pedestrian) in current video frame, wherein Target object is the pedestrian cycled, and corresponding structured message is information 21, and the information content shown includes: coat face Color: yellow；Hair color: yellow；Hair style: long hair；Gender: man；Age: adult；Shoes color: black etc..Target object is Pedestrian on pavement, corresponding structured message are information 22, and the information content shown includes: coat color: blue； Hair color: black；Hair style: bob；Age: adult；Shoes color: black etc..Target object is the Dongfeng brand car in wagon flow, Its corresponding structured message is information 23, and the information content shown includes: brand: east wind；Vehicle: east wind-well-to-do level V22； Body color: white；License plate: saliva xxx ... 
etc..Target object is the Tang Jun board car in wagon flow, corresponding structured message For information 24, the information content shown includes: brand: Tang Jun；Vehicle: Tang Jun-match water chestnut microcaloire；Body color: white etc..Mesh Marking object is the Yangze river and Huai river board car in wagon flow, and corresponding structured message is information 25, and the information content shown includes: product Board: Yangze river and Huai river；Vehicle: the auspicious wind of Yangze river and Huai river-；Body color: red etc..

Fig. 7 shows a schematic comparison of video stream processing results for systems with and without frame selection logic in an application example according to an embodiment of the present disclosure. In a practical test, a comparison was made between a system that has frame selection logic and one that does not. With one P4 GPU card processing video streams at a resolution of 1920x1080, as shown in Fig. 6, system scheme 1, which has frame selection logic, can process 12 video streams, while system scheme 2, which has no frame selection logic, can process 2 video streams. Under the same computing resources, the system with adaptively triggered frame selection logic can therefore process more video streams.
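The adaptive triggering credited with this saving can be sketched as a small per-stream gate that enables frame selection only while at least one playback or preview request is open. `AdaptiveTrigger` and its method names are hypothetical, and the request counting is an assumption about how several terminals watching the same stream might be handled.

```python
class AdaptiveTrigger:
    """Enable frame selection for a stream only while some terminal is requesting it."""

    def __init__(self):
        self._requests = {}  # stream_id -> number of open playback/preview requests

    def on_request(self, stream_id):
        """Called when a playback or preview request for the stream arrives."""
        self._requests[stream_id] = self._requests.get(stream_id, 0) + 1

    def on_release(self, stream_id):
        """Called when a terminal stops playing or previewing the stream."""
        remaining = self._requests.get(stream_id, 0) - 1
        if remaining > 0:
            self._requests[stream_id] = remaining
        else:
            self._requests.pop(stream_id, None)

    def is_enabled(self, stream_id):
        """Frame selection and structured analysis run only when this is True."""
        return stream_id in self._requests
```

A processing loop would check `is_enabled(stream_id)` before passing decoded frames to frame selection, so idle streams consume no analysis resources.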

Those skilled in the art will understand that, in the above methods of the specific embodiments, the order in which the steps are written does not imply a strict execution order and does not limit the implementation in any way; the specific execution order of the steps should be determined by their functions and possible internal logic.

The method embodiments mentioned in the disclosure may be combined with one another to form combined embodiments, provided that principles and logic are not violated; for reasons of space, the disclosure does not describe these combinations further.

In addition, the disclosure also provides a video processing apparatus, an electronic device, a computer-readable storage medium and a program, all of which can be used to implement any video processing method provided by the disclosure. For the corresponding technical solutions and descriptions, refer to the corresponding records in the method section, which are not repeated here.

Fig. 8 shows a block diagram of a video processing apparatus according to an embodiment of the present disclosure. As shown in Fig. 8, the video processing apparatus of the embodiment includes: a decoding unit 31, configured to decode a video stream to obtain multiple video frames, the video stream containing a target object captured in real time; a frame selection unit 32, configured to select a subset of to-be-processed video frames from the multiple video frames according to a frame selection strategy, the to-be-processed video frames being those, among the multiple video frames containing the target object, whose image quality meets the real-time monitoring requirement; and an information sending unit 33, configured to extract the structured information of the corresponding target object from the to-be-processed video frames and send the structured information to a terminal.

In a possible implementation, the apparatus further includes a receiving unit, configured to receive a video render request so that the video stream is retrieved according to the video render request. The information sending unit is configured to send the structured information to the terminal that issued the video render request.

In a possible implementation, the frame selection unit is configured to: retrieve the frame selection strategy, and according to the frame selection strategy perform image quality assessment separately on at least two video frames, among the multiple video frames, that contain the same target object, to obtain at least two assessment results; perform a fusion operation on the at least two assessment results to obtain the fusion calculation result; and filter the to-be-processed video frames out of the multiple video frames according to the fusion calculation result.

In a possible implementation, the frame selection unit is configured to: obtain a fused image from the fusion calculation result; and, when the fusion calculation result is greater than the target score configured in the frame selection strategy, take the fused image that exceeds the target score as the to-be-processed video frame.

In a possible implementation, the apparatus further includes a classification unit, configured to: identify the target objects in the multiple video frames and extract the target objects; and classify the target objects to obtain at least one classification result.

In a possible implementation, the apparatus further includes a grouping unit, configured to: group the target objects in the multiple video frames according to the at least one classification result, and place at least two video frames containing the same target object in the same grouping result.
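The grouping step can be pictured as collecting frame indices under a key made of the classification result and the target object. The sketch below is illustrative only: the tuple layout of a detection and the name `group_frames` are assumptions.

```python
from collections import defaultdict


def group_frames(detections):
    """Group frame indices by (category, target_id).

    `detections` is an iterable of (frame_index, category, target_id) tuples,
    a hypothetical output format for the identification and classification steps.
    """
    groups = defaultdict(list)
    for frame_index, category, target_id in detections:
        # Frames containing the same target object land in the same group.
        groups[(category, target_id)].append(frame_index)
    return dict(groups)
```

Each resulting group is the set of frames on which image quality assessment and fusion would then be performed for one target object.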

In a possible implementation, the apparatus further includes a detection unit, configured to: perform position detection on the target object in the multiple video frames to obtain at least one piece of target location information for the target object.

In a possible implementation, the information sending unit is configured to: perform feature extraction and/or attribute extraction on the target object in the to-be-processed video frames to obtain structured information composed of feature information and/or attribute information; and send the structured information and the target location information corresponding to the same target object to the terminal, so that after the terminal renders the structured information, the resulting rendering information and the target location information are added to the video stream and played together.

In some embodiments, the functions or modules of the apparatus provided by the embodiments of the disclosure can be used to execute the methods described in the method embodiments above. For their specific implementation, refer to the description of the method embodiments above; for brevity, it is not repeated here.

An embodiment of the present disclosure also provides a computer-readable storage medium on which computer program instructions are stored; the computer program instructions implement the above method when executed by a processor. The computer-readable storage medium may be a non-volatile computer-readable storage medium.

An embodiment of the present disclosure also provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to perform the above method.

The electronic device may be provided as a terminal, a server or a device of another form.

Fig. 9 is a block diagram of an electronic device 800 according to an exemplary embodiment. For example, electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device or a personal digital assistant.

Referring to Fig. 9, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power supply component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814 and communication component 816.

Processing component 802 generally controls the overall operation of electronic device 800, such as operations associated with display, telephone calls, data communication, camera operation and recording. Processing component 802 may include one or more processors 820 to execute instructions so as to perform all or some of the steps of the methods described above. In addition, processing component 802 may include one or more modules that facilitate interaction between processing component 802 and other components. For example, processing component 802 may include a multimedia module to facilitate interaction between multimedia component 808 and processing component 802.

Memory 804 is configured to store various types of data to support operation of electronic device 800. Examples of such data include instructions for any application or method operated on electronic device 800, contact data, phone book data, messages, pictures, videos and so on. Memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination of them, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disc.

Power supply component 806 provides power to the various components of electronic device 800. Power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for electronic device 800.

Multimedia component 808 includes the screen of one output interface of offer between the electronic equipment 800 and user. In some embodiments, screen may include liquid crystal display (LCD) and touch panel (TP).If screen includes touch surface Plate, screen may be implemented as touch screen, to receive input signal from the user.Touch panel includes one or more touches Sensor is to sense the gesture on touch, slide, and touch panel.The touch sensor can not only sense touch or sliding The boundary of movement, but also detect duration and pressure associated with the touch or slide operation.In some embodiments, Multimedia component 808 includes a front camera and/or rear camera.When electronic equipment 800 is in operation mode, as clapped When taking the photograph mode or video mode, front camera and/or rear camera can receive external multi-medium data.It is each preposition Camera and rear camera can be a fixed optical lens system or have focusing and optical zoom capabilities.

Audio component 810 is configured as output and/or input audio signal.For example, audio component 810 includes a Mike Wind (MIC), when electronic equipment 800 is in operation mode, when such as call mode, recording mode, and voice recognition mode, microphone It is configured as receiving external audio signal.The received audio signal can be further stored in memory 804 or via logical Believe that component 816 is sent.In some embodiments, audio component 810 further includes a loudspeaker, is used for output audio signal.

I/O interface 812 provides interface between processing component 802 and peripheral interface module, and above-mentioned peripheral interface module can To be keyboard, click wheel, button etc..These buttons may include, but are not limited to: home button, volume button, start button and lock Determine button.

Sensor module 814 includes one or more sensors, for providing the state of various aspects for electronic equipment 800 Assessment.For example, sensor module 814 can detecte the state that opens/closes of electronic equipment 800, the relative positioning of component, example As the component be electronic equipment 800 display and keypad, sensor module 814 can also detect electronic equipment 800 or The position change of 800 1 components of electronic equipment, the existence or non-existence that user contacts with electronic equipment 800, electronic equipment 800 The temperature change of orientation or acceleration/deceleration and electronic equipment 800.Sensor module 814 may include proximity sensor, be configured For detecting the presence of nearby objects without any physical contact.Sensor module 814 can also include optical sensor, Such as CMOS or ccd image sensor, for being used in imaging applications.In some embodiments, which may be used also To include acceleration transducer, gyro sensor, Magnetic Sensor, pressure sensor or temperature sensor.

Communication component 816 is configured to facilitate the communication of wired or wireless way between electronic equipment 800 and other equipment. Electronic equipment 800 can access the wireless network based on communication standard, such as WiFi, 2G or 3G or their combination.Show at one In example property embodiment, communication component 816 receives broadcast singal or broadcast from external broadcasting management system via broadcast channel Relevant information.In one exemplary embodiment, the communication component 816 further includes near-field communication (NFC) module, short to promote Cheng Tongxin.For example, radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra wide band can be based in NFC module (UWB) technology, bluetooth (BT) technology and other technologies are realized.

In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above method.

In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, for example the memory 804 containing computer program instructions, which can be executed by the processor 820 of the electronic device 800 to perform the above method.

Fig. 10 is a block diagram of an electronic device 900 according to an exemplary embodiment. For example, the electronic device 900 may be provided as a server. Referring to Fig. 10, the electronic device 900 includes a processing component 922, which further comprises one or more processors, and memory resources represented by a memory 932 for storing instructions executable by the processing component 922, such as application programs. An application program stored in the memory 932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 922 is configured to execute the instructions so as to perform the above method.

The electronic device 900 may also include a power component 926 configured to perform power management of the electronic device 900, a wired or wireless network interface 950 configured to connect the electronic device 900 to a network, and an input/output (I/O) interface 958. The electronic device 900 can operate based on an operating system stored in the memory 932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, for example the memory 932 containing computer program instructions, which can be executed by the processing component 922 of the electronic device 900 to perform the above method.

The present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present disclosure.

The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (for example, light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device via a network, for example the Internet, a local area network, a wide area network, and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.

Computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In scenarios involving a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), is personalized by utilizing state information of the computer-readable program instructions, and the electronic circuitry can execute the computer-readable program instructions to implement aspects of the present disclosure.

Aspects of the present disclosure are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; the instructions cause a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions implementing aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device, so that a series of operational steps is performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions.

The embodiments of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. A video processing method, characterized in that the method comprises:

selecting, according to a frame selection strategy, some of the multiple video frames as to-be-processed video frames, wherein the to-be-processed video frames characterize those video frames, among the multiple video frames containing the target object, whose image quality meets the real-time monitoring requirement;

extracting structured information corresponding to the target object from the to-be-processed video frames, and sending the structured information to a terminal.

2. The method according to claim 1, characterized in that, before the decoding of the video stream to obtain the multiple video frames, the method further comprises:

the sending of the structured information to the terminal comprises:

3. The method according to claim 1 or 2, characterized in that the selecting, according to the frame selection strategy, of some of the multiple video frames as to-be-processed video frames comprises:

extracting the frame selection strategy, and performing, according to the frame selection strategy, image quality assessment separately on at least two video frames of the multiple video frames that contain the same target object, to obtain at least two assessment results;

4. The method according to claim 3, characterized in that the screening of the to-be-processed video frames out of the multiple video frames according to the fusion calculation result comprises:

obtaining a fused image according to the fusion calculation result;

in a case where the fusion calculation result is greater than a target score configured in the frame selection strategy, taking the fused image corresponding to the fusion calculation result as the to-be-processed video frame.
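By way of non-limiting illustration, the threshold screening recited in claims 3 and 4 might be sketched as follows. The weighted-average fusion, the metric names `sharpness` and `size`, and every identifier below are assumptions made for the example only; the claims do not fix a particular fusion formula, and producing the fused image itself is not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FrameSelectionStrategy:
    """Hypothetical container for a frame selection strategy."""
    weights: Dict[str, float]  # assumed: one weight per quality metric
    target_score: float        # the configured target score


def fuse_scores(assessments: Dict[str, float], weights: Dict[str, float]) -> float:
    """Assumed fusion rule: weighted average of the assessment results."""
    total_weight = sum(weights.values())
    return sum(weights[name] * assessments[name] for name in weights) / total_weight


def select_frames(
    frames: List[Tuple[str, Dict[str, float]]],
    strategy: FrameSelectionStrategy,
) -> List[Tuple[str, float]]:
    """Keep frames whose fused score is strictly greater than the target score."""
    selected = []
    for frame_id, assessments in frames:
        score = fuse_scores(assessments, strategy.weights)
        if score > strategy.target_score:
            selected.append((frame_id, score))
    return selected


strategy = FrameSelectionStrategy(weights={"sharpness": 0.5, "size": 0.5}, target_score=0.6)
frames = [
    ("frame_1", {"sharpness": 0.9, "size": 0.7}),
    ("frame_2", {"sharpness": 0.4, "size": 0.5}),
]
chosen = select_frames(frames, strategy)
```

In this sketch a frame whose fused score merely equals the target score is rejected, which mirrors the "greater than" wording of claim 4.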

5. The method according to any one of claims 1 to 4, characterized in that, after the decoding of the video stream to obtain the multiple video frames, the method further comprises:

classifying the target object to obtain at least one classification result.

6. The method according to claim 5, characterized in that, before the performing of image quality assessment on at least two video frames of the multiple video frames that contain the same target object to obtain at least two assessment results, the method further comprises:

sorting the target objects in the multiple video frames according to the at least one classification result, and placing at least two video frames containing the same target object in the same sorting result.
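As a minimal sketch of the grouping step recited in claim 6, and assuming that each detection is already labeled with a target identifier (the `(frame_index, target_id)` tuple layout and the function name are hypothetical), frames containing the same target object might be collected as follows:

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def group_frames_by_target(
    detections: Iterable[Tuple[int, Hashable]],
) -> Dict[Hashable, List[int]]:
    """Group frame indices by the target object they contain.

    Only targets seen in at least two frames are kept, since the later
    quality assessment compares at least two frames of the same target.
    """
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for frame_index, target_id in detections:
        groups[target_id].append(frame_index)
    return {target: frames for target, frames in groups.items() if len(frames) >= 2}


detections = [(0, "person_1"), (1, "person_1"), (1, "car_7"), (2, "person_1")]
grouped = group_frames_by_target(detections)
```

Dropping targets that appear in a single frame is a design choice of this sketch, not a requirement stated in the claim.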

7. The method according to any one of claims 1 to 5, characterized in that, after the decoding of the video stream to obtain the multiple video frames, the method further comprises:

performing position detection on the target object in the multiple video frames to obtain at least one piece of target position information of the target object.

8. A video processing apparatus, characterized in that the apparatus comprises:

a decoding unit, configured to decode a video stream to obtain multiple video frames, the video stream containing a target object captured in real time;

a frame selection unit, configured to select, according to a frame selection strategy, some of the multiple video frames as to-be-processed video frames, wherein the to-be-processed video frames characterize those video frames, among the multiple video frames containing the target object, whose image quality meets the real-time monitoring requirement;

an information sending unit, configured to extract structured information corresponding to the target object from the to-be-processed video frames and to send the structured information to a terminal.
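For illustration only, the three units of claim 8 might be composed in software as sketched below. The class name, the callable-based wiring, and the dictionary frame representation are assumptions of this example; a real apparatus would plug in an actual decoder, frame selector, feature extractor, and network sender.

```python
from typing import Any, Callable, Iterable, List


class VideoProcessingApparatus:
    """Hypothetical composition of decoding, frame selection, and sending."""

    def __init__(
        self,
        decode: Callable[[Any], List[Any]],
        select: Callable[[List[Any]], Iterable[Any]],
        extract: Callable[[Any], Any],
        send: Callable[[Any], None],
    ) -> None:
        self.decode = decode    # decoding unit: stream -> video frames
        self.select = select    # frame selection unit: frames -> to-be-processed frames
        self.extract = extract  # frame -> structured information
        self.send = send        # delivers structured information to a terminal

    def process(self, stream: Any) -> List[Any]:
        frames = self.decode(stream)
        to_process = self.select(frames)
        infos = [self.extract(frame) for frame in to_process]
        for info in infos:
            self.send(info)
        return infos


# Stub wiring: the "stream" is already a list of frame dictionaries.
sent: List[Any] = []
apparatus = VideoProcessingApparatus(
    decode=list,
    select=lambda frames: [f for f in frames if f["quality"] > 0.5],
    extract=lambda f: {"target": f["target"]},
    send=sent.append,
)
infos = apparatus.process(
    [{"target": "person_1", "quality": 0.9}, {"target": "person_1", "quality": 0.2}]
)
```

Passing the units in as callables keeps the sketch independent of any particular decoder or transport, so each unit can be replaced or tested on its own.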

9. An electronic device, characterized by comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to perform the method according to any one of claims 1 to 7.

10. A computer-readable storage medium having computer program instructions stored thereon, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 7.