CN108012156A - Video processing method and control platform - Google Patents

Video processing method and control platform

Info

Publication number
CN108012156A
CN108012156A (application CN201711147343.0A)
Authority
CN
China
Prior art keywords
image data
pending image
network model
neural network
deep neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711147343.0A
Other languages
Chinese (zh)
Other versions
CN108012156B (en)
Inventor
吴伟华
贺武
李殿平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHENZHEN HARZONE TECHNOLOGY Co Ltd
Original Assignee
SHENZHEN HARZONE TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHENZHEN HARZONE TECHNOLOGY Co Ltd
Priority to CN201711147343.0A
Publication of CN108012156A
Application granted
Publication of CN108012156B
Legal status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/436Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/423Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements
    • H04N19/426Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements using memory downsizing methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments of the present invention provide a video processing method and a control platform. The method includes: receiving an analysis request sent by a client, where the analysis request carries attribute information of image data to be processed; configuring a GPU resource for the image data to be processed according to the attribute information; receiving the image data to be processed through the GPU resource and performing a decoding operation on it; accelerating a deep neural network model by a multi-stage compression optimization method; performing video structured analysis on the decoded image data through the accelerated deep neural network model to obtain a feature set; and sending the feature set to the client. Using the embodiments of the present invention, video structured-analysis efficiency can be improved.

Description

Video processing method and control platform
Technical field
The present invention relates to the technical field of video processing, and in particular to a video processing method and a control platform.
Background technology
At present, traditional video surveillance recording is generally processed locally on a CPU server or on a CPU/GPU heterogeneous server; it cannot be moved to the cloud, is constrained by time and space, and is therefore extremely inconvenient. Moreover, video decoding is usually performed on the CPU, so massive data exchange between CPU memory and GPU video memory creates a bottleneck, degrades video structured-analysis performance, and makes the processing time so long that users often cannot tolerate it. Therefore, how to improve video structured-analysis efficiency is a problem that needs to be solved urgently.
Summary of the invention
Embodiments of the present invention provide a video processing method and a control platform, which can improve video structured-analysis efficiency.
A first aspect of the embodiments of the present invention provides a video processing method, including:
receiving an analysis request sent by a client, where the analysis request carries attribute information of image data to be processed;
configuring a GPU resource for the image data to be processed according to the attribute information;
receiving the image data to be processed through the GPU resource, and performing a decoding operation on the image data to be processed;
accelerating a deep neural network model by a multi-stage compression optimization method;
performing video structured analysis on the decoded image data to be processed through the accelerated deep neural network model to obtain a feature set; and
sending the feature set to the client.
A second aspect of the embodiments of the present invention provides a control platform. The control platform includes a high-throughput distributed publish-subscribe message apparatus, which is used to communicate with a server cluster and includes a receiving unit, a configuration unit, an acceleration unit, an analysis unit and a sending unit, where
the receiving unit is configured to receive an analysis request sent by a client, the analysis request carrying attribute information of image data to be processed;
the configuration unit is configured to configure a GPU resource for the image data to be processed according to the attribute information;
the receiving unit is further configured to receive the image data to be processed through the GPU resource and perform a decoding operation on the image data to be processed;
the acceleration unit is configured to accelerate a deep neural network model by a multi-stage compression optimization method;
the analysis unit is configured to perform video structured analysis on the decoded image data to be processed through the accelerated deep neural network model to obtain a feature set; and
the sending unit is configured to send the feature set to the client.
In a third aspect, an embodiment of the present invention provides a control platform, including a processor and a memory, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for some or all of the steps described in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where the computer-readable storage medium is used to store a computer program, and the computer program causes a computer to execute instructions for some or all of the steps described in the first aspect of the embodiments of the present invention.
In a fifth aspect, an embodiment of the present invention provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps described in the first aspect of the embodiments of the present invention. The computer program product may be a software installation package.
Implementing the embodiments of the present invention has the following beneficial effects:
It can be seen that, in the embodiments of the present invention, an analysis request sent by a client is received, where the analysis request carries attribute information of image data to be processed; a GPU resource is configured for the image data to be processed according to the attribute information; the image data to be processed is received through the GPU resource and decoded; a deep neural network model is accelerated by a multi-stage compression optimization method; video structured analysis is performed on the decoded image data through the accelerated deep neural network model to obtain a feature set; and the feature set is sent to the client. In this way, a GPU resource can be allocated to the image data to be processed and used for decoding; on this basis, the deep neural network model is accelerated by the multi-stage compression optimization method, video structured analysis is performed on the decoded image data, and an analysis result is obtained, thereby improving video structured-analysis efficiency.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Apparently, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
Fig. 1a is a network architecture diagram of a video processing system according to an embodiment of the present invention;
Fig. 1b is a first-embodiment flow diagram of a video processing method according to an embodiment of the present invention;
Fig. 2 is a second-embodiment flow diagram of a video processing method according to an embodiment of the present invention;
Fig. 3a is an embodiment structural schematic diagram of a control platform according to an embodiment of the present invention;
Fig. 3b is a structural schematic diagram of the configuration unit of the control platform described in Fig. 3a according to an embodiment of the present invention;
Fig. 3c is a structural schematic diagram of the acceleration unit of the control platform described in Fig. 3a according to an embodiment of the present invention;
Fig. 3d is another structural schematic diagram of the control platform described in Fig. 3a according to an embodiment of the present invention;
Fig. 4 is an embodiment structural schematic diagram of a control platform according to an embodiment of the present invention.
Detailed description of embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The terms "first", "second", "third", "fourth" and the like in the specification, claims and accompanying drawings of the present invention are used to distinguish different objects rather than to describe a particular order. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that contains a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product or device.
Reference to "an embodiment" herein means that a particular feature, structure or characteristic described in connection with the embodiment may be included in at least one embodiment of the present invention. The appearance of the phrase at various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive with other embodiments. It is explicitly and implicitly understood by a person skilled in the art that the embodiments described herein may be combined with other embodiments.
It should be noted that video structured-analysis systems in the related art are not all implemented on the basis of GPU (Graphics Processing Unit) decoding plus GPU analysis. Instead, the related art uses CPU decoding plus GPU analysis: CPU decoding is usually implemented with an existing decoding library, which decodes the compressed video information into pixel information that the computer can understand and stores it in memory; GPU analysis uses a deep neural network model pre-trained according to requirements to implement end-to-end video structured analysis.
Further, CPU soft decoding uses CPU computing resources for decoding; however, as high-definition video resolution keeps increasing and the compression ratio required for transmitting video over the network is very high, the decoding workload becomes very large and heavily consumes CPU computing capability. Moreover, the decoded video pixel information is stored in system memory, and the massive data exchange with GPU video memory causes a bandwidth bottleneck, so the analysis speed often fails to meet users' needs.
In addition, the deep neural network model extracts video features and implements target detection, recognition and tracking through the feature information, thereby achieving the purpose of video structured analysis. However, deep neural network models often contain a large amount of parameter redundancy and occupy massive bandwidth and computing resources, and hardware decoding also has high requirements, which drives up system cost.
For the above reasons, the embodiments of the present invention implement a cloud system for video hardware decoding and structured analysis using GPUs, in which the CPU is only responsible for scheduling and therefore has low performance requirements. Placing both decoding and processing on the GPU has the advantage of avoiding the performance loss caused by data exchange; the disadvantage is a higher demand on video memory and computing resources, which is exactly the problem the embodiments of the present invention solve. Therefore, as shown in Fig. 1a, a network architecture of a video processing system is provided, which includes a client, a control platform and a server cluster, where the server cluster may include multiple servers. The control platform described in the embodiments of the present invention may be a video matrix, a server, or the like, and the control platform includes a high-throughput distributed publish-subscribe message apparatus. The deep neural network model in the embodiments of the present invention may be pre-stored in the control platform or in the server cluster. The image data to be processed in the embodiments of the present invention may be at least one of the following: video data, image data, etc. The attribute information of the image data to be processed in the embodiments of the present invention may include at least one of the following: memory size, data type, data format, data source, etc. It should be noted that deep neural network models have shown powerful capability in many machine vision tasks such as classification, recognition and detection, and tests show that the expressive power of a model improves greatly as the network depth and width increase; however, this also brings problems such as increased computation and a sharp increase in the number of model parameters. The neural network model in the embodiments of the present invention may be used to implement at least one of the following functions: face recognition, license plate recognition, vehicle model recognition, target detection, target tracking, etc. The client in the embodiments of the present invention may include a smart phone (such as an Android phone, an iOS phone, a Windows Phone phone, etc.), a tablet computer, a video matrix, a monitoring platform, a vehicle-mounted device, a satellite, a palmtop computer, a notebook computer, a mobile Internet device (MID, Mobile Internet Devices) or a wearable device. The above is only an example rather than an exhaustive list, and includes but is not limited to the above devices; of course, the above data processing device may also be a server.
In addition, in the embodiments of the present invention, the high-throughput distributed publish-subscribe message apparatus plays a central pivotal role in the video processing system and is responsible for task scheduling and load balancing between users and the computing cluster. For example, when a user uploads a video, the video is first uploaded to the high-speed queue buffer of the high-throughput distributed message module; the high-throughput distributed publish-subscribe message apparatus then determines the available computing resources through a load-balancing algorithm and initiates a push message, and a computing unit in the pull state pulls the video into the unit where it resides for subsequent computation. This push-pull message-routing mechanism avoids pushing videos directly to the computing cluster, which would cause data-flow congestion and performance degradation.
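By way of illustration only, the following Python sketch shows one possible form of the push-pull routing idea described above: an upload lands in a high-speed queue buffer, the scheduler only pushes a lightweight notification to the least-loaded node, and that node pulls the video itself. The class and method names (Worker, Scheduler, on_upload) are assumptions made for this example and are not part of the patent.

import queue
import threading

class Worker:
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity      # parallel decode/analysis slots on this GPU node
        self.active = 0               # tasks currently being processed

    def load(self):
        return self.active / self.capacity

    def pull_and_process(self, buffer):
        video = buffer.get()          # the worker pulls the payload itself (pull side)
        self.active += 1
        try:
            print(f"{self.name} decoding and analysing {video}")
        finally:
            self.active -= 1
            buffer.task_done()

class Scheduler:
    """Stands in for the publish-subscribe message apparatus: it only routes and balances load."""
    def __init__(self, workers):
        self.buffer = queue.Queue(maxsize=1024)       # high-speed queue buffer for uploads
        self.workers = workers

    def on_upload(self, video):
        self.buffer.put(video)                        # buffer the upload instead of pushing the data
        target = min(self.workers, key=Worker.load)   # pick the least-loaded computing unit
        # push only a small notification (here: start the pull on the chosen node)
        t = threading.Thread(target=target.pull_and_process, args=(self.buffer,))
        t.start()
        t.join()

workers = [Worker("gpu-node-1", capacity=4), Worker("gpu-node-2", capacity=4)]
Scheduler(workers).on_upload("camera_17.h264")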
An embodiment of the present invention provides a video processing method. The video processing method is executed by the control platform and specifically includes the following steps:
receiving an analysis request sent by a client, where the analysis request carries attribute information of image data to be processed;
configuring a GPU resource for the image data to be processed according to the attribute information;
receiving the image data to be processed through the GPU resource, and performing a decoding operation on the image data to be processed;
accelerating a deep neural network model by a multi-stage compression optimization method;
performing video structured analysis on the decoded image data to be processed through the accelerated deep neural network model to obtain a feature set; and
sending the feature set to the client.
It can be seen that, in the embodiments of the present invention, an analysis request sent by a client is received, where the analysis request carries attribute information of image data to be processed; a GPU resource is configured for the image data to be processed according to the attribute information; the image data to be processed is received through the GPU resource and decoded; a deep neural network model is accelerated by a multi-stage compression optimization method; and video structured analysis is performed on the decoded image data through the accelerated deep neural network model to obtain a feature set, which is sent to the client. In this way, a GPU resource can be allocated to the image data to be processed and used for decoding; on this basis, the deep neural network model is accelerated, video structured analysis is performed on the decoded image data, and an analysis result is obtained, thereby improving video structured-analysis efficiency.
Based on the network architecture described in Fig. 1a, please refer to Fig. 1b, which is a first-embodiment flow diagram of a video processing method according to an embodiment of the present invention. The video processing method described in this embodiment includes the following steps:
101. Receive an analysis request sent by a client, where the analysis request carries attribute information of image data to be processed.
The control platform includes a high-throughput distributed publish-subscribe message apparatus, which is used to receive the analysis request sent by the client.
102. Configure a GPU resource for the image data to be processed according to the attribute information.
Different attribute information may use different GPU resource mechanisms. For example, when the memory size of the video is small, the embodiment of the present invention may not be applied; only when the memory size of the video is large is the embodiment of the present invention applied.
Optionally, the above step 102 of configuring a GPU resource for the image data to be processed according to the attribute information may include the following steps:
21. Obtain resource status information of the server cluster;
22. Determine the GPU resource of the image data to be processed according to the resource status information of the server cluster and the attribute information.
The resource status information of the server cluster includes the resource status information of each server in the server cluster, where the resource status information may include at least one of the following: GPU usage, GPU interface, GPU bandwidth, GPU priority, etc. In this way, a mapping relationship between attribute information and GPU resources may be preset, so the GPU resource corresponding to the attribute information of the image data to be processed can be determined, and the corresponding resource is obtained according to that GPU resource and the resource status information of the server cluster.
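By way of illustration only, the following Python sketch shows one way step 22 could map the attribute information and the cluster's resource status information to a concrete GPU. The field names (free_mem_mb, usage, priority) and the selection rule are assumptions for the example, not requirements of the patent.

def pick_gpu(attribute_info, cluster_status):
    # preset mapping: larger videos need more free video memory
    required_mem_mb = max(512, attribute_info["memory_size_mb"] * 2)

    candidates = []
    for server in cluster_status:
        for gpu in server["gpus"]:
            if gpu["free_mem_mb"] >= required_mem_mb and gpu["usage"] < 0.8:
                candidates.append((server["host"], gpu))
    if not candidates:
        raise RuntimeError("no GPU currently satisfies the request")

    # prefer higher priority, then lower current usage
    host, gpu = max(candidates, key=lambda c: (c[1]["priority"], -c[1]["usage"]))
    return host, gpu["id"]

cluster_status = [
    {"host": "srv-1", "gpus": [{"id": 0, "free_mem_mb": 6000, "usage": 0.3, "priority": 1}]},
    {"host": "srv-2", "gpus": [{"id": 0, "free_mem_mb": 2000, "usage": 0.7, "priority": 2}]},
]
print(pick_gpu({"memory_size_mb": 1500, "data_type": "video", "data_format": "h264"},
               cluster_status))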
103. Receive the image data to be processed through the GPU resource, and perform a decoding operation on the image data to be processed.
The GPU is several times faster than the CPU in video decoding capability; however, performing hardware decoding on the GPU occupies video-memory resources, which greatly limits the use of the GPU for video structured analysis, so the GPU's parallel computing capability cannot be fully exploited, GPU computing resources are wasted, and system cost increases.
Optionally, in the implementation of the present invention, in the process of performing the decoding operation on the image data to be processed, a GPU high-speed shared-video-memory decoding technique is specifically used to decode the image data to be processed. The resources occupied by single-channel decoding consist of two parts: the internal hardware-decoder context resources and the video texture buffer. High-speed shared-video-memory decoding shares the context resources of the decoding process on the GPU to achieve parallel decoding of multi-channel video, instead of each channel separately occupying its own context resources. Shared video memory thus realizes 1 x (internal hardware-decoder context resources) + n x (video texture buffer); since the context resources occupy roughly as much video memory as the texture buffer, video-memory usage is roughly halved and resource utilization is improved.
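By way of illustration only, the following Python sketch quantifies the video-memory saving of sharing a single hardware-decoder context among n channels (1 x context + n x texture buffer) versus giving every channel its own context. The 200 MB sizes are placeholder assumptions, not measurements from the patent.

CONTEXT_MB = 200          # internal hardware-decoder context (assumed size)
TEXTURE_BUFFER_MB = 200   # per-channel video texture buffer (assumed size)

def naive_memory(channels):
    # every channel owns its own decoder context and its own texture buffer
    return channels * (CONTEXT_MB + TEXTURE_BUFFER_MB)

def shared_memory(channels):
    # one shared context for the whole process, one texture buffer per channel
    return CONTEXT_MB + channels * TEXTURE_BUFFER_MB

for n in (1, 8, 32):
    saved = 1 - shared_memory(n) / naive_memory(n)
    print(f"{n:2d} channels: naive {naive_memory(n):5d} MB, "
          f"shared {shared_memory(n):5d} MB, saved {saved:.0%}")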
104. Accelerate the deep neural network model by a multi-stage compression optimization method.
Accelerating the deep neural network model by the multi-stage compression optimization method and performing structured analysis on the decoded image data with the accelerated deep neural network model can improve the processing efficiency of the image data to be processed and reduce GPU resource consumption.
Optionally, the above step 104 of accelerating the deep neural network model by the multi-stage compression optimization method may include the following steps:
41. Obtain a precision threshold of the deep neural network model;
42. Perform multi-stage acceleration on the deep neural network model according to the multi-stage compression optimization method, where the stages of the multi-stage compression optimization method are executed in the following order: layer fusion, channel sparsification, kernel regularization and INT8 weight quantization, and the precision of the accelerated deep neural network model is higher than the precision threshold.
The above precision threshold may be determined by the multi-stage compression optimization method; of course, the precision threshold may also be set by the user or be a system default.
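By way of illustration only, the following Python sketch shows how the four stages of step 42 could be chained in the stated order while checking the precision threshold obtained in step 41 after each stage. The stage functions and the evaluate() routine are trivial placeholders; a real implementation would modify the network weights as described in the following paragraphs.

def layer_fusion(model):
    return {**model, "fused": True}

def channel_sparsify(model):
    return {**model, "channels": int(model["channels"] * 0.75)}

def kernel_regularize(model):
    return {**model, "regularized": True}

def int8_quantize(model):
    return {**model, "dtype": "int8"}

def evaluate(model):
    # placeholder accuracy model: each stage costs a little precision
    penalty = 0.005 * ("fused" in model) + 0.005 * ("regularized" in model)
    penalty += 0.02 * (model["dtype"] == "int8")
    return 0.95 - penalty

def multi_stage_compress(model, precision_threshold):
    compressed = dict(model)
    for stage in (layer_fusion, channel_sparsify, kernel_regularize, int8_quantize):
        candidate = stage(compressed)
        if evaluate(candidate) > precision_threshold:
            compressed = candidate     # keep the stage: precision is still above the threshold
        else:
            break                      # stop (or re-finetune) before losing too much precision
    return compressed

model = {"channels": 256, "dtype": "fp32"}
print(multi_stage_compress(model, precision_threshold=0.93))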
In addition, in the related art, deep neural network models achieve good results in video structured analysis, but the models contain a large amount of redundancy and occupy a large amount of storage and computing resources. Therefore, in order to better implement an engineering hardware and software platform, in the embodiments of the present invention the deep neural network model is accelerated by the multi-stage compression optimization method so that high-speed real-time operation is achieved on the GPU platform. First, the detection and recognition models required for structured analysis are subjected to multi-stage compression optimizations such as layer fusion, channel sparsification, kernel regularization and INT8 quantization; starting from the model itself, this approach achieves optimized use of video memory and computing resources.
Layer fusion: the three layers conv + bn + scale are reduced to computing only conv, eliminating the computation of the following two layers. The original weights of the Convolution, BatchNorm and Scale layers that are common in a neural network are updated and the three layers are fused into one Conv+BN+Scale layer; after layer fusion, the BatchNorm and Scale layers are eliminated, which removes the computation caused by these two layers and reduces computational complexity.
In the fusion formula, C1 and C2 are the weights of the Convolution layer, B1, B2 and B3 are the weights of the BatchNorm layer, and S1 and S2 are the weights of the Scale layer. After fusion, the first term of the formula is used as the new C1 and the latter three terms are used as the new C2, so the Convolution weights are updated and the BatchNorm and Scale layers are eliminated.
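By way of illustration only, the following numpy sketch reconstructs the fusion update under the assumption of the common Caffe parameter layout (BatchNorm stores accumulated mean B1, accumulated variance B2 and a moving-average factor B3; Scale stores gamma S1 and beta S2). It is a plausible reconstruction under those assumptions, not the patent's exact formula.

import numpy as np

def fuse_conv_bn_scale(C1, C2, B1, B2, B3, S1, S2, eps=1e-5):
    mean = B1 / B3                   # Caffe stores accumulated statistics scaled by B3
    var = B2 / B3
    scale = S1 / np.sqrt(var + eps)  # per-output-channel multiplier folded into the conv
    C1_fused = C1 * scale[:, None, None, None]   # updated convolution weights
    C2_fused = (C2 - mean) * scale + S2          # updated convolution bias
    return C1_fused, C2_fused

# toy check at one output position: the fused per-channel scale and bias must
# reproduce Convolution -> BatchNorm -> Scale applied in sequence
out_ch, in_ch, k = 4, 3, 3
rng = np.random.default_rng(0)
C1 = rng.standard_normal((out_ch, in_ch, k, k))
C2 = rng.standard_normal(out_ch)
B1, B2, B3 = rng.standard_normal(out_ch), rng.random(out_ch) + 0.5, 1.0
S1, S2 = rng.standard_normal(out_ch), rng.standard_normal(out_ch)

pre = rng.standard_normal(out_ch)                     # stands for C1 * input at one pixel
y_ref = S1 * (pre + C2 - B1 / B3) / np.sqrt(B2 / B3 + 1e-5) + S2
C1_f, C2_f = fuse_conv_bn_scale(C1, C2, B1, B2, B3, S1, S2)
scale = S1 / np.sqrt(B2 / B3 + 1e-5)
assert np.allclose(scale * pre + C2_f, y_ref)         # fused layer matches the three layers
print("Conv+BN+Scale fused into a single convolution")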
Channel sparsification, kernel regularization and INT8 quantization: specifically, by reducing the output channels of the Convolution layers, the number of model parameters is reduced, the amount of computation is reduced, and the GPU video memory occupied by intermediate results is reduced; in theory, INT8 quantization can yield a 4x parallel-computation acceleration. That is, the existing model is fine-tuned again and redundant channels are removed so that the corresponding network nodes are inactive; at the same time, the convolution-layer parameters undergo kernel regularization to find a saturation threshold T such that the parameter values are compressed as symmetrically as possible around the value 0. INT8 quantization is then performed, i.e., the original 32-bit floating-point model (FP32) is converted into an 8-bit integer model (INT8) to compress video memory and speed up parallelization. Through this convex optimization problem, the deep neural network model after INT8 quantization does not suffer a significant loss of precision. In the usual case, the activation-function output values of the c channels are sparsified, so that the output activations A of the original model are pruned from the original c channels down to c' (0 < c' < c) channels, and a cost function is minimized over each activation value of the pruned model. In addition, for each of the n activation-function output layers, the similarity between the statistical distribution P_n of the FP32 activation values and the statistical distribution Q_n of the kernel-regularized activation values is computed, and the convolution kernels are constrained so that kernel regularization keeps the activation values within the range (-|T|, |T|) while the maximum similarity between the two distributions is obtained.
In the cost function, ||.||_F is the Frobenius norm, W_i is the convolution-kernel weight, and lambda is the penalty factor, where a larger value means more channels are sparsely pruned; P_n is the statistical distribution of the FP32 activation values of the n-th activation layer (i = 1, ..., c), Q_n is the INT8 quantized statistical distribution of the activation values of the kernel-regularized activation layer, and the relative-entropy function KL(q, p) is used to measure the similarity between the FP32 activation-value distribution and the kernel-regularized activation-value distribution. By finding the two most similar distributions, the saturation threshold T is derived in reverse, so that the input and output feature maps are also quantized into the INT8 range. Fine-tuning obtains the optimal solution by fixing beta and training W_i, and then fixing W_i and training beta.
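By way of illustration only, the following Python sketch shows one common way such a saturation threshold T can be chosen: build the histogram of the FP32 activation magnitudes, simulate clipping and requantization at each candidate threshold, and keep the threshold whose quantized distribution Q has the smallest relative entropy to the original distribution P. The bin counts and the simplified requantization are assumptions for the example, not the patent's exact procedure.

import numpy as np

def kl_divergence(p, q, eps=1e-10):
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))

def find_saturation_threshold(activations, num_bins=2048, num_quant_levels=128):
    # num_quant_levels = 128 corresponds to the positive half of the symmetric INT8 range
    hist, edges = np.histogram(np.abs(activations), bins=num_bins)
    best_t, best_kl = edges[-1], float("inf")

    for i in range(num_quant_levels, num_bins + 1):
        p = hist[:i].astype(np.float64)
        p[-1] += hist[i:].sum()                 # clip everything above the candidate threshold

        # requantize the clipped range into num_quant_levels buckets, then expand back
        chunks = np.array_split(p, num_quant_levels)
        q = np.concatenate([np.full(len(c), c.sum() / max(len(c), 1)) for c in chunks])

        kl = kl_divergence(p, q)                # similarity between P and its quantized Q
        if kl < best_kl:
            best_kl, best_t = kl, edges[i]
    return best_t

acts = np.random.default_rng(0).standard_normal(100000) * 0.5   # fake FP32 activations
T = find_saturation_threshold(acts)
print(f"saturation threshold T = {T:.3f}; values outside (-T, T) are clipped before INT8 mapping")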
105. Perform video structured analysis on the decoded image data to be processed through the accelerated deep neural network model to obtain a feature set.
The above feature set may be at least one of the following: key information (time, place, position), feature points, feature regions, target persons and their attributes (for example, gender, height, age, identity, etc.), and comparison results (for example, similarity values, matched images, etc.).
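By way of illustration only, one possible shape of a feature-set entry is shown below, assuming JSON-style records; the field names and values are illustrative and not mandated by the patent.

feature_set = [
    {
        "key_info": {"time": "2017-11-17T08:30:12", "place": "gate 3", "position": [640, 360]},
        "target": {"type": "person",
                   "attributes": {"gender": "male", "height_cm": 178, "age": 35, "identity": "unknown"}},
        "feature_points": [[0.12, 0.56], [0.33, 0.41]],
        "feature_region": [612, 240, 118, 260],            # x, y, width, height of the detected target
        "comparison": {"similarity": 0.93, "matched_image": "db/000123.jpg"},
    }
]
print(feature_set[0]["comparison"]["similarity"])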
Optionally, the above step 105 of performing video structured analysis on the decoded image data to be processed through the accelerated deep neural network model to obtain a feature set may include the following step:
performing target detection on the decoded image data to be processed through the accelerated deep neural network model to obtain a target, performing feature comparison and recognition on the target, and determining key features of the target to obtain the feature set.
106. Send the feature set to the client.
The feature set may be sent to the client, so that the client can view the video analysis results.
It can be seen that, in the embodiments of the present invention, an analysis request sent by a client is received, where the analysis request carries attribute information of image data to be processed; a GPU resource is configured for the image data to be processed according to the attribute information; the image data to be processed is received through the GPU resource and decoded; a deep neural network model is accelerated by a multi-stage compression optimization method; video structured analysis is performed on the decoded image data through the accelerated deep neural network model to obtain a feature set; and the feature set is sent to the client. In this way, a GPU resource can be allocated to the image data to be processed and used for decoding; on this basis, the deep neural network model is accelerated by the multi-stage compression optimization method, video structured analysis is performed on the decoded image data, and an analysis result is obtained, thereby improving video structured-analysis efficiency.
Consistent with the above, please refer to Fig. 2, which is a second-embodiment flow diagram of a video processing method according to an embodiment of the present invention. The video processing method described in this embodiment includes the following steps:
201. Receive an analysis request sent by a client, where the analysis request carries attribute information of image data to be processed.
202. Obtain a current network speed.
When the current network speed is slow, the embodiment of the present invention may not be implemented well; therefore, the embodiment of the present invention may also be applied in an environment where the network rate is good.
203. When the current network speed and the memory size of the image data to be processed meet a preset condition, configure a GPU resource for the image data to be processed according to the attribute information.
The above preset condition may be a system default, or may be set by the user. The preset condition may be: the network rate is greater than a first preset threshold and the memory size of the image data to be processed is greater than a second preset threshold, where the first preset threshold and the second preset threshold may be set by the user or be system defaults. Alternatively, the preset condition may be: the network rate is within a first preset range and the memory size of the image data to be processed is within a second preset range, where the first preset range and the second preset range may be set by the user or be system defaults.
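By way of illustration only, the following Python sketch shows the threshold form of the preset condition in step 203; the threshold values are illustrative defaults and are not taken from the patent.

FIRST_THRESHOLD_MBPS = 20      # minimum network rate (first preset threshold)
SECOND_THRESHOLD_MB = 500      # minimum memory size of the image data (second preset threshold)

def meets_preset_condition(network_rate_mbps, memory_size_mb):
    return network_rate_mbps > FIRST_THRESHOLD_MBPS and memory_size_mb > SECOND_THRESHOLD_MB

# only large videos on a fast enough network are dispatched to GPU decoding and analysis
print(meets_preset_condition(network_rate_mbps=35, memory_size_mb=1200))   # True
print(meets_preset_condition(network_rate_mbps=8,  memory_size_mb=1200))   # False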
204. Receive the image data to be processed through the GPU resource, and perform a decoding operation on the image data to be processed.
205. Accelerate the deep neural network model by a multi-stage compression optimization method.
206. Perform video structured analysis on the decoded image data to be processed through the accelerated deep neural network model to obtain a feature set.
207. Send the feature set to the client.
For detailed descriptions of the above step 201 and steps 203 to 207, reference may be made to the corresponding steps 101 to 106 of the video processing method described in Fig. 1b, and details are not repeated here.
It can be seen that, in the embodiments of the present invention, an analysis request sent by a client is received, where the analysis request carries attribute information of image data to be processed; a current network speed is obtained; when the current network speed and the memory size of the image data to be processed meet a preset condition, a GPU resource is configured for the image data to be processed according to the attribute information; the image data to be processed is received through the GPU resource and decoded; a deep neural network model is accelerated by a multi-stage compression optimization method; video structured analysis is performed on the decoded image data through the accelerated deep neural network model to obtain a feature set; and the feature set is sent to the client. In this way, a GPU resource can be allocated to the image data to be processed and used for decoding; on this basis, the deep neural network model is accelerated by the multi-stage compression optimization method, video structured analysis is performed on the decoded image data, and an analysis result is obtained, thereby improving video structured-analysis efficiency.
Consistent with the above, an apparatus for implementing the above video processing method is described in detail below:
Please refer to Fig. 3a, which is an embodiment structural schematic diagram of a control platform according to an embodiment of the present invention. The control platform described in this embodiment includes a high-throughput distributed publish-subscribe message apparatus, which is used to communicate with a server cluster and includes a receiving unit 301, a configuration unit 302, an acceleration unit 303, an analysis unit 304 and a sending unit 305, specifically as follows:
the receiving unit 301 is configured to receive an analysis request sent by a client, where the analysis request carries attribute information of image data to be processed;
the configuration unit 302 is configured to configure a GPU resource for the image data to be processed according to the attribute information;
the receiving unit 301 is further configured to receive the image data to be processed through the GPU resource and perform a decoding operation on the image data to be processed;
the acceleration unit 303 is configured to accelerate a deep neural network model by a multi-stage compression optimization method;
the analysis unit 304 is configured to perform video structured analysis on the decoded image data to be processed through the accelerated deep neural network model to obtain a feature set; and
the sending unit 305 is configured to send the feature set to the client.
Optionally, as shown in Fig. 3b, which is a detailed refinement of the configuration unit 302 of the control platform described in Fig. 3a, the configuration unit 302 may include a first acquisition module 3021 and a configuration module 3022, specifically as follows:
the first acquisition module 3021 is configured to obtain resource status information of the server cluster; and
the configuration module 3022 is configured to determine the GPU resource of the image data to be processed according to the resource status information of the server cluster and the attribute information.
Optionally, as shown in Fig. 3c, which is a detailed refinement of the acceleration unit 303 of the control platform described in Fig. 3a, the acceleration unit 303 may include a second acquisition module 3031 and an acceleration module 3032, specifically as follows:
the second acquisition module 3031 is configured to obtain a precision threshold of the deep neural network model; and
the acceleration module 3032 is configured to perform multi-stage acceleration on the deep neural network model according to the multi-stage compression optimization method, where the stages of the multi-stage compression optimization method are executed in the following order: layer fusion, channel sparsification, kernel regularization and INT8 weight quantization, and the precision of the accelerated deep neural network model is higher than the precision threshold.
Optionally, the analysis unit 304 is specifically configured to:
perform target detection on the decoded image data to be processed through the accelerated deep neural network model to obtain a target, perform feature comparison and recognition on the target, and determine key features of the target to obtain the feature set.
Optionally, the attribute information includes the memory size of the image data to be processed, and Fig. 3d is another modified structure of the control platform described in Fig. 3a; compared with Fig. 3a, the structure of Fig. 3d may further include an acquisition unit 306, specifically as follows:
the acquisition unit 306 is configured to obtain a current network speed; and the configuration unit 302 performs the step of configuring a GPU resource for the image data to be processed according to the attribute information when the current network speed and the memory size of the image data to be processed meet a preset condition.
It can be seen that, with the control platform described in the embodiments of the present invention, an analysis request sent by a client is received, where the analysis request carries attribute information of image data to be processed; a GPU resource is configured for the image data to be processed according to the attribute information; the image data to be processed is received through the GPU resource and decoded; a deep neural network model is accelerated by a multi-stage compression optimization method; video structured analysis is performed on the decoded image data through the accelerated deep neural network model to obtain a feature set; and the feature set is sent to the client. In this way, a GPU resource can be allocated to the image data to be processed and used for decoding; on this basis, the deep neural network model is accelerated, video structured analysis is performed on the decoded image data, and an analysis result is obtained, thereby improving video structured-analysis efficiency.
Consistent with the above, please refer to Fig. 4, which is an embodiment structural schematic diagram of a control platform according to an embodiment of the present invention. The control platform described in this embodiment includes: at least one input device 1000; at least one output device 2000; at least one processor 3000, such as a CPU; and a memory 4000. The input device 1000, the output device 2000, the processor 3000 and the memory 4000 are connected through a bus 5000. The control platform includes a high-throughput distributed publish-subscribe message apparatus, which is used to communicate with a server cluster; the high-throughput distributed publish-subscribe message apparatus included in the control platform is integrated in the processor 3000.
The above input device 1000 may specifically be a touch panel, a physical button or a mouse.
The above output device 2000 may specifically be a display screen.
The above memory 4000 may be a high-speed RAM memory or a non-volatile memory, such as a magnetic disk memory. The above memory 4000 is used to store a set of program code, and the above input device 1000, output device 2000 and processor 3000 are used to call the program code stored in the memory 4000 to perform some or all of the steps of any video processing method described in the above method embodiments.
An embodiment of the present invention further provides a computer storage medium, where the computer storage medium may store a program, and when the program is executed, some or all of the steps of any video processing method described in the above method embodiments are performed.
An embodiment of the present invention further provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform some or all of the steps of any video processing method described in the above method embodiments.
Although the present invention is described herein with reference to the embodiments, in the course of implementing the claimed invention, a person skilled in the art, by studying the drawings, the disclosure and the appended claims, can understand and implement other variations of the disclosed embodiments. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or another unit may fulfill several functions recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
A person skilled in the art should understand that the embodiments of the present invention may be provided as a method, an apparatus (device) or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code. The computer program is stored/distributed in a suitable medium and provided together with other hardware or as a part of hardware, and may also adopt other distribution forms, such as via the Internet or other wired or wireless telecommunication systems.
The present invention is described with reference to the flowcharts and/or block diagrams of the method, the apparatus (device) and the computer program product of the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, may be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable video processing device to produce a machine, so that the instructions executed by the processor of the computer or the other programmable video processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable video processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable video processing device, so that a series of operation steps are performed on the computer or the other programmable device to produce computer-implemented processing, and the instructions executed on the computer or the other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although the present invention is described with reference to specific features and embodiments, it is apparent that various modifications and combinations may be made without departing from the spirit and scope of the present invention. Accordingly, the specification and drawings are merely exemplary illustrations of the invention defined by the appended claims and are deemed to cover any and all modifications, variations, combinations or equivalents falling within the scope of the present invention. Apparently, a person skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. In this way, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include them.

Claims (10)

  1. A video processing method, characterized by comprising:
    receiving an analysis request sent by a client, wherein the analysis request carries attribute information of image data to be processed;
    configuring a GPU resource for the image data to be processed according to the attribute information;
    receiving the image data to be processed through the GPU resource, and performing a decoding operation on the image data to be processed;
    accelerating a deep neural network model by a multi-stage compression optimization method;
    performing video structured analysis on the image data to be processed after the decoding operation through the accelerated deep neural network model to obtain a feature set; and
    sending the feature set to the client.
  2. The method according to claim 1, characterized in that the configuring a GPU resource for the image data to be processed according to the attribute information comprises:
    obtaining resource status information of a server cluster; and
    determining the GPU resource of the image data to be processed according to the resource status information of the server cluster and the attribute information.
  3. The method according to claim 1 or 2, characterized in that the accelerating a deep neural network model by a multi-stage compression optimization method comprises:
    obtaining a precision threshold of the deep neural network model; and
    performing acceleration on the deep neural network model according to the multi-stage compression optimization method, wherein the stages of the multi-stage compression optimization method are executed in the following order: layer fusion, channel sparsification, kernel regularization and INT8 weight quantization, and the precision of the accelerated deep neural network model is higher than the precision threshold.
  4. The method according to any one of claims 1 to 3, characterized in that the performing video structured analysis on the image data to be processed after the decoding operation through the accelerated deep neural network model to obtain a feature set comprises:
    performing target detection on the image data to be processed after the decoding operation through the accelerated deep neural network model to obtain a target, performing feature comparison and recognition on the target, and determining key features of the target to obtain the feature set.
  5. The method according to any one of claims 1 to 4, characterized in that the attribute information comprises a memory size of the image data to be processed;
    the method further comprises:
    obtaining a current network speed; and
    when the current network speed and the memory size of the image data to be processed meet a preset condition, performing the step of configuring a GPU resource for the image data to be processed according to the attribute information.
  6. A control platform, the control platform comprising a high-throughput distributed publish-subscribe message apparatus used to communicate with a server cluster, characterized in that the high-throughput distributed publish-subscribe message apparatus comprises a receiving unit, a configuration unit, an acceleration unit, an analysis unit and a sending unit, wherein
    the receiving unit is configured to receive an analysis request sent by a client, the analysis request carrying attribute information of image data to be processed;
    the configuration unit is configured to configure a GPU resource for the image data to be processed according to the attribute information;
    the receiving unit is further configured to receive the image data to be processed through the GPU resource, and perform a decoding operation on the image data to be processed;
    the acceleration unit is configured to accelerate a deep neural network model by a multi-stage compression optimization method;
    the analysis unit is configured to perform video structured analysis on the image data to be processed after the decoding operation through the accelerated deep neural network model to obtain a feature set; and
    the sending unit is configured to send the feature set to the client.
  7. The control platform according to claim 6, characterized in that the configuration unit comprises:
    a first acquisition module, configured to obtain resource status information of the server cluster; and
    a configuration module, configured to determine the GPU resource of the image data to be processed according to the resource status information of the server cluster and the attribute information.
  8. The control platform according to claim 6 or 7, characterized in that the acceleration unit comprises:
    a second acquisition module, configured to obtain a precision threshold of the deep neural network model; and
    an acceleration module, configured to perform acceleration on the deep neural network model according to the multi-stage compression optimization method, wherein the stages of the multi-stage compression optimization method are executed in the following order: layer fusion, channel sparsification, kernel regularization and INT8 weight quantization, and the precision of the accelerated deep neural network model is higher than the precision threshold.
  9. The control platform according to any one of claims 6 to 8, characterized in that the analysis unit is specifically configured to:
    perform target detection on the image data to be processed after the decoding operation through the accelerated deep neural network model to obtain a target, perform feature comparison and recognition on the target, and determine key features of the target to obtain the feature set.
  10. The control platform according to any one of claims 6 to 9, characterized in that the attribute information comprises a memory size of the image data to be processed;
    the control platform further comprises:
    an acquisition unit, configured to obtain a current network speed; and the configuration unit performs the step of configuring a GPU resource for the image data to be processed according to the attribute information when the current network speed and the memory size of the image data to be processed meet the preset condition.
CN201711147343.0A 2017-11-17 2017-11-17 Video processing method and control platform Active CN108012156B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711147343.0A CN108012156B (en) 2017-11-17 2017-11-17 Video processing method and control platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711147343.0A CN108012156B (en) 2017-11-17 2017-11-17 Video processing method and control platform

Publications (2)

Publication Number Publication Date
CN108012156A true CN108012156A (en) 2018-05-08
CN108012156B CN108012156B (en) 2020-09-25

Family

ID=62052891

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711147343.0A Active CN108012156B (en) 2017-11-17 2017-11-17 Video processing method and control platform

Country Status (1)

Country Link
CN (1) CN108012156B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103873874A (en) * 2014-02-19 2014-06-18 同观科技(深圳)有限公司 Full search motion estimation method based on programmable parallel processor
CN104268021A (en) * 2014-09-15 2015-01-07 西安电子科技大学 Graphic processor based RS (Reed-Solomon) decoding method
CN106797485A (en) * 2014-10-02 2017-05-31 恩特里克丝有限公司 Cloud streaming service system and cloud streaming service method using an optimal GPU, and apparatus therefor
US20160148079A1 (en) * 2014-11-21 2016-05-26 Adobe Systems Incorporated Object detection using cascaded convolutional neural networks
CN105869117A (en) * 2016-03-28 2016-08-17 上海交通大学 GPU acceleration method for deep-learning super-resolution technology
CN106791861A (en) * 2016-12-20 2017-05-31 杭州当虹科技有限公司 DNxHD VLC coding method based on the CUDA framework
CN107229904A (en) * 2017-04-24 2017-10-03 东北大学 Object detection and recognition method based on deep learning
CN107067365A (en) * 2017-04-25 2017-08-18 中国石油大学(华东) Distributed embedded real-time video stream processing system and method based on deep learning
CN107231558A (en) * 2017-05-23 2017-10-03 江苏火米互动科技有限公司 Implementation method of an H.264 parallel encoder based on CUDA
CN107330439A (en) * 2017-07-14 2017-11-07 腾讯科技(深圳)有限公司 Method, client and server for determining the pose of an object in an image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIAO, Hui: "Face Detection Algorithm Based on Lightweight Convolutional Neural Networks", Master's thesis, Zhejiang University *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110659384A (en) * 2018-06-13 2020-01-07 杭州海康威视数字技术股份有限公司 Video structured analysis method and device
CN109145798A (en) * 2018-08-13 2019-01-04 浙江零跑科技有限公司 Integrated method for driving-scene target recognition and drivable-region segmentation
CN113424518A (en) * 2019-02-19 2021-09-21 索尼半导体解决方案公司 Imaging device, electronic apparatus, and imaging method
CN110162398A (en) * 2019-04-11 2019-08-23 平安科技(深圳)有限公司 Scheduling method and apparatus for a disease analysis model, and terminal device
CN110162398B (en) * 2019-04-11 2024-05-03 平安科技(深圳)有限公司 Scheduling method and device of disease analysis model and terminal equipment
CN110175641A (en) * 2019-05-22 2019-08-27 中国科学院苏州纳米技术与纳米仿生研究所 Image-recognizing method, device, equipment and storage medium
WO2021026775A1 (en) * 2019-08-13 2021-02-18 深圳鲲云信息科技有限公司 Neural network data stream acceleration method and apparatus, computer device, and storage medium
CN112840284A (en) * 2019-08-13 2021-05-25 深圳鲲云信息科技有限公司 Automatic driving method and device based on data stream, electronic equipment and storage medium
CN110688223A (en) * 2019-09-11 2020-01-14 深圳云天励飞技术有限公司 Data processing method and related product
CN110688223B (en) * 2019-09-11 2022-07-29 深圳云天励飞技术有限公司 Data processing method and related product
CN112615794B (en) * 2020-12-08 2022-07-29 四川迅游网络科技股份有限公司 Intelligent acceleration system and method for service flow characteristics
CN112615794A (en) * 2020-12-08 2021-04-06 四川迅游网络科技股份有限公司 Intelligent acceleration system and method for service flow characteristics
CN112954357A (en) * 2021-01-26 2021-06-11 四川天翼网络服务有限公司 Dynamic efficient self-adaptive video stream intelligent coding and decoding method and system
CN112990310A (en) * 2021-03-12 2021-06-18 国网智能科技股份有限公司 Artificial intelligence system and method for serving electric power robot
CN113031484A (en) * 2021-03-12 2021-06-25 国网智能科技股份有限公司 Embedded edge intelligent system and method for power inspection
CN112990310B (en) * 2021-03-12 2023-09-05 国网智能科技股份有限公司 Artificial intelligence system and method for serving electric robot
CN113031484B (en) * 2021-03-12 2023-12-05 国网智能科技股份有限公司 Electric power inspection embedded edge intelligent system and method
CN113393367A (en) * 2021-07-08 2021-09-14 北京百度网讯科技有限公司 Image processing method, apparatus, device and medium

Also Published As

Publication number Publication date
CN108012156B (en) 2020-09-25

Similar Documents

Publication Publication Date Title
CN108012156A (en) Video processing method and control platform
WO2022083536A1 (en) Neural network construction method and apparatus
JP2017050001A (en) System and method for use in efficient neural network deployment
CN109597965B (en) Data processing method, system, terminal and medium based on deep neural network
US20190114541A1 (en) Method and system of controlling computing operations based on early-stop in deep neural network
CN109086140A (en) Method, apparatus and storage medium for data processing in a blockchain
US11334758B2 (en) Method and apparatus of data processing using multiple types of non-linear combination processing
WO2023231794A1 (en) Neural network parameter quantification method and apparatus
CN111506434B (en) Task processing method and device and computer readable storage medium
CN111931917A (en) Forward computing implementation method and device, storage medium and electronic device
CN113885956B (en) Service deployment method and device, electronic equipment and storage medium
CN112036558A (en) Model management method, electronic device, and medium
CN112508768A (en) Single-operator multi-model pipeline reasoning method, system, electronic equipment and medium
CN112860402A (en) Dynamic batch processing task scheduling method and system for deep learning inference service
Li et al. An intelligent collaborative inference approach of service partitioning and task offloading for deep learning based service in mobile edge computing networks
CN115600676A (en) Deep learning model reasoning method, device, equipment and storage medium
CN106649377A (en) Image processing system and method
CN116456496B (en) Resource scheduling method, storage medium and electronic equipment
CN117135130A (en) Server control method, device, electronic equipment and storage medium
US11720414B2 (en) Parallel execution controller for partitioned segments of a data model
CN104933110B (en) Data prefetching method based on MapReduce
CN114700957B (en) Robot control method and device with low computational power requirement of model
Lu et al. Dynamic offloading on a hybrid edge–cloud architecture for multiobject tracking
CN112669353B (en) Data processing method, data processing device, computer equipment and storage medium
CN115273148A (en) Pedestrian re-recognition model training method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant