WO2021026775A1 - Neural network data stream acceleration method and apparatus, computer device and storage medium - Google Patents

Neural network data stream acceleration method and apparatus, computer device and storage medium Download PDF

Info

Publication number
WO2021026775A1
WO2021026775A1 (PCT/CN2019/100402, CN2019100402W)
Authority
WO
WIPO (PCT)
Prior art keywords
data stream
hardware
neural network
video data
acceleration
Prior art date
Application number
PCT/CN2019/100402
Other languages
English (en)
French (fr)
Inventor
姜浩
蔡权雄
牛昕宇
Original Assignee
深圳鲲云信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳鲲云信息科技有限公司 filed Critical 深圳鲲云信息科技有限公司
Priority to CN201980066983.XA priority Critical patent/CN115462079A/zh
Priority to PCT/CN2019/100402 priority patent/WO2021026775A1/zh
Publication of WO2021026775A1 publication Critical patent/WO2021026775A1/zh

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/436Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a neural network data stream acceleration method and apparatus, a computer device, and a storage medium.
  • At present, mainstream streaming-media (e.g., video stream, audio stream) acceleration frameworks on the market use a main control CPU, which calls a dedicated video codec chip or SoC module to acquire and decode the video; the decoded pictures are then analyzed with acceleration chips such as GPUs or TPUs for artificial intelligence, in particular recognition and prediction with deep learning neural networks; finally, the results are further analyzed or stored by the main control CPU.
  • However, the existing technical solutions that use a GPU or TPU for hardware acceleration have disadvantages such as high cost and high power consumption, along with problems such as insufficient hardware utilization and inefficient, excessive inter-chip communication; in particular, when video stream data is processed with artificial intelligence, the commonly used GPU-based neural network acceleration likewise suffers from high cost and high power consumption.
  • The purpose of the embodiments of the present application is to propose a neural network data stream acceleration method and apparatus, a computer device, and a storage medium, so as to reduce the cost and power consumption of neural network data stream processing and improve data stream processing efficiency.
  • To solve the above technical problem, an embodiment of the present application provides a neural network data stream acceleration method, which adopts the following technical solution: obtaining a video data stream; hardware-decoding the video data stream; configuring hardware resources based on the structure of a neural network, and performing data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources; and inputting the data-stream-hardware-accelerated data into the neural network and outputting a result.
  • Further, the step of obtaining a video data stream includes: obtaining the video data stream from a webcam or a video stream server.
  • Further, the step of hardware-decoding the video data stream includes: decoding the video data stream through the chip's built-in graphics processing unit.
  • Further, after the step of hardware-decoding the video data stream and before the step of configuring hardware resources based on the structure of the neural network and performing data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources, the method further includes the step of: pre-processing the hardware-decoded video data stream.
  • Further, the step of configuring hardware resources based on the structure of the neural network and performing data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources includes: obtaining the structure of the neural network; dynamically allocating hardware resources according to the structure of the neural network and optimizing the timing of the hardware resources; and using the hardware resources to perform data-stream acceleration on the hardware-decoded video data stream.
  • Further, after the step of inputting the data-stream-hardware-accelerated data into the neural network and outputting a result, the method further includes the step of: post-processing the output result of the neural network.
  • To solve the above technical problem, an embodiment of the present application further provides a neural network data stream acceleration apparatus, which adopts the following technical solution:
  • The neural network data stream acceleration apparatus includes:
  • an acquisition module, configured to obtain a video data stream;
  • a decoding module, configured to hardware-decode the video data stream;
  • an acceleration module, configured to configure hardware resources based on the structure of a neural network and to perform data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources;
  • an output module, configured to input the data-stream-hardware-accelerated data into the neural network and output a result.
  • Further, the acquisition module includes: an obtaining subunit, configured to obtain the video data stream from a webcam or a video stream server.
  • To solve the above technical problem, the embodiments of the present application also provide a computer device, which adopts the following technical solution:
  • The computer device includes a memory and a processor, a computer program is stored in the memory, and when the processor executes the computer program, the steps of the neural network data stream acceleration method according to any one of the embodiments of the present application are implemented.
  • To solve the above technical problem, the embodiments of the present application also provide a computer-readable storage medium, which adopts the following technical solution:
  • A computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the neural network data stream acceleration method proposed in any one of the embodiments of the present application are implemented.
  • Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects: a video data stream is obtained; the video data stream is hardware-decoded; hardware resources are configured based on the structure of a neural network, and data-stream hardware acceleration is performed on the hardware-decoded video data stream through the configured hardware resources; the data-stream-hardware-accelerated data is input into the neural network and a result is output.
  • By hardware-decoding the video data stream, configuring hardware resources based on the structure of the neural network, and performing data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources, the efficiency with which the neural network processes the data stream is improved while cost and power consumption are reduced.
  • Fig. 1 is a flowchart of an embodiment of a neural network data stream acceleration method according to the present application
  • FIG. 2 is a flowchart of a specific implementation of step 103 in FIG. 1;
  • Fig. 3 is a schematic structural diagram of an embodiment of a neural network data stream acceleration device according to the present application.
  • Fig. 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.
  • As shown in Fig. 1, Fig. 1 shows a flowchart of an embodiment of a neural network data stream acceleration method according to the present application.
  • The neural network data stream acceleration method includes the following steps:
  • Step 101: Obtain a video data stream.
  • In this embodiment, the video data stream consists of continuous picture frames, such as RGB pictures, that have been partitioned into blocks and then encoded.
  • The stream may be obtained from a local camera, a webcam, or another video stream server, through wired or wireless means.
  • Step 102: Perform hardware decoding on the video data stream.
  • Decoding essentially reverses the encoding process, and video codec design is usually standardized; that is, the published coding specification precisely prescribes how decoding is performed.
  • In this embodiment, the video data stream can be hardware-decoded through the chip's built-in graphics processing unit or a dedicated decoder, which improves decoding efficiency and resource utilization.
  • Step 103: Configure hardware resources based on the structure of the neural network, and perform data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources.
  • In this embodiment, the neural network comprises a neural network graph (the neural network structure) and the parameters corresponding to that structure.
  • The structure of the neural network uses layers as its computation units, including but not limited to convolutional layers, pooling layers, ReLU (activation function) layers, and fully connected layers.
  • Besides receiving the data stream output by the previous layer, each layer in the neural network structure also carries a large number of parameters, including but not limited to weights and biases.
  • Step 104: Input the data-stream-hardware-accelerated data into the neural network and output the result.
  • In this embodiment, the neural network may be a deep learning model for object detection, target recognition, position prediction, etc., such as Faster R-CNN, YOLO, or SSD.
  • In this embodiment, a video data stream is obtained; the video data stream is hardware-decoded; hardware resources are configured based on the structure of the neural network, and data-stream hardware acceleration is performed on the hardware-decoded video data stream through the configured hardware resources; the accelerated data is input into the neural network and a result is output.
  • In this way, the efficiency with which the neural network processes the data stream is improved while cost and power consumption are reduced.
  • Further, the above step 101 may include:
  • Step 1011: Obtain the video data stream from a webcam or a video stream server.
  • In this embodiment, the video data stream may be obtained from a local camera, a webcam, or a network video stream server, over a wired or wireless connection, through a video streaming protocol such as HTTP or RTSP, and saved into local memory.
  • Further, the above step 102 may include the following step:
  • Step 1021: Decode the video data stream through the chip's built-in graphics processing unit.
  • The first step of a typical digital video codec is to convert the video input from the camera from the RGB color space to the YCbCr color space, usually accompanied by chroma subsampling to generate 4:2:0 video (4:2:2 sampling is sometimes used for interlaced scanning). Converting to the YCbCr color space brings two benefits: first, it partially removes the correlation in the chrominance signal, improving compressibility; second, it separates out the luminance signal, which is the most important to visual perception, while the chrominance signal, being less important perceptually, can be sampled at a lower resolution (4:2:0 or 4:2:2) without affecting perceived quality.
  • In addition, before the actual encoding, subsampling in the spatial or temporal domain can effectively reduce the data volume of the raw video.
  • The input video image is usually partitioned into macroblocks that are encoded separately; a macroblock typically comprises a 16x16 luminance block and the corresponding chrominance blocks.
  • Block-based motion compensation is then used to predict the current frame's data from already-encoded frames.
  • After that, a block transform or subband decomposition is used to reduce statistical correlation in the spatial domain.
  • The most common transform is the 8x8 discrete cosine transform (DCT); its output coefficients are then quantized, and the quantized coefficients are entropy-coded to become part of the output bitstream.
  • In practice, when the DCT is used, the quantized two-dimensional coefficients are usually serialized into one dimension by zig-zag scanning; a symbol is then obtained by encoding the run of consecutive zero coefficients together with the size (level) of the next non-zero coefficient, and a special symbol usually indicates that all remaining coefficients are zero.
  • The entropy coding at this stage typically uses variable-length codes.
  • The design of a video codec is usually standardized; that is, the published coding specification precisely prescribes how decoding is performed.
  • In fact, to make encoded bitstreams interoperable (i.e., a bitstream produced by encoder A can be decoded by decoder B, and vice versa), it suffices to standardize only the decoder's decoding process.
  • The encoding process is usually not completely defined by a standard: users are free to design their own encoders, as long as the bitstreams those encoders produce conform to the decoding specification.
  • Decoding essentially reverses the encoding process; that is, the graphics processing unit (GPU) built into the chip (e.g., an Intel central processing unit) can be used to decode the obtained video data stream according to the coding specification, yielding decoded continuous RGBA or YUV420 picture frames.
  • Further, after the step 102 of hardware-decoding the video data stream and before the step 103 of configuring hardware resources based on the structure of the neural network and performing data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources, the method further includes the step of: pre-processing the hardware-decoded video data stream, i.e., converting each picture frame of the video stream into the picture size and data format that the neural network requires for recognition and prediction, so as to increase the network's processing speed.
  • Further, as shown in Fig. 2, step 103 specifically includes the following steps:
  • Step 1031: Obtain the structure of the neural network.
  • Here, the structure of the neural network uses layers as its computation units, including but not limited to input layers, convolutional layers, pooling layers, ReLU (activation function) layers, and fully connected layers; different neural networks combine different types and numbers of layers to form neural network structures with different functions.
  • Besides receiving the data stream output by the previous layer, each layer in the neural network structure also carries a large number of parameters, including but not limited to weights and biases.
  • Step 1032: Dynamically allocate hardware resources according to the structure of the neural network and optimize the timing of the hardware resources.
  • Step 1033: Use the hardware resources to perform data-stream acceleration on the hardware-decoded video data stream.
  • In this embodiment, based on the obtained structure of the neural network, the hardware resources required by the corresponding structure can be dynamically allocated.
  • For example, for each layer, or for a combination of several layers with a particular function, a corresponding computation unit is allocated to perform the calculation, and the result is stored through a register cache unit so that the next layer can read it quickly, saving data-copy time and speeding up the neural network's computation; the timing of the computation can also be optimized through a pipeline unit.
  • Hardware acceleration is thereby performed on the video data stream, improving the efficiency with which the neural network processes the data stream and reducing power consumption.
  • Further, after the step of inputting the data-stream-hardware-accelerated data into the neural network and outputting a result, the method further includes the step of:
  • post-processing the output result of the neural network.
  • Here, the result output by the neural network is a set of feature values, which can be understood as an abstract representation of the input picture or data.
  • Post-processing mainly converts this abstract representation, i.e., the feature values, into a meaningful output, such as the picture's class and its probability in a classification problem, or the classes, probabilities, and coordinates of the targets contained in the picture in a detection problem.
  • A person of ordinary skill in the art can understand that the above method embodiments can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments.
  • The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or a random access memory (RAM), etc.
  • As an implementation of the method shown in Fig. 1, this application provides an embodiment of a neural network data stream acceleration apparatus.
  • The apparatus embodiment corresponds to the method embodiment shown in Fig. 1.
  • The apparatus can be applied to various electronic devices.
  • The neural network data stream acceleration apparatus 200 of this embodiment includes: an acquisition module 201, a decoding module 202, an acceleration module 203, and an output module 204, wherein:
  • the acquisition module 201 is configured to obtain a video data stream;
  • the decoding module 202 is configured to hardware-decode the video data stream;
  • the acceleration module 203 is configured to configure hardware resources based on the structure of a neural network and to perform data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources;
  • the output module 204 is configured to input the data-stream-hardware-accelerated data into the neural network and output a result.
  • Further, the above acquisition module 201 includes: an obtaining subunit, configured to obtain the video data stream from a webcam or a video stream server.
  • In some optional implementations of this embodiment, the above apparatus 200 may further include a pre-processing module and a post-processing module, wherein:
  • the pre-processing module is configured to pre-process the hardware-decoded video data stream;
  • the post-processing module is configured to post-process the output result of the neural network.
  • The neural network data stream acceleration apparatus provided in the embodiments of the present application can implement each of the implementations in the method embodiment of Fig. 1, together with the corresponding beneficial effects; to avoid repetition, details are not repeated here.
  • FIG. 4 is a block diagram of the basic structure of the computer device in this embodiment.
  • The computer device 4 includes a memory 41, a processor 42, and a network interface 43 that communicate with each other via a system bus. It should be pointed out that the figure only shows a computer device 4 with components 41-43, but it should be understood that not all of the illustrated components are required to be implemented; more or fewer components may be implemented instead. As those skilled in the art will understand, the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions; its hardware includes but is not limited to microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, etc.
  • the computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
  • The memory 41 includes at least one type of readable storage medium, including flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical discs, etc.
  • the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or memory of the computer device 4.
  • The memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, or flash card provided on the computer device 4.
  • the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device.
  • the memory 41 is generally used to store the operating system and various application software installed in the computer device 4, such as the program code of the neural network data stream acceleration method.
  • the memory 41 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 42 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments.
  • the processor 42 is generally used to control the overall operation of the computer device 4.
  • the processor 42 is configured to run the program code stored in the memory 41 or process data, for example, run the program code of the neural network data flow acceleration method.
  • the network interface 43 may include a wireless network interface or a wired network interface, and the network interface 43 is usually used to establish a communication connection between the computer device 4 and other electronic devices.
  • This application also provides another implementation, namely a computer-readable storage medium storing a neural network data stream acceleration program, where the neural network data stream acceleration program can be executed by at least one processor, so that the at least one processor executes the steps of the neural network data stream acceleration method described above.
  • The methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform; they can, of course, also be implemented by hardware, but in many cases the former is the better implementation.
  • Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to enable a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the method described in each embodiment of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A neural network data stream acceleration method, apparatus, computer device, and storage medium in the field of artificial intelligence. The method comprises: obtaining a video data stream (101); hardware-decoding the video data stream (102); configuring hardware resources based on the structure of a neural network, and performing data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources (103); and inputting the data-stream-hardware-accelerated data into the neural network and outputting a result (104). By hardware-decoding the video data stream, configuring hardware resources based on the structure of the neural network, and performing data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources, the efficiency with which the neural network processes the data stream is improved while cost and power consumption are reduced.

Description

Neural network data stream acceleration method and apparatus, computer device and storage medium
Technical Field
This application relates to the field of artificial intelligence technology, and in particular to a neural network data stream acceleration method and apparatus, a computer device, and a storage medium.
Background Art
At present, mainstream streaming-media (e.g., video stream, audio stream) acceleration frameworks on the market use a main control CPU, which calls a dedicated video codec chip or SoC module to acquire and decode the video; the decoded pictures are then analyzed with acceleration chips such as GPUs or TPUs for artificial intelligence, in particular recognition and prediction with deep learning neural networks; finally, the results are further analyzed or stored by the main control CPU. However, the existing technical solutions that use a GPU or TPU for hardware acceleration have disadvantages such as high cost and high power consumption. There are also problems such as insufficient hardware utilization and inefficient, excessive inter-chip communication; in particular, when video stream data is processed with artificial intelligence, the commonly used GPU-based neural network acceleration likewise suffers from high cost and high power consumption.
Summary of the Invention
The purpose of the embodiments of the present application is to propose a neural network data stream acceleration method and apparatus, a computer device, and a storage medium, so as to reduce the cost and power consumption of neural network data stream processing and improve data stream processing efficiency.
To solve the above technical problem, an embodiment of the present application provides a neural network data stream acceleration method, which adopts the following technical solution:
The method includes the following steps:
obtaining a video data stream;
hardware-decoding the video data stream;
configuring hardware resources based on the structure of a neural network, and performing data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources;
inputting the data-stream-hardware-accelerated data into the neural network and outputting a result.
Further, the step of obtaining a video data stream includes:
obtaining the video data stream from a webcam or a video stream server.
Further, the step of hardware-decoding the video data stream includes:
decoding the video data stream through the chip's built-in graphics processing unit.
Further, after the step of hardware-decoding the video data stream and before the step of configuring hardware resources based on the structure of the neural network and performing data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources, the method further includes the step of:
pre-processing the hardware-decoded video data stream.
Further, the step of configuring hardware resources based on the structure of the neural network and performing data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources includes:
obtaining the structure of the neural network;
dynamically allocating hardware resources according to the structure of the neural network and optimizing the timing of the hardware resources;
using the hardware resources to perform data-stream acceleration on the hardware-decoded video data stream.
Further, after the step of inputting the data-stream-hardware-accelerated data into the neural network and outputting a result, the method further includes the step of:
post-processing the output result of the neural network.
To solve the above technical problem, an embodiment of the present application further provides a neural network data stream acceleration apparatus, which adopts the following technical solution:
The neural network data stream acceleration apparatus includes:
an acquisition module, configured to obtain a video data stream;
a decoding module, configured to hardware-decode the video data stream;
an acceleration module, configured to configure hardware resources based on the structure of a neural network and to perform data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources;
an output module, configured to input the data-stream-hardware-accelerated data into the neural network and output a result.
Further, the acquisition module includes:
an obtaining subunit, configured to obtain the video data stream from a webcam or a video stream server.
To solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solution:
The computer device includes a memory and a processor, a computer program is stored in the memory, and when the processor executes the computer program, the steps of the neural network data stream acceleration method according to any one of the embodiments of the present application are implemented.
To solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solution:
A computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the neural network data stream acceleration method according to any one of the embodiments of the present application are implemented.
Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects: a video data stream is obtained; the video data stream is hardware-decoded; hardware resources are configured based on the structure of a neural network, and data-stream hardware acceleration is performed on the hardware-decoded video data stream through the configured hardware resources; the data-stream-hardware-accelerated data is input into the neural network and a result is output. By hardware-decoding the video data stream, configuring hardware resources based on the structure of the neural network, and performing data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources, the efficiency with which the neural network processes the data stream is improved while cost and power consumption are reduced.
Brief Description of the Drawings
In order to explain the solutions in this application more clearly, the drawings needed in the description of the embodiments of this application are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; for a person of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of an embodiment of a neural network data stream acceleration method according to the present application;
Fig. 2 is a flowchart of a specific implementation of step 103 in Fig. 1;
Fig. 3 is a schematic structural diagram of an embodiment of a neural network data stream acceleration apparatus according to the present application;
Fig. 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.
Detailed Description of the Embodiments
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of this application. The terms used herein in the specification of the application are only for the purpose of describing specific embodiments and are not intended to limit this application. The terms "including" and "having" and any variations thereof in the specification and claims of this application and in the above description of the drawings are intended to cover a non-exclusive inclusion. The terms "first", "second", and the like in the specification and claims of this application or in the above drawings are used to distinguish different objects, not to describe a specific order.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment that is mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
In order to enable those skilled in the art to better understand the solutions of this application, the technical solutions in the embodiments of this application will be described clearly and completely below with reference to the accompanying drawings.
As shown in Fig. 1, Fig. 1 shows a flowchart of an embodiment of a neural network data stream acceleration method according to the present application. The neural network data stream acceleration method includes the following steps:
Step 101: Obtain a video data stream.
In this embodiment, the video data stream consists of continuous picture frames, such as RGB pictures, that have been partitioned into blocks and then encoded; it may be obtained from a local camera, a webcam, or another video stream server, through wired or wireless means.
Step 102: Perform hardware decoding on the video data stream.
Decoding essentially reverses the encoding process, and video codec design is usually standardized; that is, the published coding specification precisely prescribes how decoding is performed. In this embodiment, the video data stream can be hardware-decoded by the chip's built-in graphics processing unit or by a dedicated decoder, which improves decoding efficiency and resource utilization.
Step 103: Configure hardware resources based on the structure of the neural network, and perform data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources.
In this embodiment, the neural network comprises a neural network graph (the neural network structure) and the parameters corresponding to that structure. The structure of the neural network uses layers as its computation units, including but not limited to convolutional layers, pooling layers, ReLU (activation function) layers, and fully connected layers. Besides receiving the data stream output by the previous layer, each layer in the neural network structure also carries a large number of parameters, including but not limited to weights and biases. According to the structure of the neural network, the hardware resources required by that structure, such as computation units, cache units, and pipeline units capable of timing optimization, are allocated, and hardware acceleration is performed on the video data stream, thereby improving the efficiency with which the neural network processes the data stream and reducing power consumption.
Step 104: Input the data-stream-hardware-accelerated data into the neural network and output a result.
In this embodiment, the neural network may be a deep learning model for object detection, target recognition, position prediction, etc., such as Faster R-CNN, YOLO, or SSD. The hardware-accelerated video data stream is input into such a neural network, and from the output one can not only identify which class an object belongs to but also obtain the object's specific position in the picture; the recognition error rate is low and the speed is fast, which satisfies scenarios such as real-time detection of targets in a video stream.
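For illustration only (an editorial sketch, not text from the application): a pretrained detector of the kind named above can be exercised with recent torchvision; the constructor, the weights="DEFAULT" argument, and the 0.5 score threshold are torchvision conventions and arbitrary assumptions, not details taken from this patent.

```python
# Sketch: run a pretrained Faster R-CNN on one decoded frame (torchvision API).
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

frame = torch.rand(3, 480, 640)  # stand-in for one decoded RGB frame in [0, 1]
with torch.no_grad():
    det = model([frame])[0]      # dict with 'boxes', 'labels', 'scores'

for box, label, score in zip(det["boxes"], det["labels"], det["scores"]):
    if score >= 0.5:             # class *and* position, as the text describes
        print(f"class={label.item()} score={score.item():.2f} box={box.tolist()}")
```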
In this embodiment, a video data stream is obtained; the video data stream is hardware-decoded; hardware resources are configured based on the structure of the neural network, and data-stream hardware acceleration is performed on the hardware-decoded video data stream through the configured hardware resources; the data-stream-hardware-accelerated data is input into the neural network and a result is output. By hardware-decoding the video data stream, configuring hardware resources based on the structure of the neural network, and performing data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources, the efficiency with which the neural network processes the data stream is improved while cost and power consumption are reduced.
Further, the above step 101 may include:
Step 1011: Obtain the video data stream from a webcam or a video stream server.
In this embodiment, the video data stream may be obtained from a local camera, a webcam, or a network video stream server, over a wired or wireless connection, through a video streaming protocol such as HTTP or RTSP, and saved into local memory.
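For illustration, a minimal sketch of such acquisition using OpenCV; the RTSP URL and the 30-frame buffer size are hypothetical placeholders, and cv2.VideoCapture also accepts HTTP(S) stream URLs.

```python
# Sketch: pull a video data stream from a webcam or stream server over RTSP.
import cv2

cap = cv2.VideoCapture("rtsp://192.168.1.10:554/stream1")  # hypothetical camera URL
if not cap.isOpened():
    raise RuntimeError("could not open the video stream")

frames = []
while len(frames) < 30:        # buffer a short run of frames into local memory
    ok, frame = cap.read()     # each frame is a decoded BGR ndarray
    if not ok:
        break
    frames.append(frame)
cap.release()
print(f"buffered {len(frames)} frames")
```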
Further, the above step 102 may include the following step:
Step 1021: Decode the video data stream through the chip's built-in graphics processing unit.
The first step of a typical digital video codec is to convert the video input from the camera from the RGB color space to the YCbCr color space, usually accompanied by chroma subsampling to generate 4:2:0 video (4:2:2 sampling is sometimes used for interlaced scanning). Converting to the YCbCr color space brings two benefits: first, it partially removes the correlation in the chrominance signal, improving compressibility; second, it separates out the luminance signal, which is the most important to visual perception, while the chrominance signal, being less important perceptually, can be sampled at a lower resolution (4:2:0 or 4:2:2) without affecting perceived quality.
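For illustration, a minimal sketch of the RGB-to-YCbCr conversion with 4:2:0 subsampling, assuming the common BT.601 full-range coefficients (the exact matrix a given codec uses may differ).

```python
# Sketch: RGB -> YCbCr (BT.601, 8-bit full range) with 4:2:0 chroma subsampling.
import numpy as np

def rgb_to_ycbcr420(rgb):
    """rgb: (H, W, 3) uint8 with even H and W; returns Y plus 2x2-averaged Cb, Cr."""
    r, g, b = (rgb[..., i].astype(np.float32) for i in range(3))
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    # 4:2:0: keep one chroma pair per 2x2 block of pixels (quarter the chroma samples)
    sub = lambda c: c.reshape(c.shape[0] // 2, 2, c.shape[1] // 2, 2).mean(axis=(1, 3))
    q = lambda c: np.clip(np.round(c), 0, 255).astype(np.uint8)
    return q(y), q(sub(cb)), q(sub(cr))

y, cb, cr = rgb_to_ycbcr420(np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8))
print(y.shape, cb.shape, cr.shape)  # (480, 640) (240, 320) (240, 320)
```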
In addition, before the actual encoding, subsampling in the spatial or temporal domain can effectively reduce the data volume of the raw video. The input video image is usually partitioned into macroblocks that are encoded separately; a macroblock typically comprises a 16x16 luminance block and the corresponding chrominance blocks. Block-based motion compensation is then used to predict the current frame's data from already-encoded frames. After that, a block transform or subband decomposition is used to reduce statistical correlation in the spatial domain; the most common transform is the 8x8 discrete cosine transform (DCT). The transform's output coefficients are then quantized, and the quantized coefficients are entropy-coded to become part of the output bitstream. In practice, when the DCT is used, the quantized two-dimensional coefficients are usually serialized into one dimension by zig-zag scanning; a symbol is then obtained by encoding the run of consecutive zero coefficients together with the size (level) of the next non-zero coefficient, and a special symbol usually indicates that all remaining coefficients are zero. The entropy coding at this stage typically uses variable-length codes.
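For illustration, a minimal sketch of the zig-zag scan and (run, level) symbol formation described above, applied to one 8x8 block of already-quantized coefficients; the example block values are arbitrary.

```python
# Sketch: zig-zag scan an 8x8 quantized block, then emit (run-of-zeros, level)
# symbols with an end-of-block marker, ready for variable-length coding.
import numpy as np

def zigzag_order(n=8):
    # Walk anti-diagonals; odd diagonals go top-to-bottom, even ones bottom-to-top.
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]))

def run_level_encode(block):
    coeffs = [int(block[i, j]) for i, j in zigzag_order(block.shape[0])]
    symbols, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            symbols.append((run, c))  # zeros preceding this non-zero level
            run = 0
    symbols.append("EOB")             # special symbol: all remaining coefficients are 0
    return symbols

block = np.zeros((8, 8), dtype=int)
block[0, 0], block[0, 1], block[2, 0] = 26, -3, 2   # a typical sparse quantized block
print(run_level_encode(block))        # [(0, 26), (0, -3), (1, 2), 'EOB']
```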
The design of a video codec is usually standardized; that is, the published coding specification precisely prescribes how decoding is performed. In fact, to make encoded bitstreams interoperable (i.e., a bitstream produced by encoder A can be decoded by decoder B, and vice versa), it suffices to standardize only the decoder's decoding process. The encoding process is usually not completely defined by a standard: users are free to design their own encoders, as long as the bitstreams those encoders produce conform to the decoding specification. Decoding essentially reverses the encoding process; that is, the graphics processing unit (GPU) built into the chip (e.g., an Intel central processing unit) can be used to decode the obtained video data stream according to the coding specification, yielding decoded continuous RGBA or YUV420 picture frames.
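For illustration, a minimal sketch that delegates hardware-assisted decoding to an installed ffmpeg binary and reads back raw YUV420 frames; the input file name, the 640x480 frame size, and the availability of a hardware decoder behind -hwaccel auto are assumptions.

```python
# Sketch: decode a video data stream to raw YUV420 frames via ffmpeg's
# hardware acceleration, reading one frame per fixed-size chunk from a pipe.
import subprocess

W, H = 640, 480
frame_bytes = W * H * 3 // 2   # YUV420: full-size Y plane + quarter-size Cb and Cr

proc = subprocess.Popen(
    ["ffmpeg", "-hwaccel", "auto",   # let ffmpeg pick the chip's decoder if present
     "-i", "input.h264",             # hypothetical encoded input stream
     "-f", "rawvideo", "-pix_fmt", "yuv420p", "pipe:1"],
    stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)

count = 0
while True:
    buf = proc.stdout.read(frame_bytes)
    if len(buf) < frame_bytes:       # end of stream (or partial trailing frame)
        break
    count += 1                       # buf holds one decoded YUV420 picture frame
proc.wait()
print(f"decoded {count} frames")
```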
Further, after the step 102 of hardware-decoding the video data stream and before the step 103 of configuring hardware resources based on the structure of the neural network and performing data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources, the method further includes the step of:
pre-processing the hardware-decoded video data stream.
In this embodiment, the hardware-decoded video data stream needs to be pre-processed, which includes converting the format of each picture frame of the video stream into the picture size and data format that the neural network requires for recognition and prediction before inputting it into the neural network, so as to increase the network's processing speed.
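For illustration, a minimal pre-processing sketch; the 224x224 target size and the mean/std normalization constants are assumptions standing in for whatever a given network actually requires.

```python
# Sketch: convert one hardware-decoded frame into the size and data format a
# network expects (RGB float32, normalized, NCHW with a batch dimension).
import cv2
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # assumed constants
STD  = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(frame_bgr):
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)  # decoders often emit BGR
    rgb = cv2.resize(rgb, (224, 224))                 # picture size the net needs
    x = rgb.astype(np.float32) / 255.0                # data format: float in [0, 1]
    x = (x - MEAN) / STD
    return x.transpose(2, 0, 1)[None]                 # HWC -> NCHW, batch of 1

batch = preprocess(np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8))
print(batch.shape)  # (1, 3, 224, 224)
```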
Further, as shown in Fig. 2, the above step 103 specifically includes the following steps:
Step 1031: Obtain the structure of the neural network.
Here, the structure of the neural network uses layers as its computation units, including but not limited to input layers, convolutional layers, pooling layers, ReLU (activation function) layers, and fully connected layers; different neural networks combine different types and numbers of layers to form neural network structures with different functions. Besides receiving the data stream output by the previous layer, each layer in the neural network structure also carries a large number of parameters, including but not limited to weights and biases.
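For illustration, a minimal sketch of such a layer-based structure and its per-layer parameters, written in PyTorch; the layer sizes are arbitrary.

```python
# Sketch: a layered structure (conv / ReLU / pool / fully connected) whose
# layers each carry weight and bias parameters, as described above.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer
    nn.ReLU(),                                   # activation function
    nn.MaxPool2d(2),                             # pooling layer
    nn.Flatten(),
    nn.Linear(16 * 112 * 112, 10),               # fully connected layer
)

for name, p in net.named_parameters():           # per-layer weights and biases
    print(name, tuple(p.shape))                  # e.g. '0.weight', '0.bias', ...

out = net(torch.rand(1, 3, 224, 224))            # data flows layer by layer
print(out.shape)                                 # torch.Size([1, 10])
```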
Step 1032: Dynamically allocate hardware resources according to the structure of the neural network and optimize the timing of the hardware resources.
Step 1033: Use the hardware resources to perform data-stream acceleration on the hardware-decoded video data stream.
In this embodiment, according to the obtained structure of the neural network, the hardware resources required by the corresponding structure can be dynamically allocated. For example, for each layer, or for a combination of several layers with a particular function, a corresponding computation unit is allocated to perform the calculation, and the calculation result is stored through a register cache unit so that the next layer can read it quickly, which saves data-copy time and speeds up the neural network's computation; the timing of the neural network's computation can also be optimized through a pipeline unit. Hardware acceleration is thereby performed on the video data stream, improving the efficiency with which the neural network processes the data stream and reducing power consumption.
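The patent does not disclose concrete allocation code; purely as a schematic software model of the idea, the sketch below walks a layer list and pairs each layer with a "compute unit" and a "register cache" buffer that the next stage reads directly, avoiding an extra copy step. All names and structures here are hypothetical.

```python
# Schematic model only: allocate one stage (compute unit + cache) per layer and
# stream frames through the chained stages, each reading its predecessor's cache.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Stage:
    name: str
    compute: Callable[[float], float]            # stands in for a computation unit
    cache: list = field(default_factory=list)    # stands in for a register cache unit

def allocate(structure) -> List[Stage]:
    # One stage per layer; a real allocator could also fuse groups of layers.
    return [Stage(name, fn) for name, fn in structure]

def run_pipeline(stages: List[Stage], frames):
    outputs = []
    for x in frames:                 # stages could overlap in time on real hardware
        for st in stages:
            x = st.compute(x)
            st.cache = [x]           # next stage reads from here; no extra copy
        outputs.append(stages[-1].cache[0])
    return outputs

stages = allocate([("conv", lambda x: 2.0 * x),
                   ("relu", lambda x: max(x, 0.0)),
                   ("fc",   lambda x: x + 1.0)])
print(run_pipeline(stages, [0.5, -1.0, 3.0]))    # [2.0, 1.0, 7.0]
```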
Further, after the step of inputting the data-stream-hardware-accelerated data into the neural network and outputting a result, the method further includes the step of:
post-processing the output result of the neural network.
Here, the result output by the neural network is a set of feature values, which can be understood as an abstract representation of the input picture or data. Post-processing mainly converts this abstract representation, i.e., the feature values, into a meaningful output through certain computations, such as the picture's class and its corresponding probability in a classification problem, or the classes, probabilities, and coordinates of the targets contained in the picture in a detection problem.
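For illustration, a minimal post-processing sketch for the classification case: feature values (logits) are turned into a class and its probability via a softmax; the three class names are hypothetical.

```python
# Sketch: convert raw feature values into a meaningful (class, probability) pair.
import numpy as np

def postprocess(features, classes=("car", "person", "bicycle")):
    e = np.exp(features - features.max())   # numerically stable softmax
    probs = e / e.sum()
    k = int(probs.argmax())
    return classes[k], float(probs[k])

label, prob = postprocess(np.array([2.1, 0.3, -1.0]))
print(f"{label}: {prob:.2%}")               # highest-probability class, e.g. 'car'
```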
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or a random access memory (RAM), etc.
It should be understood that although the steps in the flowcharts of the drawings are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments, and their execution order is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
With further reference to Fig. 3, as an implementation of the method shown in Fig. 1 above, this application provides an embodiment of a neural network data stream acceleration apparatus. This apparatus embodiment corresponds to the method embodiment shown in Fig. 1, and the apparatus can be applied to various electronic devices.
As shown in Fig. 3, the neural network data stream acceleration apparatus 200 of this embodiment includes: an acquisition module 201, a decoding module 202, an acceleration module 203, and an output module 204, wherein:
the acquisition module 201 is configured to obtain a video data stream;
the decoding module 202 is configured to hardware-decode the video data stream;
the acceleration module 203 is configured to configure hardware resources based on the structure of a neural network and to perform data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources;
the output module 204 is configured to input the data-stream-hardware-accelerated data into the neural network and output a result.
Further, the above acquisition module 201 includes: an obtaining subunit, configured to obtain the video data stream from a webcam or a video stream server.
In some optional implementations of this embodiment, the above apparatus 200 may further include a pre-processing module and a post-processing module, wherein:
the pre-processing module is configured to pre-process the hardware-decoded video data stream;
the post-processing module is configured to post-process the output result of the neural network.
The neural network data stream acceleration apparatus provided in the embodiments of the present application can implement each of the implementations in the method embodiment of Fig. 1, together with the corresponding beneficial effects; to avoid repetition, details are not repeated here.
To solve the above technical problem, an embodiment of the present application further provides a computer device. Refer specifically to Fig. 4, which is a block diagram of the basic structure of the computer device of this embodiment.
The computer device 4 includes a memory 41, a processor 42, and a network interface 43 that communicate with each other via a system bus. It should be pointed out that only a computer device 4 with components 41-43 is shown in the figure, but it should be understood that not all of the illustrated components are required to be implemented; more or fewer components may be implemented instead. As those skilled in the art will understand, the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, etc.
The computer device may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. The computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
The memory 41 includes at least one type of readable storage medium, including flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical discs, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or internal memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, or flash card provided on the computer device 4. Of course, the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device. In this embodiment, the memory 41 is generally used to store the operating system and the various kinds of application software installed on the computer device 4, such as the program code of the neural network data stream acceleration method. In addition, the memory 41 can also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 42 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is generally used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to run the program code stored in the memory 41 or to process data, for example to run the program code of the neural network data stream acceleration method.
The network interface 43 may include a wireless network interface or a wired network interface, and is usually used to establish a communication connection between the computer device 4 and other electronic devices.
This application also provides another implementation, namely a computer-readable storage medium storing a neural network data stream acceleration program, where the neural network data stream acceleration program can be executed by at least one processor, so that the at least one processor performs the steps of the neural network data stream acceleration method described above.
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general hardware platform; they can, of course, also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to enable a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the embodiments of the present application.
Obviously, the embodiments described above are only some of the embodiments of this application rather than all of them. The accompanying drawings show preferred embodiments of this application, but they do not limit the patent scope of this application. This application can be implemented in many different forms; rather, the purpose of providing these embodiments is to make the understanding of the disclosure of this application more thorough and comprehensive. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments or make equivalent replacements for some of the technical features therein. Any equivalent structure made using the contents of the specification and drawings of this application, applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of this application.

Claims (10)

  1. A neural network data stream acceleration method, characterized by comprising:
    obtaining a video data stream;
    hardware-decoding the video data stream;
    configuring hardware resources based on the structure of a neural network, and performing data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources;
    inputting the data-stream-hardware-accelerated data into the neural network and outputting a result.
  2. The method according to claim 1, characterized in that the step of obtaining a video data stream comprises:
    obtaining the video data stream from a webcam or a video stream server.
  3. The method according to claim 1, characterized in that the step of hardware-decoding the video data stream comprises:
    decoding the video data stream through the chip's built-in graphics processing unit.
  4. The method according to claim 3, characterized in that after the step of hardware-decoding the video data stream and before the step of configuring hardware resources based on the structure of the neural network and performing data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources, the method further comprises the step of:
    pre-processing the hardware-decoded video data stream.
  5. The method according to claim 4, characterized in that the step of configuring hardware resources based on the structure of the neural network and performing data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources comprises:
    obtaining the structure of the neural network;
    dynamically allocating hardware resources according to the structure of the neural network and optimizing the timing of the hardware resources;
    using the hardware resources to perform data-stream acceleration on the hardware-decoded video data stream.
  6. The method according to claim 5, characterized in that after the step of inputting the data-stream-hardware-accelerated data into the neural network and outputting a result, the method further comprises the step of:
    post-processing the output result of the neural network.
  7. A neural network data stream acceleration apparatus, characterized by comprising:
    an acquisition module, configured to obtain a video data stream;
    a decoding module, configured to hardware-decode the video data stream;
    an acceleration module, configured to configure hardware resources based on the structure of a neural network and to perform data-stream hardware acceleration on the hardware-decoded video data stream through the configured hardware resources;
    an output module, configured to input the data-stream-hardware-accelerated data into the neural network and output a result.
  8. The apparatus according to claim 7, characterized in that the acquisition module comprises:
    an obtaining subunit, configured to obtain the video data stream from a webcam or a video stream server.
  9. A computer device, characterized by comprising a memory and a processor, wherein a computer program is stored in the memory, and when the processor executes the computer program, the steps of the neural network data stream acceleration method according to any one of claims 1 to 6 are implemented.
  10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the neural network data stream acceleration method according to any one of claims 1 to 6 are implemented.
PCT/CN2019/100402 2019-08-13 2019-08-13 Neural network data stream acceleration method and apparatus, computer device and storage medium WO2021026775A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980066983.XA 2019-08-13 2019-08-13 Neural network data stream acceleration method and apparatus, computer device and storage medium
PCT/CN2019/100402 2019-08-13 2019-08-13 Neural network data stream acceleration method and apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/100402 WO2021026775A1 (zh) 2019-08-13 2019-08-13 神经网络数据流加速方法、装置、计算机设备及存储介质

Publications (1)

Publication Number Publication Date
WO2021026775A1 true WO2021026775A1 (zh) 2021-02-18

Family

ID=74570853

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/100402 WO2021026775A1 (zh) 2019-08-13 2019-08-13 神经网络数据流加速方法、装置、计算机设备及存储介质

Country Status (2)

Country Link
CN (1) CN115462079A (zh)
WO (1) WO2021026775A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160179898A1 * 2011-11-24 2016-06-23 Alibaba Group Holding Limited Distributed data stream processing method and system
CN107067365A * 2017-04-25 2017-08-18 中国石油大学(华东) Distributed embedded real-time video stream processing system and method based on deep learning
CN108012156A * 2017-11-17 2018-05-08 深圳市华尊科技股份有限公司 Video processing method and control platform
CN108520296A * 2018-03-20 2018-09-11 福州瑞芯微电子股份有限公司 Method and apparatus for dynamic cache allocation based on a deep learning chip


Also Published As

Publication number Publication date
CN115462079A (zh) 2022-12-09

Similar Documents

Publication Publication Date Title
CN108780499B (zh) 基于量化参数的视频处理的系统和方法
EP3746944A1 (en) Use of non-linear function applied to quantization parameters in machine-learning models for video coding
WO2021208247A1 (zh) 一种视频图像的拟态压缩方法、装置、存储介质及终端
CN105163127A (zh) 视频分析方法及装置
CN111182303A (zh) 共享屏幕的编码方法、装置、计算机可读介质及电子设备
CN111614956B (zh) Dc系数符号代码化方案
CN112203088B (zh) 用于非基带信号代码化的变换选择
WO2019027523A1 (en) SCANNING ORDER ADAPTATION FOR ENTROPY ENCODING IMAGE DATA BLOCKS
CN107018416B (zh) 用于视频和图像压缩的自适应贴片数据大小编码
CN109429066A (zh) 视频编码装置和视频编码系统
WO2018222238A1 (en) Improved coding of intra-prediction modes
CN109495742B (zh) 一种视频帧编码方法、装置及设备
WO2023142715A1 (zh) 视频编码方法、实时通信方法、装置、设备及存储介质
WO2023143349A1 (zh) 一种面部视频编码方法、解码方法及装置
Zhang et al. Globally variance-constrained sparse representation and its application in image set coding
WO2021026775A1 (zh) 神经网络数据流加速方法、装置、计算机设备及存储介质
CN116567246A (zh) Avc编码方法和装置
WO2022100140A1 (zh) 一种压缩编码、解压缩方法以及装置
CN115442617A (zh) 一种基于视频编码的视频处理方法和装置
WO2022258055A1 (zh) 点云属性信息编码方法、解码方法、装置及相关设备
WO2023025024A1 (zh) 点云属性编码方法、点云属性解码方法及终端
WO2022258009A1 (zh) 熵编码、解码方法及装置
WO2024007977A1 (zh) 图像处理方法、装置及设备
WO2023078204A1 (zh) 数据处理方法、装置、设备、可读存储介质及程序产品
WO2023098807A1 (zh) 点云编、解码处理方法、装置、编码设备及解码设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19941138

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19941138

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 05-04-2022)
