JP6714297B2

JP6714297B2 - Processing device and inference processing method

Info

Publication number: JP6714297B2
Application number: JP2017174491A
Authority: JP
Inventors: 一樹客野
Original assignee: Axell Corp
Current assignee: Axell Corp
Priority date: 2017-09-12
Filing date: 2017-09-12
Publication date: 2020-06-24
Anticipated expiration: 2037-09-12
Also published as: JP2019049916A

Description

The present invention relates to, for example, a processing device that executes inference processing such as image recognition.

Conventionally, computers implementing a convolutional neural network (CNN) have been used to perform inference processing such as image recognition. When a CNN is implemented on a computer, the trained data, which includes the coefficients used by the CNN and is obtained in advance by deep learning or similar methods, is stored in the computer's main memory either in an undistorted (uncompressed) state or in a simply vector-quantized state.

For example, the trained data may be delivered to the computer over a network. In the undistorted state the trained data is relatively large, for example 200 MB, so transmitting it over a network places a load on the network and raises the transmission cost.

To address this, techniques for compressing trained data before transmission have been proposed (see, for example, Non-Patent Document 1).

Separately, as a technique for making effective use of memory space in a neurocomputer (as opposed to a von Neumann computer), it is known to add a main memory unit, a compression unit, and a decompression unit to a conventional neurocomputer, classify 16-bit weight-value input data into four levels, and assign the levels to lower-order bits in descending order of appearance frequency, thereby compressing the data (see, for example, Patent Document 1).

Patent Document 1: JP-A-6-176000

Non-Patent Document 1: Deep-Compression-AlexNet, Internet (URL: https://github.com/songhan/Deep-Compression-AlexNet)

For example, the technique of Non-Patent Document 1 makes it possible to transfer trained data over a network relatively quickly. However, when the computer performs inference processing, all of the trained data used for inference is expanded into memory in its undistorted (uncompressed) state, so a large amount of memory is consumed.

The present invention has been made in view of the above circumstances, and its object is to provide a technique that can reduce the memory capacity required for inference processing.

To achieve the above object, a processing device according to a first aspect is a processing device that executes inference processing on data to be processed, and includes: a plurality of partial processing units, each of which executes the partial processing of one of the plurality of processing layers constituting the inference processing; a stream data acquisition unit that acquires stream data in which processing coefficient information, corresponding to the processing coefficients used in the partial processing of the plurality of processing layers, is arranged in the execution order of the processing layers; and a processing coefficient supply unit that extracts the processing coefficient information from the stream data and supplies the corresponding processing coefficients to the partial processing unit that uses them.
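Because the coefficient information is ordered by layer execution order, the supply step can be a single front-to-back pass over the stream. The following is a minimal Python sketch under that assumption; `CoeffInfo`, `PartialUnit` and `supply_coefficients` are hypothetical names, and a real stream would carry encoded bytes rather than Python lists.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class CoeffInfo:
    """One piece of processing coefficient information (hypothetical record:
    the index of the layer that uses it plus its values)."""
    layer_index: int
    payload: List[float]


@dataclass
class PartialUnit:
    """Stand-in for a partial processing unit; it only collects its coefficients."""
    name: str
    received: List[List[float]] = field(default_factory=list)

    def accept(self, coeffs: List[float]) -> None:
        self.received.append(coeffs)


def supply_coefficients(stream: Iterable[CoeffInfo], units: List[PartialUnit]) -> None:
    """Single pass over the stream: hand each coefficient block to the unit of
    the layer that uses it, relying on the stream's layer execution order."""
    last = -1
    for info in stream:
        if info.layer_index < last:
            raise ValueError("stream is not in layer execution order")
        last = info.layer_index
        units[info.layer_index].accept(info.payload)
```

Layers without coefficients (such as a Relu layer) simply receive nothing, and no random access into the stream is needed.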

In the above processing device, at least part of the processing coefficient information in the stream data may be compressed processing coefficients obtained by compressing the processing coefficients, and the processing coefficient supply unit may decompress the compressed processing coefficients and supply the decompressed processing coefficients to the partial processing units.

Further, in the above processing device, the stream data may include partial-processing specifying information that specifies the partial processing of each of the plurality of processing layers, and the device may further include a partial processing unit construction unit that constructs, based on the partial-processing specifying information in the stream data, the partial processing units that execute the partial processing.

Further, in the above processing device, the stream data may include compression information indicating the method by which the processing coefficients were compressed into the compressed processing coefficients, and the processing coefficient supply unit may decompress the compressed processing coefficients based on the compression information.

Further, in the above processing device, the processing coefficients may include filter data containing values corresponding to the respective regions of a two-dimensional plane; the compressed processing coefficients obtained by compressing the filter data may be compressed filter data obtained by predictively encoding the filter data; the compression information may include prediction direction information indicating the prediction direction used when the filter data was predictively encoded; and the processing coefficient supply unit may decompress the compressed filter data based on the prediction direction indicated by the prediction direction information.
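The source does not specify the predictor, so the following Python sketch shows only one simple way to realize a prediction direction for 2-D filter data, assuming the filter values are already quantized to integers so the round trip is exact. `HORIZONTAL` predicts each value from its left neighbour and `VERTICAL` from the value above; the direction codes, function names and the "smallest total residual" selection rule are all hypothetical.

```python
from typing import List, Tuple

Grid = List[List[int]]

# Hypothetical prediction direction codes carried with the compressed data.
HORIZONTAL, VERTICAL = 0, 1


def _prediction(grid: Grid, r: int, c: int, direction: int) -> int:
    """Neighbour used as the prediction; values on the leading edge predict from 0."""
    if direction == HORIZONTAL:
        return grid[r][c - 1] if c > 0 else 0
    return grid[r - 1][c] if r > 0 else 0


def encode(filt: Grid, direction: int) -> Grid:
    """Replace every value by its difference from the prediction."""
    return [[filt[r][c] - _prediction(filt, r, c, direction)
             for c in range(len(filt[0]))] for r in range(len(filt))]


def decode(res: Grid, direction: int) -> Grid:
    """Invert encode() by accumulating residuals along the prediction direction."""
    out = [[0] * len(res[0]) for _ in range(len(res))]
    for r in range(len(res)):
        for c in range(len(res[0])):
            out[r][c] = res[r][c] + _prediction(out, r, c, direction)
    return out


def compress(filt: Grid) -> Tuple[int, Grid]:
    """Try both directions and keep the one with the smaller residual magnitude;
    the chosen direction is stored alongside the residuals."""
    best = min((HORIZONTAL, VERTICAL),
               key=lambda d: sum(abs(v) for row in encode(filt, d) for v in row))
    return best, encode(filt, best)


def decompress(packed: Tuple[int, Grid]) -> Grid:
    direction, res = packed
    return decode(res, direction)
```

A filter whose rows repeat compresses best vertically (all rows after the first become zeros), while one whose rows are constant compresses best horizontally; the small residuals are what a subsequent entropy or pruning stage would exploit.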

Further, in the above processing device, the plurality of partial processing units may include a convolution processing unit that executes convolution processing, and the processing coefficients may be the data of a plurality of filters used in the convolution processing.

Further, the above processing device may include an image processing processor suited to executing image processing, and the convolution processing unit may be implemented using the image processing processor.

Further, in the above processing device, the plurality of partial processing units may include a fully connected processing unit that takes all of the processing results of the partial processing unit of the immediately preceding processing layer as input, and the processing coefficients may be weighting coefficients for each of the processing results input to the fully connected processing unit.

Further, in the above processing device, the stream data may be received from an external device via a communication network.

Further, to achieve the above object, a processing device according to a second aspect is a processing device that executes inference processing on data to be processed, and includes: a plurality of partial processing units, each of which executes the partial processing of one of the plurality of processing layers constituting the inference processing; a memory unit that stores compressed processing coefficients obtained by compressing the processing coefficients used in at least one of the partial processing units; a read control unit that, when first partial processing of the inference processing is executed, reads from the memory unit the compressed processing coefficients corresponding to the processing coefficients used by the first partial processing unit that executes the first partial processing; and a decompression processing unit that decompresses the read compressed processing coefficients and passes the decompressed processing coefficients to the first partial processing unit.
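The point of the second aspect is that only compressed coefficients stay resident, and each layer's coefficients are expanded just before that layer runs. A minimal Python sketch of that flow follows; it substitutes zlib for the patent's codecs (Pruning, WeightSharing or predictive encoding) purely as a stand-in, and `CompressedCoeffStore`, `decompress` and `run_layer` are hypothetical names.

```python
import array
import zlib
from typing import Callable, Dict, List


class CompressedCoeffStore:
    """Stand-in for the memory unit: holds only compressed coefficients."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, layer: str, coeffs: List[float]) -> None:
        raw = array.array("f", coeffs).tobytes()  # 32-bit floats
        self._blobs[layer] = zlib.compress(raw)

    def read(self, layer: str) -> bytes:
        """Read control: fetch the compressed coefficients for one layer."""
        return self._blobs[layer]


def decompress(blob: bytes) -> List[float]:
    """Decompression step: expand one layer's coefficients."""
    arr = array.array("f")
    arr.frombytes(zlib.decompress(blob))
    return arr.tolist()


def run_layer(store: CompressedCoeffStore, layer: str,
              unit: Callable[[List[float]], float]) -> float:
    coeffs = decompress(store.read(layer))  # expanded only for this layer
    result = unit(coeffs)
    del coeffs  # drop our reference so the expanded copy can be freed
    return result
```

With this arrangement the peak memory for coefficients is roughly the total compressed size plus the expanded size of the largest single layer, rather than the expanded size of every layer.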

Further, to achieve the above object, an inference processing method according to a third aspect is performed by a processing device that executes inference processing on data to be processed, in which a plurality of partial processing units, each executing the partial processing of one of the plurality of processing layers constituting the inference processing, are constructed. The method includes: acquiring stream data in which processing coefficient information corresponding to the processing coefficients used in the partial processing of the plurality of processing layers is arranged in the execution order of the processing layers; extracting the processing coefficient information in order from the beginning of the stream data; and supplying processing coefficients based on the processing coefficient information to the partial processing unit that uses the corresponding processing coefficients.

Further, to achieve the above object, an inference processing method according to a fourth aspect is performed by a processing device that executes inference processing on data to be processed. The processing device has a plurality of partial processing units, each executing the partial processing of one of the plurality of processing layers constituting the inference processing, and a memory unit that stores compressed processing coefficients obtained by compressing the processing coefficients used in at least one of the partial processing units. When first partial processing of the inference processing is executed, the compressed processing coefficients corresponding to the processing coefficients used by the first partial processing unit that executes the first partial processing are read from the memory unit, the read compressed processing coefficients are decompressed, and the decompressed processing coefficients are passed to the first partial processing unit.

According to the present invention, the memory capacity required for inference processing can be reduced.

FIG. 1 is a functional configuration diagram of a processing device according to the first embodiment.
FIG. 2 is a diagram illustrating the format of stream data according to the first embodiment.
FIG. 3 is a diagram illustrating the formats of the stream header section and the layer header section according to the first embodiment.
FIG. 4 is a diagram illustrating the format of the layer-specific information section according to the first embodiment.
FIG. 5 is a hardware configuration diagram of the processing device according to the first embodiment.
FIG. 6 is a functional configuration diagram of a processing device according to the second embodiment.
FIG. 7 is a functional configuration diagram of a processing device according to the third embodiment.

Several embodiments will be described with reference to the drawings. The embodiments described below do not limit the invention according to the claims, and not all of the elements described in the embodiments, or combinations thereof, are necessarily essential to the solution provided by the invention.

First, the processing device according to the first embodiment will be described.

FIG. 1 is a functional configuration diagram of the processing device according to the first embodiment.

The processing device 100 includes a processing control unit 10, one or more processing units 11 (11-1 to 11-n; an example of partial processing units), and one or more decoders 12 (12-1, 12-3, 12-n-1, etc.). Here, the processing control unit 10 corresponds to the stream data acquisition unit and the partial processing unit construction unit. The processing control unit 10 and the decoders 12 together correspond to the processing coefficient supply unit.

A convolutional neural network (CNN) that executes inference processing is implemented in the processing device 100. The CNN consists of a plurality of processing layers (layers); in the example of FIG. 1, it consists of layers 1 to N. The processing in each layer (partial processing) is executed by the corresponding processing unit 11. The CNN executes, for example, inference processing that infers what the image data to be processed represents (for example, whether it contains a person, a dog, a cat, and so on) and outputs the inference result. The number of layers in the CNN and the type of partial processing executed in each layer can be set arbitrarily. In this embodiment, both can be changed according to the contents of the stream data 50 described later.

The types of partial processing in the layers of the CNN shown in FIG. 1 are only an example. In that example, the processing units 11 include a convolution processing unit 11-1 that executes the partial processing of layer 1, a Relu processing unit 11-2 that executes the partial processing of layer 2, a convolution processing unit 11-3 that executes the partial processing of layer 3, a fully connected processing unit 11-n-1 that executes the partial processing of layer N-1, and a SoftMax processing unit 11-n that executes the partial processing of layer N.

The convolution processing units 11-1 and 11-3 perform convolution on the input image data using each of a plurality of filter data (an example of processing coefficients). The Relu processing unit 11-2 applies Relu (Rectified Linear Unit, i.e., the rectifier function) to each image generated by the immediately preceding layer; this is referred to as Relu processing. The fully connected processing unit 11-n-1 takes all of the results of the immediately preceding layer as input and combines them using a plurality of weighting coefficients (an example of processing coefficients).
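For readers less familiar with these layer types, the following dependency-free Python sketch shows what each kind of processing unit computes. It is a simplification, not the device's implementation: convolution is single-channel, stride 1 and unpadded (computed as cross-correlation, as CNN frameworks usually do), and the fully connected layer has weights only, with no bias term.

```python
import math
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D convolution with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)] for r in range(oh)]


def relu(image: Matrix) -> Matrix:
    """Rectifier: negative values become 0."""
    return [[max(0.0, v) for v in row] for row in image]


def fully_connected(inputs: List[float], weights: Matrix) -> List[float]:
    """Each output is a weighted sum over *all* outputs of the previous layer."""
    return [sum(w * x for w, x in zip(w_row, inputs)) for w_row in weights]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Only `conv2d` and `fully_connected` consume processing coefficients (filters and weights), which is why only those layer types carry coefficient data in the stream.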

The decoder 12 decompresses the compressed data (compressed processing coefficients: compressed filter data or compressed weight data) passed from the processing control unit 10 and passes the resulting data (processing coefficients) to the processing unit 11.

The processing control unit 10 acquires the stream data 50 and, in accordance with it, constructs the processing units 11 of the CNN and controls the decoders 12 to decompress the compressed data in the stream data 50. The stream data 50 may be read from, for example, the main memory 102 (see FIG. 5) or the auxiliary storage device 106 (see FIG. 5) described later, or may be received from an external device via the communication I/F 105 and a network. When it is received from an external device over a network, the processing units 11 can be constructed starting from the earlier layers, and CNN inference can begin, before the entire stream data 50 has been received. Inference can therefore finish earlier than if it were started only after all the necessary data had been received.
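The early start described above amounts to parsing layer sections incrementally as network chunks arrive. The Python generator below sketches this under a hypothetical framing in which each section is preceded by a 4-byte little-endian length; the actual format instead carries a layer size field in each layer header, and `layer_sections` is an illustrative name.

```python
import struct
from typing import Iterable, Iterator


def layer_sections(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield each layer section as soon as all of its bytes have arrived.

    Assumed framing (hypothetical): a 4-byte little-endian length, then the body.
    """
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        while len(buf) >= 4:
            (size,) = struct.unpack_from("<I", buf, 0)
            if len(buf) < 4 + size:
                break  # wait for more network data
            yield bytes(buf[4:4 + size])
            del buf[:4 + size]
    if buf:
        raise ValueError("stream ended in the middle of a layer section")
```

Because the consumer pulls one section at a time, the first layer can be constructed and run while later chunks are still being received.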

Before the operation of the processing control unit 10 is described in detail, the format of the stream data 50 will be described.

FIG. 2 is a diagram illustrating the format of stream data according to the first embodiment. FIG. 3 is a diagram illustrating the formats of the stream header section and the layer header section according to the first embodiment. FIG. 4 is a diagram illustrating the format of the layer-specific information section according to the first embodiment.

The stream data 50 includes a stream header section 51 and one or more layer sections 52, each corresponding to one of the layers constituting the CNN. The layer sections 52 are arranged in the execution order of the layers in the CNN. Each layer section 52 includes a layer header section 61, a layer-specific information section 62, and one or more channel sections 63. Depending on the type of CNN layer to which the layer section 52 corresponds, the layer-specific information section 62 and the channel sections 63 may be absent. Each channel section 63 contains one or more pieces of channel data 71.

As shown in FIG. 3(a), the stream header section 51 includes fields such as identifier, version, header size, stream size, number of layers, output vector size, CNN input image height, CNN input image width, CNN input image format, number of CNN input channels, and flags.

The identifier field stores an identifier that identifies the stream data 50. The version field stores the version to which the stream data 50 conforms. The header size field stores the size of the stream header section 51 (for example, in units of 8 bytes). The stream size field stores the size of the stream data 50 (for example, in units of 8 bytes). The number-of-layers field stores the number of layers of the CNN corresponding to the stream data 50. The output vector size field stores the size of the vector output as the result of the CNN. The CNN input image height field stores the height (number of vertical pixels) of the image data input to the CNN. The CNN input image width field stores the width (number of horizontal pixels) of the image data input to the CNN. The CNN input image format field stores information indicating the format (image format) of the image data input to the CNN, for example "FIXED8" for an 8-bit integer type or half-precision floating point (FP16). The number-of-CNN-input-channels field stores the number of input channels of the CNN (for example, the number of pieces of image data input as one processing target). The flags field contains various other information.
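The source names the stream header fields but not their byte widths, so any parser has to assume a layout. The Python sketch below uses a hypothetical 32-byte little-endian layout (a 4-byte identifier, 16-bit fields, 32-bit stream size and flags, padded to a multiple of 8 bytes) and hypothetical image format codes; only the field list and the 8-byte size units come from the description above.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout: identifier, version, header size, stream size, layers,
# output vector size, height, width, image format, input channels, flags, pad.
STREAM_HEADER = struct.Struct("<4sHHIHHHHHHI4x")  # 32 bytes
IMAGE_FORMATS = {0: "FIXED8", 1: "FP16"}  # hypothetical format codes


@dataclass
class StreamHeader:
    identifier: bytes
    version: int
    header_bytes: int
    stream_bytes: int
    num_layers: int
    output_vector_size: int
    input_height: int
    input_width: int
    input_format: str
    input_channels: int
    flags: int


def parse_stream_header(data: bytes) -> StreamHeader:
    (ident, version, header_units, stream_units, layers, out_size,
     height, width, fmt, channels, flags) = STREAM_HEADER.unpack_from(data, 0)
    return StreamHeader(ident, version,
                        header_units * 8,  # sizes are stored in 8-byte units
                        stream_units * 8,
                        layers, out_size, height, width,
                        IMAGE_FORMATS.get(fmt, "unknown"), channels, flags)
```

Storing sizes in 8-byte units keeps every section 8-byte aligned, which is why the hypothetical layout is padded to 32 bytes.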

As shown in FIG. 3(b), the layer header section 61 includes fields such as identifier, header size, layer size, and layer type. The identifier field stores an identifier that identifies the layer. The header size field stores the size of the layer header section 61 (for example, in units of 8 bytes). The layer size field stores the size of the layer section 52 (for example, in units of 8 bytes). The layer type field stores information indicating the type of the CNN layer (layer type information). The layer type information is, for example, "1" for a Convolution layer that performs convolution processing, "2" for a Pooling layer that performs pooling processing, "3" for a Relu layer that performs Relu processing, "4" for a FullConnection layer that performs fully connected processing, and "5" for a SoftMax layer that performs SoftMax processing.
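The layer type codes map naturally onto an enumeration, and the layer size field lets a reader skip from one layer section to the next. This Python sketch uses the codes listed above, but the 16-byte header layout (32-bit identifier, 16-bit header size, 32-bit layer size, 16-bit type, padding) is a hypothetical choice, as are the function names.

```python
import struct
from enum import IntEnum
from typing import NamedTuple


class LayerType(IntEnum):  # codes of the layer type field
    CONVOLUTION = 1
    POOLING = 2
    RELU = 3
    FULL_CONNECTION = 4
    SOFTMAX = 5


# Hypothetical layout: identifier u32, header size u16, layer size u32,
# layer type u16, padded to 16 bytes (two 8-byte units).
LAYER_HEADER = struct.Struct("<IHIH4x")


class LayerHeader(NamedTuple):
    identifier: int
    header_bytes: int
    layer_bytes: int
    layer_type: LayerType


def parse_layer_header(data: bytes, offset: int) -> LayerHeader:
    ident, hdr_units, layer_units, ltype = LAYER_HEADER.unpack_from(data, offset)
    # LayerType(...) raises ValueError for a code that is not in the table.
    return LayerHeader(ident, hdr_units * 8, layer_units * 8, LayerType(ltype))


def next_layer_offset(data: bytes, offset: int) -> int:
    """The layer size covers the whole layer section, header included."""
    return offset + parse_layer_header(data, offset).layer_bytes
```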

The layer-specific information section 62 is present, for example, when the layer type in the layer type field of the corresponding layer header section 61 is a Convolution layer, a Pooling layer, or a FullConnection layer, and is absent when the layer type is a Relu layer or a SoftMax layer.

When the layer type is a Convolution layer, the layer-specific information section 62 includes fields for the number of input channels, number of output channels, stride, kernel size, padding, and weight compression method, as shown in FIG. 4(a).

The number-of-input-channels field stores the number of channels input to this layer. The number-of-output-channels field stores the number of channels output from this layer. The stride field stores the amount by which the convolution filter is moved in the convolution processing (the stride). The kernel size field stores the kernel size of the convolution filter used in the convolution processing. The padding field stores the amount of padding added around the input data. The weight compression method field stores information (compression information) indicating the compression method applied to the filter data used in this layer. In this embodiment, the compression information is "0" for no compression, "1" for Pruning encoding, "2" for WeightSharing encoding, and "3" for predictive encoding.

When the layer type is a Pooling layer, the layer-specific information section 62 includes fields for the number of input channels, number of output channels, stride, and kernel size, as shown in FIG. 4(b).

The number-of-input-channels field stores the number of channels input to this layer. The number-of-output-channels field stores the number of channels output from this layer. The stride field stores the amount by which the processing window is moved in the pooling processing (the stride). The kernel size field stores the kernel size used for the pooling processing.
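The stride, kernel size and padding fields of the Convolution layer, and the stride and kernel size fields of the Pooling layer, determine each layer's output size. The Python sketch below uses the standard sliding-window formula, which is general knowledge rather than something stated in this document, together with an enumeration of the weight compression codes listed above. `filter_count` assumes a standard (non-grouped) convolution with one 2-D kernel per input/output channel pair, which is likewise an assumption.

```python
from enum import IntEnum


class WeightCompression(IntEnum):  # codes of the weight compression method field
    NONE = 0
    PRUNING = 1
    WEIGHT_SHARING = 2
    PREDICTIVE = 3


def conv_output_size(in_size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Output size along one axis: (in + 2*padding - kernel) // stride + 1."""
    if stride <= 0 or kernel <= 0:
        raise ValueError("kernel and stride must be positive")
    span = in_size + 2 * padding - kernel
    if span < 0:
        raise ValueError("kernel larger than the padded input")
    return span // stride + 1


def filter_count(in_channels: int, out_channels: int) -> int:
    """Number of 2-D kernels, assuming a standard (non-grouped) convolution."""
    return in_channels * out_channels
```

For pooling, call `conv_output_size` with `padding=0`, since the Pooling layer's fields include no padding.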

When the layer type is a FullConnection layer, the layer-specific information section 62 includes fields for the number of input channels, number of output channels, and weight compression method, as shown in FIG. 4(c).

The number-of-input-channels field stores the number of channels input to this layer. The number-of-output-channels field stores the number of channels output from this layer. The weight compression method field stores information (compression information) indicating the compression method applied to the weights for each input channel's data used in this fully connected layer. In this embodiment, the compression information is "0" for no compression, "1" for Pruning encoding, "2" for WeightSharing encoding, and "3" for predictive encoding.

The channel section 63 is present, for example, when the layer type in the layer type field of the corresponding layer header section 61 is a Convolution layer or a FullConnection layer.

The channel data 71 of a channel section 63 stores data (processing coefficient information, i.e., compressed processing coefficients) compressed by the method indicated by the compression information in the weight compression method field of the corresponding layer-specific information section 62: for example, compressed filter data (processing coefficients) for a convolution layer, or compressed weights (processing coefficients) for each input channel for a fully connected layer. The filter data may be compressed by, for example, predictive encoding. Predictive encoding of filter data compresses the data by predicting the filter data from reference filter data (for example, the immediately preceding filter data) and a prediction direction referenced to the pixels at a predetermined position (for example, the top edge or the left edge), and then encoding the difference between the actual and predicted filter data with a predetermined encoding (for example, Pruning encoding and WeightSharing encoding). When data is compressed by predictive encoding in this way, information indicating the prediction direction used to predictively encode each piece of channel data 71 (prediction direction information) is stored at the beginning of that channel data 71.
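The combination of a reference filter with Pruning and WeightSharing encoding of the residuals can be sketched as follows. This Python sketch predicts each flattened filter from the immediately preceding one, prunes residuals below a threshold, and replaces the rest with the nearest entry of a shared codebook; it omits the prediction-direction part, uses a fixed codebook rather than a learned one, and the sparse `(position, index)` layout and all names are hypothetical. Prediction is closed-loop (from the *reconstructed* previous filter) so that lossy coding does not let the encoder and decoder drift apart.

```python
from typing import List, Tuple

Codebook = List[float]
Encoded = List[List[Tuple[int, int]]]  # per filter: (position, codebook index)


def nearest(codebook: Codebook, v: float) -> int:
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - v))


def encode_filters(filters: List[List[float]], codebook: Codebook,
                   threshold: float) -> Encoded:
    prev = [0.0] * len(filters[0])  # the first filter is predicted from zeros
    encoded: Encoded = []
    for f in filters:
        entries: List[Tuple[int, int]] = []
        recon = list(prev)  # pruned positions reconstruct to the prediction
        for pos, (value, pred) in enumerate(zip(f, prev)):
            r = value - pred
            if abs(r) < threshold:
                continue  # pruning: small residuals are dropped
            idx = nearest(codebook, r)  # weight sharing: use a shared value
            entries.append((pos, idx))
            recon[pos] = pred + codebook[idx]
        encoded.append(entries)
        prev = recon  # predict the next filter from what the decoder will see
    return encoded


def decode_filters(encoded: Encoded, codebook: Codebook, size: int) -> List[List[float]]:
    prev = [0.0] * size
    out: List[List[float]] = []
    for entries in encoded:
        cur = list(prev)
        for pos, idx in entries:
            cur[pos] = prev[pos] + codebook[idx]
        out.append(cur)
        prev = cur
    return out
```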

Returning to FIG. 1, the processing operation of the processing control unit 10 will be described.

The processing control unit 10 reads the stream data 50 in order from the beginning. When it reads a layer section 52, it constructs the processing unit 11 corresponding to the layer type stored in the layer type field of that layer section's layer header section 61. For example, it constructs a convolution processing unit when the layer type is Convolution, a Pooling processing unit when it is Pooling, a Relu processing unit when it is Relu, a fully connected processing unit when it is FullConnection, and a SoftMax processing unit when it is SoftMax. If a layer-specific information section 62 follows the layer header section 61, the processing control unit 10 configures the processing unit 11 being constructed according to the contents of each field of that layer-specific information section 62.

In addition, when the layer-specific information part 62 of a layer part 52 has a weight compression method field (that is, when the layer is a convolution layer or a fully connected layer), the processing control unit 10 generates (or configures) and controls a decoder 12 that, according to the compression information in the weight compression method field, decompresses the channel data 71 of the channel part 63 and passes the decompressed data to the processing unit 11 that executes that layer. In this way, the processing control unit 10 can supply the necessary data to the processing unit 11. When prediction direction information for predictive coding is included at the beginning of the channel data 71, the decoder 12 decompresses the predictively coded data based on that prediction direction information.
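Selecting the decoder from the weight compression method field can be sketched as a second lookup table. The numeric method codes and the dictionary payloads (raw values, a weight-sharing codebook with indices, and a pruned sparse list) are hypothetical stand-ins for the binary channel data 71; prediction-direction handling is left out to keep the example focused on the selection step.

```python
from typing import Callable, Dict, List

# Hypothetical values of the weight compression method field.
RAW, WEIGHT_SHARING, PRUNED = 0, 1, 2


def decode_raw(payload: dict) -> List[float]:
    return list(payload["values"])


def decode_weight_sharing(payload: dict) -> List[float]:
    codebook = payload["codebook"]          # the shared weight values
    return [codebook[i] for i in payload["indices"]]


def decode_pruned(payload: dict) -> List[float]:
    out = [0.0] * payload["length"]         # pruned weights are zero
    for index, value in payload["nonzeros"]:
        out[index] = value
    return out


DECODERS: Dict[int, Callable[[dict], List[float]]] = {
    RAW: decode_raw,
    WEIGHT_SHARING: decode_weight_sharing,
    PRUNED: decode_pruned,
}


def make_decoder(method: int) -> Callable[[dict], List[float]]:
    """Pick the decompression routine named by the layer's compression field."""
    try:
        return DECODERS[method]
    except KeyError:
        raise ValueError(f"unknown compression method: {method}") from None
```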

Next, the hardware configuration of the processing device 100 will be described in detail.

FIG. 5 is a hardware configuration diagram of the processing device according to the first embodiment.

The processing device 100 is, for example, a computer that includes a CPU (Central Processing Unit) 101, a main memory 102, a GPU (Graphics Processing Unit) 103, a reader/writer 104, a communication interface (communication I/F) 105, an auxiliary storage device 106, an input/output interface (input/output I/F) 107, a display device 108, and an input device 109. The CPU 101, main memory 102, GPU 103, reader/writer 104, communication I/F 105, auxiliary storage device 106, input/output I/F 107, and display device 108 are connected via a bus 110.

The CPU 101 performs overall control of the processing device 100. The CPU 101 executes various processes by loading programs stored in the auxiliary storage device 106 into the main memory 102 and executing them. In this embodiment, by executing the processing program stored in the auxiliary storage device 106, the CPU 101 implements, for example, the processing control unit 10, the ReLU processing unit 11-2, the SoftMax processing unit 11-n, and the decoder 12.

The main memory 102 is, for example, RAM or ROM, and stores programs executed by the CPU 101 (such as the processing program) and various kinds of information. The auxiliary storage device 106 is a non-transitory (non-volatile) storage device such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive), and stores programs executed by the CPU 101 and various kinds of information. This information includes, for example, stream data such as that shown in FIG. 2 and data to be processed, such as image data. When the data to be processed is image data, it may be, for example, image data captured by a camera (not shown) of the processing device 100.

The GPU 103 is a processor suited to specific kinds of processing, such as image processing, and in particular to processing performed in parallel. In this embodiment, the GPU 103 executes predetermined processing according to instructions from the CPU 101 and implements, for example, the convolution processing unit 11-1 and the fully connected processing unit 11-n-1.

The reader/writer 104 accepts a removable recording medium 111, and reads data from and writes data to the recording medium 111. Examples of the recording medium 111 include non-transitory (non-volatile) recording media such as an SD memory card, an FD (floppy disk, registered trademark), a CD, a DVD, a BD (registered trademark), and flash memory. In this embodiment, the processing program, the stream data, or the data to be processed may be stored on the recording medium 111 and read out by the reader/writer 104 for use.

The communication I/F 105 is connected to a network (communication network) and transmits and receives data to and from other devices (external devices) connected to the network. For example, the stream data, the data to be processed, and the like may be received via the communication I/F 105 from another device connected to the network.

The input/output I/F 107 is connected to an input device 109 such as a mouse or keyboard, and receives the user's operation input from the input device 109.

The display device 108 is, for example, a liquid crystal display, and displays various kinds of information.

Next, the processing operation of the processing device 100 shown in FIG. 1 will be described.

The processing control unit 10 of the processing device 100 starts acquiring predetermined stream data 50 used to execute the inference processing. When the processing control unit 10 receives the layer part 52 of the acquired stream data 50 that corresponds to the convolution processing unit 11-1 of CNN layer 1, it generates the convolution processing unit 11-1 and the decoder 12-1. The processing control unit 10 then sequentially supplies the compressed filter data contained in each piece of channel data 71 of this layer part 52 to the decoder 12-1. The compressed filter data may be supplied to the decoder 12-1, for example, whenever the convolution processing unit 11-1 requests filter data.
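Supplying compressed filter data on request maps naturally onto a Python generator: a filter is decompressed only when the convolution unit asks for the next one, so only one decompressed filter needs to exist at a time. The sketch assumes a weight-sharing payload (`codebook` plus `indices`) and one-dimensional signals; both are illustrative and are not the format of the stream data 50. The `log` list is added only to make the on-demand behaviour observable.

```python
from typing import Iterable, Iterator, List


def filter_decoder(channel_data: Iterable[dict], log: List[int]) -> Iterator[List[float]]:
    """Decompress one filter per request; `log` records which ones were decoded."""
    for i, payload in enumerate(channel_data):
        log.append(i)
        codebook = payload["codebook"]          # assumed weight-sharing format
        yield [codebook[j] for j in payload["indices"]]


def convolve_1d(x: List[float], k: List[float]) -> List[float]:
    """'Valid' 1-D cross-correlation of signal x with filter k."""
    n = len(k)
    return [sum(k[j] * x[i + j] for j in range(n)) for i in range(len(x) - n + 1)]


def conv_layer(x: List[float], filters: Iterator[List[float]],
               out_channels: int) -> List[List[float]]:
    """Ask the decoder for the next filter once per output channel."""
    return [convolve_1d(x, next(filters)) for _ in range(out_channels)]
```

Because a generator body does not run until `next()` is called, creating the decoder costs nothing; each filter is decompressed at the moment the convolution needs it.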

Meanwhile, the processing control unit 10 inputs the image data to be processed into the convolution processing unit 11-1 of layer 1. The convolution processing unit 11-1 convolves the input image data with each filter supplied from the decoder 12-1, and outputs all the results obtained with the respective filters to the processing unit 11 of the subsequent layer 2 (the ReLU processing unit 11-2 in the example of FIG. 1). If the results can be passed directly to the processing unit 11 of layer 2, they may be passed directly; otherwise, they may be stored temporarily in a predetermined buffer that the processing unit 11 of layer 2 can read.

When the processing control unit 10 receives the layer part 52 of the stream data 50 corresponding to the ReLU processing unit 11-2 of CNN layer 2, it generates the ReLU processing unit 11-2. The ReLU processing unit 11-2 takes as input each result image produced by the convolution processing unit 11-1, applies ReLU processing, and outputs the results to the subsequent layer 3.

Likewise, each time the processing control unit 10 receives the layer part 52 corresponding to the next layer in the stream data 50, it generates the processing unit 11 for that layer (the convolution processing unit 11-3, the fully connected processing unit 11-n-1, the SoftMax processing unit 11-n, and so on) and, where needed, a decoder 12 (12-3, 12-n-1, and so on) that decompresses data and supplies it to that processing unit 11. Through this series of steps, the final layer of the CNN outputs the result of the inference processing (inference result). The inference result may be output to the main memory 102 or elsewhere. The processing device 100 may display the inference result on the display device 108 or perform further processing based on it.
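The overall flow, in which each layer is built and run as its record arrives, can be sketched as a single loop over the stream. The record layout (a `type` key standing in for the layer type field and a `channels` payload in an assumed weight-sharing format) and the three layer kinds are hypothetical simplifications; the point is that the loop never needs the whole network description or all coefficients at once.

```python
import math
from typing import Callable, Iterable, List

Vector = List[float]


def decode_weights(payload: dict) -> List[Vector]:
    """Assumed weight-sharing payload: a codebook plus one index row per output."""
    cb = payload["codebook"]
    return [[cb[i] for i in row] for row in payload["indices"]]


def softmax(x: Vector) -> Vector:
    m = max(x)                       # shift by the max for numerical stability
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]


def build(record: dict) -> Callable[[Vector], Vector]:
    kind = record["type"]            # stands in for the layer type field
    if kind == "FullyConnected":
        w = decode_weights(record["channels"])
        return lambda x: [sum(a * b for a, b in zip(row, x)) for row in w]
    if kind == "Relu":
        return lambda x: [max(0.0, v) for v in x]
    if kind == "SoftMax":
        return softmax
    raise ValueError(f"unsupported layer type: {kind}")


def run_stream(stream: Iterable[dict], x: Vector) -> Vector:
    """Build and run each layer as its record arrives, in stream order."""
    for record in stream:
        unit = build(record)         # rebinding drops the previous layer's unit
        x = unit(x)
    return x
```

Passing an iterator (for example, records parsed from a network socket) instead of a list works unchanged, since the loop consumes records strictly in order.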

With the processing device 100 of the first embodiment described above, each layer of the CNN can be constructed appropriately according to the stream data 50, and the data required by the partial processing unit that executes each layer can be obtained from the stream data 50 and supplied to it. Consequently, the main memory 102 does not need to hold in advance all the coefficients used by the processing units of every CNN layer, nor the configuration information of the entire CNN, which reduces the capacity required of the main memory 102. Furthermore, because each layer is built according to the stream data 50, a CNN that performs a desired inference process can easily be constructed simply by changing the contents of the stream data 50. In addition, because the channel data in the stream data 50 can be compressed, the size of the stream data 50 can be reduced, shortening the time needed to read or transmit it.

Next, a processing device according to the second embodiment will be described. In the second embodiment, parts similar to those of the processing device according to the first embodiment are denoted by the same reference numerals.

FIG. 6 is a functional configuration diagram of the processing device according to the second embodiment.

The processing device 100A according to the second embodiment includes one or more processing units (partial processing units) 11 (11-1 to 11-n), one or more decoders 13 (13-1, 13-3, 13-n-1, etc.), FIFOs (First In, First Out) 14, a construction unit 15, and a memory unit 16. The stream data in this embodiment is the stream data 50 of the first embodiment with the data on the CNN configuration removed; for example, it contains only the channel parts 63, which hold the data (processing coefficient information) to be set in the processing units 11. The processing device 100A may be implemented with the hardware shown in FIG. 5. For example, the construction unit 15, the ReLU processing unit 11-2, the fully connected processing unit 11-n-1, the SoftMax processing unit 11-n, and the decoders 13 may be implemented by the CPU 101; the FIFOs 14 by the CPU 101 and the main memory 102; the memory unit 16 by the main memory 102; and the convolution processing units 11-1, 11-3, etc. by the GPU 103.

The memory unit 16 stores information 16a indicating the structure of each CNN layer (neural network structure information). The neural network structure information 16a identifies which processing units, decoders, FIFOs, and so on make up each layer of the CNN, and how they are connected.

Based on the neural network structure information 16a in the memory unit 16, the construction unit 15 creates the processing units 11, decoders 13, and FIFOs 14 that make up the CNN, and sets up the connections among them.

A decoder 13 is provided for each processing unit 11 that uses data from the channel parts 63 of the stream data. The decoder 13 acquires the stream data from the upstream side, extracts from it the channel part 63 containing the data used by its own processing unit 11, decompresses that data, and supplies it to that processing unit 11. The decoder 13 passes the data that follows its own channel part 63 in the stream data unchanged to the FIFO 14 connected downstream.

The FIFO 14 stores the stream data input from the upstream decoder 13 and passes it to the downstream decoder 13 in the order received, starting with the data that was input first.
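The decoder chain of the second embodiment can be sketched with `collections.deque` as the FIFO between decoders: each decoder takes the channel part at the head of its input and forwards the remainder downstream. The sketch assumes one channel part per decoder, arranged in layer execution order, and a weight-sharing payload; the decoders run one after another here as a stand-in for concurrently operating stages, and the layer names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class ChainDecoder:
    """Keeps the channel part at the head of its input and forwards the rest."""

    def __init__(self, name: str, downstream: Optional[Deque[dict]]):
        self.name = name
        self.downstream = downstream            # FIFO to the next decoder, or None
        self.coefficients: Optional[List[float]] = None

    def consume(self, upstream: Deque[dict]) -> None:
        payload = upstream.popleft()            # first channel part is ours
        codebook = payload["codebook"]          # assumed weight-sharing format
        self.coefficients = [codebook[i] for i in payload["indices"]]
        while upstream:                         # pass the remainder downstream
            if self.downstream is None:
                raise ValueError("more channel parts than decoders")
            self.downstream.append(upstream.popleft())


def run_chain(stream: List[dict], names: List[str]) -> Dict[str, List[float]]:
    """Wire decoders in execution order with a FIFO in front of each one."""
    fifos: List[Deque[dict]] = [deque() for _ in names]
    decoders = [ChainDecoder(name, fifos[i + 1] if i + 1 < len(names) else None)
                for i, name in enumerate(names)]
    fifos[0].extend(stream)
    for fifo, decoder in zip(fifos, decoders):  # sequential stand-in for a pipeline
        decoder.consume(fifo)
    return {d.name: d.coefficients for d in decoders}
```

Because the stream carries only channel parts in execution order, no decoder needs to know about any layer other than its own.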

With the processing device 100A of the second embodiment described above, the partial processing in each processing unit 11 can be executed by transmitting the processing coefficient information used by the processing units 11 as stream data. The main memory 102 therefore does not need to hold in advance all the processing coefficients used by the processing units 11 of every CNN layer, which limits the required capacity of the main memory 102. In addition, because the processing coefficient information stored in the channel parts 63 of the stream data is compressed, the size of the stream data is reduced, shortening the time needed to read or transmit it.

Next, a processing device according to the third embodiment will be described. In the third embodiment, parts similar to those of the processing devices according to the first and second embodiments are denoted by the same reference numerals.

FIG. 7 is a functional configuration diagram of the processing device according to the third embodiment.

The processing device 100B according to the third embodiment includes one or more processing units (partial processing units) 11 (11-1 to 11-n), one or more decoders (decompression processing units) 12 (12-1, 12-3, 12-n-1, etc.), a read control unit 17, and a memory unit 16. The processing device 100B may be implemented with the hardware shown in FIG. 5. For example, the read control unit 17, the ReLU processing unit 11-2, the fully connected processing unit 11-n-1, the SoftMax processing unit 11-n, and the decoders 12 may be implemented by the CPU 101; the convolution processing units 11-1, 11-3, etc. by the GPU 103; and the memory unit 16 by the main memory 102.

The memory unit 16 stores the neural network structure information 16a, compressed filter data 16b, and compressed weight data 16c; the latter two are examples of compressed coefficient information. The compressed filter data 16b is the compressed form of the filter data used by all the convolution processing units 11-1, 11-3, etc. of the CNN, and the compressed weight data 16c is the compressed form of the weights used by the fully connected processing unit 11-n-1.

When the read control unit 17 receives from a processing unit 11 of the CNN a request for the processing coefficients needed to execute its processing, it retrieves from the memory unit 16 the compressed coefficient data containing those coefficients (part of the compressed filter data 16b or of the compressed weight data 16c) and sends it to the decoder 12 connected to that processing unit 11. A processing unit 11 may request its processing coefficients, for example, immediately before it executes its processing, or while the processing of a relatively nearby layer is being executed.
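The two request timings can be sketched as a read controller with a small lookahead: a request returns the compressed coefficients for the requested layer and, at the same time, reads ahead the compressed data for the next layer(s) in execution order, modelling a request issued while a nearby layer is running. The `lookahead` parameter, the staging dictionary, the `fetches` record, and the weight-sharing `decode` function standing in for decoder 12 are all assumptions made for illustration.

```python
from typing import Dict, List


def decode(payload: dict) -> List[float]:
    """Stand-in for decoder 12 (assumed weight-sharing format)."""
    return [payload["codebook"][i] for i in payload["indices"]]


class ReadController:
    """Returns compressed coefficients on request and reads ahead a few layers."""

    def __init__(self, store: Dict[str, dict], order: List[str], lookahead: int = 1):
        self.store = store            # compressed coefficients held in memory
        self.order = order            # execution order of layers that need them
        self.lookahead = lookahead    # how many upcoming layers to read ahead
        self.staged: Dict[str, dict] = {}
        self.fetches: List[str] = []  # order of reads, for inspection

    def _read(self, layer: str) -> dict:
        self.fetches.append(layer)
        return self.store[layer]

    def request(self, layer: str) -> dict:
        payload = self.staged.pop(layer, None)
        if payload is None:                       # not read ahead: read it now
            payload = self._read(layer)
        pos = self.order.index(layer)
        for nxt in self.order[pos + 1:pos + 1 + self.lookahead]:
            if nxt not in self.staged:            # read ahead only nearby layers
                self.staged[nxt] = self._read(nxt)
        return payload
```

Usage: `decode(ctrl.request("conv1"))` yields the coefficients for the first layer, while the compressed data for the next layer is already staged. A small lookahead keeps the number of outstanding requests, and the memory they pin, bounded.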

With the processing device 100B of the third embodiment described above, the coefficient data used by the processing units 11 is stored in compressed form in the main memory 102, which limits the capacity required of the main memory 102. In addition, because only the processing units 11 of layers close to the one currently being processed request processing coefficients, the number of processing units 11 requesting coefficients can be kept small, which comparatively reduces both the load on the CPU 101 and the amount of the main memory 102 in use.

The present invention is not limited to the embodiments described above and may be modified as appropriate without departing from the spirit of the invention.

For example, each of the above embodiments shows the present invention applied to a processing device configured with a CNN that performs image recognition, but the present invention can also be applied to a processing device configured with a neural network that performs inference processing other than image recognition.

In the first and second embodiments, the processing coefficients, such as the filter data used by the processing units 11, are included in the stream data 50 in compressed form, but the present invention is not limited to this; at least some of these processing coefficients may be included in the stream data 50 uncompressed. Even then, the main memory 102 does not need to hold the entire stream data 50, so the capacity required of the main memory 102 can be kept down and the capacity of the main memory 102 available for other processing can be increased accordingly.

In the second embodiment, a FIFO 14 is provided between adjacent decoders 13, but the present invention is not limited to this; the function of the FIFO 14 may instead be built into the decoders 13.

In each of the above embodiments, at least some of the functional units implemented by the CPU 101 or the GPU 103 executing programs may instead be implemented by another processor or by a hardware circuit that executes specific processing. For example, the processing units 11 implemented by the GPU 103 (such as the convolution processing units 11-1, 11-3, etc.) may be implemented by the CPU 101 or by a separate hardware circuit. Likewise, the decoders 13 implemented by the CPU 101 may be implemented by a separate hardware circuit.

10... processing control unit, 11... processing unit, 12, 13... decoder, 14... FIFO, 15... construction unit, 16... memory unit, 17... read control unit, 50... stream data, 100, 100A, 100B... processing device, 101... CPU, 102... main memory, 103... GPU

Claims

1. A processing device that performs inference processing on data to be processed, the processing device comprising:
a plurality of partial processing units that execute respective partial processes in a plurality of processing layers constituting the inference processing;
a stream data acquisition unit that acquires stream data in which processing coefficient information, corresponding to processing coefficients used in the partial processes in the plurality of processing layers constituting the inference processing, is arranged in the execution order of the processing layers; and
a processing coefficient supply unit that extracts the processing coefficient information from the stream data and supplies the processing coefficient corresponding to the processing coefficient information to the partial processing unit that uses that processing coefficient.

2. The processing device according to claim 1, wherein at least part of the processing coefficient information in the stream data is a compressed processing coefficient obtained by compressing the processing coefficient, and
the processing coefficient supply unit decompresses the compressed processing coefficient and supplies the decompressed processing coefficient to the partial processing unit.

3. The processing device according to claim 1, wherein the stream data includes partial processing specifying information that specifies each partial process in the plurality of processing layers,
the processing device further comprising a partial processing unit construction unit that constructs, based on the partial processing specifying information in the stream data, the partial processing unit that executes the corresponding partial process.

4. The processing device according to claim 2, wherein the stream data includes compression information indicating the method by which the processing coefficient was compressed into the compressed processing coefficient, and
the processing coefficient supply unit decompresses the compressed processing coefficient based on the compression information.

5. The processing device according to claim 4, wherein the processing coefficient includes filter data consisting of values corresponding to respective regions of a two-dimensional plane, and the compressed processing coefficient obtained by compressing the filter data is compressed filter data obtained by predictively encoding the filter data,
the compression information includes prediction direction information indicating the prediction direction used when predictively encoding the filter data, and
the processing coefficient supply unit decompresses the compressed filter data based on the prediction direction indicated by the prediction direction information.

6. The processing device according to any one of claims 1 to 5, wherein the stream data is received from an external device via a communication network.

7. An inference processing method performed by a processing device that executes inference processing on data to be processed, the method comprising:
constructing, in the processing device, a plurality of partial processing units that execute respective partial processes in a plurality of processing layers constituting the inference processing;
acquiring stream data in which processing coefficient information, corresponding to processing coefficients used in the partial processes in the plurality of processing layers constituting the inference processing, is arranged in the execution order of the processing layers; and
extracting the processing coefficient information in order from the beginning of the stream data, and supplying the processing coefficient based on the processing coefficient information to the partial processing unit that uses the processing coefficient corresponding to that processing coefficient information.