JP2020112840A

JP2020112840A - Data processing device, data processing system, data processing method and program

Info

Publication number: JP2020112840A
Application number: JP2018235212A
Authority: JP
Inventors: Hanno Lieske
Original assignee: Renesas Electronics Corp
Current assignee: Renesas Electronics Corp
Priority date: 2018-12-17
Filing date: 2018-12-17
Publication date: 2020-07-27

Abstract

PROBLEM TO BE SOLVED: To implement a decision tree efficiently in a data processing device. SOLUTION: A data processing device comprises a plurality of comparators, a combination operation unit, and a data controller. Each comparator receives a plurality of input data vectors and outputs a comparison result vector. The combination operation unit combines the comparison result vectors output from the comparators and outputs a binary output vector. The data controller controls data processing based on the binary output vector. SELECTED DRAWING: Figure 3

Description

The present invention relates to a data processing device, a data processing system, a data processing method, and a program.

Decision-tree-based algorithms for classifying images and motion are widely used in many technical fields, such as image processing and motion recognition. A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible outcomes.

Image classification, for example the ACF (aggregated channel features) algorithm, is based on decision trees. This algorithm approximates features at a higher level while achieving state-of-the-art performance, and it is faster than the earlier HOG (histogram of oriented gradients) algorithm.

In Patent Document 1, a conditional move mask for a conditional move with operand selection is generated on the basis of a generated mask. The generated mask is the output of a single SIMD (Single Instruction/Multiple Data) style operation on multiple operands.

Patent Document 2 describes how to speed up code consisting of many conditional instructions that perform SIMD-style functions based on the evaluation of a single comparison condition. The solution is to add a branching capability through the compiler.

Patent Document 3 discloses a method of mapping elements of an input vector to elements of an output vector in SIMD style. The mapping is controlled by a single status bit comparison, with one output per vector.

Non-Patent Document 1 discloses the ACF (aggregated channel features) algorithm, which is based on decision trees. It shows that the ACF algorithm achieves state-of-the-art performance in pedestrian detection.

Non-Patent Document 2 shows that real-time multi-view face detection is possible with the ACF algorithm.

Patent Document 1: US Patent No. 7,480,787
Patent Document 2: JP 2009-128989 A
Patent Document 3: US Patent Application Publication No. 2013/0212355

Non-Patent Document 1: R. Zabih and J. Woodfill, "Fast Feature Pyramids for Object Detection," in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014; 36(8):1532-1545.
Non-Patent Document 2: B. Yang, J. Yan, Z. Lei, S. Z. Li, "Aggregate Channel Features for Multi-view Face Detection," in 2014 IEEE International Joint Conference on Biometrics (IJCB), 2014, pp. 1-8.

The non-patent documents describe the internal structure of the ACF algorithm itself and how to use it in real-time applications. However, they give no details about a hardware implementation of ACF.

In Patent Documents 1 to 3, a single operation output is used in the control path of each element of a SIMD vector. Non-Patent Documents 1 and 2 introduce the concept of a decision tree without providing implementation details. The problem is that implementing a decision tree requires evaluating more than one decision. With the approaches of Patent Documents 1 to 3, this can only be done by performing single operations one after another, which takes a great deal of time. For processing based on a two-level decision tree (that is, one with three feature nodes), executing single operations sequentially requires a relatively large number of cycles.

Because decision tree evaluation is the main processing performed inside the ACF algorithm, the achievable execution speed is important in real-time applications.

A data processing device according to an embodiment includes: a plurality of comparators, each of which receives a plurality of input data vectors and outputs a comparison result vector; a combination operation unit that combines the comparison result vectors output from the comparators and outputs a binary output vector; and a data controller that controls data processing based on the binary output vector.

A data processing system according to an embodiment includes a memory unit, a bus, and a data processing device that communicates with the memory unit via the bus. The data processing device includes: a plurality of comparators, each of which receives a plurality of input data vectors and outputs a comparison result vector; a combination operation unit that combines the comparison result vectors output from the comparators and outputs a binary output vector; and a data controller that controls data processing based on the binary output vector.

A data processing method according to an embodiment includes receiving a plurality of input data vectors and outputting comparison result vectors, combining the comparison result vectors output from the comparators to output a binary output vector, and controlling data processing based on the binary output vector.

A data processing program according to an embodiment causes a computer to execute: a process of receiving a plurality of input data vectors and outputting comparison result vectors; a process of combining the comparison result vectors output from the comparators to output a binary output vector; and a process of controlling data processing based on the binary output vector.

According to an embodiment, a decision tree can be implemented efficiently in a data processing device.

FIG. 1 is a diagram schematically showing the implementation of the data processing device according to Embodiment 1 in a data processing system.
FIG. 2 is a diagram showing the application of a decision tree to the data processing device according to Embodiment 1.
FIG. 3 is a diagram schematically showing the configuration of the data processing device according to Embodiment 1.
FIG. 4 is a diagram schematically showing data processing in the data processing device according to Embodiment 1.
FIG. 5 is a diagram schematically showing the configuration of the data processing device according to Embodiment 2.
FIG. 6 is a diagram schematically showing the configuration of the data processing device according to Embodiment 3.

Embodiments of the present invention are described below with reference to the drawings. In the drawings, the same elements are denoted by the same reference signs, and redundant description is omitted where appropriate.

Embodiment 1

The data processing device according to Embodiment 1 is described first. The data processing device 100 according to Embodiment 1 is configured to perform processing for a decision tree. FIG. 1 schematically shows the implementation of the data processing device 100 in a data processing system 1000.

As shown in FIG. 1, the data processing system 1000 includes a central processing unit (CPU) 101, a memory 102, a processor 103, and a bus 104. The data processing system 1000 is configured to be capable of SIMD (Single Instruction/Multiple Data) processing.

The CPU 101, the memory 102, the processor 103, and the other modules and components of the data processing system 1000 are connected by the bus 104, over which they can exchange data, parameters, programs, algorithms, and the like. The CPU 101 performs various processes for controlling operations in the data processing system 1000. The memory 102 stores data, parameters, programs, algorithms, and the like, and provides them to the CPU 101, the processor 103, and the other modules and components of the data processing system 1000 on request.

The processor 103 includes the data processing device 100. For example, the processor 103 may include a logical operation unit, and the data processing device 100 may be part of that logical operation unit.

In this embodiment, the data processing device 100 controls the data to be output on the basis of a decision tree. The decision tree applied to the data processing device 100 is described next. FIG. 2 shows the application of a decision tree to the data processing device 100 according to Embodiment 1.

First, the first input data Y, which is a single value, is compared with the first threshold TH1. If the first input data Y is smaller than the first threshold TH1 (Y < TH1), the result of the first comparison C1 is "true" (the "Y", that is, yes, output of the first comparison C1). If the first input data Y is greater than or equal to the first threshold TH1 (Y ≥ TH1), the result of the first comparison C1 is "false" (the "N", that is, no, output of the first comparison C1).

If the result of the first comparison C1 is "true" (that is, Y < TH1), the second input data U, which is a single value, is compared with the second threshold TH2. If the second input data U is smaller than the second threshold TH2 (U < TH2), the result of the second comparison C2 is "true" (the "Y" output of the second comparison C2), and "0x1" is output as the result value R. Here "0x**" denotes a hexadecimal number, where each "*" is one of the digits 0 to 9 and A to F. If the second input data U is greater than or equal to the second threshold TH2 (U ≥ TH2), the result of the second comparison C2 is "false" (the "N" output of the second comparison C2), and "0x2" is output as the result value R.

If the result of the first comparison C1 is "false" (that is, Y ≥ TH1), the second input data U is compared with the third threshold TH3. If the second input data U is smaller than the third threshold TH3 (U < TH3), the result of the third comparison C3 is "true" (the "Y" output of the third comparison C3), and "0x3" is output as the result value R. If the second input data U is greater than or equal to the third threshold TH3 (U ≥ TH3), the result of the third comparison C3 is "false" (the "N" output of the third comparison C3), and "0x4" is output as the result value R.
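The two-level decision tree of FIG. 2 can be sketched in scalar form as follows. This is only an illustrative sketch: the function name `decision_tree` and its parameter names are assumptions, and the text gives no value for TH3, so any value passed for `th3` is hypothetical.

```python
def decision_tree(y, u, th1, th2, th3):
    """Scalar sketch of the two-level decision tree of FIG. 2.

    Returns the result value R (0x1 to 0x4). All names are illustrative.
    """
    if y < th1:                            # first comparison C1 is "true"
        return 0x1 if u < th2 else 0x2     # second comparison C2
    return 0x3 if u < th3 else 0x4         # third comparison C3


# Example call; the threshold 0x40 for TH3 is a made-up value.
r = decision_tree(0x20, 0x10, th1=0x80, th2=0xC0, th3=0x40)
```

Evaluated this way, each input pair passes through two dependent comparisons and branches, which is the sequential cost the device is meant to avoid.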

In this embodiment, the data processing device 100 executes a data move operation based on the result of the decision tree processing. The configuration and operation of the data processing device 100 are described next. FIG. 3 schematically shows the configuration of the data processing device 100 according to Embodiment 1. FIG. 4 schematically shows the data processing performed by the data processing device 100 according to Embodiment 1.

The data processing device 100 includes a comparator 11, a comparator 12, a combination operation unit 1, and a data controller 2.

The comparator 11 reads an input data vector Yin, for example from the memory 102. The input data vector contains a plurality of values. Here, the input data vector Yin is assumed to contain four 8-bit values Y1 to Y4, each represented by a two-digit hexadecimal number "0x**", where each "*" is a hexadecimal digit. That is, Yin = {Y1, Y2, Y3, Y4}. For example, if Y1 = 0x20, Y2 = 0xA0, Y3 = 0x60, and Y4 = 0xC0, then Yin = 0x20A060C0.

The comparator 11 also reads a threshold input data vector T1, for example from the memory 102. The threshold input data vector T1 contains a plurality of values. Here, the threshold input data vector T1 is likewise assumed to contain four 8-bit values T11 to T14, each represented by a two-digit hexadecimal number. In this embodiment, T11 to T14 have the same value. That is, T1 = {T11, T12, T13, T14} and T11 = T12 = T13 = T14 = TH1. For example, if TH1 = 0x80, then T1 = 0x80808080.

The comparator 11 compares the input data vector Yin with the threshold input data vector T1. That is, each of the values Y1 to Y4 is compared with the threshold TH1. In this embodiment, the comparator 11 determines whether each input value (each of Y1 to Y4) is smaller than the threshold TH1. With Yin = 0x20A060C0 and T1 = 0x80808080 as above, Y1 and Y3 are smaller than TH1 ("true"), while Y2 and Y4 are greater than or equal to TH1 ("false"). In this case, the comparator 11 outputs 0x0000000A as the comparison result vector R1, where the hexadecimal digit A is "1010" in binary.
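The lane-wise comparison can be illustrated with a short software model. This is a sketch under assumptions, not the hardware itself: the helper names `unpack_lanes` and `compare_less_than` are hypothetical, and the bit order (first value mapped to the most significant result bit) is inferred from the example values in the text.

```python
def unpack_lanes(word, lanes=4, bits=8):
    """Split a packed word into lane values, first (most significant) lane first."""
    mask = (1 << bits) - 1
    return [(word >> (bits * (lanes - 1 - i))) & mask for i in range(lanes)]


def compare_less_than(data_word, threshold_word, lanes=4, bits=8):
    """Return one result bit per lane: 1 where data < threshold, else 0."""
    result = 0
    data = unpack_lanes(data_word, lanes, bits)
    thresholds = unpack_lanes(threshold_word, lanes, bits)
    for d, t in zip(data, thresholds):
        result = (result << 1) | (1 if d < t else 0)
    return result


# Values from the text: Yin = 0x20A060C0, T1 = 0x80808080.
R1 = compare_less_than(0x20A060C0, 0x80808080)   # 0b1010, that is 0xA
```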

The comparator 12 reads an input data vector Uin, for example from the memory 102. Here, the input data vector Uin is likewise assumed to contain four 8-bit values U1 to U4, each represented by a two-digit hexadecimal number. That is, Uin = {U1, U2, U3, U4}. For example, if U1 = 0x10, U2 = 0x30, U3 = 0xE0, and U4 = 0xF0, then Uin = 0x1030E0F0.

The comparator 12 reads a threshold input data vector T2, for example from the memory 102. Here, the threshold input data vector T2 is likewise assumed to contain four 8-bit values T21 to T24, each represented by a two-digit hexadecimal number. In this embodiment, T21 to T24 have the same value. That is, T2 = {T21, T22, T23, T24} and T21 = T22 = T23 = T24 = TH2. For example, if TH2 = 0xC0, then T2 = 0xC0C0C0C0.

The comparator 12 compares the input data vector Uin with the threshold input data vector T2. That is, each of the values U1 to U4 is compared with the threshold TH2. In this embodiment, the comparator 12 determines whether each input value (each of U1 to U4) is smaller than the threshold TH2. With Uin = 0x1030E0F0 and T2 = 0xC0C0C0C0 as above, U1 and U2 are smaller than TH2 ("true"), while U3 and U4 are greater than or equal to TH2 ("false"). In this case, the comparator 12 outputs 0x0000000C as the comparison result vector R2, where the hexadecimal digit C is "1100" in binary.

The combination operation unit 1 performs a logical product (AND) operation on the comparison result vectors R1 and R2 and outputs the result as a binary output vector BIN. In this case, the combination operation unit 1 ANDs R1 = 1010 and R2 = 1100 and outputs the binary output vector BIN = 1000.
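The combination step amounts to a single bitwise AND of the two comparison masks. A minimal sketch, with the function name `combine_and` chosen here only for illustration:

```python
def combine_and(r1, r2):
    """Bitwise AND of two comparison result vectors (one bit per lane)."""
    return r1 & r2


# Values from the text: R1 = 1010, R2 = 1100.
BIN = combine_and(0b1010, 0b1100)
print(format(BIN, "04b"))
```

A set bit in BIN marks a position where both Y < TH1 and U < TH2 hold, which corresponds to the leaf of FIG. 2 that yields the result value 0x1.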

The data controller 2 controls an output data vector DC using input data vectors DA and DB in accordance with the binary output vector BIN. Like the input data vectors Yin and Uin and the threshold input data vectors T1 and T2, the input data vector DA, the input data vector DB, and the output data vector DC each consist of four 8-bit values represented by two-digit hexadecimal numbers. That is, DA = {DA1, DA2, DA3, DA4}, DB = {DB1, DB2, DB3, DB4}, and DC = {DC1, DC2, DC3, DC4}, where DA1 to DA4, DB1 to DB4, and DC1 to DC4 are two-digit hexadecimal numbers.

If a bit of the binary output vector BIN is "1", the value at the corresponding position of the input data vector DA is transferred to the corresponding position of the output data vector DC. If a bit of the binary output vector BIN is "0", the value at the corresponding position of the input data vector DB is transferred to the corresponding position of the output data vector DC.

In this case, the binary output vector BIN is "1000", so the value of the first position DA1 of the input data vector DA is transferred to the first position DC1 of the output data vector DC, and the values of the second to fourth positions DB2, DB3, and DB4 of the input data vector DB are transferred to the second to fourth positions DC2, DC3, and DC4 of the output data vector DC, respectively. The data controller 2 therefore outputs DC = {DC1, DC2, DC3, DC4} = {DA1, DB2, DB3, DB4}.

In other words, a "1" bit in the binary output vector BIN acts as a flag that replaces the value of the input data vector DB that would otherwise be transferred to the corresponding position of the output data vector DC with the value at the corresponding position of the input data vector DA. Naturally, if the binary output vector BIN contains no "1" bit (that is, BIN = 0000), all values of the input data vector DB (DB1 to DB4) are transferred unchanged to the output data vector DC (that is, DC = {DC1, DC2, DC3, DC4} = {DB1, DB2, DB3, DB4}).
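The conditional move performed by the data controller can be modelled as a per-position select. The sketch below is an assumption-laden illustration: the name `conditional_move` is hypothetical, vectors are modelled as Python lists, and the leftmost bit of BIN is taken to correspond to the first position, as in the example above.

```python
def conditional_move(bin_mask, da, db):
    """Select DA[i] where the BIN bit for position i is 1, otherwise DB[i]."""
    lanes = len(db)
    dc = []
    for i in range(lanes):
        bit = (bin_mask >> (lanes - 1 - i)) & 1   # leftmost bit = first position
        dc.append(da[i] if bit else db[i])
    return dc


# Symbolic lane labels stand in for the 8-bit values.
DA = ["DA1", "DA2", "DA3", "DA4"]
DB = ["DB1", "DB2", "DB3", "DB4"]
DC = conditional_move(0b1000, DA, DB)   # ['DA1', 'DB2', 'DB3', 'DB4']
```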

Although the input data DA has been described as a vector, it may instead be a scalar value. In that case, wherever a bit of the binary output vector BIN is "1", the scalar value DA is transferred to the corresponding position of the output data vector DC. For example, if the binary output vector BIN is "1010", the output data vector DC may be {DA, DB2, DA, DB4}.

In this configuration, the output data vector DC may be fed back as the input data vector DB. In other words, the input data vector DB may be continuously updated with the output data vector DC generated in the previous cycle.
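The scalar variant and the feedback of DC into DB can be sketched together. The names `conditional_move_scalar` and `run_with_feedback` are hypothetical, and pairing each mask with its own scalar is an assumption about how the feedback path might be used; the text only states that DC may be returned to DB.

```python
def conditional_move_scalar(bin_mask, da_scalar, db):
    """Write the scalar DA into every position whose BIN bit is 1."""
    lanes = len(db)
    return [
        da_scalar if (bin_mask >> (lanes - 1 - i)) & 1 else db[i]
        for i in range(lanes)
    ]


def run_with_feedback(masks, scalars, db_initial):
    """Apply successive conditional moves, feeding each DC back as the next DB."""
    db = list(db_initial)
    for mask, scalar in zip(masks, scalars):
        db = conditional_move_scalar(mask, scalar, db)
    return db


# Example from the text: BIN = 1010 with a scalar DA.
dc = conditional_move_scalar(0b1010, "DA", ["DB1", "DB2", "DB3", "DB4"])
```

With feedback, several masks (for instance one per leaf of a decision tree) could accumulate their results in the same output vector over successive cycles.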

In one cycle, the data processing device 100 can perform the data move (conditional move) operation described above for each set of values (Yk, Uk, T1k, T2k), where 1 ≤ k ≤ 4. The data processing device 100 can therefore perform the data move operation on all four simultaneously loaded sets of values in only four cycles.

By contrast, in a general configuration based on the patent and non-patent documents, two-level decision tree processing requires three comparison nodes (C1 to C3), as shown in FIG. 2. In this general configuration, loading Y, loading U, and each of the comparison nodes C1 to C3 requires one instruction that takes at least one cycle, so such a decision tree needs five or six cycles depending on the decision branch. If the compare-and-branch instructions at the comparison nodes C1 to C3 incur stall cycles, each of these comparison nodes needs at least two cycles. As a result, the general configuration takes eight, nine, or more cycles per value set (Y and U) to perform two-level decision tree processing. Assume here that the general configuration takes 10 cycles per value set. To perform the same operation on four value sets, the general configuration then needs 10 × 4 = 40 cycles, whereas the data processing device 100 needs only four.
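The cycle comparison is simple arithmetic and can be checked as follows. The per-set figures (10 cycles for the general configuration, 1 cycle for the device) are the assumptions stated in the text; the function name `total_cycles` is illustrative.

```python
def total_cycles(value_sets, cycles_per_set):
    """Total cycles needed to process a number of value sets."""
    return value_sets * cycles_per_set


general = total_cycles(4, 10)    # general configuration: 10 cycles per set
proposed = total_cycles(4, 1)    # data processing device 100: 1 cycle per set
speedup = general / proposed
```

Under these assumed figures the device is ten times faster for four value sets; the actual ratio depends on the real stall behaviour of the baseline processor.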

In other words, a plurality of input data vectors and threshold input data vectors, each containing a plurality of values, can be processed in a small number of cycles, which improves the processing speed of the data move operation.

The number of values in each simultaneously loaded vector is not limited to four. It may be two, three, five, or more.

In this embodiment, the combination operation unit 1 performs a logical product (AND) operation, but it may instead perform a logical sum (OR), exclusive OR (XOR), negated logical product (NAND), negated logical sum (NOR), or negated exclusive OR (XNOR) operation.
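The alternative combination operations can be compared on the same pair of masks. In this sketch the table `COMBINE_OPS` and the function `combine` are hypothetical names; the negated operations are truncated to the lane count so that only the four result bits remain.

```python
COMBINE_OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NAND": lambda a, b: ~(a & b),
    "NOR": lambda a, b: ~(a | b),
    "XNOR": lambda a, b: ~(a ^ b),
}


def combine(op, r1, r2, lanes=4):
    """Combine two comparison masks with the named operation, one bit per lane."""
    lane_mask = (1 << lanes) - 1            # keep only the lane bits
    return COMBINE_OPS[op](r1, r2) & lane_mask


# R1 = 1010 and R2 = 1100 as in the worked example.
results = {op: format(combine(op, 0b1010, 0b1100), "04b") for op in COMBINE_OPS}
```

Choosing a different operation changes which combinations of comparison outcomes set a bit in BIN, and therefore which branch of a decision tree the mask selects.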

In this embodiment, the operation of the data processing device 100 is based on the two-level decision tree shown in FIG. 2, which provides two masks, but this is only an example. The operation of the data processing device 100 may be based on a deeper decision tree that provides more than two masks.

In this embodiment, the input data vectors and the output data vector each contain four 8-bit values. However, the values in the input and output data vectors are not limited to 8-bit values. The input and output data vectors may, for example, contain four 16-bit values. In that case, each value in the input and output data vectors is a four-digit hexadecimal number "0x****", where each "*" is a hexadecimal digit.

The input data and the threshold input data are not limited to 8-bit or 16-bit values. The input data, the threshold input data, and the output data may consist of values of other widths, such as 2 bits, 4 bits, or 32 bits.

Embodiment 2

Next, a data processing device according to Embodiment 2 is described. FIG. 5 schematically shows the configuration of a data processing device 200 according to Embodiment 2. The data processing device 200 has a configuration in which a pipeline register unit 3 is added to the data processing device 100 according to Embodiment 1.

The pipeline register unit 3 is inserted between the comparators 11 and 12 and the combination operation unit 1. The pipeline register unit 3 successively and temporarily stores the comparison results output from the comparators 11 and 12. It then successively outputs, toward the data controller 2, the comparison results R1 and R2 that the comparators 11 and 12 output in the same cycle.

With this configuration, pipelining allows the data processing device 200 to achieve a high allowable operating frequency, so the data processing device 200 can execute successive data move operations with higher throughput than in Embodiment 1.
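The effect of the pipeline register can be modelled cycle by cycle. This is a behavioural sketch only: the class `PipelinedCombiner` and its `clock` method are hypothetical, and a single register stage with an AND combination is assumed.

```python
class PipelinedCombiner:
    """Model of a register stage between the comparators and the combination unit."""

    def __init__(self):
        self.register = None   # (R1, R2) pair latched in the previous cycle

    def clock(self, r1, r2):
        """Advance one cycle: emit BIN for the latched pair, then latch the new pair."""
        previous = self.register
        self.register = (r1, r2)
        if previous is None:
            return None        # pipeline not yet filled
        return previous[0] & previous[1]


pipe = PipelinedCombiner()
first = pipe.clock(0b1010, 0b1100)    # nothing to emit in the first cycle
second = pipe.clock(0b1111, 0b0001)   # emits BIN for the first pair
```

The result for a given pair appears one cycle later, but a new pair can be accepted every cycle, which is where the throughput gain comes from once the clock period can be shortened.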

実施の形態３
次に、実施の形態３にかかるデータ処理装置について説明する。図６に、実施の形態３にかかるデータ処理装置３００の構成を模式的に示す。データ処理装置３００は、実施の形態１にかかるデータ処理装置１００に、比較器１３及び１４を追加した構成を有する。 Embodiment 3
Next, a data processing device according to the third embodiment will be described. FIG. 6 schematically shows the configuration of the data processing device 300 according to the third embodiment. The data processing device 300 has a configuration in which comparators 13 and 14 are added to the data processing device 100 according to the first embodiment.

比較器１３は、入力データベクトルＸｉｎ及び閾値データベクトルＴ３を受け取って比較し、比較結果Ｒ３を出力する。入力データベクトルＸｉｎ及び閾値データベクトルＴ３は、それぞれ入力データベクトルＹｉｎ及び閾値データベクトルＴ１に対応する。 The comparator 13 receives and compares the input data vector Xin and the threshold data vector T3, and outputs the comparison result R3. The input data vector Xin and the threshold data vector T3 correspond to the input data vector Yin and the threshold data vector T1, respectively.

比較器１４は、入力データベクトルＺｉｎ及び閾値データベクトルＴ４を受け取って比較し、比較結果Ｒ４を出力する。入力データベクトルＺｉｎ及び閾値データベクトルＴ４は、それぞれ入力データベクトルＵｉｎ及び閾値データベクトルＴ２に対応する。 The comparator 14 receives and compares the input data vector Zin and the threshold data vector T4, and outputs the comparison result R4. The input data vector Zin and the threshold data vector T4 correspond to the input data vector Uin and the threshold data vector T2, respectively.

結合演算部１は、比較結果ベクトルＲ１〜Ｒ４に対して論理和演算（ＡＮＤ）を行う。これにより、比較結果ベクトルＲ１〜Ｒ４の中の同じ位置の値が「１」（真）となり、バイナリ出力ベクトルＢＩＮの対応するビットが「１」となる。 The combination operation unit 1 performs a logical sum operation (AND) on the comparison result vectors R1 to R4. As a result, the value at the same position in the comparison result vectors R1 to R4 becomes "1" (true), and the corresponding bit in the binary output vector BIN becomes "1".

This configuration can handle four pairs of input vectors. In this case, the data processing device 300 can support data processing based on a decision tree with four levels.

Note that the number of input vector pairs is not limited to two or four; it may be three, five, or more, as appropriate.
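The data path of Embodiment 3 can be illustrated in software. The following is only a minimal sketch, not the circuit of FIG. 6: the function names, the four-lane vectors, and the choice of a "less than" comparison are assumptions made for illustration, and the final step assumes the conditional move operation mentioned for the data controller.

```python
from operator import lt

def compare(data, thresholds, op=lt):
    """Element-wise comparator: 1 where op(data[i], thresholds[i]) holds, else 0.
    The default 'less than' comparison is an assumption for this sketch."""
    return [1 if op(d, t) else 0 for d, t in zip(data, thresholds)]

def combine_and(*results):
    """Combination operation: AND of the comparison result vectors, per position."""
    return [int(all(bits)) for bits in zip(*results)]

def conditional_move(bin_vec, first, second):
    """Data controller: take first[i] where bin_vec[i] is 1, otherwise second[i]."""
    return [a if b else s for b, a, s in zip(bin_vec, first, second)]

# Hypothetical four-lane inputs, one (data, threshold) pair per comparator 11 to 14.
yin, t1 = [1, 5, 2, 9], [4, 4, 4, 4]
uin, t2 = [0, 1, 7, 1], [3, 3, 3, 3]
xin, t3 = [2, 2, 2, 2], [5, 5, 5, 5]
zin, t4 = [6, 0, 0, 0], [8, 8, 8, 8]

r1 = compare(yin, t1)  # [1, 0, 1, 0]
r2 = compare(uin, t2)  # [1, 1, 0, 1]
r3 = compare(xin, t3)  # [1, 1, 1, 1]
r4 = compare(zin, t4)  # [1, 1, 1, 1]

bin_vec = combine_and(r1, r2, r3, r4)  # [1, 0, 0, 0]
out = conditional_move(bin_vec, [10, 11, 12, 13], [20, 21, 22, 23])
print(bin_vec, out)  # [1, 0, 0, 0] [10, 21, 22, 23]
```

Because `combine_and` accepts any number of result vectors, the same sketch covers three, five, or more input vector pairs.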

Other Embodiments
The present invention is not limited to the above-described embodiments and can be modified as appropriate without departing from its spirit. For example, in the above-described embodiments, the operation in the data controller has been described as a conditional move operation. However, this is merely an example. The data controller may execute other operations, such as a conditional add/subtract operation that performs addition (ADD) or subtraction (SUB).
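As an illustration of such a variant, a conditional add/subtract can be sketched as follows. The text does not define the exact semantics, so this sketch assumes that a lane is updated only where the binary output vector holds "1" and is passed through unchanged elsewhere; the function name and values are hypothetical.

```python
def conditional_add_sub(bin_vec, acc, operand, subtract=False):
    """Where bin_vec[i] is 1, return acc[i] + operand[i] (or acc[i] - operand[i]
    when subtract is True); where bin_vec[i] is 0, pass acc[i] through unchanged.
    These semantics are an assumption for illustration."""
    sign = -1 if subtract else 1
    return [a + sign * x if b else a for b, a, x in zip(bin_vec, acc, operand)]

mask = [1, 0, 1, 0]
base = [10, 10, 10, 10]
delta = [1, 2, 3, 4]

added = conditional_add_sub(mask, base, delta)                       # [11, 10, 13, 10]
subtracted = conditional_add_sub(mask, base, delta, subtract=True)   # [9, 10, 7, 10]
print(added, subtracted)
```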

In the above-described embodiments, each of the comparators 11 and 12 compares the values of its input data vector using the same threshold value. However, the threshold values may be changed as appropriate to suit the purpose of the data processing. The comparators may also determine whether each input value is less than or equal to the threshold, equal to the threshold, not equal to the threshold, greater than the threshold, or greater than or equal to the threshold.
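The selectable comparison types listed above can be sketched as a comparator parameterized by a mode. The mode names and the table `COMPARE_MODES` are hypothetical and chosen only for this illustration; "lt" (less than) is assumed here as the default comparison.

```python
import operator

# Hypothetical mode names mapped to Python's standard comparison operators.
COMPARE_MODES = {
    "lt": operator.lt,  # less than (assumed default)
    "le": operator.le,  # less than or equal to
    "eq": operator.eq,  # equal to
    "ne": operator.ne,  # not equal to
    "gt": operator.gt,  # greater than
    "ge": operator.ge,  # greater than or equal to
}

def compare_mode(data, thresholds, mode="lt"):
    """Element-wise comparison of data against per-lane thresholds."""
    op = COMPARE_MODES[mode]
    return [1 if op(d, t) else 0 for d, t in zip(data, thresholds)]

samples = [1, 4, 7]
limits = [4, 4, 4]
mode_results = {m: compare_mode(samples, limits, m) for m in COMPARE_MODES}
print(mode_results["le"])  # [1, 1, 0]
```

Using per-lane threshold vectors rather than a single scalar also covers the case where the threshold differs from lane to lane.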

1 Combination operation unit
2 Data controller
3 Pipeline register unit
11 to 14 Comparators
100, 300 Data processing device
101 Central processing unit (CPU)
102 Memory
103 Processor
104 Bus
1000 Data processing system

Claims

A plurality of comparators, each of which receives a plurality of input data vectors and outputs a comparison result vector;
A combining operation unit that combines the comparison result vectors output from the comparator and outputs a binary output vector,
A data controller that controls data processing based on the binary output vector,
Data processing device.

Each comparator compares two input vectors with each other,
The value contained in one input vector is compared with the corresponding value contained in the other input vector,
The comparison result of each comparator includes a value indicating a comparison result of two values,
The data processing device according to claim 1.

The combining operation unit performs an AND operation on the comparison result vectors,
If the values at the same position in the comparison result vectors all indicate that the comparison result is true, the corresponding bit in the binary output vector is "1",
If the values at the same position in the comparison result vectors indicate that at least one of the comparison results is false, the corresponding bit of the binary output vector is "0",
The data processing device according to claim 1.

The data controller receives a first input data vector and a second input data vector and outputs an output data vector,
If the bit of the binary output vector is “1”, the value of the corresponding position in the first input data vector is passed to the corresponding position in the output data vector,
If the bit of the binary output vector is "0", the value of the corresponding position in the second input data vector is passed to the corresponding position in the output data vector,
The data processing device according to claim 3.

The data controller receives the first input data and the second input data vector, which are scalar values, and outputs an output data vector,
If the bit of the binary output vector is "1", the scalar value of the first input data is passed to the corresponding position in the output data vector,
If the bit of the binary output vector is "0", the value of the corresponding position in the second input data vector is passed to the corresponding position in the output data vector,
The data processing device according to claim 3.

Further comprising a register unit for continuously storing the value at the same position of the comparison result vector and continuously outputting the stored value output from the comparator in the same cycle,
The data processing device according to claim 2.

A memory unit,
A bus, and
The data processing device according to claim 1,
Data processing system.

Receiving a plurality of input data vectors and outputting a comparison result vector,
Combining the comparison result vectors output from the comparator to output a binary output vector,
Controlling data processing based on the binary output vector,
Data processing method.

A process of receiving a plurality of input data vectors and outputting a comparison result vector,
A process of combining the comparison result vectors output from the comparator and outputting a binary output vector;
Causing a computer to perform a process of controlling data processing based on the binary output vector,
Data processing program.