JP6390388B2

JP6390388B2 - Data approximation apparatus and data approximation method

Info

Publication number: JP6390388B2
Application number: JP2014243104A
Authority: JP
Inventors: 亮根山
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2014-12-01
Filing date: 2014-12-01
Publication date: 2018-09-19
Anticipated expiration: 2034-12-01
Also published as: JP2016105240A

Description

The present invention relates to an apparatus for approximating time-series data.

In recent years, automobiles have come to be equipped with a variety of sensors, making it possible to acquire various information about vehicle travel, such as speed, longitudinal acceleration, steering angle, yaw rate, and lateral acceleration. This information is also expected to be collected as big data and used for research in various fields.

When data acquired from vehicles is to be collected and analyzed, reducing the data volume becomes an issue, because holding all of the data generated by a running vehicle as-is, without any processing, is not realistic in terms of storage capacity.

One invention related to this problem is the function approximation device described in Patent Document 1. That device divides input data into unit times, performs clustering, and obtains an approximate expression for each resulting cluster. By classifying the input data (sensor data acquired from a vehicle, when applied to vehicles) into groups of similar data and converting them into approximate expressions, the data volume can be reduced substantially compared with holding the raw data.

Patent Document 1: Japanese Unexamined Patent Application Publication No. H7-065168 (JP-A-7-065168)

However, an approximate expression obtained by a method such as that of Patent Document 1 only approximates the change in values, so its error relative to the original data grows as time passes. The error can be reduced by dividing the input data into smaller units, but if the unit is made too small, the distinguishing features of each cluster are lost and the data compression effect diminishes.

The present invention has been made in view of the above problem, and its object is to provide a data approximation device that approximates time-series data and generates information that accurately represents the characteristics of the target time-series data.

To solve the above problem, a data approximation device according to the present invention outputs, as data approximating a cluster that contains time-series data composed of a plurality of values, the coefficients of a linear dynamic system representing the time-series data and reference values at one or more time points. The device comprises: data acquisition means for acquiring one or more pieces of time-series data included in the cluster; first calculation means for calculating the coefficients of a linear dynamic system that approximates the time-series data; second calculation means for determining one or more reference values, each consisting of a time and a value, that are used when approximating the time-series data with the linear dynamic system; and output means for associating the linear dynamic system with the one or more reference values and outputting them as parameters representing the target time-series data.

The data approximation device according to the present invention is a device that outputs information for approximating the time-series data included in a cluster.
The cluster to be approximated may be of any kind as long as it includes one or more pieces of data generated in time series (time-series data). For example, it may be a set of time-series data composed of values periodically output by sensors mounted on a vehicle.
When the target cluster includes a plurality of pieces of time-series data, any one of them may be approximated, or time-series data representative of them (for example, the set of average or median values at each relative time within the window) may be approximated. Such representative time-series data can also be treated as time-series data included in the cluster.

The first calculation means calculates a linear dynamic system for approximating the target cluster, that is, for approximating any of the time-series data included in the cluster. A linear dynamic system is a system whose state changes over time and in which the state at one time can be expressed using the state at another time.
The second calculation means determines the reference values used to approximate the time-series data included in the cluster with the linear dynamic system. The linear dynamic system calculated by the first calculation means expresses, as a formula, the amount of change from the adjacent time step. To approximate the time-series data, a value that serves as a reference must therefore be supplied. Applying the reference values determined by the second calculation means to the linear dynamic system calculated by the first calculation means makes it possible to approximate the time-series data.
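The relationship between the linear dynamic system and the reference values can be illustrated with a minimal sketch. It assumes a scalar system of the form x_t = a * x_{t-1} + b and a dictionary `refs` mapping time steps to stored values; the function name, the data layout, and the rule of restarting at each reference value are illustrative assumptions rather than details taken from the description.

```python
def approximate(a, b, refs, n_steps):
    """Roll x_t = a * x_{t-1} + b forward, restarting at every reference value.

    refs maps a time step to the value stored for it (hypothetical layout).
    Steps before the earliest reference stay None because nothing anchors them.
    """
    approx = [None] * n_steps
    for t in range(n_steps):
        if t in refs:
            approx[t] = refs[t]                # anchor: use the stored reference value
        elif t > 0 and approx[t - 1] is not None:
            approx[t] = a * approx[t - 1] + b  # propagate by one time step
    return approx


# Two reference values: one at t = 0 and one at t = 3.
print(approximate(a=1.0, b=1.0, refs={0: 0.0, 3: 10.0}, n_steps=5))
# [0.0, 1.0, 2.0, 10.0, 11.0]
```

Without any entry in `refs` the recurrence has nothing to start from, which is the sense in which the system alone only describes change between adjacent time steps.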

The data approximation device according to the present invention outputs the linear dynamic system corresponding to the time-series data in association with one or more reference values corresponding to that time-series data. This improves the accuracy of the approximation compared with approximating the original time-series data using a single approximate expression.

The second calculation means may comprise: means for generating approximate data that approximates the time-series data by applying the target linear dynamic system with one or more tentatively determined reference values as starting points; and means for obtaining the approximation error of the approximate data with respect to the time-series data and, based on that error, adopting the tentatively determined reference values as the reference values to output.

When time-series data is approximated with a linear dynamic system, the approximation error varies greatly depending on how the reference values are chosen. Approximate data is therefore generated with tentatively placed reference values, and the reference values to output are determined from the resulting approximation error. This enables a more accurate approximation.

The time-series data may consist of n time steps. In that case, the second calculation means may tentatively select m combinations of time and value from the n time steps of the time-series data and search for the combination that minimizes the approximation error, thereby determining the reference values to output, where m is the smallest value for which the approximation error falls below a predetermined threshold.

The accuracy of the approximation varies greatly depending on how the reference values are chosen. Candidate combinations of m reference values are therefore generated, and the combination that minimizes the error when the approximation is actually performed is searched for. This maximizes the approximation accuracy.
In addition, m is set to the smallest number for which the error of the actual approximation falls below a predetermined threshold. This balances approximation accuracy against data volume.
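The search over m reference values can be sketched as follows. This is a brute-force illustration under several assumptions that do not come from the description: the system is scalar (x_t = a * x_{t-1} + b), the error is the sum of absolute differences, the first time step is always kept as a reference so that the whole window is covered, and the approximation restarts at each reference value. The function names are hypothetical.

```python
from itertools import combinations


def approximate(a, b, refs, n_steps):
    """Roll x_t = a * x_{t-1} + b forward, restarting at each reference value.

    Assumes time step 0 is present in refs.
    """
    approx = []
    for t in range(n_steps):
        approx.append(refs[t] if t in refs else a * approx[-1] + b)
    return approx


def choose_references(series, a, b, threshold):
    """Return (refs, error) for the smallest m whose best error is below threshold."""
    n = len(series)
    if n == 0:
        raise ValueError("series must not be empty")
    best_refs, best_err = None, None
    for m in range(1, n + 1):
        best_refs, best_err = None, None
        # Assumption: time step 0 is always a reference; choose the other m - 1.
        for extra in combinations(range(1, n), m - 1):
            refs = {t: series[t] for t in (0,) + extra}
            approx = approximate(a, b, refs, n)
            err = sum(abs(x - y) for x, y in zip(series, approx))
            if best_err is None or err < best_err:
                best_refs, best_err = refs, err
        if best_err < threshold:
            break  # smallest m that meets the threshold
    return best_refs, best_err


series = [0.0, 1.0, 2.0, 10.0, 11.0]
print(choose_references(series, a=1.0, b=1.0, threshold=0.5))
# ({0: 0.0, 3: 10.0}, 0.0)
```

Exhaustive enumeration grows combinatorially with n and m, so a practical implementation would likely need a cheaper search strategy; the sketch only shows the selection criterion.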

The approximation error may be any one of the sum, the average, and the sum of squares of the absolute values of the differences between the values of the time-series data and the values of the approximate data, where the differences are obtained at one or more time points.

The approximation error can be obtained from the error (the absolute value of the difference) between the approximate value at a given time (that is, the value obtained by applying the linear dynamic system to a reference value) and the value of the time-series data at that time, for example as the sum, the average, or the sum of squares of such errors.
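The three error measures named above can be computed as in the following sketch. The function name and the dictionary keys are illustrative choices, and the two sequences are assumed to be aligned scalar series of equal, non-zero length.

```python
def approximation_errors(series, approx):
    """Return the sum, mean, and sum of squares of the absolute differences."""
    diffs = [abs(x - y) for x, y in zip(series, approx)]
    return {
        "sum": sum(diffs),
        "mean": sum(diffs) / len(diffs),
        "sum_of_squares": sum(d * d for d in diffs),
    }


print(approximation_errors([1.0, 2.0, 4.0], [1.0, 3.0, 2.0]))
# {'sum': 3.0, 'mean': 1.0, 'sum_of_squares': 5.0}
```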

Each of the plurality of values constituting the time-series data may be a multi-dimensional vector. For example, when the time-series data is sensor data acquired from a vehicle, each value can be a multi-dimensional vector containing speed, longitudinal acceleration, steering angle, yaw rate, lateral acceleration, and so on.

The data approximation device according to the present invention may further comprise clustering means for clustering a plurality of pieces of time-series data acquired by the data acquisition means. In that case, the first and second calculation means process the pieces of time-series data classified into the same cluster, and the linear dynamic system and the reference values are each associated with the corresponding cluster.

In this way, the time-series data included in a cluster is preferably mutually similar time-series data that has been grouped by clustering.

The data approximation device according to the present invention may further comprise merging means for merging similar clusters among the plurality of clusters generated by clustering.

Merging similar clusters further reduces the amount of data generated. For example, when clusters are generated by a hierarchical method, lower-level clusters can be merged as similar clusters. When clusters are generated by a partitioning optimization method, clusters can be merged after their similarity has been judged from, for example, the distance between them.

When the merging means merges clusters, the first calculation means may recalculate the linear dynamic system using the time-series data belonging to the merged cluster, and the output means may output the recalculated linear dynamic system in association with the reference values corresponding to each of the clusters as they were before merging.

When clusters are merged, the linear dynamic systems associated with the individual clusters can also be replaced by a common one. Sharing everything in this way, however, degrades the approximation accuracy. In the present invention, therefore, only the linear dynamic system is shared when clusters are merged, and the reference values belonging to each cluster remain associated with that cluster. This minimizes the loss of approximation accuracy caused by merging clusters.
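One way to picture the stored result after merging is the data layout sketched below: the merged clusters point to a single shared system object, while each keeps its own reference values. The class names, field names, and the scalar coefficients are hypothetical, and the shared system is simply passed in here rather than recalculated from the merged time-series data.

```python
from dataclasses import dataclass


@dataclass
class LinearSystem:
    a: float  # state-transition coefficient (scalar here for brevity)
    b: float  # offset term


@dataclass
class ClusterModel:
    cluster_id: str
    system: LinearSystem
    references: dict  # time step -> reference value


def share_system(models, shared_system):
    """Point every merged cluster at one system; keep per-cluster references."""
    return [ClusterModel(m.cluster_id, shared_system, m.references) for m in models]


models = [
    ClusterModel("A", LinearSystem(0.9, 0.1), {0: 1.0}),
    ClusterModel("B", LinearSystem(0.8, 0.3), {0: 2.0, 20: 3.5}),
]
merged = share_system(models, LinearSystem(0.85, 0.2))
print(merged[0].system is merged[1].system)  # True
print([m.references for m in merged])        # [{0: 1.0}, {0: 2.0, 20: 3.5}]
```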

The first calculation means may determine the coefficients of the linear dynamic system so as to minimize the sum of squared errors between the values of the target time-series data and the approximate values given by the linear dynamic system.

The linear dynamic system can suitably be obtained by the least-squares method. That is, it suffices to find the parameters that minimize the sum of squared errors between the values of the time-series data to be approximated and the approximated values.

The time-series data may be at least one of vehicle speed, longitudinal acceleration, steering angle, yaw rate, and lateral acceleration.

The data approximation device according to the present invention is well suited to a device that approximates sensor data generated by a vehicle.

The present invention can be specified as a data approximation device including at least some of the above means. It can also be specified as a data approximation method performed by the data approximation device, or as a vehicle behavior classification system consisting of the data approximation device and a vehicle that transmits sensor information to it. The above processes and means can be freely combined and implemented as long as no technical contradiction arises.

According to the present invention, it is possible to provide a data approximation device that approximates time-series data and generates information that accurately represents the characteristics of the target time-series data.

FIG. 1 is a system configuration diagram of the vehicle behavior classification system according to the first embodiment.
FIG. 2 is a diagram explaining the time-series data acquired by the sensor information acquisition unit.
FIG. 3 is a diagram explaining the generation of vehicle behavior symbols.
FIG. 4 is a diagram explaining the time-series data included in a cluster.
FIG. 5 is a diagram explaining the approximation error.
FIG. 6 is a flowchart of the second process in the first embodiment.
FIG. 7 is a system configuration diagram of the classification device according to the second embodiment.
FIG. 8 is a flowchart of the third process in the second embodiment.
FIG. 9 shows an example of merging clusters in the second embodiment.

(First embodiment)
<System configuration>
Preferred embodiments of the present invention are described below with reference to the drawings.
The vehicle behavior classification system according to the first embodiment consists of an in-vehicle device 10 mounted on a vehicle and a classification device 20. It classifies and accumulates vehicle behavior based on information transmitted from the in-vehicle device.

FIG. 1 is a system configuration diagram of the in-vehicle device 10 and the classification device 20 according to the present embodiment.
First, the in-vehicle device 10 is described. The in-vehicle device 10 transmits information about the behavior of the vehicle on which it is mounted to the classification device 20. It consists of a sensor information acquisition unit 11 and a communication unit 12.

The sensor information acquisition unit 11 acquires values (hereinafter, sensor values) from a plurality of sensors mounted on the vehicle. These are sensors that acquire values representing the behavior of the vehicle, such as a speed sensor, an acceleration sensor, a yaw rate sensor, and a steering angle sensor, although they are not limited to these. The sensor values are transmitted to the classification device 20 as time-series data at predetermined intervals via the communication unit 12 described below. Hereinafter, data transmitted from the sensor information acquisition unit 11 is referred to as sensor information, and, in particular, data consisting of a plurality of sensor values generated in chronological order is referred to as time-series data.

The communication unit 12 transmits the sensor information acquired by the sensor information acquisition unit 11 to the classification device 20. The protocol and communication method are not particularly limited as long as information can be transmitted and received by wireless communication.

Next, the classification device 20 is described. The classification device 20 receives and accumulates the data transmitted from the in-vehicle device 10. When accumulating the data, it compresses the data by the method disclosed in this specification.
The classification device 20 consists of a communication unit 21, a cluster generation unit 22, an approximate expression generation unit 23, and a storage unit 24.

The communication unit 21 receives the sensor information transmitted from the in-vehicle device 10. The protocol and communication method are the same as those of the communication unit 12.

The cluster generation unit 22 uses the time-series data acquired from the vehicle to generate symbols representing the behavior of the vehicle in each unit time. A symbolized representation of vehicle behavior is referred to as a vehicle behavior symbol. In the present embodiment, vehicle behavior symbols are generated by dividing the time-series data acquired from the vehicle and accumulated into unit times and clustering the result. The clustering result is stored in the storage unit 24 described below. A specific example of clustering is given later.

The approximate expression generation unit 23 generates an approximate expression corresponding to each cluster generated by the cluster generation unit 22. Specifically, for each cluster it generates a linear dynamic system that approximates the associated time-series data. The generated linear dynamic system is stored in the storage unit 24, described below, in association with the cluster. The details of the linear dynamic system and the method of generating it are described later.

The storage unit 24 is a nonvolatile storage medium that stores the acquired time-series data, the vehicle behavior symbols (clusters), the linear dynamic systems, and so on. A high-capacity storage medium that can be read and written at high speed is preferable for the storage unit 24; flash memory, for example, is suitable.

Control of each of the means described above is realized by a processing device (not shown), such as a CPU, executing a control program. The functions may also be realized by an FPGA (field-programmable gate array), an ASIC (application-specific integrated circuit), or the like, or by a combination of these.

<Outline of processing>
The processing performed by the classification device 20 according to the present embodiment is divided mainly into two parts: a process of collecting and accumulating time-series data from the in-vehicle device (the first process), and a process of clustering the accumulated time-series data and approximating the generated clusters with formulas (the second process). An outline of each process follows.

First, the first process is described with reference to FIG. 2, which illustrates the data transmitted from the in-vehicle device 10.

<<Sensor information transmission process>>
The sensor information acquisition unit 11 acquires sensor values from a plurality of sensors (not shown) of the vehicle at a predetermined sampling rate (for example, 10 Hz). The sensor values may be acquired at a sampling rate higher than the target rate and then smoothed by a filter. For example, after sampling at 100 Hz, the data may be downsampled to 10 Hz using a Gaussian filter or the like.
In this example, three sensors are used: steering angle, speed, and longitudinal acceleration. Ten sensor values per second are obtained for each of the three sensors, so 30 sensor values per second are transmitted to the classification device 20 as time-series data (reference numeral 201). Neither the transmission period nor the transmission unit needs to be fixed.
When the classification device 20 receives time-series data from the vehicle via the communication unit 21, it stores the data sequentially in the storage unit 24. Because there are three sensors in this example, the time-series data is stored as three-dimensional vectors.
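The 100 Hz to 10 Hz example can be sketched for a single sensor channel as follows. The kernel width (`sigma` defaulting to half the decimation factor), the truncation at three standard deviations, and the renormalization at the edges are assumptions made for illustration; the description only states that a Gaussian filter or the like may be used.

```python
import math


def gaussian_downsample(samples, factor=10, sigma=None):
    """Smooth with a truncated Gaussian kernel and keep every `factor`-th sample."""
    if sigma is None:
        sigma = factor / 2.0          # assumed kernel width, not from the source
    radius = int(3 * sigma)           # truncate the kernel at about 3 sigma
    out = []
    for center in range(0, len(samples), factor):
        lo = max(0, center - radius)
        hi = min(len(samples), center + radius + 1)
        weighted_sum = 0.0
        weight_total = 0.0
        for i in range(lo, hi):
            w = math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2))
            weighted_sum += w * samples[i]
            weight_total += w
        out.append(weighted_sum / weight_total)  # renormalize near the edges
    return out


one_second_at_100hz = [2.0] * 100
print(len(gaussian_downsample(one_second_at_100hz)))  # 10
```

Applying the same function to each of the three channels and zipping the results would give the ten three-dimensional vectors per second described above.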

<<Clustering process>>
Next, the second process performed by the classification device 20 is described. The second process is divided into a process of clustering the time-series data and a process of generating a linear dynamic system for each cluster. First, the clustering process performed by the cluster generation unit 22 is described with reference to FIG. 3.

A vehicle behavior symbol is a symbol representing the behavior of the vehicle during a unit time (for example, 5 seconds). Vehicle behavior symbols are obtained by dividing the time-series data generated by the vehicle (three-dimensional vectors) into segments of one unit time each and clustering the segments. The result is a sequence of vehicle behavior symbols such as that indicated by reference numeral 301, which represents 80 seconds (5 seconds × 16) of vehicle behavior symbols. Any clustering method can be used, such as K-means or spectral clustering. Other methods may also be used as long as they take time-series data as input and produce a classification result. A combination of classification and clustering may also be used; for example, the data left over after processing with a support vector machine (SVM) may be processed with K-means.

Although FIG. 3 shows time-series data acquired from a single vehicle, clustering may be performed using time-series data acquired from a plurality of vehicles. Likewise, although the time-series data in FIG. 3 is divided every 5 seconds, the unit time may be other than 5 seconds, and the divided segments may partially overlap.
Information about the generated clusters is temporarily stored in the storage unit 24.
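The division into unit-time segments that precedes clustering can be sketched as below. The function name and the `step` parameter (which allows overlapping segments) are illustrative; the defaults correspond to the 5-second unit time at 10 Hz used in this example, and the clustering step itself is not shown.

```python
def split_windows(samples, window_len=50, step=50):
    """Split a sample sequence into fixed-length windows.

    window_len=50 corresponds to 5 seconds at 10 Hz. A step smaller than
    window_len yields partially overlapping windows.
    """
    last_start = len(samples) - window_len
    return [samples[s:s + window_len] for s in range(0, last_start + 1, step)]


# 80 seconds of (steering angle, speed, longitudinal acceleration) at 10 Hz.
samples = [(0.0, float(i), 0.0) for i in range(800)]
windows = split_windows(samples)
print(len(windows), len(windows[0]))          # 16 50
print(len(split_windows(samples, step=25)))   # 31
```

Each window could then be flattened into a single feature vector and passed to whichever clustering method is chosen.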

<<Generation of the linear dynamic system>>
Next, the process of generating the linear dynamic system corresponding to a generated cluster is described. FIG. 4 shows a plurality of pieces of time-series data belonging to one cluster generated by the cluster generation unit 22. In this example, the unit time is 5 seconds and the sampling rate of the sensor values is 10 Hz, so each piece of time-series data contains 50 sensor values.
Here, the three pieces of time-series data indicated by reference numerals 401 to 403 are assumed to have been classified into the cluster. Although the time-series data consists of vectors with multiple dimensions, FIG. 4 shows only one dimension to simplify the explanation.

A linear dynamic system for approximating these pieces of time-series data is now generated. When a plurality of pieces of time-series data are assigned to one cluster, they may, for example, be concatenated in time order to form the time-series data to be approximated. Alternatively, new time-series data representative of them may be generated and used as the approximation target. For example, new time-series data may be generated by computing, at each time, the average or median of the three pieces of time-series data 401 to 403. Time-series data obtained in this way can also be treated as time-series data belonging to the target cluster.
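The representative series built from the per-time average or median can be sketched as follows for scalar series of equal length. The function name and the `reducer` parameter are illustrative.

```python
from statistics import mean, median


def representative_series(series_list, reducer=mean):
    """Combine equally long series into one by reducing the values at each time."""
    return [reducer(values) for values in zip(*series_list)]


s401 = [1.0, 2.0, 3.0]
s402 = [2.0, 2.0, 9.0]
s403 = [3.0, 5.0, 3.0]
print(representative_series([s401, s402, s403]))          # [2.0, 3.0, 5.0]
print(representative_series([s401, s402, s403], median))  # [2.0, 2.0, 3.0]
```

As the third time step shows, the median is less sensitive than the average to a single outlying series.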

A linear dynamic system is expressed, for example, by Equation (1). In Equation (1), x_t is the state of the system at time t (that is, a multi-dimensional vector representing a set of sensor values), and x_{t-1} is the state of the system at time t-1. A and b are parameters representing the change of state over time, and v_t is a term representing the error caused by noise. The parameters A and b are expressed as matrices and can operate on multi-dimensional vectors.
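Equation (1) is not reproduced in this text. Based only on the symbol definitions above, a form consistent with them would be the first-order linear recurrence below; this is an inferred sketch, not a transcription of the original formula.

```latex
x_t = A\,x_{t-1} + b + v_t
```

Under this reading, with n sensor channels, x_t and b would have n components and A would be an n × n matrix.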

One method of obtaining a linear dynamic system that approximates given time-series data is the least-squares method.
Specifically, the parameters A and b of the linear dynamic system are determined so as to minimize the sum of squared differences between the values of the target time-series data and the approximate values.

<<Parameter determination method>>
A specific method for obtaining the parameters A and b is described below.
First, based on the target time-series data, an input vector and an output vector are generated for each time step (the interval at which sensor values are acquired; 0.1 seconds in the present embodiment). The input vector represents the sensor values at time t-1, and the output vector represents the sensor values at time t. Equation (2) is the input vector and Equation (3) is the output vector. Here, n is the number of dimensions; with three types of sensors, n = 3.

The input and output vectors are generated for 2 ≤ t ≤ T, where T is the number of time steps (that is, the number of sensor values) included in the target cluster.

Next, the averages of the input vectors and of the output vectors are calculated. The averages are given by Expression (4) and Expression (5).

Each vector is then normalized by subtracting the corresponding average from the input vectors and the output vectors. The normalized results are given by Expression (6) and Expression (7).

Next, the input matrix given by Expression (8) and the output matrix given by Expression (9) are generated.
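Expressions (2) through (9) are not reproduced in this text. One consistent way to write them, following the definitions in the surrounding paragraphs, is sketched here. The component notation x_{t,i}, the bar and tilde accents, the "in"/"out" labels, and the one-column-per-time-step matrix layout are all assumptions.

```latex
% (2), (3): input and output vectors for time step t (n sensor types)
x_{t-1} = (x_{t-1,1}, \dots, x_{t-1,n})^{\top}, \qquad
x_t = (x_{t,1}, \dots, x_{t,n})^{\top}

% (4), (5): averages over 2 <= t <= T
\bar{x}_{\mathrm{in}} = \frac{1}{T-1}\sum_{t=2}^{T} x_{t-1}, \qquad
\bar{x}_{\mathrm{out}} = \frac{1}{T-1}\sum_{t=2}^{T} x_t

% (6), (7): mean-subtracted input and output vectors
\tilde{x}_{t-1} = x_{t-1} - \bar{x}_{\mathrm{in}}, \qquad
\tilde{y}_t = x_t - \bar{x}_{\mathrm{out}}

% (8), (9): input and output matrices, one column per time step
X = [\,\tilde{x}_1 \;\cdots\; \tilde{x}_{T-1}\,], \qquad
Y = [\,\tilde{y}_2 \;\cdots\; \tilde{y}_T\,]
```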

Finally, parameter A is obtained by Expression (10), and parameter b by Expression (11). Here, X^{-1} denotes the inverse of the matrix X.
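The steps above can be sketched in NumPy as follows. Expressions (10) and (11) are not reproduced in this text, so the sketch assumes the standard least-squares solution for mean-subtracted data, A = Y Xᵀ (X Xᵀ)⁻¹ and b = (output mean) − A (input mean). The function name `fit_linear_dynamic_system` and the use of a pseudo-inverse in place of a plain inverse are choices of this sketch, not of the source.

```python
import numpy as np

def fit_linear_dynamic_system(series):
    """Fit x_t ~ A x_{t-1} + b to a (T, n) array of sensor values."""
    series = np.asarray(series, dtype=float)
    x_in = series[:-1]    # input vectors: sensor values at time t-1
    x_out = series[1:]    # output vectors: sensor values at time t

    # Averages of the input and output vectors.
    mean_in = x_in.mean(axis=0)
    mean_out = x_out.mean(axis=0)

    # Mean-subtracted vectors arranged as n x (T-1) matrices.
    X = (x_in - mean_in).T
    Y = (x_out - mean_out).T

    # Assumed least-squares solution. pinv is used instead of inv as a
    # precaution against a singular X X^T (a choice of this sketch).
    A = Y @ X.T @ np.linalg.pinv(X @ X.T)
    b = mean_out - A @ mean_in
    return A, b
```

For noise-free data generated by a known A and b, the fit should recover those parameters up to rounding, which is a convenient sanity check before applying it to real sensor data.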

The linear dynamic system generated as described above represents the amount of change in the sensor values from a given point in time. In other words, at least one sensor value serving as a reference must be supplied in order to approximate the sensor values. In this embodiment, therefore, the approximate expression generation unit 23 selects reference sensor values and stores them in association with the cluster. In the following description, a reference sensor value selected by the approximate expression generation unit 23 is referred to as a reference value.
The specific method of selecting the reference values is described later with reference to a flowchart.

The effect of associating reference values with a cluster (that is, with a linear dynamic system) is described with reference to FIG. 5.
Reference numeral 501 denotes the time-series data to be approximated, and reference numeral 502 denotes the time-series data approximated by the linear dynamic system (hereinafter, approximate data). The black circles in the figure are the sensor values serving as references, that is, the reference values.
Although the actual number of reference values is larger than shown, the description here assumes one or two reference values for simplicity.

FIG. 5(A) shows an example with one reference value. As the figure shows, the error accumulates over time, so the approximation accuracy drops in the latter half of the time slot.
FIG. 5(B) shows an example with two reference values. Here, the approximate value can be corrected by the second reference value. When there are multiple reference values, the values can be corrected as needed, so the approximation accuracy can be improved even when the same linear dynamic system is used.
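The effect shown in FIG. 5 can be illustrated with a minimal sketch. The function `approximate`, the one-dimensional data, and the deliberately imperfect system (A = 0.9 applied to data generated with 0.95) are hypothetical and chosen only for illustration. The source does not say how time steps before the earliest reference value are reconstructed; this sketch simply holds them at that value, which is an assumption.

```python
import numpy as np

def approximate(A, b, references, num_steps):
    """Rebuild num_steps states from reference values.

    references maps a time index to the stored state at that time.
    """
    first = min(references)
    approx = np.empty((num_steps, len(b)))
    # Assumption of this sketch: steps before the earliest reference
    # value are held at that value.
    approx[: first + 1] = references[first]
    for t in range(first + 1, num_steps):
        if t in references:
            approx[t] = references[t]           # correct at a reference value
        else:
            approx[t] = A @ approx[t - 1] + b   # advance one time step
    return approx

# Hypothetical data and an imperfect system, so that the error accumulates.
data = np.zeros((20, 1))
for t in range(1, 20):
    data[t] = 0.95 * data[t - 1] + 1.0
A, b = np.array([[0.9]]), np.array([1.0])

one_ref = approximate(A, b, {0: data[0]}, len(data))
two_refs = approximate(A, b, {0: data[0], 10: data[10]}, len(data))
error_one = np.mean(np.abs(data - one_ref))
error_two = np.mean(np.abs(data - two_refs))
print(error_one, error_two)
```

With the second reference value at time step 10, the reconstructed series is pulled back onto the data at that point, so the mean error over the slot is smaller than with a single reference value.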

As described above, the classification device according to this embodiment first approximates the time-series data belonging to a cluster with a linear dynamic system, and second associates reference values with the cluster and stores them as information representing the characteristics of the cluster. This allows the characteristics of the cluster to be expressed with a small amount of data and with high accuracy.

<Process flowchart>
Next, the method of determining the number and positions of the reference values is described with reference to FIG. 6, which is a flowchart of the second process. The first process periodically receives time-series data from the in-vehicle device and accumulates it in the storage unit 24; its description is omitted.
The process shown in FIG. 6 is executed at an arbitrary timing while time-series data is accumulated in the storage unit 24. Steps S11 and S12 are performed by the cluster generation unit 22, and step S13 and the subsequent steps are performed by the approximate expression generation unit 23.

First, in step S11, all the time-series data stored in the storage unit 24 is acquired.
Next, in step S12, the acquired time-series data is divided into unit times, converted into feature vectors, and then clustered.

Steps S13 to S19 associate a linear dynamic system and reference values with each of the generated clusters.
In step S13, it is determined whether processing has been completed for all clusters. If not, the process proceeds to step S14.

In step S14, the cluster to be processed is determined, and the linear dynamic system corresponding to that cluster is obtained by the method described above.
In step S15, initial values are set for two variables, N and ε. N is a variable representing the number of reference values associated with the cluster being processed, and its initial value is 1. ε is a variable representing the error, and its initial value is a sufficiently large natural number.

In step S16, it is determined whether the error ε is smaller than a predetermined threshold E. If the error is larger than the threshold E, the process proceeds to step S17.

Steps S17 to S19 search for the optimal placement of the reference values when there are N of them. Specifically, when the target cluster has n time steps, N of the n combinations of time and sensor value are extracted and used as provisional reference values, the linear dynamic system is applied to them to generate approximate data, and the error with respect to the time-series data to be approximated is calculated. This is done for every pattern (that is, _nC_N patterns) while changing the placement of the reference values, to find the combination that minimizes the error.

In step S17, it is determined whether all patterns have been examined for combinations of N reference values. If not, the process proceeds to step S18.
In step S18, a combination of reference values that has not yet been examined is provisionally selected, and the linear dynamic system obtained in step S14 is applied to the provisionally selected reference values to obtain approximate data. The average error ε_1 between this approximate data and the time-series data is then calculated. The ε_1 obtained here is, for example, the average deviation between the solid line and the dotted line in FIG. 5(B). If ε_1 is smaller than ε, ε is updated in step S19, and the process returns to step S17.
The values may be normalized when the error is calculated. The sum of the errors, the sum of squares, or the like may be used instead of the average error.
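The error measures mentioned above can be sketched as a single helper. The function name `approximation_error`, the `kind` and `scale` parameters, and the choice to take element-wise absolute differences across all sensor dimensions are assumptions of this sketch; the source names only the average, the sum, and the sum of squares, plus optional normalization.

```python
import numpy as np

def approximation_error(data, approx, kind="mean", scale=None):
    """Error between time-series data and approximate data, both (T, n)."""
    data = np.asarray(data, dtype=float)
    approx = np.asarray(approx, dtype=float)
    diff = np.abs(data - approx)      # absolute difference per value
    if scale is not None:
        diff = diff / scale           # optional normalization (hypothetical)
    if kind == "mean":
        return float(diff.mean())
    if kind == "sum":
        return float(diff.sum())
    if kind == "sum_of_squares":
        return float((diff ** 2).sum())
    raise ValueError(f"unknown error kind: {kind}")
```

The average is independent of the cluster length, which makes a single threshold E easier to share across clusters; the sum and the sum of squares grow with the number of time steps.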

Repeating this process for every possible combination of N reference values identifies the combination of reference values that gives the smallest average error when N reference values are used.

When all possible combinations of N reference values have been examined (step S17: Yes), N is incremented and the process returns to step S16. If the error ε has fallen below the predetermined threshold E, then in step S20 the linear dynamic system and the N reference values provisionally selected in step S18 are stored in the storage unit 24 in association with the cluster being processed. The original time-series data is deleted at this point.

The process described above is repeated for each cluster. When it is finally determined in step S13 that all clusters have been processed, the process ends.
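The search in steps S15 to S20 can be sketched as an exhaustive loop over placements. The names `approximate` and `select_reference_values` are hypothetical. Two simplifications are assumptions of this sketch: time steps before the earliest reference value are held at that value, and the best error is reset for each N rather than carried over. The cost grows combinatorially with the number of time steps, so this is practical only for short clusters.

```python
from itertools import combinations

import numpy as np

def approximate(A, b, references, num_steps):
    """Rebuild num_steps states from {time index: state} reference values."""
    first = min(references)
    approx = np.empty((num_steps, len(b)))
    approx[: first + 1] = references[first]   # assumption: hold before first
    for t in range(first + 1, num_steps):
        approx[t] = references[t] if t in references else A @ approx[t - 1] + b
    return approx

def select_reference_values(data, A, b, threshold):
    """Return the smallest set of (time, value) pairs whose error < threshold."""
    data = np.asarray(data, dtype=float)
    num_steps = len(data)
    for count in range(1, num_steps + 1):           # N = 1, 2, ...
        best_error, best_times = np.inf, None       # epsilon starts large
        for times in combinations(range(num_steps), count):  # all nCN placements
            references = {t: data[t] for t in times}
            approx = approximate(A, b, references, num_steps)
            error = float(np.mean(np.abs(data - approx)))
            if error < best_error:                  # keep the best placement
                best_error, best_times = error, times
        if best_error < threshold:                  # stop at the smallest N
            return [(t, data[t].copy()) for t in best_times], best_error
    raise ValueError("no placement of reference values meets the threshold")
```

When the fitted system reproduces the data exactly, a single reference value at the first time step is enough; the poorer the fit, the more reference values the loop adds before the threshold is met.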

As described above, the classification device according to this embodiment obtains a linear dynamic system that approximates the original time-series data, searches for a combination of reference values for which the approximation error satisfies the condition, and stores those reference values in association with the system. This greatly reduces the amount of data compared with storing the raw time-series data as is, and improves the approximation accuracy compared with storing only a single approximate expression.

Although this embodiment gives an example in which the generated linear dynamic system and reference values are accumulated in the storage unit, they may instead be output externally. Alternatively, values may actually be approximated based on the accumulated information and the results output externally.

(Second embodiment)
The second embodiment further improves the data compression ratio by combining similar clusters among the generated clusters.
FIG. 7 is a system configuration diagram of the classification device 30 according to the second embodiment. The classification device 30 differs from the first embodiment in that it includes a cluster combining unit 35. The other means are the same as in the first embodiment, and their description is omitted. The in-vehicle device 10 that communicates with the classification device 30 is also the same as in the first embodiment, and its description is omitted.

In the second embodiment, after the second process is finished, the cluster combining unit 35 executes a process (the third process) that compresses the data further by combining the generated clusters.
FIG. 8 is a flowchart of the third process performed by the cluster combining unit 35. This process is executed after the process shown in FIG. 6 is completed.

First, in step S31, the most similar clusters among the classified clusters are provisionally combined, and the linear dynamic system is recalculated from the time-series data included in each cluster by the same processing as in step S14. For example, new time-series data representative of the time-series data included in the clusters before combining is generated and used as the time-series data to be approximated, and the linear dynamic system is obtained from it.
The most similar clusters may be determined, for example, from the hierarchy when a hierarchical clustering method is used, or from the distance between clusters when a partitioning optimization method is used. Other methods may also be used. A threshold may also be set on the similarity so that only clusters that are similar to a certain degree are processed.

Next, in step S32, using the reference values associated with each cluster to be combined and the linear dynamic system recalculated in step S31, the error is obtained for each cluster by the same processing as in step S18, and the average ε_2 of these errors is calculated. If the error calculated here is sufficiently small, it follows that the approximation error does not become large even if the linear dynamic system is shared.

Next, in step S33, it is determined whether the average error ε_2 exceeds the threshold E. If ε_2 does not exceed the threshold E, the process proceeds to step S34 and the target clusters are formally combined. Specifically, the linear dynamic system obtained in step S31 is stored in the storage unit 24 in association with the combined cluster. The clusters before combining are included in the combined cluster as sub-clusters. The reference values associated with each cluster before combining are kept as they are, still associated with those clusters (sub-clusters).

FIG. 9 shows an example of combining clusters. FIG. 9(A) shows the state before combining, and FIG. 9(B) shows the state after combining. In this example, a cluster C containing cluster A and cluster B is generated, and clusters A and B become sub-clusters of cluster C.

If the average error ε_2 exceeds the threshold E in step S33, it is determined that the target clusters cannot be combined, and the provisionally combined clusters are restored to their original state. That is, the information stored in the storage unit 24 is left unchanged.
The above processing is repeated until all clusters have been processed (step S35).
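Steps S31 to S34 can be sketched for a pair of clusters as follows. All names (`fit_pooled`, `approximate`, `try_merge`, and the dictionary keys) are hypothetical. The source gives generating representative time-series data only as one example of how to refit the combined cluster; this sketch instead pools the input/output pairs of both clusters, which is an assumption, as is holding the time steps before the earliest reference value at that value.

```python
import numpy as np

def fit_pooled(series_list):
    """Least-squares fit of x_t ~ A x_{t-1} + b over several (T, n) series."""
    x_in = np.vstack([s[:-1] for s in series_list])
    x_out = np.vstack([s[1:] for s in series_list])
    mean_in, mean_out = x_in.mean(axis=0), x_out.mean(axis=0)
    X, Y = (x_in - mean_in).T, (x_out - mean_out).T
    A = Y @ X.T @ np.linalg.pinv(X @ X.T)
    return A, mean_out - A @ mean_in

def approximate(A, b, references, num_steps):
    """Rebuild num_steps states from {time index: state} reference values."""
    first = min(references)
    approx = np.empty((num_steps, len(b)))
    approx[: first + 1] = references[first]   # assumption: hold before first
    for t in range(first + 1, num_steps):
        approx[t] = references[t] if t in references else A @ approx[t - 1] + b
    return approx

def try_merge(cluster_a, cluster_b, threshold):
    """Each cluster is {"data": (T, n) array, "refs": {time: state}}."""
    # S31: provisionally combine and refit one shared system.
    A, b = fit_pooled([cluster_a["data"], cluster_b["data"]])
    # S32: error per cluster, using that cluster's own reference values.
    errors = []
    for cluster in (cluster_a, cluster_b):
        approx = approximate(A, b, cluster["refs"], len(cluster["data"]))
        errors.append(np.mean(np.abs(cluster["data"] - approx)))
    mean_error = float(np.mean(errors))
    # S33: reject the merge if the average error exceeds the threshold.
    if mean_error > threshold:
        return None
    # S34: shared system; reference values stay with the sub-clusters.
    return {"A": A, "b": b, "subclusters": [cluster_a, cluster_b]}
```

Returning `None` on rejection mirrors the flowchart's behaviour of leaving the stored information unchanged; the caller keeps the two original clusters as they were.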

As described above, in the second embodiment, similar clusters are combined and the linear dynamic system is shared as long as the error stays within the allowable value, which compresses the data further. The reference values associated with each cluster, on the other hand, are not shared and are kept as they are. This achieves further data compression without greatly degrading the approximation accuracy.

(Modification)
The above embodiments are merely examples, and the present invention may be modified as appropriate without departing from its gist.

For example, the description of the embodiments illustrated a linear dynamic system in which the state at a given time step is expressed using the state at the preceding time step (forward approximation), but other forms may be used. For example, the state at a given time step may be expressed using the state at the following time step (backward approximation). The state at a given time step may also be expressed using the states at both adjacent time steps, before and after.
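The three variants can be written side by side as follows. The source describes them only in words, so the coefficient names A', b', A_-, A_+ and b'' and the exact form of the two-sided expression are assumptions.

```latex
% Forward approximation (the form used in the embodiments)
x_t = A\,x_{t-1} + b + v_t

% Backward approximation: state expressed from the following time step
x_t = A'\,x_{t+1} + b' + v_t

% Two-sided form: state expressed from both adjacent time steps
x_t = A_{-}\,x_{t-1} + A_{+}\,x_{t+1} + b'' + v_t
```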

The description of the embodiments gave an example in which sensor information is transmitted periodically from the in-vehicle device 10 by wireless communication, but the sensor information may be transmitted by any method. For example, it may be transmitted each time a trip is completed, or according to a predetermined schedule. The sensor information need not be transmitted wirelessly and may be exchanged offline.

The description of the embodiments gave speed, longitudinal acceleration, steering angle, yaw rate, and lateral acceleration as examples of information that can be acquired by sensors, but other information may be used as long as it represents the running state of the vehicle. The data to be processed need not be acquired by sensors, and need not be acquired from a vehicle, as long as it is time-series data.

The description of the embodiments gave an example of clustering sensor information acquired from a vehicle, but clustering is not essential, and the time-series data need not be sensor information. The object of the invention can also be achieved by processing one or more arbitrary time-series data directly.

DESCRIPTION OF SYMBOLS
10 In-vehicle device
11 Sensor information acquisition unit
12 Communication unit
20 Classification device
21 Communication unit
22 Cluster generation unit
23 Approximate expression generation unit
24 Storage unit

Claims

A data approximation device that outputs, as data approximating a cluster including time-series data composed of a plurality of values, coefficients of a linear dynamic system representing the time-series data and reference values at one or more time points, the device comprising:
data acquisition means for acquiring one or more time-series data included in the cluster;
first calculation means for calculating coefficients of a linear dynamic system approximating the time-series data;
second calculation means for determining one or more reference values, each consisting of a time and a value, used when approximating the time-series data using the linear dynamic system; and
output means for associating the linear dynamic system with the one or more reference values and outputting them as parameters expressing the target time-series data.

The data approximation device according to claim 1, wherein the second calculation means includes:
means for applying the target linear dynamic system starting from one or more provisionally determined reference values to generate approximate data approximating the time-series data; and
means for obtaining an approximation error of the approximate data with respect to the time-series data and determining, based on the approximation error, the provisionally determined reference values as the reference values to be output.

The data approximation device according to claim 2, wherein
the time-series data consists of n time steps,
the second calculation means determines the reference values to be output by provisionally selecting m combinations of time and value from the n time steps of the time-series data and searching for the combination that minimizes the approximation error, and
m is the minimum value at which the approximation error falls below a predetermined threshold.

The data approximation device according to claim 3, wherein the approximation error is one of the sum, the average, and the sum of squares of the absolute values of the differences between the values of the time-series data and the values of the approximate data obtained at one or more time points.

The data approximation device according to claim 1, wherein each of the plurality of values constituting the time-series data is a multidimensional vector.

The data approximation device according to any one of claims 1 to 5, further comprising clustering means for clustering a plurality of time-series data acquired by the data acquisition means, wherein
the first and second calculation means process a plurality of time-series data classified into the same cluster, and
the linear dynamic system and the reference values are each associated with the corresponding cluster.

The data approximation device according to claim 6, further comprising combining means for combining similar clusters among the plurality of clusters generated by the clustering.

The data approximation device according to claim 7, wherein, when the combining means combines clusters,
the first calculation means recalculates the linear dynamic system using the time-series data belonging to the combined cluster, and
the output means associates and outputs the recalculated linear dynamic system and the reference values corresponding to the respective clusters before combining.

The data approximation device according to claim 1, wherein the first calculation means determines the coefficients of the linear dynamic system so that the sum of squared errors between the values of the target time-series data and the approximate values represented by the linear dynamic system is minimized.

The data approximation device according to claim 1, wherein the time-series data is at least one of vehicle speed, longitudinal acceleration, steering angle, yaw rate, and lateral acceleration.

A data collection system comprising:
the data approximation device according to any one of claims 1 to 10; and
a vehicle that has a plurality of sensors and transmits the acquired sensor information to the data approximation device as time-series data.

A data approximation method performed by a data approximation device that outputs, as data approximating a cluster including time-series data composed of a plurality of values, coefficients of a linear dynamic system representing the time-series data and reference values at one or more time points, the method comprising:
a data acquisition step of acquiring one or more time-series data included in the cluster;
a first calculation step of calculating coefficients of a linear dynamic system approximating the time-series data;
a second calculation step of determining one or more reference values, each consisting of a time and a value, used when approximating the time-series data using the linear dynamic system; and
an output step of associating the linear dynamic system with the one or more reference values and outputting them as parameters expressing the target time-series data.

A data approximation device that outputs, as data approximating a cluster including time-series data composed of a plurality of values, coefficients of a linear dynamic system representing the time-series data and reference values at one or more time points, the device comprising:
data acquisition means for acquiring one or more time-series data included in the cluster;
first calculation means for calculating coefficients of a linear dynamic system approximating the time-series data;
second calculation means for determining one or more reference values, each consisting of a time and a value, used when approximating the time-series data using the linear dynamic system; and
output means for associating the linear dynamic system with the one or more reference values and outputting them as parameters expressing the target time-series data,
wherein the second calculation means includes:
means for applying the target linear dynamic system with one or more determined reference values as starting points to generate approximate data approximating the time-series data, and then obtaining an approximation error of the approximate data with respect to the time-series data; and
means for searching, while increasing the number m of the reference values, for the m combinations of time and value that minimize the approximation error from among the n time steps of the time-series data, and for terminating the search and outputting the m reference values when the approximation error falls below a predetermined threshold.