KR102302979B1

KR102302979B1 - Data processing device and data processing method

Info

Publication number: KR102302979B1
Application number: KR1020197026908A
Authority: KR
Inventors: Kenji Kawasaki; Hidenori Yamamoto; Yuko Yamashita; Takeshi Handa; Takashi Tsuno
Original assignee: Hitachi, Ltd. (Kabushiki Kaisha Hitachi Seisakusho)
Priority date: 2017-05-08
Filing date: 2018-04-04
Publication date: 2021-09-17
Also published as: KR20190117654A; WO2018207506A1; JP2018190195A; JP6833604B2

Abstract

[Problem] To respond appropriately to data inconsistencies between business systems and thereby enable highly accurate data analysis.
[Solution] A data processing device 1040 includes a storage management unit 2050 that stores business data 2120 of each of a plurality of business systems, and a main control unit 2010 that executes: a first process of identifying, from the business data, an object that can be uniquely recognized across the plurality of business systems; a second process of calculating feature data indicating characteristics between the business data, based on a discrepancy in the business data relating to that object between the business systems; a third process of specifying, based on the feature data, the content of a data cleansing process for the business data; and a fourth process of performing the data cleansing process with the specified content.

Description

Data processing device and data processing method

TECHNICAL FIELD
The present invention relates to a data processing device and a data processing method, and more particularly to a technique that enables highly accurate data analysis by appropriately handling data inconsistencies between business systems.

Recently, efforts have been made to analyze the data held by business systems and to use the results to reduce business costs or improve services. Such data analysis, however, requires so-called data cleansing, such as removing invalid or unnecessary data from a large volume of business data and converting the target data into a format suitable for analysis.

As one prior-art technique for implementing such data cleansing, there is a method of detecting abnormal sensor values using machine learning (see, for example, Non-Patent Document 1). This method targets four representative types of outliers in automobile sensor data, calculates a total of 12 types of values, such as the mean and the standard deviation, as feature quantities, and applies machine learning to these values to detect outlier patterns.

From the standpoint of data cleansing, on the other hand, there are problems that arise precisely because the data of a plurality of business systems is analyzed together. As one prior-art technique that performs data processing with such problems in mind, a data linkage system has been proposed (see Patent Document 1). The system comprises a first business device that executes a first business, a second business device that executes a second business, a business linkage device that manages the business status of the first and second business devices, and a service providing device that requests the first and second business devices, via the business linkage device, to provide services. The business linkage device comprises:
a business status management table in which the business status of each job, notified successively by the first and second business devices, is registered;
a processing correspondence table that defines, for each combination of a service request type from the service providing device and the business statuses that the first and second business devices can take, whether the service corresponding to the service request can be provided;
a service management table that defines which of the first and second business devices provides the service corresponding to a service request;
a business process management table that defines, for each business status that a business of the first or second business device can take, the services affected by that status;
means for receiving a service request from the service providing device;
means for identifying, from the business process management table, the businesses that affect the provision of the service corresponding to the service request;
means for acquiring, from the business status management table, the current business status of the businesses that affect the provision of the service;
means for determining, based on the processing correspondence table, whether the service corresponding to the service request can be provided in the current business status;
means for identifying, when the service is determined to be providable, whether the first or the second business device provides the service, based on the service management table;
means for transmitting a request to execute the service to the business device that provides the service;
means for receiving, from that business device, a response to the execution request; and
means for notifying the service providing device of the response.
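The request-handling path of this prior-art system amounts to a chain of table lookups. The sketch below is only a minimal illustration of that mechanism under stated assumptions: the table contents, the names `status_table`, `affecting_businesses`, `correspondence`, `service_table`, and `handle_request` are hypothetical, and the business process management table is simplified into a map from request type to the businesses that affect it. Transmission of the execution request and relaying of the response are omitted.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical business status management table: business -> current status.
status_table: Dict[str, str] = {"business1": "running", "business2": "maintenance"}

# Simplified (hypothetical) business process management table:
# request type -> businesses whose status affects that service.
affecting_businesses: Dict[str, List[str]] = {"reserve": ["business1", "business2"]}

# Hypothetical processing correspondence table:
# (request type, combination of current statuses) -> whether the service can be provided.
correspondence: Dict[Tuple[str, Tuple[str, ...]], bool] = {
    ("reserve", ("running", "running")): True,
    ("reserve", ("running", "maintenance")): False,
}

# Hypothetical service management table: request type -> device that provides the service.
service_table: Dict[str, str] = {"reserve": "device1"}


def handle_request(request: str) -> Optional[str]:
    """Return the device that should execute the request, or None if it cannot be served now."""
    businesses = affecting_businesses[request]              # businesses affecting the service
    current = tuple(status_table[b] for b in businesses)    # their current statuses
    if not correspondence.get((request, current), False):   # unknown combinations are refused
        return None
    return service_table[request]                           # device to send the execution request to
```

With `business2` under maintenance the request is refused; once its status becomes `"running"`, the lookup yields `"device1"`.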

In the prior art described above, a processing correspondence table that defines, for each combination of business statuses, whether the service corresponding to a service request can be provided makes it possible to provide services that take into account the mutual influence of business statuses across a plurality of business systems.

Patent Document 1: Japanese Patent Laid-Open No. 2013-58116

Non-Patent Document 1: Keisuke Kurihara, Ryo Neyama, Chihiro Sannomiya, Kazunari Nawa, "Sensor Outlier Detection by Machine Learning," FIT2015 (14th Information Science and Technology Forum), Volume 2, pp. 179-182 (p. 180).

However, none of these prior-art techniques can detect data inconsistencies between business systems, or respond to them appropriately, when the data of a plurality of business systems is analyzed.

Consider, for example, an analysis at a railway company that investigates the correlation between track wear and overhead-wire wear, using as its input the measurement data of a track maintenance management system, which manages the maintenance status of the track, and the measurement data of an overhead-wire management system, which manages the maintenance status of the overhead wires.

Even if the prior art is applied in this situation, a kilometrage (kilometer distance) discrepancy between the two sets of measurement data for the same location cannot be detected, so the data is analyzed as is and measurements from different locations are correlated with each other. In other words, although the correlation between track wear and overhead-wire wear must be analyzed at the same location, this cannot be done, and analysis accuracy drops significantly.
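The effect of an undetected kilometrage offset can be illustrated with a toy join. This sketch assumes positions stored as integer meters and made-up wear values; the overhead-wire system records each location 500 m further along than the track system. Joining on identical kilometrage then silently pairs measurements from different places, while shifting by the offset (here assumed to be known) restores the correct pairs. The function name `pair_by_position` is hypothetical.

```python
from typing import Dict, List, Tuple

# Made-up readings keyed by kilometrage in integer meters (wear values in mm).
track_wear: Dict[int, float] = {10000: 0.8, 10500: 0.2, 11000: 0.9}
# The same three locations as recorded by the overhead-wire system, 500 m too far along.
wire_wear: Dict[int, float] = {10500: 1.5, 11000: 0.4, 11500: 1.6}


def pair_by_position(a: Dict[int, float], b: Dict[int, float],
                     offset_m: int = 0) -> List[Tuple[int, float, float]]:
    """Pair readings whose kilometrage coincides after shifting b back by offset_m meters."""
    shifted = {pos - offset_m: val for pos, val in b.items()}
    return [(pos, a[pos], shifted[pos]) for pos in sorted(a) if pos in shifted]


naive = pair_by_position(track_wear, wire_wear)
# naive == [(10500, 0.2, 1.5), (11000, 0.9, 0.4)]: each pair mixes two different places.
corrected = pair_by_position(track_wear, wire_wear, offset_m=500)
# corrected == [(10000, 0.8, 1.5), (10500, 0.2, 0.4), (11000, 0.9, 1.6)]
```

The naive join raises no error, which is exactly why the mismatch goes unnoticed.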

SUMMARY OF THE INVENTION
The present invention has been made to solve the above problems, and its object is to provide a technique that enables highly accurate data analysis by appropriately handling data inconsistencies between business systems.

A data processing device of the present invention that solves the above problems comprises a storage management unit that stores the business data of each of a plurality of business systems, and a main control unit that executes: a first process of identifying, from the business data, an object that can be uniquely recognized across the plurality of business systems; a second process of calculating feature data indicating characteristics between the business data, based on a discrepancy in the business data relating to that object between the business systems; a third process of specifying, based on the feature data, the content of a data cleansing process for the business data; and a fourth process of performing the data cleansing process with the specified content.
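The four processes can be pictured as a simple pipeline. The sketch below is a minimal, hypothetical skeleton: the function and parameter names are illustrative, and the patent does not prescribe how each process is implemented. In the toy usage, the cleansing chosen in the third process (shifting the overhead-wire record onto the track record's kilometrage) is likewise an assumption.

```python
from typing import Any, Callable, Dict, List

BusinessData = Dict[str, List[dict]]  # business system name -> records


def run_cleansing_pipeline(
    data: BusinessData,
    identify_objects: Callable[[BusinessData], List[str]],
    compute_features: Callable[[BusinessData, List[str]], Dict[str, float]],
    select_cleansing: Callable[[Dict[str, float]], Dict[str, Any]],
    apply_cleansing: Callable[[BusinessData, Dict[str, Any]], BusinessData],
) -> BusinessData:
    objects = identify_objects(data)            # first process: objects unique across systems
    features = compute_features(data, objects)  # second process: feature data from discrepancies
    plan = select_cleansing(features)           # third process: decide the cleansing content
    return apply_cleansing(data, plan)          # fourth process: perform the cleansing


# Toy usage: station "A" is recorded 0.5 km apart by two systems.
data = {"track": [{"station": "A", "km": 10.0}], "wire": [{"station": "A", "km": 10.5}]}


def km_of(d: BusinessData, system: str, obj: str) -> float:
    return next(r["km"] for r in d[system] if r["station"] == obj)


cleaned = run_cleansing_pipeline(
    data,
    identify_objects=lambda d: sorted({r["station"] for rows in d.values() for r in rows}),
    compute_features=lambda d, objs: {o: km_of(d, "wire", o) - km_of(d, "track", o) for o in objs},
    select_cleansing=lambda f: {o: -gap for o, gap in f.items() if gap != 0},  # shift cancels the gap
    apply_cleansing=lambda d, plan: {
        **d,
        "wire": [{**r, "km": r["km"] + plan.get(r["station"], 0.0)} for r in d["wire"]],
    },
)
# cleaned["wire"] == [{"station": "A", "km": 10.0}]
```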

In a data processing method of the present invention, an information processing device comprising a storage management unit that stores the business data of each of a plurality of business systems executes: a first process of identifying, from the business data, an object that can be uniquely recognized across the plurality of business systems; a second process of calculating feature data indicating characteristics between the business data, based on a discrepancy in the business data relating to that object between the business systems; a third process of specifying, based on the feature data, the content of a data cleansing process for the business data; and a fourth process of performing the data cleansing process with the specified content.

ADVANTAGE OF THE INVENTION
According to the present invention, data inconsistencies between business systems are handled appropriately, enabling highly accurate data analysis.

BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a diagram showing an example of a network configuration including the data processing device of the first embodiment.
Fig. 2 is a diagram showing an example of the hardware configuration of the data processing device of the first embodiment.
Fig. 3 is a diagram showing flow example 1 of the data processing method of the first embodiment.
Fig. 4 is a diagram showing a configuration example of the railway object definition table of the first embodiment.
Fig. 5 is a diagram showing an example of data processed by the feature amount calculation program of the first embodiment.
Fig. 6 is a diagram showing flow example 2 of the data processing method of the first embodiment.
Fig. 7 is a diagram showing a configuration example of the cleansing process definition table of the first embodiment.
Fig. 8 is a diagram showing an example of the cleansing processing result screen of the first embodiment.
Fig. 9 is a diagram showing an example of log data in the cleansing processing result log of the first embodiment.
Fig. 10 is a diagram showing an example of the flow of the data processing method of the second embodiment.

--- First Embodiment ---

An embodiment of the present invention is described in detail below with reference to the drawings. Fig. 1 is a diagram showing an example of a network configuration including a data processing device 1040 according to the first embodiment of the present invention.

In this embodiment, the business data processed by the data processing device 1040 is obtained from a plurality of business systems. Accordingly, in the network illustrated in Fig. 1, the data processing device 1040 is communicably connected to a business system terminal 1010 and a business system terminal 1020 via a network 1030.

This configuration can be realized, for example, by implementing the business system terminals 1010 and 1020, the data processing device 1040, and a data analysis processing terminal 1060 as computers such as personal computers or workstations, and the networks 1030 and 1050 as Ethernet networks.

The business system terminals 1010 and 1020 are terminals that accumulate the business data generated in their respective business systems. Each of them may, of course, be the business system itself.

The data processing device 1040 acquires the business data from each of the business system terminals 1010 and 1020 and performs on it the data cleansing required as preprocessing for data analysis. The data processing device 1040 can also output the cleansed data to the data analysis processing terminal 1060 via a network 1050.

The data analysis processing terminal 1060 takes the cleansed data obtained from the data processing device 1040 as input, performs a detailed analysis (for example, correlation analysis or clustering) according to an analysis purpose specified in advance by a user or the like, and outputs the analysis result data.

In this embodiment, the business system terminals 1010 and 1020 are assumed to input business data into the data processing device 1040 via the network 1030, but the invention is not limited to this. For example, a storage medium holding business data, such as a portable hard disk, USB memory, or DVD, may be connected to an interface of the data processing device 1040 and read, so that the business data is input to the data processing device 1040 offline.

Although the data processing device 1040 and the data analysis processing terminal 1060 are configured as separate terminals in this embodiment, they may be configured as a single terminal.

The hardware configuration of the data processing device 1040 is as follows. Fig. 2 is a diagram showing an example of the hardware configuration of the data processing device 1040 of the first embodiment. The data processing device 1040 of this embodiment comprises: a storage management unit 2050 composed of a suitable storage device such as an SSD (Solid State Drive) or a hard disk drive; a main control unit 2010, such as a CPU, that executes the various programs held in the storage management unit 2050, performs overall control of the device, and carries out various determination, calculation, and control processes; an input unit 2020, implemented as a keyboard or mouse, that receives inputs from an administrator such as instructions to start or stop program execution; an output unit 2030, such as a display, that displays processed data and the like; and a communication processing unit 2040, such as a network interface, that connects to the networks 1030 and 1050 and handles communication with other devices (for example, the business system terminals 1010 and 1020 and the data analysis processing terminal 1060). These components are interconnected via a communication unit 2060 implemented as a bus.

In addition to the programs that implement the functions required of the data processing device 1040 (a feature amount calculation program 2070, a data classification program 2080, and a data cleansing processing program 2090), the storage management unit 2050 stores at least a railway object definition table 2100, a cleansing process definition table 2110, and business data 2120. Specific examples of the railway object definition table 2100, the cleansing process definition table 2110, and the business data 2120 are described later.

The actual procedure of the data processing method of this embodiment is described below with reference to the drawings. The operations corresponding to this data processing method are realized by programs executed by the data processing device 1040, and these programs consist of code for performing the operations described below.

Fig. 3 is a diagram showing flow example 1 of the data processing method of the first embodiment; specifically, it is a flowchart showing an example of the processing performed by the feature amount calculation program 2070 in the data processing device 1040. The feature amount calculation program 2070 is executed, for example, in response to an instruction from a data analyst or when new business data is acquired. It is assumed that the data processing device 1040 has acquired the business data in advance from the business system terminals 1010, 1020, and so on, and holds it as the business data 2120 in the storage management unit 2050.

First, the feature amount calculation program 2070 of the data processing device 1040 extracts railway objects, as objects that can be uniquely recognized across a plurality of business systems, from the business data 2120 stored in the storage management unit 2050 (step 3010).

For this extraction, the feature amount calculation program 2070 reads, for example, each column name of the business data 2120 and the values in those columns, and identifies those whose format is known in advance, such as train numbers, as information about objects that can be uniquely recognized across the business systems, in this case railway objects. When the business data 2120 contains a plurality of railway objects, the feature amount calculation program 2070 naturally extracts all of them.
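Step 3010 can be sketched as a scan over column names and values. In this minimal sketch the column-name mapping (`KNOWN_COLUMNS`) and the train-number pattern (`TRAIN_NO_PATTERN`: one to four digits optionally followed by one capital letter, as in "1234M") are assumptions for illustration; the description states only that values whose format is known in advance, such as train numbers, are used.

```python
import re
from typing import Dict, Iterable, Set, Tuple

# Hypothetical mapping from column names to railway object types.
KNOWN_COLUMNS: Dict[str, str] = {"train_no": "train", "station_name": "station"}
# Hypothetical train-number format: 1-4 digits plus an optional capital letter.
TRAIN_NO_PATTERN = re.compile(r"\d{1,4}[A-Z]?")


def extract_railway_objects(rows: Iterable[Dict[str, str]]) -> Set[Tuple[str, str]]:
    """Return every (object type, identifier) pair found in the business data rows."""
    found: Set[Tuple[str, str]] = set()
    for row in rows:
        for column, value in row.items():
            kind = KNOWN_COLUMNS.get(column)
            if kind is None:
                continue  # column does not describe a railway object
            if kind == "train" and not TRAIN_NO_PATTERN.fullmatch(value):
                continue  # value does not have the known train-number format
            found.add((kind, value))
    return found


rows = [
    {"train_no": "1234M", "station_name": "Tokyo", "wear_mm": "0.3"},
    {"train_no": "n/a", "station_name": "Shinagawa"},
]
objects = extract_railway_objects(rows)
# objects == {("train", "1234M"), ("station", "Tokyo"), ("station", "Shinagawa")}
```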

Next, the feature amount calculation program 2070 reads the railway object definition table 2100 from the storage management unit 2050 (step 3020). Fig. 4 shows a configuration example of the railway object definition table 2100 of the first embodiment.

The railway object definition table 2100 is a list of records, each consisting of an object name 4010, a movability flag 4020, and movement attributes 4030.

The object name 4010 is a value identifying a physical entity (that is, an object) in the railway system, such as a train, station, passenger, track, overhead wire, or signal.

The movability flag 4020 is a value indicating whether the object identified by the object name 4010 can move; for example, it is "Yes" for a "train" and "No" for a "station".

The movement attributes 4030 consist of three values: "position determination criterion and feature calculation method", "speed/direction determination criterion and feature calculation method", and "route rule and feature calculation method".

The position determination criterion is a value indicating the data item name or unit used to calculate the object's position, such as kilometrage or latitude and longitude. The speed/direction determination criterion is a value indicating the data item name or unit used to calculate the object's speed and direction of movement, such as train diagram (timetable) data or the time series of latitude and longitude.

The route rule is a value indicating a rule for determining the object's travel route, such as an array listing a train's origin station, intermediate stops, and terminal station, or, in transit IC card history data, the exit record that corresponds to an entry record.

The feature calculation method is a value indicating how feature amounts are calculated from the business data 2120 (a method or formula), a threshold for determining that the business data of different business systems is inconsistent, and the like.

For each object, at least one of the three movement attribute 4030 values ("position determination criterion and feature calculation method", "speed/direction determination criterion and feature calculation method", and "route rule and feature calculation method") must be set.

For example, because a "station" does not move, its "position determination criterion and feature calculation method" must be set as a movement attribute 4030, but its "speed/direction determination criterion and feature calculation method" and "route rule and feature calculation method" may be left unset.
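A record of the railway object definition table 2100 could be modeled as follows. This is a hypothetical sketch: the class and field names are illustrative, each "criterion and feature calculation method" is reduced to a criterion description plus a numeric threshold, the route rule is kept as free text, and only the example rules given in this description (5 km for stations; 7 km and 10 km/h for trains) are filled in.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AttributeRule:
    """Simplified 'criterion and feature calculation method' (hypothetical)."""
    criterion: str    # data item used, e.g. "kilometrage"
    threshold: float  # discrepancy threshold
    unit: str         # unit of the threshold


@dataclass(frozen=True)
class RailwayObjectDef:
    name: str                                        # object name 4010
    movable: bool                                    # movability flag 4020
    position: Optional[AttributeRule] = None         # movement attributes 4030:
    speed_direction: Optional[AttributeRule] = None  #   speed/direction rule
    route: Optional[str] = None                      #   route rule (free text here)

    def __post_init__(self) -> None:
        # At least one movement attribute must be set for every object.
        if self.position is None and self.speed_direction is None and self.route is None:
            raise ValueError(f"{self.name}: no movement attribute set")


DEFINITIONS: Dict[str, RailwayObjectDef] = {
    "station": RailwayObjectDef(
        "station", movable=False,
        position=AttributeRule("kilometrage -> latitude/longitude", 5.0, "km")),
    "train": RailwayObjectDef(
        "train", movable=True,
        position=AttributeRule("track circuit number -> center kilometrage", 7.0, "km"),
        speed_direction=AttributeRule("vehicle speed", 10.0, "km/h")),
}
```

Validating the "at least one attribute" rule at construction time keeps incomplete records out of the table before any feature calculation runs.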

Next, the feature amount calculation program 2070 processes the railway objects extracted in step 3010 one at a time as follows (step 3030), and ends the processing once all objects have been processed (step 3030: No).

If it is determined in step 3030 that a next object to be processed exists (step 3030: Yes), the feature amount calculation program 2070 determines whether that object is movable (step 3040).

In step 3040, the feature amount calculation program 2070 refers to the value defined for the object in the movability 4020 column of the railway object definition table 2100 and determines whether that value is "Yes", that is, whether the object is movable.

If the object is not movable (step 3040: No), the feature amount calculation program 2070 determines whether the position information of the object indicated by the business data 2120 is the same across the business systems, that is, whether all the systems recognize the object as being at the same place. If the position information does not indicate the same place in every business system, the program calculates the difference in position information between the business systems as a location discrepancy, which becomes the feature data (step 3050).

The method of calculating this location discrepancy is determined by referring to the "position determination criterion and feature calculation method" in the railway object definition table 2100. For example, if the object is a "station", the feature amount calculation program 2070, as specified in the corresponding record of the railway object definition table 2100, "converts the station's kilometrage into latitude and longitude and calculates the distance by spherical trigonometry"; if the resulting distance between the business systems is "5 km or more", it determines that there is a location discrepancy and uses that distance as the feature data. If no "position determination criterion and feature calculation method" is set for the object in the railway object definition table 2100, the feature amount calculation program 2070 skips this step.
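The station check of step 3050 can be sketched as follows. Converting kilometrage to latitude and longitude requires line geometry that the description does not give, so this sketch assumes a hypothetical table of kilometer posts (`KM_POSTS`, made-up coordinates) and interpolates linearly between them. The distance is then computed with the haversine formula, one standard spherical-trigonometry formula, on a spherical Earth of radius 6371 km; which formula the embodiment actually uses is not stated.

```python
import bisect
import math
from typing import Optional, Tuple

EARTH_RADIUS_KM = 6371.0      # mean Earth radius, spherical-Earth assumption
STATION_THRESHOLD_KM = 5.0    # "5 km or more" counts as a location discrepancy

# Hypothetical kilometer posts for one line: (kilometrage km, latitude deg, longitude deg).
KM_POSTS = [
    (0.0, 35.6812, 139.7671),
    (10.0, 35.6000, 139.7000),
    (20.0, 35.5200, 139.6300),
]


def kilometrage_to_latlon(km: float) -> Tuple[float, float]:
    """Linearly interpolate between neighboring posts (extrapolates outside the table)."""
    kms = [post[0] for post in KM_POSTS]
    i = min(max(bisect.bisect_right(kms, km), 1), len(KM_POSTS) - 1)
    (k0, lat0, lon0), (k1, lat1, lon1) = KM_POSTS[i - 1], KM_POSTS[i]
    t = (km - k0) / (k1 - k0)
    return lat0 + t * (lat1 - lat0), lon0 + t * (lon1 - lon0)


def great_circle_km(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Haversine distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def station_location_feature(km_a: float, km_b: float) -> Optional[float]:
    """Distance between two systems' positions for one station if >= 5 km, else None."""
    d = great_circle_km(kilometrage_to_latlon(km_a), kilometrage_to_latlon(km_b))
    return d if d >= STATION_THRESHOLD_KM else None
```

For example, `station_location_feature(0.0, 20.0)` returns roughly 21.8 km with these made-up posts, while identical kilometrage returns `None`.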

If, on the other hand, it is determined in step 3040 that the object is movable (step 3040: Yes), the feature amount calculation program 2070 determines whether all the business systems place the object at the same place at the same time (step 3060).

The location discrepancy for a movable object is calculated in the same way as in step 3050, by referring to the "position determination criterion and feature calculation method" in the railway object definition table 2100. For example, if the object is a "train", the feature amount calculation program 2070, as specified in the corresponding record of the railway object definition table 2100, "converts the train's track circuit number into the center kilometrage of that track circuit"; if the kilometrage difference between the business systems is "7 km or more", it determines that there is a location discrepancy and uses that kilometrage difference as the feature data.
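The train position check can be sketched in the same spirit. The track-circuit table (`TRACK_CIRCUITS`, circuit number to start and end kilometrage) and its values are hypothetical. A position may be given either as a track circuit number, which is converted to the circuit's center kilometrage, or directly as kilometrage in km.

```python
from typing import Dict, Optional, Tuple, Union

# Hypothetical track-circuit extents: circuit number -> (start km, end km).
TRACK_CIRCUITS: Dict[str, Tuple[float, float]] = {
    "TC101": (0.0, 1.2),
    "TC102": (1.2, 2.0),
    "TC150": (9.0, 10.0),
}
TRAIN_THRESHOLD_KM = 7.0  # "7 km or more" counts as a location discrepancy

Position = Union[str, float]  # track circuit number or kilometrage in km


def to_kilometrage(pos: Position) -> float:
    """Convert a track circuit number to its center kilometrage; pass kilometrage through."""
    if isinstance(pos, str):
        start, end = TRACK_CIRCUITS[pos]
        return (start + end) / 2
    return float(pos)


def train_location_feature(pos_a: Position, pos_b: Position) -> Optional[float]:
    """Kilometrage difference between two systems if it reaches the threshold, else None."""
    diff = abs(to_kilometrage(pos_a) - to_kilometrage(pos_b))
    return diff if diff >= TRAIN_THRESHOLD_KM else None
```

Here `train_location_feature("TC101", "TC150")` gives about 8.9 km (centers 0.6 km and 9.5 km), whereas adjacent circuits stay below the threshold.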

If the same object is not at the same location at the same time (step 3060: No), the feature quantity calculation program 2070 calculates the location discrepancy as feature quantity data (step 3070).

If the same object is at the same location at the same time (step 3060: Yes), the feature quantity calculation program 2070 checks whether the object is moving at the same speed and in the same direction at the same time (step 3080).

If the object is not moving at the same speed and in the same direction at the same time (step 3080: No), the feature quantity calculation program 2070 calculates the speed/direction discrepancy as feature quantity data (step 3090).

The feature quantity calculation program 2070 determines how to calculate this speed/direction discrepancy by referring to the "speed/direction determination criterion and feature calculation method" entry in the railway object definition table 2100.

For example, if the object is a "train", the feature quantity calculation program 2070 calculates the difference in vehicle speed. If this speed difference between the business systems is 10 km/h or more, the program determines that there is a discrepancy and uses the speed difference as feature quantity data. For direction, if the business systems disagree on the up/down direction or on the line name, the program determines that there is a discrepancy and uses an identifier as feature quantity data. The identifier is "1" if the line name is the same and only the up/down direction differs, and "2" if the line names differ.
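The speed and direction rules for trains could be expressed as two small functions. This is a sketch. The function names and the (line name, up/down) tuple representation are assumptions, while the threshold and identifier codes follow the text.

```python
SPEED_THRESHOLD_KMH = 10.0   # "10 km/h or more" criterion


def speed_feature(speed_a, speed_b):
    """Speed difference as feature data, or None if below the threshold."""
    diff = abs(speed_a - speed_b)
    return diff if diff >= SPEED_THRESHOLD_KMH else None


def direction_feature(dir_a, dir_b):
    """Each argument is a (line_name, bound) tuple, bound being "up" or "down".
    Returns 2 if the line names differ, 1 if only the bound differs, else None."""
    line_a, bound_a = dir_a
    line_b, bound_b = dir_b
    if line_a != line_b:
        return 2
    if bound_a != bound_b:
        return 1
    return None
```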

If no "speed/direction determination criterion and feature calculation method" is set for the object in the railway object definition table 2100, the feature quantity calculation program 2070 skips this step.

A concrete example of calculating the vehicle speed difference in this way is described with reference to FIG. 5. FIG. 5 shows an example of data processed by the feature quantity calculation program 2070 in the first embodiment.

In this example, two kinds of business data 2120 are input to the feature quantity calculation program 2070 of the data processing device 1040. The first is vehicle sensor data (train number, time, kilometer position, and speed) accumulated by the vehicle information control system, one of the business systems. The second is actual-results diagram data (train number, track circuit number, and passing time) accumulated by the operation management system, another business system.

In this case, the feature quantity calculation program 2070 obtains the business data 2120 and extracts the train object with train number "A1A001" as a railway object uniquely identified across the business systems. To do this, the feature quantity calculation program 2070 holds a predefined rule. If the business data 2120 contains a "train number" column with a value set, that business data is treated as containing information about a "train", which is a movable railway object.
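The extraction rule might look like the following sketch. It assumes each business system's records are dictionaries with a hypothetical `train_number` key. It treats a train number as uniquely identifying an object across systems when that number appears in every system's data.

```python
from collections import defaultdict


def extract_train_objects(*systems):
    """Each argument is one business system's records (a list of dicts).
    Returns {train_number: [records from system 1, records from system 2, ...]}
    for train numbers that appear in every system."""
    grouped_per_system = []
    for records in systems:
        grouped = defaultdict(list)
        for record in records:
            number = record.get("train_number")
            if number:  # column present and value set -> record describes a "train"
                grouped[number].append(record)
        grouped_per_system.append(grouped)
    if not grouped_per_system:
        return {}
    common = set.intersection(*(set(g) for g in grouped_per_system))
    return {n: [g[n] for g in grouped_per_system] for n in sorted(common)}
```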

Because the train object is movable, the program checks whether the vehicle sensor data and the actual-results diagram data place the train at the same location at the same time. It then checks whether both data sets show the train moving at the same speed at the same time.

From the business data 2120 of the vehicle information control system, the feature quantity calculation program 2070 extracts the train's time and speed data. From the business data 2120 of the operation management system, it extracts the arrival time at each track circuit. It then divides the distance between consecutive track circuits (known in advance) by the time elapsed between their arrival times, which gives the train's speed at each time.

For example, suppose train "A1A001" arrived at track circuit "B" at 10:02:00 and at track circuit "C" at 10:08:00, and the distance between track circuits "B" and "C" is 10 km. The feature quantity calculation program 2070 divides the 10 km distance by the 6 minutes between 10:02:00 and 10:08:00. This gives an average speed of 10 km ÷ 0.1 h = 100 km/h between the two track circuits, which the program takes as the train's speed at 10:08:00.
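The per-interval speed calculation for the operation management system data can be sketched as follows. The input shapes and the function name are assumptions. With the values above, `diagram_speeds([("B", "10:02:00"), ("C", "10:08:00")], {("B", "C"): 10.0})` gives 100 km/h at 10:08:00.

```python
from datetime import datetime


def diagram_speeds(arrivals, distances_km):
    """arrivals: [(track_circuit, "HH:MM:SS"), ...] in passing order.
    distances_km: {(from_circuit, to_circuit): distance in km}, known in advance.
    Returns {arrival time at the later circuit: average speed in km/h}."""
    speeds = {}
    for (c1, t1), (c2, t2) in zip(arrivals, arrivals[1:]):
        elapsed = datetime.strptime(t2, "%H:%M:%S") - datetime.strptime(t1, "%H:%M:%S")
        hours = elapsed.total_seconds() / 3600
        speeds[t2] = distances_km[(c1, c2)] / hours
    return speeds
```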

The vehicle sensor data, on the other hand, show a speed of 110 km/h for the train at 10:08:00. Comparing the two values for train "A1A001" at 10:08:00, 100 km/h and 110 km/h differ by 10 km/h, which meets the 10 km/h threshold. The program therefore concludes that the business systems disagree on the train's speed, that is, the train is not moving at the same speed in both. It then calculates the speed difference as feature quantity data.
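Pairing the two systems' speeds at matching times and applying the 10 km/h threshold could be sketched as follows. This is a hypothetical helper. Comparing only times present in both systems is an assumption about how timestamps are aligned.

```python
def speed_discrepancies(sensor_kmh, diagram_kmh, threshold_kmh=10.0):
    """Both inputs map a time string to a speed in km/h for the same train.
    Only times present in both systems are compared; returns {time: |difference|}
    for the times whose difference meets the threshold."""
    features = {}
    for t in sorted(sensor_kmh.keys() & diagram_kmh.keys()):
        diff = abs(sensor_kmh[t] - diagram_kmh[t])
        if diff >= threshold_kmh:
            features[t] = diff
    return features
```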

Returning to the flow of FIG. 3: if the object is moving at the same speed and in the same direction at the same time (step 3080: Yes), the feature quantity calculation program 2070 checks whether the object is moving along the same route in the same time period (step 3100).

If the object is not moving along the same route in the same time period (step 3100: No), the feature quantity calculation program 2070 calculates the route discrepancy as feature quantity data (step 3110).

The feature quantity calculation program 2070 determines how to calculate the route discrepancy by referring to the "route rule and feature calculation method" entry in the railway object definition table 2100. For example, the object may be a "train" whose origin station, intermediate stops, and terminal station differ in order or number between the business systems. In that case, the program determines that there is a discrepancy and uses the number of stations whose order differs as feature quantity data.
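One possible reading of the route rule is sketched below. Stop lists are compared position by position, and a stop missing from one list counts as a differing position. The patent does not fix this counting scheme, so it is an assumption.

```python
from itertools import zip_longest


def route_feature(stops_a, stops_b):
    """stops_a, stops_b: station names in order from origin to terminal.
    Returns None if the routes agree, else the number of positions whose
    station differs (a stop missing from one list counts as differing)."""
    if stops_a == stops_b:
        return None
    return sum(1 for a, b in zip_longest(stops_a, stops_b) if a != b)
```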

If no "route rule and feature calculation method" is registered for the object in the railway object definition table 2100, the feature quantity calculation program 2070 skips this step.

If the object is moving along the same route in the same time period (step 3100: Yes), the feature quantity calculation program 2070 returns to step 3030. If another object remains to be processed, the program repeats the processing from step 3040 onward. Once all objects have been processed (step 3030: No), the flow ends.
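The order of the checks in FIG. 3 can be summarized in the following sketch. Each check is assumed to be a callable that returns feature data when it finds a discrepancy and `None` otherwise. `None` also covers the case where no rule is registered, so the step is effectively skipped. The sketch further assumes the flow moves to the next object once one discrepancy has been computed. All names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# A check returns feature data when it finds a discrepancy, or None when the
# systems agree (or no rule is registered for the object, so the step is skipped).
Check = Callable[[dict], Optional[float]]


def feature_for_object(obj: dict, movable: bool,
                       stationary_checks: List[Tuple[str, Check]],
                       movable_checks: List[Tuple[str, Check]]):
    """Step 3040 branch, then the checks in flowchart order
    (location -> speed/direction -> route for movable objects).
    Returns (check name, feature data) for the first discrepancy, else None."""
    checks = movable_checks if movable else stationary_checks
    for name, check in checks:
        feature = check(obj)
        if feature is not None:
            return name, feature
    return None


def features_for_objects(objects, is_movable, stationary_checks, movable_checks):
    """Step 3030 loop: process every object once."""
    results = {}
    for obj_id, obj in objects.items():
        found = feature_for_object(obj, is_movable(obj), stationary_checks, movable_checks)
        if found is not None:
            results[obj_id] = found
    return results
```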

Next, an example of the processing performed by the data classification program 2080 and the data cleansing processing program 2090 in the data processing device 1040 is described with reference to the flowchart in FIG. 6. These two programs execute the flow in FIG. 6. They run, for example, on instruction from a data analyst or after the feature quantity calculation program 2070 finishes.

The data classification program 2080 and the data cleansing processing program 2090 take as input the set of feature quantity data calculated by the feature quantity calculation program 2070, together with the object name corresponding to each item.

First, the data classification program 2080 reads the cleansing process definition table 2110 from the storage management unit 2050 (step 6010).

FIG. 7 shows an example configuration of the cleansing process definition table 2110 in the first embodiment. The table is a list of records, each consisting of an object name 7010, a feature quantity condition 7020, and processing content 7030.

The object name 7010 identifies the object, such as a train, station, passenger, track, overhead line, or signal. The feature quantity condition 7020 holds a matching condition on the feature quantity data. Examples are "the mean of the feature quantity data is between 20 and 50 inclusive and the standard deviation is 10 or less" and "at least 70% of the feature quantity data values are 10 or less". The processing content 7030 holds processing logic. One example is "obtain the mean speed difference between the vehicle information control system and the operation management system for the same train at the same time, and subtract that mean from the speed values in the vehicle information control system data". Another is "for each interval from station arrival to station departure in the operation management system's actual-results diagram, set the speed of that train at those times in the vehicle information control system data to 0 km/h".
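The example conditions 7020 and processing contents 7030 could be written as plain functions, as in this sketch. The function names are hypothetical, and the population standard deviation is an assumed choice. The speed gaps are assumed to be signed as (vehicle information control system minus operation management system).

```python
import statistics


def mean_20_to_50_std_le_10(values):
    """Condition example: mean in [20, 50] and standard deviation <= 10
    (population standard deviation assumed)."""
    return 20 <= statistics.mean(values) <= 50 and statistics.pstdev(values) <= 10


def at_least_70pct_le_10(values):
    """Condition example: at least 70% of the values are 10 or less
    (integer arithmetic avoids floating-point rounding at the boundary)."""
    return 10 * sum(v <= 10 for v in values) >= 7 * len(values)


def subtract_mean_speed_gap(sensor_speeds, speed_gaps):
    """Processing example: subtract the mean (sensor - diagram) speed gap
    for the same train at the same time from every sensor speed value."""
    offset = statistics.mean(speed_gaps)
    return [s - offset for s in sensor_speeds]


def zero_speed_while_stopped(sensor_readings, stops):
    """Processing example: sensor_readings is [(time, speed)], stops is
    [(arrival, departure)] from the diagram. Times are zero-padded "HH:MM:SS"
    strings, so string comparison orders them correctly."""
    return [(t, 0.0 if any(a <= t <= d for a, d in stops) else v)
            for t, v in sensor_readings]
```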

Next, the data classification program 2080 processes the input feature quantity data and their corresponding object names one item at a time (step 6020). Once all feature quantity data and objects have been processed (step 6020: No), processing moves to step 6050.

If feature quantity data and objects remain to be processed in step 6020 (step 6020: Yes), the data classification program 2080 takes the next item. It checks whether the cleansing process definition table 2110 contains a record whose object name 7010 and feature quantity condition 7020 match that feature quantity data and object (step 6030).

If a matching record exists (step 6030: Yes), the data classification program 2080 passes the record's processing content 7030 to the data cleansing processing program 2090. The data cleansing processing program 2090 then applies that data cleansing to the relevant business data (step 6040), and the flow returns to step 6020.

If no matching record exists in step 6030 (step 6030: No), the data classification program 2080 returns to step 6020.

When step 6020 finds that all feature quantity data and objects have been processed (step 6020: No), the data cleansing processing program 2090 reports the results of the data cleansing. It either sends them to the data analysis processing terminal 1060 or displays them as a screen 800 on its own output unit 2030. It also outputs a log file 900 of the results (step 6050), and the flow ends.
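Steps 6020 to 6050 can be condensed into a loop like the following sketch. It assumes the table is held in memory as (object name, condition, processing) tuples and that only the first matching record is applied. The data structures are hypothetical.

```python
def classify_and_cleanse(feature_items, table):
    """Steps 6020-6050 in miniature.
    feature_items: [(object_name, feature_values), ...]
    table: [(object_name, condition_fn, cleanse_fn), ...], a hypothetical
    in-memory form of table 2110. Returns a log of (object, action, result)."""
    log = []
    for object_name, values in feature_items:              # step 6020
        for name, condition, cleanse in table:              # step 6030
            if name == object_name and condition(values):
                log.append((object_name, cleanse.__name__, cleanse(values)))  # step 6040
                break                                       # first matching record only
    return log                                              # step 6050: report / log file
```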

FIG. 8 shows an example of the screen 800 presenting the results of the data cleansing, and FIG. 9 shows an example of the log file 900. For each affected piece of data, the screen 800 and log file 900 identify the business data 2120 file and the position within it, the data inconsistency detected, and the data cleansing applied. This lets the user check the results.

--- Second embodiment ---

This embodiment describes, with reference to FIG. 10, a method in which the data classification program 2080 applies machine learning or deep learning techniques to classify data. FIG. 10 is a modified version of the flow of the data classification program 2080 and the data cleansing processing program 2090 shown in FIG. 6 for the first embodiment.

First, the data classification program 2080 reads the cleansing process definition table 2110 (step 10010). In this embodiment, the feature quantity condition 7020 column of the cleansing process definition table 2110 holds classification identifiers output by a machine learning program. The machine learning program derives these classes of discrepancy from past feature quantity data. For example, identifier "X" corresponds to a class characterized by "the mean of the feature quantity data is between 20 and 50 and the standard deviation is 10 or less". Identifier "Y" corresponds to a class characterized by "at least 70% of the feature quantity data values are 10 or less".

Next, the data classification program 2080 processes the input feature quantity data and their corresponding object names one item at a time (step 10020). Once all feature quantity data and objects have been processed (step 10020: No), it moves to step 10070.

If feature quantity data and objects remain to be processed in step 10020 (step 10020: Yes), the data classification program 2080 converts the discrepancy represented by the feature quantity data into an n-dimensional feature vector (step 10030). For example, it calculates the mean, maximum, minimum, and standard deviation of the discrepancies to form a four-dimensional feature vector.
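Step 10030 with the four statistics named above could be sketched as follows. The function name is hypothetical, and the population standard deviation is assumed.

```python
import statistics


def to_feature_vector(discrepancies):
    """Step 10030 example: [mean, max, min, standard deviation] of the
    discrepancy values (population standard deviation assumed)."""
    return [
        statistics.mean(discrepancies),
        max(discrepancies),
        min(discrepancies),
        statistics.pstdev(discrepancies),
    ]
```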

Next, the data classification program 2080 inputs the feature vector into a machine learning or deep learning program and obtains a classification result as output (step 10040). A model name corresponding to the object name may be specified when running the machine learning or deep learning program.
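The patent does not specify the learning method. As a deliberately simple stand-in, the sketch below uses a nearest-centroid classifier with one hypothetical model per object name. It returns a class identifier such as "X" or "Y", which could then be looked up in the cleansing process definition table 2110. It requires Python 3.8 or later for `math.dist`.

```python
import math

# Hypothetical per-object "models": class identifier -> centroid of past
# feature vectors [mean, max, min, std] that were assigned that class.
MODELS = {
    "train": {"X": [35.0, 50.0, 20.0, 5.0], "Y": [8.0, 30.0, 0.0, 6.0]},
}


def classify(object_name, vector, models=MODELS):
    """Select the model by object name, then return the identifier of the nearest centroid."""
    centroids = models[object_name]
    return min(centroids, key=lambda cid: math.dist(vector, centroids[cid]))
```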

Next, the data classification program 2080 refers to the cleansing process definition table 2110 and checks whether there is a record matching the object name and the classification result (step 10050).

If a matching record exists (step 10050: Yes), the data classification program 2080 instructs the data cleansing processing program 2090 to apply the cleansing specified in that record's processing content to the business data (step 10060). The flow then returns to step 10020.

If no matching record exists in step 10050 (step 10050: No), the data classification program 2080 returns to step 10020.

When all feature quantity data and objects have been processed in step 10020 (step 10020: No), the data cleansing processing program 2090 reports the results of the data cleansing. It either sends them to the data analysis processing terminal 1060 or displays them on its own output unit 2030. It also outputs a log file of the results (step 10070), and the flow ends.

The embodiments above use objects in the railway field as examples. The object types are not limited to these, however. The invention may also be applied to business data about objects in other mobility fields, such as cars, trucks, buses, aviation, and shipping. Because the invention calculates feature quantities from whether each object can move and from its movement attributes, the same processing flow works in these other fields.

The first and second embodiments describe processing business data from two business systems. The invention is not limited to this, however, and business data from three or more business systems may be processed. For example, with three business systems A, B, and C, feature quantity calculation and data cleansing may be performed for each pair (A and B, B and C, and C and A) or jointly for all three. For joint processing, the railway object definition table must hold a feature calculation method covering all three sets of business data. The cleansing process definition table must likewise hold feature quantity conditions and processing content covering all three. With these registrations, the same processing flow applies.
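For three or more systems, the pairwise option amounts to enumerating unordered pairs. One conceivable joint feature is the spread of a value across all systems. Both helpers below are illustrative assumptions, not the patent's method.

```python
from itertools import combinations


def system_pairs(systems):
    """Pairwise mode: every unordered pair of business systems."""
    return list(combinations(systems, 2))


def joint_spread(values_by_system):
    """Joint mode (one possible feature): spread of a value, e.g. a speed,
    across all systems at once."""
    values = list(values_by_system.values())
    return max(values) - min(values)
```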

The best mode for carrying out the present invention has been described specifically above. However, the invention is not limited to it, and various changes are possible without departing from its gist.

According to this embodiment, data inconsistencies between business systems can be handled appropriately, enabling accurate data analysis.

The description in this specification makes at least the following clear. In the data processing device of this embodiment, the main control unit may handle a non-moving object in the second process as follows. It calculates the feature quantity data based on the discrepancy, across the business systems, in the location information indicated by the business data about the object.

This makes it possible to apply appropriate data cleansing to the business data of a non-moving object, such as a station in a railway system. Data inconsistencies between business systems are thereby handled appropriately, and data can be analyzed accurately.

In the data processing device of this embodiment, the main control unit may also handle a moving object in the second process. It calculates the feature quantity data based on the discrepancy, across the business systems, in the location information at the same time indicated by the business data about the object.

This makes it possible to apply appropriate data cleansing to a moving object, such as a train in a railway system, based on discrepancies in the location information in its business data. Data inconsistencies between business systems are thereby handled appropriately, and data can be analyzed accurately.

In the data processing device of this embodiment, the main control unit may also handle a moving object in the second process based on speed. It calculates the feature quantity data from the discrepancy, across the business systems, in the speed information at the same time indicated by the business data about the object.

This makes it possible to apply appropriate data cleansing to a moving object, such as a train in a railway system, based on discrepancies in the speed information in its business data. Data inconsistencies between business systems are thereby handled appropriately, and data can be analyzed accurately.

In the data processing device of this embodiment, the main control unit may also handle a moving object in the second process based on direction. It calculates the feature quantity data from the discrepancy, across the business systems, in the direction-of-movement information at the same time indicated by the business data about the object.

This makes it possible to apply appropriate data cleansing to a moving object, such as a train in a railway system, based on discrepancies in the direction-of-movement information in its business data. Data inconsistencies between business systems are thereby handled appropriately, and data can be analyzed accurately.

In the data processing device of this embodiment, the main control unit may also handle a moving object in the second process based on route. It calculates the feature quantity data from the discrepancy, across the business systems, in the movement route within the same time period indicated by the business data about the object.

This makes it possible to apply appropriate data cleansing to a moving object, such as a train in a railway system, based on discrepancies in the movement route information in its business data. Data inconsistencies between business systems are thereby handled appropriately, and data can be analyzed accurately.

In the data processing device of this embodiment, the main control unit may also, in the third process, classify the discrepancy by applying a machine learning algorithm to the feature quantity data. It then specifies the content of the data cleansing according to the classification result.

This allows discrepancies between business systems in the business data about the same object to be classified efficiently by machine learning, based on their tendencies and similar characteristics. Suitable data cleansing can in turn be specified and executed according to the classification result.

1010, 1020: Business system terminals
1030, 1050: Networks
1040: Data processing device
1060: Data analysis processing terminal
2010: Main control unit
2020: Input unit
2030: Output unit
2040: Communication processing unit
2050: Storage management unit
2060: Communication unit
2070: Feature quantity calculation program
2080: Data classification program
2090: Data cleansing processing program
2100: Railway object definition table
2110: Cleansing process definition table
2120: Business data

Claims

A data processing device, characterized by having:
a storage management unit storing business data of each of a plurality of business systems; and
a main control unit that executes a first process of specifying, from the business data, an object that can be uniquely recognized across the plurality of business systems; a second process of calculating feature amount data representing characteristics between the business data based on a discrepancy in the business data on the object between the business systems, wherein, when the object is a non-moving object, the feature amount data is calculated based on a discrepancy in the position information indicated by the business data on the object between the business systems; a third process of specifying the content of a data cleansing process for the business data based on the feature amount data; and a fourth process of performing the data cleansing process with the specified content.

(deleted)

According to claim 1,
The main control unit,
in the second process, when the object is a moving object, calculates the feature amount data based on a discrepancy in the position information at the same point in time indicated by the business data on the object between the business systems. A data processing device characterized in this way.

According to claim 1,
The main control unit,
in the second process, when the object is a moving object, calculates the feature amount data based on a discrepancy in the speed information at the same point in time indicated by the business data on the object between the business systems. A data processing device characterized in this way.

According to claim 1,
The main control unit,
in the second process, when the object is a moving object, calculates the feature amount data based on a discrepancy in the movement direction information at the same point in time indicated by the business data on the object between the business systems. A data processing device characterized in this way.

According to claim 1,
The main control unit,
in the second process, when the object is a moving object, calculates the feature amount data based on a discrepancy in the movement path information in the same time period indicated by the business data on the object between the business systems. A data processing device characterized in this way.

According to claim 1,
The main control unit,
in the third process, classifies the discrepancy by applying a machine learning algorithm to the feature amount data and specifies the content of the data cleansing process according to the result of the classification. A data processing device characterized in this way.

A data processing method in which an information processing device having a storage management unit storing business data of each of a plurality of business systems executes:
a first process of specifying, from the business data, an object that can be uniquely recognized across the plurality of business systems;
a second process of calculating feature amount data indicating characteristics between the business data based on a discrepancy in the business data on the object between the business systems, wherein, when the object is a non-moving object, the feature amount data is calculated based on a discrepancy in the position information indicated by the business data on the object between the business systems;
a third process of specifying the content of a data cleansing process for the business data based on the feature amount data; and
a fourth process of performing the data cleansing process with the specified content.
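The four processes of the data processing method can be sketched end to end for a non-moving object (for example, a hypothetical piece of trackside equipment with ID `SW-01`). This is a hedged illustration, not the patented implementation: the dictionary layout of the business data, the planar coordinates, the use of the largest pairwise distance as the feature, the `tolerance` threshold (assumed to be in metres) in the third process, and the cleansing actions `align_to_reference` and `flag_for_review` are all assumptions, since the claims leave these choices open.

```python
import math
from typing import Dict, List, Tuple

Position = Tuple[float, float]
# Hypothetical layout: business_data[system_name][object_id] -> recorded position.
BusinessData = Dict[str, Dict[str, Position]]


def first_process(data: BusinessData) -> List[str]:
    """Identify objects whose ID appears in every business system."""
    id_sets = [set(records) for records in data.values()]
    return sorted(set.intersection(*id_sets)) if id_sets else []


def second_process(data: BusinessData, obj: str) -> float:
    """Assumed feature: largest pairwise position discrepancy across systems."""
    positions = [records[obj] for records in data.values()]
    return max((math.dist(p, q) for i, p in enumerate(positions)
                for q in positions[i + 1:]), default=0.0)


def third_process(feature: float, tolerance: float = 1.0) -> str:
    """Hypothetical rule: small gaps are harmonised, large ones are flagged."""
    if feature == 0.0:
        return "none"
    return "align_to_reference" if feature <= tolerance else "flag_for_review"


def fourth_process(data: BusinessData, obj: str, action: str, reference: str) -> None:
    """Apply the chosen cleansing in place; only 'align_to_reference' edits data."""
    if action == "align_to_reference":
        ref_pos = data[reference][obj]
        for records in data.values():
            records[obj] = ref_pos


def run(data: BusinessData, reference: str) -> Dict[str, str]:
    """Execute the first to fourth processes and report the action per object."""
    actions: Dict[str, str] = {}
    for obj in first_process(data):
        action = third_process(second_process(data, obj))
        fourth_process(data, obj, action, reference)
        actions[obj] = action
    return actions
```

With three systems where only `SW-01` is registered in all of them and its recorded positions differ by 0.5, `run(data, "ops")` returns `{"SW-01": "align_to_reference"}` and overwrites the other systems' positions with the one from the assumed reference system `ops`; objects missing from any system are left untouched.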