WO2018117901A1 - Information processing system - Google Patents


Info

Publication number
WO2018117901A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
external
auxiliary
target
code
Prior art date
Application number
PCT/RU2017/000305
Other languages
French (fr)
Inventor
Brenda TUROVSKAYA
Original Assignee
Flexi Connect Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Flexi Connect Limited
Publication of WO2018117901A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Definitions

  • The present invention relates to the field of information processing, in particular to a system and method for processing data obtained from external databases by linking a unique record in the home database with a set of records about an object, including duplicate records, obtained from external databases, so that the duplicate records about the object can be deleted.
  • the present information processing technology has been developed to improve the information processing quality, namely, the verification and duplicate data deleting.
  • The source data for internal systems are obtained from databases, web services or directly from websites. The source data (including media data) are then processed by extraction, transformation and loading (ETL technologies). The input information to be processed arrives in different formats: relational databases, XML data, flat files, non-relational databases or others.
  • The main purpose of source information processing is to prepare data in a single format that can be used for further processing. It should be noted that some data cannot be retrieved if the source records are erroneous.
  • The second stage of processing the retrieved data is its transformation in accordance with a set of rules.
  • the data validation and transformation rules initiate various necessary information changes.
  • The most frequently used data transformation rules are: code conversion, value encoding, calculation, sorting, integration, aggregation, generation of new values, transposition, splitting, detailing, verification and others.
  • The final ETL stage loads the data into the target database, information system or data warehouse.
  • Big Data technology is characterized by enormous data volumes, varied formats and high processing speed. It is a set of approaches, tools and methods, formed in the late 2000s, for processing structured and unstructured data of enormous volume and significant variety to obtain human-perceivable results. It remains effective under continuous data growth and distribution of data across multiple computer network nodes, and is an alternative to traditional database management systems and Business Intelligence class solutions.
  • This set comprises tools for massively parallel processing of loosely structured data, primarily NoSQL solutions, MapReduce algorithms, and the software frameworks and libraries of the Hadoop project, as should be obvious to those skilled in the art.
  • A disadvantage of existing systems is that they cannot completely eliminate errors, especially when information is repeatedly transmitted through several databases.
  • Databases can accumulate information on the same objects. Slight errors and inaccuracies in the spelling of names lead to multiple records relating to the same object within a single database.
  • A user who is not familiar with the objects of interest can easily mistake several records about one object for information about different objects, which considerably complicates the user's decision making based on the object information.
  • The task that underlies the present invention is to provide an information processing method and system that make it possible to obtain a variety of information about objects from different databases and to integrate it into one unique object record in the system's own database.
  • The technical result achieved by the present invention is improved data processing quality, obtained both by deleting duplicate data and by reducing processing time.
  • the information processing system comprising at least one data warehouse, at least one data adapter, and data output unit
  • the data warehouse comprises at least one target data set and at least one auxiliary data set
  • the auxiliary data set comprises a plurality of auxiliary data elements
  • each auxiliary data element comprises one auxiliary data internal code, one auxiliary data flag, at least one auxiliary data description field and data set with auxiliary data codes of data providers
  • data set with auxiliary data codes of data providers comprises a plurality of data elements with external data provider code for auxiliary data
  • each data element with external data provider code for auxiliary data comprises a field with external data provider unique identifier, a field with an external data provider code for auxiliary data and an external data provider code flag for auxiliary data
  • the target data set comprises a plurality of target data elements
  • each target data element comprises one target data internal code, one target data flag, at least one target data description field, at least one field with auxiliary data internal code
  • at least one data set with target data codes of data providers
  • the information processing system is configured to set a flag status in the form of color marks.
  • each flag can have four statuses: "new element", "verified element", "blocked/deleted element" and "element is in processing".
  • Fig. 1 shows a block diagram of an information processing system;
  • Fig. 2 - a target data element and a data set with external data provider codes for target data;
  • Fig. 3 - an auxiliary data element and a data set with external data provider codes for auxiliary data;
  • Fig. 4 - an operator interface with a target data table;
  • Fig. 5 - an operator interface with an auxiliary data table;
  • Fig. 6 - an auxiliary data set structure and data;
  • Fig. 7 - a target data set structure and data;
  • Fig. 8 - a continuation of the target data set shown in Fig. 7, after new data adding;
  • Fig. 9 - a target data set after operator's processing;
  • Fig. 10 - the Fig. 9 continuation;
  • Fig. 11 - an auxiliary data set after operator's processing.
  • the information processing system 1 comprises at least one data warehouse 2.
  • the data warehouse 2 comprises at least one target data set 3 and at least one auxiliary data set 4.
  • the target data set 3 consists of a plurality of target data elements 5.
  • each target data element 5 comprises one target data internal code 6, one target data flag 17, at least one target data element description field 7 and one data set 8 with external data provider codes.
  • Each data element 5 comprises at least one field 9 with an auxiliary data internal code.
  • the target data internal code 6 is a unique identifier, and it is assigned to the target data element only once.
  • the data set 8 with external data providers codes of target data comprises a plurality of elements 10 with external data provider code of target data.
  • Each data element 10 with external data provider code of target data comprises a field 11 with external data provider unique identifier (UID), a field 12 with external data provider code for target data and an external data provider code flag 18 for the target data.
  • UID external data provider unique identifier
  • the external data provider unique identifier (UID) is unique for each data provider, and it is assigned by the present information processing system 1 only once.
  • the auxiliary data set 4 comprises a plurality of auxiliary data elements 13.
  • each auxiliary data element 13 comprises an auxiliary data internal code 14, one auxiliary data flag 25, at least one auxiliary data element description field 15, one data set 16 with auxiliary data codes of data providers;
  • the auxiliary data internal code 14 is a unique identifier, and it is assigned to the auxiliary data element only once.
  • the data set 16 with auxiliary data codes of data providers comprises a plurality of data elements 21 with external data provider code for auxiliary data.
  • Each data element 21 with external data provider code for auxiliary data comprises a field 22 with external data provider unique identifier (UID), that is described above, a field 23 with external data provider code for auxiliary data and an external data provider code flag 24 for auxiliary data.
  • All these flags have at least three statuses: "new element", "verified element" and "blocked/deleted element". Also, if necessary, any other flag status can be added, for example "element is in processing".
  • the information processing system 1 comprises at least one external data adapter 19.
  • Each external data adapter 19 is assigned to load data from an external database 20 and to format data in the data warehouse 2 format.
  • External databases 20 may differ in type (relational, XML, hierarchical, object-based, object-oriented, object-relational, network, functional), structure, encoding and data field names.
  • the external data adapter 19 allocates the data obtained from the external database 20 in the data warehouse 2 format.
  • the information processing system 1 comprises the data output unit 26.
  • the data output unit 26 generates data in response to the request of external data users. Meanwhile, said unit generates only the data with the "verified element" flag status.
  • the information processing system 1 comprises the operator interface 27.
  • the operator interface 27 configuration is described below along with a description of system operation as a whole.
  • the information processing system 1 operates as follows.
  • The data are retrieved from the external database 20 and prepared for processing in the external data adapter 19.
  • The external data adapter 19 processes the obtained data and converts them to the data warehouse 2 format.
  • The information processing system 1 loads the data processed by the data adapter 19 into the target data set 3.
  • the target data unique internal code 6 is assigned to each loaded target data element 5, and target data element description fields 7 are filled with corresponding data.
  • the target data flag 17 is assigned with the "new element" status.
  • A data element 10 with an external data provider code is formed in the data set 8 with target data codes of data providers: the field 11 with external data provider unique identifier (UID) is filled with the identifier that corresponds to the current external database.
  • The field 12 with external data provider code for target data is filled with the code that the external data provider assigned to the current data element. Said code is retrieved by the data adapter 19 at the data loading stage.
  • The information processing system 1 then searches the target data set 3 for a target data element 5 whose data set 8 with target data codes of data providers contains a data element 10 in which the field 11 with external data provider unique identifier (UID) corresponds to the current external database (external data provider) and the field 12 with external data provider code for target data holds the same code as the one retrieved by the external data adapter 19.
  • The search is carried out only among the target data elements 5 whose target data flag 17 has the "verified element" status.
  • If such a data element 10 with external data provider code is found, the external data provider code flag 18 for target data is assigned the "verified element" status.
  • Otherwise, the external data provider code flag 18 for target data is assigned the "new element" status.
  • The external data adapter 19 retrieves the external data provider code for auxiliary data, that is, the code contained in one of the target data element description fields 7.
  • The information processing system 1 searches the auxiliary data set 4 for an auxiliary data element 13 whose data set 16 with auxiliary data codes of data providers contains a data element 21 in which the field 22 with external data provider unique identifier (UID) corresponds to the current external database (external data provider) and the field 23 with external data provider code for auxiliary data holds the same code as the one retrieved by the external data adapter 19.
  • The search is carried out only among the auxiliary data elements 13 whose auxiliary data flag 25 has the "verified element" status.
  • If the auxiliary data set 4 comprises such an auxiliary data element 13, the auxiliary data internal code 14 of this element is recorded in the field 9 with auxiliary data internal code of the current target data element 5.
  • the information processing system 1 has an operator interface 27.
  • the information processing system 1 displays the data in tabular form on the operator interface 27.
  • the information processing system 1 displays the target data table 28 on the operator interface 27, where the data from one target data set 3 are displayed.
  • One row in the target data table 28 corresponds to each target data element 5.
  • the information processing system 1 displays the corresponding data obtained by the external data adapter 19, if said data were obtained, instead of the corresponding auxiliary data element description fields 15.
  • the operator interface 27 is provided with the target data filter unit 29 (at least, one filter). Each target data filter corresponds to one data column of the target data table 28.
  • the element flags can be presented in the form of graphic color elements.
  • For example, a white square can correspond to the "new element" flag status, a green square to the "verified element" status, a black square to the "blocked/deleted element" status, and a red square to the "element is in processing" status.
  • The information processing system 1 allows the operator to enter various data in the filters of the filter unit 29 via the operator interface 27, and displays only the target data elements that correspond to the filter values.
  • the information processing system 1 allows the operator to change flag statuses via the operator interface 27, to edit the data of the target data element 5, to add, to delete, and to move the data elements 10 with external data provider code from one target data element 5 to another target data element 5.
  • The information processing system 1 displays the auxiliary data table 30 and the auxiliary data filter unit 31 on the operator interface 27.
  • Each auxiliary data filter corresponds to one data column of the auxiliary data table 30.
  • The information processing system 1 allows the operator, via the operator interface 27, to enter various data in the filters of the auxiliary data filter unit 31 and to display only the auxiliary data elements corresponding to the filter values.
  • The information processing system 1 allows the operator, via the operator interface 27, to change the flag statuses, to edit the data of the auxiliary data element 13, and to add, delete and move the data elements 21 with external data provider code for auxiliary data from one auxiliary data element 13 to another. The information processing system 1 also allows the operator to add new auxiliary data elements 13 to the auxiliary data set 4 via the operator interface 27.
  • The example contains two providers whose unique external data provider identifiers are, respectively, "Provider 1" and "Provider 2".
  • Fig. 6 shows structure and data of auxiliary data set 4. As follows from the presented data, the Provider 2 has two identifiers for Moscow, and Provider 1 has two identifiers for New York.
  • In machine data processing, duplication of identifiers means duplication of cities and the presentation of false or incomplete data to the end user.
  • This duplication is eliminated because in the auxiliary data set 4 every identifier maps unambiguously to one city.
  • Fig. 7 shows structure and data of target data set 3.
  • The target data set 3 comprises information about two hotels. The data on these hotels are verified, and the target data flag 17 status is set to "verified". The information about these hotels is transmitted by the data output unit 26 in response to an external request.
  • The data element corresponding to the "Bent" hotel with the "H09459" code has a data set 8 with external data provider codes comprising two data elements 10 with an external data provider code.
  • the duplication of records and, accordingly, the provision of incomplete or incorrect information to the end user are prevented.
  • Fig. 8 shows the continuation of the target data set 3, shown in Fig. 7, after adding new data by the data Provider 1 and Provider 2 appropriate external data adapters 19.
  • The target data flag 17 of the previously loaded hotel data has the "verified" status, while that of the newly loaded data has the "new" status.
  • Since the data output unit 26 does not transmit data whose target data flag 17 is marked as "new", external users will not obtain unverified and possibly incorrect information.
  • the information processing system 1 displays the information about six hotels on the operator interface 27.
  • the information processing system 1 allows the operator to edit the data via the operator interface 27 as described above, in this case the further processing is carried out.
  • For the data element corresponding to the "Book Hotel" hotel with the "HL7832" code, Provider 2 has added information about a new hotel.
  • Upon data loading, the information processing system 1 identifies the external data provider (Provider 2) code for auxiliary data, "MSK", stores this value in the field 23 with external data provider code for auxiliary data, and saves the relevant auxiliary data unique code in the field 9 with auxiliary data internal code.
  • The operator interface 27 provides the operator with the standardized auxiliary geographic data contained in the auxiliary data set 4, namely "Russia", "Moscow", for these data elements.
  • Since the operator interface 27 is equipped with the filter unit 29, the operator can filter the whole array of information, for example with the words "Russia", "Moscow" and "Boo", and discover that the "HL7832" - "Book Hotel" data element is new for Provider 2 but already exists in the target data set 3.
  • The operator can move the new data element 10 with the external data provider code from the new "HL7832" - "Book Hotel" data element to the "HL7831" - "Book Hotel" target data element, change the external data provider code flag 18 for target data of "HL7831" - "Book Hotel" to the "verified" status, and change the target data flag 17 of the "HL7832" - "Book Hotel" target data element to the "blocked" status, as shown in Fig. 9.
  • The external data provider code flag 18 of the data element 10 with the external data provider code has the "verified" status. This means that the target data set 3 already comprises a target data element with the same external data provider code and, accordingly, the new target data element is a data update rather than a new element.
  • The operator can easily identify the target data "H09459" - "Bent", transfer the missing data to this element, namely the additional telephone number "1-212-555-6789", and set the target data flag 17 of the "HO9460" - "Bent" target data element to the "blocked" status, as shown in Fig. 10.
  • a distinctive feature of this target data element is the fact, that the field 9 with auxiliary data internal code does not contain data.
  • The auxiliary data set 4 does not contain an auxiliary data element 13 whose field 23 with external data provider code for auxiliary data is equal to "Germany Kunststoff".
  • The operator can add to the auxiliary data set 4, in the data set 16 with auxiliary data codes of data providers of the auxiliary data element 13 whose unique internal code 14 is "id003", the data element 21 with the external data provider code for auxiliary data taken from the "H03522" - "Movenpick" target data element, namely "Provider 1" - "Germany Kunststoff" - "verified", as shown in Fig. 11.
  • The information processing system 1 will then update the target data set 3 by writing the auxiliary data unique code "id003" to the field 9 with the auxiliary data internal code of the "H03522" - "Movenpick" target data element.
  • The information processing system is thus an effective tool for obtaining various information about objects from different databases and integrating it into one unique object record in its own database without duplicating information about the same object.
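
The operator's de-duplication step from the "Book Hotel" example can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names, the `merge_duplicate` and `output` helpers, and the provider codes "P1-0001" and "P2-0001" are hypothetical, and flag statuses are shortened to plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class ProviderCode:
    provider_uid: str      # external data provider unique identifier (UID)
    code: str              # code assigned by that provider (hypothetical values below)
    flag: str = "new"


@dataclass
class TargetElement:
    internal_code: str
    name: str
    flag: str = "new"
    provider_codes: list = field(default_factory=list)


def merge_duplicate(keep: TargetElement, duplicate: TargetElement) -> None:
    """Move provider codes from `duplicate` to `keep`, mark them verified,
    and block the duplicate so that it is never output."""
    for pc in duplicate.provider_codes:
        pc.flag = "verified"
        keep.provider_codes.append(pc)
    duplicate.provider_codes = []
    duplicate.flag = "blocked"


def output(elements):
    """Data output unit: only verified target data elements are returned."""
    return [e for e in elements if e.flag == "verified"]


existing = TargetElement("HL7831", "Book Hotel", "verified",
                         [ProviderCode("Provider 1", "P1-0001", "verified")])
incoming = TargetElement("HL7832", "Book Hotel", "new",
                         [ProviderCode("Provider 2", "P2-0001")])

merge_duplicate(existing, incoming)
print([e.internal_code for e in output([existing, incoming])])
```

After the merge, a single record carries the codes of both providers, so a later load from either provider can be matched to it by provider code alone.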

Abstract

An information processing system is provided that serves as an effective tool for obtaining various information about objects from different databases and for integrating it into one unique object record in its own database without duplicating information about the same object.

Description

Information processing system
Field of the Invention
The present invention relates to the field of information processing, in particular to a system and method for processing data obtained from external databases by linking a unique record in the home database with a set of records about an object, including duplicate records, obtained from external databases, so that the duplicate records about the object can be deleted.
Background of the Invention
Rapidly advancing information technologies have increased the demand for processing data obtained from an enormous number of sources. Banks require market data from stock exchanges, insurance companies request data from reinsurance companies and brokerage firms, travel agencies receive data from global distribution systems and airlines, payment systems facilitate millions of transactions in the supply chain, and so on. A growing number of software applications require a sound technology for processing data and correlating it with internal information systems.
Corporations use external data sources for production, supply chains, customer relationships, material resource planning, global distribution systems, pension systems and other value-added chains. To use these data in a system, they must be validated and duplicates must be deleted in the native format. Previously, such validation and deletion relied on approaches based on ETL (extract, transform, load) technology, Big Data technologies or combinations of the two. Source information processing based on these technologies is characterized by low data cleaning quality.
The present information processing technology has been developed to improve the information processing quality, namely, the verification and duplicate data deleting.
The source data for internal systems are obtained from databases, web services or directly from websites. The source data (including media data) are then processed by extraction, transformation and loading (ETL technologies). The input information to be processed arrives in different formats: relational databases, XML data, flat files, non-relational databases or others. The main purpose of source information processing is to prepare data in a single format that can be used for further processing. It should be noted that some data cannot be retrieved if the source records are erroneous. The second stage of processing the retrieved data is its transformation in accordance with a set of rules. The data validation and transformation rules initiate the various necessary changes to the information. The most frequently used data transformation rules are: code conversion, value encoding, calculation, sorting, integration, aggregation, generation of new values, transposition, splitting, detailing, verification and others. The final ETL stage loads the data into the target database, information system or data warehouse.
Big Data technology is characterized by enormous data volumes, varied formats and high processing speed. It is a set of approaches, tools and methods, formed in the late 2000s, for processing structured and unstructured data of enormous volume and significant variety to obtain human-perceivable results. It remains effective under continuous data growth and distribution of data across multiple computer network nodes, and is an alternative to traditional database management systems and Business Intelligence class solutions. This set comprises tools for massively parallel processing of loosely structured data, primarily NoSQL solutions, MapReduce algorithms, and the software frameworks and libraries of the Hadoop project, as should be obvious to those skilled in the art.
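
The three ETL stages described above can be illustrated with a short sketch. It is only a generic example of the prior-art technique, not part of the claimed system: the sample records, the `CITY_CODES` table and the function names are all made up for illustration.

```python
import csv
import io

# Made-up source data; the second record has no code and is treated as erroneous.
RAW = """code,name,city
H1, Alpha Hotel ,msk
,Broken Row,nyc
H2,Beta Hotel,NYC
"""

# Hypothetical code-conversion table used by the transformation rules.
CITY_CODES = {"MSK": "Moscow", "NYC": "New York"}


def extract(text):
    """Extract: read the flat file into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: verify, convert codes and sort the retrieved records."""
    clean = []
    for row in rows:
        code = row["code"].strip()
        if not code:  # erroneous source record, cannot be retrieved
            continue
        city = CITY_CODES.get(row["city"].strip().upper(), "unknown")
        clean.append({"code": code, "name": row["name"].strip(), "city": city})
    return sorted(clean, key=lambda r: r["code"])


def load(rows, target):
    """Load: append the prepared records to the target store."""
    target.extend(rows)
    return len(rows)


warehouse = []
loaded = load(transform(extract(RAW)), warehouse)
print(loaded, warehouse)
```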
It should also be noted that people cannot clean duplicate information out of a database without assistance, since modern databases comprise millions of records on hundreds of thousands of objects.
A disadvantage of existing systems is that they cannot completely eliminate errors, especially when information is repeatedly transmitted through several databases. Databases can accumulate information on the same objects. Slight errors and inaccuracies in the spelling of names lead to multiple records relating to the same object within a single database. A user who is not familiar with the objects of interest can easily mistake several records about one object for information about different objects, which considerably complicates the user's decision making based on the object information.
When working with multiple object databases, it is difficult to obtain complete and accurate object information because of the enormous quantity of information and the errors made when compiling the databases.
The task that underlies the present invention is to provide an information processing method and system that make it possible to obtain a variety of information about objects from different databases and to integrate it into one unique object record in the system's own database.
The technical result achieved by the present invention is improved data processing quality, obtained both by deleting duplicate data and by reducing processing time.
Summary of the Invention
The claimed technical result is achieved by an information processing system comprising at least one data warehouse, at least one data adapter and a data output unit, wherein:
the data warehouse comprises at least one target data set and at least one auxiliary data set;
the auxiliary data set comprises a plurality of auxiliary data elements, each auxiliary data element comprising one auxiliary data internal code, one auxiliary data flag, at least one auxiliary data description field and a data set with auxiliary data codes of data providers;
the data set with auxiliary data codes of data providers comprises a plurality of data elements with external data provider code for auxiliary data, each comprising a field with external data provider unique identifier, a field with an external data provider code for auxiliary data and an external data provider code flag for auxiliary data;
the target data set comprises a plurality of target data elements, each target data element comprising one target data internal code, one target data flag, at least one target data description field, at least one field with auxiliary data internal code, and at least one data set with target data codes of data providers;
the data set with target data codes of data providers comprises a plurality of data elements with external data provider code for target data, each comprising a field with external data provider unique identifier, a field with external data provider code for target data and an external data provider code flag for target data;
the data adapter is configured to load the external provider data, to format the external provider data in the warehouse data format, to select from the external provider data an external data provider code for target data and an external data provider code for auxiliary data, and to record these codes in the target data set;
each flag has at least three statuses: "new element", "verified element" and "blocked/deleted element";
the data output unit is configured to output data from the data warehouse in response to an external user request, wherein the data output unit outputs only the data with the "verified element" flag;
while loading external provider data, the information processing system, for each loaded element: sets the target data flag status to "new element"; searches the auxiliary data set for the same external data provider code for auxiliary data as that of the loaded target data element and, if that code is found, records the auxiliary data internal code of the found auxiliary data element in the field with auxiliary data internal code; and searches the target data set for the same external data provider code for target data as that of the loaded target data element and, if that code is found, sets the status of said external data provider code flag for target data to "verified element", and otherwise sets it to "new element";
the information processing system is configured to provide an operator with an operator interface that, upon the operator's request, presents the target data set and a filter unit containing at least one filter, and to display information to the operator in accordance with the data entered in the filter unit;
the information processing system is also configured to provide the operator with an operator interface that, upon the operator's request, presents an auxiliary data set and a filter unit containing at least one filter, and to display information to the operator in accordance with the data entered in the filter unit;
the operator interface enables the operator to copy and modify data from the target and auxiliary data sets, including said flags, and to save the modified data in the target and auxiliary data sets.
Preferably, the information processing system is configured to present a flag status in the form of a color mark.
In that case, a white color mark is used for the "new element" status, a green color mark is used for the "verified element" status, and a black color mark is used for the "blocked/deleted element" status. Additionally, each flag can have four statuses: "new element", "verified element", "blocked/deleted element" and "element is in processing".
A red color mark is used for the "element is in processing" status.
Detailed description of the invention
A detailed description of the present invention follows with reference to the accompanying drawings, wherein:
Fig. 1 shows a block diagram of an information processing system;
Fig. 2 - a target data element and a data set with external data provider codes for target data;
Fig. 3 - an auxiliary data element and a data set with external data provider codes for auxiliary data;
Fig. 4 - an operator interface with a target data table;
Fig. 5 - an operator interface with an auxiliary data table;
Fig. 6 - an auxiliary data set structure and data;
Fig. 7 - a target data set structure and data;
Fig. 8 - a continuation of the target data set shown in Fig. 7, after new data adding;
Fig. 9 - a target data set after operator's processing;
Fig. 10 - the Fig. 9 continuation;
Fig. 11 - an auxiliary data set after operator's processing.
As shown in Fig. 1, the information processing system 1 comprises at least one data warehouse 2.
The data warehouse 2 comprises at least one target data set 3 and at least one auxiliary data set 4.
The target data set 3 consists of a plurality of target data elements 5.
As shown in Fig. 2, each target data element 5 comprises one target data internal code 6, one target data flag 17, at least one target data element description field 7 and one data set 8 with external data provider codes. Each target data element 5 also comprises at least one field 9 with an auxiliary data internal code.
The target data internal code 6 is a unique identifier, and it is assigned to the target data element only once. The data set 8 with external data providers codes of target data comprises a plurality of elements 10 with external data provider code of target data.
Each data element 10 with external data provider code of target data comprises a field 11 with external data provider unique identifier (UID), a field 12 with external data provider code for target data and an external data provider code flag 18 for the target data.
The external data provider unique identifier (UID) is unique for each data provider, and it is assigned by the present information processing system 1 only once.
The auxiliary data set 4 comprises a plurality of auxiliary data elements 13.
As shown in Fig. 3, each auxiliary data element 13 comprises an auxiliary data internal code 14, one auxiliary data flag 25, at least one auxiliary data element description field 15, and one data set 16 with auxiliary data codes of data providers.
The auxiliary data internal code 14 is a unique identifier, and it is assigned to the auxiliary data element only once.
The data set 16 with auxiliary data codes of data providers comprises a plurality of data elements 21 with external data provider code for auxiliary data.
Each data element 21 with external data provider code for auxiliary data comprises a field 22 with external data provider unique identifier (UID), that is described above, a field 23 with external data provider code for auxiliary data and an external data provider code flag 24 for auxiliary data.
All these flags have at least three statuses: "new element", "verified element" and "blocked/deleted element". Also, if necessary, any other flag status can be added, for example "element is in processing".
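By way of illustration only, the data structures described above can be sketched as follows. This is a minimal sketch, not the claimed implementation: the class and attribute names (Flag, ProviderCode, AuxiliaryElement, TargetElement and their fields) are assumptions introduced here, and the comments map them to the reference numerals of Figs. 2 and 3.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Flag(Enum):
    """The three mandatory flag statuses; further statuses may be added."""
    NEW = "new element"
    VERIFIED = "verified element"
    BLOCKED = "blocked/deleted element"


@dataclass
class ProviderCode:
    """Data elements 10 and 21: one provider's code for one record."""
    provider_uid: str       # fields 11 and 22
    code: str               # fields 12 and 23
    flag: Flag = Flag.NEW   # flags 18 and 24


@dataclass
class AuxiliaryElement:
    """Auxiliary data element 13 (for example, a city)."""
    internal_code: str                  # internal code 14
    description: Dict[str, str]         # description fields 15
    flag: Flag = Flag.NEW               # flag 25
    provider_codes: List[ProviderCode] = field(default_factory=list)  # data set 16


@dataclass
class TargetElement:
    """Target data element 5 (for example, a hotel)."""
    internal_code: str                  # internal code 6
    description: Dict[str, str]         # description fields 7
    flag: Flag = Flag.NEW               # flag 17
    aux_internal_code: Optional[str] = None  # field 9
    provider_codes: List[ProviderCode] = field(default_factory=list)  # data set 8


# Hypothetical record: the codes "t-1" and "X1" are invented for illustration.
hotel = TargetElement("t-1", {"name": "Example Hotel"})
hotel.provider_codes.append(ProviderCode("Provider 1", "X1"))
```

Each record carries its own list of provider codes, so several external codes can point to one internal record without duplicating it.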
In addition, the information processing system 1 comprises at least one external data adapter 19. Each external data adapter 19 is designed to load data from an external database 20 and to convert the data to the data warehouse 2 format.
The data obtained from external databases 20 must be reduced to a common format because the external databases 20 may be of different types (relational, hierarchical, network, object-based and object-oriented, object-relational, functional, or XML format) and may have different structures, different encodings and different names of data fields.
The external data adapter 19 stores the data obtained from the external database 20 in the data warehouse 2 format. In addition, the information processing system 1 comprises the data output unit 26. The data output unit 26 outputs data in response to requests of external data users, and it outputs only the data with the "verified element" flag status.
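The behavior of the data output unit 26 can be sketched as a simple filter over the target data set. This is an illustrative sketch only; the function name output_data and the dictionary keys are assumptions, and the hotel records are invented.

```python
from typing import Dict, List

VERIFIED = "verified element"


def output_data(target_set: List[Dict]) -> List[Dict]:
    """Return only the target data elements whose flag is 'verified element'."""
    return [element for element in target_set if element["flag"] == VERIFIED]


# Hypothetical warehouse content.
target_set = [
    {"internal_code": "t-1", "name": "Hotel A", "flag": "verified element"},
    {"internal_code": "t-2", "name": "Hotel B", "flag": "new element"},
    {"internal_code": "t-3", "name": "Hotel C", "flag": "blocked/deleted element"},
]

result = output_data(target_set)
```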
To check the data correctness, the information processing system 1 comprises the operator interface 27. The operator interface 27 configuration is described below along with a description of system operation as a whole.
The information processing system 1 operates as follows.
Initially, the data are retrieved from the external database 20 and passed to the external data adapter 19 for processing.
The external data adapter 19 processes the obtained data and converts them to the data warehouse 2 format.
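One possible way for an external data adapter 19 to bring provider records to a common format is a per-provider field mapping. The sketch below is an assumption made for illustration: the source does not specify the provider field names or the warehouse field names, so FIELD_MAPS, adapt and all keys are hypothetical.

```python
from typing import Dict

# Hypothetical mapping from each provider's field names to warehouse field names.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "Provider 1": {"hotel_id": "provider_code", "hotel_name": "name",
                   "city": "aux_provider_code"},
    "Provider 2": {"id": "provider_code", "title": "name",
                   "city_code": "aux_provider_code"},
}


def adapt(provider_uid: str, raw: Dict[str, str]) -> Dict[str, str]:
    """Rename provider fields to warehouse fields and trim whitespace."""
    mapping = FIELD_MAPS[provider_uid]
    record = {warehouse: raw[source].strip() for source, warehouse in mapping.items()}
    record["provider_uid"] = provider_uid
    return record


record = adapt("Provider 2", {"id": "HL7832", "title": " Book Hotel ", "city_code": "MSK"})
```

The adapter output keeps both the provider's code for the record and the provider's code for the auxiliary data, which the later lookups rely on.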
Further, the information processing system 1 loads the data processed by the data adapter 19 in the target data set 3.
The target data unique internal code 6 is assigned to each loaded target data element 5, and target data element description fields 7 are filled with corresponding data. The target data flag 17 is assigned with the "new element" status.
Next, a data element 10 with an external data provider code is created in the data set 8 with target data codes of data providers: the field 11 with the external data provider unique identifier (UID) is filled with the identifier that corresponds to the current external database.
The field 12 with the external data provider code for target data is filled with the code that the external data provider assigned to the current data element. This code is retrieved by the data adapter 19 at the data loading stage.
Next, the information processing system 1 searches the target data set 3 for a target data element 5 whose data set 8 with target data codes of data providers contains a data element 10 in which the field 11 with the external data provider unique identifier (UID) corresponds to the current external database (external data provider) and the field 12 with the external data provider code for target data holds the same code as the external data provider code for target data retrieved by the external data adapter 19.
The search is carried out only among the target data elements 5 whose target data flag 17 has the "verified element" status. If such a data element 10 with an external data provider code is found, the external data provider code flag 18 for target data is assigned the "verified element" status.
If such a data element 10 with an external data provider code is not found, the external data provider code flag 18 for target data is assigned the "new element" status.
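The search for a matching external data provider code among verified target data elements can be sketched as follows. This is a minimal sketch under stated assumptions: the function name provider_code_flag, the dictionary layout and the provider codes "A-100", "B-200" and "B-300" are invented for illustration.

```python
from typing import Dict, List

NEW = "new element"
VERIFIED = "verified element"


def provider_code_flag(target_set: List[Dict], provider_uid: str, code: str) -> str:
    """Return the status to assign to flag 18 of a newly loaded provider code."""
    for element in target_set:
        if element["flag"] != VERIFIED:
            continue  # only verified target elements take part in the search
        for pc in element["provider_codes"]:
            if pc["uid"] == provider_uid and pc["code"] == code:
                return VERIFIED
    return NEW


# Hypothetical warehouse content.
target_set = [
    {"internal_code": "t-1", "flag": VERIFIED,
     "provider_codes": [{"uid": "Provider 1", "code": "A-100"},
                        {"uid": "Provider 2", "code": "B-200"}]},
    {"internal_code": "t-2", "flag": NEW,
     "provider_codes": [{"uid": "Provider 2", "code": "B-300"}]},
]

known = provider_code_flag(target_set, "Provider 2", "B-200")
other_provider = provider_code_flag(target_set, "Provider 1", "B-200")
unverified = provider_code_flag(target_set, "Provider 2", "B-300")
```

A match requires both the provider identifier and the code to agree, so the same code string issued by a different provider is still treated as new.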
Also at the data loading stage the external data adapter 19 retrieves the external data provider code for auxiliary data, meaning the code that is comprised in one of the target data element description fields 7.
Next, the information processing system 1 searches the auxiliary data set 4 for an auxiliary data element 13 whose data set 16 with auxiliary data codes of data providers contains a data element 21 in which the field 22 with the external data provider unique identifier (UID) corresponds to the current external database (external data provider) and the field 23 with the external data provider code for auxiliary data holds the same code as the external data provider code for auxiliary data retrieved by the external data adapter 19.
The search is carried out only among the auxiliary data elements 13 whose auxiliary data flag 25 has the "verified element" status.
If the auxiliary data set 4 comprises such an auxiliary data element 13, the auxiliary data internal code 14 of this auxiliary data element is recorded in the field 9 with auxiliary data internal code of the current target data element 5.
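The auxiliary data lookup can be sketched in the same way. In this illustrative sketch the function name find_aux_internal_code, the internal code "id001" and the second Moscow code "MOW" are assumptions; only the Provider 2 code "MSK" and the Provider 1 code "Germany Munich" are taken from the example discussed later in the description.

```python
from typing import Dict, List, Optional

VERIFIED = "verified element"


def find_aux_internal_code(aux_set: List[Dict], provider_uid: str,
                           code: str) -> Optional[str]:
    """Return internal code 14 of the verified auxiliary element that holds
    the given provider code, or None when no such element exists."""
    for aux in aux_set:
        if aux["flag"] != VERIFIED:
            continue  # only verified auxiliary elements are searched
        for pc in aux["provider_codes"]:
            if pc["uid"] == provider_uid and pc["code"] == code:
                return aux["internal_code"]
    return None


# Hypothetical auxiliary data set with one city that has two provider codes.
aux_set = [
    {"internal_code": "id001", "flag": VERIFIED,
     "description": {"country": "Russia", "city": "Moscow"},
     "provider_codes": [{"uid": "Provider 2", "code": "MSK"},
                        {"uid": "Provider 2", "code": "MOW"}]},
]

moscow = find_aux_internal_code(aux_set, "Provider 2", "MSK")
also_moscow = find_aux_internal_code(aux_set, "Provider 2", "MOW")
missing = find_aux_internal_code(aux_set, "Provider 1", "Germany Munich")
```

When the lookup returns None, field 9 of the target data element stays empty until an operator adds the missing provider code.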
This is the completion of the information processing of one newly-loaded element 5, and the system moves to the next loaded target data element processing.
Since all newly loaded elements have the "new element" flag and are therefore not included in the data transmitted to external customers by the data output unit 26, they need human-assisted verification.
To enable the data verification, the information processing system 1 has an operator interface 27.
The information processing system 1 displays the data in tabular form on the operator interface 27.
As shown in Fig. 4, the information processing system 1 displays the target data table 28 on the operator interface 27, where the data from one target data set 3 are displayed. One row in the target data table 28 corresponds to each target data element 5.
In this table, instead of the field 9 with auxiliary data internal code, the corresponding auxiliary data element description fields 15 are displayed in the row of the target data element 5.
If the external data adapter 19 has retrieved an external data provider code for auxiliary data, but no matching code has been found in the auxiliary data set 4, the information processing system 1 displays, instead of the auxiliary data element description fields 15, the corresponding data obtained by the external data adapter 19, if said data were obtained.
The operator interface 27 is provided with the target data filter unit 29 (at least, one filter). Each target data filter corresponds to one data column of the target data table 28.
The element flags can be presented in the form of graphic color elements. For example, a white square can correspond to the "new element" flag status, a green square - to the "verified element" status, a black square - to a "blocked/deleted element" status, a red square - to "element is in processing" status.
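The correspondence between flag statuses and color marks can be expressed as a lookup table. The names FLAG_COLORS and color_for are assumptions used for illustration; the colors themselves are those given above.

```python
from typing import Dict

# Status-to-color correspondence as described in the text.
FLAG_COLORS: Dict[str, str] = {
    "new element": "white",
    "verified element": "green",
    "blocked/deleted element": "black",
    "element is in processing": "red",
}


def color_for(status: str) -> str:
    """Return the color of the square displayed for a flag status."""
    return FLAG_COLORS[status]


mark = color_for("verified element")
```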
The information processing system 1 allows the operator to enter various data in the filters of the filter unit 29 via the operator interface 27, and said system displays only the target data elements that correspond to the filter values.
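A filter unit with one filter per table column can be sketched as follows. The matching rule used here (case-insensitive substring match, all non-empty filters must match) is an assumption, as are the function name apply_filters, the column names and the third table row; the "Book Hotel" rows reuse the codes from the example discussed later in the description.

```python
from typing import Dict, List


def apply_filters(rows: List[Dict[str, str]], filters: Dict[str, str]) -> List[Dict[str, str]]:
    """Keep the rows in which every non-empty filter value occurs in its column."""
    def matches(row: Dict[str, str]) -> bool:
        return all(value.lower() in row.get(column, "").lower()
                   for column, value in filters.items() if value)
    return [row for row in rows if matches(row)]


rows = [
    {"code": "HL7831", "name": "Book Hotel", "country": "Russia", "city": "Moscow"},
    {"code": "HL7832", "name": "Book Hotel", "country": "Russia", "city": "Moscow"},
    {"code": "t-9", "name": "Example Inn", "country": "Germany", "city": "Munich"},  # invented row
]

found = apply_filters(rows, {"country": "Russia", "city": "Moscow", "name": "Boo"})
```

With these filter values both "Book Hotel" rows remain visible side by side, which is how an operator could notice a duplicate.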
Also the information processing system 1 allows the operator to change flag statuses via the operator interface 27, to edit the data of the target data element 5, to add, to delete, and to move the data elements 10 with external data provider code from one target data element 5 to another target data element 5.
As shown in Fig. 5, the information processing system 1 displays the auxiliary data table 30 and the auxiliary data filter unit 31 on the operator interface 27. Each auxiliary data filter corresponds to one data column of the auxiliary data table 30.
The information processing system 1 allows the operator via the operator interface 27 to enter various data in the filters of the auxiliary data filter unit 31 and displays only the auxiliary data elements corresponding to the filter values.
The information processing system 1 allows the operator via the operator interface 27 to change the flag statuses, to edit the data of the auxiliary data element 13, and to add, to delete, and to move the data elements 21 with external data provider code for auxiliary data from one auxiliary data element 13 to another auxiliary data element 13. Also, the information processing system 1 allows the operator to add new auxiliary data elements 13 to the auxiliary data set 4 via the operator interface 27.
The following is the description of the system by way of a certain example.
In said example, there is a need to process the tourist information, namely, to maintain a hotel database in different cities and countries around the world.
The example involves two providers whose unique external data provider identifiers are "Provider 1" and "Provider 2", respectively.
The data about three cities: Moscow, Munich and New York, are downloaded in the auxiliary data set 4.
Fig. 6 shows structure and data of auxiliary data set 4. As follows from the presented data, the Provider 2 has two identifiers for Moscow, and Provider 1 has two identifiers for New York.
Obviously, in machine data processing such duplication of identifiers would mean duplication of cities and the presentation of false or incomplete data to the end user. In accordance with the present invention, said duplication is eliminated, because in the auxiliary data set 4 every identifier corresponds unambiguously to one city.
Fig. 7 shows structure and data of target data set 3.
The target data set 3 comprises information about two hotels. Data on these hotels are verified and the target data flag status 17 is set as a "verified" one. The information about these hotels is to be transmitted by the data output unit 26, in response to an external request.
The data element corresponding to the "Bent" hotel with the "H09459" code has a data set 8 with external data provider codes, comprising two data elements 10 with an external data provider code. This means that both Provider's 1 and Provider's 2 external databases comprise the information about the said hotel. And due to the data set 8 with external data provider codes, the duplication of records and, accordingly, the provision of incomplete or incorrect information to the end user are prevented.
Fig. 8 shows the continuation of the target data set 3 shown in Fig. 7, after new data have been added by the external data adapters 19 of Provider 1 and Provider 2. The target data flag 17 of the previously loaded hotel data has the "verified" status, and the target data flag 17 of the newly loaded data has the "new" status. Because the data output unit 26 does not transmit data whose target data flag 17 is marked as "new" in response to a request from an external data user, external users will not obtain unverified and possibly incorrect information.
The information processing system 1 displays the information about six hotels on the operator interface 27.
Because the information processing system 1 allows the operator to edit the data via the operator interface 27 as described above, the further processing is carried out as follows.
As one can see, the data Provider 2 has added information about a new hotel, namely the data element corresponding to the "Book Hotel" hotel with the "HL7832" code.
Upon data loading, the information processing system 1 identifies the external data provider (Provider 2) code for auxiliary data "MSK", finds this value in the field 23 with external data provider code for auxiliary data, and saves the relevant auxiliary data unique code in the field 9 with auxiliary data internal code.
Due to the fact that the information processing system 1 displays the corresponding auxiliary data element description fields 15 instead of the field 9 with auxiliary data internal code, the operator interface 27 provides the operator with the standardized auxiliary geographic data contained in the auxiliary data set 4, namely, "Russia", "Moscow", for these data elements.
Because the operator interface 27 is equipped with the filter unit 29, the operator can filter the whole array of information, for example by the words "Russia", "Moscow" and "Boo", and discover that the "HL7832" - "Book Hotel" data element is new for Provider 2 but already exists in the target data set 3.
After comparing the two records it is evident that the new target data element "HL7832" - "Book Hotel" does not carry additional data.
Accordingly, the operator can move the new data element 10 with the external data provider code from the new "HL7832" - "Book Hotel" data element into the "HL7831" - "Book Hotel" target data element, change the external data provider code flag 18 for target data in "HL7831" - "Book Hotel" to the "verified" status, and change the target data flag 17 of the "HL7832" - "Book Hotel" target data element to the "blocked" status, as shown in Fig. 9.
Thus, the fact that Provider 2 has added data about the hotel is taken into account, and data duplication with all its negative consequences is prevented.
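The operator action described above can be sketched as a small merge routine. This is an illustration under stated assumptions: the function name merge_duplicate, the dictionary layout and the provider codes "P1-BOOK" and "P2-BOOK" are invented, and it is assumed that the existing "HL7831" record was supplied by Provider 1.

```python
from typing import Dict

VERIFIED = "verified element"
BLOCKED = "blocked/deleted element"


def merge_duplicate(duplicate: Dict, original: Dict) -> None:
    """Move the provider codes of a duplicate record into the original record,
    mark them as verified, and block the duplicate."""
    for pc in duplicate["provider_codes"]:
        pc["flag"] = VERIFIED                   # flag 18 becomes "verified"
        original["provider_codes"].append(pc)
    duplicate["provider_codes"] = []
    duplicate["flag"] = BLOCKED                 # flag 17 of the duplicate


original = {"internal_code": "HL7831", "name": "Book Hotel", "flag": VERIFIED,
            "provider_codes": [{"uid": "Provider 1", "code": "P1-BOOK", "flag": VERIFIED}]}
duplicate = {"internal_code": "HL7832", "name": "Book Hotel", "flag": "new element",
             "provider_codes": [{"uid": "Provider 2", "code": "P2-BOOK", "flag": "new element"}]}

merge_duplicate(duplicate, original)
```

After the merge, a later load of the same Provider 2 code would be matched against the verified "HL7831" record instead of producing another duplicate.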
As for the data element corresponding to the "Bent" hotel with the "HO9460" code, it is evident that the data Provider 2 has added the information as a new hotel.
However, the external data provider code flag 18 of the data element 10 with the external data provider code has the "verified" status. This means that the target data set 3 already comprises a target data element with the same external data provider code and, accordingly, said new target data element is an update of existing data rather than a new element.
By data searching, filtering and sorting, as described above, the operator can easily identify the target data "H09459" - "Bent", transfer the missing data to this element, namely, the additional telephone number "1-212-555-6789", and set the target data flag 17 of the "HO9460" - "Bent" target data element to the "blocked" status, as shown in Fig. 10.
Thus, the update of the information about the hotel is taken into account, and data duplication with all its negative consequences is prevented.
As for the data element corresponding to the "Movenpick" hotel with the "H03522" code, it is evident that the data Provider 1 has added the information about a new hotel.
A distinctive feature of this target data element is the fact that the field 9 with auxiliary data internal code does not contain data.
This is because the auxiliary data set 4 does not contain an auxiliary data element 13 whose field 23 with external data provider code for auxiliary data is equal to "Germany Munich".
By searching, filtering and sorting the data, as described above, and by analyzing the data, the operator can add to the auxiliary data set 4, namely to the data set 16 with auxiliary data codes of data providers of the auxiliary data element 13 whose unique internal code 14 is equal to "id003", the data element 21 with the external data provider code for auxiliary data taken from the "H03522" - "Movenpick" target data element, namely, "Provider 1" - "Germany Munich" - "verified", as shown in Fig. 11.
Thus, the information processing system 1 will update the "H03522" - "Movenpick" target data element in the target data set 3 by recording the auxiliary data unique code "id003" in its field 9 with the auxiliary data internal code.
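The update described above can be sketched as follows. This is a hedged sketch, not the method of the source: the function name add_aux_code_and_relink and the key pending_aux_code (the provider code for auxiliary data that the adapter retrieved but could not resolve at load time) are assumptions, since the description does not state how the unresolved code is kept.

```python
from typing import Dict, List

VERIFIED = "verified element"


def add_aux_code_and_relink(aux: Dict, target_set: List[Dict],
                            provider_uid: str, code: str) -> List[str]:
    """Add a verified provider code to an auxiliary element and fill field 9
    of the target elements that were waiting for this code."""
    aux["provider_codes"].append({"uid": provider_uid, "code": code, "flag": VERIFIED})
    relinked = []
    for target in target_set:
        if (target["aux_internal_code"] is None
                and target.get("pending_aux_code") == (provider_uid, code)):
            target["aux_internal_code"] = aux["internal_code"]
            relinked.append(target["internal_code"])
    return relinked


munich = {"internal_code": "id003", "flag": VERIFIED,
          "description": {"country": "Germany", "city": "Munich"},
          "provider_codes": []}
target_set = [
    {"internal_code": "H03522", "name": "Movenpick", "aux_internal_code": None,
     "pending_aux_code": ("Provider 1", "Germany Munich")},
]

relinked = add_aux_code_and_relink(munich, target_set, "Provider 1", "Germany Munich")
```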
As a result, the correspondence with the verified geographic data contained in the auxiliary data set 4 will be established for said target data element and for other target data elements that will be loaded in the future.
This will ensure complete data search, filtering and sorting by the operator. By data searching, filtering and sorting, as described above, the operator can easily identify that the "H03522" - "Movenpick" target data element is a new hotel that is not contained in the earlier target data elements and set the target data flag 17 of this target data element to the "verified" status, as shown in Fig. 10.
The information processing system according to the present invention is an effective tool for obtaining various information about objects from different databases and integrating it in one unique object record in its own database, while avoiding duplication of information about the same object.

Claims

1. The information processing system comprising at least one data warehouse, at least one data adapter, and data output unit, wherein the data warehouse comprises at least one target data set and at least one auxiliary data set,
the auxiliary data set comprises a plurality of auxiliary data elements, each auxiliary data element comprises one auxiliary data internal code, one auxiliary data flag, at least one auxiliary data description field and data set with auxiliary data codes of data providers, data set with auxiliary data codes of data providers comprises a plurality of data elements with external data provider code for auxiliary data, and each data element with external data provider code for auxiliary data comprises a field with external data provider unique identifier, a field with an external data provider code for auxiliary data and an external data provider code flag for auxiliary data,
the target data set comprises a plurality of target data elements, each target data element comprises one target data internal code, one target data flag, at least one target data description field, at least one field with auxiliary data internal code, and at least one data set comprising target data codes of data providers, data set with target data codes of data providers comprises a plurality of data elements with external data provider code for target data, each data element with external data provider code for target data comprises a field with external data provider unique identifier, a field with external data provider code for target data, external data provider code flag for target data,
the data adapter is configured to load the external provider data, to format the external provider data in a warehouse data format, to select an external data provider code for target data from the external provider data, an external data provider code for auxiliary data and to record these codes in the target data set,
each flag has at least three statuses: "new element", "verified element" and "blocked/deleted element",
a data output unit is configured to ensure the possibility of data output from the data warehouse in response to an external user request, wherein the data output unit outputs only the data with the "verified element" flag,
while loading an external provider data, the information processing system for each downloaded element: sets the target data flag status as "new element",
searches in the auxiliary data set for the same external data provider code for auxiliary data as the external data provider code for auxiliary data of the target data downloaded element and in case said current data provider code is found, the information processing system records the auxiliary data internal code of the auxiliary data found element in the field with auxiliary data internal code,
searches in the target data set for the same external data provider code for target data as the external data provider code for target data of the target data downloaded element, and in case, such a current data provider code is found, the information processing system sets the status of said external data provider code flag for target data as a "verified element", and if such code is not found, the information processing system sets the status of said external data provider code flag for target data as a "new element",
the information processing system is configured to provide an operator with an operator interface, that provides the operator upon said operator's request with the target data set and the filter unit, containing at least one filter, and meanwhile the information processing system is configured to display the operator the information in accordance with the data downloaded in the filter unit,
also the information processing system is configured to provide the operator with the operator interface, that, upon the operator's request, provides the said operator with an auxiliary data set and a filter unit, containing at least one filter, wherein the information processing system is configured to display the operator the information in accordance with the data downloaded in the filter unit,
the information processing system is adapted to provide the operator with operator interface, that enables the operator to copy and modify data from the target and auxiliary data sets, including said flags, and to save the modified data in the target and auxiliary data sets.
2. The information processing system according to claim 1, wherein said system is configured to set a flag status in the form of color marks.
3. The information processing system according to claim 2, wherein a white color mark is used for the "new element" mark status, green color is used for the "verified element" mark status, black color is used for the "blocked/deleted element" mark status.
4. The information processing system according to any of claims 1-3, wherein each flag has four statuses: "new element", "verified element", "blocked/deleted element" and "element is in processing".
5. The information processing system according to claim 4, wherein red color is used for the "element is in processing" mark status.
PCT/RU2017/000305 2016-12-21 2017-05-12 Information processing system WO2018117901A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
RU2016150458 2016-12-21
RU2016150458A RU2016150458A (en) 2016-12-21 2016-12-21 Information processing system

Publications (1)

Publication Number Publication Date
WO2018117901A1 true WO2018117901A1 (en) 2018-06-28

Family

ID=62626912

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/RU2017/000305 WO2018117901A1 (en) 2016-12-21 2017-05-12 Information processing system

Country Status (2)

Country Link
RU (1) RU2016150458A (en)
WO (1) WO2018117901A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110295868A1 (en) * 2005-03-21 2011-12-01 O'farrell Robert Adapter architecture for mobile data system
US20130144833A1 (en) * 2011-12-06 2013-06-06 International Business Machines Corporation Processing data in a data warehouse

Also Published As

Publication number Publication date
RU2016150458A (en) 2018-06-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17882719

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17882719

Country of ref document: EP

Kind code of ref document: A1