WO1998057272A1 - Dispositif d'analyse et d'organisation de donnees - Google Patents
Dispositif d'analyse et d'organisation de donnees Download PDFInfo
- Publication number
- WO1998057272A1 WO1998057272A1 PCT/FR1998/001015 FR9801015W WO9857272A1 WO 1998057272 A1 WO1998057272 A1 WO 1998057272A1 FR 9801015 W FR9801015 W FR 9801015W WO 9857272 A1 WO9857272 A1 WO 9857272A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- columns
- data
- column
- coll
- database
- Prior art date
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99943—Generating database or data structure, e.g. via user interface
Definitions
- the present invention relates to information processing and storage systems of the database type.
- Such systems can be made accessible to uninitiated users, using appropriate development tools. These tools are very often used by professionals (those skilled in the art), because they considerably reduce the development costs of articulated applications in database form.
- the present invention aims in particular to provide a solution to this problem.
- the data processing device proposed for this purpose is of the type comprising at least one computer, provided with a unit control unit with a processor, at least one user device, and a memory, animated by an operating system, as well as a database management system, stored in this computer, and suitable for cooperating with the operating system to allow the user to create / enter and / or use a database comprising for example at least one data table, decomposable into rows and columns.
- database management system is meant here to cover any computer file system making it possible to manage tables, whatever the physical storage mode thereof.
- the invention includes in this device means that can be called a development and assistance tool.
- the assistance function is carried out by the fact that, in a means forming an autonomous meta-dictionary, dynamically stores selected information relating to the structure of the database, typically the tables and the links between tables, or equivalent information .
- the analysis function can involve a statistical tool, capable of determining interdependencies, and preferably also dependencies, between data sets, by counting distinct occurrences, and an analysis module (pilot). capable of cooperating with the etak Dictionary and with this statistical tool, to obtain and store said representation of groups of linked columns.
- the statistical tool is based on a counting means, which preferably operates directly on the columns of each table, in a manner which will be described later.
- the analysis module can be arranged to reiterate the presentation of sub-groups of at least two columns, until finding at least one sub-group whose columns are linked, or until all possibilities are exhausted. Preferably, it systematically performs the presentation of all the different possible subgroups of columns for said table, preferably also for all the tables of the database.
- the re-structuring module begins by selecting a starting table to be processed, columns to be treated from at least one group of linked columns, and a primary link key for this group of columns.
- Restructuring can be done by building a new table with data from a group of linked columns, as well as with a link key with the table in question.
- the analysis means and its storage means can operate in different ways, in particular: at the request and / or on satisfaction of certain criteria (which may include a partial analysis), or, conversely, permanently, dynamically (possibly for partial analysis only).
- the invention can also be expressed in the form of a process applied to a computer, or else in the form of the new industrial product that constitutes the development tool.
- FIGS. 1 and 2 illustrate two examples of architecture of information processing systems usable according to the invention
- FIG. 3 illustrates the combination of means in which the invention manifests itself
- FIG. 4 graphically illustrates a link between two tables
- FIG. 7 illustrates a single table corresponding to the two tables of Figures 5 and 5A
- Figure 7A illustrates the same single table, but with input anomalies
- - Figure 8 shows the appearance on the screen of a form for entering information, relating to components and products, managed on a single table
- FIG. 9 represents the result of an interrogation request from the monotable structure of FIG. 7
- FIG. 10 is the detailed block diagram of the analysis means according to the invention.
- FIG. 11 illustrates the meaning of the block graphics used here in the flowcharts
- FIG. 12 (divided into 12A and 12B) illustrates the flowchart of operation of the query builder
- Figure 13 illustrates the flowchart of operation of the analysis module
- Figure 13A illustrates a flowchart optional additional operation of the analysis module
- FIG. 14 represents the result of a search request for false dependence from the monotable structure of FIG. 7A
- FIG. 15 shows the selection screen for reorganization
- FIG. 16 (divided into 16A and 16B) illustrates the operating flowchart of the selection and reorganization modules
- Figure 17 represents the joins of the structure with three Components, Products and Categories tables
- Figure 18 represents the input form of the components after the reorganization of the information relating to the products
- Figure 18A represents a simplified form of input of the components after the reorganization of the information relating to the products
- FIG. 19 represents the result of the interrogation request after the reorganization of the information relating to the products
- FIG. 19A represents the result of the interrogation request after a new reorganization of the information relating to the product category
- FIG. 20 illustrates the operating flowchart of the virtual reorganization variant.
- the computer system of Figure 1 includes:
- - at least one computer 100 known as a client, with a central unit 110 (CPU, random access memory, etc.), a screen 120, a keyboard 130, a mass memory 140 (hard disk for example), a pointing device 150 ( mouse for example), a printer 160, and a peripheral 170 allowing access to a network (local or remote) as well as the corresponding software, and
- a central unit 110 CPU, random access memory, etc.
- a screen 120 a keyboard 130, a mass memory 140 (hard disk for example), a pointing device 150 ( mouse for example), a printer 160, and a peripheral 170 allowing access to a network (local or remote) as well as the corresponding software, and
- a computer 200 called a server, comprising a central unit 210 (CPU, random access memory, etc.), a screen 220, a keyboard 230, a file storage system 240 (hard disk for example), possibly a pointing system 250 , a peripheral 260 allowing access to the network (local or remote) as well as the corresponding software.
- a central unit 210 CPU, random access memory, etc.
- a screen 220 a keyboard 230
- a file storage system 240 hard disk for example
- possibly a pointing system 250 possibly a pointing system 250
- peripheral 260 allowing access to the network (local or remote) as well as the corresponding software.
- the mass memory (or other memory) of the two computers houses an operating system.
- an operating system For the "client”, we will take for example an operating system with a graphical interface, such as Windows, OS / 2 or MOTIF for example (Trademarks).
- the graphical interface is less useful in certain cases, and one can take Windows NT, OS / 2, Unix, or Novell for example (Trademarks).
- the operating system is launched, in a known manner, generally when the computer starts.
- a database is installed on the server's storage system.
- Each piece of data (“item” of data) is materialized by an ordered series of fingerprints (magnetic, optical, magneto-optical or other) on the storage medium (disk or other), accessible for example via a SQL type language, and using a database engine installed on the storage system.
- fingerprints magnetic, optical, magneto-optical or other
- disk or other accessible for example via a SQL type language
- database engine installed on the storage system.
- clients On the side of the "client" computer (s), it suffices to provide programs, recorded on the storage system, allowing access to the database via an appropriate language, for example SQL.
- Figure 2 illustrates a "single user" configuration integrating some of these elements on a single computer. There is then only one storage device (140 + 240), and the device allowing access to the network (170) is useless.
- a processing unit for example at least one of the units 110 and 210 ( Figures 1 and / or 2);
- the mass memory for example at least one of the hard disks 140 and 240; - in 450, the operating system which allows access to physical information from the mass memory, in the form of a 460 file system.
- the physical links are in solid lines, while the logical links are in dashed line. The rest of FIG. 3 essentially depends on the file system, therefore on the mass memory 440 and the processing unit 410, through the operating system 450.
- an "internal dictionary” (this is the name generally given, in relational databases, to information on the structure of the database, which are stored in a way accessible to the database engine, but inaccessible by direct modification by the user - read only).
- the development tool (general part) which accesses the database 475 (tables, forms, reports, in particular).
- the development tool 500 works with a generator of tables, forms and reports 505, a meta-dictionary 510, an analysis module 530, a selection module 570, as well as a physical reorganization 580, and / or a virtual reorganization module 590.
- Each table is a set of rows; each row is organized into one or more columns; and every column corresponds to a data type. Each row includes one and only one value (the "data") for each column of the table. Physically, data storage can correspond to this table structure, or be different.
- Each table generally comprises at least one column which contains only unique values (is "unique” a value which is found only once in the column of the table considered), or else an equivalent means, number of row for example, arranged or not in column of the table.
- This column (or one of them) is then called the primary key.
- the value of the primary key makes it possible to uniquely find the corresponding line, hence its interest. Any column that contains only unique values can be used as the primary key.
- the set of joins to link two tables together is called a path.
- Some database structures are such that there can be several paths between two of the tables they contain. For example, two Customers and Suppliers tables can be linked by a path passing through an Orders table or by a path passing through a Country table. Conversely, there may not be a path between two tables. For example, two tables "Production” and "History_of_production” although having the same structure and containing data of the same essence have no link (in the sense of the structure of the data) between them.
- joins or links makes it possible to reduce redundancy of information.
- specific relevant information is stored only once (except the primary key - foreign key pairs required for links). This saves space. In addition, updates are made easier.
- P_NAME the modification of the name of the product having a serial number P_SN equal to US1 only implies the modification of a single line (in the table Products) while it consists of three components.
- a form displayed on the screen Access to the data of a table is made by a form displayed on the screen.
- a form is made up of fields, each of which corresponds to a column in a table. It allows to insert new lines in the table (creation), to modify the existing lines, to delete lines.
- a form which allows access to several tables must then respect the integrity constraints of the structure concerned.
- a report presents selected columns of a table, for rows selected by a query (sometimes for the whole table). The result is displayed on the screen and / or printed.
- a report can query and present data from several tables, but it must then know the corresponding structure (foreign keys, primary keys and joins). In addition to reports, it is possible to define graphics displayed on the screen and / or printed.
- Reports and charts are commonly called reports.
- Forms and reports are commonly called programs. And programs access data via queries.
- the reports are accompanied by interrogation requests.
- interrogation request is included in the report and / or separated from the report.
- insertion requests, update requests and deletion requests which are in principle reserved for forms.
- SQL query language Structured Query Language
- the SQL language has been the subject of several standardizations, the latest of which is the SQL-92 ANSI / ISO standard.
- the keywords of SQL orders are presumed to be understood by the reader. If necessary, they can be found in the above-mentioned standard.
- the SQL language allows access to relational databases while respecting an independence between the way in which the data is physically stored and the way of logically handling this data.
- the SQL language makes it possible to express complex processes simply, including in particular nested loops (by means of joins): see examples A-13-1 and A-13-2.
- the use of the SQL language is in no way limiting, and the present invention can just as easily be applied using a lower level language, which would directly attack the file system of the storage device, for example .
- an application brings together a set of programs revolving around a given structure (tables and integrity constraints).
- a menu allows you to select, within an application, the program with which you want to work.
- a development tool is software that provides the programmer with a library of program generators. These program generators make it possible to significantly reduce development times: when using a program generator, the programmer enters parameters rather than writing source code; these parameters are then interpreted to generate a source code program that the programmer can modify or enrich manually, insofar as he masters the source code.
- the development of applications based on database engines follows in principle the following methodological approach: the programmer performs a needs analysis. He can use a design method for this (Merise for example) whose ordered set of operating rules guides him by systematizing the thinking process and avoiding design errors.
- the analysis makes it possible to define a "data model" which expresses the semantics of the data in a relevant manner. (A data model can be represented graphically by an Entity-Relation diagram.).
- the needs analysis phase requires the programmer to have a good knowledge of relational algebra and standardization of the structure of a database.
- the physical structure of the data must be implemented in the database starting from the "data model”.
- the development tool allows you to define forms and reports based on the structure (using the internal dictionary) as well as menus. The user can then enrich these forms, reports and menus.
- the meta-dictionary is subject to possible consistency checks by the development tool and can even integrate certain operating rules of an analysis method (CASE type tools). From this meta-dictionary, generator 505 (figure 3) of the development tool takes care of the following tasks:
- the report generator composes several query queries (usually one per set of tables), and builds a standard report for each query.
- the user can then enrich them.
- the complexity of access to the database is more or less hidden from the user, depending on the quality (level of complexity) of the development tool.
- a specific modification of the data structure is variously well accepted by the development tools: Being a "rustic" tool, the user must manually modify the data structure, as well as the queries in the forms and reports concerned. - On the other hand, with a high performance tool, more complex, it is enough to update the meta-dictionary of the development tool, and this is automatically reflected on the physical structure of the data as well as on the requests of the forms and states.
- a more substantial modification of the data structure is very heavy, if one wishes to keep the data already entered. And the user will come up against several difficulties: a) the new structure, even if it respects relational integrity, may no longer be in line with the needs of the application. The data may become unusable. b) the syntax of the languages (procedural or not) for data access is complex and many modifications not taken into account by the development tool have to be done manually. c) the modifications to be made to requests for forms and reports are complex. d) the modification of the structure is itself delicate and complex if one wishes to keep the existing data (a very simple example: having an article number column, we want to change it from the numeric type to the character type ).
- the invention makes it possible in all cases to start from a single-table structure. It then allows the user himself to modify the data structure, once production has started, according to his new needs. It does not require any particular knowledge of the data access technique (SQL language for example), nor of the principles governing relational data structures. It thus saves analysis time, development time and subsequent modification time. It is suitable for all relational databases on the market, as well as for other data management systems.
- the invention provides an improved development means or tool, which can also be considered as a user environment interface.
- This tool is based on a particular meta-dictionary 510 (FIG. 10), the minimum content of which (for the preferred embodiment) is given in A-21.
- the meta-dictionary repeats certain information already contained in the internal dictionary, such as the name and type of each column in each table. There are many more.
- This meta-dictionary 510 is maintained by a memory automaton, which monitors any event relating to the conditions given in A-21. This monitoring can be limited to the times when the development tool is running (insofar as it has the exclusive right to modify these conditions). It can be wider. It is preferably permanent.
- the invention is also based on a request constructor 520 (FIG. 10).
- the built queries obey a well defined general format, called here canonical form, whose currently preferred version is indicated in A-22, while examples of canonical queries are given in A-23.
- step 910 presents the user with different (available) possibilities for selecting columns, simple and calculated, as well as set functions.
- set function we mean a function covering several lines, such as SUM (), COUNTING (), AVERAGE (), for example.
- the user can construct expressions from columns, and / or set functions. From there, he can: - in 912, create and link restrictive conditions,
- steps 910 to 913 constitute the initialization of the query builder. They are described as made by the user. But the query builder can just as easily operate automatically, receiving this initialization when it is launched, in the form of parameters, which respectively define the elements entered in 911, 912, 913. It is important to note that this query designer can memorize, in the form of character strings, the definition of the queries it has enabled to write.
- Step 920 determines the set of tables to which the columns and expressions defined in 911 to 913 belong. This is done selectively for each of the expressions involved in the canonical request A-22. If a column belongs to two tables, it is the primary key of one and a foreign key of the other. The query generator can associate this column with one or the other of the two tables (the result of the query will be the same). We preferably take the table where the column is the primary key.
- Steps 921 to 925 and 929 determine the path connecting these tables (if there are several).
- step 923 adds one or more tables so that such a path exists, starting from the meta-dictionary. It will be noted however that, as soon as the implementation of the invention starts from a single table, all the tables then created are necessarily linked by one and only one path.
- the first two lines of the query A-22 are therefore constructed (and the optional third line, if the following lines call columns whose tables do not appear in the second line).
- step 940 the “following” of the path (if any) and the satisfaction of the restrictive conditions are treated in a similar manner. This is used to decide (941) to write the WHERE clause, with a list of joins (943) and / or a list of restrictive conditions (949).
- the ORDER BY clause is written in 971, if test 970 indicates so (the sorting criteria may include set functions).
- this query builder takes care of all the checks itself. Just give it, by input, or by passing parameters, the selective list of expressions to be processed.
- the invention also provides for automatically (dynamically) creating one or more forms.
- the value of a column must always correspond to its type (for example, what is entered in a column of date type must be a datum recognized as a date).
- an external key column must have a set of values included in the set of values of the corresponding primary key.
- the invention provides for the automatic generation of reports and graphics, by the module 505.
- a report is established on the basis of an interrogation request. It is possible in a report to define:
- the breaks are proposed according to the structure of the application: either a column COL 2 which belongs to a set of tables ⁇ TAB2- L ... TAB2 p ⁇ , and the previous column COL j which belongs to a set of tables ⁇ TABl lf ... TABl n ⁇ ; these two columns are presented in the same break level, if, for any value of i, from 1 to n, and for any value of j, from 1 to p, the path to go from TABl j ⁇ to TAB2 j does not include no master-detail sense join.
- These breaks correspond to the breakdown into levels of the structure and therefore offer the user a default presentation which is meaningful. The user can however modify them at his discretion and define certain group calculations only for certain levels of rupture as well as insert or delete ruptures.
- Figure 6 illustrates a report showing the name of the product, the serial number of the component, the cost with:
- a graph corresponds to a query and a graphical presentation of column values for certain rows.
- a graph is determined by parameters like:
- a graph is established on the basis of an interrogation request comprising from zero to several columns of the label type and one or more columns of the numeric type. It includes: - The format of the label type column.
- the pie chart only supports a label type column and a numeric type column.
- the XY graph (commonly called curve) does not support a column of the label type and only supports two columns of the numeric type (possibly three-dimensional graphics).
- the module 505 If in the query query, there are more columns of the label type than the graph format supports, the module 505 generates as many exploded graphs as necessary using a mechanism similar to that of breaks.
- the invention makes it possible to ensure that the relationship between the request and the report (report or graph) remains consistent after any reorganization.
- FIGS. 5 and 5A gives an already fitted version of the table structure.
- a layman will generally put everything in one table.
- the user can always start with a single table, which we will name here "Components_and_Products" ( Figure 7). To do this, it simply defines the name and type of the data it wants to enter.
- the development tool updated the meta-dictionary, created a monotable data structure in accordance with the name and type of the columns indicated by the user, and created a default form allowing access to the table, and states.
- FIG. 9 represents the result of an interrogation request from the monotable structure (without filtering).
- the invention can also apply to the case where an experienced developer will make initial choices for the structure of the tables which are guided by his formal analysis and practical considerations, since these choices should be reconsidered during subsequent exploitation (with more and more data).
- the current structure of the database is no longer suitable when certain data is redundant within the table.
- the user is confronted with anomalies because the data model does not or no longer respect the different normal forms of the relational model.
- the present invention provides for this purpose means which can be broken down into three parts (FIG. 3):
- phases ii) and especially iii) are optional. It will indeed be seen that it is also possible, as a variant, or in addition, to implement the invention without physically reorganizing the database (590).
- the analysis phase includes all or part of the operations which will now be described, with reference to Annex III. It is controlled by a 530 module ( Figure 10).
- the method first comprises an analysis (preferably exhaustive) of the repetitions of the values of the columns of the data tables. This analysis will be described with reference to Figure 13 and Annex III.
- the tables of the database are determined, or better those used in the application (1010).
- the analysis consists in determining: - in 1012, the number of rows, N.
- the number of rows in a table (step 1012) is obtained by the request A-31-1, where TABl is the name of the table considered.
- the number of distinct values of a column in a table is obtained by the SQL query A-31-2, where COL1 is the name of the column.
- Ni and Nj be the respective number of distinct values of the two columns C _ and C j (calculated in 1024-1026).
- P the number of distinct pairs of values of C. ⁇ and C j , by the query A- 31-3, where COL1 and COL2 are the names of columns C and C j , while:
- C ⁇ determines C j .
- the column ⁇ is said to be the mother of C j .
- Column C is said to be the daughter of C.
- C j determines C- ⁇ .
- Column C j is said to be the mother of C ⁇
- Column C j _ is said to be a daughter of C j .
- the steps 1050 and 1052 ensure respectively the looping over all the pairs of columns and all the tables until the End 1059.
- the queries mentioned can be developed using the SQL query builder already mentioned (A-22, and 520, figure 10). However, it will often be easier to prepare them individually in advance, in the form of character strings, some elements of which are variable.
- the variable elements are TAB1, COL1, "xyz" (or, which is equivalent to 'xyz'), and COL2.
- step 1024 (for example) of FIG. 13 will recompose the request A-31-2, by replacing there the names COL1 and TAB1 by the name of the column and that of the table , respectively.
- the writing of programs capable of implementing the loops described in FIG. 13 is considered to be accessible to those skilled in the art from the present description, and of a procedural language such as the C language and its variants (C ++ for example ), or SMALLTALK.
- the number of pairs of values, B, which block the mother-daughter dependence between two columns C ⁇ and C j is obtained (step 1210, FIG. 13A) by the query A-32-1, where a and b are aliases defined on the table TABl, and the other variables are defined as above, (a table alias gives access to it under a name other than its own; several aliases on the same table allow access to it in two independent ways).
- Query A-32-2 includes a nested SELECT statement, which counts the number of occurrences (greater than 1) of the pair of values within the table.
- COLl which block dependence. These are COLl values such as the number of distinct COLl values! ! 'xyz'
- C0L2 is greater than 1.
- the query A-32-2 builds on it a main SELECT order, which returns the pairs of values found, given in figure 14, for the example considered. For each element of the set of values of COLl which block the dependence (according to the nested query), we now search for the corresponding values of COL2 as well as their frequency. Likewise for the NULL value of COLl (NULL is the case where nothing has been entered).
- step 1230 it is possible to go directly to step 1230, without doing the preliminary counting of step 1210.
- step 1250 The forcing itself (step 1250, FIG. 13A) is carried out by the SQL order given in A-32-3, where:
- - COLl and COL2 are two columns of the table whose dependence is sought to force in the sense that COLl will determine COL2, - VALUE2 is the new value assigned to COL2,
- - VALUE1 is the value of COLl for the lines to be modified
- - VALUE2__1, VALUES2_2, ... VALUES2_n are the values of COL2 for the lines to be modified.
- the forcing can also include the SQL order given in A-32-5, where: - TAB1 is the name of the table,
- - COLl and COL2 are two columns of the table which one seeks to force the dependence in the sense where COLl will determine COL2,
- - VALUE1B is the new value assigned to COLl
- - VALUE1 is the value of COLl for the lines to be modified
- - VALUE2_1, VALUES2_2, ... VALUES2_n are the values of C0L2 for the lines to be modified.
- Any candidate keys for a group of columns must respect the constraint: Each candidate key must be, opposite all the other columns of the group, either mother or sister. In order for a column to become a candidate key for a group of columns, it is necessary to force the dependency with all the pseudo-daughters of the group (in the event of near-dependence).
- the result of the analysis is expressed and stored in the form of a group of columns, comprising:
- One of the implementations of the invention then consists in physically reorganizing the structure of the tables.
- this function of reorganizing the structure consists in splitting a table into two tables, which will be in master-detail relationship.
- This phase is carried out for example using a windowed user interface like that of FIG. 15, with a window 91 for selecting the tables (here only one), then window 95 for selecting the columns to be isolated from the chosen table, then selection in 96 of a candidate key for the new table (optional).
- the name of this new table is entered in 92, which will be noted TAB2 below.
- the selection-reorganization phases are illustrated in the form of steps in FIG. 16, in the case of interaction with the user, with the screen in FIG. 15.
- step 2010 uses the meta-dictionary to find out the list of tables used by the application, which list is displayed in 2012.
- the user chooses the table TAB1 to be reorganized.
- the system provides in 2020 the list of columns for TAB1.
- these columns are displayed, with the exception of the primary key of TAB1, and the user chooses in 2030 the columns Sj to be isolated.
- steps 2042 to 2046 If, in steps 2042 to 2046, the user has selected a candidate key, which is assumed to be (SI), the reorganization proper is carried out by steps 2048, 2049 and 2060, in accordance with SQL orders A-33;
- the creation of the master table is done by executing the order A-33-1.
- This SQL statement creates a table named TAB2, with the result of a SELECT statement, nested in the CREATE statement.
- TAB1 The modification of the original table TAB1 is done by chaining the 3 orders given in A-33-2.
- TEMPO is a name available at the database level: there is no other table or view with this name). Then we delete TAB1, and we rename TEMPO to TAB1.
- the creation of the details table follows the sequence of steps A-34-1, where CODE_TAB2 is the name of the primary key of the table TAB2.
- CODE_TAB2 and TEMPO are names that must be available at the database level.
- the first line is very similar to query A-33-1, except that instead of directly creating the table TAB2, we go through a TEMPO view. It is the second line which creates TAB2, by adding to the view TEMPO a calculated column CODE_TAB2, whose value is ROWNUM, ie the row rank in the table, after which TEMPO is erased.
- the layout of the original table TAB1 is carried out as indicated in A-34-2.
- the meta-dictionary is updated (step 2065): a) the new master table TAB2 formed by the columns SI, S2, ... Sn is taken into account. b) in the case of the creation of a primary key for TAB2, the creation of the column CODE_TAB2 in TABl is noted; the creation of the column CODE_TAB2 in TAB2 is noted.
- the forms allowing to attack the table TABl are enriched by the column CODE_TAB2, and do not allow any more to attack (seized) the columns SI, S2, ... Sn.
- FIG. 18 represents the form for entering the components used in production after the reorganization of the information relating to the products.
- the P_NAME, P_ID and CATEG fields are no longer directly accessible because they do not belong to the Components table whereas they belonged to the Components_and_Products table; they are determined by the value of the product serial number field (P_SN) which is the external key of the Components table which is linked to the primary key of the Products table.
- P_SN product serial number field
- FIG. 18A represents a simplified form for entering the components used.
- the fields P_ID and CATEG which were automatically filled in the form in figure 18 have been deleted from the form. Only the Product name field (P_NAME) has been kept to verify that the serial number of the product entered is the one desired. Concretely, the process is the same as above, except that instead of matching the fields of the "non-modifiable" property, they are simply removed from the definition of the form concerned.
- Figure 19 shows the result of the query after the reorganization of the information relating to the Products.
- a break (this is a way of illustrating the new structure) was automatically provided by the report generator, in response to the presence of a new table. Like forms, such a break can be simply determined by the "Break" property associated with the column under consideration, in the definition of the report under consideration. Other illustrations than rupture are possible.
- the selection / reorganization cycle (or the analysis / selection / reorganization cycle) can be repeated as many times as necessary, at least as long as there remains a group of linked columns to be processed.
- FIG. 17 shows the join relationships of the structure with the three Components, Products and Categories tables.
- FIG. 19A represents the result of the interrogation request on these three tables. A second break was made automatically.
- the physical reorganization of the tables has the following effect: - in data entry mode (form), access for entry (modification, creation, deletion) is restricted to only part of the columns of a group of linked columns, the others being simply accessible in reading, or inaccessible; - to access these other columns in writing, you must use a different form, processing the new table created.
- This other form can be made accessible from the first, in particular under the following conditions:
- the analysis carried out can only be used to modify the operation of the input forms, by acting as if the database structure had been reorganized ("virtual reorganization" of the database of data).
- the layout of the forms is then carried out from the dependencies (and keys) memory 550.
- the layout takes into account the levels of kinship between the linked columns of each group.
- a link key which is one of the candidate keys, selected by the user, or chosen automatically.
- the field P_ID (COL2) is automatically filled in by the result of the request A-36 searching for "US1" in the column P_SN (COLl) of the already existing rows.
- the keyword DISTINCT is optional in request A-36, since any line found contains the result sought for COL2 (P_ID).
- the user can overwrite the values entered by default. It can thus break the dependence between two columns and the operation of the virtual reorganization mode is then interrupted for this pair of columns.
- step 2270 the same is done for the direct daughter columns, which are also placed in the container.
- A is the direct daughter of B if there is no column C, other than A and B, such that B is the mother of C and C is the mother of A).
- B is the mother of C and C is the mother of A.
- the process ends in 2290 when the container is empty. This makes it possible to treat the sisters of the field targeted in 2200, as well as to completely browse the tree of his daughters, granddaughters (and their sisters), and so on.
- a modification is equivalent to a deletion followed by a creation.
- the present invention profoundly modifies the usual development cycle: The functional dependencies are deduced from the dependencies between the data. This minimizes the analysis phase. This also makes it possible to reveal dependencies not envisaged or not present at the origins.
- the present invention can therefore also be used as a tool for finding correlations between data and as a tool for partitioning data into domains.
- the invention thus provides a means allowing in particular to reorganize a database, after a certain operating time.
- This means can be implemented on the sole initiative of the user. It can be offered at regular intervals, over time, or depending on the growth of the database.
- the analysis part can be implemented automatically, at least partially, in order to determine if there is indeed something to do.
- this analysis part is kept up to date in the background. More generally, the analysis means and its storage means then operate permanently, dynamically (rather than from time to time, at the request of the user, or at the request of the development tool).
- the station (or one of them) can be equipped with an automaton ("trigger"), which triggers the update of the analysis as soon as an event occurs likely to influence this analysis. To do this, it may be necessary to keep a permanent record (directly or indirectly) of all the variables processed during the analysis phase.
- an automaton which triggers the update of the analysis as soon as an event occurs likely to influence this analysis.
- the analysis phase is not necessarily followed by such a reorganization.
- the user can indeed refuse the reorganization for various reasons, such as the time it will take, or the fact that he is not convinced of the usefulness of dividing his table into two parts, for example.
- the present invention applies in general to any type of data, any type of relationship between data, any type of database (not necessarily relational), any type of language for accessing databases, any type of database architecture, all types of operating systems, all types of storage media and storage systems.
- the invention therefore extends its effects to any file management system, accessible by a programming language which would make it possible to write the detailed equivalent of the SQL orders mentioned in the description.
- the power of the SQL query builder described means that it can generate (in particular) all the desired statistical queries. With other languages, it may be necessary to create a module (procedure) for each of these statistical queries. In this case, the structure of figure 10 can be found modified: the statistical tool 520 would only receive data to be processed, while the tables to access the columns would be queried directly by the analysis module 530.
- each statistical operation is carried out on all the rows, for one or two columns chosen.
- a variant of the SELECT COUNT of appendices A-31 would consist in browsing the values of the column considered, by making a detection of new value: if the value has already been encountered, we go to the next; otherwise we reference this value as being one of the values taken by the column, we increment a counter and we pass to the next value. At the end, the value of the counter gives the number of distinct occurrences taken by the column.
- each set for a first column is equal to one set for the second column, or if several sets for the second column are combined, then the first column is the daughter of the second.
- a comparison operation between meetings and / or intersections of row lists (of a predefined format) is accessible to the person skilled in the art. From there, the realization of an automaton performing the above functions to perform the analysis is also. This automaton is more efficient than the aforementioned SELECT orders, at least for certain applications.
- storage in the form of a line is not compulsory.
- index the list of line numbers
- the reconstruction of a line is then done dynamically by finding the different values of the columns of this line by searching for the line number in the lists assigned to each column.
- TABl (COLl, COL2)
- TAB2 (COL2, COL3)
- COL2 will become an external key of TABl and the primary key of TAB2.
- the indexes of the lines which block the dependence are easily obtained by considering the empty intersections of the sets of indexes which are neither empty, nor equal to the set of indexes of the almost child column.
- appendix A-27-3 shows that the dependency is blocked by the set of two sets of index following:
- the dependency When deleting a row, the dependency is only restored if the set of index sets blocking the dependency contains only one element.
- deleting line 1 gives ⁇ 2 ⁇ , ⁇ 6 ⁇ . There is more than one set, so dependency is not restored. But deleting line 2 gives ⁇ 6 ⁇ . Addiction is restored.
- the modification of a line can be treated for example as the sequence of a deletion and a creation in any order.
- the invention could also be expressed in the form of a process. It also results in a significantly improved development tool.
- the simplest version of this development tool includes the means to perform the following functions:
- said tool will include the means allowing the following operations to be carried out, in order to accommodate (import) an already existing file:
- each line is divided into an identical number of columns.
- the data can only be in numeric, alphanumeric or date format: For a column to be in numeric format, all the values of this column must be in numeric format. For a column to be in date format, all the values in that column must be in date format. If not, the column is in alphanumeric format.
- the width and the precision of the column are determined: The width is equal to the sum of the maximum number of digits located in front of the decimal separator and the maximum number of digits located behind the decimal separator. The precision is equal to the maximum number of digits behind the decimal separator.
- the width of the column is determined. The width is equal to the maximum number of characters in the column.
- the SQL order is of the form given in A-35, where TABl is the name of the table to be created, COLl ... COLn the name of the columns of the table and FORMAT1 ... FORMATn the corresponding formats; FORMATi is one of the following forms:
- VARCHAR2 (L) for the alphanumeric format where L is the width of the column.
- the tool can also include the means making it possible to carry out the following operations: - creating forms allowing access to the data contained in the tables, while validating the insertion, deletion or modification of a line can only be done if the following constraints are respected:.
- the value of a column must always match its format.
- a primary key column should have only unique values.
- An external key column must have a set of values included in the set of values for the corresponding primary key.
- the development tool advantageously includes means for generating states, with:
- - a graphics generator to present the results of query requests where the user simply specifies the query query comprising from zero to several columns of the label type and one or more columns of the numeric type, the format of the column of the label type (possibly), the format of the graph.
- the analysis module could stop based on selected criteria, including the fact that it has already found a group of linked columns (without false dependencies). We could also follow directly on the re-structuring, without selection by the user, as soon as the analysis gives a group of linked columns.
- the link key could then be chosen from among the candidate keys, such as:
- a calculated column is a formula made up:
- PRIX_HT * 1.206 where * is the symbol of the multiplication and PRIX_HT a column of a table corresponding to a price without tax.
- TAUX_TVA is a column of a table corresponding to a rate of VAT.
- NO_FACTURE Counting
- Sum (PRICE_UNITARY * QUANTITYJZOMMANDED) where Sum () is a summation operator, PRICE_UNITARY is the unit price column of an item table and QUANTITY_COMMANDED, the ordered quantity column of an order table.
- a restrictive condition is composed:
- an interrogation request consists of:
- SELECT * FROM STUDENTS UNIVERSITES returns a Cartesian product of the two tables (n times p rows), which can be considered as a new table reviewing all the universities for each student.
- UNIVERSITY CODE UNIVERSITY. CODEUNIVERSITE returns n lines only (1 per student). The SQL engine created a loop to arrive at this result: we loop on the STUDENTS table and for each student we search for information from his university thanks to the CODEUNIVERSITE.
- the meta-dictionary (of the development tool), which is stored in the database, contains, as a minimum:
- This SELECT SQL order is broken down into several lines (Here, with the exceptions mentioned or obvious, indented lines are in principle part of the same SQL order as the previous line).
- the first line of the query indicates in "Col- ⁇ , Col 2 , ... Col n " the list of columns (simple or calculated) that will be extracted by the query.
- the second line determines in TAB-L, TAB 2 , ... TAB p ⁇ the set of tables to which the simple columns and the components of the calculated columns belong, first for the list of columns, but also for the rest of the SQL order, essentially the restrictive conditions.
- VAL i a value, a set of values or nothing according to the choice of the operator OP ⁇
- the fourth line of the query concerns the restrictive conditions for columns (simple or calculated) which do not contain, directly or indirectly, a set function; it is of the form:
- the third and fourth lines are preceded by the keyword 'WHERE' and are linked, if both exist by the keyword 'AND'.
- the fifth line of the query concerns columns (simple or calculated) which do not contain, directly or indirectly, a set function; Let COL_NON_ENS 1 , ... COL_NON_ENS u these columns. Then come in the sixth line the restrictive conditions on the calculated columns which all contain at least one set function denoted COL_ENS i .
- a sort order has a column (simple or calculated) and an ASC (ascending) or DESC (descending) direction.
- the third part is present as soon as two tables (at least) are involved in the list of columns or conditions.
- the rest is optional.
- Sort order Product name, ascending.
- the Product Serial Number and Product Name columns belong to the same PRODUCTS table.
- SQL asks to remove the ambiguity by specifying in front of the name of columns which could cause confusion the name of the table to which they relate (followed by an expected separator, in SQL one point).
- Line D3 or lines D4 block dependence, because D cannot determine both 3 and 4, whether there is dependence.
- El and E2 are the names of these tables for COLl and C0L2, we will write for example:
Abstract
Description
Claims
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP98928350A EP0988607A1 (fr) | 1997-06-12 | 1998-05-20 | Dispositif d'analyse et d'organisation de donnees |
US09/445,751 US6553383B1 (en) | 1997-06-12 | 1998-05-20 | Device for data analysis and organization |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR9707305A FR2764719B1 (fr) | 1997-06-12 | 1997-06-12 | Dispositif d'analyse et d'organisation de donnees |
FR97/07305 | 1997-06-12 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1998057272A1 true WO1998057272A1 (fr) | 1998-12-17 |
Family
ID=9507905
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/FR1998/001015 WO1998057272A1 (fr) | 1997-06-12 | 1998-05-20 | Dispositif d'analyse et d'organisation de donnees |
Country Status (4)
Country | Link |
---|---|
US (1) | US6553383B1 (fr) |
EP (1) | EP0988607A1 (fr) |
FR (1) | FR2764719B1 (fr) |
WO (1) | WO1998057272A1 (fr) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2395587A (en) * | 2002-11-25 | 2004-05-26 | Sun Microsystems Inc | Method for restructuring data which may be in a relational database |
DE10208959B4 (de) * | 2002-02-28 | 2006-10-12 | Equero Future Net Technologies Ag | Verfahren und Vorrichtung zur Erfassung und Auswertung von in einem Rechnernetzwerk abgelegten Informationen |
Families Citing this family (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7174340B1 (en) * | 2000-08-17 | 2007-02-06 | Oracle International Corporation | Interval-based adjustment data includes computing an adjustment value from the data for a pending adjustment in response to retrieval of an adjusted data value from a database |
US7213013B1 (en) * | 2001-06-18 | 2007-05-01 | Siebel Systems, Inc. | Method, apparatus, and system for remote client search indexing |
US7546287B2 (en) * | 2001-06-18 | 2009-06-09 | Siebel Systems, Inc. | System and method to search a database for records matching user-selected search criteria and to maintain persistency of the matched records |
US7464072B1 (en) | 2001-06-18 | 2008-12-09 | Siebel Systems, Inc. | Method, apparatus, and system for searching based on search visibility rules |
US7406455B1 (en) * | 2002-01-17 | 2008-07-29 | International Business Machines Corporation | Automatic recognition and flagging of anomalous items within sets of automatically classified items |
EP1485871A2 (fr) * | 2002-02-27 | 2004-12-15 | Michael Rik Frans Brands | Solution d'integration de donnees et de gestion de connaissances |
JP3861044B2 (ja) * | 2002-10-24 | 2006-12-20 | 株式会社ターボデータラボラトリー | 連鎖したジョインテーブルのツリー構造への変換方法、および、変換プログラム |
US7203694B2 (en) * | 2002-12-20 | 2007-04-10 | International Business Machines Corporation | System and method for multicolumn sorting in a single column |
TWI245514B (en) * | 2002-12-20 | 2005-12-11 | Hon Hai Prec Ind Co Ltd | System and method for displaying relevant events of networking devices |
US7373354B2 (en) * | 2004-02-26 | 2008-05-13 | Sap Ag | Automatic elimination of functional dependencies between columns |
GB2420192A (en) * | 2004-11-12 | 2006-05-17 | Quadstone Ltd | Formulating and refining queries on structured data |
US7739290B2 (en) * | 2004-12-17 | 2010-06-15 | Sap (Ag) | System and method for object persistence |
WO2006096939A1 (fr) * | 2005-03-18 | 2006-09-21 | Kwok Kay Wong | Acces a distance a des donnees heterogenes |
US7454449B2 (en) * | 2005-12-20 | 2008-11-18 | International Business Machines Corporation | Method for reorganizing a set of database partitions |
US8346725B2 (en) * | 2006-09-15 | 2013-01-01 | Oracle International Corporation | Evolution of XML schemas involving partial data copy |
US7870163B2 (en) * | 2006-09-28 | 2011-01-11 | Oracle International Corporation | Implementation of backward compatible XML schema evolution in a relational database system |
US9235846B2 (en) | 2013-03-13 | 2016-01-12 | Salesforce.Com, Inc. | Systems, methods, and apparatuses for populating a table having null values using a predictive query interface |
US10311364B2 (en) | 2013-11-19 | 2019-06-04 | Salesforce.Com, Inc. | Predictive intelligence for service and support |
US9483545B2 (en) * | 2014-05-30 | 2016-11-01 | International Business Machines Corporation | Grouping data in a database |
US10331947B2 (en) * | 2017-04-26 | 2019-06-25 | International Business Machines Corporation | Automatic detection on string and column delimiters in tabular data files |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5481703A (en) * | 1992-09-30 | 1996-01-02 | Kabushiki Kaisha Toshiba | Database restructuring system for detecting functionally dependent relations and converting them into third normal form |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6061515A (en) * | 1994-07-18 | 2000-05-09 | International Business Machines Corporation | System and method for providing a high level language for mapping and accessing objects in data stores |
US5627979A (en) * | 1994-07-18 | 1997-05-06 | International Business Machines Corporation | System and method for providing a graphical user interface for mapping and accessing objects in data stores |
US5870746A (en) * | 1995-10-12 | 1999-02-09 | Ncr Corporation | System and method for segmenting a database based upon data attributes |
US5832496A (en) * | 1995-10-12 | 1998-11-03 | Ncr Corporation | System and method for performing intelligent analysis of a computer database |
US6076088A (en) * | 1996-02-09 | 2000-06-13 | Paik; Woojin | Information extraction system and method using concept relation concept (CRC) triples |
US6026398A (en) * | 1997-10-16 | 2000-02-15 | Imarket, Incorporated | System and methods for searching and matching databases |
US5918232A (en) * | 1997-11-26 | 1999-06-29 | Whitelight Systems, Inc. | Multidimensional domain modeling method and system |
US6076091A (en) * | 1997-12-09 | 2000-06-13 | International Business Machines Corporation | Method and system for providing a flexible and extensible database interactive on-line electronic catalog |
US6138121A (en) * | 1998-05-29 | 2000-10-24 | Hewlett-Packard Company | Network management event storage and manipulation using relational database technology in a data warehouse |
US6243713B1 (en) * | 1998-08-24 | 2001-06-05 | Excalibur Technologies Corp. | Multimedia document retrieval by application of multimedia queries to a unified index of multimedia data for a plurality of multimedia data types |
-
1997
- 1997-06-12 FR FR9707305A patent/FR2764719B1/fr not_active Expired - Fee Related
-
1998
- 1998-05-20 EP EP98928350A patent/EP0988607A1/fr not_active Withdrawn
- 1998-05-20 WO PCT/FR1998/001015 patent/WO1998057272A1/fr not_active Application Discontinuation
- 1998-05-20 US US09/445,751 patent/US6553383B1/en not_active Expired - Fee Related
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5481703A (en) * | 1992-09-30 | 1996-01-02 | Kabushiki Kaisha Toshiba | Database restructuring system for detecting functionally dependent relations and converting them into third normal form |
Non-Patent Citations (1)
Title |
---|
SOUTOU C ET AL: "Automatic generation of SQL queries to improve knowledge discovery in relational databases", PADD97 PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON THE PRACTICAL APPLICATION OF KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON THE PRACTICAL APPLICATION OF KNOWLEDGE DISCOVERY AND DATA MINING PADD 9, ISBN 0-9525554-7-6, 1997, BLACKPOOL, UK, PRACTICAL APPLICATION CO, UK, pages 227 - 242, XP002057126 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE10208959B4 (de) * | 2002-02-28 | 2006-10-12 | Equero Future Net Technologies Ag | Verfahren und Vorrichtung zur Erfassung und Auswertung von in einem Rechnernetzwerk abgelegten Informationen |
GB2395587A (en) * | 2002-11-25 | 2004-05-26 | Sun Microsystems Inc | Method for restructuring data which may be in a relational database |
Also Published As
Publication number | Publication date |
---|---|
FR2764719A1 (fr) | 1998-12-18 |
FR2764719B1 (fr) | 2001-07-27 |
US6553383B1 (en) | 2003-04-22 |
EP0988607A1 (fr) | 2000-03-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP0988607A1 (fr) | Dispositif d'analyse et d'organisation de donnees | |
US20040267823A1 (en) | Reconcilable and undoable file system | |
WO1995012855A1 (fr) | Systeme de controle d'une base de donnees relationnelle selon une logique d'acces orientee objet limitant le nombre des acces a ladite base de donnees, et procede correspondant | |
EP1515239A1 (fr) | Procédé et systéme de manipulation de données issues de bases de données multidimensionnelles à l'aide d'un tableur | |
WO2008107338A1 (fr) | Procedes d'extraction, de combinaison, de synthese et de visualisation de donnees multidimensionnelles provenant de differentes sources | |
WO2002027566A1 (fr) | Moteur de recherche | |
FR2668271A1 (fr) | Interface de modelisation dynamique des performances des systemes informatiques. | |
EP2188744B1 (fr) | Installation de gestion d'une base de données | |
EP0969391A1 (fr) | Procédé pour l'optimisation des accès à une base de données | |
FR2715486A1 (fr) | Procédé de comparaison de fichiers informatiques. | |
FR2859801A1 (fr) | Procede de traitement de donnees sur la base de structures dynamiques d'elements simples | |
FR2844372A1 (fr) | Procede d'organisation d'une base de donnees numeriques sous une forme tracable | |
EP1774441A1 (fr) | Procede de traitement de donnees logiciel associe | |
Gunderloy et al. | SQL Server's Developer's Guide to OLAP with Analysis Services | |
Watson | Beginning C# 2005 databases | |
EP1700233B1 (fr) | Procede d'organisation d'une base de donnees | |
EP0685802A1 (fr) | Système d'information pour la consultation d'informations centralisées en provenance d'applications opérationnelles | |
FR2917518A1 (fr) | Procede de tri d'informations | |
FR3102594A1 (fr) | Ensemble de génération d’application, méthode et programme associés | |
WO2003012682A1 (fr) | Systeme de gestion d'une base de donnees topologique | |
WO2007088254A1 (fr) | Systeme d'information structure, relationnel et incremental | |
FR3096157A1 (fr) | procédé d’indexation multidimensionnelle de contenus textuels | |
Saltin | Interactive visualization of financial data: development of a visual data mining tool | |
Intelligence et al. | Beginning DAX with Power BI | |
WO2010086523A1 (fr) | Systeme informatique de gestion de donnees historisees dans un outil de gestion de versions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): CA JP KR US |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
|
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 1998928350 Country of ref document: EP Ref document number: 09445751 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref document number: 1999501699 Country of ref document: JP |
|
WWP | Wipo information: published in national office |
Ref document number: 1998928350 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: CA |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: 1998928350 Country of ref document: EP |