ES2713097T3

ES2713097T3 - Method and apparatus to extract information from a database

Info

Publication number: ES2713097T3
Application number: ES09164490T
Authority: ES
Inventors: Håkan Wolgé
Original assignee: Qliktech International AB
Current assignee: Qliktech International AB
Priority date: 2008-07-18
Filing date: 2009-07-03
Publication date: 2019-05-17
Anticipated expiration: 2029-07-03
Also published as: DK2146292T3; SE0801708L; CN101635001A; CN101635001B; SE532252C2

Abstract

A computer-implemented method for extracting information from a database, said method comprising a sequential chain of main calculations that includes a first main calculation (P1), which operates a first selection element (S1) on a data set (R0) representing the database to produce an intermediate result (R1), and a second main calculation (P2), which operates a second selection element (S2) on the intermediate result (R1) to produce a final result (R2), wherein said method further comprises retrieving the final result by the steps of: (a) calculating a first selection identifier value (ID1) as a statistically unique fingerprint generated by a hash function of at least the first selection element (S1); (b) searching the objects of the data structure for the first selection identifier value (ID1) and, if the first selection identifier value (ID1) is found, locating and retrieving a first result identifier (ID2) stored with the first selection identifier value (ID1) as associated objects in a preceding iteration; (c) if the first result identifier (ID2) is found in sub-step (b), calculating a second selection identifier value (ID3) as a statistically unique fingerprint generated by a hash function of at least the second selection element (S2) and the retrieved first result identifier (ID2), and searching the objects of the data structure for the second selection identifier value (ID3) and, if the second selection identifier value (ID3) is found, locating and retrieving a final result (R2) stored with the second selection identifier value (ID3) as associated objects in a preceding iteration; (d) if the first result identifier (ID2) is not found in sub-step (b), executing the first main calculation (P1) to produce the intermediate result (R1) and the first result identifier value (ID2) as a fingerprint generated by a hash function of the intermediate result (R1), storing the first selection identifier value (ID1) and the first result identifier value (ID2) as associated objects in the data structure, storing the first result identifier value (ID2) and the intermediate result (R1) as associated objects in the data structure, calculating a second selection identifier value (ID3) as a statistically unique fingerprint generated by a hash function of the first result identifier value (ID2) and the second selection element (S2), and searching the objects of the data structure based on the second selection identifier value (ID3) and, if the second selection identifier value (ID3) is found, locating and retrieving a final result (R2) stored with the second selection identifier value (ID3) as associated objects in a preceding iteration; (e) if the final result (R2) is not found in sub-step (c) or (d), searching the objects of the data structure based on the first result identifier value (ID2); (f) if the first result identifier value (ID2) is not found in sub-step (e), executing the first main calculation (P1) to produce the intermediate result (R1) and the first result identifier value (ID2) as a fingerprint generated by a hash function of the intermediate result (R1), storing the first result identifier value (ID2) and the intermediate result (R1) as associated objects in the data structure, and executing the second main calculation (P2) to produce the final result (R2) and storing the second selection identifier value (ID3) and the final result (R2) as associated objects in the data structure; and (g) if the first result identifier value (ID2) is found in sub-step (e), retrieving the intermediate result (R1) stored with the first result identifier value (ID2) as associated objects in a preceding iteration, and executing the second main calculation (P2) to produce the final result (R2) and storing the second selection identifier value (ID3) and the final result (R2) as associated objects in the data structure.

Description

DESCRIPTION

Method and apparatus for extracting information from a database

Technical field

[0001] The present invention relates to techniques for extracting information from a database and, in particular, to techniques involving a sequential chain of main calculations comprising a first main calculation that operates a first selection element on a data set representing the database to produce a first result, and a second main calculation that operates a second selection element on the first result to produce a second result.

State of the art

[0002] It is often desirable to extract specific information from a database and, in particular, to summarize a large amount of data in the database and present the summarized data to a user in a clear manner. Such data processing is normally carried out by a computer and may require significant memory capacity and processing power. The data processing may aim at creating a large data structure commonly known as a multidimensional cube, which the user can in turn access to explore the data of the database, for example by visualizing selected data in pivot tables or graphically in 2D and 3D charts. An example of an efficient algorithm for creating such a multidimensional cube is known from patent US705862.

[0003] This prior-art algorithm, like many other algorithms that operate on data in a database, involves a sequential chain of main calculations in which the result of one main calculation is used as input data by a subsequent main calculation. For example, in the context of patent US7058621, the data records of the database are read into main memory, whereupon a user can select one or more variables and, optionally, a value or range of values for each variable, thereby causing the algorithm to extract a corresponding subset of the data records in the database. The extracted subset forms an intermediate result. The multidimensional cube is then calculated by evaluating a selected mathematical function on the extracted subset, where the evaluation of the mathematical function is based on a selected set of calculation variables, and where the dimensions of the cube are given by a selected set of classification variables.
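The two main calculations in the prior-art example above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the algorithm of US7058621: the record layout, the encoding of a selection as either a set of allowed values or an inclusive `(low, high)` range, and the names `extract_subset` and `build_cube` are all hypothetical. The first function plays the role of the subset extraction (the intermediate result); the second groups that subset by the classification variables and evaluates a chosen function per group, yielding a small one-dimensional "cube".

```python
from collections import defaultdict


def extract_subset(records, selections):
    """First main calculation: keep the records that satisfy every selection.

    `selections` maps a variable name either to a set of allowed values or to
    an inclusive (low, high) range tuple (a hypothetical encoding)."""
    def matches(record):
        for variable, condition in selections.items():
            value = record[variable]
            if isinstance(condition, tuple):
                low, high = condition
                if not low <= value <= high:
                    return False
            elif value not in condition:
                return False
        return True
    return [r for r in records if matches(r)]


def build_cube(subset, classification_vars, function):
    """Second main calculation: group the subset by the classification
    variables (the cube dimensions) and evaluate `function` on each group."""
    groups = defaultdict(list)
    for record in subset:
        groups[tuple(record[v] for v in classification_vars)].append(record)
    return {key: function(rows) for key, rows in groups.items()}


# Hypothetical sample data: select year 2008, then sum sales per region.
records = [
    {"region": "N", "year": 2007, "sales": 10},
    {"region": "N", "year": 2008, "sales": 5},
    {"region": "S", "year": 2008, "sales": 7},
    {"region": "N", "year": 2008, "sales": 3},
]
subset = extract_subset(records, {"year": (2008, 2008)})  # intermediate result
cube = build_cube(subset, ["region"], lambda rows: sum(r["sales"] for r in rows))
```

Here `cube` maps each region to its total 2008 sales, `{("N",): 8, ("S",): 7}`. Every change of selection reruns both functions, which is the cost that [0004] below points to.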

[0004] Although the prior-art algorithm is efficient, it may still need to perform a large number of operations to create the multidimensional cube, especially if large amounts of data must be analyzed. In such situations, the algorithm may place undesirably high demands on the processing hardware and/or exhibit a longer calculation time than desirable.

[0005] Patent US2006/0230024 relates to a method for a context-based cache infrastructure that enables subset queries about a cached object. In response to detecting a query to a root context of a context tree, the tree is traversed to an original context of a subcontext corresponding to a name-value pair identified by a user in the query. If the original context caches all results of the query, the query results are iterated and the remaining name-value pairs are ignored. However, if the original context does not cache all results of the query, the traversal step is repeated for the next original context of the subcontext until a root context is found. If a root context is found, a query is issued to the database for the name-value pair, and the result of the database query is cached in a new context.

Summary

[0006] An object of the invention is to at least partly overcome one or more of the limitations of the prior art identified above.

[0007] This and other objects, which will become apparent from the following description, are at least partly achieved by a method, a computer-readable medium and an apparatus according to the independent claims, wherein embodiments thereof are defined by the dependent claims.

[0008] A first aspect of the invention is a computer-implemented method for extracting information from a database, wherein said method comprises a sequential chain of main calculations comprising a first main calculation that operates a first selection element on a data set representing the database to produce an intermediate result, and a second main calculation that operates a second selection element on the intermediate result to produce a final result, wherein said method further comprises retrieving the final result by the steps of:

(a) calculating a first selection identifier value (ID1) as a statistically unique fingerprint generated by a hash function of at least the first selection element (S1);

(b) searching the objects of the data structure for the first selection identifier value (ID1) and, if the first selection identifier value (ID1) is found, locating and retrieving a first result identifier (ID2) stored with the first selection identifier value (ID1) as associated objects in a preceding iteration;

(c) if the first result identifier (ID2) is found in sub-step (b),

calculating a second selection identifier value (ID3) as a statistically unique fingerprint generated by a hash function of at least the second selection element (S2) and the retrieved first result identifier (ID2), and

searching the objects of the data structure for the second selection identifier value (ID3) and, if the second selection identifier value (ID3) is found, locating and retrieving a final result (R2) stored with the second selection identifier value (ID3) as associated objects in a preceding iteration;

(d) if the first result identifier (ID2) is not found in sub-step (b),

executing the first main calculation (P1) to produce the intermediate result (R1) and the first result identifier value (ID2) as a fingerprint generated by a hash function of the intermediate result (R1),

storing the first selection identifier value (ID1) and the first result identifier value (ID2) as associated objects in the data structure; and

storing the first result identifier value (ID2) and the intermediate result (R1) as associated objects in the data structure,

calculating a second selection identifier value (ID3) as a statistically unique fingerprint generated by a hash function of the first result identifier value (ID2) and the second selection element (S2), and

searching the objects of the data structure based on the second selection identifier value (ID3) and, if the second selection identifier value (ID3) is found, locating and retrieving a final result (R2) stored with the second selection identifier value (ID3) as associated objects in a preceding iteration;

(e) if the final result (R2) is not found in sub-step (c) or (d),

searching the objects of the data structure based on the first result identifier value (ID2);

(f) if the first result identifier value (ID2) is not found in sub-step (e),

storing the first result identifier value (ID2) and the intermediate result (R1) as associated objects in the data structure, and

executing the second main calculation (P2) to produce the final result (R2) and storing the second selection identifier value (ID3) and the final result (R2) as associated objects in the data structure; and

(g) if the first result identifier value (ID2) is found in sub-step (e),

retrieving the intermediate result (R1) stored with the first result identifier value (ID2) as associated objects in a preceding iteration, and

executing the second main calculation (P2) to produce the final result (R2) and storing the second selection identifier value (ID3) and the final result (R2) as associated objects in the data structure.
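Steps (a) to (g) can be sketched as a single retrieval function. This is a minimal illustration under several assumptions that the text does not fix: the "data structure" is one Python dict, the hash function is SHA-256 over the `repr` of its inputs, and short tags ("S1", "R1", "S2") are mixed into each fingerprint so the three kinds of key cannot collide. For step (f) the sketch follows the abstract's wording, which also re-executes P1 when R1 is missing. `retrieve_final`, `fingerprint`, `p1` and `p2` are hypothetical names, and no cache eviction is modelled.

```python
import hashlib


def fingerprint(*parts):
    """Statistically unique fingerprint: SHA-256 of the repr of the parts.
    Assumes the parts have a deterministic repr (numbers, strings, tuples, lists)."""
    return hashlib.sha256(repr(parts).encode("utf-8")).hexdigest()


def retrieve_final(r0, s1, s2, p1, p2, store):
    """Return R2, reusing identifiers and results cached in `store` where possible."""
    r1 = None
    id1 = fingerprint("S1", s1)              # (a) hash the first selection element
    id2 = store.get(id1)                     # (b) look up ID2 stored with ID1
    if id2 is None:                          # (d) miss: run P1, cache ID1->ID2 and ID2->R1
        r1 = p1(r0, s1)
        id2 = fingerprint("R1", r1)
        store[id1] = id2
        store[id2] = r1
    id3 = fingerprint("S2", s2, id2)         # (c)/(d) hash S2 together with ID2, not S1
    if id3 in store:                         # (c)/(d) hit: reuse the cached final result
        return store[id3]
    if r1 is None:                           # (e) look up R1 stored with ID2
        r1 = store.get(id2)
    if r1 is None:                           # (f) R1 not cached: run P1 again
        r1 = p1(r0, s1)
        store[id2] = r1
    r2 = p2(r1, s2)                          # (f)/(g) run P2 and cache ID3->R2
    store[id3] = r2
    return r2


calls = {"p1": 0, "p2": 0}


def p1(data, s1):                            # hypothetical P1: inclusive range filter
    calls["p1"] += 1
    low, high = s1
    return [x for x in data if low <= x <= high]


def p2(r1, s2):                              # hypothetical P2: scaled sum
    calls["p2"] += 1
    return sum(x * s2 for x in r1)


store = {}
data = [2, 4, 6, 8, 10]
a = retrieve_final(data, (1, 8), 3, p1, p2, store)  # cold: runs P1 and P2
b = retrieve_final(data, (1, 8), 3, p1, p2, store)  # repeat: served entirely from cache
c = retrieve_final(data, (2, 9), 3, p1, p2, store)  # new S1 but same R1: P2 is skipped
d = retrieve_final(data, (1, 8), 5, p1, p2, store)  # same S1, new S2: P1 is skipped
```

Because ID3 is derived from the fingerprint of R1 rather than from S1, the call producing `c` reuses the final result even though its first selection differs: both ranges yield the same intermediate result `[2, 4, 6, 8]`. After the four calls, P1 and P2 have each run only twice.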

[0009] Thus, in the method according to the first aspect, the first and the second result are cached in computer memory and remain available for reuse in subsequent iterations of the method, which reduces the need to execute the first and/or second main calculation to extract the information. Reuse may involve calculating the first and/or second selection identifier value during a subsequent iteration and accessing the data structure to potentially retrieve the first and/or second result.

[0010] A second aspect of the invention is a computer-readable medium storing a computer program which, when executed by a computer, is adapted to carry out the method according to the first aspect.

[0011] A third aspect of the invention is an apparatus for extracting information from a database, said apparatus comprising means for executing a sequential chain of main calculations comprising a first main calculation that operates a first selection element on a data set representing the database to produce a first result, and a second main calculation that operates a second selection element on the first result to produce a second result, wherein said apparatus further comprises means for retrieving the final result by the steps of:

(a) calculating a first selection identifier value (ID1) as a statistically unique fingerprint generated by a hash function of at least the first selection element (S1);

(c) if the first result identifier (ID2) is found in sub-step (b),

(e) if the final result (R2) is not found in sub-step (c) or (d),

(f) if the first result identifier value (ID2) is not found in sub-step (e),

(g) if the first result identifier value (ID2) is found in sub-step (e),

[0012] The apparatus of the third aspect shares the advantages of the method of the first aspect and may comprise additional features corresponding to any of the embodiments described above in relation to the first aspect.

[0013] Other objects, features, aspects and advantages of the present invention will be apparent from the following detailed description, the appended claims and the drawings.

Brief description of the drawings

[0014] Embodiments of the invention will now be described in more detail with reference to the accompanying schematic drawings, in which the same reference numerals are used to identify corresponding elements.

Fig. 1 illustrates a process involving a chain of calculations for extracting information from a database, in which identifiers and results are selectively stored in and retrieved from a computer memory.

Fig. 2 illustrates an example of a process as shown in Fig. 1.

Fig. 3 illustrates another example of a process as shown in Fig. 1.

Fig. 4 illustrates an embodiment of the process of Fig. 1.

Fig. 5 illustrates another example of a process as shown in Fig. 1.

Fig. 6 is a flow chart exemplifying the process of Fig. 5.

Fig. 7 is an overview of the process of Fig. 5 implemented in a specific context.

Fig. 8 is a block diagram of a computer environment for implementing embodiments of the invention.

Detailed description of exemplary embodiments

[0015] The present invention relates to techniques for extracting information from a database. For ease of understanding, some underlying principles are first discussed in relation to a general example. Different aspects, features and advantages are then discussed in relation to a specific implementation.

GENERAL

[0016] Fig. 1 illustrates an example of a computer-implemented process for extracting information from a database DB, which may or may not be stored externally to the computer that implements the process. The extraction process includes extracting an initial data set or field R0 from the database DB, for example by reading the initial data set R0 into the main memory (e.g. RAM) of the computer. The initial data set R0 may include the entire contents of the database DB or a subset thereof.

[0017] The process of Fig. 1 includes a sequence of main calculation procedures P1, P2 that operate to generate a final result R2 based on the initial data set R0. Specifically, a first procedure P1 operates on the initial data set R0 to produce an intermediate result R1, and a second procedure P2 operates on the intermediate result R1 to produce the final result R2.

[0018] A first selection element S1, which may or may not originate from user input, controls the first procedure P1. Similarly, a second selection element S2, which may or may not originate from user input, controls the second procedure P2. Each selection element S1, S2 may include any combination of variables and/or mathematical functions that define the input data to the respective procedure, i.e. the data set R0 and the intermediate result R1, respectively.

[0019] Fig. 1 also indicates that the extraction process interacts with a computer memory 10 (typically RAM or cache memory), the first and second procedures P1, P2 operating to store data items in, and retrieve data items from, the memory 10. In the illustrated example, the first procedure P1 operates to store and retrieve identifiers, generally designated ID, and intermediate results R1; and the second procedure P2 operates to store and retrieve identifiers, generally designated ID, intermediate results R1 and final results R2. In the following, the storing of identifiers and results in the computer memory 10 is also referred to as "caching".

[0020] The different identifiers are typically generated by the procedures P1, P2 as a function of one or more process parameters, such as another identifier and/or a selection element S1, S2 and/or a result R1, R2. Different functions may or may not be used to generate different identifiers. The function(s) for generating an identifier may be a hash algorithm that generates a fingerprint of the relevant process parameter(s). The function(s) is/are suitably configured so that each combination of unique parameter values results in an identifier value that is unique among all identifier values generated for all the different identifiers in the process. In this context, "unique" includes not only theoretically unique identifier values but also statistically unique identifier values. A non-limiting example of such a function is a hash algorithm that generates a fingerprint of at least 256 bits.
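As a minimal sketch of such an identifier function, assuming SHA-256 as the hash algorithm and assuming the process parameters have a deterministic `repr()` (tuples, strings, numbers), a 256-bit fingerprint could be computed as follows. The `kind` tag and the example selection `("Region", "North")` are hypothetical illustrations; the tag simply keeps the inputs of different identifier types apart.

```python
import hashlib


def fingerprint(kind: str, *params) -> bytes:
    """Return a 256-bit SHA-256 fingerprint of the given process parameters.

    `kind` is a hypothetical tag (e.g. "S1", "R1") that keeps the inputs of
    different identifier types apart. Parameters are assumed to have a
    deterministic repr(), e.g. tuples, strings and numbers.
    """
    h = hashlib.sha256()
    for part in (kind, *params):
        encoded = repr(part).encode("utf-8")
        # Length-prefix each part so that parameter boundaries are unambiguous.
        h.update(len(encoded).to_bytes(8, "big"))
        h.update(encoded)
    return h.digest()


S1 = ("Region", "North")  # hypothetical first selection element
ID1 = fingerprint("S1", S1)
print(len(ID1) * 8)                                   # 256
print(ID1 == fingerprint("S1", ("Region", "North")))  # True: same parameters, same ID
print(ID1 == fingerprint("S1", ("Region", "South")))  # False: different parameters
```

Because SHA-256 digests are statistically unique, equal digests can be treated as equal parameter combinations, which is the "statistically unique" notion described above.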

[0021] In a process illustrated in Fig. 2, the first procedure P1 is configured to calculate a first selection identifier value ID1 as a function of the first selection element S1, i.e. ID1 = f(S1), and the second procedure P2 is configured to calculate a second selection identifier value ID3 as a function of the second selection element S2 and the intermediate result R1, i.e. ID3 = f(S2, R1). The first procedure P1 is also configured to store ID1 and the intermediate result R1 as associated objects in a data structure 12 in the computer memory 10, and the second procedure P2 is configured to store ID3 and R2 as associated objects in the data structure 12. The data structure 12 in the computer memory 10 is thus configured to store a heterogeneous set of objects, i.e. objects of different types.

[0022] This process allows the response time of the extraction process, and/or the processing requirements of the computer implementing it, to be reduced by reducing the need to execute the main calculation procedures P1, P2 to calculate the intermediate result R1 and the final result R2, respectively. For example, the extraction process can be configured to use the data structure 12, whenever possible, to find the final result R2 based on the first selection element S1 and the second selection element S2. Thus, when the process identifies a need to calculate the final result R2 based on S1 and S2, it can generate ID1 = f(S1) and access the data structure 12 based on ID1. If an identical first selection element S1 has previously been used with the first procedure P1, the generated value of ID1 may be found in the data structure 12, associated with the corresponding intermediate result R1. The intermediate result R1 can then be retrieved from the data structure 12 instead of being calculated by the procedure P1. If the intermediate result R1 is not found in the data structure 12, the process can cause the first procedure P1 to calculate the intermediate result R1. Furthermore, after obtaining the intermediate result R1, the process can generate ID3 = f(S2, R1) and access the data structure 12 based on ID3. Again, if the same operation has previously been executed by the procedure, the generated value of ID3 may be found in the data structure 12, associated with the corresponding final result R2. The final result R2 can therefore be retrieved from the data structure 12 instead of being calculated by the procedure P2.
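The Fig. 2 scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a plain `dict` stands in for the data structure 12, SHA-256 is assumed as the fingerprint function, and `p1` (an equality filter) and `p2` (a sum over one field) are hypothetical stand-ins for the main calculation procedures. The run counters show that a repeated request is served without executing P1 or P2.

```python
import hashlib


def fingerprint(*params) -> bytes:
    """256-bit SHA-256 fingerprint; params are assumed to have a deterministic repr()."""
    h = hashlib.sha256()
    for part in params:
        encoded = repr(part).encode("utf-8")
        h.update(len(encoded).to_bytes(8, "big"))
        h.update(encoded)
    return h.digest()


def p1(s1, r0):
    """Hypothetical P1: keep the rows of R0 whose field s1[0] equals s1[1]."""
    field, value = s1
    return tuple(row for row in r0 if row[field] == value)


def p2(s2, r1):
    """Hypothetical P2: sum the field named s2 over the rows of R1."""
    return sum(row[s2] for row in r1)


class Fig2Extractor:
    def __init__(self, r0):
        self.r0 = r0
        self.cache = {}  # stands in for data structure 12: ID1 -> R1 and ID3 -> R2
        self.p1_runs = 0
        self.p2_runs = 0

    def final_result(self, s1, s2):
        id1 = fingerprint("S1", s1)      # ID1 = f(S1)
        r1 = self.cache.get(id1)
        if r1 is None:                   # cache miss: run P1 and store ID1:R1
            self.p1_runs += 1
            r1 = p1(s1, self.r0)
            self.cache[id1] = r1
        id3 = fingerprint("S2", s2, r1)  # ID3 = f(S2, R1)
        r2 = self.cache.get(id3)
        if r2 is None:                   # cache miss: run P2 and store ID3:R2
            self.p2_runs += 1
            r2 = p2(s2, r1)
            self.cache[id3] = r2
        return r2


R0 = (
    {"Region": "North", "Sales": 10},
    {"Region": "South", "Sales": 5},
    {"Region": "North", "Sales": 7},
)
ex = Fig2Extractor(R0)
print(ex.final_result(("Region", "North"), "Sales"))  # 17
print(ex.final_result(("Region", "North"), "Sales"))  # 17, retrieved from the cache
print(ex.p1_runs, ex.p2_runs)                         # 1 1
```

Note that ID3 here hashes the full intermediate result R1, so R1 must be available (cached or recomputed) before R2 can be looked up.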

[0023] In a process illustrated in Fig. 3, the first procedure P1 is further configured to calculate a first result identifier value ID2 as a function of the intermediate result R1. The first procedure P1 is also configured to store ID1 and ID2 as associated objects in the data structure 12, and to store ID2 and the intermediate result R1 as associated objects in the data structure 12.

[0024] This embodiment reduces the amount of computer memory required by the process, since each intermediate result R1 is stored only once in the data structure 12, even if two or more first selection elements S1 produce identical intermediate results R1. This embodiment is particularly relevant when the intermediate results R1 are large, which is often the case when processing information from databases.
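The memory saving can be illustrated with a minimal sketch, assuming SHA-256 fingerprints, a `dict` as the data structure 12, a hypothetical equality-filter `p1` and made-up example rows. Two different first selection elements that happen to select the same rows map, via ID2, to a single stored copy of R1.

```python
import hashlib


def fingerprint(*params) -> bytes:
    """256-bit SHA-256 fingerprint; params are assumed to have a deterministic repr()."""
    h = hashlib.sha256()
    for part in params:
        encoded = repr(part).encode("utf-8")
        h.update(len(encoded).to_bytes(8, "big"))
        h.update(encoded)
    return h.digest()


def p1(s1, r0):
    """Hypothetical P1: keep the rows of R0 whose field s1[0] equals s1[1]."""
    field, value = s1
    return tuple(row for row in r0 if row[field] == value)


def intermediate(cache, r0, s1):
    """Return R1 for S1 using the Fig. 3 pairs ID1:ID2 and ID2:R1."""
    id1 = fingerprint("S1", s1)         # ID1 = f(S1)
    id2 = cache.get(id1)
    if id2 is not None and id2 in cache:
        return cache[id2]               # ID1 -> ID2 -> R1
    r1 = p1(s1, r0)
    id2 = fingerprint("R1", r1)         # ID2 = f(R1)
    cache[id1] = id2                    # store ID1:ID2
    cache.setdefault(id2, r1)           # store ID2:R1 only if not already present
    return r1


R0 = (
    {"Region": "North", "Manager": "Ann", "Sales": 10},
    {"Region": "South", "Manager": "Bob", "Sales": 5},
    {"Region": "North", "Manager": "Ann", "Sales": 7},
)
cache = {}
a = intermediate(cache, R0, ("Region", "North"))
b = intermediate(cache, R0, ("Manager", "Ann"))
print(a == b)      # True: two different S1, one identical R1
print(len(cache))  # 3: two ID1:ID2 entries and a single ID2:R1 entry
```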

[0025] Calculating the first result identifier value ID2 also enables an embodiment, illustrated in Fig. 4, in which the intermediate result R1 is represented by the first result identifier value ID2 in the calculation of the second selection identifier value ID3, i.e. ID3 = f(S2, ID2).

[0026] This embodiment reduces the need to store the intermediate result R1 in the data structure 12, since the final result R2 can be retrieved from the data structure 12 based on ID3, which is generated from ID2 rather than from the intermediate result R1. This allows the final result R2 to be obtained efficiently even if the intermediate result R1 has been removed from the data structure 12. For example, the process can be configured to use the data structure 12, whenever possible, to find the final result R2 based on the first selection element S1 and the second selection element S2. Thus, when the process identifies a need to calculate the final result R2 based on S1 and S2, it can generate ID1 = f(S1) and access the data structure 12 based on ID1 to retrieve the associated ID2, provided an identical first selection element S1 has previously been used with the first procedure P1. The process can then generate ID3 = f(S2, ID2) and access the data structure 12 based on ID3 to retrieve the associated final result R2, provided the second procedure P2 has previously operated on an identical intermediate result R1 with an identical second selection element S2. In this example, the final result R2 can thus be retrieved from the data structure 12 even if the intermediate result R1 has been removed.
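The following minimal sketch shows the Fig. 4 chain under stated assumptions (SHA-256 fingerprints, a `dict` as the data structure 12, and hypothetical `p1`/`p2` procedures). The `evict_intermediates` method is a hypothetical eviction policy that drops every stored R1; because ID3 is computed from ID2 rather than R1, the repeated request is still answered from the cache.

```python
import hashlib


def fingerprint(*params) -> bytes:
    """256-bit SHA-256 fingerprint; params are assumed to have a deterministic repr()."""
    h = hashlib.sha256()
    for part in params:
        encoded = repr(part).encode("utf-8")
        h.update(len(encoded).to_bytes(8, "big"))
        h.update(encoded)
    return h.digest()


def p1(s1, r0):
    """Hypothetical P1: keep the rows of R0 whose field s1[0] equals s1[1]."""
    field, value = s1
    return tuple(row for row in r0 if row[field] == value)


def p2(s2, r1):
    """Hypothetical P2: sum the field named s2 over the rows of R1."""
    return sum(row[s2] for row in r1)


class Fig4Extractor:
    def __init__(self, r0):
        self.r0 = r0
        self.cache = {}  # data structure 12: ID1 -> ID2, ID2 -> R1, ID3 -> R2
        self.p1_runs = 0
        self.p2_runs = 0

    def _run_p1(self, s1, id1):
        self.p1_runs += 1
        r1 = p1(s1, self.r0)
        id2 = fingerprint("R1", r1)       # ID2 = f(R1)
        self.cache[id1] = id2             # store ID1:ID2
        self.cache[id2] = r1              # store ID2:R1
        return id2, r1

    def final_result(self, s1, s2):
        id1 = fingerprint("S1", s1)       # ID1 = f(S1)
        id2 = self.cache.get(id1)
        r1 = None
        if id2 is None:
            id2, r1 = self._run_p1(s1, id1)
        id3 = fingerprint("S2", s2, id2)  # ID3 = f(S2, ID2): R1 itself is not hashed
        if id3 in self.cache:
            return self.cache[id3]        # R2 found without touching R1
        if r1 is None:
            r1 = self.cache.get(id2)
            if r1 is None:                # R1 was evicted: recompute it
                _, r1 = self._run_p1(s1, id1)
        self.p2_runs += 1
        r2 = p2(s2, r1)
        self.cache[id3] = r2              # store ID3:R2
        return r2

    def evict_intermediates(self):
        """Hypothetical eviction policy: drop every stored R1 (tuple values)."""
        for key in [k for k, v in self.cache.items() if isinstance(v, tuple)]:
            del self.cache[key]


R0 = (
    {"Region": "North", "Sales": 10},
    {"Region": "South", "Sales": 5},
    {"Region": "North", "Sales": 7},
)
ex = Fig4Extractor(R0)
print(ex.final_result(("Region", "North"), "Sales"))  # 17
ex.evict_intermediates()
print(ex.final_result(("Region", "North"), "Sales"))  # 17, although R1 was evicted
print(ex.p1_runs, ex.p2_runs)                         # 1 1
```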

[0027] In a process illustrated in Fig. 5, the second procedure P2 is further configured to calculate a second result identifier value ID4 as a function of the final result R2. The second procedure P2 is also configured to store ID3 and ID4 as associated objects in the data structure 12, and to store ID4 and the final result R2 as associated objects in the data structure 12.

[0028] This process reduces the amount of computer memory required, since each final result R2 is stored only once in the data structure 12, even if two or more second selection elements S2 produce identical final results R2. This process is particularly relevant when the final results R2 are large.
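A minimal sketch of the second stage in Fig. 5, assuming SHA-256 fingerprints, a `dict` as the data structure 12 and a hypothetical `p2` that keeps the rows of R1 whose `Sales` exceed a threshold S2. Two different thresholds that keep the same rows produce an identical R2, which is stored only once behind ID4.

```python
import hashlib


def fingerprint(*params) -> bytes:
    """256-bit SHA-256 fingerprint; params are assumed to have a deterministic repr()."""
    h = hashlib.sha256()
    for part in params:
        encoded = repr(part).encode("utf-8")
        h.update(len(encoded).to_bytes(8, "big"))
        h.update(encoded)
    return h.digest()


def p2(threshold, r1):
    """Hypothetical P2: keep the rows of R1 whose Sales exceed the threshold."""
    return tuple(row for row in r1 if row["Sales"] > threshold)


def final_result(cache, s2, id2, r1):
    """Return R2 using the Fig. 5 pairs ID3:ID4 and ID4:R2."""
    id3 = fingerprint("S2", s2, id2)    # ID3 = f(S2, ID2)
    id4 = cache.get(id3)
    if id4 is not None and id4 in cache:
        return cache[id4]               # ID3 -> ID4 -> R2
    r2 = p2(s2, r1)
    id4 = fingerprint("R2", r2)         # ID4 = f(R2)
    cache[id3] = id4                    # store ID3:ID4
    cache.setdefault(id4, r2)           # store ID4:R2 only if not already present
    return r2


R1 = ({"Region": "North", "Sales": 10}, {"Region": "North", "Sales": 7})
ID2 = fingerprint("R1", R1)
cache = {}
a = final_result(cache, 6, ID2, R1)  # S2: Sales > 6
b = final_result(cache, 2, ID2, R1)  # S2: Sales > 2
print(a == b)      # True: both thresholds keep both rows
print(len(cache))  # 3: two ID3:ID4 entries and a single ID4:R2 entry
```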

[0029] So far, it has been assumed that the database DB, and thus the data set R0, is static. If the database is dynamic, it may be appropriate to generate the first selection identifier ID1 as a function of both the first selection element S1 and the data set R0, i.e. ID1 = f(S1, R0). With this modification, all processes and embodiments described in relation to Figs. 1-5 are equally applicable to a dynamic database, i.e. a database that may change at any time.
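A minimal sketch of this modification, assuming a SHA-256 fingerprint. The whole data set R0 is hashed here only to keep the example short; the helper name `selection_id` and the example rows are hypothetical. Once R0 changes, ID1 changes too, so entries cached for the old data set are simply no longer hit.

```python
import hashlib


def fingerprint(*params) -> bytes:
    """256-bit SHA-256 fingerprint; params are assumed to have a deterministic repr()."""
    h = hashlib.sha256()
    for part in params:
        encoded = repr(part).encode("utf-8")
        h.update(len(encoded).to_bytes(8, "big"))
        h.update(encoded)
    return h.digest()


def selection_id(s1, r0=None):
    """ID1 = f(S1) for a static database, ID1 = f(S1, R0) when R0 is supplied."""
    if r0 is None:
        return fingerprint("S1", s1)
    return fingerprint("S1", s1, r0)


S1 = ("Region", "North")
R0_before = ({"Region": "North", "Sales": 10},)
R0_after = R0_before + ({"Region": "North", "Sales": 7},)  # the database has changed

print(selection_id(S1) == selection_id(S1))                        # True
print(selection_id(S1, R0_before) == selection_id(S1, R0_before))  # True
print(selection_id(S1, R0_before) == selection_id(S1, R0_after))   # False: no stale hit
```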

[0030] Fig. 6 is a flow chart illustrating an exemplary implementation of the process shown in Fig. 5, adapted to operate on a dynamic database. The process starts by inputting the data set R0 (step 600), the first selection element S1 (step 602) and the second selection element S2 (step 604). Next, a value of the first selection identifier ID1 is generated as a function of S1 and R0 (step 606). The data structure is searched based on ID1 (step 608). If the value of ID1 is found in the data structure, i.e. it was cached in a preceding iteration, the process retrieves the value of the first result identifier ID2 associated with it (step 610) and proceeds to step 612.

[0031] If the value of ID1 is not found in the data structure in step 608, the process causes the first procedure P1 to calculate R1 by operating S1 on R0 (step 614). Next, the value of ID2 is generated as a function of R1 (step 616), and the values of ID1, ID2 and R1 are stored in the data structure as the associated pairs ID1:ID2 and ID2:R1 (step 618). The process then proceeds to step 612.

[0032] In step 612, the value of the second selection identifier ID3 is generated as a function of S2 and ID2. The data structure is then searched based on ID3 (step 620). If the value of ID3 is found in the data structure, i.e. it was cached in a preceding iteration, the process retrieves the value of the second result identifier ID4 associated with it (step 622). A further search of the data structure is then performed based on ID4 (step 624). If the value of ID4 is found in the data structure, i.e. it was cached in a preceding iteration, the process retrieves the final result R2 associated with it (step 626).

[0033] If the value of ID3 is not found in the data structure in step 620, a further search of the data structure is performed based on the value of ID2 determined in step 610 or step 616 (step 628). If the value of ID2 is found in the data structure, i.e. it was cached in a preceding iteration, the process retrieves the intermediate result R1 associated with it (step 630). The process then causes the second procedure P2 to calculate R2 by operating S2 on R1 (step 632). To update the data structure, the process also generates the value of ID4 as a function of R2 (step 634) and stores the values of ID3, ID4 and R2 in the data structure as the associated pairs ID3:ID4 and ID4:R2 (step 636).

[0034] If the value of ID2 is not found in the data structure in step 628, the process causes the first procedure P1 to calculate R1 by operating S1 on R0 (step 638), and stores the values of ID2 and R1 in the data structure as the associated pair ID2:R1 (step 640). The process then proceeds to step 632. It should be noted, however, that if the intermediate result R1 has already been calculated in step 614, steps 628, 630, 638 and 640 need not be performed. In that case, if ID3 is not found in step 620, the process can proceed directly to step 632, where the second procedure P2 is caused to calculate R2 by operating S2 on R1.

[0035] If the value of ID4 is not found in the data structure in step 622, the process causes the second procedure P2 to calculate R2 by operating S2 on R1 (step 642). To update the data structure, the process also generates the value of ID4 as a function of R2 (step 644) and stores the values of ID4 and R2 in the data structure as the associated pair ID4:R2 (step 646).
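The Fig. 6 flow can be walked through with the following sketch, keyed to the step numbers above. It is illustrative only and rests on assumptions: SHA-256 fingerprints, a `dict` as the data structure, and hypothetical `p1`/`p2` procedures. Where the text leaves open how R1 is obtained before step 642, the sketch reuses the lookup of steps 628 to 640.

```python
import hashlib


def fingerprint(*params) -> bytes:
    """256-bit SHA-256 fingerprint; params are assumed to have a deterministic repr()."""
    h = hashlib.sha256()
    for part in params:
        encoded = repr(part).encode("utf-8")
        h.update(len(encoded).to_bytes(8, "big"))
        h.update(encoded)
    return h.digest()


def p1(s1, r0):
    """Hypothetical P1: keep the rows of R0 whose field s1[0] equals s1[1]."""
    field, value = s1
    return tuple(row for row in r0 if row[field] == value)


def p2(s2, r1):
    """Hypothetical P2: sum the field named s2 over the rows of R1."""
    return sum(row[s2] for row in r1)


def extract(cache, r0, s1, s2, runs):
    """Walk the Fig. 6 flow; `cache` plays the role of the data structure."""
    r1 = None
    id1 = fingerprint("S1", s1, r0)        # step 606: ID1 = f(S1, R0)
    id2 = cache.get(id1)                   # steps 608/610
    if id2 is None:
        runs["P1"] += 1
        r1 = p1(s1, r0)                    # step 614
        id2 = fingerprint("R1", r1)        # step 616: ID2 = f(R1)
        cache[id1] = id2                   # step 618: ID1:ID2 ...
        cache[id2] = r1                    # ... and ID2:R1
    id3 = fingerprint("S2", s2, id2)       # step 612: ID3 = f(S2, ID2)
    id4 = cache.get(id3)                   # steps 620/622
    if id4 is not None and id4 in cache:   # step 624
        return cache[id4]                  # step 626: R2 retrieved
    if r1 is None:                         # R1 not computed in step 614
        r1 = cache.get(id2)                # steps 628/630
        if r1 is None:
            runs["P1"] += 1
            r1 = p1(s1, r0)                # step 638
            cache[id2] = r1                # step 640: ID2:R1
    runs["P2"] += 1
    r2 = p2(s2, r1)                        # step 632 (or 642)
    id4 = fingerprint("R2", r2)            # step 634 (or 644): ID4 = f(R2)
    cache[id3] = id4                       # step 636: ID3:ID4
    cache[id4] = r2                        # step 636 (or 646): ID4:R2
    return r2


R0 = (
    {"Region": "North", "Sales": 10, "Units": 1},
    {"Region": "South", "Sales": 5, "Units": 2},
    {"Region": "North", "Sales": 7, "Units": 3},
)
cache, runs = {}, {"P1": 0, "P2": 0}
print(extract(cache, R0, ("Region", "North"), "Sales", runs))  # 17
print(extract(cache, R0, ("Region", "North"), "Sales", runs))  # 17, from the cache
print(runs)                                                    # {'P1': 1, 'P2': 1}
```

Because ID1 includes R0, a changed data set yields new identifiers and forces P1 and P2 to run again, while a new S2 on an unchanged data set reuses the cached R1.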

[0036] The skilled person will readily understand that the processes and embodiments of Figs. 2-4 involve corresponding storage and retrieval processes, albeit with different combinations of identifiers. For brevity, these processes are not illustrated in flow charts; they are presented merely as exemplary processes and embodiments in the preceding Summary section.

[0037] It should be understood that any data structure 12, linear or non-linear, can be used to store the identifiers and results. For reasons of processing speed, however, it may be preferable to use a data structure 12 with an efficient indexing system, such as a sorted list, a hash table or a binary tree, for example an AVL tree.
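A Python `dict` already provides the hash-table option. For comparison, the sorted-list option can be sketched with the standard `bisect` module: lookups are O(log n), while insertions are O(n) because list elements must shift. This is an illustrative sketch; the class name and the byte strings hashed to form the example identifiers are hypothetical.

```python
import bisect
import hashlib


class SortedListStore:
    """Data structure 12 as a sorted list of identifier keys, searched by bisection."""

    def __init__(self):
        self._keys = []    # identifier values (bytes), kept in sorted order
        self._values = []  # associated objects, in the same order as _keys

    def put(self, key: bytes, value) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value        # replace an existing association
        else:
            self._keys.insert(i, key)      # O(n): later elements shift right
            self._values.insert(i, value)

    def get(self, key: bytes, default=None):
        i = bisect.bisect_left(self._keys, key)  # O(log n) search
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return default


store = SortedListStore()
# Hypothetical identifier inputs, hashed to 256-bit keys.
id1 = hashlib.sha256(b"S1: Region = North").digest()
id2 = hashlib.sha256(b"R1: rows 0 and 2").digest()
store.put(id1, id2)               # identifier-to-identifier pair (ID1:ID2)
store.put(id2, (0, 2))            # identifier-to-result pair (ID2:R1)
print(store.get(store.get(id1)))  # (0, 2)
print(store.get(b"missing"))      # None
```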

Specific embodiments, applications and examples

[0038] Embodiments, applications and examples of the invention are discussed and exemplified in detail below.

[0039] In some embodiments of the invention, previous calculations and results are used when processing successive requests for new data and calculations. To this end, the extraction process is designed to cache results while processing data requests. When a subsequent request is processed, the extraction process determines whether a suitable earlier result has already been generated and cached. If so, the earlier result is used in processing the subsequent request. Since previous calculations need not be regenerated, the processing time for the subsequent request can be reduced considerably.

[0040] In some embodiments of the invention, digital identifiers (fingerprints) are used to identify cached information, so that a cached result can also be reused when it is reached by a different route than in the preceding calculation.

[0041] In some embodiments of the invention, the digital identifiers are stored in the cache. Specifically, the identifier of the input to a calculation procedure is stored together with the digital identifier of the output of that calculation procedure. The final result of a multi-step operation can therefore be reached even when the required complex intermediate result(s) have been removed from the cache; only the digital identifier(s) of the intermediate result(s) are needed.

[0042] In some embodiments of the invention, the cache is implemented by a data structure that can store heterogeneous objects, such as tables, data subsets, arrays and digital identifiers.

[0043] Embodiments of the invention may therefore serve to minimize, or at least reduce, the response time for a user who queries a data store with a query recently executed by the same or another user.

[0044] Embodiments of the invention can also serve to minimize, or at least reduce, the memory used by the cache, by reusing the same cache entry for several different queries or calculations whenever two of them produce the same result.

[0045] Embodiments of the invention can be applied to extract any type of information from any known type of database, such as relational, post-relational, object-oriented or hierarchical databases. The Internet can also be regarded as a database in the context of the present invention.

[0046] Fig. 7 discloses a specific embodiment of the invention: an information extraction or search process that involves a database query followed by a chart calculation based on the query result. The result of the chart calculation, called the Chart Result, usually consists of data that are aggregated, sorted or grouped in one, two or more dimensions, for example in the form of a multidimensional cube as discussed in the Background section.

[0047] In a first step, the Scope of the information search is defined. In the case of a database query, the scope is defined by the tables included in a SELECT statement (or equivalent) and by how these tables are joined. For an Internet search, the scope may be an index of the web pages found, usually also organized as one or more tables. The output of the first step is thus a data set (cf. R0 in Figs. 1-6).

[0048] In a second step, a user makes a Selection in the data set, which causes an Inference Engine to evaluate a number of filters on the data set. The inference engine could be, for example, a database engine, a query tool or a business intelligence tool. For example, in a query on a database that stores data on placed orders, the selection could require the order year to be "2007" and the product group to be "Dairy products". The selection can thus be uniquely defined by a list of the fields involved and, for each field, a list of selected values or, more generally, a condition.
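Before a selection can be hashed into an identifier, it has to be turned into bytes in a way that does not depend on the order in which the user clicked. The sketch below assumes a selection is a mapping from field name to a list of selected values and serializes it canonically with JSON; the function name and the field names `OrderYear` and `ProductGroup` are hypothetical. General conditions (ranges, search strings) would need their own canonical form, which is not shown.

```python
import json
from typing import Dict, Iterable


def encode_selection(selection: Dict[str, Iterable[str]]) -> bytes:
    """Serialize a selection (field name -> selected values) canonically.

    Field names and values are sorted and duplicate values dropped, so the
    same logical selection always yields the same bytes, which can then be
    hashed into an identifier.
    """
    canonical = {field: sorted(set(values)) for field, values in selection.items()}
    return json.dumps(canonical, sort_keys=True, separators=(",", ":")).encode("utf-8")


a = encode_selection({"OrderYear": ["2007"], "ProductGroup": ["Dairy products"]})
b = encode_selection({"ProductGroup": ["Dairy products"], "OrderYear": ["2007", "2007"]})
```

Both calls describe the same selection and therefore produce identical bytes.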

[0049] Based on the selection (cf. S1 in Figs. 1-6), the inference engine executes a calculation procedure (cf. P1 in Figs. 1-6) to generate a Data Subset (cf. R1 in Figs. 1-6) that represents a part of the scope (cf. R0 in Figs. 1-6). The data subset may thus contain a set of relevant data records from the scope, or a list of references (for example indices, pointers or binary numbers) to these relevant data records. In the example above, the relevant data records would be only those belonging to the year "2007" and the product group "Dairy products".
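To illustrate a data subset held as references in the form of binary numbers, the hypothetical `infer_subset` below evaluates the selection's filters and returns a bit vector over record positions. The bit-vector representation and the record layout are assumptions, chosen because they give a compact bit sequence that can later be hashed.

```python
from typing import Dict, List, Sequence


def infer_subset(records: Sequence[Dict[str, str]],
                 selection: Dict[str, List[str]]) -> bytes:
    """Evaluate the selection's filters and return the data subset as a bit vector.

    Bit i (least significant first within each byte) is set when record i
    matches the selected values of every field in the selection.
    """
    bits = bytearray((len(records) + 7) // 8)
    for i, record in enumerate(records):
        if all(record.get(field) in values for field, values in selection.items()):
            bits[i // 8] |= 1 << (i % 8)
    return bytes(bits)


def members(subset: bytes, count: int) -> List[int]:
    """Indices of the records referenced by the bit vector."""
    return [i for i in range(count) if subset[i // 8] >> (i % 8) & 1]


orders = [
    {"OrderYear": "2007", "ProductGroup": "Dairy products"},
    {"OrderYear": "2007", "ProductGroup": "Beverages"},
    {"OrderYear": "2008", "ProductGroup": "Dairy products"},
    {"OrderYear": "2007", "ProductGroup": "Dairy products"},
]
subset = infer_subset(orders, {"OrderYear": ["2007"], "ProductGroup": ["Dairy products"]})
```

Only records 0 and 3 satisfy both filters, so the subset is the single byte `0b1001`.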

[0050] If the selection has not been made before, the inference engine of Fig. 7 is operated to calculate the data subset. However, if the calculation has been made before, the inference engine is instead operated to reuse the previous result by accessing a specific data structure: a "cache".

[0051] Frequently, the next step is to perform some additional calculations based on the data subset, for example one or more aggregations, sortings and/or groupings. In the example of Fig. 7, these subsequent calculations are performed by a Chart Engine that calculates the Chart Result based on the data subset and a selected set of Chart Properties (cf. S2 in Figs. 1-6). The chart engine thus executes a chart calculation procedure (cf. P2 in Figs. 1-6) to generate the chart result (cf. R2 in Figs. 1-6). If these calculations have not been made before, the chart engine of Fig. 7 is operated to generate the chart result. However, if they have been made before, the chart engine is instead operated to reuse the previous result by accessing the aforementioned cache. The chart result can then be displayed to a user in pivot tables or graphically in 2D and 3D charts.
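A minimal sketch of one chart calculation, assuming the chart properties reduce to a single dimension, a measure field and a sum aggregation, and that the data subset is a bit vector over record positions. `compute_chart`, `Customer` and `Amount` are hypothetical names.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple


def compute_chart(records: Sequence[Dict[str, Any]], subset: bytes,
                  dimension: str, measure: str) -> List[Tuple[Any, float]]:
    """Sum `measure` per value of `dimension` over the records in the data subset.

    Bit i of `subset` set means record i belongs to the subset.
    The result rows are sorted by dimension value.
    """
    totals: Dict[Any, float] = defaultdict(float)
    for i, record in enumerate(records):
        if subset[i // 8] >> (i % 8) & 1:
            totals[record[dimension]] += float(record[measure])
    return sorted(totals.items())


orders = [
    {"Customer": "A", "Amount": 10},
    {"Customer": "B", "Amount": 5},
    {"Customer": "A", "Amount": 7},
    {"Customer": "C", "Amount": 3},
]
# Subset covers records 0, 1 and 2; customer C's order is excluded.
chart = compute_chart(orders, bytes([0b0111]), dimension="Customer", measure="Amount")
```

Here `chart` is the one-dimensional result `[("A", 17.0), ("B", 5.0)]`; additional dimensions would turn the dictionary key into a tuple.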

[0052] Fig. 7 also illustrates how the cache is used: f denotes the hash algorithm that is operated to generate the digital identifiers, ID1-ID4 denote the digital identifiers generated in this way, and the solid arrows denote the data flow for generating the identifiers ID1-ID4. In addition, the dashed arrows in Fig. 7 denote cache lookups.

[0053] In Fig. 7, when a user makes a new selection, the inference engine calculates the data subset. In addition, the identifier ID1 for the selection together with the scope is generated based on the filters in the selection and on the scope. Next, the identifier ID2 for the data subset is generated from the data subset definition, usually a bit sequence that defines the content of the data subset. Finally, ID2 is stored in the cache using ID1 as the lookup identifier, and the data subset definition is stored in the cache using ID2 as the lookup identifier.
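The storage step for a new selection can be sketched as follows. It assumes SHA-256 (via `hashlib`) as the hash f, a dictionary as the cache, and a scope and selection already serialized to bytes; `fp` and `run_new_selection` are hypothetical helpers, and the inference engine is stubbed by a lambda.

```python
import hashlib
from typing import Any, Callable, Dict


def fp(*parts: bytes) -> bytes:
    """256-bit identifier: SHA-256 over length-prefixed parts (an assumption)."""
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()


def run_new_selection(cache: Dict[bytes, Any], scope: bytes, selection: bytes,
                      infer: Callable[[bytes], bytes]) -> bytes:
    """Calculate the data subset for a new selection and cache it in two stages."""
    subset = infer(selection)        # the inference engine's (expensive) calculation
    id1 = fp(scope, selection)       # identifies the selection together with the scope
    id2 = fp(subset)                 # identifies the content of the data subset
    cache[id1] = id2                 # lookup ID1 -> ID2
    cache[id2] = subset              # lookup ID2 -> data subset definition
    return id2


cache: Dict[bytes, Any] = {}
scope = b"Orders JOIN Products"
selection = b'{"OrderYear":["2007"],"ProductGroup":["Dairy products"]}'
id2 = run_new_selection(cache, scope, selection, lambda sel: bytes([0b1001]))
```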

[0054] In Fig. 7, the chart calculation proceeds in a similar manner. Here there are two sets of information: the data subset and the relevant chart properties. The latter is typically, though not necessarily, a mathematical function together with calculation variables and classification variables (dimensions). Both sets of information are used to calculate the chart result, and both are also used to generate the identifier ID3 for the input to the chart calculation. ID2 was already generated in the preceding step, and ID3 is generated as the first step of the chart calculation procedure.

[0055] The identifier ID3 is formed from ID2 and the relevant chart properties. ID3 can be seen as an identifier for a specific chart generation instance, which includes all the information needed to calculate a specific chart result. In addition, a chart result identifier ID4 is created from the chart result definition, usually a bit sequence that defines the chart result. Finally, ID4 is stored in the cache using ID3 as the lookup identifier, and the chart result definition is stored in the cache using ID4 as the lookup identifier.
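The chart stage mirrors the inference stage. In this sketch ID3 is hashed from ID2 and the chart properties, never from the subset bits themselves, which is what later allows the subset to be evicted. SHA-256, the helper names and the byte-serialized chart result are assumptions, and the chart engine is stubbed by a lambda.

```python
import hashlib
from typing import Any, Callable, Dict


def fp(*parts: bytes) -> bytes:
    """256-bit identifier: SHA-256 over length-prefixed parts (an assumption)."""
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()


def run_new_chart(cache: Dict[bytes, Any], id2: bytes, chart_props: bytes,
                  subset: bytes, calc: Callable[[bytes, bytes], bytes]) -> bytes:
    """Calculate a chart result and cache it in two stages (ID3 -> ID4 -> result)."""
    id3 = fp(id2, chart_props)          # chart-generation instance: subset ID + properties
    result = calc(subset, chart_props)  # the chart engine's calculation, serialized to bytes
    id4 = fp(result)                    # identifies the content of the chart result
    cache[id3] = id4                    # lookup ID3 -> ID4
    cache[id4] = result                 # lookup ID4 -> chart result definition
    return id4


cache: Dict[bytes, Any] = {}
subset = bytes([0b1001])
id2 = fp(subset)
props = b"sum(Amount) by Customer"
id4 = run_new_chart(cache, id2, props, subset, lambda s, p: b"A=17;B=5")
```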

[0056] In this specific example, two-stage result caching is performed in both the inference procedure and the chart calculation procedure. In the inference procedure, ID1 and ID2 represent different things: the selection and the data subset definition, respectively. If two different selections produce the same data subset, which is quite likely, two-stage caching (ID1:ID2; ID2:data subset) causes the data subset to be cached only once. This is referred to below as Object Combining, i.e. different data objects sharing the same entry in the cache. Similarly, in the chart calculation procedure, ID3 and ID4 represent different things: the chart generation instance and the chart result definition, respectively. If two different chart generation instances produce the same chart result, which is quite likely, two-stage caching (ID3:ID4; ID4:chart result) causes the chart result to be cached only once.

[0057] Furthermore, by caching ID3, the chart result can also be recovered if the data subset definition has been removed from the cache. This is a significant advantage, since the data subset definition can be very large and is therefore likely to be removed from the cache if a cache purging mechanism is implemented. A non-limiting example of such a mechanism is described below.

[0058] During the extraction process, the identifiers are calculated from the selection, the relevant chart properties, etc., and are used to look up possibly cached calculation results, as indicated by the dashed arrows in Fig. 7. If an identifier is found, the corresponding cached result is reused. If it is not found, the extraction process generates new identifiers and stores them in the cache together with the respective results.
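The lookup-or-calculate logic is the same for every stage, so it can be sketched once as a generic helper. This assumes SHA-256 identifiers and a dictionary cache; `cached_stage` and `expensive` are hypothetical names, and a call counter shows that the calculation runs only on the first request.

```python
import hashlib
from typing import Any, Callable, Dict, List, Tuple


def fp(*parts: bytes) -> bytes:
    """256-bit identifier: SHA-256 over length-prefixed parts (an assumption)."""
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()


def cached_stage(cache: Dict[bytes, Any], input_id: bytes,
                 compute: Callable[[], bytes]) -> Tuple[bytes, bytes]:
    """Run one calculation stage, reusing a cached result when the input ID is known."""
    output_id = cache.get(input_id)
    if output_id is not None and output_id in cache:
        return output_id, cache[output_id]   # hit: reuse the stored result
    output = compute()                       # miss: calculate in the normal way
    output_id = fp(output)
    cache[input_id] = output_id              # input ID -> output ID
    cache.setdefault(output_id, output)      # output ID -> result (stored once)
    return output_id, output


calls: List[int] = []


def expensive() -> bytes:
    calls.append(1)
    return bytes([0b1001])


cache: Dict[bytes, Any] = {}
id1 = fp(b"Orders JOIN Products", b"OrderYear=2007;ProductGroup=Dairy products")
first = cached_stage(cache, id1, expensive)
second = cached_stage(cache, id1, expensive)
```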

[0059] To further illustrate the extraction process, consider the aforementioned selection of order year "2007" and product group "Dairy products". The first step is to generate a digital identifier ID1 based on this selection, for example (in hexadecimal notation): "31dca7ad013964891df428095ad9b78ad7a69eaaa1ca3886bcf05d8f8184e84a".

[0060] For brevity, each identifier in the following example is represented by its first four characters. Thus ID1 becomes "31dc". Also, for clarity, the illustrative tables below include identifier labels, for example "ID1:", in front of the digital identifiers. These labels are not needed in an actual implementation.

[0061] The extraction process then proceeds as follows. Once ID1 has been generated, it is looked up in the cache. The first time the selection is made, this identifier will not be found in the cache, so the resulting data subset must be calculated in the normal way. Once this has been done, ID2 can be generated from the data subset, for example as "d2b8". Next, ID1 is stored in the cache, pointing to ID2, and ID2 is stored, pointing to the bit sequence that defines the resulting data subset. This bit sequence may be considerably large. The cache content is shown in Table 1 below.

Table 1:

ID          Cached value
ID1:31dc    ID2:d2b8
ID2:d2b8    <data records in the resulting data subset>

[0062] The next time the same selection is made, the process is different: now ID1 is found in the cache, pointing to "ID2:d2b8", which in turn is used for a second lookup, after which the bit sequence of the resulting data subset is found, retrieved and used instead of a time-consuming calculation.

[0063] Now consider the case where a different selection is made that nevertheless produces the same resulting data subset. For example, a user might select exactly those customers who have bought "Dairy products", without explicitly selecting "Dairy products", and these customers have bought only dairy products. In this case ID1 is generated as, for example, "f142" and is not found in the cache, so the resulting data subset must be calculated in the normal way. Once this is done, ID2 can be generated from the data subset and turns out to be "d2b8", which is already stored in the cache. The algorithm therefore only needs to add one entry to the cache, in which "ID1:f142" points to "ID2:d2b8". The cache content is shown in Table 2 below.

Table 2:

ID          Cached value
ID1:f142    ID2:d2b8
ID1:31dc    ID2:d2b8
ID2:d2b8    <data records in the resulting data subset>

[0064] This time no calculation time has been saved, but the cache entries are reused to prevent the cache from growing unnecessarily. Both "ID1:f142" and "ID1:31dc" point to the cache entry that contains the same resulting data subset, "ID2:d2b8", and both can be used in later lookups. This is therefore an example of the "object combining" mentioned above.
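Object combining falls out of keying the data subset by its own fingerprint, as the sketch below shows for the two selections of the example. SHA-256, the helper names and the byte encodings of the two selections are assumptions; `setdefault` ensures an identical subset is stored only once.

```python
import hashlib
from typing import Any, Dict


def fp(*parts: bytes) -> bytes:
    """256-bit identifier: SHA-256 over length-prefixed parts (an assumption)."""
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()


def store_selection_result(cache: Dict[bytes, Any], id1: bytes, subset: bytes) -> bytes:
    """Cache ID1 -> ID2 and, only if not already present, ID2 -> subset."""
    id2 = fp(subset)
    cache[id1] = id2
    cache.setdefault(id2, subset)  # an identical subset is stored only once
    return id2


cache: Dict[bytes, Any] = {}
subset = bytes([0b1001])  # the same data subset, reached by two different selections
id1_products = fp(b"ProductGroup=Dairy products")  # hypothetical selection encodings
id1_customers = fp(b"Customer=C1,C4")
store_selection_result(cache, id1_products, subset)
store_selection_result(cache, id1_customers, subset)
```

The cache ends up with three entries, as in Table 2: two ID1 keys pointing at one ID2, and a single copy of the subset.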

[0065] A further advantage of caching the digital identifiers becomes evident when the subsequent chart calculation is performed. Assume that the selections above have been made and that the subsequent chart calculation has been performed. ID3 and ID4 have been generated as "e40A" and "7505", respectively, and stored in the cache. The cache content is shown in Table 3 below.

Table 3:

ID          Cached value
ID1:f142    ID2:d2b8
ID1:31dc    ID2:d2b8
ID2:d2b8    <data records in the resulting data subset>
ID3:e40A    ID4:7505
ID4:7505    <array of numbers representing the chart result>

[0066] Of the five entries in Table 3, one is very likely to be considerably larger than the others: "ID2:d2b8", which contains the entire bit sequence defining the potentially large data subset. Its size makes it a candidate for removal if and when the cache is maintained, as described further below. After a while, the cache content may therefore be as shown in Table 4 below.

Table 4:

ID          Cached value
ID1:f142    ID2:d2b8
ID1:31dc    ID2:d2b8
ID3:e40A    ID4:7505
ID4:7505    <array of numbers representing the chart result>

[0067] However, since the digital identifiers have been cached, it is still possible to obtain the chart result without recalculating the intermediate data subset. Instead, when the selection is made, ID1 is calculated. ID1 is then looked up in the cache, which yields ID2. Next, ID3 is generated from the combination of the relevant chart properties and ID2. ID3 is looked up in the cache, which yields ID4. Finally, ID4 is looked up in the cache and the chart result is retrieved. The chart result is thus found without any heavy calculations, based only on the digital identifiers, which can be generated by fast and efficient operations during processing.
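The lookup chain can be sketched as below, with the cache first populated as in Table 3 and the large subset entry then deleted to reach the state of Table 4. SHA-256 identifiers, the helper names and the byte stand-ins for the subset and chart result are assumptions.

```python
import hashlib
from typing import Any, Dict, Optional


def fp(*parts: bytes) -> bytes:
    """256-bit identifier: SHA-256 over length-prefixed parts (an assumption)."""
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()


def chart_from_cache(cache: Dict[bytes, Any], id1: bytes,
                     chart_props: bytes) -> Optional[bytes]:
    """Follow ID1 -> ID2 -> ID3 -> ID4 -> chart result using identifiers only.

    The data subset stored under ID2 is never read, so it may have been
    evicted. Returns None if a link is missing and recalculation is needed.
    """
    id2 = cache.get(id1)
    if id2 is None:
        return None
    id3 = fp(id2, chart_props)  # computed from ID2, not looked up
    id4 = cache.get(id3)
    if id4 is None:
        return None
    return cache.get(id4)


# Populate the cache as in Table 3 (values are stand-ins).
subset, result = bytes([0b1001]), b"A=17;B=5"
props = b"sum(Amount) by Customer"
id1 = fp(b"Orders JOIN Products", b"OrderYear=2007;ProductGroup=Dairy products")
id2, id4 = fp(subset), fp(result)
id3 = fp(id2, props)
cache: Dict[bytes, Any] = {id1: id2, id2: subset, id3: id4, id4: result}

del cache[id2]  # the large data subset is evicted, leaving the state of Table 4
```

After the eviction, `chart_from_cache(cache, id1, props)` still returns the chart result, because every step needs only a hash computation or a lookup of a small identifier.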

[0068] From the above, it follows that the digital identifiers should be unique, so that the meaning of each identifier in the cache is unambiguous. In one embodiment, the digital identifiers are generated using a hash algorithm or hash function. Hash algorithms are transformations that take an input of arbitrary size (the message) and return a fixed-size string called the hash value (message digest). Typically the algorithm chops and mixes the input, for example by substitution or transposition, to create a fingerprint of it. The simplest and oldest hash algorithms are simple modulo-prime operations. Hash algorithms are used for a variety of computational purposes, including cryptography. In general terms, a hash algorithm should behave as much as possible like a random function, producing every possible fixed-size string with equal "probability", while still being fully deterministic.
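The contrast between a simple modulo-prime hash and a mixing hash can be made concrete in a few lines. The prime 2**31 - 1 is an arbitrary illustrative choice, and SHA-256 from `hashlib` merely serves as an example of a hash with thorough mixing; neither is prescribed by the text.

```python
import hashlib

P = 2**31 - 1  # 2147483647, a Mersenne prime chosen for illustration


def modulo_prime_hash(message: bytes) -> int:
    """Oldest style of hash: read the message as an integer, reduce it modulo a prime."""
    return int.from_bytes(message, "big") % P


# Deterministic and fixed-range, but with no mixing: messages whose integer
# values differ by a multiple of P collide, e.g. b"\x00" and P's own bytes.
collides = modulo_prime_hash(b"\x00") == modulo_prime_hash(P.to_bytes(4, "big"))

# SHA-256 also returns a fixed-size value (32 bytes) for inputs of any size,
# but mixes the input so thoroughly that no practical way to construct
# collisions is publicly known.
short_digest = hashlib.sha256(b"2007").digest()
long_digest = hashlib.sha256(b"Dairy products" * 1000).digest()
```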

[0069] There are multiple well-known and widely used hash algorithms that can be used to generate the aforementioned digital identifiers. Different hash algorithms are optimized for different uses: some are optimized for efficient, fast computation of the hash value, while others are designed for high cryptographic security. An algorithm with high cryptographic security is designed so that, within a reasonable time, it is difficult both to compute a message that matches a given hash value and to find a second message that yields the same hash value as a given first message. Such hash algorithms include SHA (Secure Hash Algorithm) and MD5 (Message-Digest Algorithm 5). Hash algorithms that are efficient to compute usually offer lower cryptographic security. Such hash algorithms include the FNV (Fowler/Noll/Vo) algorithms, which are designed to be fast while generally maintaining a very low collision rate. An FNV algorithm usually starts from an offset basis, which in principle could be any random string of values but, by tradition, is always the hash of the inventor's signature, computed with the original FNV-0 algorithm and written in hexadecimal. To generate a 256-bit FNV hash value, the following offset basis is normally used:

"0xdd268dbcaac550362d98c384c4e576ccc8b1536847b6bbb31023b4c8caee0535".
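As a rough illustration of the cryptographic option mentioned above, the sketch below derives a 256-bit identifier from a selection using SHA-256 from Python's standard `hashlib`. The helper name `selection_fingerprint` and the canonical JSON serialization are assumptions made for this example; the text does not prescribe how a selection is encoded before hashing.

```python
import hashlib
import json


def selection_fingerprint(selection: dict) -> str:
    """Hypothetical helper: 256-bit SHA-256 fingerprint of a selection.

    The selection is serialized to canonical JSON (sorted keys, no
    whitespace) so that logically equal selections produce identical bytes
    and therefore the same identifier.
    """
    canonical = json.dumps(selection, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content, different key insertion order.
a = selection_fingerprint({"Country": ["Sweden"], "Year": [2008]})
b = selection_fingerprint({"Year": [2008], "Country": ["Sweden"]})
print(a == b, len(a))  # prints: True 64
```

Note that list values stay order-sensitive in this sketch; if the order of selected values carries no meaning, they would need to be sorted before hashing.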

[0070] For each byte of input to the hash algorithm, the current hash value (initially the offset basis) is first multiplied by a large prime number, and the product is then combined with the input byte by a bitwise exclusive OR (XOR) to form the hash value for the next iteration. Suitable prime numbers can be found in the available literature. Any large prime will work, but some are more collision-resistant than others.
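The per-byte loop described above can be sketched in Python as follows. The offset basis is the one quoted in the text; the prime, 2**168 + 2**8 + 0x63, is the standard published 256-bit FNV prime and is an assumption here, since the text does not name one. Python integers have arbitrary precision, so the product is masked to keep the arithmetic modulo 2**256.

```python
# Offset basis for 256-bit FNV, as quoted in the text.
FNV256_OFFSET_BASIS = 0xDD268DBCAAC550362D98C384C4E576CCC8B1536847B6BBB31023B4C8CAEE0535
# Published 256-bit FNV prime (taken from the FNV reference material,
# not from this document).
FNV256_PRIME = (1 << 168) + (1 << 8) + 0x63
MASK256 = (1 << 256) - 1  # keep arithmetic modulo 2**256


def fnv1_256(data: bytes) -> int:
    """FNV-1: for each input byte, multiply by the prime, then XOR in the byte."""
    h = FNV256_OFFSET_BASIS
    for byte in data:
        h = (h * FNV256_PRIME) & MASK256  # multiply step, truncated to 256 bits
        h ^= byte                          # XOR step with the current input byte
    return h


print(f"{fnv1_256(b'Country=Sweden'):064x}")
```

Swapping the two steps (XOR first, then multiply) gives the FNV-1a variant, which is often preferred for its slightly better dispersion of the last input bytes.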

[0071] The digital identifiers can be generated using any hash algorithm that is reasonably collision-resistant. In one embodiment, the identifiers are generated using a fast hash algorithm with high collision resistance and low cryptographic security.

[0072] In a specific embodiment, a 256-bit identifier can be created by concatenating four 64-bit FNV hash values, each generated using a different prime multiplier. Computing four shorter hash values and concatenating them allows the identifier to be generated faster. To increase the speed of identifier generation further, the algorithm can be modified to consume four bytes of input per iteration instead of one. This may entail a loss of cryptographic security, while the collision resistance remains approximately the same.
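A minimal sketch of this embodiment, assuming FNV-1-style 64-bit lanes: four lanes share the standard 64-bit FNV offset basis but use different prime multipliers, and their hex digests are concatenated into a 256-bit identifier. Only the first prime is the standard FNV-64 prime; the other three are well-known primes chosen for illustration, not values from the patent. The `chunk` parameter models consuming four bytes per iteration; mixing in the input length at the end is a design choice of this sketch so that the implicit zero padding of a short tail cannot cause trivial collisions.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325  # standard 64-bit FNV offset basis
MASK64 = (1 << 64) - 1                   # emulate 64-bit overflow

# One prime multiplier per lane (lanes 1-3 are illustrative choices).
LANE_PRIMES = (
    0x100000001B3,   # standard FNV-64 prime, 2**40 + 2**8 + 0xb3
    (1 << 61) - 1,   # Mersenne prime 2**61 - 1
    (1 << 31) - 1,   # Mersenne prime 2**31 - 1
    (1 << 64) - 59,  # largest prime below 2**64
)


def fnv1_64_lane(data: bytes, prime: int, chunk: int = 1) -> int:
    """FNV-1-style 64-bit hash consuming `chunk` input bytes per iteration."""
    h = FNV64_OFFSET_BASIS
    for i in range(0, len(data), chunk):
        # A short final chunk is implicitly zero-padded by from_bytes.
        word = int.from_bytes(data[i:i + chunk], "little")
        h = ((h * prime) & MASK64) ^ word
    if chunk > 1:
        # Mix in the length so b"a" and b"a\x00" cannot collide via padding.
        h = ((h ^ len(data)) * prime) & MASK64
    return h


def identifier_256(data: bytes, chunk: int = 1) -> str:
    """Concatenate four 64-bit lanes into a 256-bit identifier (64 hex digits)."""
    return "".join(f"{fnv1_64_lane(data, p, chunk):016x}" for p in LANE_PRIMES)


print(identifier_256(b"Country=Sweden"))
print(identifier_256(b"Country=Sweden", chunk=4))
```

With `chunk=1`, lane 0 is ordinary 64-bit FNV-1. In pure Python the speed gain of wider chunks is not representative; the sketch only shows the structure of the technique.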

[0073] Identifiers with a length of at least 256 bits can provide good collision resistance. A 256-bit hash value means that there are approximately 1E+77 possible identifier values. This number can be compared with the number of atoms in the universe, which has been estimated at 1E+80. The risk of collisions, that is, the risk that two different selections, data subsets, chart properties or chart results produce the same identifier, is therefore not merely extremely small but negligible. It can thus be said with confidence that the risk of collisions is acceptably small. In other words, although the hash algorithm does not generate theoretically unique identifiers, it does generate statistically unique identifiers. It should be understood, however, that identifiers with shorter bit lengths, such as 64 or 128 bits, may be sufficiently statistically unique for a specific application.
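The claim that shorter identifiers may suffice for some applications can be made concrete with the standard birthday-bound approximation, which is not stated in the text: for n uniformly distributed identifiers of b bits, the probability of at least one collision is roughly 1 - exp(-n(n-1) / 2^(b+1)). The sketch below assumes the hash output behaves like a uniform random value; the workload of one billion identifiers is hypothetical.

```python
import math


def collision_probability(n_items: int, bits: int) -> float:
    """Birthday-bound estimate of P(at least one collision) among n_items
    uniformly random identifiers of the given bit length:
        p ~= 1 - exp(-n(n-1) / 2^(bits+1))
    math.expm1 keeps precision when p is tiny.
    """
    space = 2.0 ** bits
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * space))


for bits in (64, 128, 256):
    print(bits, collision_probability(10**9, bits))
```

For a billion identifiers the estimate is roughly 0.027 at 64 bits, around 1E-21 at 128 bits and around 4E-60 at 256 bits, which is why the required length depends on the expected number of cached entries.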

[0074] As mentioned above, a removal mechanism can be implemented to purge old or unused entries from the cache. One strategy may be to remove the least-used entry or entries in the cache. However, a more advanced removal mechanism can be implemented to support optimization of both processor usage and memory usage. One embodiment of such an advanced removal mechanism operates on three parameters: Usage, Calculation time and Required memory.

[0075] The Usage parameter is a numerical value that can account both for an entry that has been accessed "recently, but not often" and for an entry that has been accessed "often, but not recently". This can be achieved by associating each entry with a usage parameter U that is increased, for example by one unit, each time the entry is accessed, while its value decreases over time, exponentially or according to any other function. In one implementation, all U values in the cache are periodically reduced by a fixed proportion. The usage parameter thus has a half-life, similar to radioactive decay. The value of U then reflects both how often and how recently the entry has been accessed.
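A minimal sketch of the usage parameter U, assuming an increment of one per access and a periodic sweep that multiplies every U by the same factor (a factor of 0.5 corresponds to one half-life per period). The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class UsageCounter:
    """Usage parameter U: +1 per access, decays by a fixed factor per period."""
    value: float = 0.0

    def touch(self, amount: float = 1.0) -> None:
        self.value += amount  # the entry was accessed

    def decay(self, factor: float = 0.5) -> None:
        self.value *= factor  # factor 0.5 => one half-life per period


def decay_all(counters: Dict[str, UsageCounter], factor: float = 0.5) -> None:
    """Periodic sweep that decays U for every entry in the cache."""
    for counter in counters.values():
        counter.decay(factor)


# "Often, but not recently": 8 accesses followed by three decay periods.
# "Recently, but not often": a single access after those periods.
counters = {"old_popular": UsageCounter(), "new_rare": UsageCounter()}
for _ in range(8):
    counters["old_popular"].touch()
for _ in range(3):
    decay_all(counters)
counters["new_rare"].touch()
print({k: c.value for k, c in counters.items()})
# prints: {'old_popular': 1.0, 'new_rare': 1.0}
```

The example shows how the two access patterns named in the text can end up with comparable U values.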

[0076] If the processor time needed to calculate an entry is considerable, the entry should be kept in the cache for longer. Conversely, if the processor time needed for the calculation is small, the cost of recalculating is low, and so is the benefit of keeping the entry in the cache. Each entry is therefore associated with a time parameter T that represents the estimated calculation time.

[0077] If the memory space needed to store an entry is considerable, keeping it consumes a large share of the cache's resources, and it should be removed from the cache before an entry that requires less memory. Conversely, an entry that requires little memory can be kept in the cache for longer. Each entry is therefore associated with a memory parameter M that represents the estimated memory required.
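One way to obtain T and M when an entry is first computed is sketched below, assuming wall-clock time is an acceptable estimate of calculation time. Note that `sys.getsizeof` reports only the shallow size of an object; for nested results a real implementation would need a deeper size estimate. The helper name `compute_with_cost` is hypothetical.

```python
import sys
import time
from typing import Any, Callable, Tuple


def compute_with_cost(fn: Callable[[], Any]) -> Tuple[Any, float, int]:
    """Run a calculation and return (result, T, M).

    T: elapsed seconds measured with time.perf_counter().
    M: sys.getsizeof(result), a shallow size in bytes (contents of
       containers are not counted).
    """
    start = time.perf_counter()
    result = fn()
    t = time.perf_counter() - start
    m = sys.getsizeof(result)
    return result, t, m


result, t, m = compute_with_cost(lambda: [x * x for x in range(10_000)])
print(len(result), t >= 0.0, m > 0)  # prints: 10000 True True
```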

[0078] For each entry in the cache, the values of the parameters U, T and M are evaluated by a weight function W given by: W = U * T / M.

[0079] A high W value for an entry indicates that there are good reasons to keep that entry in the cache. Entries with high W values should therefore remain in the cache, and those with low W values should be removed.
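The weight function W = U * T / M can be evaluated directly. The three entries below are hypothetical and chosen to show how each parameter pulls W: a frequently used entry can still be the first removal candidate if it is cheap to recompute relative to its size.

```python
from typing import Dict, Tuple


def weight(u: float, t: float, m: float) -> float:
    """W = U * T / M; a higher W means a stronger case for keeping the entry."""
    if m <= 0:
        raise ValueError("memory estimate M must be positive")
    return u * t / m


# Hypothetical entries: name -> (U accesses, T seconds, M bytes).
entries: Dict[str, Tuple[float, float, int]] = {
    "slow_small": (5.0, 2.0, 1_000),        # W = 0.01
    "fast_tiny": (1.0, 0.01, 10),           # W = 0.001
    "busy_but_huge": (10.0, 0.5, 100_000),  # W = 0.00005
}
weights = {name: weight(*params) for name, params in entries.items()}
print(min(weights, key=weights.get))  # prints: busy_but_huge
```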

[0080] An efficient removal mechanism may involve sorting the cache by W value and removing entries from one end of the sorted cache, that is, the entries with the lowest W values. One possible, but not required, way of maintaining a sorted cache is to store the identifiers, results and the U, T, M and W values in an AVL (Adelson-Velsky and Landis) tree, that is, a self-balancing binary search tree.
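The sketch below keeps the cache ordered by W so that the lowest-W entries sit at one end. It deliberately swaps the AVL tree for a plain Python list maintained with the standard `bisect` module, because the standard library has no balanced tree; insertions and deletions are therefore O(n) rather than O(log n). Since U decays over time, an entry must be re-inserted whenever its W changes. The class name is hypothetical.

```python
import bisect
from typing import Any, Dict, List, Optional, Tuple


class WeightOrderedCache:
    """Cache kept sorted by weight W (list + bisect instead of an AVL tree)."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (W, result)
        self._order: List[Tuple[float, str]] = []          # sorted (W, key) pairs

    def put(self, key: str, result: Any, w: float) -> None:
        """Insert or re-weight an entry; call again whenever W changes."""
        if key in self._entries:
            self._unlink(key)
        self._entries[key] = (w, result)
        bisect.insort(self._order, (w, key))

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        return None if entry is None else entry[1]

    def evict_lowest(self, count: int = 1) -> List[str]:
        """Remove up to `count` entries from the low-W end of the order."""
        evicted: List[str] = []
        while self._order and len(evicted) < count:
            _, key = self._order.pop(0)
            del self._entries[key]
            evicted.append(key)
        return evicted

    def _unlink(self, key: str) -> None:
        """Drop the (W, key) pair of an existing entry from the sorted list."""
        w, _ = self._entries[key]
        i = bisect.bisect_left(self._order, (w, key))
        del self._order[i]
```

For example, after `put("a", r_a, 0.5)`, `put("b", r_b, 0.1)` and `put("c", r_c, 0.9)`, a call to `evict_lowest()` removes `"b"` first.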

[0081] The removal mechanism may intermittently remove all entries whose W value is below a predetermined threshold value.
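A minimal sketch of threshold-based removal, assuming the cache's W values are available as a mapping from identifier to weight; the threshold of 0.005 is a hypothetical value. In practice such a sweep would be scheduled intermittently, for example on a timer.

```python
from typing import Dict, List


def remove_below_threshold(weights: Dict[str, float], threshold: float) -> List[str]:
    """Remove every entry whose W is below `threshold`; return the removed keys."""
    doomed = [key for key, w in weights.items() if w < threshold]  # collect first
    for key in doomed:
        del weights[key]  # then delete, so the dict is not mutated while iterating
    return doomed


cache_weights = {"id_a": 0.01, "id_b": 0.001, "id_c": 0.00005}
print(remove_below_threshold(cache_weights, 0.005), cache_weights)
# prints: ['id_b', 'id_c'] {'id_a': 0.01}
```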

[0082] Alternatively, the removal mechanism can be controlled by the amount of free memory in the computer, or by the proportion of free memory relative to total memory. Thus, whenever the size of the cache exceeds a memory threshold value, the removal mechanism removes cache entries based on their respective W values. By adjusting the memory threshold, the cache size can be adapted to the local hardware conditions, for example by trading processing power for memory. For instance, a slower processor in a computer can be compensated for by adding more main memory to the computer and raising the memory threshold. More results are then retained in the cache, and the need for processing is reduced.
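A sketch of memory-controlled removal, assuming each entry carries its weight W and its memory estimate M in bytes, and that the memory threshold is expressed as a byte budget for the whole cache (both representations are assumptions). Entries are removed in order of increasing W until the total fits within the budget; raising the budget keeps more results cached.

```python
from typing import Dict, List, Tuple


def evict_to_budget(entries: Dict[str, Tuple[float, int]], budget: int) -> List[str]:
    """entries maps key -> (W, M bytes). While the total M exceeds `budget`,
    remove the entry with the lowest W. Returns the removed keys in order."""
    total = sum(m for _, m in entries.values())
    removed: List[str] = []
    for key in sorted(entries, key=lambda k: entries[k][0]):  # lowest W first
        if total <= budget:
            break
        total -= entries[key][1]
        removed.append(key)
    for key in removed:
        del entries[key]
    return removed


cache = {"a": (0.9, 400), "b": (0.1, 300), "c": (0.5, 300)}
print(evict_to_budget(cache, budget=500), cache)
# prints: ['b', 'c'] {'a': (0.9, 400)}
```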

[0083] Embodiments of the invention also relate to an apparatus for performing any of the algorithms, methods, processes and procedures described above. This apparatus may be specially constructed for the required purpose, or it may comprise a general-purpose computer that is selectively activated or reconfigured by a computer program stored in the computer.

[0084] FIG. 8 is a block diagram of a computer environment for implementing any of the embodiments of the invention. A user 1 interacts with a data processing system 2, which includes a processor 3 that runs operating system software as well as one or more application programs implementing an embodiment of the invention. The user enters information into the data processing system 2 using one or more known input devices 4, such as a mouse, a keyboard or a touch panel. Alternatively, information may be entered, with or without user intervention, via any other type of input device, such as a card reader, an optical reader or another computer system. Visual feedback can be provided to the user by displaying characters, graphic symbols, windows, buttons, etc., on a display 5. The data processing system also includes the aforementioned memory 10. The software executed by the processor 3 stores information about its operation in the memory 10 and retrieves appropriate information from the memory 10. The memory 10 typically includes a main memory (such as RAM, cache memory, etc.) and a non-volatile secondary memory (hard disk, flash memory, removable media). The database can be stored in the memory 10 of the data processing system, or it can be accessed on an external memory device through a communications interface 6 of the data processing system 2.

[0085] The invention has mainly been described above with reference to a few embodiments. However, as is readily appreciated by the person skilled in the art, embodiments other than those described above are equally possible.

[0086] For example, the present invention is not only applicable to calculating multidimensional cubes, but may be useful in any situation in which information is extracted from a database by means of a chain of calculations.

[0087] Furthermore, the inventive extraction process can be applied to a chain of calculations involving more than two consecutive calculations. For example, each of two or more intermediate results in a chain of calculations can be cached and later retrieved in the same way as the intermediate result described above.

[0088] Likewise, the inventive extraction process does not require the final result to be cached and later retrieved; it may operate to cache and retrieve only one or more intermediate results in a chain of calculations.

[0089] It should also be noted that the initial step of extracting an initial data set or field from the database may be omitted, in which case the extraction process operates directly on the database.

Claims

1. Computer-implemented method for extracting information from a database, wherein said method includes a sequential chain of main calculations comprising a first main calculation (P1) that operates a first selection element (S1) on a data set (R0) representing the database to produce an intermediate result (R1), and a second main calculation (P2) that operates a second selection element (S2) on the intermediate result (R1) to produce a final result (R2), wherein said method further includes retrieving the final result through the steps of:

(a) calculating a first selection identifier value (ID1) as a statistically unique fingerprint generated by a hash function of at least the first selection element (S1);

(b) searching the objects of the data structure for the first selection identifier value (ID1) and, if the first selection identifier value (ID1) is found, locating and retrieving a first result identifier (ID2) stored with the first selection identifier value (ID1) as associated objects in a preceding iteration;

(c) if the first result identifier (ID2) is found in sub-step (b),

calculating a second selection identifier value (ID3) as a statistically unique fingerprint generated by a hash function of at least the second selection element (S2) and the retrieved first result identifier (ID2), and

searching the objects of the data structure for the second selection identifier value (ID3) and, if the second selection identifier value (ID3) is found, locating and retrieving a final result (R2) stored with the second selection identifier value (ID3) as associated objects in a preceding iteration;

(d) if the first result identifier (ID2) is not found in sub-step (b),

executing the first main calculation (P1) to produce the intermediate result (R1) and the first result identifier value (ID2) as a fingerprint generated by a hash function of the intermediate result (R1),

storing the first selection identifier value (ID1) and the first result identifier value (ID2) as associated objects in the data structure; and

storing the first result identifier value (ID2) and the intermediate result (R1) as associated objects in the data structure,

calculating a second selection identifier value (ID3) as a statistically unique fingerprint generated by a hash function of the first result identifier value (ID2) and the second selection element (S2), and

searching the objects of the data structure based on the second selection identifier value (ID3) and, if the second selection identifier value (ID3) is found, locating and retrieving a final result (R2) stored with the second selection identifier value (ID3) as associated objects in a preceding iteration;

(e) if the final result (R2) is not found in sub-step (c) or (d),

searching the objects of the data structure based on the first result identifier value (ID2);

(f) if the first result identifier value (ID2) is not found in sub-step (e),

storing the first result identifier value (ID2) and the intermediate result (R1) as associated objects in the data structure, and

executing the second main calculation (P2) to produce the final result (R2) and storing the second selection identifier value (ID3) and the final result (R2) as associated objects in the data structure; and

(g) if the first result identifier value (ID2) is found in sub-step (e),

recovering the intermediate result (R1) stored with the first result identifier value (ID2) as associated objects in a preceding iteration, and

executing the second main calculation (P2) to produce the final result (R2) and storing the second selection identifier value (ID3) and the final result (R2) as associated objects in the data structure.
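The retrieval procedure of claim 1 can be sketched as follows, under several assumptions not stated in the claim: the data structure is a single Python dict keyed by identifier, the fingerprints are SHA-256 over canonical JSON (so selections and intermediate results must be JSON-serializable), and the tags "S1", "R1" and "S2" are added to the hashed parts as domain separation. In branch (f), the claim does not say where R1 comes from when it was not computed in this call; this sketch recomputes it. All names are hypothetical.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def fingerprint(*parts: Any) -> str:
    """Statistically unique 256-bit ID: SHA-256 over canonical JSON of the parts."""
    canonical = json.dumps(parts, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def extract(r0: Any, s1: Any, s2: Any,
            p1: Callable[[Any, Any], Any], p2: Callable[[Any, Any], Any],
            store: Dict[str, Any]) -> Any:
    """Two-step cached extraction following the sub-steps of claim 1."""
    id1 = fingerprint("S1", s1)        # (a) ID1 from S1
    id2 = store.get(id1)               # (b) look up ID2 stored with ID1
    r1 = None
    if id2 is None:                    # (d) miss: run P1, derive ID2 = H(R1)
        r1 = p1(r0, s1)
        id2 = fingerprint("R1", r1)
        store[id1] = id2               # ID1 -> ID2
        store[id2] = r1                # ID2 -> R1
    id3 = fingerprint("S2", s2, id2)   # (c)/(d) ID3 from S2 and ID2
    if id3 in store:
        return store[id3]              # final result R2 found
    if r1 is None:                     # (e) look up R1 via ID2
        r1 = store.get(id2)
        if r1 is None:                 # (f) R1 not cached: recompute and store
            r1 = p1(r0, s1)
            store[id2] = r1
    r2 = p2(r1, s2)                    # (f)/(g) run P2
    store[id3] = r2                    # ID3 -> R2
    return r2
```

Because ID2 is derived from the intermediate result rather than from S1, two different first selections that happen to yield the same R1 share the cached final result: the second one runs P1 but not P2.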

2. Method according to claim 1, wherein the fingerprint comprises at least 256 bits.

3. Method according to claim 1 or claim 2, further including the step of selectively removing the data records containing said associated objects in the data structure, based at least on the size of the data record.

4. Method according to claim 3, wherein the step of selective removal is configured to cause the removal of data records containing said intermediate result (R1).

5. Method according to claim 3 or 4, further comprising the step of associating each data record with a weight value, which is calculated as a function of a usage parameter for each data record, a calculation-time parameter for each data record and a size parameter for each data record, wherein the usage parameter is a numeric value that represents how often and how recently the data record was accessed, the calculation-time parameter represents the estimated calculation time for the data record, and the size parameter represents the size of the data record.

6. Method according to claim 5, wherein the weight value is calculated by evaluating a weight function given by W = U * T / M, where U is the usage parameter, T is the calculation-time parameter and M is the size parameter.

7. Method according to claim 5 or 6, wherein the value of the usage parameter is increased whenever the data record is accessed, while it decreases exponentially as a function of time.

8. Method according to any of claims 4-7, wherein the step of selective removal is based on the weight values of the data records in the data structure.

9. Method according to any of claims 4-8, wherein the step of selective removal is triggered based on a comparison between a current size of the data structure and a threshold value.

10. Method according to any one of the preceding claims, wherein the database is a dynamic database, that is, a database that can change at any time, and the first selection identifier value (ID1) is calculated as a function of at least the first selection element (S1) and the data set (R0).

11. Method according to any one of the preceding claims, wherein the first selection element (S1) defines a set of fields in the data set (R0) and a condition for each field, wherein the intermediate result (R1) is representative of a subset of the data set (R0), wherein the second selection element (S2) defines a mathematical function, one or more calculation variables included in the intermediate result (R1) and one or more classification variables included in the intermediate result (R1), and wherein the final result (R2) is a multidimensional cube data structure containing the result of applying the mathematical function to said one or more calculation variables for each unique value of each classification variable.

12. A computer-readable medium storing a computer program which, when executed by a computer, is capable of carrying out the method of any of claims 1-11.

13. Apparatus for extracting information from a database, wherein said apparatus comprises means for executing a sequential chain of main calculations comprising a first main calculation (P1) that operates a first selection element (S1) on a data set (R0) representing the database to produce an intermediate result (R1), and a second main calculation (P2) that operates a second selection element (S2) on the intermediate result (R1) to produce a final result (R2), wherein said apparatus further comprises means for retrieving the final result by the steps of:

(b) searching the objects of the data structure for the first selection identifier value (ID1) and, if the first selection identifier value (ID1) is found, locating and retrieving a first result identifier (ID2) stored with the first selection identifier value (ID1) as associated objects in a preceding iteration;

(c) if the first result identifier (ID2) is found in sub-step (b),

searching the objects of the data structure for the second selection identifier value (ID3) and, if the second selection identifier value (ID3) is found, locating and retrieving a final result (R2) stored with the second selection identifier value (ID3) as associated objects in a preceding iteration;

(d) if the first result identifier (ID2) is not found in sub-step (b),

(e) if the final result (R2) is not found in sub-step (c) or (d),

searching the objects of the data structure based on the first result identifier value (ID2);

(f) if the first result identifier value (ID2) is not found in sub-step (e)

(g) if the first result identifier value (ID2) is found in sub-step (e)