ITPI20100117A1

ITPI20100117A1 - METHOD TO PERFORM THE AUTOMATIC COMPARISON OF TEXTUAL DOCUMENTS

Info

Publication number: ITPI20100117A1
Application number: IT000117A
Authority: IT
Inventors: Roberto Montelatici
Original assignee: Roboing S R L
Priority date: 2010-10-12
Filing date: 2010-10-12
Publication date: 2012-04-13

Description

TITLE

METHOD FOR PERFORMING THE AUTOMATIC COMPARISON OF TEXTUAL DOCUMENTS

TECHNICAL FIELD

The present invention relates to a method for comparing text documents and, specifically but not exclusively, for comparing the source code of computer programs.

STATE OF THE ART

Comparing text documents that can be edited with text editors on electronic computers matters in many sectors and for many reasons. Effective comparison tools are particularly important for two main types of text document. The first type is text formatted to be read by a human being, such as documents created with the Microsoft Word application or with substantially equivalent applications. The second type is the source code of computer programs, i.e. series of instructions written in specific languages which, when read by an electronic computer, allow it to perform specific operations. For the first type, an effective comparison application is one that makes the differences between two documents (usually a document and its revision) immediately and clearly visible to the user. For computer programs, an effective comparison is in certain respects even more important. This applies, for example, to documents that represent two successive development stages of a program. Such a comparison can also help optimize software production and evaluate the work and productivity of the programmer and the cost of software projects.

There are a number of applications for comparing text documents. For example, some applications compare text documents written in specific programming languages such as COBOL. In the Windows environment, a utility is available for comparing documents within the Visual Studio development environments. Microsoft Word, like similar word-processing programs, also has a specific function for comparing two documents or, more typically, two different versions of the same document.

Most applications for comparing text documents work line by line. Where the lines cannot be identified exactly because of the text formatting, these applications create dummy lines by reading a certain number of characters from the document, and then compare those. In some cases, such as Word, the comparison is performed word by word.

One of the main shortcomings of the text-comparison applications known to date is that they only work within a specific operating system and, in many cases, even only within a specific application (as with Word). The effectiveness and accuracy of the comparison can also be improved. For example, applications that compare the source code of computer programs suffer from the problem of identical lines. Once such an algorithm finds that a certain line appears in one of the two documents but not in the other, it scrolls through the other file until it reaches that line. It counts the lines it skips as added or deleted. If that line is a twin, i.e. an identical line that also occurs elsewhere in the file, the algorithm may take the wrong copy as the line it was searching for. This causes attribution errors: lines that are actually the same in the two files are counted as added or deleted.
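The following Python sketch makes the twin-line problem concrete. It implements the naive forward-scan strategy just described: when the current lines differ, the algorithm scans forward in the other file until it finds the missing line. The sketch was written for this illustration and is not code from any of the tools mentioned; `naive_diff` is a hypothetical name.

```python
def naive_diff(old, new):
    """Naive forward-scan comparison of two lists of lines.

    When the current lines differ, look for old[i] further down in new; if it
    is there, count the skipped lines of new as added, otherwise count old[i]
    as deleted. Returns (added, deleted).
    """
    added, deleted = [], []
    i = j = 0
    while i < len(old) and j < len(new):
        if old[i] == new[j]:
            i += 1
            j += 1
        elif old[i] in new[j:]:
            k = new.index(old[i], j)  # first occurrence: may be the wrong twin
            added.extend(new[j:k])
            j = k
        else:
            deleted.append(old[i])
            i += 1
    deleted.extend(old[i:])  # leftovers in old count as deleted
    added.extend(new[j:])    # leftovers in new count as added
    return added, deleted


print(naive_diff(["a", "end", "b", "end"], ["b", "end"]))
```

In this example, the true change is that `a` and one `end` were removed. The scan, however, matches the first `end` of the old file to the only `end` of the new file, which is actually the twin of the second one. As a result, it reports `(['b'], ['a', 'b', 'end'])`: the line `b`, present unchanged in both files, is counted as both added and deleted.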

SUMMARY OF THE INVENTION

The aim of the present invention is therefore to propose a method for comparing textual documents with improved efficiency.

In particular, an object of the present invention is to propose a method for comparing textual documents that is independent of the installed operating system.

Another object of the present invention is to propose a method for comparing the source code of computer programs that makes it possible to evaluate, in quantitative and qualitative terms, the changes made to a given source code with respect to a previous version of it.

According to one aspect of the present invention, the above objects are achieved by a method for comparing textual documents, the method comprising:

the generation of a first vector of elements from a first textual document and of a second vector of elements from a second textual document;

the alphanumeric ordering of the elements within said first vector and said second vector;

the elimination of the repetitions of elements that occur several times in a vector, with counting of their occurrences;

the comparison of the resulting first vector with the resulting second vector to verify correspondences with elements of the second vector.
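The three steps just listed are ordering, elimination of repetitions with occurrence counts, and comparison of the resulting vectors. The Python sketch below illustrates them under the following assumptions:

- Elements are strings.
- "Alphanumeric ordering" is taken to mean Python's default string ordering, which sorts by code point.
- The comparison is a merge walk over the two sorted vectors.

`to_counted_vector` and `compare_counted` are hypothetical names.

```python
from collections import Counter


def to_counted_vector(elements):
    """Sort the elements and collapse repeats into (element, occurrences) pairs."""
    counts = Counter(elements)
    return [(element, counts[element]) for element in sorted(counts)]


def compare_counted(first, second):
    """Merge-walk two sorted (element, count) vectors; return (added, deleted)."""
    added, deleted = [], []
    i = j = 0
    while i < len(first) or j < len(second):
        if j == len(second) or (i < len(first) and first[i][0] < second[j][0]):
            deleted.append(first[i])  # only in the first vector
            i += 1
        elif i == len(first) or second[j][0] < first[i][0]:
            added.append(second[j])   # only in the second vector
            j += 1
        else:                         # same element: compare occurrence counts
            element, n1 = first[i]
            n2 = second[j][1]
            if n1 > n2:
                deleted.append((element, n1 - n2))
            elif n2 > n1:
                added.append((element, n2 - n1))
            i += 1
            j += 1
    return added, deleted


old = to_counted_vector(["a", "end", "b", "end"])
new = to_counted_vector(["b", "end"])
print(compare_counted(old, new))
```

Sorting places equal elements next to each other, so every duplicate collapses into a single entry with its count. Twin lines therefore can no longer be matched to the wrong partner. For the example above, the result is `([], [('a', 1), ('end', 1)])`: one `a` and one `end` were removed, and `b` is correctly left out of the differences.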

Advantageously, the above method comprises the further preliminary steps of:

generating a first vector of elements from a first textual document and a second vector of elements from a second textual document;

performing a first comparison of the first vector with the second vector to verify the correspondence of elements of the first vector with elements of the second vector;

verifying whether the first vector and the second vector contain elements repeated within the same vector and, if there are repeated elements:

generating a new first vector and a new second vector;

repeating the above first comparison with said new first vector and said new second vector;

verifying whether the new first vector and the new second vector contain elements repeated within the same vector and, if there are repeated elements:

generating a further new first vector and a further new second vector;

performing a second comparison of the further new first vector with the further new second vector;

writing the results of the above comparison;

wherein the further new first vector is generated by eliminating the repetitions of the repeated elements of said new first vector and noting their occurrences, and the further new second vector is generated by eliminating the repetitions of the repeated elements of said new second vector and noting their occurrences.

Still advantageously, in the method of the present invention, said new first vector and said new second vector are generated by breaking down said first and second textual documents, respectively, into words, each word being combined with its associated articles and prepositions.
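The word decomposition with attached articles and prepositions could look like the following Python sketch. The list of function words is a hypothetical, English-only example, and `words_with_particles` is likewise an illustrative name. The patent does not specify either one.

```python
import re

# Hypothetical, illustrative list of English articles and prepositions; a real
# tool would need one list per natural language (e.g. Italian il, la, di, del, ...).
FUNCTION_WORDS = {"a", "an", "the", "of", "to", "in", "on", "at", "by", "for", "from", "with"}


def words_with_particles(text):
    """Split text into words, attaching each run of articles/prepositions to
    the word that follows it."""
    elements, pending = [], []
    for token in re.findall(r"[\w']+", text):
        if token.lower() in FUNCTION_WORDS:
            pending.append(token)
        else:
            elements.append(" ".join(pending + [token]))
            pending = []
    if pending:  # trailing particles with no following word
        elements.append(" ".join(pending))
    return elements


print(words_with_particles("The method of the invention compares documents."))
```

This prints `['The method', 'of the invention', 'compares', 'documents']`. Attaching the short function words to the word that follows keeps very frequent tokens such as "the" or "of" from becoming standalone duplicate elements.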

Alternatively, said new first vector and said new second vector are generated by breaking down said first and second textual documents, respectively, into lines of source code, each of said lines being associated with the name of the routine in which it is contained.
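The following Python sketch shows one way to tag each source line with the name of its routine. It rests on two assumptions. First, routines are delimited in the Visual Basic style, with a `Sub`/`Function` header and an `End Sub`/`End Function` footer; the patent names no language. Second, lines outside any routine receive the hypothetical tag `<module>`. The function and regex names are illustrative only.

```python
import re

# Assumed Visual Basic-style routine delimiters (case-insensitive).
HEADER = re.compile(r"^(?:(?:Public|Private)\s+)?(?:Sub|Function)\s+(\w+)", re.IGNORECASE)
FOOTER = re.compile(r"^End\s+(?:Sub|Function)\b", re.IGNORECASE)


def tag_with_routine(source, module_tag="<module>"):
    """Return (routine name, line) pairs for every non-blank line."""
    current = module_tag
    vector = []
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            current = match.group(1)    # entering a new routine
        vector.append((current, line))
        if FOOTER.match(line):
            current = module_tag        # leaving the routine
    return vector


src = "Sub Init()\n    x = 0\nEnd Sub\n\nSub Reset()\n    x = 0\nEnd Sub\n"
for element in tag_with_routine(src):
    print(element)
```

The line `x = 0` occurs twice in this source. It becomes two distinct elements, `('Init', 'x = 0')` and `('Reset', 'x = 0')`, so the twin lines no longer collide when the vectors are compared.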

Advantageously, the textual documents compared with the method of the present invention are two versions of the source code of a computer program. The results of the comparison are cross-referenced with databases containing information on the constituent elements of the programming language in which said source code is written, a weight or score being associated with each of said constituent elements. The result of this cross-referencing is a valuation of the differences between said two versions of the source code.
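The following Python sketch illustrates the weighted valuation of differences. The weight table is purely hypothetical, because the patent says only that each constituent element of the language carries a weight or score. Tokens missing from the table, such as variable names, score zero. This follows the distinction between language keywords and user-chosen names that the description draws for software comparison. The differences are assumed to arrive as `(line, occurrences)` pairs.

```python
import re

# Hypothetical weight table: each keyword of the language gets a score.
# Tokens not in the table (variable names and other user-chosen identifiers)
# score 0, so mass renaming does not inflate the valuation.
KEYWORD_WEIGHTS = {"if": 3, "else": 2, "for": 4, "while": 4, "return": 1}


def line_score(line):
    """Sum of the weights of the keywords appearing in one line."""
    return sum(KEYWORD_WEIGHTS.get(tok, 0) for tok in re.findall(r"[A-Za-z_]\w*", line))


def valuate(added, deleted):
    """Valuation of a diff; added/deleted are iterables of (line, occurrences)."""
    changed = list(added) + list(deleted)
    return sum(line_score(line) * count for line, count in changed)


print(valuate([("if x > 0:", 1), ("for i in range(n):", 2)], [("return y", 1)]))
```

This prints `12`: 3 for `if`, 4 × 2 for the two occurrences of the `for` line, and 1 for `return`. A pure rename such as `total = count + 1` to `tot = cnt + 1` scores 0.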

Advantageously, the first vector is generated from the source code of a piece of software, and the second vector is generated as follows:

a) a document is created comprising said source code, each line of said document being paired with a progressive number;

said software is executed by means of a software test kit and, at each execution, a document c) is generated, storing information on the executed lines of the source code and their progressive numbers;

all the generated documents c) are merged into a single document d);

said second vector is generated from said document d).

According to another aspect of the present invention, the above objects are achieved by a computer-readable medium, characterized in that it contains code which, when executed on an electronic computer, gives rise to a process comprising:

the alphanumeric ordering of the elements within said first vector and said second vector;

the elimination of the repetitions of elements that occur several times in a vector, with counting of their occurrences;

the comparison of the resulting first vector with the resulting second vector to verify correspondences with elements of the second vector.

Advantageously, the process generated by the code stored in the medium of the invention comprises the further preliminary steps of: generating a first vector of elements from a first textual document and a second vector of elements from a second textual document,

BRIEF DESCRIPTION OF THE DRAWINGS

These and other features of the invention will become more readily apparent from the following description of a preferred embodiment of the invention, given as a non-limiting example, with reference to the attached figures, in which:

Figure 1 shows a schematic flow chart illustrating in very general terms how the method of the invention operates on different types of text documents;

Figure 2 shows a flow chart illustrating the general operation of a comparison method according to the present invention;

Figure 3 shows a flow chart illustrating in detail the steps of the method of Figure 2.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

With reference to Figure 1, 100 generally indicates a flow chart which illustrates in very general terms the operation of the method of the present invention and its possible fields of application. 101 indicates the line-by-line comparison between two textual documents according to the method of the present invention. Several cases may arise when carrying out the comparison, and they are handled differently.

When the text document is a data document, 103, there are usually no problems, and the line-by-line comparison gives a correct result that can be output immediately.

When the text document is written, for example, in Italian or another language to be read by human beings, 102, it may be difficult to identify the lines to be compared. In this case the method of the present invention uses a particular word-by-word decomposition, 105, which produces an error-free result. As a further development, 107, the method of the invention can be used to identify the most frequent words or terms in the source and target documents, for example to determine the topic of the document.

When the text document is the source code of a computer program, 104, identical lines may be repeated in different parts of the document. In this case, the method of the invention distinguishes identical lines of code by assigning them further attributes, 106, so that the comparison can be performed correctly. Besides producing a correct result, the comparison performed with the method of the invention makes it possible to implement further useful functions. Examples are the insertion of instructions that trace the various routines/functions of the source code, 108, which is particularly useful when testing the program, and functions that automate the quantitative and qualitative evaluation of the changes made to the program with respect to a previous version of it, 109.

The comparison method of the present invention is outlined with reference to the flow chart 200 of Figure 2. After two textual documents to be compared have been selected, 201, the documents are converted into vectors. The method then advances alternately through the two documents, performing a line-by-line comparison, 202. Next, the method attempts to identify the lines deleted from and the lines added to the source document with respect to the target document, 203. If there are no repeated lines in either the source or the target document, 204, the result of the comparison can be produced directly, 206. If, on the other hand, some lines are repeated in the source or target document, 205, there are two possibilities: either the lines are not well defined, 208, or the document is source code in which the same line occurs several times, 209.

In the first case, according to the method of the invention, the document is subdivided into single words, with articles and prepositions merged into the word they accompany, 207. This defines new vectors on which the line-by-line comparison is performed. If no repeated lines are found when performing the comparison, 212, the result is produced, 206. If identical lines are still present in the source or target vector, the lines are sorted, and the repeated lines in each vector are eliminated while their occurrences are counted, 213. The comparison is then performed again between the two new vectors, and the result is produced, 214.

If, instead, the two documents are two versions of the source code of a computer program, the two vectors to be compared are subdivided differently. An additional identifier is applied to each line, representing the name of the routine to which that line belongs, 211. The two new vectors are compared and, if there are no repeated lines, the result is produced, 214. If lines are still repeated in one or the other vector, the vectors are sorted with counting of occurrences as outlined above, 213. The comparison is then performed again, producing the result, 214.
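The flow of Figure 2 can be condensed into a three-pass driver, sketched below in Python. The sketch makes these assumptions:

- The patent does not give its line-by-line routine, so Python's `difflib.SequenceMatcher` is swapped in as a stand-in for it.
- `resplit` is a hypothetical caller-supplied function that performs the second decomposition, either into words with their articles and prepositions or into routine-tagged lines.
- The repetition check is done before each comparison rather than after it, which yields the same final output.

The function names are illustrative.

```python
import difflib
from collections import Counter


def has_repeats(vector):
    """True if some element occurs more than once in the vector."""
    return len(vector) != len(set(vector))


def line_diff(first, second):
    """Stand-in for the line-by-line comparison (difflib, not the patent's routine).
    Returns (added, deleted) as lists of (element, 1)."""
    added, deleted = [], []
    matcher = difflib.SequenceMatcher(a=first, b=second, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.extend((e, 1) for e in first[i1:i2])
        if tag in ("insert", "replace"):
            added.extend((e, 1) for e in second[j1:j2])
    return added, deleted


def counted_diff(first, second):
    """Pass 3: sort, drop repeats, compare occurrence counts."""
    a, b = Counter(first), Counter(second)
    return sorted((b - a).items()), sorted((a - b).items())


def compare_documents(doc1, doc2, resplit):
    """Return (pass used, (added, deleted))."""
    v1, v2 = doc1.splitlines(), doc2.splitlines()   # pass 1: plain text lines
    if not (has_repeats(v1) or has_repeats(v2)):
        return "lines", line_diff(v1, v2)
    v1, v2 = resplit(doc1), resplit(doc2)           # pass 2: words or routine-tagged lines
    if not (has_repeats(v1) or has_repeats(v2)):
        return "resplit", line_diff(v1, v2)
    return "counted", counted_diff(v1, v2)          # pass 3: sorted, with occurrence counts


print(compare_documents("a\nb\nc", "a\nc\nd", str.split))
```

Without repeated lines, the first pass is sufficient, and the call above reports `('lines', ([('d', 1)], [('b', 1)]))`. When repeats survive the second decomposition, the driver falls back to the sorted, counted comparison. That fallback cannot be misled by twins, which is why three passes are the maximum.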

With the method of the invention, a result free from errors due to repeated lines in one of the two documents is obtained by decomposing the documents into vectors for comparison at most three times. The first decomposition is the immediate, conventional one into lines of text. The second decomposition is either into words with their articles and prepositions, or into lines carrying an additional attribute that represents the routine in which they are located. The third pass sorts the lines, eliminates the repeated ones and counts their occurrences. With at most three passes, one can be certain that the comparison between the two documents does not generate false results due to lines repeated in one or the other document.

The comparison method outlined above, whose operation is described in greater detail below, can serve as the basis for many applications and utilities. These can also be grouped into packages according to their field of use. For example, some specific procedures of the comparison method can be used to search for one or more words within a document or folder, and to check programming standards. Other procedures, in particular the sorting procedure, can be used on their own to sort the elements of a document, eliminating repeated lines and counting their occurrences. In this case, the elements with the most occurrences will presumably represent the main topic of the document. By adding a few specific procedures, particular utilities can be created for developers of computer programs. For example, documents can be traced by automatically inserting lines into the code. After almost every line of the source code, a line can be inserted that assigns the text of the last preceding "real" instruction to a suitable variable. The error-handling construct is then made to write to a text document the information relating to the document, the code block and the last instructions. If a very large number of report lines is generated, the "sort" method described below can be used to selectively eliminate the less important information. This tracing technique makes it possible to debug the program quickly and efficiently. Programmers can also benefit from particularly efficient test kits built on the method.
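The tracing utility described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation:

- The traced source holds one statement per line, with no continuation lines and no multi-line strings.
- The dictionary `_TRACE` and the functions `instrument` and `run_traced` are hypothetical names.
- Python's `try`/`except` stands in for the program's error-handling construct.

```python
import io


def instrument(source: str) -> str:
    """Insert, after almost every line, an assignment that stores the text of
    the statement just completed in the (hypothetical) dictionary _TRACE."""
    out = []
    for line in source.splitlines():
        out.append(line)
        stripped = line.strip()
        # Skip blank lines, comments and block headers (def/if/for ... ending in ":"),
        # after which an inserted line would break the indentation.
        if not stripped or stripped.startswith("#") or stripped.endswith(":"):
            continue
        indent = line[: len(line) - len(line.lstrip())]
        out.append(f"{indent}_TRACE['last'] = {stripped!r}")
    return "\n".join(out)


def run_traced(source: str, name: str, report: io.StringIO) -> None:
    """Run the instrumented code; on error, report document name, error and
    the last statement that completed."""
    trace = {"last": None}
    try:
        exec(instrument(source), {"_TRACE": trace})
    except Exception as exc:
        report.write(f"{name}: {exc!r} after {trace['last']!r}\n")


src = "def divide(a, b):\n    q = a / b\n    return q\nx = 10\ny = divide(x, 0)\n"
report = io.StringIO()
run_traced(src, "demo.py", report)
print(report.getvalue())
```

Running the example writes a report line that names the `ZeroDivisionError` raised inside `divide`. The line also records `'x = 10'` as the last statement that completed before the failure.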
Inoltre, per tutti i documenti di una cartella contenenti codice sorgente possono essere analizzati i punti salienti del linguaggio, scelte logiche ed elaborazioni cicliche, riportandole in reports che permettono all’analista di verificare se il programma è conforme alle specifiche e agli standard di programmazione. E possibile inoltre creare strumenti per valutare lo sviluppo di progetti software. Infatti, possono essere analizzate le caratteristiche intrinseche del software quali scelte logiche, elaborazioni cicliche, numero di righe del documento e dei blocchi di codice di documenti di input o di output, etc, indipendentemente dal linguaggio di programmazione del progetto. In questo modo possono essere creati report, ad esempio giornalieri che danno una misura dello sviluppo del progetto. Inoltre, nell’ esecuzione del confronto di codici software ogni riga di codice può essere suddivisa in due parti: da un lato le parole chiave, cioè parole standard del linguaggio utilizzato, la cui modifica comporta un cambiamento delle funzionalità, dall’altra parte i nomi di variabili e di altri elementi scritti a discrezione dell’utente, la cui modifica può essere massiva senza effetti complessivamente rilevanti. In questo modo si ottiene anche una corretta valutazione a prescindere dalle eventuali modifiche massive eseguite sul codice sorgente. Addirittura, utilizzando il metodo dell’invenzione può essere realizzato uno strumento per valutare la completezza dei test del codice sorgente e per l’individuazione di parti di programma non testate o che addirittura non verranno mai raggiunte. Uno strumento come sopra delineato può essere realizzato come segue. Si realizza un documento a) con il codice sorgente del software da testare abbinato ad un numero progressivo che indica la posizione della riga. 
Si realizza un codice sorgente alternativo b) che per ogni riga di programma fa scrivere in un documento, quando il programma viene eseguito, la riga eseguita e il relativo numero progressivo. Il codice sorgente alternativo deve ovviamente aggiungere questa funzionalità di scrittura riga per riga senza alterare la funzionalità del software originale. Si effettuano quindi tutti i test possibili del software con il programma alternativo generando una certa quantità di documenti c). Si esegue l’unione dei suddetti documenti. Si ordina il documento assemblato d) riunendo insieme le righe uguali per numero e contenuto e conteggiando quante volte tali righe sono state trovate. Si confrontano i documenti a) e d) e si ottiene che le righe che compaiono nel documento a) (listato del programma) e non nel documento d) (righe effettivamente eseguite) sono evidentemente righe che il programma non ha elaborato o perché non sono raggiungibili o perché i test non sono completi. On the basis of the comparison method outlined above and whose operating modes will be described in greater detail below, it is possible to create a multiplicity of applications and Utilities that can also be grouped into packages according to the field of use. For example, some specific procedures of the comparison method can be used to search for one or more words within a document or folder and to check programming standards. Other procedures, in particular the sorting procedure, can be used individually to sort the elements of a document with the elimination and counting of the occurrences of repeated rows. In this case, the elements with a greater number of occurrences will presumably constitute the main topic of the document. With simple additions of specific procedures, particular Utilities can be created for the creators of computer programs. For example, documents can be traced by inserting automatically inserted lines in the code. 
After almost every line of the source code, lines of code can be inserted that assign to a suitable variable the text of the last preceding "true" statement. In the error-handling construct, the information relating to the document, the code block and the last statements executed is then written to a text document. If a massive number of report lines is generated, the "sort" method described below can be used to selectively delete the less important information. This tracing methodology can be used to debug the program quickly and efficiently. Again for the benefit of programmers, particularly efficient test kits can be created.

In addition, for all the documents in a folder containing source code, the salient points of the language, logical choices (branches) and cyclic processing (loops) can be analyzed and summarized in reports that allow the analyst to check whether the program complies with the specifications and the programming standards. It is also possible to create tools to evaluate the progress of software projects. The intrinsic characteristics of the software, such as logical choices, cyclic processing, and the number of lines of the document and of the code blocks of input or output documents, can be analyzed regardless of the project's programming language. In this way, reports, for example daily ones, can be created that give a measure of the project's progress. Furthermore, when comparing software code, each line of code can be divided into two parts: on the one hand the keywords, i.e. the standard words of the language used, whose modification changes the functionality; on the other hand the names of variables and other elements chosen at the programmer's discretion, which can be changed massively without significant overall effect. In this way, a correct evaluation is obtained even when massive changes have been made to the source code.
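The split of each line into language keywords and programmer-chosen names can be illustrated with Python source as the example language, using the standard `keyword` module as the keyword database. This is a sketch under assumptions: the tokenizer is deliberately naive, and `is_cosmetic_change` is a hypothetical helper that treats two lines as functionally equivalent when their keyword skeletons match.

```python
import keyword
import re
from typing import List, Tuple

# Naive tokenizer: words inside string literals and comments are also matched.
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def split_line(line: str) -> Tuple[List[str], List[str]]:
    """Return (keywords, programmer-chosen names) appearing in a line of Python code."""
    words = WORD.findall(line)
    keywords = [w for w in words if keyword.iskeyword(w)]
    names = [w for w in words if not keyword.iskeyword(w)]
    return keywords, names


def is_cosmetic_change(old: str, new: str) -> bool:
    # Same keywords in the same order: only names changed, so functionality is
    # presumed unchanged (a heuristic, not a proof of equivalence).
    return split_line(old)[0] == split_line(new)[0]
```

Renaming `item`/`items` to `row`/`rows` in a `for` loop counts as cosmetic, whereas turning `for` into `while` does not.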
Using the method of the invention, it is even possible to build a tool that evaluates how complete the tests of the source code are and identifies parts of the program that have not been tested or that can never be reached. Such a tool can be built as follows. A document a) is created containing the source code of the software under test, each line paired with a progressive number indicating its position. An alternative source code b) is created which, for each program line, writes the executed line and its progressive number to a document whenever the program runs; this alternative code must of course add the line-by-line writing without altering the functionality of the original software. All possible tests of the software are then run with the alternative program, generating a number of documents c), which are merged together. The merged document d) is sorted, grouping together lines that are identical in both number and content and counting how many times each such line was found. Finally, documents a) and d) are compared: the lines that appear in document a) (the program listing) but not in document d) (the lines actually executed) are lines that the program never processed, either because they are unreachable or because the tests are incomplete.
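Steps a), c) and d) and the final comparison can be sketched as below. The instrumented program b) is language-specific and is not shown; each trace c) is assumed to be already available as a list of `(line_number, line_text)` pairs, and the function names are hypothetical. Set membership stands in for the full comparison procedure, which is sufficient here because every entry of a) is unique (it carries its line number).

```python
from collections import Counter
from typing import Iterable, List, Tuple

Entry = Tuple[int, str]  # (progressive line number, line text)


def numbered_listing(source: str) -> List[Entry]:
    # Document a): every source line paired with its progressive number.
    return list(enumerate(source.splitlines(), start=1))


def merge_traces(traces: Iterable[Iterable[Entry]]) -> List[Tuple[Entry, int]]:
    # Union of documents c), sorted into d): identical (number, text) entries
    # are grouped together with the number of times they were executed.
    counts = Counter(entry for trace in traces for entry in trace)
    return sorted(counts.items())


def unexecuted_lines(source: str, traces: Iterable[Iterable[Entry]]) -> List[Entry]:
    # Lines present in a) but absent from d): unreachable or not covered by the tests.
    executed = {entry for entry, _count in merge_traces(traces)}
    return [entry for entry in numbered_listing(source) if entry not in executed]
```

With a single trace that takes the `if` branch of an `if`/`else`, the lines of the `else` branch are reported. Structural lines such as `else:` may never be written by the instrumented program, so a practical tool would probably need to filter them.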

These and other applications can be generated by applying, in whole or in part, the comparison method of the present invention, whose comparison procedures are described in detail below.

With reference to fig. 3, a procedure for comparing two vectors is indicated as a whole by 300. The vectors are obtained by suitably converting the two textual documents to be compared. Through the cyclic procedures described below, elements of the first vector are compared with elements of the second vector, advancing through the two vectors in such a way as to obtain an effective and error-free comparison.

The comparison procedure comprises executing a loop 305 over the first vector, which consists in performing a given procedure on each element of the first vector. That procedure is in turn a loop 310 over the second vector, nested inside the first, in which specific operations are performed on elements of the second vector. Loop 310 runs over the elements of the second vector starting from the one that follows the last element whose equality or non-equality with a corresponding element of the first vector has already been established, up to the last element of the second vector. The operations inside loop 310 include the comparison 315 between an element of the first vector and an element of the second vector, selected according to the progress of loops 305 and 310. If comparison 315 shows that the two elements differ and the last element of the second vector has been compared, the element of the first vector is recorded as deleted (320); otherwise the procedure advances to the next element of the second vector and continues loop 310.
If, on the other hand, comparison 315 shows that the two elements are equal, loop 310 is exited and the index of the element of the second vector that equals the element of the first vector is stored (325). At this point, if no non-matching elements were encountered earlier during loop 310, the compared elements are stored as "equal" (330) and the procedure advances to the next element of the first vector. Otherwise, a Boolean variable X (335), which for clarity can be called the confirmation variable because it confirms that the element corresponding to a given element of the first vector has been found in the second vector, is set to the value indicating non-confirmation.

Once this variable has been set, the procedure runs the comparison loops again, over the first vector and, nested inside it, over the second vector, but now over ranges of the two vectors determined as follows. If the length of the first vector exceeds the portion already explored in the second vector (given by the index of the second vector stored at 325), the end of loop 350 over the first vector is set to the index already reached in the second vector (340); otherwise the loop runs to the last element of the first vector (345). Loop 355 over the second vector instead runs from the element following the last one already confirmed at 330 up to the last element explored, whose index was stored at 325. Within these loops, the selected elements of the two vectors are compared each time (360).
If comparison 360 shows that the two elements differ and the last element of the second vector has been compared, loop 350 over the first vector continues with the next element; otherwise the procedure advances to the next element of the second vector and continues loop 355. If, instead, comparison 360 shows that the two elements are equal, loop 350 is exited and the variable X is set to the confirmation value (365). Finally, once loop 350 has finished, a procedure is executed that depends on the value of X. If X indicates non-confirmation (335), the elements of the second vector traversed by loop 355 are stored/written as "added elements" (370). If instead X indicates confirmation (365), the elements of the first vector, from the one following the last element already determined up to the last one explored, are stored/written as deleted elements (380). The variable X is then reset to the non-confirmation value (380), and the procedure returns to the main loop 305 over the first vector. When the main loop has ended, the elements of the second vector from the one following the last confirmed element up to the last element of the vector are stored/written as inserted elements (385).
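The following Python function is a sketch of one possible reading of procedure 300, not a verified reproduction of fig. 3. Where the text is ambiguous it makes assumptions, each marked in the comments: the lookahead over the first vector (loop 350) starts at the element after the current one, the skipped range of the second vector (loop 355) excludes the element that matched at 325, and after a non-confirmation (370) the current pair is recorded as equal. Elements must be hashable (strings here).

```python
from typing import List, Sequence, Tuple

Edit = Tuple[str, str]  # ("equal" | "deleted" | "added", element)


def compare_vectors(a: Sequence[str], b: Sequence[str]) -> List[Edit]:
    result: List[Edit] = []
    last_b = -1  # index of the last element of b already classified
    i = 0
    while i < len(a):  # main loop 305 over the first vector
        # Loop 310 / comparison 315: find a[i] in b after the last classified element.
        j = next((k for k in range(last_b + 1, len(b)) if b[k] == a[i]), None)
        if j is None:  # 320: end of b reached without a match
            result.append(("deleted", a[i]))
            i += 1
            continue
        if j == last_b + 1:  # 330: match found with no element of b skipped
            result.append(("equal", a[i]))
            last_b = j
            i += 1
            continue
        # 335: b[last_b+1:j] was skipped, X not confirmed. Loops 350/355 with
        # comparison 360 (assumption: window starts at i + 1, excludes b[j]):
        # does a skipped element reappear in a, up to index j or the end (340/345)?
        skipped = b[last_b + 1:j]
        skipped_set = set(skipped)
        end = min(j, len(a) - 1)
        k = next((m for m in range(i + 1, end + 1) if a[m] in skipped_set), None)
        if k is not None:  # 365/380: confirmed, so a[i:k] were deleted
            result.extend(("deleted", x) for x in a[i:k])
            i = k  # return to loop 305 at the element that matched
        else:  # 370: not confirmed, so the skipped elements of b were added
            result.extend(("added", x) for x in skipped)
            result.append(("equal", a[i]))  # assumption: pair recorded as equal
            last_b = j
            i += 1
    # 385: elements of b after the last confirmed one were inserted.
    result.extend(("added", x) for x in b[last_b + 1:])
    return result
```

For example, `compare_vectors(["p", "q", "r"], ["q", "r", "p"])` reports `p` as deleted and later added while keeping `q` and `r` equal. A purely greedy forward search without the second pass would instead match `p` first and report both `q` and `r` as added and then deleted.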

The procedure described above effectively identifies the elements common to the two vectors, any elements deleted from the first vector and any elements added in the second vector. However, it may be inaccurate when at least one of the two vectors contains identical, repeated elements. In this case, as already mentioned, the method of the present invention overcomes the problem by decomposing the two documents to be compared into vectors in a different way. Ultimately, if the problem of repeated lines persists even after the documents have been transformed into vectors with the second decomposition mode, the comparison procedure is performed a third time, after first performing a so-called "sort". According to the method of the invention, the sort consists in transforming the two documents into vectors according to the first or the second decomposition mode, ordering the elements of each vector (for example alphanumerically), and eliminating repetitions of identical elements while recording separately, for each element, how many times it occurs in the vector.
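The sort step can be sketched with Python's `collections.Counter`; the function name is hypothetical. Note that `sorted` orders strings by Unicode code point, which only approximates "alphanumeric order" (for example, uppercase letters sort before lowercase ones).

```python
from collections import Counter
from typing import List, Sequence, Tuple


def sort_and_count(vector: Sequence[str]) -> List[Tuple[str, int]]:
    """Order the elements, drop repeated ones and record how often each occurs."""
    counts = Counter(vector)  # element -> number of occurrences
    return sorted(counts.items())  # keys are unique, so this sorts by element
```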
After this sort operation, a comparison procedure substantially identical to procedure 300 described above is performed again. It differs from procedure 300 in two phases. In phase 320, when an element of the first vector is stored/written as deleted, its number of occurrences is also counted. In phase 330, before the two elements can be stored/written as equal, the procedure checks whether their occurrence counts match: if the element occurs more often in the first vector than in the second, some occurrences were deleted from the first vector; if it occurs less often in the first vector than in the second, some occurrences were added in the second vector; finally, if the counts are equal, the two elements are equal.
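Because the sorted vectors are ordered and free of repeats, the sketch below replaces procedure 300 with a plain merge walk over the two `(element, count)` lists. This is a substitute technique, not the procedure of fig. 3; what it illustrates is the occurrence check of phase 330. Reporting the common part of a mismatched count as "equal" and the surplus as deleted or added is an assumption about how the result would be written.

```python
from typing import List, Sequence, Tuple

Counted = Tuple[str, int]  # (element, number of occurrences)


def compare_sorted_counts(a: Sequence[Counted], b: Sequence[Counted]) -> List[Tuple[str, str, int]]:
    """Compare two sorted, de-duplicated vectors, taking occurrence counts into account."""
    result: List[Tuple[str, str, int]] = []
    i = j = 0
    while i < len(a) and j < len(b):
        (elem_a, n_a), (elem_b, n_b) = a[i], b[j]
        if elem_a == elem_b:  # phase 330: same element, check occurrences
            result.append(("equal", elem_a, min(n_a, n_b)))
            if n_a > n_b:
                result.append(("deleted", elem_a, n_a - n_b))
            elif n_b > n_a:
                result.append(("added", elem_a, n_b - n_a))
            i += 1
            j += 1
        elif elem_a < elem_b:  # phase 320: element only in the first vector
            result.append(("deleted", elem_a, n_a))
            i += 1
        else:  # element only in the second vector
            result.append(("added", elem_b, n_b))
            j += 1
    result.extend(("deleted", e, n) for e, n in a[i:])
    result.extend(("added", e, n) for e, n in b[j:])
    return result
```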

According to a particularly advantageous aspect of the present invention, the comparison according to the procedures described above is performed between source codes of a computer program, for example between two successive versions of the source code under development. For source code, databases can be created in which the fundamental elements of a given programming language are stored and classified. The results of the comparison performed according to the method of the invention can then be cross-referenced with the information in these databases to determine what kind of changes were made to the source code between one version and the next.
For example, the number of logical choices, cyclic processing constructs or other constituent elements of the language that have been modified can be counted, while changes of little importance, even massive ones, such as renaming a variable, can be ignored. Each element of the programming language is also assigned a weight that establishes its relevance. In this way, an actual score of the changes made is obtained.
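A weighted score of the kind described can be sketched as follows. The `WEIGHTS` table is a hypothetical stand-in for the language database, with illustrative values chosen here rather than taken from the source; tokens missing from the table, such as variable names, weigh zero, so mass renaming does not affect the score. The input is assumed to be the lines reported as added or deleted by the comparison.

```python
import re
from typing import Dict, Iterable

# Hypothetical database of language elements and their relevance weights.
WEIGHTS: Dict[str, int] = {"if": 3, "elif": 3, "else": 2, "for": 2, "while": 2, "return": 1}
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def change_score(changed_lines: Iterable[str], weights: Dict[str, int] = WEIGHTS) -> int:
    """Sum the weights of the language elements found in added or deleted lines."""
    return sum(weights.get(word, 0) for line in changed_lines for word in WORD.findall(line))
```

Here a changed conditional line plus a changed `return` line scores 3 + 1 = 4, while a line that only renames variables scores 0.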

Obviously, variations and modifications can be made to the procedures for comparing textual documents, and further applications of them can be devised, while remaining within the scope of protection defined by the following claims.

Claims

1. Method for comparing textual documents, wherein the method comprises: the generation of a first vector of elements from a first textual document and of a second vector of elements from a second textual document; the alphanumeric ordering of the elements within said first vector and said second vector; the elimination of the repetitions of elements that occur several times in the vector, with counting of their occurrences; the comparison of the obtained first vector with the obtained second vector to verify the correspondence of the related elements.

2. Method for comparing textual documents according to claim 1, wherein the method comprises the further preliminary steps of: generation of a first vector of elements from a first textual document and of a second vector of elements from a second textual document; the first comparison of the first vector with the second vector to verify the correspondence of elements of the first vector with elements of the second vector; the verification of whether the first vector and the second vector contain repeated elements within the same vector, and, if there are repeated elements: the generation of a new first vector and a new second vector; the repetition of the aforementioned first comparison with said new first vector and said new second vector; the verification of whether the new first vector and the new second vector contain repeated elements within the same vector, and, if there are repeated elements: the generation of a further new first vector and a further new second vector; a second comparison of the further new first vector with the further new second vector; writing the results of the aforementioned comparison; in which the generation of said further new first vector and further new second vector takes place by eliminating the repetitions of the repeated elements of said new first vector and of said new second vector and noting their occurrences.

3. Method according to the preceding claim, in which said generation of said new first vector and said new second vector takes place by breaking down said first and second textual document, respectively, into words with which the relative articles and prepositions are associated.

4. Method according to claim 2, wherein said generation of said new first vector and said new second vector takes place by breaking down said first and second textual document, respectively, into source code lines, each of said lines being associated with the name of the routine in which it is contained.

5. Method according to one of the preceding claims, in which said textual documents are two versions of the source code of a computer program and in which the results of the comparison method are cross-referenced with databases containing information on constituent elements of the programming language in which said source codes are written, each of said constituent elements being associated with a weight or score, the result of said cross-referencing consisting of an evaluation of the differences between said two versions of source code.

6. Method according to claim 1, wherein said generation of a first vector of textual elements and of a second vector of textual elements takes place with said first vector obtained from the source code of a software program and said second vector obtained as a result of: the creation of a document a) comprising said source code, a progressive number being associated with each line of said document a); the execution of said source code by means of a software test kit and, at each execution, the generation of a document c) in which information relating to the executed lines of the software source code and the related progressive number is stored; the union of all the documents c) generated into a single document d); the generation of said second vector from said document d).

7. Means readable by an electronic computer, said means being characterized in that they contain code which, when executed in an electronic computer, gives rise to a process comprising: the generation of a first vector of elements from a first textual document and of a second vector of elements from a second textual document; the alphanumeric ordering of the elements within said first vector and said second vector; the elimination of repetitions of elements that occur several times in the vector, with counting of their occurrences; the comparison of the obtained first vector with the obtained second vector to verify the correspondence of its elements with elements of the second vector.

8. Means readable by an electronic computer, said means being characterized in that they contain code which, when executed in an electronic computer, gives rise to a process according to the preceding claim, comprising the further preliminary steps of: generation of a first vector of elements from a first textual document and of a second vector of elements from a second textual document; the first comparison of the first vector with the second vector to verify the correspondence of elements of the first vector with elements of the second vector; the verification of whether the first vector and the second vector contain repeated elements within the same vector, and, if there are repeated elements: the generation of a new first vector and a new second vector; the repetition of the aforementioned first comparison with said new first vector and said new second vector; the verification of whether the new first vector and the new second vector contain repeated elements within the same vector, and, if there are repeated elements: the generation of a further new first vector and a further new second vector; a second comparison of the further new first vector with the further new second vector; writing the results of the aforementioned comparison; in which the generation of said further new first vector takes place by eliminating the repetitions of the repeated elements of said new first vector and noting their occurrences, and the generation of said further new second vector takes place by eliminating the repetitions of the repeated elements of said new second vector and noting their occurrences.