WO2021110763A1

WO2021110763A1 - Computer-implemented method for allocating an accounting document to a pair of debtor/creditor accounts and the accounting entry

Info

Publication number: WO2021110763A1
Application number: PCT/EP2020/084308
Authority: WO
Inventors: François BLAYO
Original assignee: Neoinstinct Sa
Priority date: 2019-12-04
Filing date: 2020-12-02
Publication date: 2021-06-10
Also published as: CH716889A2; EP4070265A1

Abstract

The present invention relates to the field of computer-implemented methods for the automatic allocation of accounting documents. More particularly, the present invention concerns a computer-implemented method by which an accounting document is automatically allocated to a pair of debtor/creditor accounts and by which the corresponding accounting entry is automatically made in this pair of accounts. In particular, the present invention relates to a method for automatically allocating accounting documents based on learning by a so-called self-organised map algorithm, which takes past accounting entries into account.

Description

Computer-implemented method for the allocation of an accounting document to a pair of debit / credit accounts and for the accounting entry

Technical field of the invention

The present invention relates to the field of computer-implemented methods for the automatic allocation of accounting documents. More particularly, the present invention relates to a computer-implemented method by which an accounting document is automatically allocated to a pair of debit / credit accounts and by which the corresponding accounting entry is automatically made in this pair of accounts. In particular, the present invention relates to a method for the automatic allocation of accounting documents based on learning by a so-called self-organized map algorithm which takes into account the accounting entries of the past.

State of the art

The production of a balance sheet and a profit and loss account requires the processing of all accounting data such as invoices, bank statements, bank card statements and supporting documents. This processing requires manual work to allocate accounts in accordance with each company's chart of accounts. This manual work is tedious, a source of numerous errors and subject to interpretation. It is often carried out by accounting assistants under the supervision of a chartered accountant.

The keeping of accounts is, in the majority of countries, a legal obligation for any entity carrying out commercial operations. Correctly kept, the accounts represent a real tool of proof making it possible to justify and to follow up various operations. In legal matters, it is a means of proof admissible in court in the event of a dispute between traders. In tax matters, it makes it possible to avoid an automatic taxation procedure calculated on bases recalculated by the administration. Finally, accounting is an essential information tool for the benefit of third parties (managers, partners, employees, administrations and institutions, etc.).

The proper keeping of accounts requires systematically recording information relating to the activity carried out by the entity: keeping documentary evidence for each transaction carried out; classifying the supporting documents in chronological order of payment or collection date; keeping a cash, post or bank book, with daily or weekly updates depending on the volume of business activities; recording each transaction by entering the date, the number of the supporting document, the nature of the transaction (for example purchase, costs, collection, etc.), the corresponding amount and the balance of the day; and finally regularly recording bank movements (for example payments made by debtors or payments for the benefit of creditors).

An entity's chart of accounts is a list of account numbers where the various operations carried out by this entity are recorded. Each account number represents an item allowing the preparation of the balance sheet and the income statement. The chart of accounts is thus made up of account numbers and their descriptions. Account numbers are grouped into a number of account classes. Usually, transaction records are made on the basis of available information (invoices, bank statements, fees, etc.) in paper or electronic form. They are generally carried out manually by an accountant using accounting software which makes it easy to operate the company's chart of accounts. Keeping the accounts of an entity is an action that requires two phases: 1. the creation of the entity's chart of accounts and 2. the regular recording of all transactions. The creation of the chart of accounts is carried out during the creation of the entity and the chart of accounts is not normally modified thereafter. The recording of transactions is carried out continuously throughout the legal existence of the entity. This recording is done for the most part manually because it requires reading and interpretation of the information contained in the accounting documents as well as compliance with the rules issued by third parties (for example tax authorities, social organizations, banks, etc.).

It is important to stress that the vast majority of accounting documents are presented in an unstructured format (e.g. paper, fax, e-mail, office documents), particularly in the case of accounts payable.

When the number of transactions increases (for example for large entities), the processing time lengthens or the number of accountants increases in order to meet the deadlines imposed by law: producing a balance sheet and an income statement at least once a year. In addition, the third parties concerned by the information contained in the balance sheet and the income statement do not have it available until late, at least 3 months after the end of the financial year. Balance sheets and income statements, supposed to be useful for decision-making, are thus only photographs of the past. In the current economic context characterized by an increasingly high speed of change, this delay becomes unacceptable.

The main difficulties encountered by accountants can be summed up in a few points:

- The processing of large volumes of invoices from different suppliers or cost centers, in many different layouts and various delivery formats (mail, fax, email, EDI);

- Errors associated with manual data entry - it is often necessary to enter the same transaction several times, which increases the cost of the transaction;

- The increased complexity of data entry processes if they are linked to different cost centers and / or involve currency conversion;

- Long delays due to manual routing, processing and validation procedures involving people from different departments and possibly different countries - early payment rebates are often lost and late-payment charges may be incurred for outgoing payments;

- Inefficiency due to the time spent gathering physical documents to match invoices to purchase orders, delivery notes, contracts and records in the accounting system;

- The risk of loss or damage to documents, as documents are received at multiple points of entry and may be routed or classified incorrectly;

- Costs associated with the manual routing of documents, shipping, messaging, storage, archiving and auditing;

- The inability to easily control invoice processing and payments to match document flow to the accounting system.

Various measures have been proposed to facilitate the flow of accounting data exchange processes and thus help reduce processing time. Digital storage improves the retrieval capability of this data and increases security during audits and controls.

Nevertheless, the interpretation of the data is still for the most part carried out manually in order to allocate an accounting document to a pair of debit / credit accounts and to make the corresponding accounting entry.

In order to circumvent the problems mentioned above, several solutions are known from the prior art. For example, in US Pat. No. 8,126,785 B2 the accounting data is classified in order to facilitate processing and transaction management. Database-based rules are implemented to classify transaction data into accounting categories. Accounting information is processed according to data-driven rules and classified accordingly. This approach involves the identification of particular database-based rules to be applied to accounting information, rule application and processing. The main difficulty with this solution is to find the rules manually so that you can then apply them automatically.

In US Pat. No. 5,117,356, the automation of the production of account balances is discussed. An automated system for keeping the General Ledger accounts, that is to say all the accounts receivable and the accounts payable of the chart of accounts organized in a "T" tree structure, provides up-to-the-minute balances in all General Ledger accounts whenever data relating to a completed transaction is entered. Compliance with user-definable accounting procedures is ensured by the use of an accounting control table that contains symbolic codes used by document-keeping procedures to authorize and control the creation and updating of accounting files, general ledger account data and transaction records. In this patent, the automatic assignment of accounts from postings is not addressed. This phase remains to be carried out manually.

In patent application US 2005/0055289 A1, an accounting software engine accepts the input of user-defined company-specific textual or digital information and conventional financial accounting data and converts them into multidimensional computer data on a star schema indexed as a journal entry. Upon user request, the journal entry is analyzed using a relational methodology or other database methodology to generate conventional accounting statements such as the General Ledger, Balance Sheet, Cash Flow Statements, Profit and Loss Statements and earned income statements for user-specified time periods.

The multidimensional business information accounting software engine can also generate accounting reports broken down by a range of company-specific digital information, or accounting statements relevant to a company-specific subject or to specific information for a specified period of time, showing the user the sensitivity of the accounting statements to different specific information, so that the user can change their business practices in real time. In this patent application, the main development relates to the way of coding the accounting information. Accounting information entered into an accounting system is typically coded in three places: in one journal and in two general ledger debit / credit accounts when referring to double-entry accounting. In this patent application, the accounting information is no longer coded in the credit and debit accounts; it is only coded in the journal. The latter becomes the sole source of information for the establishment of the accounts. As such, the issue of debit / credit allocation does not arise and is therefore not addressed in this document.

US Patent 6330545 B1 proposes an activity-based information accounting method and system. The activity information accounting system records a table of account titles corresponding to the activity information and performs the accounting procedures based on the entered activity information and an account title corresponding to the entered activity information. The accounting system displays the types of system activities, including buying and acquiring activities, selling and revenue activities, spending activities, investing and financing activities, and production activities. If a user selects one of the activity types displayed, the accounting system displays a screen allowing the user to enter activity information for the selected activity type. The accounting system determines whether the entered activity information relates to an internal activity or an external activity, and executes the accounting procedures on the basis of the determined result and the table of account titles. The accounting method and system allow those who are not trained in the field of accounting to prepare accounting reports by simply entering information on business activity. Accounting reports prepared include balance sheets, income statements, cash flow statements and other accounting reports that provide different measures of value and overall business performance, without having to go through the complex process of journal entries. The disclosed accounting method and system do not provide financial information combined with business information since they only collect information on activities entered by the user.

There is therefore a need for a computer-implemented method which allows, in a safe and rapid manner, the automatic allocation of an accounting document to a pair of debit / credit accounts and to automatically carry out the corresponding accounting entry.

Summary of the invention

An aim of the present invention is therefore to propose a method implemented by computer for the automatic allocation of a pair of debit / credit accounts to a new accounting document and for the accounting entry associated with this accounting document making it possible to overcome the limitations mentioned previously.

According to the invention, these aims are achieved through the objects of the independent claim. More specific aspects of the present invention are set out in the dependent claims as well as in the description.

More specifically, an object of the invention is achieved by virtue of a computer-implemented method for the automatic allocation of an accounting document to a pair of debit / credit accounts of a chart of accounts and for the accounting entry associated with this accounting document, comprising the steps:

a. Extraction of "description" information corresponding to a descriptive text as well as "accounting" information corresponding to the pair of debit / credit accounts for each of a number Z of accounting entries previously recorded in one or more General Ledgers;

b. Transformation of the “description” information from an alphanumeric format into a digital value of dimension K by means of a word vectorization function F for each of the Z extracted accounting entries;

c. Allocation of a vector V of dimension K + M to each of the Z extracted accounting entries, the vector V comprising the K digital values of the "description" information transformed by the function F and M distances between the debit account and the credit account associated with the accounting entry and all the other accounts of the chart of accounts, the distance between two accounts being defined as the number of hops that must be made in the tree of the chart of accounts to join two accounts by the shortest route;

d. Learning of a self-organized map C on the basis of the vectors V, each node of the self-organized map C corresponding to a weight vector P of dimension K + M;

e. Extraction of "description" information and "amount" information from a new accounting document;

f. Transformation of the alphanumeric value of the “description” information of the new accounting document into a digital value of dimension K by means of the word vectorization function F;

g. Allocation of a vector W of dimension K comprising the digital values of the “description” information transformed by the function F;

h. Determination of a BMU vector of the map C as being the weight vector P closest to W;

i. Extraction of the debit account Dmin corresponding to the smallest component among those of the M components of the BMU vector that correspond to the debit accounts;

j. Extraction of the credit account Cmin corresponding to the smallest component among those of the M components of the BMU vector that correspond to the credit accounts;

k. Allocation to the new accounting document of the pair of debit / credit accounts Dmin / Cmin; and

l. Writing the "amount" information extracted from the new accounting document in the debit account and the credit account allocated to the accounting document.
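Steps h. to k. can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the M distance components are laid out as two blocks of equal length (first the distances from the debit account to every account of the chart, then the distances from the credit account), and that W is compared with the first K components of each weight vector P; the text fixes neither point. The function names, the toy weights and the account numbers are hypothetical.

```python
import math

def nearest_node(weights, w, k):
    """Step h: index of the weight vector P whose first k components are closest to w."""
    def dist(p):
        return math.sqrt(sum((p[i] - w[i]) ** 2 for i in range(k)))
    return min(range(len(weights)), key=lambda n: dist(weights[n]))

def decode_accounts(bmu, k, accounts):
    """Steps i-k: pick the debit and credit accounts with the smallest distance component."""
    a = len(accounts)
    debit_part = bmu[k:k + a]           # assumed layout: debit block first
    credit_part = bmu[k + a:k + 2 * a]  # assumed layout: credit block second
    d_min = accounts[min(range(a), key=lambda i: debit_part[i])]
    c_min = accounts[min(range(a), key=lambda i: credit_part[i])]
    return d_min, c_min

# Toy map with two nodes, K = 2 description components and three accounts.
accounts = ["1000", "1020", "6500"]
K = 2
weights = [
    [0.9, 0.1, 4.0, 3.9, 0.2, 2.0, 0.1, 4.0],
    [0.1, 0.8, 0.1, 2.0, 4.1, 4.0, 4.2, 0.3],
]

w_new = [0.8, 0.2]                      # vector W of the new document
bmu_index = nearest_node(weights, w_new, K)
d_min, c_min = decode_accounts(weights[bmu_index], K, accounts)

# Step l: the extracted amount is written to both allocated accounts.
entry = {"debit": d_min, "credit": c_min, "amount": 1250.00}
```

Because the trained components are averages of tree distances, the smallest component of each block points to the account that the neighbouring past entries used most consistently.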

Thanks to the method of the present invention, it is possible to automatically allocate a pair of debit / credit accounts to a new accounting document on the basis of the past accounting entries. In addition, after determining the pair of debit / credit accounts for the new accounting document, the complete accounting entry can be made automatically. This method not only saves time by automating account allocation, but above all avoids errors. In fact, the information contained in the new accounting document is extracted and objectified before being compared with the corresponding information from past accounting entries which have been organized in a self-organized map. Thus, many of the account allocation errors that often occur and which are due to an incorrect subjective interpretation of the information contained in the accounting records can be avoided. The allocation of the new accounting document to a pair of accounts is therefore made on the basis of objective information and not, as in the methods normally used, on subjective information.

It should be noted that the extraction of “description” information from the accounting entries recorded in one or more General ledgers can easily be done by means known to those skilled in the art. In particular, this can be done in an automated manner if the Ledgers of previous accounting years exist in electronic form, for example in accounting software. However, it is important to note that this method can be implemented also if this information exists only on paper. In this case, the information can be entered by hand into data processing software or it can be automatically extracted from scanned copies of Ledgers. Similarly, the "description" and "amount" information of the new accounting document can be entered manually into software implementing the present invention or extracted from scanned copies.

In addition, it is important to also note that this method does not depend on the format of the chart of accounts on which the Ledgers of previous accounting years are based. It is sufficient that it is possible to define the concept of distance between accounts in order to be able to implement this method.

Learning the self-organized map C in step d. can be implemented with any unsupervised learning algorithm known from the prior art, such as, for example, the so-called Kohonen algorithm.

In such an algorithm, the self-organized map C is composed of a grid of low-dimensional neurons. When the grid is one-dimensional, each neuron has two neighbors. When the grid is two-dimensional, the arrangement of neurons is done in a rectangular way where each neuron has 4 neighbors (rectangular topology) or in a hexagonal way where each neuron has 6 neighbors (hexagonal topology). Neurons are recognized by their number and their location on the grid.
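As a small illustration of the rectangular topology, the sketch below lists the grid neighbors of a neuron addressed by (row, column). The function name and the addressing scheme are assumptions made for this example only.

```python
def rect_neighbors(row, col, rows, cols):
    """Return the up/down/left/right neighbors of a neuron on a rows x cols grid."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    # Positions outside the grid are discarded, so border neurons have fewer neighbors.
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]

inner = rect_neighbors(1, 1, rows=3, cols=3)
corner = rect_neighbors(0, 0, rows=3, cols=3)
```

In this sketch only interior neurons have the 4 neighbors mentioned above; a neuron on the border of a non-wrapping grid has 2 or 3.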

The data vectors V are projected from their initial space, or input space, to the map, or output space. Each neuron on the map is associated with a referent vector, also called a prototype or prototype vector, belonging to the input space. By denoting by P the total number of neurons in the map, the referent vector of dimension N of the neuron p is denoted by $w_p$, with $p \in \{1, \dots, P\}$ and $w_p \in \mathbb{R}^N$.

The goal of map training is to update the referent vectors to best approximate the distribution of the input vectors while reproducing the self-organization of the map neurons. The map is trained in sequential mode, also called incremental mode, or in deferred mode (batch).

Each iteration t of sequential learning comprises two steps. The first step is to randomly choose an observation $x(t)$ from the set of observations (the training vectors V) and present it to the network in order to determine its winning neuron. The winning neuron (Best Matching Unit, BMU) of an observation is the neuron whose referent vector is closest to it according to a given distance (e.g. the Euclidean distance). If c is the winning neuron of the vector $x(t)$, c is determined as follows:

$$c = \arg\min_{p \in \{1, \dots, P\}} \lVert x(t) - w_p(t) \rVert$$
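The search for the winning neuron can be sketched in a few lines of Python, assuming the Euclidean distance and referent vectors stored as plain lists (the function and variable names are illustrative only).

```python
import math

def winning_neuron(prototypes, x):
    """Return the index c of the referent vector closest to x (Euclidean distance)."""
    def distance(w):
        return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))
    return min(range(len(prototypes)), key=lambda p: distance(prototypes[p]))

# Three neurons with two-dimensional referent vectors.
prototypes = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
c = winning_neuron(prototypes, [0.9, 1.2])
```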

In the second step, the winning neuron is activated. Its referent vector is updated to move closer to the input vector presented to the network. This update concerns not only the winning neuron, as in competitive learning methods (winner-take-all), but also its neighboring neurons, whose referent vectors are likewise adjusted towards this input vector. The magnitude of this adjustment is determined by the value of a learning step $\alpha(t)$ and the value of a neighborhood function $h(t)$.

The parameter $\alpha(t)$ regulates the speed of learning. It is initialized with a large value and then decreases over the iterations so that learning slows down as it progresses. The function $h(t)$ defines membership in the neighborhood. It depends both on the location of the neurons on the map and on a certain neighborhood radius. In the first iterations, the neighborhood radius is large enough to update a large number of neurons neighboring the winning neuron, but this radius gradually narrows to contain only the winning neuron with its immediate neighbors, or even the winning neuron only. The rule for updating the referent vectors is as follows:

$$w_p(t+1) = w_p(t) + \alpha(t)\, h_{cp}(t)\, \bigl(x(t) - w_p(t)\bigr)$$

where c is the winning neuron of the input vector $x(t)$ presented to the network at iteration t and $h_{cp}(t)$ is the value of the neighborhood function, which defines the proximity between neurons c and p.
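A minimal sketch of this second step is given below. It assumes that the winning neuron c has already been determined and that the neighborhood function is supplied as a callable; the toy neighborhood used here (1 for the winner, 0.5 for adjacent neurons on a one-dimensional grid) is an illustrative assumption, not a choice made in the text.

```python
def update_step(prototypes, x, c, alpha, h):
    """Move every referent vector towards x by alpha * h(c, p) * (x - w_p)."""
    updated = []
    for p, w in enumerate(prototypes):
        gain = alpha * h(c, p)
        updated.append([wi + gain * (xi - wi) for wi, xi in zip(w, x)])
    return updated

def toy_neighborhood(c, p):
    """Hypothetical neighborhood on a one-dimensional grid of neurons."""
    gap = abs(c - p)
    if gap == 0:
        return 1.0
    return 0.5 if gap == 1 else 0.0

prototypes = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
new_prototypes = update_step(prototypes, x=[2.0, 2.0], c=1, alpha=0.5, h=toy_neighborhood)
```

The winner moves halfway towards the observation, while its two neighbors move by a quarter of their respective gaps, which is what pulls neighboring neurons towards similar regions of the input space.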

The simplest neighborhood function between the winning neuron c and a neuron p of the map is equal to 1 if the neuron p is located inside the square centered on the neuron c and 0 otherwise. The radius of this square is called the neighborhood radius. It is wide at the start, then narrows over the iterations to contain only neuron c with its immediate neighbors at the end of learning, or even just neuron c. A more flexible and common neighborhood function is the Gaussian function defined below:

$$h_{cp}(t) = \exp\left(-\frac{\lVert r_c - r_p \rVert^2}{2\sigma^2(t)}\right)$$

where $r_c$ and $r_p$ are respectively the locations of neuron c and neuron p on the map, and $\sigma(t)$ is the radius of the neighborhood at iteration t of the learning process.
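Both neighborhood functions can be sketched as follows. The exponential decay schedule for the radius is an assumption made for the example; the text only states that the radius shrinks over the iterations. All names are illustrative.

```python
import math

def square_neighborhood(r_c, r_p, radius):
    """1 inside the square of the given radius centered on the winner, 0 outside."""
    return 1.0 if max(abs(a - b) for a, b in zip(r_c, r_p)) <= radius else 0.0

def gaussian_neighborhood(r_c, r_p, sigma):
    """Gaussian of the squared grid distance between winner c and neuron p."""
    d2 = sum((a - b) ** 2 for a, b in zip(r_c, r_p))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def decayed(initial, final, t, t_max):
    """Assumed schedule: exponential decay from `initial` (t = 0) to `final` (t = t_max)."""
    return initial * (final / initial) ** (t / t_max)

sigma_start = decayed(5.0, 1.0, t=0, t_max=100)
sigma_end = decayed(5.0, 1.0, t=100, t_max=100)
h_near = gaussian_neighborhood((2, 2), (2, 3), sigma_end)
h_far = gaussian_neighborhood((2, 2), (5, 5), sigma_end)
```

With the Gaussian form the adjustment fades smoothly with grid distance instead of dropping abruptly to zero at the edge of the square.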

With such a neighborhood function, the amplitude of the adjustment is graduated according to the distance from the winning neuron, which reserves the maximum amplitude for itself. The result of this unsupervised learning is the nonlinear projection of all observations on the map. Each observation is attributed to its winning neuron. In addition to the quantization task, this projection preserves the topology of the data through the use of the neighborhood function. Two neighboring neurons on the map will represent nearby observations in the data space.

A variant of learning is said to be “in deferred mode”. In deferred mode, at each iteration t, all the observations are presented to the network and the updating of the prototype vectors is done by taking into account all the observations of the dataset. Each prototype vector is a weighted average of the observation vectors $x_i$, $i \in \{1, \dots, n\}$, when the square of the Euclidean distance is used for the computation of the winning neuron, the corresponding weights being the values of the neighborhood function $h(t)$.

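The rule for updating prototype vectors is given by:

$$w_p(t+1) = \frac{\sum_{i=1}^{n} h_{c_i p}(t)\, x_i}{\sum_{i=1}^{n} h_{c_i p}(t)}$$

where $h_{c_i p}(t)$ is the value of the neighborhood function between the winning neuron $c_i$ of vector $x_i$ and the neuron p.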

The updating of the prototype vectors can be formulated in another way by using the fact that the observations which have the same winning neuron have the same value for the neighborhood function and belong to the Voronoi region whose center is their winning neuron:

$$w_p(t+1) = \frac{\sum_{l=1}^{P} h_{lp}(t)\, n_l\, \bar{x}_l}{\sum_{l=1}^{P} h_{lp}(t)\, n_l}$$

where $n_l$ is the number of observations belonging to the Voronoi region represented by the neuron l and $\bar{x}_l$ is the average of the observations of this same region. Towards the end of the training, when the neighborhood radius becomes small enough to activate only the winning neuron, each prototype vector constitutes the center of gravity of the observations that it represents and the procedure reduces to the moving-centers (k-means) algorithm, which guarantees a better approximation of the density function of the observations. Moreover, since there is no learning step, this algorithm does not present convergence problems. However, the deferred mode could cause twists in large maps. For this reason, a principal component analysis is carried out to initialize the prototype vectors.
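One deferred-mode iteration can be sketched as below, assuming the winning neuron of each observation has already been computed. Keeping the previous prototype when a neuron receives no weight at all is a choice made for this example, and the names are illustrative.

```python
def batch_update(prototypes, observations, winners, h):
    """One deferred-mode iteration: each prototype becomes a neighborhood-weighted mean."""
    dim = len(observations[0])
    updated = []
    for p, old in enumerate(prototypes):
        numerator = [0.0] * dim
        denominator = 0.0
        for x, c in zip(observations, winners):
            weight = h(c, p)            # neighborhood value between winner c and neuron p
            denominator += weight
            for j in range(dim):
                numerator[j] += weight * x[j]
        # Assumption: a neuron that receives no weight keeps its previous prototype.
        if denominator > 0:
            updated.append([v / denominator for v in numerator])
        else:
            updated.append(list(old))
    return updated

def winner_only(c, p):
    """Neighborhood shrunk to the winning neuron alone (end of training)."""
    return 1.0 if c == p else 0.0

prototypes = [[0.0, 0.0], [9.0, 9.0], [5.0, 5.0]]
observations = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
winners = [0, 0, 1]
result = batch_update(prototypes, observations, winners, winner_only)
```

With the winner-only neighborhood each prototype becomes the center of gravity of its own observations, which is the moving-centers behavior described above.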

Advantageously, the self-organized map is a two-dimensional or three-dimensional map. There are several ways to initialize the map C prior to the actual learning procedure. For example, a first initialization method consists in assigning an initial weight vector P to each node of the self-organized map C. This initial allocation of the weight vectors can for example be a random allocation of a number to each scalar component of the weight vectors, without stimulation. The term "random" refers to equal probability for any of a set of possible outcomes. The numerical values of these randomly assigned components can be approximately bounded below and above by the corresponding extrema observed in the training vectors, i.e. the vectors V. Another method of initializing the weight vectors P includes a systematic variation, for example a linear variation, over the range of each dimension of each weight vector so as to approximately cover the corresponding range observed in the training vectors V. In another initialization method, the weight vectors are initialized with the values of vectors ordered along a two-dimensional subspace spanned by the two principal eigenvectors of the training vectors V, obtained by orthogonalization methods well known in the art, for example the so-called Gram-Schmidt orthogonalization. In another initialization procedure, the initial values are set to samples chosen at random from the training vectors V.
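Two of these initialization methods, random values bounded by the observed extrema and random samples of the training vectors, can be sketched as follows. The function names are hypothetical and a seeded generator is used only to make the example reproducible.

```python
import random

def init_uniform(training, n_nodes, rng):
    """Random weights, each component bounded by the extrema observed in the training vectors."""
    dim = len(training[0])
    lows = [min(v[j] for v in training) for j in range(dim)]
    highs = [max(v[j] for v in training) for j in range(dim)]
    return [[rng.uniform(lows[j], highs[j]) for j in range(dim)] for _ in range(n_nodes)]

def init_from_samples(training, n_nodes, rng):
    """Weights copied from training vectors chosen at random (with replacement)."""
    return [list(rng.choice(training)) for _ in range(n_nodes)]

rng = random.Random(0)
training = [[0.0, 5.0, 2.0], [1.0, 3.0, 6.0], [0.5, 4.0, 4.0]]
weights_uniform = init_uniform(training, n_nodes=4, rng=rng)
weights_sampled = init_from_samples(training, n_nodes=4, rng=rng)
```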

The determination of the BMU vector of the self-organized map C can be carried out according to several criteria well known to those skilled in the art. This can for example be done on the basis of a distance, for example the minimum Euclidean distance between all the weight vectors P of the self-organized map C and the vector W. Other methods can also be implemented for the determination of the BMU, such as: the correlation between vectors, which has the advantage of offering more robustness to an offset between vectors; the angular difference between vectors, which offers the advantage of emphasizing the mutual length of the vectors as long as the information is carried by these quantities; or the Minkowski distance measure, which is a generalization of the Euclidean distance measure and which is advantageous when the vectors carry data of a qualitative nature.
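The alternative criteria listed above can be sketched in plain Python. The function names are illustrative; note that a BMU search would minimize the Minkowski distance but maximize the cosine or correlation values.

```python
import math

def minkowski(u, v, order):
    """Minkowski distance; order 2 gives the Euclidean distance, order 1 the city-block distance."""
    return sum(abs(a - b) ** order for a, b in zip(u, v)) ** (1.0 / order)

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def correlation(u, v):
    """Pearson correlation: cosine similarity after removing each vector's mean (offset)."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_similarity([a - mean_u for a in u], [b - mean_v for b in v])

euclid = minkowski([0.0, 0.0], [3.0, 4.0], order=2)
city_block = minkowski([0.0, 0.0], [3.0, 4.0], order=1)
shifted = correlation([1.0, 2.0, 3.0], [11.0, 12.0, 13.0])
```

The last line illustrates the robustness to an offset: two vectors that differ only by a constant shift remain perfectly correlated.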

In a preferred embodiment of the present invention, the distance between two accounts is determined by the number of hops that must be made in the chart of accounts tree to join two accounts by the shortest route. This makes it possible to define a measurable distance between two accounts and to determine it easily and quickly, regardless of the exact organization of the underlying chart of accounts.
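The hop count can be sketched as follows, assuming the chart of accounts is stored as a child-to-parent mapping. The small tree, the account numbers and the layout of the distance components (debit block followed by credit block, each including the account itself at distance 0) are hypothetical choices made for the example.

```python
def hops(parent, a, b):
    """Number of hops on the shortest route between accounts a and b in the tree."""
    depth_from_a = {}
    node, d = a, 0
    while node is not None:             # record every ancestor of a with its distance
        depth_from_a[node] = d
        node = parent.get(node)
        d += 1
    node, d = b, 0
    while node is not None:             # climb from b until a common ancestor is met
        if node in depth_from_a:
            return depth_from_a[node] + d
        node = parent.get(node)
        d += 1
    raise ValueError("accounts are not in the same tree")

def distance_components(parent, accounts, debit, credit):
    """Assumed layout: distances from the debit account, then from the credit account."""
    return ([hops(parent, debit, acc) for acc in accounts]
            + [hops(parent, credit, acc) for acc in accounts])

# Hypothetical chart: root -> classes "1" and "6" -> groups "10" and "65" -> accounts.
parent = {"1": "chart", "6": "chart", "10": "1", "65": "6",
          "1000": "10", "1020": "10", "6500": "65"}
accounts = ["1000", "1020", "6500"]
m_part = distance_components(parent, accounts, debit="6500", credit="1020")
```

Two accounts in the same group are 2 hops apart, while accounts in different classes must travel up to the root and back down.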

In another preferred embodiment of the present invention, determining the BMU vector in step h. is performed on the basis of a Euclidean distance measurement between the weight vectors P and the vector W. The Euclidean distance between vectors is a measurement that can be determined very quickly regardless of the dimension of the self-organized map C, which allows rapid implementation of this method and therefore also rapid allocation of the new accounting document. In addition, determining a Euclidean distance between two vectors requires only modest computational resources. It can therefore be done on ordinary desktop computers.

In a following preferred embodiment of the present invention, the self-organized map C is updated after each new accounting entry, after every tenth new accounting entry or after every hundredth new accounting entry. This allows the self-organized map C, and therefore the allocation of accounts for new accounting documents, to improve on the basis of the new accounting documents which have already been allocated. The present method therefore becomes “self-learning”: the more accounting documents it processes, the more precise it becomes.

In another preferred embodiment of the present invention, step b. is carried out using a so-called hash function F. This function is used to manage categorical variables, that is to say variables which do not have a natural numerical representation. The hash function offers a solution to convert categorical variables into numeric variables. There are several ways to perform such a conversion:

- “Label encoding”, where an arbitrary number is chosen for each category;

- "1 among N" encoding, where a binary column is created per category;

- the “Hashing trick”, where a small-dimensional subspace that corresponds to the data is found;

- "Optimal binning", when relying on learners such as LightGBM or CatBoost;

- "Target coding", where the average of the target value is calculated per category.

Each of these methods has its advantages and disadvantages, and the choice usually depends on the data and the needs. If a variable has many categories, a "1 among N" coding scheme will produce many columns, which can cause memory problems. The "Hashing trick" is an effective solution but requires the adjustment of several parameters.

Thanks to a hash method, it is easy to transform alphanumeric information into digital information. In addition, the specific hash function can be chosen in relation to the amount of data included in the past accounting entries. If the number of words corresponding to the “description” information is small, it is sufficient for the hash function to transform these words into a numerical value with a small number of digits, for example 128 or 256 digits. On the other hand, if the number of words is large, it is preferable that the hash function vectorizes the words into numerical values with at least 512, 1024 or 2048 digits.
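The hashing trick can be sketched with the Python standard library as a stand-in for the function F. The choice of SHA-256, the whitespace tokenization, the signed accumulation and the function name are all assumptions made for this example; the text does not specify them.

```python
import hashlib

def hash_vectorize(description, k):
    """Map a description to a vector of dimension k with the hashing trick."""
    vector = [0.0] * k
    for token in description.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % k   # bucket chosen by the hash
        sign = 1.0 if digest[4] % 2 == 0 else -1.0      # sign spreads out collisions
        vector[index] += sign
    return vector

v_small = hash_vectorize("Office rent March", k=128)
v_large = hash_vectorize("Office rent March", k=2048)
```

A larger k lowers the chance that two different words fall into the same bucket, which matches the recommendation to use more components when the vocabulary of descriptions is large.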

In a following preferred embodiment of the present invention, the “description” and “amount” information is extracted from a scanned copy of the new accounting document. This makes it possible to achieve a fully automated allocation of the new accounting document.

Brief description of the drawings

The peculiarities and advantages of the present invention will appear in more detail in the context of the description which follows with an exemplary embodiment given by way of illustration and not by way of limitation with reference to the eleven appended drawings which represent:

- Figure 1 shows a functional diagram of a method according to an embodiment of the present invention;

- Figure 2 shows a functional diagram of an exemplary implementation of the step of extracting past accounting entries;

- Figure 3 shows a diagram of a transaction processor used in the embodiment of the present invention;

- Figure 4 shows an example of the "Balance sheet" part of a chart of accounts;

- Figures 5a and 5b show an example of the "Income statement" part of a chart of accounts;

- Figure 6 illustrates the distance between two accounts;

- Figure 7 illustrates the learning step of the self-organized map C;

- Figure 8 shows a functional diagram of the adaptation of the self-organized map;

- Figure 9 illustrates the two subgroups of components of the weight vectors P of the self-organized map;

- Figure 10 illustrates the determination of the BMU vector corresponding to the new accounting document; and

- Figure 11 illustrates the determination of the pair of debit / credit accounts on the basis of the BMU vector.

Detailed description of an embodiment

The invention presented here allows debit and credit accounts to be allocated from data extracted from accounting documents, and in particular from the descriptions associated with each document. This approach requires modeling the allocations made by the accountant during the fiscal years preceding the current fiscal year. The principle of the invention is therefore based on the observation that the accounting entries made by an entity are repeated from year to year. Each entry linked to an accounting document causes the movement of two accounts: the debit account and the credit account, that is, a pair of debit / credit accounts. This is called allocating an accounting document to a pair of accounts.

Each allocation results from the analysis, by the accountant, of the so-called "accounting" information contained in the accounting documents (customer, supplier, individual amount per item, total amount, VAT, VAT rate, etc.) but also of “contextual” elements relating to the accounting document (descriptions, seller, comments, etc.). All of this information constitutes a multidimensional contextual space in which each document can be represented by a point or a vector.

The present invention is based on the idea that it is possible to map the allocations of debit / credit account pairs made in the past and to use this map to automatically determine the allocation of debit / credit account pairs for new accounting documents. As will be explained in detail below, the mapping relies on a non-linear projection of points from the multidimensional contextual space onto a two-dimensional map.

Figure 1 shows an embodiment of the present invention. The computer-implemented method for the automatic allocation of an accounting document to a pair of debit / credit accounts according to this embodiment comprises the following steps:

110: Extraction of accounting entries made in the past;

120: Conversion of each accounting entry made into a vector V of a contextual space;

130: Learning of a self-organized map C on the basis of vectors V of past accounting entries;

140: Extraction of the "description" and "amount" information from a new accounting document;

150: Conversion of the "description" information of the new accounting document into a vector W;

160: Determination of the BMU vector of the self-organized map closest to the vector W;

170: Extraction, on the basis of the components of the BMU vector, of the closest debit and credit accounts;

180: Allocation to the new document of the pair of debit / credit accounts extracted in the previous step;

190: Writing of the “amount” information of the new accounting document in the allocated debit and credit accounts.

It is obvious that these steps can be implemented in a different order. It is for example quite possible to carry out the learning of the self-organized map C on the basis of the past accounting entries before extracting the information from the new accounting document.

An exemplary embodiment of step 110 of FIG. 1, that is to say the extraction of accounting entries made in the past, is illustrated in FIG. 2. This extraction can be done for example on the basis of the information contained in the entity's accounting software. The accounting entries to be extracted are for example contained in the General Ledger of the accounts for previous years. As illustrated in FIG. 2, the accounting entries 220 are read and analyzed by a transaction processor 230. The latter searches the accounting entries for the date of the entry, the debit account number assigned to the entry, the credit account number assigned to the entry, the description of the entry and the amount. The transaction processor 230 writes the data resulting from its analysis to a database 240 which can be structured as shown in Table 1. At the end of the extraction of the accounting data from the previous years, the database 240 contains all the data corresponding to the entries for the years chosen and taken from the accounting software.

Of course, it is possible to implement step 110 in another manner known to a person skilled in the art. It is for example possible to enter the past entries manually into the database 240. It is thus possible to implement the present invention even if the accounting of the entity has hitherto been carried out on paper. It would also be possible to scan the paper accounts and extract from the scanned documents the data necessary for the establishment of the database 240.

Table 1: example of accounting data extracted from past accounting entries
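By way of illustration only, the five fields that the transaction processor 230 is said to extract (date, debit account, credit account, description, amount) can be sketched as a small record type. The CSV layout, the column names, the function `read_ledger` and the sample rows are assumptions made for this sketch; they are not taken from Table 1 or from any particular accounting software.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class LedgerEntry:
    """One past accounting entry with the five fields named in the text."""
    entry_date: date
    debit_account: str
    credit_account: str
    description: str
    amount: Decimal


def read_ledger(csv_text: str) -> list[LedgerEntry]:
    """Parse a hypothetical CSV export of the General Ledger."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        LedgerEntry(
            entry_date=date.fromisoformat(row["date"]),
            debit_account=row["debit"],
            credit_account=row["credit"],
            description=row["description"].strip(),
            amount=Decimal(row["amount"]),
        )
        for row in reader
    ]


# Illustrative rows only; they are not the content of Table 1.
SAMPLE = """date,debit,credit,description,amount
2019-03-14,6281,1022,Freight invoice March,120.50
2019-03-20,6200,1020,Fuel company car,80.00
"""

entries = read_ledger(SAMPLE)
```

Amounts are kept as `Decimal` rather than `float` in this sketch so that monetary values are not subject to binary rounding.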

Accounting data extracted from entries made in the past must be converted so that they can be processed by a further system. This operation is carried out for example by the transaction processor 230 shown in FIG. 3. As illustrated in this figure, the transaction processor 230 is advantageously composed of two mechanisms which perform the conversion of the account numbers into distances between accounts (step 231) and the conversion of the words contained in the “description” field into numerical values by a hash function (step 232).

The analysis of the professional practice of accountants has highlighted the fact that they allocate an accounting document to a pair of debit / credit accounts on the basis of contextual information contained in the accounting document. This information is found in the entries that were made in the past, specifically in the account allocations as well as in the “description of the operation” field normally filled in by the accountant.

The allocation of an account carried out by the accountant is a numerical datum which designates an account in a chart of accounts. Referring for example to the standard Swiss chart of accounts, the account “6200” refers to vehicle and transport charges. This number has no meaning in itself. It actually designates a node of a tree whose root is the chart of accounts and which is divided into two branches: the balance sheet and the income statement or cash flow. The chart of accounts can be represented in the form of a tree as shown in Figures 4, 5a and 5b.

The concept of distance between the accounts of the chart of accounts is illustrated in figure 6. The distance between two accounts of the chart of accounts is defined as the number of jumps that must be made in the tree to join these accounts by the shortest path. For example, the distance between the accounts "1022-Bank 2" and "6281-Transport costs" is 10 because 10 jumps are needed to follow the shortest path between these accounts in the tree of the chart of accounts. For each General Ledger account, it is thus possible to determine all the distances that separate it from the other accounts in the chart of accounts tree. For example, the distances which separate the account “6281-Transport costs” from all the other accounts in the chart of accounts are reported in table 2.

Table 2: example of an extract of the distances between an account in the chart of accounts and all the other accounts

The set of distances separating all the general ledger accounts of an entity can be evaluated once and stored in a database in the form of a table with N x N entries, N corresponding to the number of "leaves" of the chart of accounts tree. It should be noted that only the leaves of the chart of accounts are useful for the construction of this table. The other nodes of the tree are only used to calculate the shortest path between the pairs of leaves. In the context of a concrete example of an implementation of the present invention on the basis of the standard Swiss chart of accounts, the table of distances between accounts therefore comprises 229 x 229 entries.
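The distance defined above (number of jumps on the shortest path in the tree) and the N x N table can be sketched as follows. The miniature tree `PARENT` and its intermediate node names are hypothetical; it is far shallower than the standard Swiss chart of accounts, so the distances it yields do not match the value of 10 quoted in the text.

```python
# Hypothetical miniature chart of accounts, stored as child -> parent.
# Node names are illustrative, not the standard Swiss chart of accounts.
PARENT = {
    "balance_sheet": "chart",
    "income_statement": "chart",
    "assets": "balance_sheet",
    "1020-Bank 1": "assets",
    "1022-Bank 2": "assets",
    "expenses": "income_statement",
    "6200-Vehicle": "expenses",
    "6281-Transport": "expenses",
}


def path_to_root(node: str) -> list[str]:
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def tree_distance(a: str, b: str) -> int:
    """Number of jumps on the shortest path between two nodes of the tree."""
    jumps_from_a = {n: i for i, n in enumerate(path_to_root(a))}
    for jumps_from_b, n in enumerate(path_to_root(b)):
        if n in jumps_from_a:  # first common ancestor on the way up from b
            return jumps_from_a[n] + jumps_from_b
    raise ValueError("nodes are not in the same tree")


# Leaves are the nodes that never appear as a parent.
LEAVES = sorted(set(PARENT) - set(PARENT.values()))

# N x N table of distances between leaves, computed once.
DIST = {(a, b): tree_distance(a, b) for a in LEAVES for b in LEAVES}
```

In a tree the shortest path between two nodes always passes through their lowest common ancestor, which is why walking up from both nodes is sufficient and no general graph search is needed.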

The transaction description data, which has been extracted from past accounting entries, contains information that can be used to model the distribution of transactions. “Distribution of transactions” is understood to mean the statistical distribution of transactions with regard to accounting codes. Each operation is a point in a multidimensional space and the set of operations is thus a point cloud in this space constituting a statistical distribution.

To this end, the “description” data item for each past entry extracted is converted into a contextual space. The contextual space is composed of all the words of a vocabulary extracted from the available descriptions. The principle consists in representing each description of an operation as a vector in a Euclidean space of dimension t where each dimension corresponds to a word of the vocabulary. By way of example, part of the vocabulary extracted in the “description of the operation” field from the accounting histories is illustrated in table 3.

Table 3: extract of the vocabulary from the "description" information of past accounting entries

In order to process these words as numerical quantities usable by the learning algorithm of the present invention, the alphanumeric values of these words are transformed by a method known as word “vectorization”. This transformation can for example be carried out by means of a hash function F. This technique is known as “hashing vectorization” or the “hashing trick”. The hash function F takes as input all the words of the vocabulary resulting from the descriptions associated with all the entries and transforms them into numerical values. In the context of a concrete example of implementation of the present invention, the number of different numerical values used to represent all the available descriptions is fixed at 1024. This choice fixes the length of the vector, which thus becomes independent of the number of words used for the description of each entry.

At the end of the process of restating the accounting data for previous years, each accounting operation can therefore be represented by 1482 values (1024 description values plus 2 x 229 distances between accounts) organized in a vector V according to the structure illustrated in Table 4. These data are stored in the database 240 represented in figure 2. The database will therefore contain Z rows of 1482 values, where Z is the number of entries available for year N.
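A minimal sketch of the hashing vectorization described above is given below. The text does not specify the hash function F, the tokenization or the value stored per word; the use of SHA-256, lower-casing, splitting on word characters and storing word counts are all assumptions of this sketch, and `hash_vectorize` is a hypothetical name.

```python
import hashlib
import re

K = 1024  # length of the description vector in the concrete example


def hash_vectorize(description: str, k: int = K) -> list[float]:
    """Map a free-text description to a fixed-length vector of word counts."""
    vec = [0.0] * k
    # Assumed tokenization: lower-case, split on runs of word characters.
    for word in re.findall(r"\w+", description.lower()):
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % k  # bucket for this word
        vec[index] += 1.0
    return vec


v1 = hash_vectorize("Fuel company car")
v2 = hash_vectorize("fuel FUEL")
```

A cryptographic digest is used here instead of Python's built-in `hash()` because the latter is salted per process for strings by default, so the same word would not map to the same index across runs. The vector length stays at K whatever the number of words in the description, at the cost of possible collisions between different words.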

Table 4: example structure of accounting data after restatement
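The arithmetic behind the 1482 values can be made concrete with a short sketch. The order of the three blocks inside V (description, then credit distances, then debit distances) is an assumption of this sketch, since Table 4 is not reproduced here; `build_entry_vector` and the placeholder rows are hypothetical.

```python
K = 1024  # description components (concrete example in the text)
N = 229   # leaf accounts of the chart (concrete example in the text)


def build_entry_vector(desc_vec, credit_distances, debit_distances):
    """Concatenate the three blocks of one restated entry into a vector V."""
    if len(desc_vec) != K:
        raise ValueError("description block must have K components")
    if len(credit_distances) != N or len(debit_distances) != N:
        raise ValueError("each distance block must have N components")
    # Assumed block order: description, credit distances, debit distances.
    return list(desc_vec) + list(credit_distances) + list(debit_distances)


# Placeholder inputs: an all-zero description vector and constant distance
# rows with a single 0 at the (hypothetical) index of the booked account.
credit_row = [4.0] * N
credit_row[10] = 0.0
debit_row = [6.0] * N
debit_row[200] = 0.0

v = build_entry_vector([0.0] * K, credit_row, debit_row)
```

Each distance block is the row of the N x N distance table for the booked account, which is why exactly one component per block equals zero before learning.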

As illustrated in FIG. 1, a mapping of the classes of the accounting entries is established in step 130 from the accounting entries made in the past. The data stored in the database 240, organized for example according to Table 4, are used to feed an automatic classification system for accounting entries which is based on an algorithm known as “self-organizing maps (SOM)” or “self-organized map” (see figure 7).

As used in the context of the present invention, the self-organized map C (250 in Figure 7) refers to a technique of grouping and of representation of the result, a technique which groups data into classes in such a way that similar data is usually grouped together in the same class while dissimilar data is not.

The terms "nearer", "nearest", "near" and terms of similar import, in this context, refer to literal proximity in a self-organizing map. Minor variations in the positioning of data in the self-organizing map can be tolerated without departing from the underlying description of the self-organizing map as provided herein, in the references cited herein and in the state of the art. The self-organized map mentioned above is a neural network model capable of projecting high-dimensional input data (i.e., multivariate data vectors) onto a lower-dimensional array, usually a two-dimensional one. This projection produces a lower-dimensional representation which is useful for detecting and analyzing the characteristics of the higher-dimensional input space. The term "dimension" in the context of a multivariate data vector refers to the length of the data vector, so each of its multiple variables describes a single dimension. For example, a dimension can refer to a distance between accounts, possibly normalized. The term "dimensional" in the context of a representation (e.g., a visual representation) refers to the one-, two- or three-dimensional presentations generally used to provide information to a human.

In the context of the present invention, the self-organized map C makes it possible to organize the multidimensional data of the database 240, that is to say the Z past entries in the form of vectors V with their 1482 components, and to represent these entries in the form of a two-dimensional map in which “similar” entries are grouped together in classes which are represented in the self-organized map C by the nodes of this map.

The construction of the self-organized map by means of the “self-organizing map” algorithm can for example follow the steps described in FIG. 8. Initially, an initial weight vector P is assigned to each node of the self-organized map (step 810). Many methods of initial allocation of the weight vectors P are known to those skilled in the art, including the random allocation of a number to each scalar component of the weight vectors, without stimulation. The term "random" refers to equal probability for any of a set of possible outcomes. The numerical values of these randomly assigned scalar components can be approximately limited, at the lower and upper bounds, by the corresponding extrema observed in the training vectors, here the vectors V. Another method of initializing the weight vectors includes a systematic (e.g. linear) variation over the range of each dimension of each weight vector P so as to approximately cover the corresponding range observed in the training vectors. In another initialization method, the weight vectors P are initialized with the values of vectors ordered along a two-dimensional subspace spanned by the two principal eigenvectors of the training vectors, obtained by orthogonalization methods well known in the art (for example, Gram-Schmidt orthogonalization). In another initialization procedure, the initial values are set to samples chosen at random from the training base.

In the context of the concrete example of the implementation of the present invention, a weight vector P of dimension 1482 is assigned to each node of the self-organized map during step 810. The values of its 1482 components can initially be assigned in different ways, preferably randomly.
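Two of the initialization options listed above (random values bounded by the extrema of the training vectors, and random samples from the training base) can be sketched as follows. The function names and the three-component toy vectors are hypothetical; real weight vectors would have 1482 components in the concrete example.

```python
import random


def init_random_in_range(training, n_nodes, rng):
    """Draw each component uniformly between the extrema seen in training."""
    dim = len(training[0])
    lows = [min(v[d] for v in training) for d in range(dim)]
    highs = [max(v[d] for v in training) for d in range(dim)]
    return [
        [rng.uniform(lows[d], highs[d]) for d in range(dim)]
        for _ in range(n_nodes)
    ]


def init_from_samples(training, n_nodes, rng):
    """Copy randomly chosen training vectors as initial weight vectors."""
    return [list(rng.choice(training)) for _ in range(n_nodes)]


rng = random.Random(0)  # fixed seed so that the sketch is reproducible
training = [[0.0, 2.0, 5.0], [1.0, 4.0, 5.0], [0.5, 3.0, 7.0]]
weights_a = init_random_in_range(training, n_nodes=4, rng=rng)
weights_b = init_from_samples(training, n_nodes=4, rng=rng)
```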

In step 820, a training vector V, that is to say one of the rows of the database 240, is selected. The selection can be random or systematic, preferably random. When a training vector is selected, the Euclidean distance between the selected training vector and each weight vector P associated with the nodes of the self-organized map C is calculated.

In step 830, the weight vector P, and therefore the corresponding node of the self-organized map C, having the smallest Euclidean distance is defined as “the unit which corresponds best” (Best Matching Unit or BMU). Once a corresponding BMU vector is identified, the neighborhood of this BMU vector, i.e. the values assigned to neighboring nodes, is optionally scaled (step 840) by methods well known in the art.

In step 850, it is decided whether to repeat steps 820-840 or to terminate the construction of the self-organized map C. This decision is based on whether or not a predefined convergence criterion is met. The term "convergence criterion", in the context of the construction of the self-organized map C, refers to any of a variety of metrics available to those skilled in the art. These criteria include for example an absolute iteration limit (e.g. 100, 200, 500, 1000, 2000, 5000 or even more), a change in the Euclidean distance between the chosen training vector V and each weight vector P of the map C (e.g. 100, 10, 1, 0.1, 0.01, 0.001 or even less), a relative change in the Euclidean distance between the chosen training vector V and each weight vector P of the map C (e.g. 10%, 1%, 0.1%, 0.01% or even less), or any of these criteria coupled with a requirement for a minimum number of selections of training vectors V (e.g. 1, 2, 3, 4, 5, 10, 20, 50, 100 or more). Once convergence is reached, the learning procedure for the self-organized map C ends (step 860).
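The loop formed by steps 810 to 860 can be sketched as a small, dependency-free training routine. The text leaves the neighborhood adaptation of step 840 to methods known in the art; the Gaussian neighborhood, the exponential decay of the learning rate and radius, the parameter values and the function name `train_som` are assumptions of this sketch, and only the simplest convergence criterion (an absolute iteration limit) is shown.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance (same ordering as the Euclidean distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(training, rows, cols, max_iter=500, lr0=0.5, seed=0):
    """Train a rows x cols self-organizing map; returns {(row, col): weights}."""
    rng = random.Random(seed)
    # Step 810: initialise each node with a randomly chosen training vector.
    weights = {
        (r, c): list(rng.choice(training))
        for r in range(rows) for c in range(cols)
    }
    radius0 = max(rows, cols) / 2.0
    for t in range(max_iter):  # step 850: absolute iteration limit
        # Step 820: select a training vector at random.
        v = rng.choice(training)
        # Step 830: the BMU is the node whose weight vector is closest to v.
        bmu = min(weights, key=lambda node: squared_distance(weights[node], v))
        # Step 840: pull the BMU and its grid neighbours towards v
        # (assumed Gaussian neighbourhood with decaying rate and radius).
        decay = math.exp(-t / max_iter)
        lr = lr0 * decay
        radius = max(radius0 * decay, 0.5)
        for node, w in weights.items():
            grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            influence = math.exp(-grid_d2 / (2.0 * radius ** 2))
            for d in range(len(w)):
                w[d] += lr * influence * (v[d] - w[d])
    return weights  # step 860: end of the learning procedure


# Two well separated toy clusters in two dimensions.
training = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
            [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
som = train_som(training, rows=3, cols=3)
```

Because every update moves a weight vector a fraction of the way towards a training vector, the weights remain inside the range spanned by the training data throughout the run.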

Once the learning of the self-organized map C has been carried out on the basis of the database 240 which contains the accounting entries of the past, this map C can be used to automatically allocate a pair of debit / credit accounts to a new accounting document. Indeed, at the end of the learning phase of the map C, each weight vector P, that is to say each node of the self-organized map C, codes a pair of a debit account and a credit account and their distances from all other accounts in the chart of accounts. In the context of the concrete example of the implementation of the present invention, each node of the self-organized map C codes 1024 values corresponding to a hash vector of the description data as well as 2 x 229 values corresponding to the distances between accounts. Each node can therefore be represented by a vector of dimension 1482. From a mathematical point of view, if the distance measure used to organize the self-organized map C is a Euclidean distance, each weight vector P associated with a node of the map C groups together all the values located in its Voronoi cell.

During the phase of use of the self-organized map C, the weight vectors P associated with the nodes are no longer modified. The use of the map for the automatic allocation of a new accounting document consists, on the basis of the “description” information extracted from this document in step 140 (see FIG. 1), in determining the pair of debit / credit accounts that should be associated with it based on what was previously assigned by the accountant. This step corresponds to steps 160 and 170 of FIG. 1. FIG. 1 details the steps 150 to 180 required to obtain the class, that is to say the corresponding node of the self-organized map C, of a new accounting document.

For a new accounting document, the “description” information can come from the accounting document itself if it is identifiable as such in a digital file, from metadata associated with the accounting document, from an additional statement added to the accounting document by a human operator, from an automatic analysis system capable of extracting information from the accounting document, and in general from any system capable of giving contextual information, coded in digital form, relating to the accounting document and able to help in its interpretation. The “description” information in alphanumeric form extracted from the accounting document is then transformed into numerical values by the same hash function F which was used to create the database 240. By this means, a vector of numerical values W is created for each new accounting document. In the context of the concrete example of implementation of the present invention, for each new accounting document, a vector W of dimension 1024 is created on the basis of the “description” information extracted from this document.

The description of the new accounting document, transformed into a vector W of numerical values, is provided as input to the step in which the self-organized map C is used to obtain the class (the node of the map) for the new accounting document. The values of each node of the self-organized map correspond to two subsets of information: those which correspond to a description and those which correspond to the two sets of distances associated with the description, as shown in figure 9. In other words, the weight vector associated with each node of the map can be subdivided into a first subset of “coordinates” which represent the “description” information and a second subset which corresponds to the distances between accounts. In the context of the concrete example of the implementation of the present invention, the first subset contains 1024 numerical values and the second subset 458 (2 x 229) numerical values.

Referring to figure 9, the subset $W_j^{desc}$ carries information on the description of the accounting document in the form of 1024 numerical values. The subsets $W_j^{credit}$ and $W_j^{debit}$ carry the information relating to the distances between accounts coded by node j of the map.

In the use phase of the self-organized map, only a vector W of dimension equal to the dimension of the subset $W_j^{desc}$ is available after the extraction and transformation of the "description" information of the new accounting document. This vector W which, in the context of the concrete example of implementation of the present invention, has a dimension equal to 1024, is represented in FIG. 10 by $W_{1 \ldots 1024}$.

The determination of the node (or of the class) of the self-organized map C to which the new accounting document must be allocated is carried out using an algorithm which determines the Euclidean distance between the input vector $x_{1 \ldots 1024}$ (that is, the vector W) and the components $W_j^{desc}$ of the weight vector P associated with each node of the map. The weight vector P, and therefore the corresponding node, having the smallest Euclidean distance to the input vector $x_{1 \ldots 1024}$ is declared the BMU, as illustrated in figure 11. The corresponding map node is therefore assigned to the new accounting document.
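The partial comparison described above, in which only the description components of each weight vector take part in the distance, can be sketched as follows. The sketch assumes that the description components occupy the first k positions of each weight vector; `find_bmu` and the toy map with k = 3 are hypothetical.

```python
def find_bmu(weights, w_new, k):
    """Return the node whose first k weight components are closest to w_new."""
    if len(w_new) != k:
        raise ValueError("W must have exactly k components")

    def partial_sq_distance(node):
        p = weights[node]
        # Only the description block P[0:k] is compared with W.
        return sum((p[i] - w_new[i]) ** 2 for i in range(k))

    return min(weights, key=partial_sq_distance)


# Toy map: k = 3 description components followed by 4 distance components.
weights = {
    (0, 0): [1.0, 0.0, 0.0, 0.2, 3.9, 4.1, 0.3],
    (0, 1): [0.0, 1.0, 0.0, 4.0, 0.1, 0.2, 4.2],
    (1, 0): [0.0, 0.0, 1.0, 2.0, 2.1, 1.9, 2.0],
}
bmu = find_bmu(weights, [0.1, 0.9, 0.0], k=3)  # node (0, 1)
```

Minimizing the squared distance selects the same node as minimizing the Euclidean distance, so the square root can be omitted.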

Once a BMU vector is identified, the components $W_j^{credit}$ and $W_j^{debit}$ of the BMU, which correspond to the distances to the credit accounts and to the debit accounts respectively, are extracted. Once these component values have been extracted, it is necessary to determine the accounts which correspond to them, which allows this pair of accounts to be allocated to the new accounting document.

It is important to note that during the learning phase of the self-organized map, the values $W_j^{credit}$ and $W_j^{debit}$ were adapted according to the account distances associated with each entry. Recall that according to Table 4, which illustrates the structure of the past accounting data after extraction and restatement, the entries corresponding to the distances are all different from zero except the one which corresponds exactly to the allocation account of the credit or debit entry. Nevertheless, as a consequence of the learning of the self-organized map C, the distance values do not necessarily contain components with a value equal to 0.

To determine the indices of the components of the subsets $W_j^{debit}$ and $W_j^{credit}$ used to extract the pair of debit / credit accounts, the smallest distance value is sought among the credit components and among the debit components of the BMU. This immediately gives the index of the account to use, independently for the debit part and the credit part.

Table 5 shows an example of the values of the subset $W_j^{credit}$ in the context of the concrete example of the implementation of the present invention. In order to determine which credit account should be allocated to the new accounting document, it is sufficient to determine the account which has the smallest distance. In this concrete example, the corresponding account is account "5274". Similarly, the debit account can be determined from the distances of the subset $W_j^{debit}$ of the identified node of the map. At the end of this process, two accounts are therefore obtained which exactly determine the pair of debit / credit accounts in which the “amount” value of table 1 is written, which corresponds to the last step 190 of FIG. 1.

Table 5: example of the distances extracted from the BMU vector for the new accounting document
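The selection of the two accounts by smallest distance component, followed by the writing of the amount (step 190), can be sketched as follows. The block order inside the weight vector (description, credit distances, debit distances), the account list, the numerical values and the in-memory `ledger` dictionary are assumptions of this sketch and are not the values of Table 5.

```python
from decimal import Decimal


def extract_account_pair(bmu_weights, k, accounts):
    """Pick the accounts with the smallest credit and debit distance component."""
    n = len(accounts)
    if len(bmu_weights) != k + 2 * n:
        raise ValueError("weight vector does not match the K + 2N layout")
    credit_part = bmu_weights[k:k + n]          # assumed block order
    debit_part = bmu_weights[k + n:k + 2 * n]
    credit = accounts[min(range(n), key=credit_part.__getitem__)]
    debit = accounts[min(range(n), key=debit_part.__getitem__)]
    return debit, credit


def post_amount(ledger, debit, credit, amount):
    """Step 190: write the amount to both allocated accounts."""
    ledger.setdefault(debit, []).append(("debit", amount))
    ledger.setdefault(credit, []).append(("credit", amount))


ACCOUNTS = ["1020", "1022", "6200", "6281"]  # illustrative leaf accounts
bmu_weights = [0.0, 1.0, 0.0,        # K = 3 description values
               2.1, 0.3, 5.8, 6.1,   # distances, credit side
               6.0, 5.9, 1.7, 0.4]   # distances, debit side
debit, credit = extract_account_pair(bmu_weights, k=3, accounts=ACCOUNTS)

ledger = {}
post_amount(ledger, debit, credit, Decimal("120.50"))
```

The minimum is taken rather than an exact zero because, as noted in the text, the learned distance components are not necessarily equal to 0 after training.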

Advantageously, the present method is implemented using a computer program. The program code for carrying out the operations of the present invention can be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Python, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. Program code can be run entirely on the user's computer, partially on the user's computer, as a stand-alone software package, in part on the user's computer and in part on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer can be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection can be established to an external computer (for example, through the Internet using an Internet service provider).

The computer running the program comprises at least a standard processor (CPU) with at least 30 gigabytes of RAM and a hard disk with a minimum capacity of 1 terabyte. It can also comprise a processor capable of executing several threads simultaneously (multithreaded). Finally, hardware acceleration cards can be added, such as GPUs (Graphics Processing Units), TPUs (Tensor Processing Units) and in general any hardware acceleration device available on the market, such as the RTX 2060, RTX 2070 or GTX 1070.

It is obvious that the present invention is subject to many variations as to its implementation. Although a non-limiting embodiment has been described by way of example, it will be understood that it is not conceivable to identify exhaustively all the possible variations. It is of course conceivable to replace a means described by an equivalent means without departing from the scope of the present invention. All these modifications are part of the common knowledge of a person skilled in the art in the technical field of the present invention.

Claims

1. Computer-implemented method for the automatic allocation of an accounting document to a pair of debit / credit accounts of a chart of accounts and for the accounting entry associated with this accounting document, comprising the steps:

a. Extraction of "description" information corresponding to a descriptive text as well as "accounting" information corresponding to the pair of debit / credit accounts for each of a number Z of accounting entries previously recorded in one or more general ledgers;

b. Transformation of the "description" information from an alphanumeric format into a numerical value of dimension K by means of a word vectorization function F for each of the Z extracted accounting entries;

c. Allocation of a vector V of dimension K + M to each of the Z accounting entries extracted, the vector V comprising the K numerical values of the "description" information transformed by the function F and M distances between the debit account and the credit account associated with the accounting entry and all other accounts in the chart of accounts;

d. Learning of a self-organized map C on the basis of the vectors V, each node of the self-organized map C corresponding to a weight vector P of dimension K + M;

e. Extraction of "description" information and "amount" information from a new accounting document;

f. Transformation of the alphanumeric value of the “description” information of the new accounting document into a numerical value of dimension K by means of the word vectorization function F;

g. Allocation of a vector W of dimension K comprising the numerical values of the “description” information transformed by the function F;

h. Determination of a BMU vector of the self-organized map C as being the weight vector P closest to W;

i. Extraction of the debit account Dmin corresponding to the smallest component among those of the M components of the BMU vector which correspond to the debit accounts;

j. Extraction of the credit account Cmin corresponding to the smallest component among those of the M components of the BMU vector which correspond to the credit accounts;

k. Allocation to the new accounting document of the pair of debit / credit accounts Dmin / Cmin; and

2. Method according to claim 1, wherein the distance between two accounts is determined by the number of jumps that must be made in the tree of the chart of accounts to join the two accounts by the shortest path.

3. Method according to one of claims 1 or 2, wherein the determination of the BMU vector in step h. is performed on the basis of a Euclidean distance measurement between the BMU vector and the W vector.

4. Method according to one of the preceding claims, in which the self-organized map C is updated after each new accounting entry, after every tenth new accounting entry or after every hundredth new accounting entry.

5. Method according to one of the preceding claims, wherein step b. is performed using a so-called hash function F, such as "Label encoding", "1 of N" encoding, "Hashing trick", or "Optimal binning".

6. Method according to one of the preceding claims, in which the learning of the self-organized map C is carried out by a Self-Organizing Maps algorithm.

7. Method according to one of the preceding claims, in which the "description" and "amount" information is extracted from a scanned copy of the new accounting document.