WO2016186137A1 - Accounting assistance system

Info

Publication number: WO2016186137A1
Application number: PCT/JP2016/064758
Authority: WO
Inventors: 上野裕史; 高島研也
Original assignee: 株式会社スキャる
Priority date: 2015-05-18
Filing date: 2016-05-18
Publication date: 2016-11-24
Also published as: JP2019204535A; JP6835713B2; JPWO2016186137A1

Abstract

Provided is an accounting assistance system (1) having a classification unit (80), a distribution unit (22), and an aggregation unit (23). The classification unit outputs a classification destination for a voucher serving as evidence of a user transaction. The distribution unit distributes and transmits divided data (29), together with identification information indicating the voucher to be classified, to different workers (8) for the purpose of digitization, said divided data having been divided on the basis of the position, within the voucher to be classified, of multiple items of character information in the voucher. The aggregation unit obtains the divided and digitized character information (28) that has been digitized by the different workers (8) and, on the basis of the identification information, generates, from the divided and digitized character information, classification subject data (60) to be classified by the classification unit.

Description

Accounting support system

The present invention relates to a system that supports accounting operations.

Japanese Patent Application Laid-Open No. 2014-235484 describes a cloud-type system in which a user can obtain, in real time, the journal entry for a transaction shown in a voucher simply by transmitting the voucher data from a Web terminal. This system includes a server that executes the processing of a journal analysis service, and a database storing masters shared by all users, including a first master that stores product names and product groups in association with each other and a second master that records, for each journal pattern pairing a product group with an account item, the number of persons who have made that journal entry. The server includes means for analyzing the voucher data transmitted from the Web terminal that requests the journal analysis service and extracting the element information of the journal entry, and means for obtaining from the first master the product group corresponding to the product name included in the element information, selecting, from among all account items in the second master corresponding to that product group, the account item with the largest number of journal processing persons, generating a journal entry, and presenting it as a recommended journal entry.

There is a need for an accounting support system that can make journaling more efficient.

One aspect of the present invention is a system having: a journal unit that outputs a journal entry for a voucher that is evidence of a user's transaction; a distribution unit that distributes and transmits divided data, obtained by dividing a plurality of pieces of character information included in the voucher to be journalized according to their notation positions in the voucher, together with identification information indicating the voucher to be journalized, to different workers for digitization; and an aggregation unit that acquires the character information digitized by the different workers (divided digitized character information) and, based on the identification information, generates from it the journal target data to be journalized by the journal unit.

In this system, information contained in one voucher is digitized by a plurality of different workers. For this reason, one worker only sees a piece of data contained in the voucher. Therefore, the voucher can be efficiently digitized by a plurality of workers while ensuring confidentiality.

Furthermore, since the distribution unit distributes and transmits the voucher data in a highly confidential state, the data can be exchanged safely over a computer network (cloud), typically the Internet. For this reason, anyone who can connect to the network (cloud) can be engaged to digitize vouchers, and voucher data can be digitized safely at low cost.

The distribution unit may include a unit that classifies the divided data by adding a category according to the notation position in the voucher, and transmits the divided data to different workers for each category. Since each worker digitizes divided data of the same category (type, group), the efficiency of the digitization work can be improved.

The distribution unit may include a unit that generates a plurality of voucher divided images by dividing an image of the voucher to be journalized according to the notation positions of the character information, and a unit that transmits divided data including the plurality of voucher divided images to different workers via a computer network. The transmitting unit may transmit the divided data to different workers via the Internet. By having different workers, dispersed across locations reachable via the Internet (cloud), independently digitize the separated pieces of data contained in the voucher, the voucher data can be digitized more safely.

The journal unit may include an analogy determination unit that acquires journal target data including the transaction date, the amount, and a word array extracted from the other character information of the voucher to be journalized, and calculates the distance between the journal target data and each of a plurality of journal reference entries, using as parameters the transaction date, the amount, and the similarity of each word included in the word array. Each of the plurality of journal reference entries includes the transaction date, the amount, and a word array extracted from the other character information for an entry in a book that contains, as entries, information on past vouchers journalized by the user. The journal unit may further include a first journal output unit that outputs, as the journal entry, the account item of the journal reference entry with the shortest distance from the journal target data.

In this journal unit, both the journal reference entries, which are past journal results, and the journal target data to be journalized are acquired as metadata including the transaction date, the amount, and the word array. The analogy determination unit cuts the elements of the metadata, in particular the words in the word array, into words whose similarity can be judged, maps the journal reference entries and the journal target data into a multidimensional space spanned by the transaction date, the amount, and those words, and calculates the distance between each journal reference entry and the journal target data. The first journal output unit then outputs, as the journal entry, the account item of the journal reference entry with the shortest distance from the journal target data. For this reason, even if the words in the word array of the metadata are not defined in advance, an appropriate number of words (including collocations, compound words, and the like) is identified automatically from the word array, and the analogy between a journal reference entry and the journal target data can be judged automatically from various directions using the similarity between words and the meanings they suggest. The account item for journalizing the voucher from which the journal target data was extracted can therefore be output automatically and with high accuracy on the basis of past journal results.
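The nearest-entry selection described above can be sketched as follows. This is only an illustration under stated assumptions: the `Entry` type, the weights, the one-year cap on the date term, and the use of Jaccard overlap as the word similarity are all hypothetical stand-ins, since the text does not specify how the multidimensional mapping or the word similarity is computed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Entry:
    """Metadata of a voucher: transaction date, amount, and word array."""
    day: date
    amount: int
    words: frozenset
    account: str = ""  # known only for journal reference entries


def distance(a, b, w_day=1.0, w_amount=1.0, w_words=1.0):
    """Weighted sum of date, amount, and word dissimilarity (assumed form)."""
    # Date term: days apart, capped at one year (assumption).
    day_term = min(abs((a.day - b.day).days) / 365.0, 1.0)
    # Amount term: relative difference between the two amounts.
    top = max(abs(a.amount), abs(b.amount), 1)
    amount_term = abs(a.amount - b.amount) / top
    # Word term: 1 - Jaccard overlap, a simple stand-in for word similarity.
    union = a.words | b.words
    overlap = (len(a.words & b.words) / len(union)) if union else 1.0
    word_term = 1.0 - overlap
    return w_day * day_term + w_amount * amount_term + w_words * word_term


def nearest_account(target, references):
    """Return the account item of the closest reference entry and its distance."""
    best = min(references, key=lambda r: distance(target, r))
    return best.account, distance(target, best)


references = [
    Entry(date(2015, 4, 1), 1200, frozenset({"taxi", "fare"}), "Travel"),
    Entry(date(2015, 4, 3), 5400, frozenset({"printer", "toner"}), "Supplies"),
]
target = Entry(date(2015, 5, 2), 1300, frozenset({"taxi", "receipt"}))
account, dist = nearest_account(target, references)
```

A real implementation would presumably replace the Jaccard term with a similarity that also captures related words, as the paragraph above suggests.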

The journal unit may include a second journal output unit that, when the distance to the shortest journal reference entry selected by the first journal output unit is larger than a first threshold, outputs as the journal entry the account item of the journal reference entry whose amount differs from that of the journal target data by no more than a second threshold and whose transaction date is closest. The amount and the date are among the most effective items for judging analogy when determining the journal entry for a voucher. Accordingly, when other factors place the journal target data too far from every journal reference entry, the optimal account item can sometimes be output by removing those other factors from the comparison.
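A minimal sketch of this two-stage output, assuming entries are plain dictionaries with hypothetical keys `day`, `amount`, and `account`, and assuming the first-stage result (`best`, `best_distance`) has already been computed elsewhere:

```python
from datetime import date


def second_output(target, references, best, best_distance,
                  first_threshold, second_threshold):
    """Fall back to an amount/date comparison when the best match is too far."""
    # First stage result is accepted when it is close enough.
    if best_distance <= first_threshold:
        return best["account"]
    # Otherwise keep only entries whose amount is within the second threshold.
    close = [r for r in references
             if abs(r["amount"] - target["amount"]) <= second_threshold]
    if not close:
        return None  # no suitable entry; left to manual journalizing (assumption)
    # Among those, pick the entry with the closest transaction date.
    nearest = min(close, key=lambda r: abs((r["day"] - target["day"]).days))
    return nearest["account"]


references = [
    {"day": date(2015, 4, 1), "amount": 1000, "account": "Travel"},
    {"day": date(2015, 5, 1), "amount": 1020, "account": "Meetings"},
    {"day": date(2015, 5, 2), "amount": 9000, "account": "Supplies"},
]
target = {"day": date(2015, 5, 3), "amount": 1010}
```

Returning `None` when no entry passes the amount filter is a choice made for this sketch; the text does not say what happens in that case.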

Account items can be divided into several categories. The account items of the plurality of journal reference entries are divided into a plurality of categories, each of which includes at least one account item. Each of the plurality of journal reference entries may include category information. The analogy determination unit may include a category-based analogy determination unit that calculates the distance between the journal target data and the journal reference entries belonging to the same category as the category determined from at least the words indicating the title and the destination included in the word array of the journal target data. Limiting the journal reference entries subject to distance calculation to a common category shortens the calculation time and can improve the accuracy of the analogy determination. The categories may distinguish differences in transaction direction, or differences in the titles of vouchers.
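The effect of restricting the comparison to one category can be illustrated with a short sketch. The dictionary keys, the category names, and the amount-only distance used in the example are hypothetical; any distance function taking a target and a reference entry could be passed in.

```python
def nearest_in_category(target, references, category, distance):
    """Compare the target only with reference entries of the given category."""
    candidates = [r for r in references if r["category"] == category]
    if not candidates:
        return None  # no reference entry in this category
    return min(candidates, key=lambda r: distance(target, r))["account"]


references = [
    {"category": "payment", "account": "Travel", "amount": 1000},
    {"category": "income", "account": "Sales", "amount": 1010},
]
target = {"amount": 1010}


def amount_only(t, r):
    """Toy distance used only for this example."""
    return abs(t["amount"] - r["amount"])
```

Here the "Sales" entry has the identical amount, but it is never considered once the target has been placed in the "payment" category, which is how the category filter both reduces work and avoids matches in the wrong transaction direction.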

If the destination or title of the voucher is replaced with category information before the comparison is made, that information can be deleted from the metadata. In a system where the journal entry is determined by majority vote on the cloud, this prevents information on the user's vouchers from spreading on the cloud.

The category information may be obtained from the title of the voucher when the journal target data is extracted from the voucher and included in the journal target data in advance. Alternatively, the journal unit may include a category determination unit. The category determination unit determines which category the journal target data belongs to based on at least the words indicating the title and the destination included in the word array of the journal target data.

If at least some of the words included in the word array of the journal target data carry position information indicating where the word appears in the voucher, for example upper right, upper center, upper left, or lower right, the journal unit may include a title/destination extraction unit that extracts the title and the destination from the word array based on that position information. Titles tend to appear at the center of a voucher and destinations at positions such as the top left. Referring to the position information therefore improves the accuracy with which the title and destination of a voucher are determined automatically.
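As a rough illustration of position-based extraction, the sketch below assumes each word is paired with a position label and that the title is the first word labelled "upper center" and the destination the first word labelled "upper left". These labels and the mapping are assumptions made for the example; actual vouchers would need layout rules per voucher type.

```python
def extract_title_and_destination(words):
    """words: list of (text, position_label) pairs from the word array."""
    title = next((text for text, pos in words if pos == "upper center"), None)
    destination = next((text for text, pos in words if pos == "upper left"), None)
    return title, destination


word_array = [
    ("No. 0042", "upper right"),
    ("INVOICE", "upper center"),
    ("ACME Corp.", "upper left"),
    ("12,000", "lower right"),
]
title, destination = extract_title_and_destination(word_array)
```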

Another aspect of the present invention is a program (program product) that causes a computer to operate as a system having a journal unit that outputs a journal entry for a voucher that is evidence of a user's transaction. The program may include a unit (functional unit) that causes the computer to further operate as means for generating, from the user's past book data input to the computer, a journalized database including a plurality of journal reference entries. The program (program product) can be provided recorded on an appropriate recording medium.

A further aspect of the present invention is a method that includes outputting, by a computer, a journal entry for a voucher that is evidence of a user's transaction. The computer includes a transmission/reception unit that exchanges data with a plurality of workers via the Internet, and the method includes the following steps.
1. The computer distributes and transmits, via the transmission/reception unit, divided data obtained by dividing a plurality of pieces of character information included in the voucher to be journalized according to their notation positions in the voucher, together with identification information indicating the voucher to be journalized, to different workers for digitization.
2. The computer acquires, via the transmission/reception unit, the character information digitized by the different workers (divided digitized character information) and, based on the identification information, generates from it the journal target data to be processed in the step of outputting the journal entry.

The step of distributing and transmitting may include classifying the divided data according to the notation position in the voucher and distributing and transmitting it to different workers for each category. It may also include transmitting divided data, including a plurality of voucher divided images obtained by dividing an image of the voucher to be journalized according to the notation positions of the character information, to different workers.

The computer may have, in memory, a journalized database that includes a plurality of journal reference entries, each including the transaction date, the amount, and a word array extracted from the other character information for an entry in a book that contains the user's past journalized voucher information as entries. The step of outputting the journal entry may then include the following steps.
- The computer acquires journal target data including the transaction date, the amount, and a word array extracted from the other character information of the voucher to be journalized.
- The computer calculates the distance between the journal target data and each of the journal reference entries in the journalized database, using the transaction date, the amount, and the similarity of each word included in the word array as parameters.
- The computer outputs, as the journal entry, the account item of the journal reference entry with the shortest distance from the journal target data.

Calculating the distance may include calculating the distance between the journal target data and the journal reference entries in the same category as the category determined based on at least the words indicating the title and destination included in the word array of the journal target data.

The acquiring may include acquiring the information digitized by different workers (divided digitized character information) after the plurality of pieces of character information included in the voucher to be journalized have been divided according to their notation positions in the voucher. The method may include generating the journal target data from the divided digitized character information based on the identification information indicating the voucher to be journalized.

A block diagram outlining the accounting support system.
A block diagram outlining the factory that accepts vouchers.
A block diagram outlining the server.
A flowchart outlining the processing of the server.
A flowchart showing the process of dividing a voucher image and distributing it to workers.
A diagram showing an example of a divided voucher.
A block diagram showing the functions of the conversion unit that produces journal reference entries from past journal entries.
An example of an account item/category conversion table.
A block diagram outlining the journal unit.
A block diagram outlining the title/destination extraction unit and the category determination unit.
A flowchart outlining the processing of the journal unit.

FIG. 1 shows an example of an accounting support system. This accounting support system (accounting support apparatus) 1 organizes and journalizes the vouchers of a plurality of users 3, for example expense settlement vouchers 5. A user 3 may be, for example, a personal accountant, a company, or another organization such as an accounting office or a tax accountant office that uses the accounting support system 1. The accounting support system 1 provides services including digitization of the voucher originals 5, management of the originals, and journalizing work. In addition, the accounting support system 1 uses a plurality of remote workers 8 connected via the Internet (cloud) 9 in order to process the enormous volume of digitization work at low cost. In this specification, digitization means converting information written on a voucher, such as handwritten or printed character information, into data that can be processed by a computer or by software running on a computer, that is, into electronic or digital data.

In this system 1, a part of a voucher document 5 is converted into image data and distributed to the terminal of a remote worker 8 connected to the accounting support system 1 via the Internet 9. The remote worker 8 digitizes that part of the voucher document 5 or verifies its digitization. The accounting support system 1 aggregates the work results of the remote workers 8 via the Internet 9 and then determines the journal entry. By separating the digitization work, which requires man-hours but no specialist knowledge, from the journalizing work, which requires accounting expertise, the accounting support system 1 can improve the efficiency of the work and strengthen accounting capability.

The voucher 5 is also called a voucher document (and may be described as such in this specification) and provides evidence of a transaction; examples include receipts, invoices, purchase orders, and payment certificates. Accounting books are prepared based on the contents of the transactions described in the vouchers 5. Each user 3 is obliged to organize and store the vouchers 5 for a predetermined period.

Currently, in corporate accounting, personnel with accounting knowledge look at the voucher 5 and make journal entries directly. The accounting support system 1 adopts a new method that radically changes this practice. The accounting support system 1 broadly consists of a digitization process and a journalizing process. The main task of the digitization process is to extract character strings from the voucher 5 and digitize them. This is pure character string digitization and requires no accounting expertise. The journalizing process, on the other hand, performs journalizing work based on the digitized data and determines account items, and requires accounting expertise. In this accounting support system 1, the use of digitized data allows the journalizing work to be automated with information technology. A step in which accounting professionals such as accountants and tax accountants familiar with the business of each user 3 check the final results can also be provided; verifying the results of the automatic journalizing in this way guarantees the accuracy of the journal entries.

The accounting support system 1 includes a factory department (factory) 11 that accepts the vouchers 5 provided by the users 3 and organizes, confirms, and images them, a warehouse department (warehouse) 12 that stores the vouchers 5 as originals 6, and a data processing department (server) 13 that digitizes the vouchers 5 and makes journal entries based on the imaged data (voucher image) 7 of each voucher 5. In the accounting support system 1, the original 6 of each voucher 5 is stored in the warehouse 12, and the user 3 can access the digitized data of the voucher 5 on the server 13 via the Internet 9. Furthermore, the user 3 can, as necessary, retrieve the original 6 of a voucher 5 stored in the warehouse 12 based on the electronic data on the server 13, or consult it at the warehouse 12.

The server 13 has computer resources such as a memory and a CPU, and realizes the functions provided by a program (program product). The server 13 includes a unit 14 that transmits and receives data to and from the terminals of the remote workers 8 via a computer network (Internet, cloud) 9. The server 13 further includes a data input processing function (data input processing unit) 20 that digitizes the voucher image 7 using the remote workers 8, a data mining function (data mining unit) 30 that adds accounting information such as account items to the digitized information, a database 50 that stores the digitized data, and a data display providing function (data display unit) 40 that provides the digitized data to the user 3 via the network 9.

FIG. 2 shows an outline of processing in the factory 11. Each employee 103 of each user 3 puts the vouchers 5 and an expense settlement label 105 in a suitable container, for example a zippered plastic bag (chuck poly) 107. The vouchers 5, sorted by employee in the chuck polys 107, are put in a dedicated envelope 108 and delivered to the factory 11 by mail, courier, or similar means. The identification information (ID) attached to the expense settlement label 105 as a barcode, a two-dimensional code, or the like becomes the main ID of the vouchers 5 in the chuck poly 107, and each voucher 5 in the chuck poly 107 is given a branch ID in addition to the main ID, so that each voucher 5 is uniquely identified.
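The main ID plus branch ID scheme can be sketched as below. The string format (a hyphen and a three-digit branch number) and the example label value are purely illustrative; the text only states that a branch ID is added to the main ID.

```python
def assign_voucher_ids(main_id, voucher_count):
    """Give each voucher in one bag an ID made of the main ID and a branch ID."""
    return [f"{main_id}-{branch:03d}" for branch in range(1, voucher_count + 1)]


# Hypothetical label value read from the barcode on the expense settlement label.
ids = assign_voucher_ids("EXP0001", 3)
```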

In the factory 11, pre-processing 111, organizing data conversion processing 114, and storage processing 116 are performed. In the pre-processing 111, the voucher 5 included in the incoming chuck poly 107 is confirmed, and identification information (ID) having the expense settlement label as the main ID is associated with each voucher 5 on a one-to-one basis. In the organized data conversion process 114, each voucher 5 associated with the identification information on a one-to-one basis is organized, confirmed, and imaged. Data obtained by imaging the voucher 5 is fed back to the user 3 as a receipt confirmation image 118. Data obtained by imaging the voucher 5 is supplied to the server 13 as image data (voucher image) 7 for digitization. When these processes are completed, the voucher 5 is stored again in the chuck poly 107 and stored as the original 6 in the warehouse 12. Since the same identification information as the voucher image 7 is attached to the original 6, the original 6 stored in the warehouse 12 can be easily reached from the information obtained by digitizing the voucher image 7 if necessary.

FIG. 3 is a block diagram showing the functions of the server 13. The server 13 includes a data input processing unit (data input processing function) 20 that digitizes the voucher image 7, a journal unit (journalizing apparatus, journalizing system, data mining function) 30 that journalizes vouchers using the digitized information (journal target data) 60, a database 50 that stores digitized accounting data and the like, and a data display providing function 40 that supplies journalized voucher data to the user 3. The journal unit 30 can also be used as a data mining function (data mining unit) for adding accounting information such as account items to the input information. The data display providing function 40 can also be used as a data display unit 40 that provides the digitized data stored in the database 50 to the user 3 via the network 9.

The data mining function 30 includes an automatic journal unit 80 that automatically journalizes and determines account items. The server 13 includes a functional unit (process) 90 for confirming the result of the automatic journalizing, in which the journalizing result, that is, the automatically output account item, is confirmed manually by an accounting specialist 91. The journalized electronic data 60 is stored in the database 50 as transaction information 95. The database 50 therefore also serves as the updated journal book 52 of the user 3.

The data input processing unit (data input processing device) 20 includes an image reading unit (image reading device) 21 that acquires the voucher image 7 to be journalized voucher by voucher, a distribution unit (distribution device, distribution function) 22 that supplies a plurality of voucher divided images (divided data) 29, obtained by dividing the voucher image 7 of each voucher, to a plurality of workers (remote workers) 8 via the network 9 for digitization, and an aggregation unit (aggregation device, aggregation function) 23 that acquires the character information 28 digitized from the divided data 29 by the plurality of workers 8 (divided digitized character information).

The distribution unit 22 distributes and transmits the divided data 29, obtained by dividing a plurality of pieces of character information included in the voucher to be journalized according to their notation positions in the voucher, together with identification information 27 indicating the voucher to be journalized, to different workers 8 for digitization. For this purpose, the distribution unit 22 includes an image dividing unit (image dividing apparatus, image dividing function) 24 that divides the voucher image 7 according to the notation positions of the character information it contains and generates a plurality of voucher divided images (divided data) 29. Each voucher divided image 29 includes a divided image 29a, OCR information 29b obtained by converting the divided image 29a into character data by OCR, and identification information (ID) 27 that associates the voucher divided image 29 with a specific voucher image 7, that is, with a voucher 5. The image dividing unit 24 also has a function of classifying the divided data by adding a category according to the notation position in the voucher, so that the divided data can be transmitted to different workers for each category (type, group).

The aggregation unit 23 acquires the character information 28 divided and digitized by the different workers 8 and, based on the identification information 27, generates from it the journal target data 60 to be journalized by the journal unit 30. For this purpose, the aggregation unit 23 includes a generation unit (generation device, generation function) 25 that acquires from each worker 8 the character information (divided digitized character information) 28 in which the character information in a voucher divided image 29 has been digitized by that worker, and generates the journal target data 60 by aggregating the divided digitized character information 28 based on the identification information 27.
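The aggregation by identification information can be sketched as a simple grouping step. The tuple layout `(voucher_id, fragment_index, category, text)` is an assumption made for the example; the text only requires that each returned fragment carry identification information tying it to its source voucher.

```python
from collections import defaultdict


def aggregate(results):
    """Group worker results by voucher and restore the fragment order."""
    grouped = defaultdict(dict)
    for voucher_id, fragment_index, category, text in results:
        grouped[voucher_id][fragment_index] = (category, text)
    # One record per voucher, fragments sorted by their index.
    return {
        voucher_id: [fragments[i] for i in sorted(fragments)]
        for voucher_id, fragments in grouped.items()
    }


# Results arrive from different workers in arbitrary order.
results = [
    ("V1", 2, "amount", "1,200"),
    ("V2", 0, "title", "Invoice"),
    ("V1", 0, "title", "Receipt"),
    ("V1", 1, "date", "2015-05-18"),
]
journal_target_data = aggregate(results)
```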

The process of digitizing the information contained in the vouchers 5 tends to require many man-hours. This system 1 therefore reduces time and cost by having a plurality of workers work in parallel. First, the voucher base paper 5 is converted into an electronic image by an image reading device or the like at the factory 11. Next, image recognition is applied, rectangular regions are detected, and the image is divided into rectangular units to form divided images 29a. Since each rectangle contains a group of character strings, it can be converted into OCR information 29b if it is readable by OCR. Existing image recognition software can be used for OCR. In many cases, however, OCR cannot produce correct character information, for example when a voucher is handwritten, when characters are faint or hard to read, or when kanji are misread or misconverted. A worker 8 performing the distributed work via the network 9 visually checks the divided image 29a included in the voucher divided image 29 to determine whether the OCR information 29b is correct, and generates correct divided digitized character information 28 by manual input or manual correction.

FIG. 4 is a flowchart outlining the processing provided by the server 13 of the accounting support system 1. In step 151, the image reading unit 21 acquires the voucher image 7. In step 152, the distribution unit 22 implemented in the server 13 distributes and transmits, via the transmission/reception unit 14, the divided data (voucher divided images) 29, obtained by dividing the plurality of pieces of character information included in the voucher (voucher image) 7 to be journalized according to their notation positions in the voucher, together with the identification information 27 indicating the voucher to be journalized, to different workers 8 for digitization. In step 155, each worker 8 on the cloud who has received a voucher divided image 29 independently digitizes only the limited character information contained in the divided image 29.

In step 153, the aggregation unit 23 implemented in the server 13 acquires (receives), via the transmission/reception unit 14, the divided digitized character information 28 digitized by the different workers 8 on the cloud and, based on the identification information 27, generates the journal target data 60 from it. In step 154, the automatic journal unit 80 journalizes the journal target data 60.

FIG. 5 is a flowchart showing the process in the image dividing unit 24 (step 152) in more detail. First, in step 201, the orientation of the voucher base paper 5 captured in the voucher image 7 is detected, and the rotation is corrected so that the orientation of the image 7 matches that of the base paper 5. The voucher base paper 5, for example A4 paper, may be used in portrait or landscape orientation. In most cases the character information on the voucher 5 runs horizontally, and the divided images need to be cut out along the lines of characters. The orientation of the voucher image 7 is therefore corrected, based on the arrangement of the characters, so that the direction of the character information is consistent.

Next, in step 202, the size of the voucher base paper 5 included in the voucher image 7 is automatically detected. By determining the size of the voucher base paper 5, a library preset in accordance with the size is selected. The library includes information about image division such as the size and position of a rectangle to be recognized for each size of the base paper 5. In step 203, the voucher image 7 is recognized by being divided into rectangles having a size set for each size of the base paper 5 by image processing. The information recognized in units of rectangles is stored in a table, which is used as a rectangle list. In order to increase the accuracy, preprocessing such as an image filter may be added. This step is important for image segmentation, and a process for manual segmentation may be inserted when it is determined that automatic image segmentation is insufficient.
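One way to picture the size-dependent library and the resulting rectangle list is the sketch below. The `LIBRARY` contents (paper size names and rectangle dimensions in pixels) are invented for illustration, and the uniform tiling is an assumption; the text says only that the library holds the size and position of the rectangles to recognize for each base paper size.

```python
# Hypothetical library: rectangle (width, height) in pixels per paper size.
LIBRARY = {"A4": (300, 100), "receipt": (200, 50)}


def rectangle_list(image_width, image_height, paper_size):
    """Tile the whole image with rectangles of the size set for the paper."""
    w, h = LIBRARY[paper_size]
    return [
        # (x, y, width, height); edge rectangles are clipped to the image.
        (x, y, min(w, image_width - x), min(h, image_height - y))
        for y in range(0, image_height, h)
        for x in range(0, image_width, w)
    ]


rects = rectangle_list(500, 120, "A4")
```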

The rectangle list holds the result of dividing the entire area of the voucher image 7 into small rectangles suited to its size. In step 204, rectangles containing character information are found in the rectangle list and the corresponding images are cut out, generating a plurality of divided images 29a from one voucher image 7. A divided image 29a is basically a collection of rectangular images registered in the rectangle list, and one rectangular image may be included in more than one divided image 29a; this can be adjusted according to the distribution of the rectangles that contain character information.
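The selection and cutting step can be sketched on a toy binary image, where 1 marks an inked pixel. Treating "contains ink" as "contains character information" is a simplification made for this example; a real system would rely on image recognition as the text describes.

```python
def crop(image, rect):
    """Cut the rectangle (x, y, width, height) out of a row-major image."""
    x, y, w, h = rect
    return [row[x:x + w] for row in image[y:y + h]]


def has_ink(image, rect):
    """True when any pixel inside the rectangle is non-zero."""
    return any(any(row) for row in crop(image, rect))


def cut_fragments(image, rectangles):
    """Keep only rectangles that contain ink and return their cropped images."""
    return [(rect, crop(image, rect)) for rect in rectangles if has_ink(image, rect)]


image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
]
rectangles = [(0, 0, 3, 2), (3, 0, 3, 2), (0, 2, 3, 2), (3, 2, 3, 2)]
fragments = cut_fragments(image, rectangles)
```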

In step 204, the divided images 29a may be classified (grouped) by attaching a character information category, based on information in the library, according to the position, order, and size of each rectangle found to contain character information. For example, the divided image 29a cut from the rectangle containing the first character information to appear toward the top left of the voucher image 7 is likely to contain the title of the voucher, so the character information category "title" can be attached to it. The character information category may indicate a specific type of content such as title, date, or amount, or it may be mapping information indicating the position, size, and so on of the divided image 29a within the voucher image 7.

FIG. 6 shows an example of the divided images 29a generated from the voucher image 7. Character information categories can be assigned so that, for example, the upper right divided image 29a is the voucher number, the upper left divided image 29a is the title, and the next divided image 29a on the left is the destination. Alternatively, the divided image 29a may be left without a character information category, and the category may be attached automatically according to the OCR recognition result described later or the input by the worker 8, or the worker 8 may manually attach a character information category to the digitized divided character information 28. A divided image 29a may correspond to a single character string, and notation gathered in a table or the like may be divided in units such as a table or a column.
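
Position-based category assignment can be sketched as a lookup against regions of the page. The region boundaries below are assumptions chosen only to mirror the example layout (title at upper left, voucher number at upper right, destination below the title); `REGIONS` and `categorize` are hypothetical names.

```python
# Hypothetical layout map: (category, x_min, y_min, x_max, y_max) as page fractions.
REGIONS = [
    ("title", 0.0, 0.0, 0.5, 0.15),
    ("voucher_number", 0.5, 0.0, 1.0, 0.15),
    ("destination", 0.0, 0.15, 0.5, 0.30),
]


def categorize(center, page_size):
    """Return the character information category for a divided image, or None."""
    cx, cy = center
    width, height = page_size
    fx, fy = cx / width, cy / height
    for name, x0, y0, x1, y1 in REGIONS:
        if x0 <= fx < x1 and y0 <= fy < y1:
            return name
    # No category attached; it may be added later from OCR or worker input.
    return None
```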

When the divided images 29a are generated, identification information (ID) 27 is attached to each divided image 29a in step 205. It is desirable that the identification information 27 is a combination of identification information indicating the voucher 5 that has become a division source and identification information indicating individual divided images 29a.
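
A combined identifier of this kind could look like the following sketch. The `voucher-index` string format and the function names are assumptions made for illustration; the specification only states that the two pieces of identification information are combined.

```python
def make_segment_ids(voucher_id, n_segments):
    """Build one ID per divided image: voucher ID plus a running segment number."""
    return [f"{voucher_id}-{i:03d}" for i in range(1, n_segments + 1)]


def split_segment_id(segment_id):
    """Recover (voucher ID, segment number) from a combined ID."""
    voucher_id, _, index = segment_id.rpartition("-")
    return voucher_id, int(index)
```

Splitting on the last hyphen lets the voucher ID itself contain hyphens.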

In step 206, the divided image 29a is character-recognized using character recognition software (OCR) to generate OCR data 29b. In step 207, the voucher divided image 29, which includes the identification information 27, the divided image 29a, and the OCR data 29b, is distributed to the workers 8 via the Internet 9. Each worker 8 looks at one of the divided images 29a shown in FIG. 6 and confirms the OCR data 29b. Each worker 8 therefore sees only a fragment of the voucher 5, so neither the contents of the voucher 5 nor information about the accounting of the user 3 leaks to the worker 8.

The divided images 29a of the voucher 5 may be distributed to the workers 8 indiscriminately, that is, regardless of character information category, for digitization. However, distributing voucher divided images 29 of the same character information category to the same worker 8 allows the digitization work to be performed more efficiently. For example, if a worker 8 specializes in digitizing voucher divided images 29 related to voucher titles, the range of character information needed to interpret the divided image 29a is limited, and efficiency and accuracy improve. If a worker 8 digitizes voucher divided images 29 related to amounts, the character information can be limited to numbers and the same operation is repeated, so efficiency and accuracy are easily improved.

Each of the plurality of workers 8 receives a rectangular divided image 29a and a character string (OCR data) 29b. Alternatively, OCR may be omitted, in which case only the rectangular divided image 29a is delivered and the worker 8 converts it into data. The worker 8 visually compares the rectangular image (divided image) 29a with the character string 29b to confirm that the string is correct, and corrects the character string 29b if it is wrong. When no character string 29b is delivered, the worker 8 inputs the character string manually. A plurality of divided images 29a from the same voucher are not delivered to one worker 8. This prevents the business information of the company to which the voucher belongs from being revealed. As long as a worker handles only a single divided image 29a, the business information of the company cannot be determined, and the confidentiality of the company information is ensured. The fragmented character strings (divided, digitized character information) 28 confirmed by the workers 8 are collected by the server 13 and combined into one piece of information.
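
The two distribution rules described above (prefer a worker who specializes in the category, and never give one worker two fragments of the same voucher) can be sketched as follows. The data shapes and the name `distribute` are assumptions; the specification does not prescribe an assignment algorithm.

```python
def distribute(segments, workers):
    """Assign the segments of ONE voucher to distinct workers.

    segments: list of (segment_id, category) pairs for a single voucher.
    workers:  dict mapping worker name to specialty category (or None).
    """
    free = dict(workers)  # workers not yet used for this voucher
    assignment = {}
    # Pass 1: give each segment to a free specialist for its category, if any.
    for seg_id, category in segments:
        match = next((w for w, s in free.items() if s == category), None)
        if match is not None:
            assignment[seg_id] = match
            del free[match]
    # Pass 2: hand the remaining segments to any worker still unused.
    for seg_id, _ in segments:
        if seg_id in assignment:
            continue
        if not free:
            raise ValueError("not enough workers to keep one fragment per worker")
        worker = next(iter(free))
        assignment[seg_id] = worker
        del free[worker]
    return assignment
```

Raising an error when workers run out, rather than reusing a worker, keeps the confidentiality rule strict in this sketch.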

The aggregation unit 23 of the server 13 collects, via the Internet 9, the character information digitized by each worker 8 (the divided, digitized character information 28), and the generation unit 25 generates from it the journal object data 60 to be journalized. The journal object data 60 includes the transaction date and amount of the voucher 5 to be journalized and a word array extracted from its other character information. The journal unit 80 of the data mining unit 30 calculates the distance L in a multidimensional space between the journal object data 60 and each of a plurality of journal reference entries 70, and outputs as the journal the account item of the journal reference entry 70 with the shortest distance L from the journal object data 60. A journal reference entry 70 is obtained by converting an entry for a voucher previously journalized by the user 3, for example an entry in a journal diary.
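
Reassembly by identification information can be sketched as below. It assumes, purely for illustration, that each returned fragment carries a combined ID of the form `voucher-index`, a category, and the confirmed text; the name `aggregate` is hypothetical.

```python
from collections import defaultdict


def aggregate(results):
    """Group worker results by voucher and restore the original segment order.

    results: iterable of (segment_id, category, text) tuples.
    """
    vouchers = defaultdict(list)
    for segment_id, category, text in results:
        voucher_id, _, index = segment_id.rpartition("-")
        vouchers[voucher_id].append((int(index), category, text))
    return {
        vid: [(cat, txt) for _, cat, txt in sorted(parts, key=lambda p: p[0])]
        for vid, parts in vouchers.items()
    }
```

Results may arrive in any order and from any worker; only the ID is needed to rebuild each voucher.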

As shown in FIG. 3, the data mining unit 30 includes a conversion unit 31 that automatically generates journal reference entries 70 from the journal diaries 51 and 52 to build a reference library 53, and a journal unit (automatic journaling function, automatic journalizing device) 80 that performs journalizing automatically. The journal diary to be converted may be the journal diary 51 used by each user 3 in past accounting processes, or the diary 52 containing information journalized by the accounting support system 1. The conversion unit 31 generates the reference library 53 from the journal diaries 51 and 52 (described below with reference to the journal diary 51), which are the past journal data of the user 3. The reference library 53 is a database for finding the most similar past journal entry for a new voucher.

FIG. 7 shows the function of the conversion unit 31. Each entry (diary entry) 51a in the journal diary 51 is managed by an ID 51b. Each entry (journal reference entry) 70 in the reference library 53 is managed by a category 71, described later, and includes as content the ID 51b for tracing the entry 51a in the journal diary 51. Each entry 51a of the journal diary 51 includes a transaction date 51c, an amount 51d, a debit item 51e, a credit item 51f, a debit tax code 51g, a credit tax code 51h, a debit auxiliary item 51i, a credit auxiliary item 51j, and a summary 51k. The conversion unit 31 includes a word extraction function 32, which generates a word array 73 by dividing the information in the summary 51k into words and outputs it as one of the keys of the journal reference entry 70. The word extraction function 32 has a general Japanese syntax analysis function. The journal reference entry 70 is linked to the entry 51a of the journal diary 51 through the ID 51b included as content (value).

The conversion unit 31 further includes a category generation unit (category generation function, category generation means) 33. The category generation unit 33 refers to the item / category conversion table 34 and determines a category (journal category) 71 from the debit item 51e and the credit item 51f of the diary entry 51a. Other information in the diary entry 51a, for example the debit auxiliary item 51i, the credit auxiliary item 51j, and the summary 51k, is divided into words by the word extraction function 32 and converted into the word array 73. In this example, the debit item 51e and the credit item 51f are reflected in the key information of the journal reference entry 70 as the category 71 and are therefore not included in the word array 73. However, the debit item 51e and the credit item 51f may also be included in the word array 73.
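
The conversion of a diary entry into a journal reference entry can be sketched as follows. The table contents are invented for illustration (the actual table 34 of FIG. 8 is not reproduced in this text), the field names are assumptions, and whitespace splitting stands in for the Japanese syntax analysis of the word extraction function 32.

```python
# Hypothetical item/category table; the real table 34 is not reproduced here.
ITEM_CATEGORY_TABLE = {
    ("accounts receivable", "sales"): "income recording",
    ("purchases", "accounts payable"): "expenditure recording",
    ("cash", "accounts receivable"): "payment application",
    ("accounts payable", "cash"): "withdrawal application",
}


def to_reference_entry(diary_entry):
    """Convert one diary entry (dict) into a journal reference entry (dict)."""
    category = ITEM_CATEGORY_TABLE.get(
        (diary_entry["debit_item"], diary_entry["credit_item"])
    )
    # Auxiliary items and summary become the word array; the debit and credit
    # items are represented by the category instead.
    text = " ".join(diary_entry.get(k, "") for k in ("debit_aux", "credit_aux", "summary"))
    return {
        "category": category,
        "date": diary_entry["date"],
        "amount": diary_entry["amount"],
        "words": text.lower().split(),  # stand-in for morphological analysis
        "diary_id": diary_entry["id"],  # link back to the diary entry
    }
```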

FIG. 8 shows an example of the item / category conversion table 34. The category (journal category) 71 is a parameter newly and independently defined in the accounting support system 1. The category 71 may be any information that can be distinguished clearly and relatively easily in accounting terms, without overlap. In the accounting support system 1, four categories 71 are set according to the combination of transaction direction with recording or application: “income recording”, “expenditure recording”, “payment application”, and “withdrawal application”. These categories 71 are suitable for classifying vouchers 5 by their nature, and the category 71 of a voucher 5 can be determined easily and accurately from the title and the destination of the voucher 5.

FIG. 9 is a block diagram showing the configuration of the journal unit 80. The journal unit 80 includes an acquisition unit (acquisition function, acquisition means) 81 that acquires the journal object data 60, which is the digitized information of the voucher 5 to be journalized; a similarity determination unit (similarity determination function, similarity determination means) 82 that calculates the distance L between the journal object data 60 and each of a plurality of journal reference entries 70; and a first journal destination output unit (first journal destination output function, first journal destination output means) 86 that outputs, as the journal, the account items 51e and 51f of the journal reference entry 70 with the shortest distance L from the journal object data 60. The journal object data 60 includes a transaction date 64, an amount 65, and a word array 63 extracted from other character information. The acquisition unit 81 extracts the words 63a included in the word array 63 of the journal object data 60. Even if the words in the word array, which is the metadata of the voucher 5, are not defined in advance, the acquisition unit 81 automatically identifies an appropriate number of words (including collocations and compound words) from the word array.

The similarity determination unit 82 places the journal object data 60 and the journal reference entries 70 in a multidimensional space, using as similarity parameters not only the transaction dates 64 and 51c and the amounts 65 and 51d but also the words included in the word arrays 63 and 73, and calculates the distance L between the journal object data 60 and each of the plurality of journal reference entries 70. In the similarity determination unit 82, the transaction dates 64 and 51c and the amounts 65 and 51d are therefore essential parameters, while other information is extracted as appropriate from the word arrays even if it is not defined in advance. The extracted words serve as similarity criteria based on the words themselves, the meanings they suggest, and their order. The similarity between a journal reference entry 70 and the journal object data 60 can thus be determined automatically from a variety of fields, and the journal unit 80 can automatically and accurately output, based on past journal results, the account item for journalizing the voucher 5 from which the journal object data 60 was extracted.
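
The specification does not give a formula for the distance L. The following is only one possible sketch, assuming a weighted Euclidean combination of a normalized date gap, a relative amount gap, and a Jaccard distance between word sets (which ignores word meaning and order). The function names, weights, and normalizations are all assumptions.

```python
from datetime import date


def distance(target, ref, w_date=1.0, w_amount=1.0, w_words=1.0):
    """Illustrative distance L between journal object data and a reference entry."""
    # Date gap, capped at one year and scaled to [0, 1].
    d_date = min(abs((target["date"] - ref["date"]).days) / 365.0, 1.0)
    # Amount gap relative to the larger of the two amounts.
    bigger = max(abs(target["amount"]), abs(ref["amount"]), 1)
    d_amount = abs(target["amount"] - ref["amount"]) / bigger
    # Jaccard distance between the two word sets.
    a, b = set(target["words"]), set(ref["words"])
    union = a | b
    d_words = 1.0 - (len(a & b) / len(union) if union else 1.0)
    return (w_date * d_date ** 2 + w_amount * d_amount ** 2 + w_words * d_words ** 2) ** 0.5


def nearest(target, refs):
    """Return the reference entry with the shortest distance L."""
    return min(refs, key=lambda r: distance(target, r))
```

With this choice, date and amount always contribute, while words contribute without any predefined key, in line with the description above.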

The journal unit 80 builds a metadata database from a past journal diary or equivalent data and uses it as the data (journal reference entries) 70 for finding the journal of the voucher 5. Each journal reference entry 70 has a plurality of keys, and a new voucher 5 can likewise be converted into metadata. The distance L between the journal object data 60, obtained by converting the voucher 5 into metadata, and each journal reference entry 70 is evaluated, and the account item of the journal reference entry 70 with the shortest distance is determined to be the account item of the new voucher 5. Apart from the date and amount, which every voucher 5 requires, neither the metadata in the journal reference entries 70 nor the metadata in the journal object data 60 needs to be assigned common keys in advance in the journal unit 80: the similarity of arbitrary words extracted from arbitrary word arrays is calculated and evaluated as the distance L. The function for extracting words preferably has a general Japanese parsing function.

In this journal unit 80, the reference library 53 containing the journal reference entries 70 is generated from the journal diary 51, which is the user's past journal data. The journal unit 80 therefore searches for the past journal diary entry 51a closest to the new voucher 5. Each entry 51a in the journal diary 51 is managed by an ID 51b. Because the ID 51b is included in the journal reference entry 70, a new voucher 5 can be journalized based on the past journal diary 51.

The journal reference entries 70 may be based on the past journal results of other users, or on majority-vote logic over the Internet, instead of the past journal diary entries of the user 3. In that case, however, information about the user 3 is exposed. It is therefore desirable to determine similarity after replacing information that could identify the economic activity of the user 3 with general-purpose information, using the category 71 described above.

The journal unit 80 includes a second journal output unit 87 that finds the journal reference entry 70 whose amount 51d differs from the amount 65 of the journal object data 60 by no more than a second threshold value Vt2 and whose transaction date 51c is closest to the transaction date 64 of the journal object data 60, and outputs the account items 51e and 51f of that journal reference entry 70 as the journal. Specifically, the journal reference entries 70 whose amount difference is within ±Vt2 are selected and sorted in ascending order. From these, the entries whose date difference is within D days are selected and sorted by closeness of date to find the journal reference entry 70 with the closest date.

This second journal output unit (second journal output function, second journal output means) 87 outputs the journal when the distance L of the closest journal reference entry 70 selected by the first journal output unit 86 is larger than a first threshold value Vt1, that is, when it is determined that a journal meaningful in accounting terms cannot be selected. Transactions with similar amounts and little difference in transaction date are likely to be the same or similar transactions, and the voucher 5 that evidences such a transaction is likely to be journalized to the same account item.
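
The fallback from the first output to the second output can be sketched as follows. The dictionary fields, the tie-break on amount difference, and the names `second_journal_output` and `choose_journal` are assumptions; the thresholds Vt1, Vt2, and D are passed in as parameters because the specification gives no values.

```python
from datetime import date


def second_journal_output(target, refs, vt2, max_days):
    """Pick the entry within +/-Vt2 in amount and D days whose date is closest."""
    def amount_gap(r):
        return abs(r["amount"] - target["amount"])

    def day_gap(r):
        return abs((r["date"] - target["date"]).days)

    candidates = [r for r in refs if amount_gap(r) <= vt2 and day_gap(r) <= max_days]
    if not candidates:
        return None
    # Closest date first; amount difference breaks ties (an assumption).
    return min(candidates, key=lambda r: (day_gap(r), amount_gap(r)))


def choose_journal(first_best, first_distance, target, refs, vt1, vt2, max_days):
    """Use the nearest entry unless its distance L exceeds Vt1, then fall back."""
    if first_distance <= vt1:
        return first_best["items"]
    fallback = second_journal_output(target, refs, vt2, max_days)
    return fallback["items"] if fallback else None
```

Returning `None` when neither rule yields a candidate leaves room for manual journalizing, which this sketch does not model.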

The similarity determination unit 82 of the journal unit 80 further includes a category determination unit (category determination function) 84 that determines the category (journal category, journal target category) 61 to which the journal object data 60 belongs, based on at least the words indicating the title 63b and the destination 63c included in the word array 63 of the journal object data 60, and a category-specific comparison unit (distance calculation function, distance calculation unit) 85 that calculates the distance between the journal object data 60 and the journal reference entries 70 whose category 71 is the same as the category 61. The journal unit 80 also includes a title / address extraction unit (title / address extraction function) 83 that extracts the title 63b and the destination 63c from the word array 63 based on the position information (mapping information) of the words 63a in the word array 63 of the journal object data 60.

The title / address extraction unit 83 extracts the title 63b and the destination 63c from the journal object data 60, and the category determination unit 84 determines the transaction direction from the title 63b and the destination 63c and then determines the category 61 from the transaction direction and the title 63b.

FIG. 10 is a block diagram showing the title / address extraction unit 83 and the category determination unit 84 in more detail. The title / address extraction unit 83 includes a function of detecting the sender so that the destination can be determined more reliably. That is, the title / address extraction unit 83 includes a title detection function 83a, a destination detection function 83b, a transmission source detection function 83c, and a company name / customer name determination function 83d. The title detection function 83a searches the digitized journal object data 60 for the title 63b, based on the distribution map 55a, which includes position information of title candidates prepared in the library 55, and the title dictionary 55b. The title dictionary 55b includes words widely used as titles of vouchers 5, such as “invoice”, “purchase order”, “delivery note”, and “description”.
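
Dictionary-based title detection can be sketched as below. This sketch uses only the dictionary and ignores the position information of the distribution map 55a; the longest-match rule and the name `detect_title` are assumptions.

```python
# Dictionary words taken from the examples given for the title dictionary 55b.
TITLE_DICTIONARY = ["invoice", "purchase order", "delivery note", "description"]


def detect_title(words):
    """Return the longest dictionary title found in the word array, or None."""
    text = " ".join(w.lower() for w in words)
    matches = [t for t in TITLE_DICTIONARY if t in text]
    return max(matches, key=len) if matches else None
```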

The destination detection function 83b searches the digitized journal object data 60 for the destination 63c, based on the distribution map 55a, which includes position information of destination candidates prepared in the library 55, and the destination prefix dictionary 55c. The prefix dictionary 55c includes words widely used to indicate destinations, such as “Gion”, “To”, “Sama”, and “Line”.

The transmission source detection function 83c searches the digitized journal object data 60 for the transmission source, based on the distribution map 55a, which includes position information of transmission source candidates prepared in the library 55, and the supplier dictionary 55d. The supplier dictionary 55d includes the names of past suppliers of the user 3.

Since the journal object data 60 is input or reviewed by the workers 8, it contains character information that is much more accurate than character information obtained by OCR alone. The title detection function 83a, the destination detection function 83b, and the transmission source detection function 83c may therefore refer to the respective dictionaries 55b to 55d and determine the title, the destination, and the sender from the character information alone, without using the position information on the voucher 5. The company name / customer name determination function 83d determines whether the user 3 is the transmission source or the destination. A match search may be performed in character units, a longest match search may be performed, and a matching value may be determined. If the sender contains the user's own company name and the destination does not, the sender is determined to be the company and the destination is determined to be the business partner. If the sender does not contain the company name and the destination does, the sender is determined to be the business partner and the destination is determined to be the company. If the company name appears in both the sender and the destination, or in neither, automatic determination is impossible and the determination is made manually.
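
The four cases of the company name / customer name determination can be sketched as follows. Simple substring matching stands in for the character-unit or longest-match search, and the function name and return format are assumptions.

```python
def classify_parties(sender, destination, own_names):
    """Decide which side is the user's own company; None means manual review."""
    sender_is_own = any(name in sender for name in own_names)
    destination_is_own = any(name in destination for name in own_names)
    if sender_is_own and not destination_is_own:
        return {"sender": "company", "destination": "partner"}
    if destination_is_own and not sender_is_own:
        return {"sender": "partner", "destination": "company"}
    # Company name on both sides or on neither: cannot be decided automatically.
    return None
```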

The category determination unit 84 includes a transaction direction determination unit 84a and a category selection unit 84b. The transaction direction determination unit 84a determines the direction of the transaction by referring to the transaction direction determination table 56, based on the title 63b and on whether the destination 63c and the transmission source 63d are the company name or a customer name. If the title 63b is an invoice and the destination 63c is a supplier name, the transaction direction 69 is outward (OUT). If the title 63b is an invoice and the destination 63c is the company name, the transaction direction 69 is inward (IN). A transaction direction 69 of OUT is defined as meaning that the issuer of the voucher 5 is the company, and IN as meaning that the issuer of the voucher 5 is the customer.
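
The table lookup for the transaction direction can be sketched as below. Only the two invoice rules stated above are encoded; the full table 56 is not reproduced in this text, so the table contents, the role labels, and the function name are illustrative assumptions.

```python
# Hypothetical excerpt of the transaction direction determination table 56:
# (title, who the destination is) -> direction.
TRANSACTION_DIRECTION_TABLE = {
    ("invoice", "partner"): "OUT",  # voucher issued by the company
    ("invoice", "company"): "IN",   # voucher issued by the business partner
}


def transaction_direction(title, destination_role):
    """Return "OUT", "IN", or None when the table has no matching row."""
    return TRANSACTION_DIRECTION_TABLE.get((title.lower(), destination_role))
```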

The category selection unit 84b determines the category 61 from the title 63b and the transaction direction 69, based on the title / category conversion table 57. For example, if the title 63b is an invoice and the transaction direction 69 is OUT, the category 61 is determined to be income. The category 61 of the journal object data 60 is compared with the category 71 of the journal reference entry 70. As shown in FIG. 8, if the

Claims

A system comprising:
a journal unit that outputs a journal destination for a voucher that is evidence of a user's transaction;
a distribution unit that distributes and transmits divided data, obtained by dividing a plurality of pieces of character information included in a voucher to be journalized according to their notation positions in the voucher, to different workers for digitization, together with identification information indicating the voucher to be journalized; and
an aggregation unit that obtains the divided, digitized character information digitized by the different workers and generates from it, based on the identification information, journal object data to be journalized by the journal unit.
The system according to claim 1, wherein
the distribution unit includes a unit that classifies the divided data by type according to notation position in the voucher and distributes the divided data to the different workers by type.
The system according to claim 1 or 2, wherein the distribution unit includes:
a unit that generates a plurality of voucher divided images by dividing the voucher to be journalized according to the notation positions of character information; and
a unit that transmits the divided data including the plurality of voucher divided images to the different workers via a computer network.
The system according to any one of claims 1 to 3, wherein
the journal object data includes the transaction date and amount of the voucher to be journalized and a word array extracted from its other character information, and
the journal unit includes: a similarity determination unit that calculates the distance between the journal object data and each of a plurality of journal reference entries, each journal reference entry including, for an entry in a book whose entries are information on the user's past journalized vouchers, a transaction date, an amount, and a word array extracted from other character information, using the transaction date, the amount, and the similarity of each word included in the word arrays as parameters; and
a first journal destination output unit that outputs, as the journal destination, the account item of the journal reference entry with the shortest distance from the journal object data.
The system according to claim 4, wherein
the journal unit includes a second journal destination output unit that, when the distance of the closest journal reference entry selected by the first journal destination output unit is greater than a first threshold value, outputs as the journal destination the account item of the journal reference entry whose amount differs from the amount of the journal object data by no more than a second threshold value and whose transaction date is closest.
The system according to claim 4 or 5, wherein
the account items of the plurality of journal reference entries are divided into a plurality of categories, and each of the plurality of journal reference entries includes category information, and
the similarity determination unit includes a category-specific comparison unit that calculates the distance between the journal object data and the journal reference entries of the same category as the category determined based on at least the words indicating a title and a destination included in the word array of the journal object data.
The system according to claim 6, wherein
the journal unit includes a category determination unit that determines to which of the plurality of categories the journal object data belongs, based on at least the words indicating a title and a destination included in the word array of the journal object data.
The system according to claim 7, wherein
at least some of the words included in the word array of the journal object data include position information of the word as written in the voucher, and
the journal unit includes a title / address extraction unit that extracts the title and the destination from the word array based on the position information of the words.
A method of outputting, by a system including a computer, a journal destination for a voucher that is evidence of a user's transaction, wherein
the system includes a transmission / reception unit through which the computer exchanges data with a plurality of workers via the Internet,
the method comprising:
the computer distributing and transmitting, via the transmission / reception unit, divided data obtained by dividing a plurality of pieces of character information included in a voucher to be journalized according to their notation positions in the voucher, to different workers for digitization, together with identification information indicating the voucher to be journalized; and
the computer obtaining, via the transmission / reception unit, the divided, digitized character information digitized by the different workers and generating from it, based on the identification information, journal object data to be processed in outputting the journal destination.
The method according to claim 9, wherein
the distributing and transmitting includes classifying the divided data by type according to notation position in the voucher and transmitting the divided data to the different workers by type.
The method according to claim 9 or 10, wherein
the distributing and transmitting includes transmitting, to the different workers, the divided data including a plurality of voucher divided images obtained by dividing the voucher to be journalized according to the notation positions of character information.
The method according to any one of claims 9 to 11, wherein
the computer has a journalized database including a plurality of journal reference entries, each including, for an entry in a book whose entries are information on the user's past journalized vouchers, a transaction date, an amount, and a word array extracted from other character information,
the journal object data includes the transaction date and amount of the voucher to be journalized and a word array extracted from its other character information, and
outputting the journal destination includes:
the computer calculating a distance between each of the plurality of journal reference entries in the journalized database and the journal object data, using the transaction date, the amount, and the similarity of each word included in the word arrays as parameters; and
outputting, as the journal destination, the account item of the journal reference entry with the shortest distance from the journal object data.
A program that causes a computer to operate as a system having a journal unit that outputs a journal destination for a voucher that is evidence of a user's transaction, wherein the system further includes:
a distribution unit that distributes and transmits divided data, obtained by dividing a plurality of pieces of character information included in a voucher to be journalized according to their notation positions in the voucher, to different workers for digitization, together with identification information indicating the voucher to be journalized; and
an aggregation unit that obtains the divided, digitized character information digitized by the different workers and generates from it, based on the identification information, journal object data to be journalized by the journal unit.
The program according to claim 13, wherein
the journal object data includes the transaction date and amount of the voucher to be journalized and a word array extracted from its other character information, and
the journal unit includes: a similarity determination unit that calculates the distance between the journal object data and each of a plurality of journal reference entries, each journal reference entry including, for an entry in a book whose entries are information on the user's past journalized vouchers, a transaction date, an amount, and a word array extracted from other character information, using the transaction date, the amount, and the similarity of each word included in the word arrays as parameters; and
a first journal destination output unit that outputs, as the journal destination, the account item of the journal reference entry with the shortest distance from the journal object data.
The program according to claim 14,
further causing the computer to operate as means for generating a journalized database including the plurality of journal reference entries from the user's past book data input to the computer.