JP2015118488A

JP2015118488A - System, method and program for inputting account data

Info

Publication number: JP2015118488A
Application number: JP2013260693A
Authority: JP
Inventors: Kazuo Tsuchimoto; Maki Kuwajima; Takashi Katsuge; Hideki Masago; Naoki Matsunaga; Masashi Tsuchida; Satoshi Yokoyama
Original assignee: Japan Digital Laboratory Co Ltd
Current assignee: Japan Digital Laboratory Co Ltd
Priority date: 2013-12-17
Filing date: 2013-12-17
Publication date: 2015-06-25
Anticipated expiration: 2033-12-17
Also published as: JP6268352B2

Abstract

PROBLEM TO BE SOLVED: To simplify recognition of the date, total amount, and the like and to reduce misrecognition in accounting data input. SOLUTION: An accounting processing support system includes format determination means for determining the format of an original voucher from a scanned image, character recognition means, a first character recognition dictionary containing kanji and other characters, and a second character recognition dictionary for numeral recognition. When format determination identifies a receipt, the system uses the first character recognition dictionary to find character strings containing kanji and other characters and uses the second character recognition dictionary to recognize portions presumed to be numerals. A date is located using date keywords, and the numeral portion of the located date string is recognized with priority given to numeral recognition. A total amount is located using total amount keywords: the total amount string is identified by its position relative to the keyword string and is recognized with priority given to numeral recognition. If the total amount cannot be located by keyword, the total amount string is identified based on character size and thickness and is recognized with priority given to numeral recognition.

Description

The present invention relates to technology that supports data entry from original vouchers such as receipts and passbooks, which are the basis of journal entries, in accounting processing systems used when a client company of an accounting firm uses accounting software or when an accounting firm enters journal entries.

Conventionally, certified public accountant offices and tax accountant offices (hereinafter simply "accounting firms" or "firms") receive the data and original vouchers underlying accounting processing from their clients in various formats and perform the accounting for those clients. In recent years, with the spread of personal computers, the basic materials that clients submit to accounting firms are increasingly in electronic form.

When a client submits basic materials to an accounting firm as electronic data, the client generally enters the transaction data using accounting software (for example, in cash-book format) and either sends the data to the accounting firm over a network or stores it on a storage medium such as a memory card and hands it over.

Data is usually entered into such accounting software from a keyboard while the user looks at original vouchers such as receipts and passbooks generated by the transactions. There are techniques for reducing keyboard input work and data entry errors during this process (for example, Patent Document 1).

For clients who are unfamiliar with personal computers and find data entry into accounting software difficult, there has long been a method in which the client handwrites characters on preprinted standard slips and submits the completed slips to the accounting firm, which then processes them by OCR to make data entry more efficient (for example, Patent Document 2).

There is also a technique in which original vouchers such as receipts and passbooks generated by transactions are captured as an image by an optical device such as a scanner, numerals and other characters are segmented from the image, and the confidence of the match against a dictionary or the like is compared with a predetermined threshold to improve character recognition accuracy (for example, Patent Document 3).

Another technique addresses the problem that, when amounts and other items on original vouchers such as receipts and passbooks are read by an optical device such as a scanner and converted into text data, it is unclear which characters at which position represent the amount. The technique identifies the type of form that was read and thereby identifies the meaning of the character strings in the read text (for example, Patent Document 4).

There is also a technique in which original vouchers such as receipts and passbooks generated by transactions are captured as an image by an optical device such as a scanner, numerals and other characters are segmented from the image, and the word candidates closest to the segmented characters are displayed from a dictionary (for example, Patent Document 5).

Patent Document 1: JP H10-275196 A
Patent Document 2: JP H8-30719 A
Patent Document 3: JP H11-224305 A
Patent Document 4: JP H9-330363 A
Patent Document 5: JP H5-40854 A

When preprinted standard slips dedicated to OCR reading are used, the user at the client must handwrite characters on the slip while looking at original vouchers such as receipts and passbooks generated by the transactions. This is tedious work and also leads to many entry errors.

When original vouchers such as receipts and passbooks are read by a scanner, processed by OCR, and output as data to accounting software, recognition accuracy is high and data entry work is reduced if the vouchers have a standardized format that the accounting software supports. However, receipt formats are not standardized because each issuer can set them freely, and passbooks are not standardized either, with the order of items differing from one financial institution to another. As a result, receipts and passbooks in formats the accounting software does not support either cannot be read or are recognized with greatly reduced accuracy.

Furthermore, to read original vouchers with a scanner and process them by OCR, the processing that decides which part of the voucher should be recognized and output as data becomes complicated. Receipts in particular have no defined format and their contents carry no semantic labels, so it is hard to tell where the data to be captured as accounting information is located and therefore hard to extract the necessary information. More complicated processing means more places where misrecognition can occur, and the user must correct misrecognized portions more often, so data entry work is not reduced.

The most important accounting information obtained from original vouchers is the date and the total amount. When original vouchers are read by a scanner and processed by OCR, it is therefore important to acquire the date and total amount accurately.

Accordingly, an object of the present invention is, for processing in which an image of an original voucher such as a receipt or passbook read by a scanner at, for example, a client company of an accounting firm is processed by OCR and output as data to accounting software, to simplify the recognition of the date, total amount, and the like and to reduce misrecognition, thereby reducing keyboard input work and data entry errors.

To achieve the above object, the present invention provides an accounting processing support system comprising format determination means for determining the format of an original voucher from a read image of the original voucher, and character recognition means for recognizing characters in the read image using a character recognition dictionary and generating text data of recognition candidates, wherein the character recognition dictionary comprises a first character recognition dictionary containing kanji and other characters and a second character recognition dictionary for numeral recognition,
wherein, when the image is a receipt,
the character recognition means generates, in addition to text data of recognition candidates as character strings containing kanji and other characters using the first character recognition dictionary, text data of recognition candidates under the assumption that the characters are numerals using the second character recognition dictionary,
the system further comprising date information extraction means for extracting, from the text data of the recognition candidates, a date data character string serving as date information, and
total amount information extraction means for extracting, from the text data of the recognition candidates, a total amount data character string serving as total amount information,
wherein the date information extraction means extracts, from the text data of the recognition candidates, a character string containing a date keyword relating to identification of date information, treating it as information significant for accounting processing, estimates that character string to be date information data, and optimizes the text data of the recognition candidates by giving priority to the recognition candidates obtained when the estimated character string is assumed to consist of numerals relating to date information,
wherein the total amount information extraction means, when the text data contains a character string including a total amount keyword relating to identification of the total amount information, extracts a character string in a specific relative positional relationship with that character string, treating it as information significant for accounting processing, and estimates it to be the total amount information, and, when no character string including the total amount keyword is present in the text data, extracts a unique character string based on character size or character thickness and estimates the extracted character string containing numerals to be the total amount information,
and optimizes the text data of the recognition candidates by giving priority to the recognition candidates obtained when the estimated character string is assumed to consist of numerals relating to total amount information.

In one embodiment, when the text data contains a character string including a total amount keyword relating to identification of the total amount information, the total amount information extraction means sets the right side as the specific relative positional relationship and determines the numerals to the right of the total amount keyword in that character string to be the total amount.

In one embodiment, the first character recognition dictionary is a recognition dictionary with minimal definition data, narrowed down to the characters used in original vouchers for accounting processing.

In one embodiment, the second character recognition dictionary is a recognition dictionary that, when the character string to be recognized is estimated to consist of numerals and character recognition is performed under that assumption, excludes non-numeral character strings and thereby improves the numeral recognition rate.
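The effect of a numeral-only dictionary can be illustrated with a minimal sketch. It assumes, purely for illustration, that the recognizer yields a score per candidate character at each position; the patent describes a separate numeral dictionary, whereas this sketch approximates it by filtering a single candidate table. The function name `best_reading` and the scores are hypothetical.

```python
DIGITS = set("0123456789")

def best_reading(candidates, allowed=None):
    """Pick the highest-scoring character at each position.

    candidates: list of {character: score} dicts, one per character position.
    allowed:    optional set of permitted characters (e.g. digits only).
    Returns the joined string, or None if some position has no permitted candidate.
    """
    chars = []
    for scores in candidates:
        pool = {c: s for c, s in scores.items() if allowed is None or c in allowed}
        if not pool:
            return None
        chars.append(max(pool, key=pool.get))
    return "".join(chars)

# Hypothetical scores for a four-character region that actually reads "2013".
region = [
    {"2": 0.90, "Z": 0.60},
    {"O": 0.80, "0": 0.70},
    {"l": 0.75, "1": 0.70},
    {"3": 0.90, "B": 0.40},
]

general = best_reading(region)           # unrestricted reading
numeric = best_reading(region, DIGITS)   # reading under the "numerals only" assumption
```

With the unrestricted pool, look-alike letters win two positions; restricting the pool to digits removes them from consideration, which is the kind of gain in numeral recognition rate the embodiment describes.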

In one embodiment, the date keywords include at least one of "年" (year), "月" (month), "日" (day), "/", ".", and "-".
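A minimal sketch of keyword-based date detection follows, assuming the recognized text is available as a list of lines. The regular expression, the range checks, and the function name `find_date` are illustrative assumptions and not part of the patent text.

```python
import re

# Date keywords named in the embodiment.
DATE_KEYWORDS = ("年", "月", "日", "/", ".", "-")

# Assumed pattern: year, separator, month, separator, day, optional "日".
DATE_RE = re.compile(
    r"(\d{2,4})\s*[年/.\-]\s*(\d{1,2})\s*[月/.\-]\s*(\d{1,2})\s*日?"
)

def find_date(lines):
    """Return (year, month, day) from the first line that looks like a date, else None."""
    for line in lines:
        # Only lines containing a date keyword are treated as date candidates.
        if not any(k in line for k in DATE_KEYWORDS):
            continue
        m = DATE_RE.search(line)
        if not m:
            continue
        year, month, day = (int(g) for g in m.groups())
        if 1 <= month <= 12 and 1 <= day <= 31:
            return year, month, day
    return None

receipt = ["TEL 03-1234-5678", "2013年12月17日 12:30", "現計 ¥1,280"]
```

The telephone line contains "-" but does not fit the year/month/day shape, so it is skipped and the second line is taken as the date.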

In one embodiment, the total amount keyword is one of "合計" (total), "現計" (current total), and "買上計" (purchase total).
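Combining these keywords with the right-side rule from the earlier embodiment, the lookup can be sketched as below. This assumes plain recognized text lines; the amount pattern and the function name `find_total` are hypothetical choices for illustration.

```python
import re

# Total amount keywords named in the embodiment.
TOTAL_KEYWORDS = ("合計", "現計", "買上計")

# Assumed amount pattern: optional yen sign, then digits with optional thousands separators.
AMOUNT_RE = re.compile(r"[¥￥\\]?\s*(\d{1,3}(?:,\d{3})+|\d+)")

def find_total(lines):
    """Return the first amount found to the right of a total keyword, else None."""
    for line in lines:
        for keyword in TOTAL_KEYWORDS:
            idx = line.find(keyword)
            if idx < 0:
                continue
            # Search only the part of the line to the right of the keyword.
            m = AMOUNT_RE.search(line, idx + len(keyword))
            if m:
                return int(m.group(1).replace(",", ""))
    return None

receipt = ["小計 ¥1,200", "現計 ¥1,280", "お預り ¥2,000"]
```

Here "小計" (subtotal) is not one of the listed keywords, so the first line is ignored and the amount to the right of "現計" is returned.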

With the present invention, at a client company or accounting firm, original vouchers such as receipts and passbooks are read with a scanner, a mobile device with a camera function, or the like, processed by OCR, and output as data to accounting software, which reduces keyboard input work and data entry errors. In particular, even for free-format original vouchers, the invention can locate the necessary accounting data, such as the date and total amount, and feed it into the input screen of the accounting software.

In addition, by using, as dictionary data for character recognition, dictionary data whose recognition candidates are numerals only, or dictionary data narrowed down to the characters used in original vouchers, processing speed and recognition accuracy can be improved.

FIG. 1 is a schematic diagram giving an overview of the accounting data input support system according to the present invention.
FIG. 2 is a diagram explaining the system's date information extraction processing when the original voucher is a receipt.
FIG. 3 is a diagram explaining the system's total amount information extraction processing when the original voucher is a receipt.
FIG. 4 is a diagram explaining the system's processing when the original voucher is a bank passbook.
FIG. 5 is a diagram showing the system configuration of the accounting data input support system and its variations.
FIG. 6 is a flowchart explaining the overall processing of the method according to the present invention.
FIG. 7 is a flowchart explaining step S06-04 of FIG. 6 in more detail.
FIG. 8 is a flowchart explaining the post-processing of FIG. 6 (S06-05 to S06-06) when recognizing date information on receipts.
FIG. 9 is a flowchart explaining the post-processing of FIG. 6 (S06-05 to S06-06) when recognizing total amount information on receipts.
FIG. 10 is a flowchart explaining the details of character recognition processing for transaction information in a bank passbook.
FIG. 11 is a diagram showing an example structure of the character recognition dictionary.
FIG. 12 is a diagram showing an example of the keyword dictionary.
FIG. 13 is a diagram showing an example of the estimation processing table.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings. FIG. 1 is a schematic diagram giving an overview of the accounting data input support system according to the present invention. The present invention reads a free-format original voucher 10, such as a receipt or passbook, with a reading device 20 and performs image processing so that the accounting data is automatically reflected in accounting software running on a computer 100. The reading device 20 may be a scanner; a digital camera, digital video camera, or similar device that can photograph a subject and save it as image data (including devices that upload images wirelessly or transfer them via a memory device); the camera of a mobile terminal such as a camera-equipped mobile phone, smartphone, or tablet (hereinafter simply "mobile device with a camera function"); or any other reading device capable of digitizing an original voucher. For slip formats that the accounting software supports, it is known in advance what is written where, so the necessary portions can be read, processed by OCR, and reflected in the accounting data. This is not possible for free-format original vouchers that are not standardized. The present invention is therefore characterized in that, even for a free-format original voucher, the necessary accounting data such as the date and total amount are located and fed into the input screen of the accounting software, as shown in the enlarged screen at the bottom of the figure.

First, the system configuration of the accounting data input support system according to the present invention is described with reference to FIG. 5. In one embodiment, the accounting data input support system 1 is configured as a computer system installed at a client or an accounting firm. As shown in FIG. 5-A, the system 1 comprises a reading device 20, such as a scanner or a mobile device with a camera function, for reading the original voucher 10, and a terminal device 100 connected to the reading device 20 directly or via a network. The terminal device 100 comprises an input unit 110 such as a keyboard and mouse, a display unit 120 such as a display, an output unit 130 such as a USB port or removable storage drive, a communication unit 140 for connecting to a network such as the Internet, a control unit 150 that governs the various controls of the terminal device 100, and a storage unit 160 that stores various data.

The storage unit 160 comprises at least an image data storage unit 161 for original vouchers, a character recognition dictionary 162, a word dictionary 163, a keyword dictionary 164, a transaction information storage unit 165, and an accounting data storage unit 166. The functions of the control unit 150 are a reading processing unit, an image processing unit, a management unit, a character recognition processing unit, a dictionary handling unit, a recognition candidate generation unit, a word matching unit, an optimization processing unit, a transaction data generation unit, and a journal entry processing unit. These functions are realized when the control unit 150 reads and loads the accounting software or separate program modules stored in the storage unit 160. Embodiments of the present invention are not limited to the form shown in FIG. 5-A; some elements may be omitted and other elements may be added.

The reading processing unit reads the original voucher with the reading device 20 and creates image data (a read image). The reading processing unit is drawn with a broken line because the reading itself need not be performed by the terminal device and may be performed by another device with a camera function (a mobile device with a camera function, a digital camera, or the like). That is, the reading processing unit (shown enclosed by a broken line) may reside in the terminal device 100 or the reading device 20, or in another terminal device 100a, terminal device 20, data server, NAS (Network Attached Storage), or the like connected to them via a network. The terminal device 100 may be a desktop computer, a notebook computer, or a mobile device with a camera function. In the case of a mobile device with a camera function, the reading device (scanner or the like) of FIG. 1 or FIG. 5-A is built into the terminal device 100.

The image processing unit includes the format determination means. From the image data (read image) created by the reading processing unit, the format determination means makes a rough determination of whether the voucher type is a receipt or the like or a bank passbook, and determines the rough format, such as where the character strings of the original voucher (receipt, passbook, or the like) are located.

The management unit associates the image data created by the reading processing unit with the journal entry data entered through the journal entry processing unit.

The character recognition means is realized through cooperation of the character recognition processing unit, dictionary handling unit, recognition candidate generation unit, and optimization processing unit. Specifically, using the character recognition dictionary selected by the dictionary handling unit (the dictionary containing kanji and other characters, or the numeral dictionary), the character recognition processing unit calculates a feature quantity for each character and compares it with the character recognition dictionary. The recognition candidate generation unit generates recognition candidates in light of the comparison results. Keywords relating to dates and totals are then detected using the keyword dictionary selected by the dictionary handling unit. Finally, the optimization processing unit performs predetermined estimation processing based on the keywords and the like and, through an optimization program based on that estimation, raises the priority of the corresponding recognition candidates, thereby adjusting the text data of the recognition candidates.
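The priority adjustment in the last step can be sketched as a reordering of a candidate list. The `Candidate` record, its `source` field, and the function `prioritize` are hypothetical names introduced for illustration; the patent does not specify the data structure or the scoring.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    score: float   # recognition confidence (assumed scale 0..1)
    source: str    # "general" (kanji dictionary) or "numeric" (numeral dictionary)

def prioritize(candidates, assume_numeric):
    """Order candidates by score; if the region is estimated to hold a date or
    amount, move numeral-dictionary candidates ahead of all others."""
    return sorted(
        candidates,
        key=lambda c: (assume_numeric and c.source != "numeric", -c.score),
    )

# Hypothetical candidates for one region of a receipt.
region = [
    Candidate("2O13", 0.82, "general"),
    Candidate("2013", 0.78, "numeric"),
]

as_text = prioritize(region, assume_numeric=False)
as_number = prioritize(region, assume_numeric=True)
```

Without the numeric estimate the higher-scoring general reading stays first; once a keyword suggests the region is a date or amount, the numeral-dictionary reading is promoted even though its raw score is lower.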

The date information extraction means is realized through cooperation of the dictionary handling unit, recognition candidate generation unit, and optimization processing unit. Specifically, from the text data of the recognition candidates output by the recognition candidate generation unit of the character recognition means, the optimization processing unit estimates a character string containing a date keyword relating to identification of date information to be date information data, and optimizes the text data of the recognition candidates by raising the priority of the candidates obtained when the estimated character string is assumed to consist of numerals relating to date information.

The total amount information extraction means operates through cooperation of the dictionary handling unit, recognition candidate generation unit, word matching unit, and optimization processing unit. Specifically, the word matching unit matches the text data of the recognition candidates output by the recognition candidate generation unit of the character recognition means against words such as "合計" (total) in the word dictionary selected by the dictionary handling unit. When a character string containing a total amount keyword relating to identification of the total amount information is present, the optimization processing unit estimates the character string in a specific relative positional relationship with it to be the total amount information. When no character string containing the total amount keyword is present in the text data, the optimization processing unit estimates the character string containing numerals with the largest character size to be the total amount information. It then optimizes the text data of the recognition candidates by raising the priority of the candidates obtained when the estimated character string is assumed to consist of numerals relating to total amount information.
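The fallback branch (no total amount keyword found) can be sketched as follows, assuming each recognized line carries a measured character height. The `TextLine` record and `fallback_total_line` are hypothetical; character thickness, which the claims also mention, is not modeled here.

```python
import re
from dataclasses import dataclass

@dataclass
class TextLine:
    text: str
    char_height: float  # assumed character height in pixels from layout analysis

def fallback_total_line(lines):
    """Return the numeral-bearing line with the largest characters, or None."""
    numeric_lines = [ln for ln in lines if re.search(r"\d", ln.text)]
    if not numeric_lines:
        return None
    return max(numeric_lines, key=lambda ln: ln.char_height)

# Hypothetical receipt without any total keyword.
receipt = [
    TextLine("コーヒー 400", 10.0),
    TextLine("1,280", 18.0),
    TextLine("ご来店ありがとうございます", 22.0),
]

total_line = fallback_total_line(receipt)
```

The largest line overall contains no numerals and is ignored, so the emphasized "1,280" line is chosen as the total amount candidate.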

The journal entry processing unit creates journal entry data from the contents entered on, for example, a two-pane accounting input screen as shown in FIG. 1, and saves it in the accounting data storage unit.

The transaction data generation unit takes the summary (description), which is transaction information, from the journal entry data generated by the journal entry processing unit and saves it in the transaction information storage unit.

Furthermore, the image data storage unit 161, transaction information storage unit 165, and accounting data storage unit 166 of the storage unit 160 may reside in the terminal device 100 or on a data server, NAS (Network Attached Storage), or the like connected to it via a network. Variations of this system configuration are shown in FIG. 5-B and FIG. 5-C. In the embodiment of FIG. 5-B, the image data storage unit 161 is placed on a network-connected server device 50, and the accounting data input support processing is performed on a terminal device 100b different from the terminal device 100a connected to the reading device 20 that read the original voucher. In this example it is also conceivable to omit the server device 50, place the image data storage unit 161 in the other terminal device 100a, and have the network-connected terminal device 100b read it out to perform the accounting data input support processing.

Since original vouchers normally relate to transactions that occurred at the client, the terminal device 100b and the other terminal device 100a are both assumed to be client-side terminal devices. However, the other terminal device 100a may be a terminal at the accounting firm or a terminal brought in by accounting firm staff, because the accounting firm may, as a service, take custody of original vouchers from the client and read them. Conversely, the terminal device 100b may be an accounting-firm terminal while the other terminal device 100a reads the original vouchers as a client-side terminal. Both the terminal device 100b and the other terminal device 100a may also be accounting-firm terminals.

In the embodiment of FIG. 5-C, image data is stored on the server device 50 directly over the network from a camera-equipped mobile device such as a smartphone (that is, the reading device 20), and the terminal device 100 reads it out for processing. The server device 50 in this case may be a NAS or a cloud service. The system configuration can thus take many forms and is not intended to be limiting; it suffices that an image obtained by reading the original voucher exists in some form. The description below takes the existence of such an image as its starting point and focuses on how the character recognition processing for original vouchers has been devised.

Next, the accounting data input support method of the first embodiment, for the case where the original voucher is a receipt, is described with reference to FIGS. 2 and 3. Whether the original voucher is a receipt or a bankbook is determined by the format determination means described above (corresponding to the layout analysis in step S06-02 of FIG. 6). It checks whether keywords such as "領収" (receipt) or "レシート" (receipt) are present (in the case of a receipt), or whether a keyword such as "残高" (balance) is present (in the case of a bankbook), and also takes into account whether the character strings are scattered irregularly (receipt) or neatly aligned in vertical columns (bankbook). Because general techniques suffice for this determination, the details are omitted. The left side of FIG. 2-A shows a typical restaurant receipt. Receipts are not standardized from store to store, but they usually print store information such as the store name and telephone number, the date, the individual order items and their amounts, the total amount, the amount tendered, and the change. Of these, the present invention focuses on the date and the total amount (shown as "現計" in the figure), which are important as accounting data. FIG. 2 outlines the processing that focuses on date information, and FIG. 3 outlines the processing that focuses on total amount information.

First, an outline of the processing that focuses on date information is given with reference to FIG. 2. The image data read by the reading device 20 is OCR-processed to generate recognition-candidate character strings. These candidates are then searched for date-related keywords (for example 年, 月, 日, "/", "-", "." (period), or English month notations such as Nov); similar processing is hereinafter called keyword detection processing (S101). If a date-related keyword is found, the run of characters containing that keyword is presumed to be date information and is extracted (S102). Processing then moves on to optimization of that character string.
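The keyword detection of steps S101 and S102 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `find_date_strings`, the regular expression, and the choice of treating each OCR line as one candidate string are assumptions made for the example.

```python
import re

# Date keywords named in the text (S101). Expressing them as one regular
# expression is an assumption made for this sketch.
DATE_KEYWORDS = re.compile(
    r"[年月日/.\-]|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
)


def find_date_strings(lines):
    """Return the OCR candidate lines that contain a date keyword (S101-S102)."""
    return [line for line in lines if DATE_KEYWORDS.search(line)]


ocr_lines = ["レストラン山田", "2013年9月6日", "×＋−年Oか6口", "現計 ￥4200"]
date_like = find_date_strings(ocr_lines)
```

Note that punctuation keywords such as "-" and "." also occur in telephone numbers and prices, so a real system would need the additional checks described later in the text.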

In the example of FIG. 2-A, the actual receipt reads "2013年9月6日" (September 6, 2013), and in step S102 the system presumes from the scanned image that this portion is date information. However, depending on factors such as the condition of the receipt and the scan-accuracy setting, the initial read may recognize this portion as "2DJ8年9月Q日" or "×＋−年Oか6口". The optimization processing is performed with such cases in mind. Even in the latter case, the string can still be extracted as date information because it contains the character 年.

In the optimization processing performed under the assumption that the string is date information, date information consists of digits, so the recognition candidates obtained by assuming that every character in the string is a digit are used (S102) and the recognition candidates are optimized (S103). That is, the character recognition dictionary is fixed to, for example, the digit-only dictionary shown on the right side of FIG. 11, so that no non-digit character can become a candidate. Multiple feature points are then extracted from the image data for each character, the value computed from them as a feature quantity is compared with the feature quantities in the dictionary data, and the digit with the highest similarity is chosen. If the features do not match any digit to a sufficient degree, a recognition error results. In this way the digits are extracted accurately from the presumed string, and the recognized text becomes something like "2013?9?6?" (S103 (イ)).
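The idea of restricting recognition to a digit-only dictionary and reporting a recognition error when nothing is similar enough can be sketched as below. The text does not specify the feature representation or the similarity measure; the three-element feature vectors, cosine similarity, the 0.9 threshold, and the name `recognize` are all assumptions chosen only to make the example runnable.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recognize(features, dictionary, threshold=0.9, unknown="?"):
    """Return the most similar dictionary character, or `unknown` when no
    entry reaches the threshold (the recognition-error case)."""
    best_char, best_sim = unknown, threshold
    for char, reference in dictionary.items():
        sim = cosine(features, reference)
        if sim >= best_sim:
            best_char, best_sim = char, sim
    return best_char


# Toy digit-only dictionary with made-up feature vectors.
DIGIT_DICTIONARY = {"0": [1.0, 0.0, 0.0], "1": [0.0, 1.0, 0.0]}
```

Because the dictionary holds only digits, a kanji such as 年 can never be returned as a candidate; it either resembles some digit closely enough or comes back as "?".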

Here each "?" corresponds to a kanji such as 年, 月, or 日, and represents a case where no digit candidate could be confirmed (no candidate with high similarity was found). The characters at these positions may be filled in from the candidates obtained separately by recognizing the string with the normal recognition dictionary (step S103 (ロ)), completing the date as "2013年9月6日". In this way, a predetermined estimation process is performed in response to keyword detection. As processing premised on the string containing digits, such as a date, the characters are provisionally assumed to be digits and recognized with the digit recognition dictionary to generate candidates, and these digit-like candidates are handled preferentially, which improves the accuracy of date recognition. The keyword detection of step S101 can use the result of word matching applied to candidates generated by character recognition with the normal recognition dictionary. The recognition with the digit dictionary under the digit assumption may be run in advance, before the date keyword is detected, or after it is detected.
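The combination of the two passes in step S103, digits from the digit-only pass (イ) and the remaining positions from the normal-dictionary pass (ロ), can be sketched as a position-by-position merge. The function name `merge_candidates` and the assumption that both passes yield strings of equal length (one character per segmented glyph) are illustrative, not taken from the source.

```python
def merge_candidates(digit_pass, normal_pass, unknown="?"):
    """Keep digits from the digit-only pass and fill each unknown position
    with the character from the normal-dictionary pass."""
    if len(digit_pass) != len(normal_pass):
        raise ValueError("both passes must cover the same character positions")
    return "".join(
        normal if digit == unknown else digit
        for digit, normal in zip(digit_pass, normal_pass)
    )


merged = merge_candidates("2013?9?6?", "2DJ8年9月Q日")
```

With the strings from FIG. 2-A, the misread characters D, J, and Q from the normal pass are overridden by the digits 0, 1, and 6, while 年, 月, and 日 are taken from the normal pass.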

The processing that gives priority to digit-like recognition candidates is not limited to the above example; the following processing may also be used. FIG. 2-B is a reference diagram for explaining a variation of the recognition candidate optimization processing. When character recognition is performed on the receipt with the normal recognition dictionary, without using candidates generated with the digit recognition dictionary under the digit assumption, the system compares the feature quantity generated from the feature points of each character with the dictionary data and lists the characters whose feature points match as candidates. In this process, multiple candidate character strings arise, as shown on the left side of FIG. 2-B.

In recognizing the date portion, instead of using the digit-assumed candidates of FIG. 2-A (S103 (イ)), the optimization may raise the priority of the candidate that contains the most digits. More specifically, in the example of FIG. 2-B, suppose the initial candidates, in descending order of feature-point matches, are "2DJ8年9月Q日", "×＋−年Oか6口", and "2013年9か6日". A conventional system would adopt the best-matching "2DJ8年9月Q日" as the final candidate. The present invention, however, presumes that this portion is date information, so it raises the priority of the candidate with the most digits, "2013年9か6日", and makes it the leading candidate.
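The FIG. 2-B variation amounts to re-ranking the candidate strings by digit count. A minimal sketch, assuming the candidates arrive as a list already ordered by feature-point match (the name `prefer_most_digits` is hypothetical):

```python
def count_digits(text):
    """Number of digit characters in the string (str.isdigit also accepts
    full-width digits such as '６')."""
    return sum(ch.isdigit() for ch in text)


def prefer_most_digits(candidates):
    """Return the candidate with the most digits. max() keeps the earliest
    candidate on ties, so the original match order breaks ties."""
    return max(candidates, key=count_digits)


candidates = ["2DJ8年9月Q日", "×＋−年Oか6口", "2013年9か6日"]
best = prefer_most_digits(candidates)
```

For the three candidates in the text the digit counts are 3, 1, and 6, so the third candidate is promoted even though it ranked last by feature-point match.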

The optimization processing for reading the total amount on a receipt is described with reference to FIG. 3. The system searches the OCR data of the receipt shown on the left side of FIG. 3-A for keywords related to the total amount (S201). In one embodiment, the keywords related to the total amount, including strings such as "合計", "現計", and "買上計", are registered in advance in a keyword dictionary in the system (see FIG. 12). When such a keyword is detected, the character string to the right of the keyword is presumed to be the number for the total amount (S202). Because a total amount string by convention often contains characters such as "￥" or "円", the system may check whether the string to the right of the keyword contains these characters and presume it to be the total amount only if it does.
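Steps S201 and S202 can be sketched as a keyword lookup followed by taking the text to the right of the keyword. The keyword list below contains only the three examples named in the text; the function name `find_total_string`, the line-based input, and the `require_currency_mark` flag are assumptions for illustration.

```python
import re

# The three example keywords from the text; FIG. 12 lists more.
TOTAL_KEYWORDS = ["合計", "現計", "買上計"]
CURRENCY_MARK = re.compile(r"[￥¥円]")


def find_total_string(lines, require_currency_mark=False):
    """Return the text to the right of the first total-amount keyword,
    or None when no keyword is found (S201-S202)."""
    for line in lines:
        for keyword in TOTAL_KEYWORDS:
            index = line.find(keyword)
            if index == -1:
                continue
            right = line[index + len(keyword):].strip()
            if require_currency_mark and not CURRENCY_MARK.search(right):
                continue
            if right:
                return right
    return None


receipt = ["生ビール ￥500", "現計 ￥4,200", "お預り ￥5,000"]
total_text = find_total_string(receipt)
```

The optional currency-mark check corresponds to the last sentence of the paragraph above: a string such as "3点" to the right of "合計" would be skipped when the flag is set.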

In the keyword detection processing, which checks whether a string matching a keyword registered in the keyword dictionary exists, recognition candidates are first generated character by character. Matching is then performed at the word level, on combinations of multiple characters, using a word dictionary, which improves the accuracy of the candidates before they are searched (see step S06-05 of FIG. 6).

If all the digits of the total amount have been read clearly by the end of S202, nothing more is needed, but digit-specific optimization of the recognition candidates may additionally be performed (S203). That is, the digit-only dictionary shown on the right side of FIG. 11 is applied to the presumed string to raise the recognition accuracy for digits. More specifically, digit-assumed candidates are either generated in advance or generated when a keyword indicating the total amount is detected, by assuming that the string to its right is the amount, and these candidates are used preferentially. In the example of FIG. 3-A, in step S203 (イ) the digit-assumed candidate is "?4?200". The digit positions are used preferentially as candidates, and the low-confidence "?" positions are replaced with the non-digit characters "￥" and "," taken from the candidates obtained with the kanji-inclusive dictionary, which are reliable at those positions, yielding "￥4200" as the recognition result. In addition, checking whether digits follow the "￥" character or precede the "円" character can improve the accuracy of recognizing that the string is amount information.

Next, as in the example shown in FIG. 3-B, a detection process that substitutes for keyword detection when the receipt contains no keyword for the total amount is described. When the original voucher contains no string related to the total amount, as in this example, the size of each character is recognized during image reading and OCR processing. Ordinary OCR processing merely converts the characters in the image to text and pays no attention to character size, but on receipts the total amount may be printed as the largest number, so the present invention compares the sizes of the numbers and presumes the largest to be the total amount (S204). Because store names and the like are also often printed large, restricting the comparison to numbers improves recognition accuracy. Telephone numbers and dates may also be printed large. To exclude them, a string with a pattern such as "4 digits - 4 digits" may be treated as likely to be a telephone number and not presumed to be the total amount, and a string containing a date keyword (年, 月, 日, "/", etc.) may be excluded even if its font size is large. The same optimization processing as above is then applied to the presumed number string, and the recognition result is output. Although this embodiment mainly describes extracting the largest characters as important information, the thickest string, an italic string, or a string in a different font may be extracted as important information instead.
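The size-based fallback of step S204 can be sketched as follows, assuming (hypothetically) that the OCR stage returns each string together with its character height in pixels. The tuple format, the function name `guess_total_by_size`, and the two regular expressions are assumptions for the example.

```python
import re

PHONE_PATTERN = re.compile(r"\d{4}-\d{4}")  # "4 digits - 4 digits"
DATE_KEYWORD = re.compile(r"[年月日/]")


def guess_total_by_size(items):
    """Pick the largest-printed numeric string that does not look like a
    telephone number or a date. `items` is a list of (text, height) pairs."""
    best = None
    for text, height in items:
        if not any(ch.isdigit() for ch in text):
            continue  # compare numbers only, so store names are skipped
        if PHONE_PATTERN.search(text) or DATE_KEYWORD.search(text):
            continue  # exclude telephone numbers and dates
        if best is None or height > best[1]:
            best = (text, height)
    return best[0] if best else None


items = [
    ("レストラン山田", 40),
    ("03-1234-5678", 30),
    ("2013/9/6", 30),
    ("￥500", 20),
    ("￥4,200", 36),
]
guess = guess_total_by_size(items)
```

The store name is the tallest string but contains no digits, and the telephone number and date are filtered by pattern, so the remaining largest numeric string is returned.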

The date and total amount detected in this way are passed to the accounting software and automatically reflected in the two-pane accounting input screen shown in FIG. 1. As in the screen example at the bottom of FIG. 1, the original voucher (here a receipt) is displayed on the left side of the screen, and the date information and total amount it contains are automatically reflected in the input screen on the right. The user checks against the original voucher that the date and total amount match and, if necessary, corrects the automatically entered date or amount information by keyboard, touch input, or the like. For date information in particular, a receipt may carry several printed dates, such as the issue date, the date the transaction occurred (for example the purchase date), or a point expiration date. Therefore, besides allowing the transaction date to be entered manually by keyboard or the like, multiple date candidates may be displayed for selection. Depending on a company's accounting practice, the settlement date rather than the transaction date on the receipt may be entered; in that case, besides manual correction, the date of data entry may be reflected automatically.

Other information (store name, who was present, purpose, payment method, and so on) is then entered to create the accounting data. In this example the input screen takes a question format based on 5W1H (when, where, with whom, how, and for what), and the user creates accounting data by selecting or directly entering the answers to the questions. The details of this question-style slip are not a characterizing feature of the present invention and are disclosed in detail in other applications by the present applicant, so their description is omitted here. The accounting data created in this way is saved in the accounting data storage unit 166 of the system. When the accounting data is created on the client side, it is either written to a storage medium via the output unit 130 and handed to the accounting office, or sent to the accounting office over the network via the communication unit 140.

Although the above description used a restaurant receipt as the example original voucher, the present invention can also be applied to other original vouchers, such as receipts with handwritten amounts, invoices, and payment slips for various taxes. The processing method in those cases is the same as above.

Details of the method for supporting data input from original vouchers are described with reference to FIGS. 6 to 9. FIG. 6 is a flowchart for explaining the overall processing of the method according to the present invention. First, the image of the slip (original voucher) read by the reading device 20 is captured (S06-01). As preprocessing, character strings are segmented from the image (S06-02), and then individual characters are segmented (S06-03). String segmentation determines where in the image the character strings (that is, clusters of dots) are located, and character segmentation performs pitch determination such as full-width versus half-width. A known segmentation technique for this purpose is the projection method, in which the fine dot clusters that make up a character string are sampled from the X-axis side and the Y-axis side and turned into histograms to locate the dot clusters. In this preprocessing, a broad document type determination, such as whether the document is a receipt or a bankbook, may also be performed (see step S07-01 of FIG. 7).
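The projection method mentioned above can be illustrated on a tiny binary image. This is a generic sketch of projection-profile segmentation, not the patent's specific implementation; the image representation (a list of rows of 0/1 pixels) and the function names are assumptions.

```python
def projection(image, axis):
    """Sum dark pixels along rows (axis='rows') or columns (axis='cols')."""
    if axis == "rows":
        return [sum(row) for row in image]
    return [sum(column) for column in zip(*image)]


def nonzero_runs(histogram):
    """Return (start, end) index pairs, end exclusive, of consecutive
    nonzero histogram bins; each run is one text line or one character."""
    spans, start = [], None
    for index, value in enumerate(histogram):
        if value and start is None:
            start = index
        elif not value and start is not None:
            spans.append((start, index))
            start = None
    if start is not None:
        spans.append((start, len(histogram)))
    return spans


# 1 = dark pixel. One text line containing a wide glyph and a narrow glyph.
image = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
line_spans = nonzero_runs(projection(image, "rows"))
char_spans = nonzero_runs(projection(image, "cols"))
```

The row projection locates the text line (S06-02), and the column projection within that line locates the individual characters (S06-03); the differing span widths are what a full-width versus half-width pitch decision would be based on.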

Next, as recognition processing, character recognition in units of one character is performed on the character strings contained in the image (S06-04). This processing is realized by character recognition, dictionary handling, and recognition candidate generation. As described above, in addition to character recognition in normal mode (candidate generation using a recognition dictionary that includes kanji and the like), digit-assumed candidates may be generated as needed by applying the digit-only dictionary, in order to perform character recognition under the assumption that the string consists of digits; this improves recognition accuracy. As also noted above, the generation of digit-assumed candidates may be executed each time a string is presumed to be date information or (total) amount information, or it may be executed in advance for all character strings.

Next, word matching at the word level is performed on the individually recognized characters, and the recognition candidates are adjusted to improve the accuracy of candidates for strings corresponding to keywords such as "合計" (S06-05). The strings of interest are then narrowed down, either by a keyword search for the date or the total amount or, when no string such as "合計" is found even though the voucher type is known to be "領収書" (receipt) or the like, by identifying the largest number string (circled number 1 under (1) and circled number 1 under (2) of step S06-06). For date information this corresponds to step S101 of FIG. 2-A; for total amount information it corresponds to step S201 of FIG. 3-A or to the keyword-detection substitute processing of FIG. 3-B. This processing determines the strings of interest, such as the date or the total amount.

Optimization processing is then performed on the strings of interest (circled numbers 2 to 3 under (1) and circled numbers 2 to 3 under (2) of S06-06). That is, for a string presumed by keyword detection to be a date or a total amount, optimization assuming digits is performed when a keyword is present; when no keyword is present, attention is given to, for example, the largest number string, and optimization is performed assuming it consists of digits. The optimal recognition candidate is thus determined and output (S06-07).

FIG. 7 is a diagram for explaining step S06-04 of FIG. 6 in more detail. First, the document type of the scanned image is determined (S07-01). For example, the formats of the OCR slips supported by the accounting software are registered in the system in advance, and the determination is made by reading an identifier in the image to decide which slip format it is, by judging from the image size and the frame layout that it is a bankbook, or by judging that it is a slip (original voucher) whose format cannot be identified. The document type may be determined automatically in this way, or a selection screen (bankbook, receipt, and so on) may be presented so that the user enters the type after looking at the voucher image. If the document is an OCR slip supported by the system (S07-02: Y), what is written where is known in advance, so the system reads the information in the necessary portions and reflects it automatically in the accounting input screen (S07-12). This processing itself is known art, and its detailed description is omitted in this specification.

When the scanned image is a slip (original voucher) whose format cannot be identified, character recognition using a dictionary that includes kanji and the like is performed first (S07-03 to S07-07). FIG. 11 shows an example structure of the character recognition dictionaries. In FIG. 11, the left side shows the registered contents of the character recognition dictionary that includes kanji and the like, and the right side shows the registered contents of the character recognition dictionary for digits. As shown in FIG. 11, a character code, a feature quantity, and other data are registered for each character. The digit dictionary registers only digits, or only digits and mathematical symbols, and no kanji or kana, so it holds little information. Processing that uses it is therefore faster, and because it never proposes non-digit candidates, misrecognition of digits can be effectively avoided. The dictionary that includes kanji and the like may be a general OCR dictionary, but because the present invention concerns accounting data input support, it may be specialized to contain only characters likely to appear on original vouchers. This speeds up processing and improves the recognition rate even when the kanji-inclusive dictionary is used.

Returning to FIG. 7, in step S07-03 the character recognition dictionaries for slips (original vouchers) whose format cannot be identified (that is, the dictionaries of FIG. 11) are obtained. First the dictionary that includes kanji and the like is set (S07-04), and character recognition is performed (S07-05 to S07-07). Next, to perform character recognition under the digit assumption, the digit dictionary is set (S07-08), and character recognition is performed (S07-09 to S07-11). Character recognition of the original voucher image is carried out in this way. Digit-assumed candidates can be generated with high accuracy where the characters really are digits. Therefore, a string containing date information, or the string to the right of a string such as "合計" or else the largest number string, is presumed to consist of the digits of the date or the total amount, and digit-assumed candidates are deliberately generated for it and handled preferentially, which improves recognition accuracy. As noted above, the generation of digit-assumed candidates may be executed each time a string is presumed to be date information or (total) amount information, or it may be executed in advance for all character strings.

FIG. 8 is a flowchart for explaining the post-processing of FIG. 6 (S06-05 to S06-06) when character recognition is performed on the date information of a receipt. As described above, keywords for extracting the important information in accounting data (for example the date and the total amount) are registered in the system in advance. FIG. 12 shows an example of this keyword dictionary. As shown in the figure, keywords such as "合計", "現計", "買上計", "領収金額", and "支払" are registered as keywords indicating the total amount, and keywords such as "日、月、年", "／", "−", and "．" are registered as keywords related to the date. As FIG. 12 shows, the keywords carry priorities (first, second, and so on). For date information, for example, a receipt may contain several dates, such as the issue date, the transaction date, and a point expiration date, and deciding which of them is meaningful as accounting information is difficult from the standpoint of character recognition, because the dates are not necessarily labeled as the issue date or the transaction date. Based on statistical analysis of receipts, a string extracted according to these priorities is likely to be the transaction date and can therefore be used as the transaction date information. Similarly, priorities are assigned to keywords such as "合計" and are used, with weighting as appropriate, to efficiently identify strings that are likely to be the digits of the total amount. The system sets the keyword dictionary for dates (S08-01) and performs a keyword search on the original voucher on which character recognition has been performed (S08-02). If a keyword hits, that is, if characters such as "月" or "日" are detected in some string (S08-03: Y), estimation processing according to the keyword (step S102 of FIG. 2) is performed (S08-04).

In the estimation process of step S08-04, in one embodiment, the processing listed in the estimation process table of FIG. 13 is performed according to the extracted keyword. That is, the system is configured so that an estimation process is selected dynamically for each voucher type and each recognition target, and an optimization process linked to the selected estimation process is likewise selected dynamically. More specifically, as shown in FIG. 13, when a keyword relating to the date information of a formal receipt is extracted, for example, optimization processing program ID P002 is selected, the parts other than "年", "月", and "日" (year, month, day) are estimated to be numerals, and the optimization process described with reference to FIG. 2 is performed. The accuracy of the estimation may be improved by also taking into account whether the character immediately before "年", "月", or "日" has a numeral among its recognition candidates.

The optimization process described with reference to FIG. 2 is then performed according to the keyword (S08-05). As shown in FIG. 2-A, both the recognition candidates obtained on the assumption that the characters are numerals and the recognition candidates obtained on the assumption that they are kanji and the like are considered, and for the parts estimated to be numerals the numeral-assumption candidates are adopted, which improves recognition accuracy. Alternatively, as shown in FIG. 2-B, the numeral-assumption candidates need not be used; instead, among the candidates produced by recognition with the ordinary dictionary, the priority of candidates containing many numerals may be raised.

Further, on the premise that "年" is preceded by four digits (or the last two digits of the year) and that "月" and "日" are each preceded by two digits (or a single digit), other recognition candidates may be excluded, or the character string may be judged not to be date information. In addition, because the original vouchers for which an accounting firm or its client enters journals usually relate to the recent past, the system may check whether the recognized date falls within a given range (for example, within the last five years) and use the result to assess the reliability of the determination and of the extraction. If no keyword relating to date information is found in step S08-03, the user may enter the date manually on the two-pane accounting input screen shown in FIG. 1.
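
A minimal sketch of the date estimation described above is given below. It assumes that the general dictionary and the numeral dictionary each yield one reading of the same length; the helper names, the treatment of two-digit years as 20xx, and the five-year window parameter are all illustrative assumptions, not details taken from FIG. 2 or FIG. 13.

```python
import re
from datetime import date

DATE_MARKS = "年月日"


def prefer_digits(general, numeric):
    """Keep 年/月/日 from the general-dictionary reading and take the
    numeral-dictionary reading for every other position."""
    return "".join(g if g in DATE_MARKS else n for g, n in zip(general, numeric))


def parse_date(text, today, max_age_years=5):
    """Return a date if `text` has the form Y年M月D日 with plausible digit
    counts and lies within the last `max_age_years` years, else None."""
    m = re.fullmatch(r"(\d{4}|\d{2})年(\d{1,2})月(\d{1,2})日", text)
    if m is None:
        return None  # digit counts do not fit: treat as "not date information"
    year, month, day = (int(g) for g in m.groups())
    if year < 100:
        year += 2000  # assumption: two-digit years are the last two digits of 20xx
    try:
        d = date(year, month, day)
    except ValueError:
        return None  # e.g. month 13 or day 32
    oldest = (today.year - max_age_years, today.month, today.day)
    if oldest <= (d.year, d.month, d.day) and d <= today:
        return d
    return None


# "O" and "l" are misreadings from the general dictionary; the numeral
# dictionary reads every position as a digit, including the 年/月/日 positions.
fixed = prefer_digits("2O13年l2月17日", "20134128178")
```

Here `fixed` becomes "2013年12月17日", which `parse_date` accepts for a reference date in early 2014 and rejects for a reference date more than five years later.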

FIG. 9 is a flowchart explaining the post-processing of FIG. 6 (S06-05 to S06-06) when character recognition is performed on the total amount information of a cash-register receipt or formal receipt. As in FIG. 8, the keyword type is set and keywords are detected with reference to the keyword dictionary (FIG. 12) (S09-01 to S09-03). When a keyword relating to the total amount such as "現計" (current total) or "合計" (total) is present, the estimation and optimization processes corresponding to the keyword are executed, for example taking the character string (digit string) to the right of the keyword as the total amount (S09-04 to S09-05), and recognition candidates are generated per character string (S09-06).

When the keyword search finds no keyword relating to total amount information (S09-03: N), important information that can stand in for a keyword is detected as a fallback (S09-08). That is, as shown in FIG. 13, when a formal receipt has no keyword for the total amount, the digit string printed in the largest characters is estimated to be the total amount (S09-09, optimization processing program ID P011). In more detail, the largest digit string is found by comparing the sizes of the characters in the original voucher image one character at a time; by comparing sizes in units of the segmented character string groups, keeping the larger groups, and extracting the group that remains at the end; or by computing the average size over all character strings and then extracting and comparing several clusters of large characters. Because the total amount is numeric, restricting the detection to digit strings makes it possible to exclude items such as the shop name.

Further, when only one large character string is found, that string may be estimated to be the total amount information; when, for example, three large character strings are found, the string to the right of centre or the string in the uppermost position may be estimated to be the total amount information. This reflects the expectation that the numbers printed relatively large on a receipt are the total amount, the amount tendered, and the change, and that of these the total amount is usually printed highest. Once the total amount information has been estimated in this way, the recognition candidates are optimized (S09-10) and recognition candidates are generated per character string (S09-06). The above description treats the size of the character string (character size) as the important information, but the thickness of the characters may also be considered, either alone or in combination with character size.
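
The two paths of FIG. 9 (keyword hit, and the size-based fallback) might be sketched as below. The `TextBox` structure, the 0.9 height tolerance, and the choice of the topmost large string as the tie-breaker are assumptions made for this example; the text also allows other choices, such as the string to the right of centre.

```python
import re
from dataclasses import dataclass

TOTAL_KEYWORDS = ("合計", "現計", "買上計")  # subset of FIG. 12, assumed priority order


@dataclass
class TextBox:
    text: str      # recognized character string
    top: float     # y coordinate of the upper edge (smaller = higher on the voucher)
    height: float  # character height measured before size normalisation


def to_amount(text):
    """Strip everything except digits and return an int, or None if no digits."""
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None


def estimate_total(boxes):
    # 1) Keyword path: take the digits to the right of the best-ranked keyword.
    for keyword in TOTAL_KEYWORDS:
        for box in boxes:
            _, found, right = box.text.partition(keyword)
            if found and to_amount(right) is not None:
                return to_amount(right)
    # 2) Fallback: consider only numeric strings, which excludes the shop name.
    numeric = [b for b in boxes
               if re.fullmatch(r"[¥\d,\s]+", b.text) and to_amount(b.text) is not None]
    if not numeric:
        return None
    tallest = max(b.height for b in numeric)
    large = [b for b in numeric if b.height >= 0.9 * tallest]  # arbitrary tolerance
    # Total, amount tendered and change are all printed large; the total is usually highest.
    return to_amount(min(large, key=lambda b: b.top).text)


with_keyword = [TextBox("スーパーABC", 0, 30), TextBox("合計 ¥1,080", 100, 20)]
without_keyword = [
    TextBox("スーパーABC", 0, 40),
    TextBox("108", 80, 12),
    TextBox("1,080", 100, 30),
    TextBox("2,000", 130, 30),
    TextBox("920", 160, 30),
]
```

With these inputs both `estimate_total(with_keyword)` and `estimate_total(without_keyword)` yield 1080: the first through the keyword, the second because "1,080" is the topmost of the three large digit strings.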

Character sizes may be compared, for each segmented character, by considering only its vertical height, only its horizontal width, or both height and width (that is, its area); the determination may also weight any of these elements (height, width, area, and so on). The height, width, and area of a segmented character are normally judged on the image after the skew introduced at scanning has been corrected, but the uncorrected image may also be used. Needless to say, although character size is sometimes adjusted and normalized in order to compare feature values against the recognition dictionary, the comparison of character sizes described here is carried out on the characters before normalization.
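
One possible way to express the weighted size comparison is sketched below. The `Glyph` record, the weight parameters, and the linear combination are assumptions for illustration; the text only states that height, width, or area may be used and that any of them may be weighted.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    height: float  # measured on the image before size normalisation
    width: float   # measured on the image before size normalisation


def size_score(g, w_height=1.0, w_width=0.0, w_area=0.0):
    """Weighted combination of height, width and area (height * width)."""
    return w_height * g.height + w_width * g.width + w_area * g.height * g.width


def largest(glyphs, **weights):
    """Return the glyph with the highest size score under the given weights."""
    return max(glyphs, key=lambda g: size_score(g, **weights))


glyphs = [Glyph("1", 30, 8), Glyph("計", 24, 24)]
by_height = largest(glyphs).char                       # height only (default weights)
by_area = largest(glyphs, w_height=0, w_area=1).char   # area only
```

The example shows why the choice of measure matters: a tall, narrow digit wins on height, while a square kanji wins on area.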

Next, an accounting data input support method according to a second embodiment of the present invention, for the case where the original voucher is a bank passbook, is described with reference to FIG. 4 and FIG. 10. When the original voucher is a passbook, the processing of the flowchart of FIG. 6 is still basically carried out; the processing flow of FIG. 10 is a detailed version of step S06-06 of FIG. 6. As shown for example in FIG. 4-A, a passbook is divided for each transaction into columns such as "日付" (date), "支払" (payment), "預入" (deposit), and "残高" (balance). However, the pre-printed column names are often printed so faintly that they cannot be read, and the positions of the payment and deposit columns may be reversed depending on the financial institution, which makes the passbook difficult to handle as a fixed-format original voucher. The present invention is therefore characterized in that, on the assumption that the leftmost column is the date and the rightmost column is the balance, the payment and deposit columns between them are determined automatically so that the entries can be captured as accounting data.

For the scanned passbook image shown in FIG. 4-A, the starting point is as follows. In the OCR recognition process, for example, the recognition-candidate character strings are searched for strings that match keyword strings such as date, balance, and deposit. If a string matching one of these keywords is detected, the digit strings located below it are taken to be the content for that keyword (date, balance, payment, deposit, and so on) and are reflected in the corresponding fields of the two-pane accounting input screen (FIG. 1).

On the other hand, even when the voucher type is known to be a passbook, strings such as "残高" (balance) may be lost during pre-processing such as noise removal or density adjustment, so that no string matching a keyword is found. In that case the character strings are segmented row by row (S301); the leftmost string is estimated to be the date, the rightmost string the balance, and the two in the middle the payment and the deposit (S302; see the estimation process table of FIG. 13). The priority of the numeral recognition candidates for these strings is then raised to adjust the candidates (S303). The optimization of recognition candidates for this adjustment is the same as step S103 of FIG. 2-A for date information and step S203 of FIG. 3-A for amount information, so the details are omitted. As a result, as shown at S304 of FIG. 4-A, a recognition result with the date and the balance fixed is output, but at this stage it cannot yet be decided whether each of the two strings in between is a payment or a deposit.
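
Steps S301 to S303 could be sketched roughly as follows, assuming each cell of a segmented row arrives as a list of recognition candidates ordered best first. The function names and the set of characters counted as numeric are assumptions; promotion by a stable sort is one simple stand-in for "raising the priority" of numeral candidates.

```python
NUMERIC_CHARS = set("0123456789,.-/")


def is_numeric(text):
    """True if the string is non-empty and made only of digits and separators."""
    return bool(text) and all(ch in NUMERIC_CHARS for ch in text)


def promote_numeric(candidates):
    """Move numeric candidates ahead of the others, keeping their relative order."""
    return sorted(candidates, key=lambda c: not is_numeric(c))


def estimate_row(cells):
    """cells: one candidate list per column, left to right (empty list = blank cell).

    The leftmost cell is taken as the date and the rightmost as the balance;
    the cells in between stay undetermined (payment or deposit).
    """
    if len(cells) < 3:
        raise ValueError("expected at least a date, one amount and a balance")
    best = [promote_numeric(c)[0] if c else "" for c in cells]
    return {"date": best[0], "undetermined": best[1:-1], "balance": best[-1]}


row = estimate_row([
    ["25-l-1O", "25-1-10"],   # top candidate contains misread letters
    [],                       # blank cell
    ["3O,OOO", "30,000"],
    ["130,000"],
])
```

In this sketch the result fixes the date and balance and leaves the middle cells as `["", "30,000"]`, matching the state described at S304 where payment and deposit are still undecided.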

The present invention therefore further determines, as outlined in FIG. 4-B, whether the balance rises or falls between consecutive rows (S403). If the balance has increased from the previous row, the number between the date and the balance in that row is taken to be a deposit; conversely, if the balance has decreased, it is taken to be a payment; the content of the column is fixed accordingly (S404). Recognition accuracy may be improved by adding a cross-check that determines whether the actual change in balance (30,000 yen in the example of step S403 of FIG. 4-B) matches the amount "30,000" yen in the column to its left (the second column from the right). The keyword-detection fallback process S401 is the same as step S301 of FIG. 4-A, and the estimation process S402 is the same as step S302 of FIG. 4-A, so the details are omitted.
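
The balance-trend rule of S403 to S404, together with the optional cross-check, can be illustrated with the short sketch below. The tuple layout `(amount, balance)` and the label strings are assumptions for the example.

```python
def classify_rows(rows):
    """rows: list of (amount, balance) integers in passbook order.

    For every row after the first, return (kind, verified) where kind is
    "deposit" if the balance rose, "payment" if it fell, and verified tells
    whether the change in balance equals the recognized amount.
    """
    results = []
    for (_, prev_balance), (amount, balance) in zip(rows, rows[1:]):
        delta = balance - prev_balance
        if delta > 0:
            kind = "deposit"
        elif delta < 0:
            kind = "payment"
        else:
            kind = "unknown"  # no change in balance: the rule gives no answer
        results.append((kind, abs(delta) == amount))
    return results


labels = classify_rows([
    (0, 100_000),
    (30_000, 130_000),  # balance up by 30,000 and amount 30,000: verified deposit
    (5_000, 125_000),   # balance down by 5,000: verified payment
    (7_000, 120_000),   # balance down by 5,000 but amount read as 7,000: mismatch
])
```

A failed cross-check, as in the last row, is a hint that either the amount or one of the balances was misrecognized and may deserve review on the input screen.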

FIG. 10 is a flowchart explaining in detail the character recognition of transaction information in the passbook. When no keyword serving as a column heading is found during character recognition of the passbook image data, important information that can stand in for a keyword is detected as a fallback (S10-01). For example, the leftmost character string is estimated to be the date and the others are estimated to be amounts (S10-02). At this point the rightmost string may also be estimated to be the balance, and each of the two in the middle to be either the payment or the deposit. Next, for each row, the optimization process corresponding to the important information is performed and recognition candidates are generated (S10-03 to S10-05).

Next, with the rightmost character string in each row estimated to be the balance, the system focuses on, for example, any row whose preceding row is not empty, determines whether the balance rises or falls between that row and the row above it, and from the result decides which column is the payment column and which is the deposit column (S10-06 to S10-09). In this way the content of the passbook's heading row is fixed and the meaning of the character strings printed below it is fixed, so that the system can automatically extract the date and amount according to the nature of each transaction and reflect them on the accounting input screen.
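
A sketch of how steps S10-06 to S10-09 might decide the column headings is shown below. It assumes each row has already been reduced to two middle cells and a balance, with `None` for a blank or unreadable cell; the function name and return format are illustrative only.

```python
def decide_columns(rows):
    """rows: list of (middle_left, middle_right, balance) in passbook order.

    Returns the headings of the two middle columns, e.g. ("payment", "deposit"),
    or None if no pair of consecutive rows allows a decision.
    """
    for prev, cur in zip(rows, rows[1:]):
        if prev[2] is None or cur[2] is None:
            continue  # the row above (or this row) has no usable balance
        delta = cur[2] - prev[2]
        if delta == 0:
            continue
        if cur[0] is not None and cur[1] is None:
            filled = 0
        elif cur[1] is not None and cur[0] is None:
            filled = 1
        else:
            continue  # both or neither middle cell filled: skip this row
        kind = "deposit" if delta > 0 else "payment"
        other = "payment" if kind == "deposit" else "deposit"
        return (kind, other) if filled == 0 else (other, kind)
    return None


headings = decide_columns([
    (None, None, 100_000),
    (None, 30_000, 130_000),  # balance rose and the right cell is filled
])
```

Here `headings` is `("payment", "deposit")`: the rising balance shows that the filled right-hand cell is a deposit, so the left-hand column must be the payment column.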

As described above, original vouchers such as receipts differ from OCR forms, in which ruled frames fix the position and content of each entry: numerals are hard to distinguish from kanji and other characters, and the recognition rate is low. The information that matters for accounting, however, is the date and the total amount. By detecting the keywords for these items and performing the estimation process corresponding to each keyword (or, when no keyword is found, estimating the important information after a keyword fallback process), and by adding the optimization process corresponding to the estimation process (adjustments such as handling certain recognition candidates preferentially or raising their rank), sufficient recognition accuracy specialized for dates and total amounts can be obtained.

Further, when character recognition under the assumption that the characters are numerals is deliberately performed at a given stage, using a numeral-only character recognition dictionary from which kanji and the like have been removed is useful. Within the overall sequence, in which a given estimation process first establishes that a string is likely to be numeric and the numeral-assumption candidates are then handled preferentially, it raises the recognition rate for the numbers that matter for accounting, namely the transaction date and amount information such as the total. In addition, because receipts and similar vouchers have no defined format and their entries carry no assigned meaning, it is hard to identify and extract the data to be captured as accounting information; automatically extracting the items to be captured as accounting data (the date and the amount to be entered in the journal, such as the total amount) makes input easier.

Various embodiments and examples of the present invention have been described in detail above. The technical scope of the present invention is not, however, limited to those embodiments and examples. The present invention can be realized in various modified and altered forms without departing from the scope of the appended claims, and all such modifications and alterations should be understood to fall within the technical scope of the present invention.

The present invention is industrially applicable to computer systems that run accounting software at, for example, client companies of an accounting firm, and to accounting processing systems used when journal entries are input at an accounting firm.

1 Accounting data input support system
10 Original voucher
20 Reading device
50 Server device
100 Terminal device
110 Input unit
120 Display unit
130 Output unit
140 Communication unit
150 Control unit
160 Storage unit
161 Image data storage unit

Claims

An accounting data input support system, being an accounting processing support system comprising: format determining means for determining the format of an original voucher from a read image obtained by reading the original voucher; and character recognition means for recognizing characters from the read image using a character recognition dictionary and generating recognition-candidate text data, the character recognition dictionary comprising a first character recognition dictionary including kanji and the like and a second character recognition dictionary for numeral recognition, wherein,
when the image is a receipt,
the character recognition means generates, in addition to recognition-candidate text data obtained with the first character recognition dictionary as character strings including kanji and the like, recognition-candidate text data obtained with the second character recognition dictionary on the assumption that the characters are numerals,
the system further comprises date information extraction means for extracting a date data character string as date information from the recognition-candidate text data, and
total amount information extraction means for extracting a total amount data character string as total amount information from the recognition-candidate text data,
the date information extraction means extracts from the recognition-candidate text data, as information significant for accounting processing, a character string including a date keyword relating to the identification of date information, estimates the character string to be date information data, and performs text data optimization of the recognition candidates that preferentially handles the recognition candidates obtained when the estimated character string is assumed to be numerals relating to date information,
the total amount information extraction means, when the text data contains a character string including a total amount keyword relating to the identification of total amount information, estimates a character string in a specific relative positional relationship with that character string to be the total amount information as information significant for accounting processing, and, when the text data contains no character string including the total amount keyword, extracts a distinctive character string on the basis of character size or character thickness and estimates the extracted character string including numerals to be the total amount information,
and performs text data optimization of the recognition candidates that preferentially handles the recognition candidates obtained when the estimated character string is assumed to be numerals relating to total amount information.

The accounting data input support system according to claim 1, wherein, when the text data contains a character string including the total amount keyword relating to the identification of total amount information, the total amount information extraction means takes the right side as the specific relative positional relationship and determines the number to the right of the total amount keyword in that character string to be the total amount.

The accounting data input support system according to claim 1 or 2, wherein the first character recognition dictionary is constituted by a recognition dictionary having minimal definition data narrowed down to the characters used in original vouchers for accounting processing.

The accounting data input support system according to any one of claims 1 to 3, wherein the second character recognition dictionary is constituted by a recognition dictionary from which characters other than numerals have been removed, so as to improve the recognition rate when character recognition is performed on the assumption that the character string to be recognized consists of numerals.

The accounting data input support system according to any one of the preceding claims, wherein the date keyword includes at least one of "year", "month", "day", "/", ".", and "-".

The accounting data input support system according to claim 1, wherein the total amount keyword is any one of "total", "current total", and "purchase total".

An accounting data input support system, being an accounting processing support system comprising: format determining means for determining the format of an original voucher from a read image obtained by reading the original voucher; and character recognition means for recognizing characters from the read image using a character recognition dictionary and generating recognition-candidate text data, wherein
the character recognition dictionary comprises a first character recognition dictionary including kanji and the like and a second character recognition dictionary for numeral recognition,
when the image is a passbook,
the character recognition means generates, in addition to recognition-candidate text data obtained with the first character recognition dictionary as character strings including kanji and the like, recognition-candidate text data obtained with the second character recognition dictionary on the assumption that the characters are numerals,
the system further comprises date information extraction means for extracting a date data character string as date information from the recognition-candidate text data, and
amount information extraction means for extracting an amount data character string as amount information from the recognition-candidate text data,
the date information extraction means comprises, as a constituent element, passbook date information extraction means that extracts character string data in units of rows from the text data, further decomposes the character string data into item-name units for each column, and generates a passbook data character string relating to the date,
the amount information extraction means comprises, as a constituent element, passbook amount information extraction means that extracts character string data in units of rows from the text data, further decomposes the character string data into item-name units for each column, and generates a passbook data character string relating to the amount,
the passbook date information extraction means, when extracting character string data in units of rows, extracts the character string in the leftmost column as positionally significant information for accounting processing, estimates the character string to be date information, and performs text data optimization of the recognition candidates that preferentially handles the recognition candidates obtained when the estimated character string is assumed to be numerals relating to date information,
and the passbook amount information extraction means extracts the rightmost character string as positionally significant information for accounting processing, estimates the character string to be balance information, and generates the passbook data character string by performing text data optimization of the recognition candidates that preferentially handles the recognition candidates obtained when the estimated character string is assumed to be numerals relating to balance information.

The accounting data input support system according to claim 7, wherein the passbook amount information extraction means, when extracting character string data in units of rows from the text data, extracts an arbitrary row whose row above is not empty, and discriminates the payment information and the deposit information of the passbook data character string by comparing the balance information of the extracted arbitrary row with the balance information of the row above it.

The accounting data input support system according to claim 8, wherein the passbook amount information extraction means compares the balance information (A) included in the arbitrary row with the balance information (B) included in the row above it and, if (A) > (B), determines the item next to the balance information to be deposit information and generates the passbook data character string.

The accounting data input support system according to claim 9, wherein the passbook amount information extraction means generates the passbook data character string by regarding the remaining item of the passbook data character string as payment information.

An accounting data input support method in a computer system comprising: format determining means for determining the format of an original voucher from a read image obtained by reading the original voucher; and character recognition means for recognizing characters from the read image using a character recognition dictionary and generating recognition-candidate text data, the character recognition dictionary comprising a first character recognition dictionary including kanji and the like and a second character recognition dictionary for numeral recognition, the method comprising,
when the image is a receipt:
a step in which the character recognition means generates, in addition to recognition-candidate text data obtained with the first character recognition dictionary as character strings including kanji and the like, recognition-candidate text data obtained with the second character recognition dictionary on the assumption that the characters are numerals;
a step in which date information extraction means extracts, as positionally significant information, a date data character string as date information from the recognition-candidate text data;
a step in which total amount information extraction means extracts, as information significant for accounting processing, a total amount data character string as total amount information from the recognition-candidate text data;
a step in which the date information extraction means estimates a character string including a date keyword relating to the identification of date information in the recognition-candidate text data to be date information data, and performs text data optimization of the recognition candidates that preferentially handles the recognition candidates obtained when the estimated character string is assumed to be numerals relating to date information; and
a step in which the total amount information extraction means, when the text data contains a character string including a total amount keyword relating to the identification of total amount information, estimates a character string in a specific relative positional relationship with that character string to be the total amount information, and, when the text data contains no character string including the total amount keyword, extracts a distinctive character string on the basis of character size or character thickness and estimates the extracted character string including numerals to be the total amount information, and performs text data optimization of the recognition candidates that preferentially handles the recognition candidates obtained when the estimated character string is assumed to be numerals relating to total amount information.

An accounting data input support program for a computer system comprising: format determining means for determining the format of an original voucher from a read image obtained by reading the original voucher; and character recognition means for recognizing characters from the read image using a character recognition dictionary and generating recognition-candidate text data, the character recognition dictionary comprising a first character recognition dictionary including kanji and the like and a second character recognition dictionary for numeral recognition, the program causing the computer system to execute,
when the image is a receipt:
a step in which the character recognition means generates, in addition to recognition-candidate text data obtained with the first character recognition dictionary as character strings including kanji and the like, recognition-candidate text data obtained with the second character recognition dictionary on the assumption that the characters are numerals;
a step in which date information extraction means extracts, as positionally significant information, a date data character string as date information from the recognition-candidate text data;
a step in which total amount information extraction means extracts, as information significant for accounting processing, a total amount data character string as total amount information from the recognition-candidate text data;
a step in which the date information extraction means estimates a character string including a date keyword relating to the identification of date information in the recognition-candidate text data to be date information data, and performs text data optimization of the recognition candidates that preferentially handles the recognition candidates obtained when the estimated character string is assumed to be numerals relating to date information; and
a step in which the total amount information extraction means, when the text data contains a character string including a total amount keyword relating to the identification of total amount information, estimates a character string in a specific relative positional relationship with that character string to be the total amount information, and, when the text data contains no character string including the total amount keyword, extracts a distinctive character string on the basis of character size or character thickness and estimates the extracted character string including numerals to be the total amount information, and performs text data optimization of the recognition candidates that preferentially handles the recognition candidates obtained when the estimated character string is assumed to be numerals relating to total amount information.