JP5962449B2

JP5962449B2 - Determination program, determination method, and determination apparatus

Info

Publication number: JP5962449B2
Application number: JP2012251667A
Authority: JP
Inventors: 毅帯刀; 鉄平 ▲角▼本
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2012-11-15
Filing date: 2012-11-15
Publication date: 2016-08-03
Anticipated expiration: 2032-11-15
Also published as: JP2014099114A

Description

The present invention relates to a determination program, a determination method, and a determination apparatus.

The digitization of paper documents is progressing. In the medical field, for example, electronic medical record systems are increasingly being introduced. When an electronic medical record system is newly introduced, the paper documents generated in daily operations before the introduction are, as one example, scanned and thereby digitized into image data. The resulting electronic documents may be classified by indexes such as patient name, document name, and clinical department so that hospital staff can easily view them on the electronic medical record system.

As one example of a technique for supporting such classification, a technique is known that automates the classification of electronic documents by scanning, together with the paper documents, a sheet of paper on which a barcode is printed that represents index information of the paper documents to be scanned, such as the patient name, document name, or clinical department.

JP 2009-11874 A
JP 2007-87021 A

However, with the above technique, a barcode must be created and printed each time paper documents are scanned, which takes effort and consumes extra paper. There is therefore an inherent limit to how effectively the technique can support the classification of electronic documents.

In one aspect, an object is to provide a determination program, a determination method, and a determination apparatus that can effectively support the classification of electronic documents.

A determination program according to one aspect causes a computer to execute a process of reading the display content displayed on each of a plurality of display media and performing character recognition processing on each piece of display content that was read. The program further causes the computer to execute a process of determining that, when the character recognition results corresponding to a predetermined position common to a first display medium and a second display medium, which among the plurality of display media were not read consecutively, match at a first matching rate or higher, the character recognition results corresponding to the predetermined position indicate the same identification information. The program further causes the computer to execute a process of determining that, when the character recognition results corresponding to a predetermined position common to a third display medium and a fourth display medium, which among the plurality of display media were read consecutively, match at a second matching rate or higher, the second matching rate being lower than the first matching rate, the character recognition results corresponding to the predetermined position indicate the same identification information.

According to one embodiment, the classification of electronic documents can be effectively supported.

FIG. 1 is a diagram illustrating the configuration of the electronic medical record system according to the first embodiment.
FIG. 2 is a block diagram illustrating the functional configuration of the client terminal according to the first embodiment.
FIG. 3 is a diagram illustrating an example of keyword data.
FIG. 4 is a diagram illustrating an example of a document management master.
FIG. 5 is a diagram illustrating an example of extraction of identification information.
FIG. 6 is a flowchart illustrating the procedure of the reading process according to the first embodiment.
FIG. 7 is a flowchart illustrating the procedure of the determination process according to the first embodiment.
FIG. 8 is a diagram for explaining an example of a computer that executes the determination program according to the first and second embodiments.

A determination program, a determination method, and a determination apparatus according to the present application are described below with reference to the accompanying drawings. The embodiments do not limit the disclosed technology. The embodiments can be combined as appropriate to the extent that their processing contents do not contradict each other.

[System configuration]
FIG. 1 is a diagram illustrating the configuration of the electronic medical record system according to the first embodiment. In the electronic medical record system 1 shown in FIG. 1, the server device 10 provides an electronic medical record service that allows electronic medical records to be created, edited, and viewed on the client terminals 30A to 30C. In the electronic medical record system 1, the server device 10 thus manages the electronic medical records created after introduction of the system, and also manages electronic documents obtained by digitizing hospital documents that, before the introduction, were managed on display media such as paperwork, letters, and books.

As shown in FIG. 1, the electronic medical record system 1 accommodates a server device 10 and client terminals 30A, 30B, and 30C. In the following, the client terminals 30A to 30C may be collectively referred to as the "client terminal 30". Although the example of FIG. 1 illustrates three client terminals, the number of client terminals that the electronic medical record system 1 can accommodate is not limited to the illustrated number, and any number of client terminals can be accommodated.

The server device 10 and the client terminals 30 are connected via a network 5 so that they can communicate with each other. As the network 5, any communication network, wired or wireless, can be adopted, such as the Internet, a LAN (Local Area Network), or a VPN (Virtual Private Network).

The client terminal 30 is a computer on the side that uses the electronic medical record service. For example, the client terminal 30 is used by hospital personnel who have an account on the electronic medical record system 1, such as doctors and nurses and, in a university hospital, lecturers, associate professors, and professors. As the client terminal 30, not only fixed terminals such as personal computers but also mobile terminals such as smartphones, mobile phones, PHS (Personal Handyphone System) handsets, and PDAs (Personal Digital Assistants) can be adopted.

The server device 10 is a computer on the side that provides the electronic medical record service. In one aspect, the server device 10 can be implemented by installing, on a desired computer, an electronic medical record program that provides the electronic medical record service as packaged software or online software. For example, the server device 10 may be implemented as a Web server that provides the electronic medical record service, or as a cloud that provides the electronic medical record service through outsourcing.

Here, the hospital's paper documents are stored in, for example, a hospital storage room (not shown). For example, before the electronic medical record system 1 is introduced, various paper documents, such as referral letters that a doctor received from patients, medical records in which examination details are written, and consent forms for tests and surgery, are gathered in a wrapper such as a clear plastic folder and stored per patient. Among these, paper documents such as referral letters and consent forms bear the handwritten signature of an outside doctor or of the patient, so they are stored in the storage room for the time being even after the electronic medical record system 1 is introduced. Paper documents therefore continue to arise even after the introduction, and situations in which electronic documents must be classified still occur. In addition to paper documents generated during outpatient examinations in the examination room, such as those above, paper documents generated at hospitalization or during tests are also gathered in wrappers and managed per patient.

The hospital paper documents stored in the storage room are digitized by, for example, the client terminal 30. For example, the client terminal 30 causes a scanner to read a plurality of sheets of paper and thereby sequentially acquires the image data of each paper document. At this time, the scanner generates image data for each display medium by reading the display content displayed on each of the plurality of display media in the order in which the display media were set. The client terminal 30 then executes character recognition processing, so-called OCR (Optical Character Reader) processing, which recognizes the characters contained in the image data of each paper document (display medium) and converts them into character codes. Executing the OCR processing yields text data of the characters contained in the paper document. The client terminal 30 then uses the character recognition results for the paper documents to extract an index for classifying each electronic document, which includes the image data and the text data, and uploads the image data and text data to the server device 10 together with the index of each electronic document.

The following description of this embodiment assumes that electronic documents are classified per patient. This embodiment illustrates the case in which the client terminal 30 generates the text data, but the server device 10 may generate the text data instead. Likewise, this embodiment illustrates the case in which the scanner of the client terminal 30 reads the image data of the paper documents, but the image data can also be generated on the server device 10 side by connecting a scanner to the server device 10.

[Configuration of client terminal 30]
Next, the functional configuration of the client terminal 30 according to this embodiment is described. FIG. 2 is a block diagram illustrating the functional configuration of the client terminal 30 according to the first embodiment. As shown in FIG. 2, the client terminal 30 includes a scanner 31, a communication I/F (interface) unit 32, a storage unit 33, and a control unit 35. In addition to the functional units shown in FIG. 2, the client terminal 30 may include various functional units of a known computer, such as various input/output devices.

The scanner 31 is a reading device that reads the display content displayed on a display medium and converts it into image data. In one aspect, the scanner 31 irradiates the display medium to be read, such as paper, with light, reads the reflected light with a CCD (Charge Coupled Device) or the like, and converts it into image data as a digital signal. As the scanner 31, a type in which the display medium (the original) is fixed and the reading device is moved can be adopted, as can a type in which the display medium is fixed and the reading device is moved by hand.

The communication I/F unit 32 is an interface that controls communication with other devices, such as the server device 10 and other client terminals 30. In one aspect, a network interface card such as a LAN card can be adopted as the communication I/F unit 32. For example, the communication I/F unit 32 transmits electronic documents including image data and text data to the server device 10, and receives from the server device 10 the electronic medical records and electronic documents to be viewed.

The storage unit 33 is a storage device that stores various programs executed by the control unit 35, such as an OS (Operating System) and the determination program described later. Examples of the storage unit 33 include semiconductor memory elements such as flash memory, and storage devices such as hard disks and optical disks. The storage unit 33 is not limited to these types of storage devices and may be a RAM (Random Access Memory) or a ROM (Read Only Memory).

As examples of data used by the various programs executed by the control unit 35, the storage unit 33 stores image data 33a, text data 33b, keyword data 33c, and a document management master 33d. In addition to the data shown in FIG. 2, other electronic data can also be stored, such as the resolution setting for images read by the scanner 31 and the setting of the character recognition algorithm used for OCR processing.

The image data 33a is image data obtained by digitizing paper documents. In one aspect, when a plurality of sheets are read by the scanner 31, the image of each paper document is registered as image data 33a in a predetermined file format. Any file format can be adopted for the image data 33a, such as TIFF (Tagged Image File Format), JPEG (Joint Photographic Experts Group), GIF (Graphics Interchange Format), or PDF (Portable Document Format).

The text data 33b is data obtained by converting the image data of paper documents into text. The characters contained in the image data generated by the scanner 31 for each paper document are converted into text data by OCR processing, and the text data 33b is then registered for each display medium, that is, page by page.

The keyword data 33c is data that defines keywords for extracting the index used to classify electronic documents. As the keywords, attribute information is defined that is located near the identification information used as an index when an electronic document is viewed. In one aspect, data in which items such as a keyword ID (identifier) and a keyword are associated with each other can be adopted as the keyword data 33c. FIG. 3 is a diagram illustrating an example of the keyword data 33c. As an example, FIG. 3 shows various keywords for the attribute information that prompts entry of a patient's name, which are used to obtain the character string representing the patient's name as identification information (an attribute value). As shown in FIG. 3, when the character string representing a patient's name is to be extracted as identification information, keywords that are the various notations of the attribute information prompting entry of the patient's name, such as "患者氏名" (patient full name), "患者名" (patient name), and "入院患者名" (inpatient name), are searched for. Although the case of extracting the character string representing a patient's name as identification information is illustrated here, when a character string representing a document name or a clinical department is to be extracted, the various notations of the attribute information prompting entry of the document name or clinical department are likewise used as keywords.

The document management master 33d is master data used to manage the electronic documents obtained by digitizing paper documents. In one aspect, data in which items such as a record number, an image ID, a text ID, and attribute information are associated with each other can be adopted as the document management master 33d. The "record number" is one form of identification information that identifies a record in the document management master 33d and is represented here by, for example, a sequential serial number. The "image ID" is identification information that identifies image data, and the "text ID" is identification information that identifies text data. The "attribute information" is attribute information located near the identification information used as an index when an electronic document is viewed.

FIG. 4 is a diagram illustrating an example of the document management master 33d. FIG. 4 illustrates the document management master 33d when five sheets of paper have been read by the scanner 31. As shown in FIG. 4, for the paper document read first, the image ID is "G001", the text ID is "T001", and the patient name attribute value "吉田太郎" (Yoshida Taro) was extracted from a predetermined position relative to a keyword shown in FIG. 3, for example the position immediately to the right of the keyword. For the paper document read second, the image ID is "G002", the text ID is "T002", and the patient name attribute value "吉×太郎" was extracted from the position immediately to the right of a keyword shown in FIG. 3. For the paper document read third, the image ID is "G003", the text ID is "T003", and the patient name attribute value "吉田太×" was extracted from the position immediately to the right of the keyword.
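The record layout described for the document management master 33d can be sketched as a small data structure. This is only an illustrative sketch: the class name `MasterRecord` and its field names are hypothetical, and only the three records that the text spells out for FIG. 4 are included. The "×" in the attribute values is assumed here to stand for a character that OCR failed to recognize correctly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MasterRecord:
    """One row of the document management master (hypothetical field names)."""
    record_number: int              # sequential serial number
    image_id: str                   # identifies the image data, e.g. "G001"
    text_id: str                    # identifies the text data, e.g. "T001"
    attribute_value: Optional[str]  # extracted patient name, None if not found


# The three records described in the text for FIG. 4.
master = [
    MasterRecord(1, "G001", "T001", "吉田太郎"),
    MasterRecord(2, "G002", "T002", "吉×太郎"),
    MasterRecord(3, "G003", "T003", "吉田太×"),
]

for record in master:
    print(record.record_number, record.image_id, record.text_id, record.attribute_value)
```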

The control unit 35 has an internal memory for storing control data and programs that define various processing procedures, and executes various processes using them. As shown in FIG. 2, the control unit 35 includes a character recognition unit 35a, a determination unit 35b, and an output unit 35c.

The character recognition unit 35a is a processing unit that executes OCR software. In one aspect, each time the scanner 31 reads a paper document and image data is generated, the character recognition unit 35a executes OCR processing on the image data of that paper document, recognizing the characters contained in the image data and converting them into character codes. Executing the OCR processing yields text data of the characters contained in the paper document (display medium). The character recognition unit 35a then assigns an image ID to the image data and a text ID to the text data, and registers the image data and the text data in the storage unit 33. Although the case of converting the characters contained in the image data into a text file is illustrated here, they may instead be converted into an HTML file, an XML file, or an RTF (Rich Text Format) file.

The determination unit 35b is a processing unit that uses the character recognition results corresponding to a predetermined position common to the paper documents to determine whether those character recognition results indicate the same identification information. In one aspect, each time the character recognition unit 35a generates text data, the determination unit 35b searches the text data for a character string that matches a keyword defined in the keyword data 33c. When a matching character string is found, the determination unit 35b extracts the character recognition result of the patient name at a predetermined position relative to the matching character string, for example the position immediately to the right of the keyword. The determination unit 35b then registers the character recognition result of the patient name in the document management master 33d as the attribute value of the attribute information, in association with the record number, image ID, and text ID. The determination unit 35b repeats this process of extracting the character recognition result of the patient name and registering it in the document management master 33d until the scanner 31 has read all of the paper documents.

FIG. 5 is a diagram illustrating an example of extraction of identification information. FIG. 5 shows the case in which the identification information of record number 1 and record number 2 shown in FIG. 4 is extracted. As shown in FIG. 5, when OCR processing is executed on the image data 51A of a paper document, text data 52A is obtained. The text data 52A is then searched for the keywords shown in FIG. 3, such as "患者氏名", "患者名", and "入院患者名". As a result, "患者氏名" is found in the text data 52A, and the character recognition result "吉田太郎" located immediately to the right of the keyword "患者氏名" is extracted. Likewise, when OCR processing is executed on the image data 51B of a paper document, text data 52B is obtained and searched for the same keywords. As a result, "患者氏名" is found in the text data 52B, and the character recognition result "吉×太郎" located immediately to the right of the keyword "患者氏名" is extracted. When a paper document is written horizontally, the patient's name is often written to the right of or below the attribute information that prompts entry of the patient's name. After conversion to text, the character string of the identification information is therefore likely to appear immediately after, that is, to the right of, the character string corresponding to the keyword. For this reason, when the paper document is written horizontally, the character recognition result to the right of the character string corresponding to the keyword is extracted.
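The keyword search and right-neighbor extraction described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `extract_patient_name`, the rule that the extracted value ends at the next space, and the skipping of a colon after the keyword are all assumptions, since the text only says that the recognition result to the right of the keyword is extracted.

```python
import re
from typing import Optional

# Keywords named in the text for FIG. 3 (notations that prompt entry of a patient name).
KEYWORDS = ["患者氏名", "患者名", "入院患者名"]


def extract_patient_name(text: str) -> Optional[str]:
    """Return the token immediately to the right of the first keyword found.

    Assumption: the value runs up to the next ASCII or ideographic space.
    """
    # Try longer keywords first so "入院患者名" is not matched as "患者名".
    ordered = sorted(KEYWORDS, key=len, reverse=True)
    for line in text.splitlines():
        for keyword in ordered:
            pos = line.find(keyword)
            if pos < 0:
                continue
            rest = line[pos + len(keyword):].lstrip(" \u3000:：")
            token = re.split(r"[ \u3000]", rest, maxsplit=1)[0] if rest else ""
            if token:
                return token
    return None


print(extract_patient_name("診療情報提供書\n患者氏名 吉田太郎 生年月日"))
```

A page on which no keyword is found yields `None`, which corresponds to the case in which no attribute value can be registered for that page.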

After extracting the character recognition result of the patient name for all of the paper documents and registering it in the document management master 33d, the determination unit 35b determines, for paper documents that were read consecutively, whether the character recognition results to the right of the keyword contained in their text data indicate the same identification information. To do so, the determination unit 35b calculates the matching rate of the character recognition results between the consecutively read paper documents. For example, the determination unit 35b calculates, as the matching rate, the proportion of characters in the character strings forming the patient's name that match each other. In the example of FIG. 5, between the character recognition result "吉田太郎" extracted from the text data 52A and the character recognition result "吉×太郎" extracted from the text data 52B, the recognition results of the three characters "吉", "太", and "郎" match. The matching rate is therefore calculated as 3 ÷ 4 × 100 = 75%.
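The matching rate calculation can be illustrated with a short sketch. The function name `matching_rate` is hypothetical, and two details are assumptions that the text does not specify: characters are compared position by position, and the longer string's length is used as the denominator when the two strings differ in length.

```python
def matching_rate(a: str, b: str) -> float:
    """Percentage of positions at which the two strings have the same character.

    Assumptions: position-wise comparison; the denominator is the longer length.
    """
    if not a and not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b)) * 100


# The FIG. 5 example: "吉", "太" and "郎" match, so 3 / 4 * 100 = 75%.
print(matching_rate("吉田太郎", "吉×太郎"))  # 75.0
```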

The determination unit 35b then compares the calculated matching rate with a second matching rate. The second matching rate is lower than a first matching rate, which is used to determine whether the character recognition results to the right of the keyword contained in the text data indicate the same identification information between a first display medium and a second display medium that were not read consecutively. For example, when the character recognition result to the right of the keyword could not be extracted from the text data of the later-read one of two consecutively read display media, the display medium to be compared may be skipped forward to a display medium for which a character recognition result was obtained. In that case, the determination of whether the character recognition results to the right of the keyword indicate the same identification information is made between a first display medium and a second display medium that were not read consecutively, and the first matching rate is used as the threshold.

The second matching rate is set lower than the first matching rate because, when the patient names are similar between a third display medium and a fourth display medium that were read consecutively, the two display media are more likely to be paper documents about the same patient than when the names are similar between a first display medium and a second display medium that were not read consecutively. This is because display media such as paper documents managed in a hospital archive are stored per patient in folders, so they are likely to be set in the scanner 31 stacked patient by patient. The following description assumes that the first matching rate is 90% and the second matching rate is 70%.
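The choice between the two thresholds can be sketched as below. The 90% and 70% values come from the description; the names `threshold_for` and `same_identification`, and the use of page indices to decide whether two media were read consecutively, are illustrative assumptions.

```python
FIRST_MATCHING_RATE = 90.0   # media that were not read consecutively
SECOND_MATCHING_RATE = 70.0  # media that were read consecutively


def threshold_for(page_a: int, page_b: int) -> float:
    """Pick the threshold from the read order (hypothetical page indices)."""
    consecutive = abs(page_a - page_b) == 1
    return SECOND_MATCHING_RATE if consecutive else FIRST_MATCHING_RATE


def same_identification(rate: float, page_a: int, page_b: int) -> bool:
    """True when the matching rate reaches the applicable threshold."""
    return rate >= threshold_for(page_a, page_b)
```

Under these assumptions, a 75% match is accepted for pages 1 and 2 but rejected for pages 1 and 3, where a skipped page forces the stricter first matching rate.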

In the example of FIG. 5, the matching rate of 75% between the character recognition result "吉田太郎" extracted from the text data 52A and the character recognition result "吉×太郎" extracted from the text data 52B is equal to or higher than the second matching rate. The two results are therefore identified as indicating the same identification information. The determination unit 35b then overwrites the character recognition result extracted from the text data 52B of the later-read paper document with the result "吉田太郎" extracted from the text data 52A of the earlier-read paper document. As a result, the attribute value of the attribute information of record number 2 shown in FIG. 4 is updated to "吉田太郎". When the matching rate is then calculated between record number 2 and record number 3, the three characters "吉", "田", and "太" agree, giving a matching rate of 3 ÷ 4 × 100 = 75%, which is at or above the second matching rate of 70%. The character recognition result "吉田太郎" of record number 2 therefore overwrites the character recognition result of record number 3. In this way, even when a misrecognition or the like occurs at the position corresponding to the keyword in one of the consecutively read display media, the result can be interpolated from the character recognition result of the other display medium and the two can be determined to indicate the same identification information, so electronic documents can be classified effectively per patient.

The output unit 35c is a processing unit that controls data output to the server device 10. In one aspect, after the determination process by the determination unit 35b has finished, the output unit 35c uploads to the server device 10 the electronic documents, which include the image data 33a and the text data 33b stored in the storage unit 33, together with the document management master 33d stored in the storage unit 33. The server device 10 can then classify the electronic documents using the identification information "patient name", which is the attribute value of the attribute information in the document management master 33d, as an index.

Various integrated circuits and electronic circuits can be used for the control unit 35. Some of the functional units of the control unit 35 may also be implemented as a separate integrated circuit or electronic circuit. An example of an integrated circuit is an ASIC (Application Specific Integrated Circuit). Examples of electronic circuits include a CPU (Central Processing Unit) and an MPU (Micro Processing Unit).

[Process flow]
Next, the flow of processing in the client terminal 30 according to the present embodiment is described. The (1) reading process executed by the client terminal 30 is described first, followed by the (2) determination process.

(1) Reading process
FIG. 6 is a flowchart illustrating the procedure of the reading process according to the first embodiment. The reading process starts when a read instruction is received via an input device or the like (not shown) while a plurality of sheets are set in the scanner 31.

As shown in FIG. 6, on reading a paper document, which is a display medium (step S101), the scanner 31 generates image data of the paper document (step S102). The character recognition unit 35a then generates text data of the paper document by executing OCR processing, which recognizes the characters contained in the image data and converts them into character codes (step S103).

The character recognition unit 35a then assigns an image ID to the image data and a text ID to the text data (step S104), and registers the image data and the text data in the storage unit 33 (step S105).

Next, the determination unit 35b searches the text data generated in step S103 for a character string that matches a keyword defined in the keyword data 33c (step S106). If no matching character string is found (step S107: No), the process proceeds to step S110.

If a character string matching the keyword is found (step S107: Yes), the determination unit 35b extracts the character recognition result of the patient name located at a predetermined position relative to the matched character string, for example immediately to the right of the keyword (step S108).

The determination unit 35b then registers the character recognition result of the patient name extracted in step S108 in the document management master 33d, in association with a record number of the document management master 33d and the image ID and text ID assigned in step S104 (step S109).

Steps S101 to S109 are repeated until the scanner 31 has read all the paper documents (step S110: No). When the scanner 31 has read all the paper documents (step S110: Yes), the process ends.
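The keyword search and registration steps (S104 to S109) might look roughly like the sketch below, which starts from already recognized text rather than from a scanner. Everything named here is an assumption for illustration: the keyword list, the ID format, the rule that the name is the first whitespace-delimited token after the keyword, and the dictionary layout standing in for the document management master 33d.

```python
import re
from typing import Optional

# Hypothetical keyword list standing in for the keyword data 33c.
KEYWORDS = ["患者名", "氏名"]


def extract_name(text: str, keywords=KEYWORDS) -> Optional[str]:
    """Return the token right after the first listed keyword that occurs (S106 to S108)."""
    for keyword in keywords:
        match = re.search(re.escape(keyword) + r"[\s:：]*(\S+)", text)
        if match:
            return match.group(1)
    return None  # S107: No


def read_all(ocr_texts):
    """Build an in-memory stand-in for the document management master 33d."""
    master = []
    for page, text in enumerate(ocr_texts, start=1):  # S101 to S103 assumed done
        image_id = f"IMG{page:04d}"  # S104; the ID format is made up
        text_id = f"TXT{page:04d}"
        name = extract_name(text)
        if name is not None:  # S107: Yes, so register the record (S109)
            master.append({
                "record": len(master) + 1,
                "page": page,
                "image_id": image_id,
                "text_id": text_id,
                "name": name,
            })
    return master
```

Keeping the page index alongside the record number is a design choice of this sketch: it lets a later step tell whether two records came from consecutively read sheets or whether a sheet without a recognizable name lies between them.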

(2) Determination process
FIG. 7 is a flowchart illustrating the procedure of the determination process according to the first embodiment. The determination process starts, for example, when the reading process shown in FIG. 6 has finished. Alternatively, the determination process can be started as soon as character recognition results have been extracted for at least two consecutively read records in the document management master 33d, and executed on the records for which results have been extracted.

As shown in FIG. 7, the determination unit 35b initializes a counter N, which counts the record numbers of the document management master 33d, to an initial value such as zero (step S301). The determination unit 35b then increments the counter N by one (step S302).

The determination unit 35b then compares the character recognition result of the patient name in record number N with that in record number N+1, and calculates the matching rate of the character recognition results to the right of the keyword in the text data of the two consecutively read paper documents (step S303).

If the matching rate calculated in step S303 is equal to or higher than a predetermined threshold, for example the second matching rate of 70% (step S304: Yes), the determination unit 35b sets the character recognition result of record number N as the character recognition result of record number N+1 (step S305). If the matching rate calculated in step S303 is lower than the second matching rate (step S304: No), the process skips step S305 and proceeds to step S306.

As long as a record following record number N+1, that is, a record with record number N+2, exists (step S306: Yes), steps S302 to S305 are repeated. When no record with record number N+2 exists (step S306: No), the process ends.
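The loop of steps S301 to S306 can be sketched as below. This is an illustration under assumptions: the records are reduced to a plain list of name strings, every pair is treated as consecutively read (so only the 70% threshold appears), and `matching_rate` is a hypothetical helper that compares strings position by position.

```python
def matching_rate(a: str, b: str) -> float:
    """Percentage of positions at which both strings agree (assumed definition)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / longest * 100


def determine(names, threshold: float = 70.0):
    """Propagate the earlier record's name to the next record when they match."""
    names = list(names)  # work on a copy
    for n in range(len(names) - 1):  # pairs (N, N+1), as in S302 to S306
        if matching_rate(names[n], names[n + 1]) >= threshold:  # S303, S304
            names[n + 1] = names[n]  # S305
    return names


result = determine(["吉田太郎", "吉×太郎", "吉田太×", "鈴木花子"])
print(result)  # ['吉田太郎', '吉田太郎', '吉田太郎', '鈴木花子']
```

Because record N+1 is overwritten before it is compared with record N+2, a correctly recognized name carries forward through a run of partially misrecognized sheets, which mirrors the record 2 and record 3 example in the description.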

[Effects of the first embodiment]
As described above, the client terminal 30 according to the present embodiment reads the display content displayed on each of a plurality of display media and performs character recognition processing on each read display content. When the character recognition results at a predetermined position common to a first display medium and a second display medium that were not read consecutively match at or above the first matching rate, the client terminal 30 determines that the character recognition results at that position indicate the same identification information. When the character recognition results at a predetermined position common to a third display medium and a fourth display medium that were read consecutively match at or above the second matching rate, which is lower than the first matching rate, the client terminal 30 likewise determines that the character recognition results at that position indicate the same identification information.

Therefore, even when the character recognition rate at the predetermined position drops for one of the consecutively read display media, the client terminal 30 according to the present embodiment can interpolate from the character recognition result of the other display medium and determine that both indicate the same identification information. This reduces the effort of having hospital staff visually check the character recognition result and enter the identification information by hand, or rescan the paper document and rerun character recognition. The client terminal 30 according to the present embodiment can thus effectively support the classification of electronic documents. For example, the recognition rate tends to drop when a patient's name is handwritten on a paper document, but the classification of electronic documents can be supported effectively even in that case.

Although an embodiment of the disclosed apparatus has been described above, the present invention may be implemented in various forms other than the embodiment described above. Other embodiments included in the present invention are described below.

[Application example]
The first embodiment illustrated the case where identification information is identified between two consecutively read display media, but the same identification can be made when three or more display media are read consecutively. For example, the client terminal 30 can identify the identification information among three display media: a third display medium, a fourth display medium, and a fifth display medium.

Specifically, the client terminal 30 calculates the matching rate of the character recognition results at the common predetermined position between the fourth display medium, which is the middle one of the three, and the third display medium, which precedes it. The client terminal 30 also calculates the matching rate of the character recognition results at the common predetermined position between the fourth display medium and the fifth display medium, which follows it. Even if both of these matching rates are lower than the second matching rate, the client terminal 30 can determine that the character recognition results at the predetermined position indicate the same identification information among the third, fourth, and fifth display media, provided that both are equal to or higher than a third matching rate that is lower than the second matching rate, for example 50%. Thus, even when the recognition rate of the characters at the predetermined position drops for a display medium that was read sandwiched between display media about the same patient, the character recognition results of the preceding and following display media can be used to interpolate and determine that all three indicate the same identification information.
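The three-media check can be sketched as follows. The 50% value is the example given in the description; the function names and the position-by-position definition of `matching_rate` are assumptions made for illustration.

```python
THIRD_MATCHING_RATE = 50.0  # example value from the description


def matching_rate(a: str, b: str) -> float:
    """Percentage of positions at which both strings agree (assumed definition)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / longest * 100


def sandwiched_same(prev_name: str, mid_name: str, next_name: str,
                    third_rate: float = THIRD_MATCHING_RATE) -> bool:
    """True when the middle medium matches both neighbours at the third rate or higher."""
    return (matching_rate(prev_name, mid_name) >= third_rate
            and matching_rate(mid_name, next_name) >= third_rate)
```

For instance, a middle sheet recognized as "吉××郎" matches each neighbour "吉田太郎" at only 50%, below the 70% second matching rate, yet this check would still treat the three sheets as belonging to the same patient.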

[Scope of application]
The first embodiment illustrated the classification of electronic documents obtained by digitizing paper documents such as hospital medical charts. The processing executed by the functional units 35a to 35c in the control unit 35 can equally be applied to other cases in which electronic documents obtained by reading a batch of paper sheets are classified, for example when the electronic documents obtained by digitizing the various paper slips accepted at a bank counter are classified by customer. In that case, keywords such as "氏名" (name), "お客様のご氏名" (customer's name), "お名前" (name), and "御名前" (name) can be defined in the keyword data 33c.

[Distribution and integration]
The components of each illustrated apparatus do not necessarily have to be physically configured as illustrated. That is, the specific form of distribution and integration of each apparatus is not limited to the illustrated one, and all or part of it can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. For example, the character recognition unit 35a, the determination unit 35b, or the output unit 35c may be connected via a network as an external device of the client terminal 30. Alternatively, the character recognition unit 35a, the determination unit 35b, and the output unit 35c may each reside in a separate device, and these devices may be connected via a network and cooperate to realize the functions of the client terminal 30.

[Determination program]
The various processes described in the above embodiments can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. An example of a computer that executes a determination program having the same functions as the above embodiments is described below with reference to FIG. 8.

FIG. 8 is a diagram for explaining an example of a computer that executes the determination program according to the first and second embodiments. As shown in FIG. 8, the computer 100 includes an operation unit 110a, a speaker 110b, a camera 110c, a display 120, and a communication unit 130. The computer 100 further includes a CPU 150, a ROM 160, an HDD 170, and a RAM 180. These units 110 to 180 are connected via a bus 140.

As shown in FIG. 8, the HDD 170 stores in advance a determination program 170a that provides the same functions as the character recognition unit 35a, the determination unit 35b, and the output unit 35c described in the first embodiment. Like the character recognition unit 35a, the determination unit 35b, and the output unit 35c shown in FIG. 2, the determination program 170a may be integrated or separated as appropriate. That is, not all of the data needs to be stored in the HDD 170 at all times; it suffices that the data necessary for processing is stored in the HDD 170.

The CPU 150 reads the determination program 170a from the HDD 170 and loads it into the RAM 180. As a result, as shown in FIG. 8, the determination program 170a functions as a determination process 180a. The determination process 180a loads various data read from the HDD 170 into the area of the RAM 180 allocated to it as appropriate, and executes various processes based on the loaded data. The determination process 180a includes the processes executed by the character recognition unit 35a, the determination unit 35b, and the output unit 35c shown in FIG. 2, for example the processes shown in FIGS. 6 and 7. Not all of the processing units virtually realized on the CPU 150 need to operate on the CPU 150 at all times; it suffices that the processing units necessary for processing are virtually realized.

The determination program 170a does not necessarily have to be stored in the HDD 170 or the ROM 160 from the beginning. For example, each program may be stored on a "portable physical medium" inserted into the computer 100, such as a flexible disk (FD), CD-ROM, DVD, magneto-optical disk, or IC card, and the computer 100 may acquire and execute each program from such a medium. Alternatively, each program may be stored in another computer or server device connected to the computer 100 via a public line, the Internet, a LAN, a WAN, or the like, and the computer 100 may acquire and execute each program from there.

Description of symbols
1 Electronic medical record system
5 Network
10 Server device
30 Client terminal
31 Scanner
32 Communication I/F unit
33 Storage unit
33a Image data
33b Text data
33c Keyword data
33d Document management master
35 Control unit
35a Character recognition unit
35b Determination unit
35c Output unit

Claims

A determination program that causes a computer to execute a process comprising:
reading display content displayed on each of a plurality of display media and performing character recognition processing on each read display content;
determining, when the results of the character recognition processing corresponding to a predetermined position common to a first display medium and a second display medium that, among the plurality of display media, were not read consecutively match at or above a first matching rate, that the character recognition results corresponding to the predetermined position indicate the same identification information; and
determining, when the results of the character recognition processing corresponding to a predetermined position common to a third display medium and a fourth display medium that, among the plurality of display media, were read consecutively match at or above a second matching rate lower than the first matching rate, that the character recognition results corresponding to the predetermined position indicate the same identification information.

The determination program according to claim 1, wherein
the fourth display medium was read consecutively with the third display medium and was also read consecutively with a fifth display medium, and
the process of determining the identification information of the character recognition results includes determining that the character recognition results corresponding to the predetermined position indicate the same identification information when the results of the character recognition processing corresponding to a predetermined position common to the third display medium and the fourth display medium match at or above a third matching rate lower than the second matching rate, and the results of the character recognition processing corresponding to a predetermined position common to the fourth display medium and the fifth display medium match at or above the third matching rate.

A determination method in which a computer executes a process comprising:
reading display content displayed on each of a plurality of display media and performing character recognition processing on each read display content;
determining, when the results of the character recognition processing corresponding to a predetermined position common to a first display medium and a second display medium that, among the plurality of display media, were not read consecutively match at or above a first matching rate, that the character recognition results corresponding to the predetermined position indicate the same identification information; and
determining, when the results of the character recognition processing corresponding to a predetermined position common to a third display medium and a fourth display medium that, among the plurality of display media, were read consecutively match at or above a second matching rate lower than the first matching rate, that the character recognition results corresponding to the predetermined position indicate the same identification information.

A character recognition unit that reads display content displayed on each of a plurality of display media and performs character recognition processing on each read display content;
and a determination unit that determines, when the results of the character recognition processing corresponding to a predetermined position common to a first display medium and a second display medium that, among the plurality of display media, were not read consecutively match at or above a first matching rate, that the character recognition results corresponding to the predetermined position indicate the same identification information, and that determines, when the results of the character recognition processing corresponding to a predetermined position common to a third display medium and a fourth display medium that, among the plurality of display media, were read consecutively match at or above a second matching rate lower than the first matching rate, that the character recognition results corresponding to the predetermined position indicate the same identification information.