JP2018060474A - Place name extraction program, place name extraction device and place name extraction method - Google Patents


Info

Publication number
JP2018060474A
Authority
JP
Japan
Prior art keywords
place name
character string
character
address
characters
Legal status
Granted
Application number
JP2016199447A
Other languages
Japanese (ja)
Other versions
JP6759955B2 (en)
Inventor
Misako So (美佐子 宗)
Current Assignee
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Application filed by Fujitsu Ltd
Priority to JP2016199447A
Publication of JP2018060474A
Application granted
Publication of JP6759955B2
Legal status: Active


Landscapes

  • Character Discrimination (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

PROBLEM TO BE SOLVED: To extract an accurate place-name character string from a character string that contains an incomplete place-name notation. SOLUTION: A computer executes a process of receiving a character string, referring to a storage unit that stores place-name character strings, and outputting, as the place name contained in the received character string, a place-name character string that has a larger number of characters in common with the received string and at least one of whose characters from its end is contained in the received string. SELECTED DRAWING: Figure 2

Description

The present invention relates to a place name extraction program, a place name extraction device, and a place name extraction method.

There is a need to use, as text data, address/location notations contained in images captured by cameras and scanners. (Hereinafter, the part that precedes the chome-banchi, i.e. the chome (district) and banchi (block and lot) numbers, is called the place-name portion, including when higher levels such as the prefecture are omitted; the whole notation including the chome-banchi is called the address/location notation.) For example, one can imagine an application in which a user photographs the address/location notation in a magazine article about a facility with a smartphone camera, and the notation in the article is extracted, registered at the corresponding position on an electronic map, and displayed. Similarly, one can imagine an application in which the address/location notation on the signboard of a facility in town, captured by an in-vehicle camera, is registered at the corresponding position on an electronic map and displayed.

The address/location notation contained in such images is often incomplete: higher levels such as the prefecture are omitted, extraneous character strings that are not part of the address precede or follow it, and the notation itself varies. In addition, a photograph may contain characters lost to shadows, or blur, either of which can cause character recognition errors. Dirt on the photographed subject can cause character loss in the same way as a shadow.

FIG. 1(a) is an example in which higher levels such as the prefecture are omitted (here the top two levels); this is common because magazines, signboards, and the like cover a limited area. FIG. 1(b) is an example of extraneous character strings before and after the address/location notation: part of the article text, a symbol indicating "address", a symbol indicating a parking lot, the number of parking spaces, and so on. FIG. 1(c) shows notation variants: the number of characters increases or decreases because a の (no) that is present in the pronunciation is written or omitted, or because 字 (aza) is inserted or omitted. FIG. 1(d) shows an example in which characters are lost to a shadow when the photograph was taken. FIG. 1(e) shows an example in which insufficient focus when the photograph was taken caused blur and some characters were misrecognized (桑 "mulberry" was misrecognized as 団 "group").

For these reasons, the character string obtained by character recognition is an incomplete address/location notation and must be corrected to an accurate address/location string before it can be associated with map information or the like.

Meanwhile, a character-recognition-result correction method has been disclosed that corrects recognition errors in the character recognition results of addresses written on sales slips, delivery slips, and the like, and that also supplements partially omitted address strings (see, for example, Patent Document 1). However, because that method focuses on delimiter characters such as 県 (prefecture), 市 (city), and 町 (town) and identifies the correct answer among the possible combinations of a predetermined number of candidates, it may fail to correct the address when a delimiter character is missing or when non-address character strings precede or follow the address string.

Patent Document 1: Japanese Patent Laid-Open No. H3-257693

As described above, conventional methods have difficulty extracting an accurate place-name string from a character string that contains an incomplete address/location notation, chiefly an incomplete place-name notation.

Accordingly, in one aspect, an object of the present invention is to extract an accurate place-name string from a character string that contains an incomplete place-name notation.

In one embodiment, a computer is caused to execute a process of receiving a character string; referring to a storage unit that stores place-name character strings; and outputting, as the place name contained in the received character string, a place-name character string that has a larger number of characters in common with the received string and at least one of whose characters from its end is contained in the received string.

An accurate place-name string can be extracted from a character string that contains an incomplete place name.

FIG. 1 is a diagram showing examples of incomplete place names.
FIG. 2 is a diagram showing an example functional configuration of the address/location notation extraction device.
FIG. 3 is a diagram showing an example hardware configuration of the address/location notation extraction device.
FIG. 4 is a flowchart showing an example of processing in the embodiment.
FIG. 5 is a diagram showing an example of narrowing down place-name candidates.
FIG. 6 is a diagram showing an example of place name information.
FIG. 7 is a diagram showing an example of a formula for calculating the matching cost.
FIG. 8 is a diagram showing an example of place-name delimiter determination.
FIG. 9 is a diagram showing an example of replacing the place-name portion of a recognition result string.
FIG. 10 is a diagram showing an example of chome-banchi delimiter detection.
FIG. 11 is a diagram showing an example of deleting an unnecessary character string.

Preferred embodiments of the present invention are described below.

<Configuration>
FIG. 2 shows an example functional configuration of the address/location notation extraction device (information processing device) 1. In FIG. 2, the device 1 includes a recognition result input unit 101, a place name candidate narrowing unit 102, a place name matching unit 103, a place name delimiter determination unit 104, a place name determination unit 105, and a place name correction unit 106. The device 1 also includes a chome-banchi delimiter detection unit 107, a chome-banchi correction/determination unit 108, and an address/location notation output unit 109. In addition, the device 1 holds place name character information 111, place name information 112, and chome-banchi character information 113 as information referenced during processing.

The place name character information 111 associates each place-name string (the string up to, but not including, the chome-banchi) that exists in the range covered by the address/location notation (for example, all of Japan) with the individual characters (heading characters) it contains. Specifying a character therefore identifies the one or more place-name strings that contain it. A specific example of the place name character information 111 is described later.
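The place name character information 111 behaves like an inverted index from characters to place names. A minimal Python sketch of how such an index could be built, assuming the place names are held in a dict keyed by a place-name number; the function name `build_char_index` and the sample data are illustrative and not taken from the patent:

```python
from collections import defaultdict


def build_char_index(place_names: dict[int, str]) -> dict[str, list[int]]:
    """Map every heading character to the sorted numbers of the place names containing it."""
    index: defaultdict[str, set[int]] = defaultdict(set)
    for number, name in place_names.items():
        for ch in name:
            index[ch].add(number)  # a set records each place name once per character
    # Per-character lists sorted by place-name number, similar to the lists in FIG. 5(b)
    return {ch: sorted(numbers) for ch, numbers in index.items()}


places = {
    1: "鹿児島県曽於郡大崎町菱田",
    2: "東京都品川区大崎",
    3: "大阪府大阪市北区梅田",
}
char_index = build_char_index(places)
print(char_index["菱"])  # [1]
print(len(char_index["大"]))  # 3
```

The length of each list corresponds to the "number of associated place-name strings" field of FIG. 5(b).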

The place name information 112 is a collection of the place-name strings that exist in the covered range. A specific example of the place name information 112 is described later.

The chome-banchi character information 113 associates each character that may be used in a chome-banchi (the chome or banchi number) with characters that are easily misrecognized as (confused with) it and with whether it can appear at the end of a chome-banchi. A specific example of the chome-banchi character information 113 is described later.

The recognition result input unit 101 receives text data of a character string (the recognition result string) that is the result of character recognition and contains an address/location notation. For example, a user photographs the part of a magazine, pamphlet, or the like that contains an address/location notation with a smartphone camera or similar device, and the result of character recognition on the captured image is input as the recognition result string.

The place name candidate narrowing unit 102 refers to the place name character information 111 to narrow down, for the recognition result string input through the recognition result input unit 101, the candidate place-name strings used in subsequent processing. The processing is described in detail later.

The place name matching unit 103 matches the candidate place-name strings narrowed down by the place name candidate narrowing unit 102 against the recognition result string input through the recognition result input unit 101, and calculates a matching score or a matching cost. The matching score indicates how many characters, taking character order into account, the recognition result string and a candidate place-name string have in common. The matching cost indicates how much effort (character insertions, deletions, substitutions, and so on) is needed to make the recognition result string and the candidate place-name string agree. Specific examples of calculating the matching score or matching cost are described later.

For a predetermined number of top candidates, taken in descending order of matching score or ascending order of matching cost, the place name delimiter determination unit 104 determines whether the place-name delimiter characters of each candidate place-name string are contained in the recognition result string. The place-name delimiter characters are one or more characters that include the last character of the place-name portion (the character immediately before the text switches to characters indicating the chome-banchi); the unit determines whether any of them is contained in the recognition result string. Because the end of a place-name portion is unlikely to be omitted, a match of characters near the end is used to identify the corresponding place name. More than just the last character is used so that the method still works when the corresponding character in the recognition result string is missing or misrecognized.

Among the place-name strings whose place-name delimiter characters are contained in the recognition result string, the place name determination unit 105 gives priority to the one with the highest matching score (or, when a matching cost is used, the lowest matching cost) and determines it to be the place name contained in the recognition result string.

The place name correction unit 106 identifies the end of the place name within the recognition result string and corrects the recognition result string by replacing the span from its beginning to that end with the place-name string determined by the place name determination unit 105.

The chome-banchi delimiter detection unit 107 treats the part of the corrected recognition result string that follows the end of the place-name portion as a chome-banchi part followed by an unnecessary part, and uses the chome-banchi character information 113 to detect the chome-banchi delimiter character that marks the boundary between the two.

The chome-banchi correction/determination unit 108 identifies the chome-banchi part from the chome-banchi delimiter character detected by the chome-banchi delimiter detection unit 107, and deletes the unnecessary part that follows the chome-banchi part from the recognition result string.

The address/location notation output unit 109 outputs the corrected recognition result string finally obtained as the address/location string.

FIG. 3 shows an example hardware configuration of the address/location notation extraction device 1. In FIG. 3, the device 1 includes a CPU (Central Processing Unit) 1002, a ROM (Read Only Memory) 1003, a RAM (Random Access Memory) 1004, and an NVRAM (Non-Volatile Random Access Memory) 1005 connected to a system bus 1001. The device 1 also includes an I/F (Interface) 1006 and, connected to the I/F 1006, an I/O (Input/Output Device) 1007, an HDD (Hard Disk Drive)/SSD (Solid State Drive) 1008, and a NIC (Network Interface Card) 1009. The device 1 further includes a monitor 1010, a keyboard 1011, a mouse 1012, and the like connected to the I/O 1007. A CD/DVD (Compact Disk/Digital Versatile Disk) drive or the like can also be connected to the I/O 1007.

The functions of the address/location notation extraction device 1 described with reference to FIG. 2 are implemented by the CPU 1002 executing a predetermined program. The program may be obtained via a recording medium or via a network, or may be embedded in ROM. Information referenced or updated during processing is stored temporarily in the RAM 1004 and persistently in the HDD/SSD 1008 or the NVRAM 1005.

<Operation>
FIG. 4 is a flowchart showing an example of processing in the above embodiment. In FIG. 4, when the address/location notation extraction device 1 starts processing, the recognition result input unit 101 receives text data of a character string (the recognition result string) that is a character recognition result containing an address/location notation (step S101).

Next, the place name candidate narrowing unit 102 refers to the place name character information 111 to narrow down, for the recognition result string input through the recognition result input unit 101, the candidate place-name strings used in subsequent processing (step S102).

FIG. 5 shows an example of narrowing down place-name candidates by the place name candidate narrowing unit 102. Suppose the recognition result string shown on the right of FIG. 5(a) is input. For each character in the recognition result string, the unit checks whether the character exists as a heading character in the place name character information 111 and, if it does, casts one vote for each place-name string associated with that heading character. The illustrated example shows votes being cast for 大 in 大崎 (Osaki) and for 菱 in 菱田 (Hishida). The vote counts are stored temporarily in association with each place-name string in the place name character information 111. FIG. 5(b) shows an example data structure of the place name character information 111, which associates a serial number, the character code of a heading character, the number of place-name strings associated with that heading character, and the place-name numbers of those strings (corresponding to place names in the place name information 112). The vote count is stored, for example, in association with each place-name number in the place name character information 111.

As shown in FIG. 5(c), the voting results are sorted in descending order of vote count, and place-name strings whose count is at or below a predetermined threshold are cut off, leaving only the top place-name strings with many votes. For example, setting the vote threshold to 2 and cutting off strings with 2 or fewer votes reduces the number of place-name candidates from about 120,000 to somewhere between N = O(10) and N = O(1000), where O() denotes order of magnitude.
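The voting and cut-off can be sketched as follows, assuming the index maps each heading character to a list of place-name numbers as in FIG. 5(b). The function name `narrow_candidates` and the tie-breaking by place-name number are assumptions for illustration; every occurrence of a character in the recognition result string casts its own votes.

```python
from collections import Counter


def narrow_candidates(
    recognized: str, char_index: dict[str, list[int]], threshold: int = 2
) -> list[tuple[int, int]]:
    """Vote for place names sharing characters with the recognized string; drop low-vote ones."""
    votes: Counter[int] = Counter()
    for ch in recognized:
        # Characters absent from the index (digits, symbols, noise) cast no votes
        for number in char_index.get(ch, []):
            votes[number] += 1
    # Sort by vote count (descending), then by place-name number for a stable order
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    # Cut off candidates whose vote count is at or below the threshold
    return [(number, count) for number, count in ranked if count > threshold]


index = {"大": [1, 2, 3], "崎": [1, 2], "町": [1], "菱": [1], "田": [1, 3], "梅": [3]}
print(narrow_candidates("住所 大崎町菱田3-2", index))  # [(1, 5)]
```

With the default threshold of 2, candidates 2 and 3 (two votes each) are cut off and only place name 1 survives.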

Returning to FIG. 4, the place name matching unit 103 obtains the candidate place-name strings narrowed down by the place name candidate narrowing unit 102 from the place name information 112, matches them against the recognition result string input through the recognition result input unit 101, and calculates a matching score or matching cost (step S103).

FIG. 6 shows an example of the place name information 112, which associates a serial number, a prefecture number, a character count, and a place-name string. For example, the place name matching unit 103 receives the serial numbers of the narrowed-down place-name candidates from the place name candidate narrowing unit 102 and obtains the place-name strings from the place name information 112 by specifying those serial numbers.

FIG. 7 shows an example of a formula for calculating the matching cost. Matching uses, for example, DP matching (dynamic programming), which can align two strings even when characters have been inserted, deleted, or substituted, and the edit distance L obtained in the process is used. The edit distance L quantifies how different two strings are: it is the minimum number of character insertions, deletions, and substitutions needed to transform one string into the other. In the formula shown, C is the matching cost, n1 is the number of characters in the place-name string of interest in the place name information 112 (string #1), n2 is the number of characters in the input string, i.e. the recognition result string (string #2), and k is the number of characters that match between string #1 and string #2. So that the matching cost C does not depend on string length, C normalizes the edit distance L by the character counts n1 and n2 of the two strings. In addition, for the same edit distance L, the larger the proportion of matching characters, the smaller the matching cost C. The formula shown is only an example, and various designs are possible.
The matching score behaves in the opposite way to the matching cost (a higher score means a better match); it is a value based on, for example, the proportion of matching characters and the proportion of characters whose order agrees.
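Because the exact formula of FIG. 7 is not reproduced in the text, the following Python sketch uses a hypothetical normalization chosen only to satisfy the two stated properties: L is divided by n1 + n2, and for equal L a larger k lowers the cost. As a further assumption, k is taken to be the length of the longest common subsequence, i.e. the number of characters that match in order. The names `edit_distance`, `common_chars`, and `matching_cost` are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance L computed by dynamic programming (DP matching)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when the characters match)
            )
        prev = cur
    return prev[-1]


def common_chars(a: str, b: str) -> int:
    """k: length of the longest common subsequence (characters matching in order)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            cur[j] = prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def matching_cost(place_name: str, recognized: str) -> float:
    """Hypothetical cost C = L / (n1 + n2) * (1 - k / (n1 + n2)); not the FIG. 7 formula."""
    n1, n2 = len(place_name), len(recognized)
    if n1 + n2 == 0:
        return 0.0
    distance = edit_distance(place_name, recognized)
    k = common_chars(place_name, recognized)
    return (distance / (n1 + n2)) * (1 - k / (n1 + n2))


recognized = "大崎町菱田3-2"
for name in ("大崎町菱田", "大阪市梅田"):
    print(name, round(matching_cost(name, recognized), 3))
```

The candidate that shares the prefix 大崎町菱田 receives the lower cost even though the recognized string also carries the chome-banchi 3-2.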

Returning to FIG. 4, the place name delimiter determination unit 104 sorts the place-name candidates in descending order of matching score or ascending order of matching cost (step S104). The unit then selects the top M place-name candidates (step S105) and checks whether the place-name delimiter characters of the i-th candidate appear in the recognition result string (step S106); if they do not (No in step S107), it checks the next candidate.

FIG. 8(a) shows an example in which the place-name candidates are sorted in ascending order of matching cost. The place-name delimiter characters of the first-ranked place-name string 鹿児島県曽於郡大崎町菱田 (Hishida, Osaki-cho, Soo-gun, Kagoshima Prefecture) are its last two characters, 菱 and 田. For the recognition result string shown in FIG. 8(b), the delimiter character 田 of the first-ranked place-name string is determined to be present (菱 also matches, but the character closer to the end takes precedence).
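Steps S104 to S108 might be sketched as below. The number of candidates M (`top_m`), the use of the last two characters as the place-name delimiter characters (following the FIG. 8 example), and locating a delimiter at its first occurrence in the recognition result string are all assumptions of this sketch; the function name `select_place_name` is hypothetical.

```python
from typing import Optional


def select_place_name(
    candidates: list[tuple[str, float]],
    recognized: str,
    top_m: int = 5,
    delimiter_len: int = 2,
) -> Optional[tuple[str, int]]:
    """Return (place name, index of the matched delimiter in recognized), or None."""
    # S104/S105: sort by matching cost (ascending) and keep the top M candidates
    ranked = sorted(candidates, key=lambda c: c[1])[:top_m]
    for name, _cost in ranked:
        delimiters = name[-delimiter_len:]
        # S106/S107: try the last character first, since it takes precedence
        for ch in reversed(delimiters):
            pos = recognized.find(ch)
            if pos != -1:
                return name, pos  # S108: this candidate becomes the place name
    return None


cands = [("東京都品川区大崎", 0.3), ("鹿児島県曽於郡大崎町菱田", 0.2)]
print(select_place_name(cands, "住所大崎町菱田3-2@P"))  # ('鹿児島県曽於郡大崎町菱田', 6)
```

If 田 had been lost in recognition, the same call would fall back to 菱 and still select the first-ranked candidate.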

Returning to FIG. 4, when a place-name delimiter character of a candidate is determined to be in the recognition result string (Yes in step S107), the place name determination unit 105 determines the place-name string whose delimiter character was found in the recognition result string to be the place name (step S108).

Next, the place name correction unit 106 identifies the end of the place name within the recognition result string and corrects the recognition result string by replacing the span from its beginning to that end with the place-name string determined by the place name determination unit 105 (step S109).

FIG. 9 shows an example of replacing the place-name portion of the recognition result string. As shown in FIG. 9(a), the span from the beginning of the recognition result string up to the character 田 that matched the place-name delimiter is the replacement target, and this span is replaced with the determined place-name string. FIG. 9(b) shows the recognition result string after replacement.
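The replacement in step S109 amounts to dropping everything up to and including the matched delimiter character and prefixing the determined place name. A minimal sketch, where `end_pos` is assumed to be the index in the recognition result string of the character that matched the place-name delimiter; the function name and the returned start index of the chome-banchi part are illustrative additions:

```python
def replace_place_name(recognized: str, place_name: str, end_pos: int) -> tuple[str, int]:
    """Replace recognized[0..end_pos] (inclusive) with place_name.

    Returns the corrected string and the index at which the chome-banchi part starts.
    """
    corrected = place_name + recognized[end_pos + 1:]
    return corrected, len(place_name)


print(replace_place_name("住所大崎町菱田3-2@P", "鹿児島県曽於郡大崎町菱田", 6))
# ('鹿児島県曽於郡大崎町菱田3-2@P', 12)
```

Leading noise such as 住所 disappears together with the incomplete place name, while everything after 田 is left for the chome-banchi steps.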

Returning to FIG. 4, the chome-banchi delimiter detection unit 107 treats the part of the corrected recognition result string after the end of the place-name portion as a chome-banchi part followed by an unnecessary part, and uses the chome-banchi character information 113 to detect the chome-banchi delimiter character that marks the boundary between the two (step S110).

FIG. 10(a) shows an example of the chome-banchi character information 113, which associates each character that may be used in a chome-banchi with confusion characters that are easily misrecognized as (confused with) it and with whether it can appear at the end of a chome-banchi. A character that is not itself a possible chome-banchi character, but is registered as a confusion character, is treated in the same way as the chome-banchi character it is confused with. In that case, the character in the recognition result string is replaced with that chome-banchi character.

For the recognition result string shown in FIG. 10(b), among the characters that follow the end of the place-name portion, 3 and 2 are registered in the chome-banchi character information 113 and are therefore judged valid (OK) as chome-banchi characters. The following @, however, is registered in the chome-banchi character information 113 neither as a character nor as a confusion character, so it is judged to be the start of the unnecessary part, and the 2 immediately before it becomes the chome-banchi delimiter character.
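A sketch of step S110 using a small, hypothetical excerpt of the chome-banchi character information 113. The actual table contents are not given in the text; the digits, kanji, and OCR confusion pairs below (for example Latin O for 0 and the long-vowel mark ー for a hyphen) are assumptions chosen for illustration, as is the name `detect_chome_banchi`. The scan stops at the first character registered neither as a chome-banchi character nor as a confusion character, and confusion characters are replaced by their canonical form.

```python
# Hypothetical excerpt of the chome-banchi character information 113:
# character -> (canonical character, may appear at the end of a chome-banchi)
CHOME_BANCHI_CHARS: dict[str, tuple[str, bool]] = {
    **{d: (d, True) for d in "0123456789"},
    **{d: (d, True) for d in "一二三四五六七八九十"},
    "-": ("-", False),
    "丁": ("丁", False),
    "目": ("目", True),
    "番": ("番", True),
    "地": ("地", True),
    "号": ("号", True),
    # Confusion characters (assumed OCR confusions) mapped to their canonical form
    "O": ("0", True),
    "o": ("0", True),
    "l": ("1", True),
    "I": ("1", True),
    "ー": ("-", False),
    "−": ("-", False),
}


def detect_chome_banchi(text: str, start: int) -> tuple[str, int]:
    """Return the normalized chome-banchi part and the index of its delimiter character.

    The delimiter index is start - 1 when no chome-banchi character follows the place name.
    """
    normalized: list[str] = []
    delimiter = start - 1
    for i in range(start, len(text)):
        entry = CHOME_BANCHI_CHARS.get(text[i])
        if entry is None:  # neither a chome-banchi nor a confusion character
            break
        normalized.append(entry[0])  # confusion characters become canonical ones
        delimiter = i
    return "".join(normalized), delimiter


print(detect_chome_banchi("鹿児島県曽於郡大崎町菱田3-2@P5台", 12))  # ('3-2', 14)
```

This scan only uses the first field of each entry; the end-of-chome-banchi flag is kept in the table because FIG. 10(a) includes it.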

Returning to FIG. 4, the chome/banchi correction/determination unit 108 identifies the chome/banchi part from the chome/banchi delimiter character detected by the chome/banchi delimiter character detection unit 107, and deletes the unnecessary character string part that follows the chome/banchi part from the recognition result character string (step S111). FIG. 11(a) shows the recognition result character string before the unnecessary character string is deleted, and FIG. 11(b) shows it after deletion.
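The scan in steps S110 and S111 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation described in the patent: the table contents (registered characters and their confusion characters) are hypothetical, the "may appear at the end" flag of FIG. 10(a) is not modeled, and the function name `strip_unnecessary` and its `place_end` parameter (the index just past the place name part) are invented for this example.

```python
from __future__ import annotations

# Hypothetical stand-in for the chome/banchi character information 113:
# each registered character maps to the characters an OCR engine is
# assumed to confuse with it. The end-of-part flag is not modeled.
CHOME_BANCHI_CHARS: dict[str, set[str]] = {
    "0": {"O", "o"},
    "1": {"l", "I", "|"},
    "2": {"Z"},
    "3": set(),
    "4": set(),
    "5": {"S"},
    "6": {"b"},
    "7": set(),
    "8": {"B"},
    "9": {"g"},
    "-": {"ー"},
    "丁": set(),
    "目": set(),
    "番": set(),
    "地": set(),
    "号": set(),
}

# Reverse lookup: confusion character -> registered character it stands for.
CONFUSION_MAP = {c: reg for reg, confs in CHOME_BANCHI_CHARS.items() for c in confs}


def strip_unnecessary(text: str, place_end: int) -> tuple[str, int | None]:
    """Sketch of steps S110-S111 (hypothetical helper).

    Scans text[place_end:] left to right, replacing confusion characters with
    their registered characters, and stops at the first character that is
    neither registered nor a confusion character (the head of the unnecessary
    part). Returns the corrected string with that character and everything
    after it deleted, plus the index of the delimiter character (the last
    chome/banchi character), or None if there is no chome/banchi character.
    """
    out = list(text[:place_end])
    delimiter = None
    for i in range(place_end, len(text)):
        ch = text[i]
        if ch in CHOME_BANCHI_CHARS:
            out.append(ch)
        elif ch in CONFUSION_MAP:
            out.append(CONFUSION_MAP[ch])  # substitute the registered character
        else:
            break  # head of the unnecessary character string part
        delimiter = i
    return "".join(out), delimiter


if __name__ == "__main__":
    print(strip_unnecessary("中央区銀座3-2@x", 5))
```

With the made-up input "中央区銀座3-2@x" and `place_end=5`, the scan accepts "3", "-" and "2", stops at "@", and returns "中央区銀座3-2" with the delimiter at index 7; an OCR variant such as "中央区銀座3-Z@x" yields the same result because "Z" is listed here as a confusion character for "2".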

Returning to FIG. 4, the address/location notation output unit 109 outputs the finally obtained, corrected recognition result character string as the address/location character string (step S112), and the process ends.

<Summary>
As described above, the present embodiment can extract an accurate place name character string from a character string that contains an incomplete place name notation. It can also extract an accurate character string for the entire address/location notation.

The invention has been described above by way of a preferred embodiment. Although specific examples have been shown and described, it is evident that various modifications and changes may be made to them without departing from the broad spirit and scope defined in the claims. That is, the invention should not be construed as limited by the details of the specific examples or by the accompanying drawings.

With regard to the above description, the following items are further disclosed.
(Appendix 1)
A place name extraction program that causes a computer to execute a process comprising:
receiving a character string; and
referring to a storage unit that stores place name character strings, and outputting, as the place name included in the character string, a place name character string that shares a larger number of characters with the character string and at least one character of which, counted from its end, is included in the character string.
(Appendix 2)
The place name extraction program according to Appendix 1, wherein the process further comprises replacing the place name character string included in the character string with the place name character string that is output.
(Appendix 3)
The place name extraction program according to Appendix 1 or 2, wherein the process further comprises identifying, within the part of the character string that follows the included place name, the characters from the start of that part up to the character immediately before the first character not registered as a chome or banchi character, as the characters indicating the chome or banchi.
(Appendix 4)
The place name extraction program according to Appendix 3, wherein the characters registered as chome or banchi characters include characters that are easily confused with them, and each easily confused character is replaced with the corresponding registered chome or banchi character.
(Appendix 5)
The place name extraction program according to Appendix 3 or 4, wherein the process further comprises deleting the first character not registered as a chome or banchi character and all characters that follow it.
(Appendix 6)
The place name extraction program according to any one of Appendices 1 to 5, wherein the process further comprises, immediately after the character string is received:
referring to a storage unit that stores correspondences between place name character strings and the characters they contain, and casting votes for the place name character strings that contain characters matching characters in the received character string; and
narrowing the place name character string candidates used in subsequent processing down to a predetermined number of place name character strings with the most votes.
(Appendix 7)
A place name extraction device comprising:
a reception unit that receives a character string; and
an output unit that refers to a storage unit storing place name character strings and outputs, as the place name included in the character string, a place name character string that shares a larger number of characters with the character string and at least one character of which, counted from its end, is included in the character string.
(Appendix 8)
The place name extraction device according to Appendix 7, wherein the place name character string included in the character string is replaced with the place name character string that is output.
(Appendix 9)
The place name extraction device according to Appendix 7 or 8, wherein, within the part of the character string that follows the included place name, the characters from the start of that part up to the character immediately before the first character not registered as a chome or banchi character are identified as the characters indicating the chome or banchi.
(Appendix 10)
The place name extraction device according to Appendix 9, wherein the characters registered as chome or banchi characters include characters that are easily confused with them, and each easily confused character is replaced with the corresponding registered chome or banchi character.
(Appendix 11)
The place name extraction device according to Appendix 9 or 10, wherein the first character not registered as a chome or banchi character and all characters that follow it are deleted.
(Appendix 12)
The place name extraction device according to any one of Appendices 7 to 11, wherein, immediately after the character string is received:
a storage unit that stores correspondences between place name character strings and the characters they contain is referred to, and votes are cast for the place name character strings that contain characters matching characters in the received character string; and
the place name character string candidates used in subsequent processing are narrowed down to a predetermined number of place name character strings with the most votes.
(Appendix 13)
A place name extraction method in which a computer executes a process comprising:
receiving a character string; and
referring to a storage unit that stores place name character strings, and outputting, as the place name included in the character string, a place name character string that shares a larger number of characters with the character string and at least one character of which, counted from its end, is included in the character string.
(Appendix 14)
The place name extraction method according to Appendix 13, wherein the process further comprises replacing the place name character string included in the character string with the place name character string that is output.
(Appendix 15)
The place name extraction method according to Appendix 13 or 14, wherein the process further comprises identifying, within the part of the character string that follows the included place name, the characters from the start of that part up to the character immediately before the first character not registered as a chome or banchi character, as the characters indicating the chome or banchi.
(Appendix 16)
The place name extraction method according to Appendix 15, wherein the characters registered as chome or banchi characters include characters that are easily confused with them, and each easily confused character is replaced with the corresponding registered chome or banchi character.
(Appendix 17)
The place name extraction method according to Appendix 15 or 16, wherein the process further comprises deleting the first character not registered as a chome or banchi character and all characters that follow it.
(Appendix 18)
The place name extraction method according to any one of Appendices 13 to 17, wherein the process further comprises, immediately after the character string is received:
referring to a storage unit that stores correspondences between place name character strings and the characters they contain, and casting votes for the place name character strings that contain characters matching characters in the received character string; and
narrowing the place name character string candidates used in subsequent processing down to a predetermined number of place name character strings with the most votes.
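The selection rule of Appendix 1, together with the replacement of Appendix 2 and the vote-based narrowing of Appendix 6, can be illustrated with the minimal Python sketch below. It rests on assumptions that the text does not confirm: "characters in common" is taken as the multiset intersection of the two strings, the trailing-character condition is checked only for the candidate's last character, and the recognized place name part is assumed to start at the beginning of the string and end at the first occurrence of that character. All function names and sample place names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def narrow_candidates(text: str, place_names: list[str], top_n: int) -> list[str]:
    """Appendix 6 sketch: each input character votes for every place name containing it."""
    # Character -> place names containing it; a hypothetical stand-in for the
    # stored correspondence between place name strings and their characters.
    index: dict[str, list[str]] = {}
    for name in place_names:
        for ch in dict.fromkeys(name):  # distinct characters, in order
            index.setdefault(ch, []).append(name)
    votes: Counter[str] = Counter()
    for ch in text:
        for name in index.get(ch, []):
            votes[name] += 1
    # most_common sorts by vote count; ties keep first-encountered order.
    return [name for name, _ in votes.most_common(top_n)]


def common_chars(text: str, name: str) -> int:
    """Number of characters shared by text and name (multiset intersection)."""
    return sum((Counter(text) & Counter(name)).values())


def extract_place_name(text: str, candidates: list[str]) -> str | None:
    """Appendix 1 sketch: most shared characters, among candidates whose last
    character appears in text (the trailing-character condition)."""
    best, best_score = None, 0
    for name in candidates:
        if name[-1] not in text:
            continue
        score = common_chars(text, name)
        if score > best_score:
            best, best_score = name, score
    return best


def replace_place_name(text: str, name: str) -> str:
    """Appendix 2 sketch: swap the recognized place name part for the stored name.

    Assumes the part starts at text[0] and ends at the first occurrence of
    the stored name's last character.
    """
    end = text.find(name[-1])
    if end < 0:
        raise ValueError("last character of the place name not found in text")
    return name + text[end + 1:]


if __name__ == "__main__":
    names = ["中央区銀座", "中央区築地", "港区新橋"]
    text = "中区銀座3-2@x"  # OCR output with a missing character
    candidates = narrow_candidates(text, names, top_n=2)
    place = extract_place_name(text, candidates)
    print(replace_place_name(text, place))
```

With the sample data, the three names receive 4, 2 and 1 votes, so the top two are kept. "中央区築地" is then rejected because its last character "地" does not appear in the input, and the recognized part "中区銀座" is replaced by "中央区銀座". The trailing-character check is what keeps a longer name whose ending was not recognized (for example "中央区銀座" against the input "中央区銀3-2") from being chosen over a shorter name that was.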

The recognition result input unit 101 is an example of the reception unit, and the address/location notation output unit 109 is an example of the output unit.

DESCRIPTION OF SYMBOLS
1 Address/location notation extraction device
101 Recognition result input unit
102 Place name candidate narrowing unit
103 Place name collation unit
104 Place name delimiter character determination unit
105 Place name determination unit
106 Place name correction unit
107 Chome/banchi delimiter character detection unit
108 Chome/banchi correction/determination unit
109 Address/location notation output unit
111 Place name character information
112 Place name information
113 Chome/banchi character information

Claims (8)

1. A place name extraction program that causes a computer to execute a process comprising:
receiving a character string; and
referring to a storage unit that stores place name character strings, and outputting, as the place name included in the character string, a place name character string that shares a larger number of characters with the character string and at least one character of which, counted from its end, is included in the character string.
2. The place name extraction program according to claim 1, wherein the process further comprises replacing the place name character string included in the character string with the place name character string that is output.
3. The place name extraction program according to claim 1 or 2, wherein the process further comprises identifying, within the part of the character string that follows the included place name, the characters from the start of that part up to the character immediately before the first character not registered as a chome or banchi character, as the characters indicating the chome or banchi.
4. The place name extraction program according to claim 3, wherein the characters registered as chome or banchi characters include characters that are easily confused with them, and each easily confused character is replaced with the corresponding registered chome or banchi character.
5. The place name extraction program according to claim 3 or 4, wherein the process further comprises deleting the first character not registered as a chome or banchi character and all characters that follow it.
6. The place name extraction program according to any one of claims 1 to 5, wherein the process further comprises, immediately after the character string is received:
referring to a storage unit that stores correspondences between place name character strings and the characters they contain, and casting votes for the place name character strings that contain characters matching characters in the received character string; and
narrowing the place name character string candidates used in subsequent processing down to a predetermined number of place name character strings with the most votes.
7. A place name extraction device comprising:
a reception unit that receives a character string; and
an output unit that refers to a storage unit storing place name character strings and outputs, as the place name included in the character string, a place name character string that shares a larger number of characters with the character string and at least one character of which, counted from its end, is included in the character string.
8. A place name extraction method in which a computer executes a process comprising:
receiving a character string; and
referring to a storage unit that stores place name character strings, and outputting, as the place name included in the character string, a place name character string that shares a larger number of characters with the character string and at least one character of which, counted from its end, is included in the character string.
JP2016199447A 2016-10-07 2016-10-07 Place name extraction program, place name extraction device and place name extraction method Active JP6759955B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2016199447A JP6759955B2 (en) 2016-10-07 2016-10-07 Place name extraction program, place name extraction device and place name extraction method

Publications (2)

Publication Number Publication Date
JP2018060474A true JP2018060474A (en) 2018-04-12
JP6759955B2 JP6759955B2 (en) 2020-09-23

Family

ID=61908648

Country Status (1)

Country Link
JP (1) JP6759955B2 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06325204A (en) * 1993-05-14 1994-11-25 Sanyo Electric Co Ltd Character recognition post processor
JPH07262320A (en) * 1994-03-18 1995-10-13 Matsushita Electric Ind Co Ltd Address recognition device
JPH1196308A (en) * 1997-09-19 1999-04-09 Toshiba Corp Character information reader and address reader
JP2004258950A (en) * 2003-02-26 2004-09-16 Canon Inc Character recognition method
JP2007042097A (en) * 2005-07-29 2007-02-15 Fujitsu Ltd Key character extraction program, key character extraction device, key character extraction method, collective place name recognition program, collective place name recognition device and collective place name recognition method
JP2014067302A (en) * 2012-09-26 2014-04-17 Buffalo Inc Image processing device and program

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021022261A * 2019-07-30 2021-02-18 Fujitsu Frontech Limited Correction candidate determination device, correction candidate determination method and program
JP7215975B2 2019-07-30 2023-01-31 Fujitsu Frontech Limited Correction candidate determination device, correction candidate determination method, and program

Legal Events

Date Code Title Description
2019-07-09 A621 Written request for application examination (JAPANESE INTERMEDIATE CODE: A621)
2020-04-24 A977 Report on retrieval (JAPANESE INTERMEDIATE CODE: A971007)
2020-05-12 A131 Notification of reasons for refusal (JAPANESE INTERMEDIATE CODE: A131)
2020-07-09 A521 Request for written amendment filed (JAPANESE INTERMEDIATE CODE: A523)
(undated) TRDD Decision of grant or rejection written
2020-08-04 A01 Written decision to grant a patent or to grant a registration (utility model) (JAPANESE INTERMEDIATE CODE: A01)
2020-08-17 A61 First payment of annual fees (during grant procedure) (JAPANESE INTERMEDIATE CODE: A61)
(undated) R150 Certificate of patent or registration of utility model (Ref document number: 6759955; Country of ref document: JP; JAPANESE INTERMEDIATE CODE: R150)